RDS PostgreSQL Migration and Splitting a Monolithic Database into Multiple RDS Instances
Written by:
Igor Gorovyy
DevOps Engineer Lead & Senior Solutions Architect
Migrating Amazon RDS PostgreSQL databases and splitting a monolithic PostgreSQL database into multiple RDS instances can significantly enhance performance, scalability, and manageability. In this article, we will explore best practices for RDS PostgreSQL migration and database splitting, covering challenges, strategies, and tools to streamline the process.
Why Migrate or Split RDS PostgreSQL?
Reasons for Migration:
Cost Optimization: Moving from on-demand pricing to reserved instances or using a more cost-efficient database engine.
Performance Improvement: Upgrading to a larger instance type or a different database engine version.
High Availability & Disaster Recovery: Migrating to a multi-AZ setup or cross-region replication for redundancy.
Compliance Requirements: Moving to a region that meets data sovereignty regulations.
Reasons for Splitting a Monolithic RDS PostgreSQL Database:
Performance Bottlenecks: Large databases may lead to slower queries and increased latency.
Scalability: Decoupling services and reducing the load on a single instance.
Microservices Architecture: Different services might need their own databases for isolation and optimized access.
Easier Maintenance: Reducing the impact of schema changes or updates.
The Importance of Python for DevOps Engineers
Python is an essential tool for DevOps engineers, providing automation, integration, and flexibility in managing cloud infrastructure, CI/CD pipelines, and monitoring systems. Some key benefits include:
Infrastructure as Code (IaC): Python helps in managing infrastructure through tools like boto3 for AWS automation and Terraform scripting.
Automation & Scripting: Automating database migrations, backups, deployments, and monitoring using Python scripts.
Cloud Management: Managing AWS resources (EC2, RDS, S3, etc.) efficiently with boto3.
CI/CD Integration: Using Python to create deployment scripts, integrate with Jenkins, GitHub Actions, and Kubernetes.
Monitoring & Logging: Parsing logs, monitoring metrics with Prometheus, and automating alerts with Python.
For database migration and splitting, Python plays a crucial role in automating the process using the AWS SDK (boto3), PostgreSQL utilities (pg_dump, pg_restore), and scripting to manage database connections, snapshots, and cross-region transfers.
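As a small, hedged illustration of the boto3 style used later in this article (the region is a placeholder), listing the RDS instances in a region takes only a few lines:
import boto3

# Print the identifier, engine, and instance class of every RDS instance in one region
rds = boto3.client('rds', region_name='eu-central-1')
for db in rds.describe_db_instances()['DBInstances']:
    print(db['DBInstanceIdentifier'], db['Engine'], db['DBInstanceClass'])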
RDS PostgreSQL Migration Strategies
1. Same-Engine Migration (Minimal Downtime)
If you are migrating within PostgreSQL:
- AWS DMS (Database Migration Service): Supports near-zero downtime migration.
- Logical Replication: Set up logical replication for seamless migration (a minimal sketch follows this list).
- Snapshots & Restores: Create a snapshot and restore it in the new environment.
- pg_dump / pg_restore: Manually export and import data for smaller databases.
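For the logical-replication option, the publication/subscription pair can be created from Python. This is a minimal sketch under assumptions not stated above: psycopg2 is installed, rds.logical_replication is enabled in the source parameter group, the target schema already exists (logical replication copies rows, not DDL), and the connection strings are placeholders:
import psycopg2

# Placeholder connection strings -- replace with real endpoints and credentials
SOURCE_DSN = "host=source-db.example.com port=5432 dbname=appdb user=postgres password=***"
TARGET_DSN = "host=target-db.example.com port=5432 dbname=appdb user=postgres password=***"

# Publish all tables on the source instance
src = psycopg2.connect(SOURCE_DSN)
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION migration_pub FOR ALL TABLES;")
src.close()

# Subscribe from the target; CREATE SUBSCRIPTION must run outside a transaction block
tgt = psycopg2.connect(TARGET_DSN)
tgt.autocommit = True
with tgt.cursor() as cur:
    cur.execute(
        f"CREATE SUBSCRIPTION migration_sub CONNECTION '{SOURCE_DSN}' PUBLICATION migration_pub;"
    )
tgt.close()
Once the subscription has caught up, traffic can be cut over to the new instance and the publication and subscription dropped.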
2. Cross-Engine Migration
If migrating from another engine (e.g., MySQL → PostgreSQL), you will need:
- AWS DMS: Migrates the data and can keep source and target in sync until cutover (a hedged boto3 sketch follows this list).
- AWS Schema Conversion Tool (SCT): Helps translate the source schema to PostgreSQL.
- Manual Export/Import: Dump data from the source engine and load it into PostgreSQL by hand; pg_dump and pg_restore apply once both sides are PostgreSQL.
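Since the rest of this article drives AWS from boto3, the same can be done for DMS. The sketch below is an assumption-heavy outline rather than a complete recipe: it presumes a DMS replication instance already exists, and the identifiers, hostnames, ARN, and credentials are placeholders:
import json
import boto3

dms = boto3.client('dms', region_name='eu-central-1')

# Source and target endpoints (identifiers, hosts, and credentials are placeholders)
source = dms.create_endpoint(
    EndpointIdentifier='mysql-source',
    EndpointType='source',
    EngineName='mysql',
    ServerName='mysql-source.example.com',
    Port=3306,
    Username='admin',
    Password='***',
    DatabaseName='appdb',
)
target = dms.create_endpoint(
    EndpointIdentifier='postgres-target',
    EndpointType='target',
    EngineName='postgres',
    ServerName='postgres-target.example.com',
    Port=5432,
    Username='admin',
    Password='***',
    DatabaseName='appdb',
)

# Full load plus change data capture, on a pre-provisioned replication instance
task = dms.create_replication_task(
    ReplicationTaskIdentifier='mysql-to-postgres',
    SourceEndpointArn=source['Endpoint']['EndpointArn'],
    TargetEndpointArn=target['Endpoint']['EndpointArn'],
    ReplicationInstanceArn='arn:aws:dms:eu-central-1:123456789012:rep:EXAMPLE',
    MigrationType='full-load-and-cdc',
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
task_arn = task['ReplicationTask']['ReplicationTaskArn']

# The task must reach the 'ready' state before it can be started
dms.get_waiter('replication_task_ready').wait(
    Filters=[{'Name': 'replication-task-arn', 'Values': [task_arn]}]
)
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType='start-replication',
)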
3. Cross-Region Migration
If moving a database across regions:
- Use cross-region read replicas (Multi-AZ on its own only provides redundancy within a region).
- Use AWS DMS for continuous replication.
- Leverage S3 for data transfer, for example by staging pg_dump archives in a bucket in the target region (see the sketch below).
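A minimal sketch of the S3 option, assuming a pg_dump archive has already been created and using a placeholder bucket name in the target region:
import boto3

# Stage a pg_dump archive in an S3 bucket located in the target region
s3 = boto3.client('s3', region_name='eu-west-1')
s3.upload_file('appdb.dump', 'my-migration-bucket-eu-west-1', 'dumps/appdb.dump')

# On a host in the target region, pull it back down and feed it to pg_restore
s3.download_file('my-migration-bucket-eu-west-1', 'dumps/appdb.dump', '/tmp/appdb.dump')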
4. Automated Migration Approach with boto3
An automated approach to migrating RDS PostgreSQL instances across regions can be implemented using Python and the boto3 library. The key steps include:
- Identify source RDS instances
- Create snapshots for each instance
- Copy snapshots to the target region using copy_db_snapshot
- Restore instances in the target region
- Validate the migration (a small validation sketch follows the script below)
Visual representation of the migration process:
sequenceDiagram
participant User
participant Script
participant Source RDS
participant Target RDS
participant AWS KMS
User->>Script: Start migration
Script->>Source RDS: List DB instances
Source RDS-->>Script: Return instances list
loop For each instance
Script->>Source RDS: Create snapshot
Source RDS-->>Script: Snapshot created
Script->>AWS KMS: Get KMS key (if encrypted)
AWS KMS-->>Script: KMS key
Script->>Target RDS: Copy snapshot to target region
Target RDS-->>Script: Snapshot copied
Script->>Target RDS: Restore DB from snapshot
Target RDS-->>Script: DB restored
end
Script-->>User: Migration complete
Example Python implementation:
import boto3
import time
import concurrent.futures
import botocore.exceptions
# Configuration
source_region = 'eu-central-1' # Source region
target_region = 'eu-west-1' # Target region
availability_zone = f'{target_region}a' # Availability zone in the target region
multi_az = False # Set to True for Multi-AZ deployment
publicly_accessible = False # Set to False if the instance should not be publicly accessible
# Security group ID to be applied to the restored DB instance
security_group_ids = ['sg-0fea3798bf54e9fe6']
# Specify your KMS key ID in the target region (required for encrypted snapshots)
kms_key_id = '18d******d-89df-4690-82a1-f2**********ccb' # Ireland
db_subnet_group_name = 'multiregion-dev-vpc-blue' # Your specific DB subnet group name
# Create a client for the source region
rds_client_source = boto3.client('rds', region_name=source_region)
# Create a client for the target region
rds_client_target = boto3.client('rds', region_name=target_region)
# Get the account ID for ARN generation
account_id = boto3.client('sts').get_caller_identity().get('Account')
# Define an exclude list for instance identifiers
exclude_list = ['env-dev-msv1', 'env-dev-msv2', 'env-dev-msv3', 'env-msv4', 'env-msv5', 'env-msv6', 'env-msv7'] # Add your specific instance IDs to exclude
# Record the total start time
total_start_time = time.time()
def migrate_rds_instance(db_instance):
    try:
        db_instance_identifier = db_instance['DBInstanceIdentifier']
        if db_instance_identifier in exclude_list:
            print(f"Skipping excluded RDS instance: {db_instance_identifier}")
            return

        instance_start_time = time.time()
        db_instance_class = db_instance['DBInstanceClass']
        storage_type = db_instance.get('StorageType', 'Not specified')
        engine = db_instance['Engine']
        allocated_storage = db_instance['AllocatedStorage']
        print(f"Processing RDS Instance: {db_instance_identifier}")
        print(f" - Instance Class: {db_instance_class}")
        print(f" - Storage Type: {storage_type}")
        print(f" - Engine: {engine}")
        print(f" - Allocated Storage: {allocated_storage} GiB")

        snapshot_identifier = f"{db_instance_identifier}-snapshot-{int(time.time())}"
        target_db_instance_identifier = f"{db_instance_identifier}-migrated"

        # Step 1: create a snapshot of the source instance
        print(f"\nCreating snapshot for RDS instance: {db_instance_identifier}")
        snapshot_start_time = time.time()
        response_snapshot = rds_client_source.create_db_snapshot(
            DBSnapshotIdentifier=snapshot_identifier,
            DBInstanceIdentifier=db_instance_identifier
        )
        print("Waiting for snapshot to be available...")
        snapshot_waiter = rds_client_source.get_waiter('db_snapshot_available')
        snapshot_waiter.wait(
            DBSnapshotIdentifier=snapshot_identifier
        )
        snapshot_end_time = time.time()
        snapshot_duration = snapshot_end_time - snapshot_start_time
        print(f"Snapshot {snapshot_identifier} is now available for {db_instance_identifier}. Time taken: {snapshot_duration:.2f} seconds.")

        # Step 2: copy the snapshot to the target region (with the KMS key if encrypted)
        print(f"Copying snapshot to the target region: {target_region}")
        copy_start_time = time.time()
        snapshot_details = rds_client_source.describe_db_snapshots(
            DBSnapshotIdentifier=snapshot_identifier
        )['DBSnapshots'][0]
        encrypted = snapshot_details['Encrypted']
        copy_params = {
            'SourceDBSnapshotIdentifier': f"arn:aws:rds:{source_region}:{account_id}:snapshot:{snapshot_identifier}",
            'TargetDBSnapshotIdentifier': snapshot_identifier,
            'SourceRegion': source_region
        }
        if encrypted:
            copy_params['KmsKeyId'] = kms_key_id
            print(f"Snapshot is encrypted. Using KMS key: {kms_key_id}")
        response_copy = rds_client_target.copy_db_snapshot(**copy_params)
        print("Waiting for the snapshot to be available in the target region...")
        copy_waiter = rds_client_target.get_waiter('db_snapshot_completed')
        max_attempts = 60
        delay = 30
        try:
            copy_waiter.wait(
                DBSnapshotIdentifier=snapshot_identifier,
                WaiterConfig={
                    'Delay': delay,
                    'MaxAttempts': max_attempts
                }
            )
        except botocore.exceptions.WaiterError as e:
            print(f"Error waiting for snapshot copy: {e}")
            return
        copy_end_time = time.time()
        copy_duration = copy_end_time - copy_start_time
        print(f"Snapshot {snapshot_identifier} copied successfully to {target_region} for {db_instance_identifier}. Time taken: {copy_duration:.2f} seconds.")

        # Step 3: restore a new DB instance from the copied snapshot
        print(f"Restoring DB instance from snapshot {snapshot_identifier} in {target_region}")
        restore_start_time = time.time()
        restore_params = {
            'DBInstanceIdentifier': target_db_instance_identifier,
            'DBSnapshotIdentifier': snapshot_identifier,
            'DBInstanceClass': db_instance_class,
            'DBSubnetGroupName': db_subnet_group_name,
            'MultiAZ': multi_az,
            'PubliclyAccessible': publicly_accessible,
            'VpcSecurityGroupIds': security_group_ids
        }
        if not multi_az:
            restore_params['AvailabilityZone'] = availability_zone
        response_restore = rds_client_target.restore_db_instance_from_db_snapshot(**restore_params)
        print("Waiting for the new DB instance to be available...")
        instance_waiter = rds_client_target.get_waiter('db_instance_available')
        instance_waiter.wait(
            DBInstanceIdentifier=target_db_instance_identifier
        )
        restore_end_time = time.time()
        restore_duration = restore_end_time - restore_start_time
        print(f"DB instance {target_db_instance_identifier} is now available in {target_region}. Time taken: {restore_duration:.2f} seconds.")

        instance_end_time = time.time()
        instance_duration = instance_end_time - instance_start_time
        print(f"Migration of {db_instance_identifier} completed. Total time taken: {instance_duration:.2f} seconds.\n")
    except Exception as e:
        print(f"Error migrating instance {db_instance['DBInstanceIdentifier']}: {e}")
def main():
    response = rds_client_source.describe_db_instances()
    db_instances = response['DBInstances']
    if not db_instances:
        print("No RDS instances found in the source region.")
        return
    print(f"Found {len(db_instances)} RDS instances.")
    # Migrate instances in parallel, one worker thread per instance
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(migrate_rds_instance, db_instances)
    total_end_time = time.time()
    total_duration = total_end_time - total_start_time
    print(f"All RDS instances have been processed for migration. Total time taken: {total_duration:.2f} seconds.")

if __name__ == '__main__':
    main()
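The script stops after the restore step. The final item in the step list above, validating the migration, can be sketched as a quick comparison of key attributes between each source instance and its "-migrated" copy. The helper below is an illustrative assumption, not part of the original script; it reuses the two clients defined above, and deeper checks (row counts, critical queries) would still be needed at the application level:
def validate_migration(db_instance_identifier):
    """Compare a few key attributes between the source instance and its restored copy."""
    src = rds_client_source.describe_db_instances(
        DBInstanceIdentifier=db_instance_identifier
    )['DBInstances'][0]
    tgt = rds_client_target.describe_db_instances(
        DBInstanceIdentifier=f"{db_instance_identifier}-migrated"
    )['DBInstances'][0]

    for attr in ('Engine', 'EngineVersion', 'AllocatedStorage', 'DBInstanceClass'):
        status = "OK" if src[attr] == tgt[attr] else "MISMATCH"
        print(f"{attr}: source={src[attr]} target={tgt[attr]} -> {status}")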
Splitting a Monolithic RDS PostgreSQL Database
To split a monolithic database into multiple RDS instances, automation using pg_dump and pg_restore can simplify the process.
Automating the Split Process with Python
- Read the mapping file for old and new database connections
- Export each source database using pg_dump
- Import it into the corresponding new database using pg_restore
Visual representation of the splitting process:
sequenceDiagram
participant Script
participant Mapping File
participant Source DB
participant Target DB
participant Filesystem
Script->>Mapping File: Read DB mappings
Mapping File-->>Script: Return mappings
loop For each mapping
Script->>Script: Parse DB URLs
Script->>Source DB: pg_dump export
Source DB-->>Script: Database dump
Script->>Filesystem: Save dump file
Script->>Target DB: pg_restore import
Target DB-->>Script: Import complete
end
Script->>Script: Calculate total time
Example Python script:
import subprocess
import time
from urllib.parse import urlparse, parse_qs
def read_mapping_file(file_path):
    with open(file_path, 'r') as f:
        lines = f.readlines()
    mappings = [line.strip().split('@@') for line in lines if line.strip()]
    return mappings
def parse_db_url(db_url):
    parsed_url = urlparse(db_url)
    username = parsed_url.username
    password = parsed_url.password
    host = parsed_url.hostname
    port = parsed_url.port
    dbname = parsed_url.path.lstrip('/')
    query_params = parse_qs(parsed_url.query)
    return {
        "user": username,
        "password": password,
        "host": host,
        "port": str(port),
        "dbname": dbname,
        "options": query_params
    }
def export_database(config, dump_file):
    print(f"Starting database export for {config['dbname']}...")
    start_time = time.time()
    dump_command = [
        "/usr/bin/pg_dump",
        "-h", config['host'],
        "-p", config['port'],
        "-U", config['user'],
        "-d", config['dbname'],
        "-F", "c",  # custom-format archive, required for pg_restore
        "-f", dump_file
    ]
    env = {"PGPASSWORD": config['password']}
    try:
        subprocess.run(dump_command, check=True, env=env)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(f"Database exported to {dump_file} in {elapsed_time:.2f} seconds.")
    except subprocess.CalledProcessError as e:
        print("Error exporting database:", e)
def import_database(config, dump_file):
    print(f"Starting database import for {config['dbname']}...")
    start_time = time.time()
    import_command = [
        "/usr/bin/pg_restore",
        "-h", config['host'],
        "-p", config['port'],
        "-U", config['user'],
        "-d", config['dbname'],
        "--clean",  # drop existing objects in the target before recreating them
        dump_file
    ]
    env = {"PGPASSWORD": config['password']}
    try:
        subprocess.run(import_command, check=True, env=env)
        end_time = time.time()
        elapsed_time = end_time - start_time
        print(f"Database imported from {dump_file} in {elapsed_time:.2f} seconds.")
    except subprocess.CalledProcessError as e:
        print("Error importing database:", e)
def process_mappings(file_path):
    mappings = read_mapping_file(file_path)
    total_start_time = time.time()
    for old_db_url, new_db_url in mappings:
        old_db_config = parse_db_url(old_db_url)
        new_db_config = parse_db_url(new_db_url)
        print(f"Processing mapping: {new_db_config['dbname']}")
        dump_file = f"{old_db_config['dbname']}.dump"
        export_database(old_db_config, dump_file)
        import_database(new_db_config, dump_file)
    total_end_time = time.time()
    total_elapsed_time = total_end_time - total_start_time
    print(f"Total process completed in {total_elapsed_time:.2f} seconds.")
mapping_file_path = "db_mapping.txt"
process_mappings(mapping_file_path)
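The mapping file read by the script (db_mapping.txt) is expected to contain one line per database, with the old and new connection URLs joined by @@, matching the split('@@') in read_mapping_file. For example, with placeholder hosts and credentials:
postgresql://admin:***@monolith-db.eu-central-1.rds.amazonaws.com:5432/orders@@postgresql://admin:***@orders-db.eu-central-1.rds.amazonaws.com:5432/orders
postgresql://admin:***@monolith-db.eu-central-1.rds.amazonaws.com:5432/billing@@postgresql://admin:***@billing-db.eu-central-1.rds.amazonaws.com:5432/billing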
This approach gives a repeatable way to move each database's data into its own RDS PostgreSQL instance.
Conclusion
Migrating and splitting an RDS PostgreSQL database can significantly enhance performance, scalability, and manageability. Python plays a crucial role in automating infrastructure management, database migration, and DevOps workflows. Leveraging AWS tools, boto3, and PostgreSQL utilities minimizes downtime and ensures efficient data transfer.