Migrating from MySQL to PostgreSQL represents a significant architectural decision that can unlock powerful features, better standards compliance, and enhanced performance characteristics. This comprehensive guide provides a detailed roadmap for executing a successful migration, covering everything from initial assessment through post-migration optimization.
Why Migrate to PostgreSQL? #
Before diving into the technical details, it’s worth understanding what drives organizations to migrate from MySQL to PostgreSQL. PostgreSQL offers several compelling advantages including superior handling of complex queries, advanced data types like JSONB and arrays, robust support for concurrent operations, extensive indexing options (GiST, GIN, BRIN), and better adherence to SQL standards. Additionally, PostgreSQL provides sophisticated transaction handling, powerful full-text search capabilities, and native support for geographic data through PostGIS.
The decision to migrate shouldn’t be taken lightly, as it involves more than simply moving data between systems. You’ll need to account for syntax differences, rethink certain design patterns, update application code, and potentially retrain your team. However, for applications requiring advanced features, complex analytical queries, or strict data integrity, the migration often pays significant dividends.
Phase 1: Pre-Migration Assessment #
Comprehensive Database Analysis #
Begin by conducting a thorough analysis of your existing MySQL database. This assessment helps identify potential challenges early and allows you to create a realistic migration timeline. Start by documenting your database’s current state, including table sizes, row counts, index usage, and dependency relationships.
Create a detailed inventory of all database objects including tables, views, stored procedures, triggers, functions, and events. Pay special attention to MySQL-specific features that may require special handling, such as ENUM types, spatial data types, full-text indexes, and partitioned tables.
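As a starting point for that inventory, queries like the following against MySQL's information_schema surface table sizes, approximate row counts, and the number of routines, triggers, and views you will need to convert by hand (a sketch; substitute your own schema name for source_db):
-- Table sizes and approximate row counts
SELECT
TABLE_NAME,
ENGINE,
TABLE_ROWS AS approx_rows,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS total_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'source_db'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
-- Objects that typically require manual conversion
SELECT 'routines' AS object_type, COUNT(*) AS total FROM information_schema.ROUTINES WHERE ROUTINE_SCHEMA = 'source_db'
UNION ALL
SELECT 'triggers', COUNT(*) FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA = 'source_db'
UNION ALL
SELECT 'views', COUNT(*) FROM information_schema.VIEWS WHERE TABLE_SCHEMA = 'source_db';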
Understanding Your Data Volume and Complexity #
The size and complexity of your database significantly impact your migration strategy. A small database with simple schemas might be migrated in a single operation, while large production databases may require a phased approach with minimal downtime strategies.
Analyze your query patterns to understand which queries are most critical to your application’s performance. This information helps you prioritize index creation and optimization efforts in the new PostgreSQL environment. Use MySQL’s slow query log and performance schema to identify queries that may need rewriting or optimization.
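If the performance schema is enabled (it is by default on modern MySQL versions), a digest query along these lines is a reasonable sketch for finding the statements that deserve the most attention during conversion:
-- Top statements by total execution time (timer columns are in picoseconds)
SELECT
DIGEST_TEXT,
COUNT_STAR AS exec_count,
ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_time_sec,
ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_time_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;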
Creating a Migration Timeline #
Develop a realistic timeline that accounts for schema conversion, data migration, testing, application updates, and team training. Factor in buffer time for unexpected challenges. For production systems, plan your migration during low-traffic periods and ensure you have rollback procedures in place.
Phase 2: Environment Setup and Preparation #
PostgreSQL Installation and Configuration #
Install PostgreSQL on your target system, choosing a version that balances feature requirements with long-term support considerations. PostgreSQL 14 and above offer significant performance improvements and useful features for most use cases.
Configure PostgreSQL appropriately for your workload. Start with these baseline configurations and adjust based on your specific hardware and requirements:
# postgresql.conf - Initial Configuration
# Memory Settings
shared_buffers = 4GB # 25% of total RAM for dedicated servers
effective_cache_size = 12GB # 50-75% of total RAM
work_mem = 64MB # Per-operation memory, adjust based on concurrent queries
maintenance_work_mem = 1GB # Used for maintenance operations like VACUUM, CREATE INDEX
# Write-Ahead Logging
wal_level = replica # Enables replication and point-in-time recovery
max_wal_size = 4GB # Maximum WAL size before forced checkpoint
min_wal_size = 1GB # Minimum WAL size between checkpoints
checkpoint_completion_target = 0.9 # Spread checkpoints over 90% of checkpoint_timeout
# Query Planner
random_page_cost = 1.1 # Lower for SSD storage (4.0 for spinning disks)
effective_io_concurrency = 200 # Higher for SSD arrays
default_statistics_target = 100 # Increase for better query plans on complex queries
# Parallel Query Execution
max_parallel_workers_per_gather = 4 # Parallel workers per query
max_parallel_workers = 8 # Total parallel workers
max_worker_processes = 8 # Total background workers
# Connection Settings
max_connections = 100 # Adjust based on application requirements
shared_preload_libraries = 'pg_stat_statements' # Enable query statistics tracking
Setting Up Migration Tools #
Install and configure the tools you’ll use for migration. Popular options include pgLoader (excellent for automated MySQL to PostgreSQL migration), AWS Database Migration Service (for cloud migrations), or custom Python/Ruby scripts using database connectors.
For pgLoader, install it using your system’s package manager or build from source. Create a configuration file that maps your MySQL database to PostgreSQL:
LOAD DATABASE
FROM mysql://user:password@mysql-host/source_database
INTO postgresql://user:password@postgres-host/target_database
WITH include drop, create tables, create indexes, reset sequences,
workers = 8, concurrency = 1,
multiple readers per thread, rows per range = 50000
SET PostgreSQL PARAMETERS
maintenance_work_mem to '1GB',
work_mem to '256MB'
CAST type datetime to timestamptz
drop default drop not null using zero-dates-to-null,
type date drop not null drop default using zero-dates-to-null,
type tinyint to boolean using tinyint-to-boolean,
type year to integer
BEFORE LOAD DO
$$ DROP SCHEMA IF EXISTS public CASCADE; $$,
$$ CREATE SCHEMA public; $$
AFTER LOAD DO
$$ ALTER SCHEMA public OWNER TO your_user; $$;
Phase 3: Schema Conversion #
Data Type Mapping Strategy #
One of the most critical aspects of migration involves mapping MySQL data types to their PostgreSQL equivalents. While many types have direct counterparts, others require careful consideration.
Here’s a comprehensive mapping reference:
Numeric Types:
- TINYINT → SMALLINT (PostgreSQL has no single-byte integer)
- SMALLINT → SMALLINT (direct equivalent)
- MEDIUMINT → INTEGER (no medium integer in PostgreSQL)
- INT / INTEGER → INTEGER (direct equivalent)
- BIGINT → BIGINT (direct equivalent)
- DECIMAL(p,s) → NUMERIC(p,s) (direct equivalent)
- FLOAT → REAL (4-byte floating point)
- DOUBLE → DOUBLE PRECISION (8-byte floating point)
String Types:
- CHAR(n) → CHAR(n) (fixed-length, blank-padded)
- VARCHAR(n) → VARCHAR(n) (variable-length with limit)
- TEXT / MEDIUMTEXT / LONGTEXT → TEXT (PostgreSQL TEXT has no size limit)
- ENUM → create a custom ENUM type or use a CHECK constraint
- SET → TEXT[] (array) or use normalized tables
Binary Types:
- BINARY / VARBINARY → BYTEA (binary data)
- BLOB / MEDIUMBLOB / LONGBLOB → BYTEA (no size variants needed)
Date and Time Types:
- DATE → DATE (direct equivalent)
- DATETIME → TIMESTAMP (without time zone) or TIMESTAMPTZ (with time zone, recommended)
- TIMESTAMP → TIMESTAMPTZ (MySQL stores TIMESTAMP values in UTC, so the timezone-aware type is the natural fit)
- TIME → TIME (direct equivalent)
- YEAR → SMALLINT or INTEGER (store as a regular integer)
Special Types:
- JSON → JSONB (PostgreSQL's binary JSON format with indexing support)
- SPATIAL types → PostGIS geometry types (requires the PostGIS extension)
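To make the mapping concrete, here is an illustrative before-and-after sketch for a hypothetical products table (the columns are invented for the example; note that INT UNSIGNED has a wider range than PostgreSQL's INTEGER, so BIGINT is the safe target):
-- MySQL
CREATE TABLE products (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
in_stock TINYINT(1) DEFAULT 1,
attributes JSON,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
-- PostgreSQL, after applying the mappings above
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY, -- INT UNSIGNED exceeds INTEGER's range
name VARCHAR(255) NOT NULL,
price NUMERIC(10,2) NOT NULL,
in_stock BOOLEAN DEFAULT TRUE, -- TINYINT(1) used as a flag
attributes JSONB,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);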
Handling AUTO_INCREMENT and Sequences #
MySQL’s AUTO_INCREMENT maps to PostgreSQL’s SERIAL types or explicit sequences. When converting:
-- MySQL
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50)
);
-- PostgreSQL (Option 1: SERIAL)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50)
);
-- PostgreSQL (Option 2: Explicit sequence)
CREATE SEQUENCE users_id_seq;
CREATE TABLE users (
id INTEGER DEFAULT nextval('users_id_seq') PRIMARY KEY,
username VARCHAR(50)
);
ALTER SEQUENCE users_id_seq OWNED BY users.id;
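On PostgreSQL 10 and newer there is a third option: a standard identity column. GENERATED BY DEFAULT is the friendlier variant for a migration because it still accepts the explicit id values you are importing:
-- PostgreSQL (Option 3: identity column, PostgreSQL 10+)
CREATE TABLE users (
id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
username VARCHAR(50)
);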
After data migration, ensure sequences are set to the correct value:
-- Set sequence to maximum existing ID + 1
SELECT setval('users_id_seq', COALESCE((SELECT MAX(id) FROM users), 1), true);
Converting ENUM Types #
MySQL’s ENUM type doesn’t have a direct equivalent in standard SQL. PostgreSQL offers two approaches:
Option 1: Custom ENUM Type (Type-Safe)
-- Create custom type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
-- Use in table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status order_status DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
Option 2: CHECK Constraint (More Flexible)
-- Use TEXT with CHECK constraint
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
status TEXT CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
The ENUM type approach offers better type safety and slightly better performance, while the CHECK constraint approach provides more flexibility for adding values without schema changes.
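As a rough sketch of that tradeoff, adding a new value later looks like this under each approach (the constraint name below assumes PostgreSQL's default tablename_column_check naming; adjust it to whatever your schema actually generated):
-- Native ENUM: append a value (values can be added but never removed;
-- before PostgreSQL 12 this statement cannot run inside a transaction block)
ALTER TYPE order_status ADD VALUE 'refunded';
-- CHECK constraint: drop and recreate with the expanded list
ALTER TABLE orders DROP CONSTRAINT orders_status_check;
ALTER TABLE orders ADD CONSTRAINT orders_status_check
CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled', 'refunded'));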
Automated Schema Conversion Script #
For complex databases, automate schema conversion with a Python script:
import re
import sys
class MySQLToPostgreSQLConverter:
"""Converts MySQL CREATE TABLE statements to PostgreSQL syntax."""
def __init__(self):
        self.type_mappings = {
            # Integer display widths like INT(11) have no PostgreSQL equivalent, so drop them
            r'\bTINYINT\b(?:\(\d+\))?': 'SMALLINT',
            r'\bSMALLINT\b(?:\(\d+\))?': 'SMALLINT',
            r'\bMEDIUMINT\b(?:\(\d+\))?': 'INTEGER',
            r'\bINT\b(?:\(\d+\))?': 'INTEGER',
            r'\bBIGINT\b(?:\(\d+\))?': 'BIGINT',
            r'\bFLOAT\b': 'REAL',
            r'\bDOUBLE\b': 'DOUBLE PRECISION',
            r'\bDATETIME\b': 'TIMESTAMP',
            r'\bMEDIUMBLOB\b': 'BYTEA',
            r'\bLONGBLOB\b': 'BYTEA',
            r'\bBLOB\b': 'BYTEA',
            r'\bMEDIUMTEXT\b': 'TEXT',
            r'\bLONGTEXT\b': 'TEXT',
            r'\bTEXT\b': 'TEXT',
        }
def convert_schema(self, mysql_sql):
"""Main conversion method."""
sql = mysql_sql
# Replace backticks with double quotes
sql = re.sub(r'`([^`]+)`', r'"\1"', sql)
# Convert data types
for mysql_type, pg_type in self.type_mappings.items():
sql = re.sub(mysql_type, pg_type, sql, flags=re.IGNORECASE)
# Handle AUTO_INCREMENT
sql = re.sub(
r'(\w+)\s+INT(?:EGER)?\s+(?:UNSIGNED\s+)?AUTO_INCREMENT',
r'\1 SERIAL',
sql,
flags=re.IGNORECASE
)
# Remove MySQL-specific clauses
sql = re.sub(r'\s*ENGINE\s*=\s*\w+', '', sql, flags=re.IGNORECASE)
sql = re.sub(r'\s*DEFAULT\s+CHARSET\s*=\s*\w+', '', sql, flags=re.IGNORECASE)
sql = re.sub(r'\s*COLLATE\s*=\s*\w+', '', sql, flags=re.IGNORECASE)
sql = re.sub(r'\s*AUTO_INCREMENT\s*=\s*\d+', '', sql, flags=re.IGNORECASE)
# Convert UNSIGNED to CHECK constraints
sql = self._convert_unsigned(sql)
# Convert ENUM types
sql = self._convert_enums(sql)
return sql
def _convert_unsigned(self, sql):
"""Convert UNSIGNED to CHECK constraints."""
def replace_unsigned(match):
column_name = match.group(1)
data_type = match.group(2)
return f'{column_name} {data_type} CHECK ("{column_name}" >= 0)'
pattern = r'(\w+)\s+(\w+(?:\([^)]+\))?)\s+UNSIGNED'
return re.sub(pattern, replace_unsigned, sql, flags=re.IGNORECASE)
def _convert_enums(self, sql):
"""Convert ENUM to PostgreSQL ENUM or TEXT with CHECK."""
enum_pattern = r"(\w+)\s+ENUM\s*\(([^)]+)\)"
def replace_enum(match):
column_name = match.group(1)
values = match.group(2)
# Clean up values
values_list = [v.strip().strip("'\"") for v in values.split(',')]
values_joined = ', '.join(f"'{v}'" for v in values_list)
return f'{column_name} TEXT CHECK ("{column_name}" IN ({values_joined}))'
return re.sub(enum_pattern, replace_enum, sql, flags=re.IGNORECASE)
def main():
if len(sys.argv) < 2:
print("Usage: python convert_schema.py <mysql_schema.sql>")
sys.exit(1)
converter = MySQLToPostgreSQLConverter()
with open(sys.argv[1], 'r') as f:
mysql_schema = f.read()
postgresql_schema = converter.convert_schema(mysql_schema)
output_file = sys.argv[1].replace('.sql', '_postgres.sql')
with open(output_file, 'w') as f:
f.write(postgresql_schema)
print(f"Converted schema written to: {output_file}")
if __name__ == "__main__":
main()
Phase 4: Data Migration #
Choosing a Migration Strategy #
Your data migration strategy depends on several factors including database size, acceptable downtime, and complexity. Common approaches include:
Offline Migration: Suitable for smaller databases where you can afford downtime. Take the MySQL database offline, export all data, import to PostgreSQL, verify integrity, then switch over.
Online Migration with Replication: For large databases requiring minimal downtime. Set up continuous replication from MySQL to PostgreSQL, let it catch up, then perform a quick cutover during a maintenance window.
Phased Migration: Migrate tables or modules incrementally, updating application code to work with both databases during the transition period.
Using pgLoader for Automated Migration #
pgLoader provides an excellent automated solution for MySQL to PostgreSQL migration:
# Install pgLoader
sudo apt-get install pgloader # Debian/Ubuntu
brew install pgloader # macOS
# Run migration
pgloader mysql://user:pass@localhost/mydb postgresql://user:pass@localhost/mydb
For more control, create a detailed configuration file:
LOAD DATABASE
FROM mysql://root:password@localhost:3306/source_db
INTO postgresql://postgres:password@localhost:5432/target_db
WITH include drop, create tables, create indexes, reset sequences,
workers = 8, concurrency = 1,
multiple readers per thread, rows per range = 50000,
preserve index names
SET PostgreSQL PARAMETERS
maintenance_work_mem to '2GB',
work_mem to '512MB'
CAST type datetime to timestamptz
drop default drop not null using zero-dates-to-null,
type date drop not null drop default using zero-dates-to-null,
type tinyint to boolean using tinyint-to-boolean,
type year to integer,
type bigint when unsigned to bigint drop typemod,
type int when unsigned to bigint drop typemod,
type smallint when unsigned to integer drop typemod
EXCLUDING TABLE NAMES MATCHING ~/^temp_/, ~/^backup_/
INCLUDING ONLY TABLE NAMES MATCHING 'users', 'orders', 'products'
BEFORE LOAD DO
$$ DROP SCHEMA IF EXISTS public CASCADE; $$,
$$ CREATE SCHEMA public; $$,
$$ CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; $$
AFTER LOAD DO
$$ ALTER SCHEMA public OWNER TO app_user; $$,
$$ GRANT ALL ON ALL TABLES IN SCHEMA public TO app_user; $$,
$$ GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO app_user; $$;
Custom Python Migration Script #
For complex migrations requiring custom data transformations, use a Python script:
import psycopg2
import psycopg2.extras  # provides execute_batch used for the batched inserts below
import mysql.connector
from datetime import datetime, timezone
import logging
from typing import List, Dict, Any
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
class DatabaseMigrator:
"""Migrates data from MySQL to PostgreSQL with robust error handling."""
def __init__(self, mysql_config: Dict, postgres_config: Dict, batch_size: int = 10000):
self.mysql_config = mysql_config
self.postgres_config = postgres_config
self.batch_size = batch_size
self.mysql_conn = None
self.pg_conn = None
def connect(self):
"""Establish database connections."""
self.mysql_conn = mysql.connector.connect(**self.mysql_config)
self.pg_conn = psycopg2.connect(**self.postgres_config)
logger.info("Database connections established")
def disconnect(self):
"""Close database connections."""
if self.mysql_conn:
self.mysql_conn.close()
if self.pg_conn:
self.pg_conn.close()
logger.info("Database connections closed")
def get_table_list(self) -> List[str]:
"""Get list of tables from MySQL database."""
cursor = self.mysql_conn.cursor()
cursor.execute("SHOW TABLES")
tables = [table[0] for table in cursor.fetchall()]
cursor.close()
return tables
def get_table_columns(self, table_name: str) -> List[str]:
"""Get column names for a table."""
cursor = self.mysql_conn.cursor()
cursor.execute(f"DESCRIBE {table_name}")
columns = [row[0] for row in cursor.fetchall()]
cursor.close()
return columns
def convert_value(self, value: Any, column_name: str) -> Any:
"""Convert MySQL values to PostgreSQL-compatible format."""
# Handle NULL values
if value is None:
return None
# Convert datetime objects
if isinstance(value, datetime):
# Ensure timezone awareness
if value.tzinfo is None:
value = value.replace(tzinfo=timezone.utc)
return value
# Convert MySQL's zero dates
if isinstance(value, str) and value in ('0000-00-00', '0000-00-00 00:00:00'):
return None
# Convert boolean values (MySQL stores as 0/1)
if isinstance(value, int) and value in (0, 1):
# Check if column name suggests boolean
if any(keyword in column_name.lower() for keyword in ['is_', 'has_', 'enabled', 'active']):
return bool(value)
return value
def migrate_table(self, table_name: str) -> Dict[str, int]:
"""Migrate a single table from MySQL to PostgreSQL."""
logger.info(f"Starting migration of table: {table_name}")
mysql_cursor = self.mysql_conn.cursor(dictionary=True)
pg_cursor = self.pg_conn.cursor()
# Get total row count
mysql_cursor.execute(f"SELECT COUNT(*) as count FROM {table_name}")
total_rows = mysql_cursor.fetchone()['count']
logger.info(f"Total rows in {table_name}: {total_rows}")
if total_rows == 0:
logger.info(f"Table {table_name} is empty, skipping")
return {'migrated': 0, 'failed': 0}
# Get columns
columns = self.get_table_columns(table_name)
# Disable triggers during migration for better performance
pg_cursor.execute(f'ALTER TABLE "{table_name}" DISABLE TRIGGER ALL')
migrated_count = 0
failed_count = 0
# Migrate in batches
for offset in range(0, total_rows, self.batch_size):
try:
# Fetch batch from MySQL
mysql_cursor.execute(f"SELECT * FROM {table_name} LIMIT {self.batch_size} OFFSET {offset}")
rows = mysql_cursor.fetchall()
if not rows:
break
# Prepare batch insert
placeholders = ','.join(['%s'] * len(columns))
column_names = ','.join(f'"{col}"' for col in columns)
insert_query = f'INSERT INTO "{table_name}" ({column_names}) VALUES ({placeholders})'
# Convert values
converted_rows = []
for row in rows:
converted_row = tuple(
self.convert_value(row[col], col) for col in columns
)
converted_rows.append(converted_row)
# Execute batch insert
psycopg2.extras.execute_batch(pg_cursor, insert_query, converted_rows, page_size=1000)
self.pg_conn.commit()
migrated_count += len(rows)
logger.info(f"Migrated {migrated_count}/{total_rows} rows in {table_name}")
except Exception as e:
logger.error(f"Error migrating batch at offset {offset}: {str(e)}")
self.pg_conn.rollback()
failed_count += len(rows) if rows else 0
# Re-enable triggers
pg_cursor.execute(f'ALTER TABLE "{table_name}" ENABLE TRIGGER ALL')
self.pg_conn.commit()
# Update sequence for auto-increment columns
self._update_sequences(table_name, columns)
mysql_cursor.close()
pg_cursor.close()
logger.info(f"Completed migration of {table_name}: {migrated_count} migrated, {failed_count} failed")
return {'migrated': migrated_count, 'failed': failed_count}
def _update_sequences(self, table_name: str, columns: List[str]):
"""Update PostgreSQL sequences after data migration."""
pg_cursor = self.pg_conn.cursor()
for column in columns:
try:
# Try to update sequence if it exists
pg_cursor.execute(f"""
SELECT pg_get_serial_sequence('"{table_name}"', '{column}') AS sequence_name
""")
result = pg_cursor.fetchone()
if result and result[0]:
sequence_name = result[0]
pg_cursor.execute(f"""
SELECT setval('{sequence_name}',
COALESCE((SELECT MAX("{column}") FROM "{table_name}"), 1),
true)
""")
self.pg_conn.commit()
logger.info(f"Updated sequence for {table_name}.{column}")
except Exception as e:
logger.debug(f"No sequence for {table_name}.{column}: {str(e)}")
pg_cursor.close()
def migrate_all_tables(self) -> Dict[str, Dict[str, int]]:
"""Migrate all tables from MySQL to PostgreSQL."""
tables = self.get_table_list()
results = {}
for table in tables:
try:
results[table] = self.migrate_table(table)
except Exception as e:
logger.error(f"Failed to migrate table {table}: {str(e)}")
results[table] = {'migrated': 0, 'failed': -1}
return results
# Usage example
def main():
mysql_config = {
'host': 'localhost',
'user': 'root',
'password': 'mysql_password',
'database': 'source_db',
'charset': 'utf8mb4'
}
postgres_config = {
'host': 'localhost',
'user': 'postgres',
'password': 'postgres_password',
'database': 'target_db'
}
migrator = DatabaseMigrator(mysql_config, postgres_config, batch_size=10000)
try:
migrator.connect()
results = migrator.migrate_all_tables()
# Print summary
print("\n" + "="*50)
print("MIGRATION SUMMARY")
print("="*50)
for table, counts in results.items():
print(f"{table}: {counts['migrated']} migrated, {counts['failed']} failed")
finally:
migrator.disconnect()
if __name__ == "__main__":
main()
Phase 5: Migrating Database Objects #
Converting Stored Procedures and Functions #
MySQL and PostgreSQL use different procedural languages. MySQL uses its own procedural SQL syntax, while PostgreSQL primarily uses PL/pgSQL (though it supports multiple languages).
MySQL Stored Procedure:
DELIMITER $$
CREATE PROCEDURE get_order_total(IN p_order_id INT, OUT total DECIMAL(10,2))
BEGIN
SELECT SUM(quantity * price) INTO total
FROM order_items
WHERE order_id = p_order_id; -- prefix the parameter so it does not shadow the column
END$$
DELIMITER ;
PostgreSQL Function:
CREATE OR REPLACE FUNCTION get_order_total(p_order_id INTEGER)
RETURNS NUMERIC AS $$
DECLARE
total NUMERIC(10,2);
BEGIN
SELECT SUM(quantity * price) INTO total
FROM order_items
WHERE order_id = p_order_id; -- prefixed parameter avoids column/parameter ambiguity
RETURN total;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT get_order_total(12345);
Converting Triggers #
Triggers in PostgreSQL require a trigger function and a separate trigger declaration:
MySQL Trigger:
CREATE TRIGGER update_modified_time
BEFORE UPDATE ON users
FOR EACH ROW
SET NEW.updated_at = CURRENT_TIMESTAMP;
PostgreSQL Trigger:
-- First create the trigger function
CREATE OR REPLACE FUNCTION update_modified_time()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Then create the trigger
CREATE TRIGGER update_modified_time
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_modified_time();
Converting Views #
Views generally transfer easily, but watch for MySQL-specific functions:
-- MySQL view
CREATE VIEW active_orders AS
SELECT
o.id,
o.customer_id,
DATE_FORMAT(o.created_at, '%Y-%m-%d') as order_date,
CONCAT(c.first_name, ' ', c.last_name) as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status != 'cancelled';
-- PostgreSQL view
CREATE VIEW active_orders AS
SELECT
o.id,
o.customer_id,
to_char(o.created_at, 'YYYY-MM-DD') as order_date,
c.first_name || ' ' || c.last_name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status != 'cancelled';
Phase 6: Indexes and Constraints #
Recreating Indexes #
After data migration, recreate indexes for optimal performance. PostgreSQL offers several index types:
-- B-tree index (default, good for equality and range queries)
CREATE INDEX idx_orders_customer_id ON orders USING btree (customer_id);
-- Hash index (good for equality comparisons only)
CREATE INDEX idx_users_email ON users USING hash (email);
-- GIN index (for full-text search, JSONB, arrays)
CREATE INDEX idx_products_tags ON products USING gin (tags);
CREATE INDEX idx_products_metadata ON products USING gin (metadata);
-- GiST index (for geometric data, full-text search)
CREATE INDEX idx_locations_coords ON locations USING gist (coordinates);
-- Partial index (index only subset of rows)
CREATE INDEX idx_active_orders ON orders (created_at)
WHERE status IN ('pending', 'processing');
-- Multi-column index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);
-- Covering index (include additional columns)
CREATE INDEX idx_orders_customer_covering ON orders (customer_id)
INCLUDE (status, total_amount);
-- Create indexes concurrently to avoid blocking
CREATE INDEX CONCURRENTLY idx_large_table_column ON large_table (column_name);
Adding Foreign Key Constraints #
Add foreign keys after data migration to avoid constraint violations:
-- Add foreign key with explicit name
ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(id)
ON DELETE CASCADE
ON UPDATE CASCADE;
-- Add foreign key with index for performance
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
ALTER TABLE order_items
ADD CONSTRAINT fk_order_items_order
FOREIGN KEY (order_id)
REFERENCES orders(id)
ON DELETE CASCADE;
-- Check constraint
ALTER TABLE products
ADD CONSTRAINT chk_price_positive
CHECK (price > 0);
-- Unique constraint
ALTER TABLE users
ADD CONSTRAINT unq_users_email
UNIQUE (email);
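For very large, freshly migrated tables, one alternative to the plain ADD CONSTRAINT shown above is to add the foreign key as NOT VALID and validate it in a second step; new writes are checked immediately, while the scan of existing rows runs later under a lighter lock. A sketch:
-- Add the constraint without scanning existing rows
ALTER TABLE order_items
ADD CONSTRAINT fk_order_items_order
FOREIGN KEY (order_id) REFERENCES orders(id)
NOT VALID;
-- Validate existing rows separately
ALTER TABLE order_items VALIDATE CONSTRAINT fk_order_items_order;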
Phase 7: Query Conversion #
Common SQL Syntax Differences #
Several SQL syntax differences require query updates:
String Concatenation:
-- MySQL
SELECT CONCAT(first_name, ' ', last_name) as full_name FROM users;
-- PostgreSQL
SELECT first_name || ' ' || last_name as full_name FROM users;
-- Or use concat function (handles NULL better)
SELECT concat(first_name, ' ', last_name) as full_name FROM users;
Date Functions:
-- MySQL date formatting
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') FROM orders;
-- PostgreSQL date formatting
SELECT to_char(created_at, 'YYYY-MM-DD') FROM orders;
-- MySQL date arithmetic
SELECT DATE_ADD(created_at, INTERVAL 7 DAY) FROM orders;
-- PostgreSQL date arithmetic
SELECT created_at + INTERVAL '7 days' FROM orders;
String Functions:
-- MySQL substring
SELECT SUBSTRING(name, 1, 10) FROM products;
-- PostgreSQL substring (same syntax works, but also has alternative)
SELECT SUBSTRING(name, 1, 10) FROM products;
SELECT substring(name from 1 for 10) FROM products;
LIMIT and OFFSET:
-- MySQL (both syntaxes work)
SELECT * FROM products LIMIT 10, 20; -- Skip 10, return 20
SELECT * FROM products LIMIT 20 OFFSET 10;
-- PostgreSQL (only standard syntax)
SELECT * FROM products LIMIT 20 OFFSET 10;
GROUP BY with Aggregation:
-- MySQL GROUP_CONCAT
SELECT customer_id, GROUP_CONCAT(product_name SEPARATOR ', ')
FROM orders
GROUP BY customer_id;
-- PostgreSQL STRING_AGG
SELECT customer_id, STRING_AGG(product_name, ', ')
FROM orders
GROUP BY customer_id;
-- PostgreSQL also has ARRAY_AGG for array results
SELECT customer_id, ARRAY_AGG(product_name)
FROM orders
GROUP BY customer_id;
ISNULL / IFNULL / COALESCE:
-- MySQL
SELECT IFNULL(phone, 'No phone') FROM users;
SELECT ISNULL(phone) FROM users;
-- PostgreSQL (use COALESCE - more standard)
SELECT COALESCE(phone, 'No phone') FROM users;
SELECT phone IS NULL FROM users;
Case Sensitivity:
-- MySQL (case-insensitive by default)
SELECT * FROM users WHERE email = 'USER@EXAMPLE.COM';
-- PostgreSQL (case-sensitive, use ILIKE for case-insensitive)
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM users WHERE email ILIKE 'user@example.com';
-- Or use LOWER for case-insensitive comparison
SELECT * FROM users WHERE LOWER(email) = LOWER('USER@EXAMPLE.COM');
Boolean Values:
-- MySQL (uses 0/1)
SELECT * FROM users WHERE is_active = 1;
UPDATE users SET is_active = 0 WHERE id = 123;
-- PostgreSQL (true boolean type)
SELECT * FROM users WHERE is_active = true;
-- Or simply
SELECT * FROM users WHERE is_active;
UPDATE users SET is_active = false WHERE id = 123;
JSON Operations #
PostgreSQL’s JSONB offers superior performance and functionality compared to MySQL’s JSON:
-- MySQL JSON operations
SELECT JSON_EXTRACT(metadata, '$.price') FROM products;
SELECT metadata->'$.tags[0]' FROM products;
-- PostgreSQL JSONB operations
SELECT metadata->>'price' FROM products; -- Returns text
SELECT metadata->'price' FROM products; -- Returns JSONB
SELECT metadata->'tags'->0 FROM products; -- Array access
SELECT metadata #> '{tags,0}' FROM products; -- Path access
-- JSONB containment queries (very powerful)
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
SELECT * FROM products WHERE metadata ? 'discount'; -- Key exists
SELECT * FROM products WHERE metadata ?| array['sale', 'clearance']; -- Any key exists
-- Create GIN index for fast JSONB queries
CREATE INDEX idx_products_metadata ON products USING gin (metadata);
Full-Text Search #
PostgreSQL has built-in full-text search capabilities that surpass MySQL’s:
-- MySQL full-text search
CREATE FULLTEXT INDEX ft_products_name ON products(name, description);
SELECT * FROM products
WHERE MATCH(name, description) AGAINST('laptop computer' IN NATURAL LANGUAGE MODE);
-- PostgreSQL full-text search
-- Create tsvector column for better performance
ALTER TABLE products ADD COLUMN search_vector tsvector;
-- Update search vector
UPDATE products SET search_vector =
to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,''));
-- Create GIN index
CREATE INDEX idx_products_search ON products USING gin(search_vector);
-- Create trigger to auto-update search vector
CREATE TRIGGER products_search_update BEFORE INSERT OR UPDATE
ON products FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', name, description);
-- Query full-text search
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'laptop & computer');
-- Rank results by relevance
SELECT *, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'laptop & computer') query
WHERE search_vector @@ query
ORDER BY rank DESC;
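On PostgreSQL 12 and newer, a stored generated column is an alternative to the trigger above; the column maintains itself with no trigger function to manage (drop the manually maintained column and trigger first if you switch to this):
-- PostgreSQL 12+: self-maintaining search vector, no trigger required
ALTER TABLE products
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(name, '') || ' ' || coalesce(description, ''))
) STORED;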
Phase 8: Application Code Updates #
Database Driver Changes #
Update your application to use PostgreSQL drivers instead of MySQL drivers:
Python:
# MySQL (using mysql-connector-python or PyMySQL)
import mysql.connector
conn = mysql.connector.connect(
host='localhost',
user='root',
password='password',
database='mydb'
)
cursor = conn.cursor()
# PostgreSQL (using psycopg2)
import psycopg2
conn = psycopg2.connect(
host='localhost',
user='postgres',
password='password',
database='mydb'
)
cursor = conn.cursor()
# Note: Parameter placeholders are different
# MySQL uses %s for all types
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
# PostgreSQL also uses %s (psycopg2 handles this)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
Node.js:
// MySQL (using mysql2)
const mysql = require('mysql2/promise');
const connection = await mysql.createConnection({
host: 'localhost',
user: 'root',
password: 'password',
database: 'mydb'
});
const [rows] = await connection.execute('SELECT * FROM users WHERE id = ?', [userId]);
// PostgreSQL (using pg)
const { Pool } = require('pg');
const pool = new Pool({
host: 'localhost',
user: 'postgres',
password: 'password',
database: 'mydb'
});
const result = await pool.query('SELECT * FROM users WHERE id = $1', [userId]);
const rows = result.rows;
PHP:
// MySQL (using PDO)
$mysql = new PDO('mysql:host=localhost;dbname=mydb', 'root', 'password');
$stmt = $mysql->prepare('SELECT * FROM users WHERE id = ?');
$stmt->execute([$userId]);
// PostgreSQL (using PDO - very similar!)
$pgsql = new PDO('pgsql:host=localhost;dbname=mydb', 'postgres', 'password');
$stmt = $pgsql->prepare('SELECT * FROM users WHERE id = ?'); // PDO keeps the same ? placeholders regardless of driver
$stmt->execute([$userId]);
ORM Configuration Updates #
If you’re using an ORM, update the database configuration:
Django:
# settings.py - MySQL
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
'NAME': 'mydb',
'USER': 'root',
'PASSWORD': 'password',
'HOST': 'localhost',
'PORT': '3306',
}
}
# settings.py - PostgreSQL
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'mydb',
'USER': 'postgres',
'PASSWORD': 'password',
'HOST': 'localhost',
'PORT': '5432',
'OPTIONS': {
'connect_timeout': 10,
}
}
}
Ruby on Rails:
# config/database.yml - MySQL
default: &default
adapter: mysql2
encoding: utf8mb4
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
username: root
password: password
host: localhost
# config/database.yml - PostgreSQL
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
username: postgres
password: password
host: localhost
SQLAlchemy (Python):
# MySQL
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://root:password@localhost/mydb')
# PostgreSQL
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:password@localhost/mydb')
# Connection pooling for PostgreSQL
engine = create_engine(
'postgresql://postgres:password@localhost/mydb',
pool_size=20,
max_overflow=40,
pool_pre_ping=True, # Verify connections before using
pool_recycle=3600 # Recycle connections after 1 hour
)
Phase 9: Testing and Validation #
Data Integrity Verification #
After migration, thoroughly verify data integrity:
-- Compare row counts
-- MySQL
SELECT
table_name,
table_rows as estimated_rows
FROM information_schema.tables
WHERE table_schema = 'source_db'
ORDER BY table_name;
-- PostgreSQL (fast estimated counts from table statistics)
SELECT
schemaname,
tablename,
n_live_tup as row_count
FROM pg_stat_user_tables
ORDER BY tablename;
-- Or for exact counts (slower but accurate)
SELECT
'users' as table_name,
COUNT(*) as row_count
FROM users
UNION ALL
SELECT 'orders', COUNT(*) FROM orders
UNION ALL
SELECT 'products', COUNT(*) FROM products;
Checksum Verification Script (raise MySQL's group_concat_max_len first, otherwise GROUP_CONCAT silently truncates and the checksums will not be comparable):
import hashlib
import psycopg2
import mysql.connector
def calculate_table_checksum(conn, table_name, columns, db_type='mysql'):
"""Calculate checksum for a table's data."""
cursor = conn.cursor()
# Build column list for ordering
column_list = ', '.join(columns)
order_clause = ', '.join(columns)
if db_type == 'mysql':
query = f"""
SELECT MD5(GROUP_CONCAT(
CONCAT_WS('|', {column_list})
ORDER BY {order_clause}
)) as checksum
FROM {table_name}
"""
else: # postgresql
query = f"""
SELECT MD5(STRING_AGG(
CONCAT_WS('|', {column_list}),
'' ORDER BY {order_clause}
)) as checksum
FROM "{table_name}"
"""
cursor.execute(query)
result = cursor.fetchone()
cursor.close()
return result[0] if result else None
def verify_migration(mysql_config, pg_config, tables):
"""Verify data migration by comparing checksums."""
mysql_conn = mysql.connector.connect(**mysql_config)
pg_conn = psycopg2.connect(**pg_config)
results = {}
for table_name, columns in tables.items():
mysql_checksum = calculate_table_checksum(mysql_conn, table_name, columns, 'mysql')
pg_checksum = calculate_table_checksum(pg_conn, table_name, columns, 'postgresql')
matches = mysql_checksum == pg_checksum
results[table_name] = {
'mysql_checksum': mysql_checksum,
'pg_checksum': pg_checksum,
'matches': matches
}
status = "✓ PASS" if matches else "✗ FAIL"
print(f"{status} - {table_name}")
mysql_conn.close()
pg_conn.close()
return results
# Usage
tables_to_verify = {
'users': ['id', 'username', 'email', 'created_at'],
'orders': ['id', 'customer_id', 'total', 'status'],
'products': ['id', 'name', 'price', 'category']
}
verify_migration(mysql_config, pg_config, tables_to_verify)
Performance Testing #
Create a comprehensive performance testing suite:
-- Enable query timing
\timing on
-- Test common queries
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 12345
AND created_at > '2024-01-01';
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING COUNT(o.id) > 5;
-- Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Identify missing indexes (tables with sequential scans)
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan as avg_seq_tup_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
Phase 10: Performance Optimization #
VACUUM and ANALYZE #
PostgreSQL requires regular maintenance for optimal performance:
-- Analyze tables to update statistics
ANALYZE VERBOSE;
-- Analyze specific table
ANALYZE users;
-- Full vacuum (reclaims space, rewrites table)
VACUUM FULL ANALYZE users;
-- Regular vacuum (concurrent, doesn't lock table)
VACUUM ANALYZE users;
-- Configure autovacuum (in postgresql.conf)
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
Connection Pooling #
Implement connection pooling for better performance:
Using PgBouncer:
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
server_lifetime = 3600
server_idle_timeout = 600
Application Connection Pool (Python):
import psycopg2.pool  # SimpleConnectionPool lives in the psycopg2.pool module
# Create connection pool
connection_pool = psycopg2.pool.SimpleConnectionPool(
minconn=10,
maxconn=100,
host='localhost',
database='mydb',
user='postgres',
password='password'
)
# Get connection from pool
def get_db_connection():
return connection_pool.getconn()
# Return connection to pool
def release_db_connection(conn):
connection_pool.putconn(conn)
# Usage
conn = get_db_connection()
try:
cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
results = cursor.fetchall()
finally:
release_db_connection(conn)
Query Optimization #
Optimize slow queries using PostgreSQL’s powerful features:
-- Use EXPLAIN ANALYZE to identify bottlenecks
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.status = 'pending'
AND o.created_at > CURRENT_DATE - INTERVAL '30 days';
-- Create appropriate indexes
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders(status, created_at)
WHERE status = 'pending';
-- Use partial indexes for filtered queries
CREATE INDEX idx_active_users ON users(last_login)
WHERE active = true;
-- Use covering indexes to avoid table lookups
CREATE INDEX idx_orders_customer_covering
ON orders(customer_id)
INCLUDE (status, total_amount, created_at);
-- Partition large tables for better performance
-- (the parent table must have been created with PARTITION BY, e.g.
--  CREATE TABLE orders (...) PARTITION BY RANGE (created_at);
--  an existing regular table cannot be converted in place)
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
Monitoring and Maintenance #
Set up monitoring to track PostgreSQL performance:
-- Create extension for better monitoring
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View slow queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Check table sizes and on-disk footprint (a starting point for spotting bloat)
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) as external_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
-- Monitor cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM pg_statio_user_tables;
Phase 11: Backup and Recovery #
Setting Up Continuous Archiving #
Configure Write-Ahead Log (WAL) archiving for point-in-time recovery:
# In postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
archive_timeout = 300 # Force archive every 5 minutes
# Create base backup
pg_basebackup -h localhost -U postgres -D /mnt/backup/base -Ft -z -P
# Restore from backup
# 1. Stop PostgreSQL
systemctl stop postgresql
# 2. Clear data directory
rm -rf /var/lib/postgresql/14/main/*
# 3. Extract base backup
tar -xzf /mnt/backup/base/base.tar.gz -C /var/lib/postgresql/14/main/
# 4. Create recovery.signal file
touch /var/lib/postgresql/14/main/recovery.signal
# 5. Configure recovery (in postgresql.auto.conf)
restore_command = 'cp /mnt/backup/wal/%f %p'
recovery_target_time = '2024-12-24 10:30:00'
# 6. Start PostgreSQL
systemctl start postgresql
Automated Backup Script #
#!/bin/bash
# automated_backup.sh
set -o pipefail  # don't let gzip's exit status mask a pg_dump failure
# Configuration
BACKUP_DIR="/mnt/backup/postgresql"
DB_NAME="mydb"
DB_USER="postgres"
RETENTION_DAYS=7
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}_${DATE}.sql.gz"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Perform backup
echo "Starting backup of $DB_NAME..."
pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$BACKUP_FILE"  # plain-format dump, gzipped to match the .sql.gz name
if [ $? -eq 0 ]; then
echo "Backup completed: $BACKUP_FILE"
# Calculate file size
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
echo "Backup size: $SIZE"
# Remove old backups
find "$BACKUP_DIR" -name "${DB_NAME}_*.sql.gz" -mtime +$RETENTION_DAYS -delete
echo "Cleaned up backups older than $RETENTION_DAYS days"
else
echo "Backup failed!"
exit 1
fi
# Verify backup integrity
echo "Verifying backup integrity..."
gunzip -t "$BACKUP_FILE"
if [ $? -eq 0 ]; then
echo "Backup verification successful"
else
echo "Backup verification failed!"
exit 1
fi
Phase 12: Rollback Strategy #
Maintaining Dual Databases During Transition #
For critical systems, maintain both databases temporarily:
class DualDatabaseManager:
"""Manages writes to both MySQL and PostgreSQL during transition."""
def __init__(self, mysql_conn, pg_conn):
self.mysql_conn = mysql_conn
self.pg_conn = pg_conn
def dual_write(self, query, params):
"""Execute write query on both databases."""
mysql_cursor = self.mysql_conn.cursor()
pg_cursor = self.pg_conn.cursor()
try:
# Write to MySQL (primary during transition)
mysql_cursor.execute(query, params)
self.mysql_conn.commit()
# Write to PostgreSQL (secondary)
pg_query = self.convert_query_syntax(query)
pg_cursor.execute(pg_query, params)
self.pg_conn.commit()
return True
except Exception as e:
# Rollback both databases
self.mysql_conn.rollback()
self.pg_conn.rollback()
raise e
finally:
mysql_cursor.close()
pg_cursor.close()
def convert_query_syntax(self, mysql_query):
"""Convert MySQL-specific syntax to PostgreSQL."""
# Add conversion logic here
pg_query = mysql_query.replace('`', '"')
return pg_query
Conclusion #
Migrating from MySQL to PostgreSQL is a substantial undertaking that demands meticulous planning, systematic execution, and thorough testing. This guide has walked you through every critical phase of the migration journey, from initial assessment and environment setup through data transfer, query conversion, and post-migration optimization.
The key to successful migration lies in understanding that this is not merely a data transfer operation—it’s an opportunity to leverage PostgreSQL’s advanced features, improve data integrity, and enhance application performance. By carefully mapping data types, converting procedural code, updating application queries, and implementing proper indexing strategies, you can ensure a smooth transition that positions your application for long-term success.
Remember these critical success factors throughout your migration:
Thorough Preparation: Invest substantial time in analyzing your existing MySQL database, identifying dependencies, documenting custom features, and creating a detailed migration plan. Understanding your current system’s quirks and complexities prevents surprises during migration.
Comprehensive Testing: Never migrate directly to production. Establish a staging environment that mirrors production, execute the complete migration process multiple times, verify data integrity through checksums and row counts, and conduct extensive performance testing before committing to the cutover.
Incremental Approach: For large or complex databases, consider a phased migration strategy. Migrate non-critical tables first, maintain dual-database writes during transition periods, and gradually shift traffic to PostgreSQL as confidence builds.
Performance Optimization: Don’t assume equivalent performance immediately after migration. Analyze query execution plans, create appropriate indexes including partial and covering indexes, configure PostgreSQL parameters for your workload, implement connection pooling, and establish regular maintenance routines including VACUUM and ANALYZE operations.
Continuous Monitoring: Implement robust monitoring from day one. Track query performance through pg_stat_statements, monitor cache hit ratios and table bloat, set up alerting for anomalous behavior, and maintain detailed logs for troubleshooting.
Team Preparation: Ensure your development and operations teams understand PostgreSQL’s differences. Provide training on PL/pgSQL for developers accustomed to MySQL’s procedural syntax, educate operations staff on PostgreSQL-specific maintenance tasks, and document new procedures and best practices.
PostgreSQL offers significant advantages over MySQL including superior concurrent transaction handling, advanced data types and indexing options, extensibility through custom types and functions, robust full-text search capabilities, and excellent standards compliance. By completing this migration successfully, you’re positioning your application to take full advantage of these powerful features.
The migration process might seem daunting initially, but by following this systematic approach, maintaining clear communication with stakeholders, and allowing adequate time for each phase, you can achieve a successful transition with minimal disruption. Your application will emerge with a more robust, feature-rich, and performant database foundation.
As you complete your migration, remember that the PostgreSQL community is extensive and supportive. The official documentation is comprehensive and well-maintained, numerous online forums and mailing lists provide expert assistance, and continuous development means regular improvements and new features. Welcome to the PostgreSQL ecosystem—your database infrastructure is now built on one of the most advanced open-source database systems available.