Scaling & Architecture Patterns¶
Your single-server database handles the first million users. Then queries slow down, connections pile up, and you start eyeing the credit card limit on your cloud account. Scaling a database is not just about throwing hardware at the problem - it requires understanding which bottleneck you are hitting and choosing the right architectural pattern to address it.
This guide covers the spectrum from simple read replicas through sharding, connection pooling, and the distributed data patterns that appear in microservices architectures.
Read Replicas¶
A read replica is a copy of your primary database that receives changes asynchronously and serves read-only queries. Most production workloads are read-heavy (often 80-90% reads), so offloading those reads to replicas is the first scaling move for nearly every application.
How Replication Works¶
The primary server records every write operation in a log - the binary log in MySQL, the write-ahead log (WAL) in PostgreSQL. Replicas connect to the primary, stream the log entries, and apply them locally:
flowchart TD
App[Application] -->|Writes| Primary[(Primary Database)]
App -->|Reads| LB{Load Balancer}
Primary -->|Binary Log / WAL| R1[(Replica 1)]
Primary -->|Binary Log / WAL| R2[(Replica 2)]
Primary -->|Binary Log / WAL| R3[(Replica 3)]
LB --> R1
LB --> R2
LB --> R3
In MySQL, binary log replication ships row-level or statement-level changes. In PostgreSQL, streaming replication sends WAL records. Both achieve the same result: replicas converge toward the primary's state.
Routing Reads vs Writes¶
Your application needs to send writes to the primary and reads to replicas. There are three common approaches:
Application-level routing - your code explicitly chooses a connection. Simple but couples routing logic to application code:
def get_user(user_id):
# Read from replica
with replica_pool.connection() as conn:
return conn.execute("SELECT * FROM users WHERE id = %s", (user_id,))
def update_user(user_id, name):
# Write to primary
with primary_pool.connection() as conn:
conn.execute("UPDATE users SET name = %s WHERE id = %s", (name, user_id))
Proxy-based routing - a middleware layer such as ProxySQL (MySQL) or Pgpool-II (PostgreSQL) inspects queries and routes them automatically. The application connects to a single endpoint.
DNS-based routing - separate DNS entries for read and write endpoints (e.g., db-write.internal and db-read.internal). AWS RDS and Aurora use this pattern with their cluster endpoints.
Consistency Trade-offs¶
Replicas lag behind the primary. A user writes a comment, the next page load hits a replica that has not received the change yet, and the comment appears missing. This is replication lag, and you need a strategy for it:
- Read-your-writes consistency - after a write, route that user's subsequent reads to the primary for a short window (a few seconds); a sketch follows this list
- Monotonic reads - pin a user's session to a single replica so they never see data go backward
- Semi-synchronous replication - the primary waits for at least one replica to acknowledge the write before confirming to the client. Reduces lag at the cost of write latency
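A minimal sketch of the read-your-writes option, reusing the primary_pool and replica_pool from the earlier routing example. The in-process dictionary is illustrative - production systems usually track recent writers in a shared store such as Redis:
import time

# Illustrative: remember when each user last wrote (use a shared store in production)
recent_writers = {}
PIN_WINDOW_SECONDS = 5

def record_write(user_id):
    # Call this after any write made on behalf of the user
    recent_writers[user_id] = time.time()

def read_connection(user_id):
    # Users who wrote recently read from the primary; everyone else uses a replica
    if time.time() - recent_writers.get(user_id, 0) < PIN_WINDOW_SECONDS:
        return primary_pool.connection()
    return replica_pool.connection()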
Monitoring Replica Lag¶
In MySQL, check Seconds_Behind_Master (or Seconds_Behind_Source in MySQL 8.0.22+):
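-- MySQL 8.0.22+; on older versions use SHOW SLAVE STATUS and Seconds_Behind_Master
SHOW REPLICA STATUS\G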
In PostgreSQL, compare WAL positions:
-- On the primary
SELECT client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
Lag is not just a number
Seconds_Behind_Source measures the timestamp difference between the event on the primary and when the replica processed it. It does not account for events still in transit. A replica can show 0 seconds of lag while being thousands of transactions behind if it is processing them quickly but the primary is generating them faster.
Connection Pooling¶
Every database connection consumes memory - typically 5-10 MB per connection in PostgreSQL, somewhat less in MySQL. A server with 4 GB of RAM dedicated to connections supports a few hundred at most. When your application spawns thousands of short-lived connections (common with PHP, serverless functions, or microservices), the overhead of creating and tearing down connections becomes a bottleneck before you run out of query capacity.
A connection pooler sits between your application and the database, maintaining a pool of persistent connections and multiplexing client requests across them. One hundred application instances sharing twenty database connections is common.
flowchart LR
A1[App Instance 1] --> P[Connection Pool<br/>PgBouncer / ProxySQL]
A2[App Instance 2] --> P
A3[App Instance 3] --> P
A4[App Instance 4] --> P
A5[App Instance N] --> P
P --> |"Fixed pool<br/>of connections"| DB[(Database Server)]
ProxySQL for MySQL¶
ProxySQL handles connection pooling, query routing, and query caching for MySQL. It operates as a transparent proxy - your application connects to ProxySQL on port 6033 instead of MySQL on port 3306.
Key capabilities:
- Connection multiplexing - thousands of client connections share a smaller pool of backend connections
- Query routing - sends reads to replicas and writes to the primary based on query rules
- Query caching - caches SELECT results for configurable TTLs
- Query rewriting - modifies queries in-flight (adding hints, rewriting patterns)
Configuration is stored in ProxySQL's internal SQLite database and managed through its admin interface on port 6032:
-- Add backend servers
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (10, 'primary.db.internal', 3306, 1000);
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (20, 'replica1.db.internal', 3306, 500);
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (20, 'replica2.db.internal', 3306, 500);
-- Route reads to hostgroup 20, writes to hostgroup 10
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10);
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (2, 1, '^SELECT', 20);
-- Apply changes
LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;
PgBouncer for PostgreSQL¶
PgBouncer is a lightweight connection pooler for PostgreSQL. It supports three pooling modes:
| Mode | Behavior | Use Case |
|---|---|---|
| Session | Connection assigned for the full client session | Applications using session-level features (prepared statements, temp tables) |
| Transaction | Connection assigned per transaction | Most web applications - best multiplexing ratio |
| Statement | Connection assigned per query | Simple query patterns with no multi-statement transactions |
Transaction pooling gives the best multiplexing ratio. A pool of 20 PostgreSQL connections can serve hundreds of application instances because each instance only holds a connection for the duration of a transaction.
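A minimal pgbouncer.ini sketch for transaction pooling; the hostnames and sizes are illustrative:
[databases]
; clients connect to "appdb" through PgBouncer, which forwards to the real server
appdb = host=db-primary.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; client connections PgBouncer will accept
max_client_conn = 1000
; actual PostgreSQL connections per database/user pair
default_pool_size = 20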
Pool Sizing¶
The optimal pool size is smaller than you think. The PostgreSQL wiki recommends a starting point of connections = (core_count * 2) + effective_spindle_count.
For a server with 8 cores and SSDs (effectively 1 spindle), that is (8 * 2) + 1 = 17 connections. More connections mean more context switching, more lock contention, and worse throughput - not better.
The connection pool paradox
Adding more connections past the optimal point makes your database slower. If 20 connections can handle your workload with 50ms average query time, 200 connections will compete for the same CPU and disk, increasing average query time to 500ms. The database does more work per second with fewer concurrent connections.
Horizontal vs Vertical Scaling¶
When your database hits its limits, you have two directions to grow.
Vertical scaling (scaling up) means giving your existing server more resources - more CPU, more RAM, faster storage. It is the simplest path: no code changes, no architectural complexity. You upgrade the instance type, restart, and your database handles more load.
Horizontal scaling (scaling out) means spreading data across multiple servers. This includes read replicas (covered above), sharding (covered below), and distributed databases. It introduces significant complexity but removes the ceiling imposed by a single machine's hardware.
Trade-offs¶
| Factor | Vertical | Horizontal |
|---|---|---|
| Complexity | Minimal - same architecture | High - application changes, data routing |
| Downtime | Brief restart for upgrades | Can be zero-downtime with proper orchestration |
| Cost curve | Exponential - top-tier hardware is disproportionately expensive | Linear - commodity hardware scales predictably |
| Ceiling | Hard limit at largest available instance | Theoretically unlimited |
| Consistency | Strong - single node | Requires careful design for consistency guarantees |
| Operational burden | One server to manage | Many servers, network partitions, coordination |
When Each Is Appropriate¶
Scale vertically first. It buys you time without the operational overhead of distributed systems. A single server with 64 cores, 512 GB RAM, and NVMe storage handles more than most applications need. Switch to horizontal scaling when:
- Your dataset exceeds what a single server's storage can hold
- Your write throughput exceeds what a single server's I/O can handle
- Your availability requirements demand geographic distribution
- Your cost curve goes parabolic on the next vertical tier
The single-server sweet spot
Many high-traffic applications run on a single primary with read replicas far longer than their engineering teams expect. Stack Overflow served 1.3 billion page views per month from a small cluster of SQL Server instances. Vertical scaling combined with read replicas handles more load than most applications will ever reach.
Sharding Strategies¶
Sharding splits a single logical database across multiple physical databases, each holding a subset of the data. Each subset is a shard. Unlike read replicas (which hold full copies), shards hold partitions - a given row exists on exactly one shard.
Sharding is the most operationally complex scaling strategy. It should be your last resort after exhausting vertical scaling, read replicas, caching, and query optimization.
Hash-Based Sharding¶
Compute a hash of the shard key (typically the primary key or tenant ID) and use modulo arithmetic to assign rows to shards:
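# A sketch: route each row to one of NUM_SHARDS databases by hashing its shard key
import hashlib

NUM_SHARDS = 4

def shard_for(shard_key):
    # Use a stable hash (not Python's built-in hash(), which is randomized per process)
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shard_for(42)          # always maps to the same shard number
shard_for("tenant-7")  # distributes roughly evenly across shards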
This distributes data evenly. The problem arises when you add shards - changing the modulus remaps most keys, requiring a massive data migration.
Consistent hashing solves this. Instead of hash % N, consistent hashing maps both keys and shard nodes onto a hash ring. When you add a shard, only the keys between the new node and its neighbor move. This minimizes data migration during resharding.
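A minimal consistent-hashing sketch; real implementations place many virtual nodes per shard to even out the distribution, but a single point per node shows the idea:
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Each node gets a fixed position on the ring
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def node_for(self, key):
        # Walk clockwise to the first node at or after the key's position,
        # wrapping around to the start of the ring if necessary
        idx = bisect.bisect(self._positions, ring_hash(key)) % len(self._points)
        return self._points[idx][1]

ring = ConsistentHashRing(["shard1", "shard2", "shard3"])
ring.node_for("tenant-42")
# Adding "shard4" later only remaps the keys that fall between shard4's
# position and its predecessor - every other key stays where it is.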
Range-Based Sharding¶
Assign rows to shards based on value ranges of the shard key:
| Shard | Range |
|---|---|
| Shard 1 | user_id 1 - 1,000,000 |
| Shard 2 | user_id 1,000,001 - 2,000,000 |
| Shard 3 | user_id 2,000,001 - 3,000,000 |
Range-based sharding supports efficient range queries (all users in a given range hit a single shard). The downside is hotspots - if most activity comes from recently created users, the latest shard handles disproportionate load.
Directory-Based Sharding¶
A separate lookup service maps each shard key to its shard location. The application queries the directory first, then routes to the appropriate shard.
This is the most flexible approach - you can move individual tenants between shards without changing the hashing or range logic. The directory itself becomes a critical dependency and potential bottleneck, so it is typically a fast key-value store like Redis.
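A sketch of the directory lookup, assuming a Redis-backed mapping from tenant to shard; the key naming scheme and the shard_pools dictionary are illustrative:
import redis

directory = redis.Redis(host="directory.internal", port=6379)

def shard_for_tenant(tenant_id):
    # The directory stores an explicit assignment, e.g. "shard:tenant:42" -> "shard3"
    shard = directory.get(f"shard:tenant:{tenant_id}")
    if shard is None:
        raise LookupError(f"tenant {tenant_id} has no shard assignment")
    return shard.decode()

def connection_for_tenant(tenant_id):
    # shard_pools maps shard names to connection pools (assumed to exist elsewhere)
    return shard_pools[shard_for_tenant(tenant_id)].connection()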
Choosing a Shard Key¶
The shard key determines how data is distributed and which queries can be satisfied by a single shard. A good shard key:
- Distributes data evenly - avoids hotspot shards
- Aligns with query patterns - queries that filter on the shard key hit a single shard
- Has high cardinality - enough distinct values to distribute across many shards
- Is immutable - changing a shard key requires moving the row between shards
For multi-tenant SaaS applications, tenant_id is often the ideal shard key. All of a tenant's data lives on one shard, so most queries stay within a single shard.
Cross-Shard Queries¶
Queries that span multiple shards are expensive. A JOIN between a user on Shard 1 and their orders on Shard 3 cannot be executed by either shard alone. The application (or a query coordinator) must:
- Query each relevant shard independently
- Merge results in the application layer
- Handle pagination, sorting, and aggregation across partial results
This is why shard key selection matters so much - you want the queries your application runs most often to hit a single shard.
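A sketch of the scatter-gather pattern described above, assuming a shard_pools dictionary of connection pools and dict-like result rows:
def latest_orders_across_shards(limit=20):
    # 1. Query each shard independently; each shard returns its own top `limit`
    #    rows so the merged top `limit` is guaranteed to be complete
    partial = []
    for pool in shard_pools.values():
        with pool.connection() as conn:
            rows = conn.execute(
                "SELECT id, tenant_id, created_at FROM orders "
                "ORDER BY created_at DESC LIMIT %s",
                (limit,),
            )
            partial.extend(rows)
    # 2. Merge and re-sort in the application layer
    partial.sort(key=lambda row: row["created_at"], reverse=True)
    # 3. Apply pagination against the merged result
    return partial[:limit]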
flowchart TD
App[Application] --> Router{Shard Router}
Router -->|tenant_id 1-1000| S1[(Shard 1)]
Router -->|tenant_id 1001-2000| S2[(Shard 2)]
Router -->|tenant_id 2001-3000| S3[(Shard 3)]
S1 --> R1[(Replica 1a)]
S2 --> R2[(Replica 2a)]
S3 --> R3[(Replica 3a)]
Dir[(Shard Directory)] -.->|Lookup| Router
Sharding is a one-way door
Unsharding (merging shards back into a single database) is significantly harder than sharding in the first place. Before sharding, exhaust every other option: better indexes, query optimization, caching, read replicas, vertical scaling. Each of these is reversible. Sharding is not.
Microservices Data Patterns¶
When your application splits into microservices, the database architecture must split too. Getting data boundaries wrong is the most common source of pain in microservices systems.
Database-per-Service¶
Each microservice owns its data store. The Orders service has its own database. The Inventory service has its own database. No service directly queries another service's database - it goes through that service's API.
This provides:
- Independent deployability - schema changes do not ripple across services
- Technology freedom - the Orders service can use PostgreSQL while the Search service uses Elasticsearch
- Fault isolation - one service's database problems do not cascade to others
- Clear ownership - one team owns the data, the schema, and the access patterns
The cost is that operations that previously required a single JOIN now require API calls across services.
The Shared Database Anti-Pattern¶
Multiple services reading and writing the same database tables is the shared database pattern. It feels simpler at first - no API calls, just queries. But it creates tight coupling:
- Schema changes in one service break other services
- Lock contention between services causes unpredictable performance
- No clear ownership - who is responsible for data consistency?
- Services cannot be deployed independently if they share migration timelines
Shared databases are appropriate when the "services" are really modules of a monolith that happen to deploy separately. For genuinely independent services, database-per-service is the path to operational sanity.
Data Ownership Boundaries¶
Each piece of data should have exactly one authoritative service. The Customers service owns customer profiles. The Orders service owns order records. When the Orders service needs customer data (like a name for an invoice), it has two options:
- Synchronous API call - query the Customers service at render time. Simple but creates runtime coupling
- Event-driven data replication - the Customers service publishes CustomerUpdated events, and the Orders service stores a local copy of the fields it needs. Decoupled but eventually consistent; a sketch follows this list
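A minimal sketch of the event-driven option, assuming a generic consume() helper for the message bus and a PostgreSQL-style upsert; the topic and table names are illustrative:
def handle_customer_updated(event):
    # The Orders service keeps only the fields it needs for invoices
    db.execute(
        """
        INSERT INTO customer_snapshots (customer_id, name, email)
        VALUES (%s, %s, %s)
        ON CONFLICT (customer_id) DO UPDATE
            SET name = EXCLUDED.name, email = EXCLUDED.email
        """,
        (event["customer_id"], event["name"], event["email"]),
    )

for event in consume("customer.updated"):  # hypothetical consumer loop
    handle_customer_updated(event)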
API Composition for Cross-Service Queries¶
A dashboard that shows "customer name, recent orders, payment status" spans three services. An API composition layer (sometimes called a BFF - Backend for Frontend) queries each service and assembles the response:
import asyncio

async def get_customer_dashboard(customer_id):
customer, orders, payments = await asyncio.gather(
customer_service.get(customer_id),
order_service.list_by_customer(customer_id),
payment_service.get_status(customer_id),
)
return {
"customer": customer,
"recent_orders": orders[:5],
"payment_status": payments,
}
This works for simple aggregations. For complex queries involving filtering and sorting across service boundaries, you need CQRS or a dedicated read model.
CQRS¶
Command Query Responsibility Segregation (CQRS) uses separate models for reading and writing data. The command side handles writes (creates, updates, deletes) and optimizes for data integrity and business rule enforcement. The query side handles reads and optimizes for the specific views your application needs.
Why Separate Read and Write Models?¶
In a traditional architecture, the same database schema serves both reads and writes. This forces compromises:
- Normalization optimizes writes (no duplicated data to keep in sync) but makes reads expensive (joins across many tables)
- Denormalization optimizes reads (pre-joined data) but makes writes complex (update multiple places)
CQRS eliminates this compromise. The write model is normalized for integrity. The read model is denormalized - pre-computed views that match exactly what the UI needs.
Projections¶
A projection transforms command-side events into read-side views. When a new order is placed, the projection updates:
- The customer's order list view
- The product's sales count view
- The daily revenue summary view
Each view is a read model optimized for a specific query pattern. No joins at read time - the data is pre-aggregated.
Eventual Consistency¶
The read model lags behind the write model. After a write, there is a delay before the projection updates the read views. This is eventual consistency between the command and query sides.
For most use cases, this delay is milliseconds to seconds - imperceptible to users. For cases where it matters (like showing a user their just-submitted order), use the same read-your-writes pattern described in the replicas section: route immediately post-write reads to the command side.
When CQRS Adds Value¶
CQRS is worth the complexity when:
- Read and write patterns are dramatically different (many more reads than writes, or reads requiring heavy aggregation)
- You need different scaling profiles for reads and writes
- Multiple services need different views of the same data
- Your read queries require joins across many tables that are expensive at query time
CQRS is unnecessary complexity when:
- Your application is a straightforward CRUD app
- Read and write patterns are similar
- You have a small team and the operational overhead of maintaining two models outweighs the benefits
CQRS without event sourcing
CQRS does not require event sourcing. You can implement CQRS with a standard relational database: writes go to normalized tables, and background processes or database triggers maintain denormalized read views. Start here before introducing an event store.
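A sketch of the trigger-based variant in PostgreSQL - normalized orders rows on the write side feeding a denormalized per-customer summary on the read side; the table names are illustrative:
-- Read model: one pre-aggregated row per customer
CREATE TABLE customer_order_summary (
    customer_id    bigint PRIMARY KEY,
    order_count    integer NOT NULL DEFAULT 0,
    lifetime_total numeric NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION refresh_customer_order_summary() RETURNS trigger AS $$
BEGIN
    INSERT INTO customer_order_summary (customer_id, order_count, lifetime_total)
    VALUES (NEW.customer_id, 1, NEW.total)
    ON CONFLICT (customer_id) DO UPDATE
        SET order_count    = customer_order_summary.order_count + 1,
            lifetime_total = customer_order_summary.lifetime_total + NEW.total;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_summary_refresh
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION refresh_customer_order_summary();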
Event Sourcing Overview¶
Event sourcing stores every state change as an immutable event rather than overwriting the current state. Instead of a users table with the current name and email, you have an event stream:
UserCreated { id: 42, name: "Alice", email: "alice@example.com", timestamp: "2024-01-15T10:00:00Z" }
EmailChanged { id: 42, email: "alice@newdomain.com", timestamp: "2024-03-22T14:30:00Z" }
NameChanged { id: 42, name: "Alice Smith", timestamp: "2024-06-01T09:15:00Z" }
AccountLocked { id: 42, reason: "Too many login failures", timestamp: "2024-07-10T23:45:00Z" }
The current state is derived by replaying all events for that entity. The event store is the system of record - append-only and immutable.
Event Store¶
The event store is an append-only log. Events are never updated or deleted. Each event has:
- Stream ID - identifies the entity (e.g., user 42)
- Event type - what happened (e.g., EmailChanged)
- Payload - the data associated with the event
- Timestamp - when it happened
- Sequence number - ordering within the stream
Purpose-built event stores include EventStoreDB and Axon Server. You can also build event sourcing on top of PostgreSQL, Kafka, or DynamoDB, though you lose some purpose-built features like stream subscriptions and projections.
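If you build on PostgreSQL, a minimal append-only events table might look like this; the column names mirror the list above and are illustrative:
CREATE TABLE events (
    stream_id       text        NOT NULL,  -- the entity, e.g. 'user-42'
    sequence_number bigint      NOT NULL,  -- ordering within the stream
    event_type      text        NOT NULL,  -- e.g. 'EmailChanged'
    payload         jsonb       NOT NULL,  -- data associated with the event
    recorded_at     timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (stream_id, sequence_number)  -- also rejects duplicate appends
);

-- Append-only: the application only ever inserts; nothing is updated or deleted
INSERT INTO events (stream_id, sequence_number, event_type, payload)
VALUES ('user-42', 2, 'EmailChanged', '{"email": "alice@newdomain.com"}');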
Projections and Materializers¶
Since the event store holds raw events, you need projections (also called materializers) to build queryable views. A projection subscribes to the event stream and maintains a read-optimized representation:
class UserProfileProjection:
def handle(self, event):
if event.type == "UserCreated":
db.execute(
"INSERT INTO user_profiles (id, name, email) VALUES (%s, %s, %s)",
(event.data["id"], event.data["name"], event.data["email"])
)
elif event.type == "EmailChanged":
db.execute(
"UPDATE user_profiles SET email = %s WHERE id = %s",
(event.data["email"], event.data["id"])
)
elif event.type == "NameChanged":
db.execute(
"UPDATE user_profiles SET name = %s WHERE id = %s",
(event.data["name"], event.data["id"])
)
Replay Capability¶
The most powerful feature of event sourcing is replay. Because the event store contains the complete history, you can:
- Rebuild read models - if a projection has a bug, fix it and replay all events to rebuild the view from scratch (a sketch follows this list)
- Add new projections - need a new report? Create a new projection and replay history to backfill it
- Debug production issues - replay the exact sequence of events that led to a bug
- Audit everything - the event log is a complete, immutable audit trail
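Rebuilding a read model is then a loop over history. A sketch, reusing the UserProfileProjection above and assuming a hypothetical event_store.read_all() helper that yields events in sequence order:
def rebuild_user_profiles():
    # Start from an empty read model, then replay every event in order
    db.execute("TRUNCATE user_profiles")
    projection = UserProfileProjection()
    for event in event_store.read_all():  # hypothetical: yields events oldest-first
        projection.handle(event)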
Relationship with CQRS¶
Event sourcing and CQRS pair naturally. The event store serves as the write model (command side), and projections build the read models (query side). However, they are independent patterns:
- CQRS without event sourcing: writes go to normalized tables, projections build denormalized views
- Event sourcing without CQRS: events are the source of truth, but you might query by replaying events (impractical at scale - you almost always want projections)
Event sourcing trade-offs
Event sourcing adds significant complexity: event schema evolution is harder than database migrations, eventual consistency between the event store and projections requires careful handling, and querying by replaying events is expensive without projections. Use it when the audit trail, replay capability, and temporal queries justify the cost - financial systems, collaborative editing, and compliance-heavy domains.
Architecture Topologies¶
Here are three common production topologies, each suited to different scale and availability requirements.
Single Primary with Replicas¶
The most common starting topology. One primary handles all writes; multiple replicas handle reads. A proxy or load balancer routes traffic:
Best for: Applications with read-heavy workloads and moderate write volume. Handles the vast majority of production use cases.
Multi-Region¶
For global applications needing low latency and high availability across regions. Each region has a primary or a local read replica, with cross-region replication:
| Component | US-East | EU-West | AP-Southeast |
|---|---|---|---|
| Primary | Yes | No | No |
| Replica | Yes | Yes | Yes |
| Proxy | ProxySQL | ProxySQL | ProxySQL |
| Writes | Local | Routed to US-East | Routed to US-East |
| Reads | Local | Local | Local |
Write latency for non-primary regions includes at least the network round-trip to the primary. Some systems take different approaches: distributed SQL databases like CockroachDB and YugabyteDB accept writes in any region and coordinate them through consensus (paying a latency cost per write), while Aurora Global Database can forward writes from secondary regions to the primary region.
Sharded with Replicas¶
For datasets too large for a single server. Each shard is itself a primary with replicas:
Shard 1: Primary --> Replica 1a, Replica 1b
Shard 2: Primary --> Replica 2a, Replica 2b
Shard 3: Primary --> Replica 3a, Replica 3b
Shard Router: directs queries to the correct shard
This is the most operationally complex topology. Each shard needs its own monitoring, backup strategy, and failover configuration. Tools like Vitess (for MySQL) and Citus (for PostgreSQL) provide sharding infrastructure that simplifies management.
Putting It All Together¶
Scaling decisions should follow a progression, not jump to the most complex solution:
- Optimize queries and indexes - free performance on existing hardware
- Scale vertically - bigger instance, more RAM, faster storage
- Add read replicas - offload read traffic
- Add connection pooling - handle more concurrent clients with existing capacity
- Introduce caching - Redis/Memcached for frequently accessed data
- Consider CQRS - separate read and write models when query patterns diverge significantly
- Shard as a last resort - when write throughput or dataset size exceeds single-server limits
Each step up adds operational complexity. Do not skip ahead - every level handles more scale than most applications ever reach.
Further Reading¶
- ProxySQL Documentation - connection pooling, query routing, and caching for MySQL
- PgBouncer Documentation - lightweight connection pooling for PostgreSQL
- Vitess - Scalable MySQL Clustering - sharding infrastructure used at YouTube, Slack, and Square
- Martin Kleppmann - Designing Data-Intensive Applications - the definitive reference on distributed data systems
- CQRS Pattern - Microsoft Architecture Guide - when and how to implement CQRS
- EventStoreDB Documentation - purpose-built event sourcing database