Skip to content

PostgreSQL Advanced Features

PostgreSQL extends well beyond standard SQL. Features like recursive CTEs, window functions, native JSON support, and built-in full-text search let you solve problems at the database layer that would otherwise require application code, external search engines, or additional infrastructure. This guide covers the capabilities that make PostgreSQL a practical choice for complex workloads.


Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a named temporary result set defined with the WITH clause. It exists only for the duration of a single query, and you reference it by name like a table.

WITH active_customers AS (
    SELECT customer_id, name, email
    FROM customers
    WHERE status = 'active'
)
SELECT ac.name, o.total
FROM active_customers ac
JOIN orders o ON o.customer_id = ac.customer_id
WHERE o.total > 100;

CTEs improve readability when a query has multiple logical steps. You can chain them:

WITH monthly_totals AS (
    SELECT customer_id,
           DATE_TRUNC('month', order_date) AS month,
           SUM(total) AS month_total
    FROM orders
    GROUP BY customer_id, DATE_TRUNC('month', order_date)
),
ranked AS (
    SELECT customer_id, month, month_total,
           RANK() OVER (PARTITION BY month ORDER BY month_total DESC) AS rank
    FROM monthly_totals
)
SELECT * FROM ranked WHERE rank <= 10;

Recursive CTEs

Recursive CTEs use WITH RECURSIVE to traverse hierarchical data - org charts, threaded comments, file trees, bill-of-materials structures. The query has two parts: a base case (the anchor) and a recursive step that references the CTE itself.

WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers (no manager_id)
    SELECT employee_id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: find reports of each person found so far
    SELECT e.employee_id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT depth, name, employee_id, manager_id
FROM org_chart
ORDER BY depth, name;

PostgreSQL executes this iteratively: it runs the base case first, then repeatedly executes the recursive step using the previous iteration's results until no new rows are produced.

Infinite recursion

If your data has cycles (a row that eventually points back to itself), a recursive CTE will run forever. Add a depth limit (WHERE ct.depth < 20) in the recursive step, or use CYCLE detection (PostgreSQL 14+): CYCLE id SET is_cycle USING path.

Materialized vs Non-Materialized CTEs

Before PostgreSQL 12, CTEs were always materialized - PostgreSQL executed the CTE once, stored the full result, and scanned it for each reference. This prevented the optimizer from pushing predicates into the CTE.

PostgreSQL 12+ defaults to inlining (non-materializing) CTEs that are referenced only once, treating them like subqueries. You can control this explicitly:

-- Force materialization (useful when the CTE is referenced multiple times)
WITH expensive_query AS MATERIALIZED (
    SELECT * FROM large_table WHERE complex_condition
)
SELECT * FROM expensive_query WHERE id = 5;

-- Force inlining (lets the planner push filters down)
WITH simple_filter AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE status = 'shipped'
)
SELECT * FROM simple_filter WHERE customer_id = 42;

Use MATERIALIZED when the CTE result is reused multiple times. Use NOT MATERIALIZED (or just let the planner decide) when the CTE is referenced once and you want predicate pushdown.


Window Functions

Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row. Unlike GROUP BY, which reduces rows, window functions preserve every row and add computed columns.

The core syntax is:

function_name() OVER (
    PARTITION BY column     -- divide rows into groups
    ORDER BY column         -- order within each group
    frame_specification     -- which rows to include in the calculation
)

Ranking Functions

ROW_NUMBER(), RANK(), and DENSE_RANK() assign positions within partitions:

SELECT
    department,
    employee_name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;
department employee_name salary row_num rank dense_rank
Engineering Alice 150000 1 1 1
Engineering Bob 150000 2 1 1
Engineering Carol 130000 3 3 2
Sales Dave 120000 1 1 1
Sales Eve 110000 2 2 2

The differences: ROW_NUMBER() always assigns unique sequential numbers (ties get arbitrary ordering). RANK() gives tied rows the same rank but skips subsequent numbers (1, 1, 3). DENSE_RANK() gives tied rows the same rank without gaps (1, 1, 2).

LAG() and LEAD()

LAG() and LEAD() access values from previous and subsequent rows:

SELECT
    order_date,
    total,
    LAG(total)  OVER (ORDER BY order_date) AS prev_total,
    LEAD(total) OVER (ORDER BY order_date) AS next_total,
    total - LAG(total) OVER (ORDER BY order_date) AS change
FROM orders
WHERE customer_id = 42;

This gives you row-by-row comparisons without self-joins. LAG(total, 2) looks two rows back. LAG(total, 1, 0) specifies a default value of 0 when there is no previous row.

Frame Specification and Running Totals

The frame specification controls which rows the window function considers relative to the current row:

-- Running total (all rows from start to current row)
SELECT
    order_date,
    total,
    SUM(total) OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;

-- 3-day moving average
SELECT
    order_date,
    total,
    AVG(total) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3
FROM orders;

-- Percentage of department total
SELECT
    department,
    employee_name,
    salary,
    ROUND(salary * 100.0 / SUM(salary) OVER (PARTITION BY department), 1) AS pct_of_dept
FROM employees;

Common frame options:

Frame Meaning
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW All rows from start to current (default with ORDER BY)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW Current row plus the two before it
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING Current row to the end
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING All rows in the partition

Invalid interactive component configuration (terminal)


JSONB

PostgreSQL's JSONB type stores JSON data in a decomposed binary format. Unlike the JSON type (which stores the exact text), JSONB is parsed on input, supports indexing, and allows efficient querying of nested structures.

Operators

-- Create a table with JSONB
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    attributes JSONB
);

INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "ThinkPad", "specs": {"ram": 16, "storage": 512}, "tags": ["business", "portable"]}'),
('Monitor', '{"brand": "Dell", "specs": {"size": 27, "resolution": "4K"}, "tags": ["display"]}');

The key operators:

Operator Returns Example Result
-> JSONB element attributes -> 'brand' "ThinkPad" (JSONB)
->> Text value attributes ->> 'brand' ThinkPad (text)
#> JSONB at path attributes #> '{specs,ram}' 16 (JSONB)
#>> Text at path attributes #>> '{specs,ram}' 16 (text)
@> Contains attributes @> '{"brand":"Dell"}' true
? Key exists attributes ? 'brand' true
?| Any key exists attributes ?| array['brand','color'] true
?& All keys exist attributes ?& array['brand','color'] false
-- Find products with 16GB RAM
SELECT name FROM products
WHERE attributes #>> '{specs,ram}' = '16';

-- Find products that contain a specific key-value pair
SELECT name FROM products
WHERE attributes @> '{"brand": "Dell"}';

-- Find products tagged as "business"
SELECT name FROM products
WHERE attributes -> 'tags' ? 'business';

JSONB Functions

jsonb_each() and jsonb_array_elements() expand JSONB into rows:

-- Expand top-level keys into rows
SELECT p.name, kv.key, kv.value
FROM products p, jsonb_each(p.attributes) AS kv
WHERE kv.key != 'tags';

-- Expand array elements
SELECT p.name, tag
FROM products p, jsonb_array_elements_text(p.attributes -> 'tags') AS tag;

Indexing JSONB with GIN

A GIN index (Generalized Inverted Index) on a JSONB column indexes every key and value in the document, making containment (@>) and existence (?) queries fast:

CREATE INDEX idx_product_attrs ON products USING GIN (attributes);

-- These queries use the GIN index
SELECT * FROM products WHERE attributes @> '{"brand": "Dell"}';
SELECT * FROM products WHERE attributes ? 'specs';

For queries on a specific key path, a btree index on an expression is more efficient than a full GIN index:

CREATE INDEX idx_product_brand ON products ((attributes ->> 'brand'));

-- Uses the btree expression index
SELECT * FROM products WHERE attributes ->> 'brand' = 'ThinkPad';

JSONB vs normalized tables

Use JSONB for truly variable attributes - product specifications, user preferences, API responses where the schema differs per record. If you find yourself querying the same JSONB keys repeatedly with WHERE clauses, those keys should probably be proper columns. JSONB is flexible but slower to query than indexed columns on normalized tables.

Invalid interactive component configuration (terminal)


Table Partitioning

Table partitioning splits a large table into smaller physical pieces (partitions) while presenting a single logical table to queries. PostgreSQL supports declarative partitioning (10+), which handles partition routing and constraint enforcement automatically.

Partition Strategies

PostgreSQL supports three partitioning methods:

  • RANGE - partition by value ranges (dates, IDs). The most common strategy for time-series data.
  • LIST - partition by discrete values (region, status, category).
  • HASH - distribute rows evenly across partitions using a hash function.

Range Partitioning Example

-- Create the partitioned parent table
CREATE TABLE sensor_readings (
    sensor_id   INTEGER NOT NULL,
    reading_time TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION,
    quality     TEXT
) PARTITION BY RANGE (reading_time);

-- Create monthly partitions
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE sensor_readings_2024_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Create a default partition for data outside defined ranges
CREATE TABLE sensor_readings_default PARTITION OF sensor_readings DEFAULT;

Queries that filter on reading_time automatically skip irrelevant partitions - this is partition pruning:

-- Only scans sensor_readings_2024_02
SELECT AVG(value)
FROM sensor_readings
WHERE reading_time >= '2024-02-01' AND reading_time < '2024-03-01';

-- EXPLAIN confirms partition pruning
EXPLAIN SELECT * FROM sensor_readings WHERE reading_time = '2024-02-15';
--  Append
--    ->  Seq Scan on sensor_readings_2024_02
--          Filter: (reading_time = '2024-02-15')

List Partitioning

CREATE TABLE orders (
    id          SERIAL,
    region      TEXT NOT NULL,
    order_date  DATE NOT NULL,
    total       NUMERIC
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us-east', 'us-west');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu-west', 'eu-central');
CREATE TABLE orders_apac PARTITION OF orders FOR VALUES IN ('ap-southeast', 'ap-northeast');

Managing Partitions

Partitions are regular tables. You can attach existing tables and detach partitions without downtime:

-- Create the new partition as a regular table
CREATE TABLE sensor_readings_2024_04 (LIKE sensor_readings INCLUDING ALL);

-- Load historical data into it (no partition routing overhead)
COPY sensor_readings_2024_04 FROM '/data/april_readings.csv' CSV HEADER;

-- Attach it to the partitioned table
ALTER TABLE sensor_readings
    ATTACH PARTITION sensor_readings_2024_04
    FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');

-- Detach an old partition (for archiving or deletion)
ALTER TABLE sensor_readings
    DETACH PARTITION sensor_readings_2024_01;

Indexes on partitioned tables

When you create an index on the parent table, PostgreSQL automatically creates matching indexes on all existing partitions and any future partitions. Each partition's index is independent, which keeps index operations fast.

-- Creates an index on every partition
CREATE INDEX idx_sensor_time ON sensor_readings (reading_time);

PostgreSQL has built-in full-text search (FTS) that eliminates the need for external tools like Elasticsearch for many use cases. FTS uses two core types: tsvector (a processed document) and tsquery (a search query).

tsvector and tsquery

A tsvector is a sorted list of normalized words (lexemes) with position information. A tsquery is a search expression with boolean operators.

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Convert text to tsquery
SELECT to_tsquery('english', 'quick & foxes');
-- 'quick' & 'fox'

-- plainto_tsquery handles plain text (no operators needed)
SELECT plainto_tsquery('english', 'quick brown fox');
-- 'quick' & 'brown' & 'fox'

Notice that FTS normalizes words: "foxes" becomes "fox", "jumped" becomes "jump", "lazy" becomes "lazi". The english dictionary handles stemming, stop word removal ("the" is dropped), and case normalization.

The @@ Match Operator

The @@ operator matches a tsvector against a tsquery:

-- Search articles
SELECT title, body
FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'database & replication');

Stored tsvector Columns and GIN Indexes

Calling to_tsvector() on every query is expensive. Store the processed vector in a column and index it:

-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Populate it
UPDATE articles
SET search_vector = to_tsvector('english', COALESCE(title, '') || ' ' || COALESCE(body, ''));

-- Keep it updated with a trigger
CREATE FUNCTION articles_search_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english', COALESCE(NEW.title, '') || ' ' || COALESCE(NEW.body, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_articles_search
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION articles_search_update();

-- Create a GIN index on the stored vector
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- Fast searches using the index
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & partitioning');

Relevance Ranking

ts_rank() scores how well a document matches a query, based on lexeme frequency and position:

SELECT
    title,
    ts_rank(search_vector, query) AS rank
FROM
    articles,
    to_tsquery('english', 'replication | failover') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

You can weight different parts of a document by assigning labels (A, B, C, D) to tsvector components:

UPDATE articles
SET search_vector =
    setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
    setweight(to_tsvector('english', COALESCE(body, '')), 'B');

Titles (weight A) now contribute more to the rank score than body text (weight B). The default weights are A=1.0, B=0.4, C=0.2, D=0.1.


Connection Pooling with PgBouncer

PostgreSQL forks a new OS process for every client connection. Each process consumes roughly 5-10 MB of RAM. At 500 connections, that is 2.5-5 GB dedicated just to connection overhead - and context switching between hundreds of processes degrades performance. Most of those connections are idle most of the time.

PgBouncer sits between your application and PostgreSQL, maintaining a small pool of actual database connections and multiplexing client connections across them.

Pooling Modes

Mode Behavior Use case
Transaction pooling Connection returned to pool after each transaction completes Most applications. Default choice.
Session pooling Connection held until client disconnects Applications using session-level features (prepared statements, temp tables, SET commands)
Statement pooling Connection returned after each statement Simple autocommit workloads only. Breaks multi-statement transactions.

Transaction pooling is the right choice for most web applications. A pool of 20-30 server connections can serve hundreds of application connections because each request only holds a connection for the duration of its transaction.

Basic Configuration

The main configuration file is pgbouncer.ini:

[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool sizing
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
min_pool_size = 5

; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 30

; Logging
log_connections = 1
log_disconnections = 1

The userlist.txt file maps usernames to passwords:

"myapp_user" "md5encrypted_password_here"

Applications connect to PgBouncer on port 6432 instead of PostgreSQL on port 5432. No application code changes are needed beyond updating the connection port.

Transaction pooling limitations

In transaction pooling mode, session-level state does not persist between transactions. This breaks: prepared statements (use DEALLOCATE ALL after each transaction or enable max_prepared_statements in PgBouncer 1.21+), SET commands, LISTEN/NOTIFY, advisory locks, and temporary tables. If your application relies on these, use session pooling mode or refactor the application.


Foreign Data Wrappers

Foreign Data Wrappers (FDWs) let you query external data sources - other PostgreSQL servers, CSV files, MySQL databases, REST APIs - as if they were local tables. PostgreSQL handles the connection, query translation, and data transfer.

postgres_fdw: Cross-Server Queries

postgres_fdw connects to other PostgreSQL instances. This is useful for querying reporting replicas, accessing data in separate microservice databases, or migrating data between servers.

-- Install the extension
CREATE EXTENSION postgres_fdw;

-- Define the remote server
CREATE SERVER reporting_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'reporting.internal', port '5432', dbname 'analytics');

-- Map local user to remote credentials
CREATE USER MAPPING FOR app_user
    SERVER reporting_server
    OPTIONS (user 'readonly', password 'secret');

-- Create a foreign table that mirrors the remote table
CREATE FOREIGN TABLE remote_daily_stats (
    stat_date   DATE,
    page_views  BIGINT,
    unique_users BIGINT,
    avg_session_seconds NUMERIC
) SERVER reporting_server
  OPTIONS (schema_name 'public', table_name 'daily_stats');

-- Query it like a local table
SELECT stat_date, page_views
FROM remote_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days';

You can join local and foreign tables in the same query. PostgreSQL pushes down WHERE clauses, ORDER BY, and aggregates to the remote server when possible, minimizing data transfer.

-- Import all tables from a remote schema at once
IMPORT FOREIGN SCHEMA public
    FROM SERVER reporting_server
    INTO local_reporting;

file_fdw: Querying CSV Files

file_fdw reads flat files as tables, which is useful for log analysis or importing data without COPY:

CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE access_log (
    ip          TEXT,
    request_time TIMESTAMPTZ,
    method      TEXT,
    path        TEXT,
    status      INTEGER,
    bytes       BIGINT
) SERVER csv_files
  OPTIONS (filename '/var/log/app/access.csv', format 'csv', header 'true');

-- Query the CSV file directly with SQL
SELECT ip, COUNT(*) AS requests
FROM access_log
WHERE status >= 500
GROUP BY ip
ORDER BY requests DESC
LIMIT 10;

Security with FDWs

User mappings store credentials in the pg_user_mappings catalog. Only superusers and the mapped user can see the password. Use pg_read_server_files role to control who can access file_fdw. In production, store FDW credentials in a secrets manager and rotate them regularly.


Practical Exercise


Further Reading


Previous: PostgreSQL Administration | Next: NoSQL Concepts & Architecture | Back to Index

Comments