Modeling CPU Time vs Query Cost in PostgreSQL

PostgreSQL’s planner cost is a dimensionless heuristic, not money — this page walks through the exact Python needed to replace it with measured CPU-seconds from pg_stat_statements and price each query fingerprint against a real cloud vCPU rate.

The single most common attribution error is reading the cost number from EXPLAIN and multiplying it by a dollar figure. That value is a static estimate seeded by seq_page_cost, random_page_cost, and cpu_tuple_cost; it correlates with runtime only loosely and knows nothing about your instance’s vCPU rate or its cache state. The reliable signal lives in pg_stat_statements.total_exec_time, the cumulative backend execution time the engine actually measured. This page is the PostgreSQL-specific correction inside the broader discipline of per-query execution cost modeling. The rate it multiplies against should not be a hard-coded constant; it should come from the canonical model built when normalizing provider billing exports into a unified schema.

The conversion from measured milliseconds to a billable figure is deliberately linear:

\text{vcpu\_seconds} = \frac{\text{cpu\_ms}}{1000} \times \text{parallel\_ratio}

\text{query\_cost} = \text{vcpu\_seconds} \times \text{arch\_coeff} \times \frac{\text{rate\_per\_vcpu\_hour}}{3600}

where cpu_ms is execution time with I/O wait netted out, parallel_ratio accounts for parallel workers, and arch_coeff normalizes an ARM (Graviton) core against the x86 baseline the rate table is quoted in.
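
As a worked instance of the two formulas, the sketch below prices a single window, once serially and once with four parallel workers. The inputs (51,741.2 ms of CPU, a $0.048 rate, a 0.92 coefficient) are illustrative values, not measurements, and query_cost is a hypothetical helper name.

```python
def query_cost(cpu_ms: float, parallel_ratio: float,
               arch_coeff: float, rate_per_vcpu_hour: float) -> tuple[float, float]:
    """Apply the two formulas above; returns (vcpu_seconds, query_cost)."""
    vcpu_seconds = (cpu_ms / 1000.0) * parallel_ratio
    cost = vcpu_seconds * arch_coeff * (rate_per_vcpu_hour / 3600.0)
    return vcpu_seconds, cost


# Illustrative inputs (assumed, not measured).
serial = query_cost(51_741.2, 1.0, 0.92, 0.048)
# Same leader CPU time with four parallel workers: parallel_ratio = 1 + 4.
fanned_out = query_cost(51_741.2, 5.0, 0.92, 0.048)

print(f"serial:   {serial[0]:.4f} vCPU-s -> ${serial[1]:.6f}")
print(f"parallel: {fanned_out[0]:.4f} vCPU-s -> ${fanned_out[1]:.6f}")
```

Because the model is linear, the parallel figure is exactly five times the serial one; every factor scales the result proportionally.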

Prerequisites

Before running the collector, confirm the following are in place.

PostgreSQL role: the collector needs read-only access to cumulative statistics — nothing more. Grant the built-in pg_monitor role (which includes pg_read_all_stats) to a dedicated login; never reuse an application or superuser credential for extraction. Least-privilege here is part of broader access control for cost data.
```
CREATE ROLE cost_reader LOGIN PASSWORD '<secret>';
GRANT pg_monitor TO cost_reader;   -- read pg_stat_statements + pg_stat_activity
```
Extension + I/O timing: pg_stat_statements must be preloaded, and track_io_timing must be on so blk_read_time/blk_write_time are populated (they are what let you subtract I/O wait from execution time).
```
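-- add pg_stat_statements to shared_preload_libraries and restart, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
ALTER SYSTEM SET track_io_timing = on;   -- persists in postgresql.auto.conf
SELECT pg_reload_conf();                 -- track_io_timing takes effect on reload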
```
Python: 3.10 or newer (the code uses asyncio.to_thread and built-in generic type hints such as dict[int, dict]).
Libraries: install the PostgreSQL driver and the async HTTP client used for the pricing lookup.
```
pip install "psycopg[binary]>=3.1" "aiohttp>=3.9"
```

Step-by-Step Implementation

The collector snapshots cumulative counters, diffs two snapshots into per-window CPU-seconds, fetches the live vCPU rate with a cached fallback, and prices each fingerprint. The counters are cumulative since the last stats reset, so a single read is meaningless — you sample, wait, sample again, and subtract.

Step 1 — Snapshot cumulative CPU counters

Read total_exec_time alongside the I/O timing columns so a later step can isolate pure CPU. Keying by queryid collapses literal-bearing SQL into one stable fingerprint.

import psycopg
from psycopg.rows import dict_row

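SNAPSHOT_SQL = """
    SELECT queryid,
           sum(calls)::bigint   AS calls,
           sum(total_exec_time) AS total_exec_time,  -- ms, cumulative; INCLUDES I/O wait
           sum(blk_read_time)   AS blk_read_time,    -- ms reading blocks (needs track_io_timing)
           sum(blk_write_time)  AS blk_write_time,   -- ms writing blocks
           sum(rows)::bigint    AS rows
    FROM pg_stat_statements
    WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
      AND queryid IS NOT NULL
    GROUP BY queryid           -- the view has one row per role; merge them per fingerprint
    ORDER BY sum(total_exec_time) DESC
    LIMIT %(limit)s;
"""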

def snapshot(dsn: str, limit: int = 500) -> dict[int, dict]:
    """Read cumulative per-fingerprint counters, keyed by queryid."""
    with psycopg.connect(dsn, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute(SNAPSHOT_SQL, {"limit": limit})
            return {row["queryid"]: row for row in cur.fetchall()}

Step 2 — Diff two snapshots into vCPU-seconds

total_exec_time is wall-clock time inside the backend and therefore includes time blocked on I/O. Subtracting blk_read_time + blk_write_time approximates the time the CPU was actually working, which is the quantity this model prices as vCPU-seconds. Parallel workers are folded in via parallel_ratio (1 + worker_count).

def to_vcpu_seconds(prev: dict, curr: dict, parallel_ratio: float = 1.0) -> list[dict]:
    """Diff two cumulative snapshots into per-window CPU-seconds."""
    windows = []
    for qid, now in curr.items():
        was = prev.get(qid)
        if was is None:
            continue                        # first sighting; no baseline yet
        calls = now["calls"] - was["calls"]
        if calls <= 0:
            continue                        # no executions, or a stats reset
        exec_ms = now["total_exec_time"] - was["total_exec_time"]
        io_ms = ((now["blk_read_time"] - was["blk_read_time"])
                 + (now["blk_write_time"] - was["blk_write_time"]))
        cpu_ms = max(exec_ms - io_ms, 0.0)  # net out I/O wait to isolate CPU
        windows.append({
            "queryid": qid,
            "calls": calls,
            "vcpu_seconds": (cpu_ms / 1000.0) * parallel_ratio,
        })
    return windows
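
The calls <= 0 guard catches a reset only while the new call count has not yet overtaken the old one. A stricter check, sketched here as a hypothetical helper that is not part of the collector above, treats a window as unusable whenever any cumulative counter moved backwards. The sample rows are made-up numbers in the shape of the Step 1 snapshot.

```python
MONOTONIC_COUNTERS = ("calls", "total_exec_time", "blk_read_time", "blk_write_time")


def counters_went_backwards(was: dict, now: dict) -> bool:
    """True if any cumulative counter decreased between two snapshot rows.

    Cumulative counters only grow, so a decrease means the entry was reset
    or evicted and re-created between the two samples.
    """
    return any(now[name] < was[name] for name in MONOTONIC_COUNTERS)


before = {"calls": 900, "total_exec_time": 42_000.0,
          "blk_read_time": 3_000.0, "blk_write_time": 100.0}
steady = {"calls": 950, "total_exec_time": 43_500.0,
          "blk_read_time": 3_100.0, "blk_write_time": 100.0}
# After a reset the entry restarts from zero. Here calls has already
# overtaken the old value, so a calls-only guard would not notice.
after_reset = {"calls": 1_200, "total_exec_time": 9_000.0,
               "blk_read_time": 500.0, "blk_write_time": 0.0}

print(counters_went_backwards(before, steady))       # False
print(counters_went_backwards(before, after_reset))  # True
```

Calling such a check before the subtraction and skipping the window when it returns True is one option; the trade-off is losing one window of attribution after each reset.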

Step 3 — Fetch the vCPU rate with a cached fallback

The rate is time-varying (region, instance family, and — on Aurora Serverless v2 — the current ACU scaling level), so it is fetched at runtime. A transient billing-API outage must degrade to the last cached rate rather than stall the loop; this is the same circuit-breaker posture described in fallback routing for cost APIs.

import asyncio
import logging
import aiohttp

logger = logging.getLogger("pg_cpu_cost")

async def fetch_vcpu_rate(session: aiohttp.ClientSession, url: str,
                          cached_rate: float, retries: int = 3) -> float:
    """Return live $/vCPU-hour, falling back to the cached rate on failure."""
    for attempt in range(retries):
        try:
            timeout = aiohttp.ClientTimeout(total=10)
            async with session.get(url, timeout=timeout) as resp:
                resp.raise_for_status()
                body = await resp.json()
                return float(body["rate_per_vcpu_hour"])
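        except (aiohttp.ClientError, asyncio.TimeoutError, KeyError, ValueError) as exc:
            if attempt == retries - 1:
                # Out of retries: log and fall through without a pointless sleep.
                logger.warning("pricing attempt %d failed: %s", attempt + 1, exc)
                break
            wait = 2 ** attempt             # exponential backoff: 1 s, 2 s, ...
            logger.warning("pricing attempt %d failed: %s; retry in %ss",
                           attempt + 1, exc, wait)
            await asyncio.sleep(wait)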
    logger.error("pricing API exhausted; using cached rate %.5f", cached_rate)
    return cached_rate

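The retry-and-fallback lifecycle: each failed attempt is logged and retried after an exponentially growing delay, and once the retries are exhausted the function logs an error and returns the cached rate instead of raising.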
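
The fallback path returns whatever cached_rate the caller passed in, so nothing stops that value from going stale indefinitely. One way to bound this is to remember when the rate was last confirmed. The sketch below is an assumption layered on top of fetch_vcpu_rate, not part of it: RateCache, max_age_s, and the injected clock are hypothetical names, and the six-hour default is an arbitrary placeholder.

```python
import time
from typing import Callable


class RateCache:
    """Last known-good $/vCPU-hour plus the time it was confirmed."""

    def __init__(self, rate: float, max_age_s: float = 6 * 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.max_age_s = max_age_s        # assumed tolerance; tune per provider
        self._clock = clock
        self._confirmed_at = clock()

    def record(self, live_rate: float) -> None:
        """Store a rate that was just fetched successfully."""
        self.rate = live_rate
        self._confirmed_at = self._clock()

    def is_stale(self) -> bool:
        """True when the cached rate is older than max_age_s."""
        return self._clock() - self._confirmed_at > self.max_age_s


# A fake clock keeps the example deterministic.
fake_now = [0.0]
cache = RateCache(0.048, max_age_s=3600.0, clock=lambda: fake_now[0])

fake_now[0] = 1800.0
print(cache.is_stale())              # False: confirmed 30 minutes ago
fake_now[0] = 7200.0
print(cache.is_stale())              # True: two hours without a live rate
cache.record(0.051)
print(cache.rate, cache.is_stale())  # 0.051 False
```

A collector could pass cache.rate as cached_rate and raise an alert when is_stale() turns true. How the caller learns that a fetch succeeded, so that it can call record, is left open here.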

Step 4 — Price each fingerprint and run a cycle

price_window is a pure function of a window and a rate, so it unit-tests cleanly against known counters. The orchestrator holds the previous snapshot per DSN, so the first cycle emits nothing (no baseline) and every cycle after emits per-window cost.

def price_window(window: dict, rate_per_vcpu_hour: float,
                 arch_coefficient: float = 1.0) -> dict:
    """query_cost = vcpu_seconds * arch_coeff * (rate_per_vcpu_hour / 3600)."""
    rate_per_second = rate_per_vcpu_hour / 3600.0
    cost = window["vcpu_seconds"] * arch_coefficient * rate_per_second
    return {
        "queryid": window["queryid"],
        "calls": window["calls"],
        "vcpu_seconds": round(window["vcpu_seconds"], 4),
        "cost_usd": round(cost, 6),
    }

async def run_cycle(dsn: str, pricing_url: str, prev_store: dict[str, dict],
                    cached_rate: float, arch_coefficient: float = 1.0,
                    parallel_ratio: float = 1.0) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        rate = await fetch_vcpu_rate(session, pricing_url, cached_rate)
    curr = await asyncio.to_thread(snapshot, dsn)   # keep blocking I/O off the loop
    windows = to_vcpu_seconds(prev_store.get(dsn, {}), curr, parallel_ratio)
    prev_store[dsn] = curr
    return [price_window(w, rate, arch_coefficient) for w in windows]

async def main(interval_s: float = 60.0) -> None:
    """Sample forever; the first cycle only establishes the baseline."""
    store: dict[str, dict] = {}
    dsn = "postgresql://cost_reader@db-host:5432/appdb"
    while True:
        records = await run_cycle(
            dsn,
            "https://pricing.internal/v1/postgres/compute",
            store,
            cached_rate=0.048,       # $/vCPU-hour fallback
            arch_coefficient=0.92,   # Graviton2 baseline vs x86 rate table
            parallel_ratio=1.0,
        )
        for r in sorted(records, key=lambda x: x["cost_usd"], reverse=True)[:5]:
            print(r)
        await asyncio.sleep(interval_s)   # wait one sampling window

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())

Expected output on the second and later cycles (the first cycle prints nothing — it is only establishing the baseline):

{'queryid': 8113247908547112448, 'calls': 4120, 'vcpu_seconds': 51.7412, 'cost_usd': 0.000635}
{'queryid': -6620198406231792640, 'calls': 88, 'vcpu_seconds': 12.9037, 'cost_usd': 0.000158}
{'queryid': 2299104937874654208, 'calls': 15230, 'vcpu_seconds': 7.4411, 'cost_usd': 0.000091}

Verification

Confirm the numbers before wiring them into any chargeback or quota boundary policy.

Confirm the extension is actually collecting. If the extension has not been created or preloaded, this query fails with an error; an empty result means nothing has been tracked yet:
```
psql "$DSN" -c "SELECT count(*), max(total_exec_time) FROM pg_stat_statements;"
```
A non-zero count with a rising max confirms live capture. track_io_timing should also read on:
```
psql "$DSN" -tAc "SHOW track_io_timing;"   # expect: on
```

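Assert the cost identity. Pricing is deterministic, so each record from a cycle must reconcile to its vCPU-seconds and the rate that cycle resolved. The check below assumes the fallback rate was in effect; substitute the live rate otherwise. The tolerance allows for vcpu_seconds having been rounded to four places before emission:

RATE, COEFF = 0.048, 0.92   # must match the rate and coefficient the cycle used
for r in records:
    expected = round(r["vcpu_seconds"] * COEFF * (RATE / 3600.0), 6)
    assert abs(r["cost_usd"] - expected) <= 1.5e-6, (r["queryid"], expected)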

Check the record shape. Every emitted record is a flat dict with queryid, calls, vcpu_seconds, and cost_usd — ready to load into a time-series store or a pandas group-by for per-owner rollup.
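
The per-owner rollup can be sketched with the standard library alone. The owner_of mapping from queryid to a team name is hypothetical, since this page does not define where ownership comes from, and the records are made-up values in the emitted shape.

```python
from collections import defaultdict


def rollup_by_owner(records: list[dict], owner_of: dict[int, str],
                    default_owner: str = "unattributed") -> dict[str, dict]:
    """Sum calls, vcpu_seconds, and cost_usd per owner."""
    totals: dict[str, dict] = defaultdict(
        lambda: {"calls": 0, "vcpu_seconds": 0.0, "cost_usd": 0.0})
    for rec in records:
        # Fingerprints with no known owner land in a catch-all bucket.
        owner = owner_of.get(rec["queryid"], default_owner)
        totals[owner]["calls"] += rec["calls"]
        totals[owner]["vcpu_seconds"] += rec["vcpu_seconds"]
        totals[owner]["cost_usd"] += rec["cost_usd"]
    return dict(totals)


sample_records = [
    {"queryid": 101, "calls": 40, "vcpu_seconds": 2.5, "cost_usd": 0.00003},
    {"queryid": 102, "calls": 10, "vcpu_seconds": 1.5, "cost_usd": 0.00002},
    {"queryid": 103, "calls": 5, "vcpu_seconds": 0.5, "cost_usd": 0.00001},
]
sample_owners = {101: "checkout", 102: "checkout"}   # 103 has no known owner

for owner, total in rollup_by_owner(sample_records, sample_owners).items():
    print(owner, total["calls"], round(total["vcpu_seconds"], 4))
```

Keeping an explicit "unattributed" bucket makes gaps in the ownership map visible instead of silently dropping their cost.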

Gotchas & Edge Cases

total_exec_time is not pure CPU. It is backend wall-clock time and includes I/O wait, lock wait, and client-round-trip stalls. Netting out blk_read_time/blk_write_time (Step 2) removes disk wait, but lock and network stalls remain — treat the result as a tight upper bound on CPU, not an exact core-second count.
Column rename in PostgreSQL 17. blk_read_time and blk_write_time were split into shared_blk_read_time, local_blk_read_time, and their write counterparts. Pin your extraction SQL to the server’s pg_stat_statements version and switch column names on major upgrades, or the snapshot query fails with an undefined-column error.
Stats resets produce negative deltas. pg_stat_statements_reset(), a crash (or any restart with pg_stat_statements.save turned off), or a fingerprint aging out of the fixed-size hash table makes curr < prev. The calls <= 0 guard skips the window when the call count went backwards, and the max(..., 0.0) clamp floors CPU time at zero so no negative cost is emitted. Neither catches every reset, so alert if resets are frequent enough to blind the collector.
Parallel workers are under-counted if you ignore them. A parallel sequential scan across four workers burns roughly five core-seconds per wall-second (parallel_ratio ≈ 5). Leaving parallel_ratio = 1.0 under-attributes heavy analytical queries; source the real degree of parallelism per workload rather than assuming serial execution.
Cache warmth swings per-query cost. The same statement can cost 100× more on a cold cache (physical reads) than a warm one (buffer hits), and total_exec_time reflects that. Report cost as a windowed average over enough executions to smooth cache state instead of trusting a single window.
The ARM coefficient is not a guess. arch_coefficient corrects for a Graviton core delivering different work-per-vCPU-hour than the x86 core your rate table is quoted against. Calibrate it against actual invoices quarterly — a stale coefficient skews every downstream figure uniformly. Keeping ARM and x86 costs comparable is exactly the job of normalizing metrics into a unified compute model.
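
The PostgreSQL 17 column rename noted above can be handled by choosing the I/O timing expressions from the server version. This is a sketch under assumptions: io_timing_columns and select_list are hypothetical helpers, the version number is assumed to come from SHOW server_version_num, and summing the shared and local columns on 17+ is one reasonable choice rather than the only one.

```python
def io_timing_columns(server_version_num: int) -> dict[str, str]:
    """Map the aliases the Step 1 snapshot expects to version-correct SQL."""
    if server_version_num >= 170000:
        # PostgreSQL 17 split the old columns into shared_* and local_* parts.
        return {
            "blk_read_time": "shared_blk_read_time + local_blk_read_time",
            "blk_write_time": "shared_blk_write_time + local_blk_write_time",
        }
    return {"blk_read_time": "blk_read_time",
            "blk_write_time": "blk_write_time"}


def select_list(server_version_num: int) -> str:
    """Render the two I/O timing columns as a SELECT-list fragment."""
    cols = io_timing_columns(server_version_num)
    return ", ".join(f"{expr} AS {alias}" for alias, expr in cols.items())


print(select_list(160004))
print(select_list(170002))
```

The returned expressions are meant to be spliced into the snapshot query in place of the two hard-coded column names, so the rest of the pipeline keeps reading blk_read_time and blk_write_time regardless of server version.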

Frequently Asked Questions

Why can’t I just multiply the EXPLAIN cost by a dollar rate?

Because EXPLAIN cost is a dimensionless planner estimate, not a measurement. It is seeded by configuration constants like seq_page_cost, and the runtime behind one cost unit drifts with the buffer-cache hit ratio, so one cost unit is not a fixed number of cents. Use pg_stat_statements.total_exec_time, a value the engine actually measured, and net out I/O wait to approximate CPU-seconds.

What’s the difference between total_exec_time and CPU time?

total_exec_time is the backend’s wall-clock execution time and includes time blocked on disk, locks, and the client. With track_io_timing on you can subtract blk_read_time and blk_write_time to remove disk wait; the remainder is a close upper bound on CPU time. There is no counter that isolates pure CPU perfectly, so model it as a bound, not an exact figure.

How do I account for parallel query workers?

Multiply the per-window CPU-seconds by 1 + number_of_parallel_workers. total_exec_time records elapsed backend time, so a query using four workers for one wall-second consumed roughly five vCPU-seconds. Ignoring this materially under-costs analytical workloads that fan out across gather nodes.

Does this work on Aurora Serverless v2 where the rate changes?

Yes, but the rate is time-varying — it tracks ACU scaling. Fetch the effective rate_per_vcpu_hour at collection time (Step 3) rather than caching one constant for the day, and fall back to the last known rate only when the pricing lookup fails.

How often should the collector sample?

Align the interval with how quickly pg_stat_statements churns and your billing granularity; one to five minutes is typical. Because the counters are cumulative and can be reset (explicitly, or by a crash), persist each snapshot as soon as it is taken so that a collector restart does not lose its baseline and a server-side reset costs at most one interval of attribution.
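
Persisting the baseline can be as simple as writing each snapshot to disk before the next sample. The sketch below uses JSON and an atomic rename; the file layout and the save_snapshot and load_snapshot names are assumptions, not part of the collector above, and it assumes every snapshot value is a plain int or float.

```python
import json
import os
import tempfile
from pathlib import Path


def save_snapshot(path: Path, snap: dict[int, dict]) -> None:
    """Write the snapshot atomically so a crash never leaves a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(snap, fh)          # int keys are serialized as strings
    os.replace(tmp_name, path)       # atomic when both paths share a filesystem


def load_snapshot(path: Path) -> dict[int, dict]:
    """Load the last baseline, or an empty dict if none has been saved."""
    if not path.exists():
        return {}
    with path.open() as fh:
        raw = json.load(fh)
    return {int(qid): row for qid, row in raw.items()}   # restore int queryids


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "baseline.json"
    snap = {-6620198406231792640: {"queryid": -6620198406231792640, "calls": 88,
                                   "total_exec_time": 14_200.5,
                                   "blk_read_time": 1_100.0,
                                   "blk_write_time": 0.0}}
    save_snapshot(baseline, snap)
    print(load_snapshot(baseline) == snap)   # True
```

Loading the file at startup and seeding the previous-snapshot store with it lets the first cycle after a collector restart emit costs instead of re-establishing a baseline.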

Related

Using EXPLAIN ANALYZE for cost attribution in MySQL: the MySQL-side equivalent, turning actual runtime and rows_examined into per-statement cost.
Query Execution Cost Modeling: the parent topic covering per-query attribution across engines.
Multi-Cloud Cost Normalization: the canonical rate model this pipeline multiplies against.
Database Quota Boundary Design: turning the per-query cost this page emits into hard and soft enforcement tiers.
