Query Execution Cost Modeling

Attributing the compute, I/O, and memory consumed by individual SQL statements to a billable, per-query dollar figure, so a single expensive query can be charged back, budgeted, and quota-enforced like any other line item.

Instance-level billing amortizes an entire database bill across every workload that touched the engine, which is exactly the wrong granularity when one tenant’s report query is quietly burning 40% of the vCPU. Query execution cost modeling closes that gap: it reads the per-statement telemetry the engine already collects, weights each resource dimension against real cloud rates, and emits a deterministic cost per query fingerprint. This is the compute counterpart to the compute versus storage split — once you know how much of the bill is compute, per-query attribution tells you which queries spent it. This page covers the billing model behind per-query attribution, how the telemetry is extracted and normalized, the Python that runs the collector, and how the resulting cost signals feed quota enforcement.

The flow below traces a SQL statement from execution through metric extraction, normalization, and attribution into the policy engine.

Billing Model & Attribution Challenges

No relational engine exposes money. Each one exposes a proxy — a dimensionless planner estimate, a set of counters, or a wall-clock timer — and the modeling job is to map those proxies onto the dimensions a cloud provider actually charges for: vCPU-seconds, provisioned or consumed IOPS, and the memory that rides with the instance class. The unit cost of a single execution is a weighted sum over those dimensions:

$$C_{exec} = t_{cpu} \cdot r_{cpu} + b_{io} \cdot r_{io} + m_{peak} \cdot r_{mem}$$

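where $t_{cpu}$ is CPU-seconds attributed to the statement, $b_{io}$ is physical blocks read or written, $m_{peak}$ is peak working memory, and each $r$ is the corresponding provider rate in matching units. Memory is only billable over time, so in practice the memory term is evaluated as GB held multiplied by seconds, which is how the pricing code later on this page computes it. The whole discipline is really the problem of measuring the left-hand terms honestly. Several things make that hard: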

Planner cost is not time and not money. PostgreSQL’s EXPLAIN emits an abstract cost in arbitrary units seeded by seq_page_cost and cpu_tuple_cost. It correlates with runtime only loosely and drifts with the buffer cache hit ratio. Treating one cost-unit as a fixed number of cents is the single most common modeling error — the engine-specific correction is covered in modeling CPU time vs query cost in PostgreSQL.
Cache state changes the price of identical work. A buffer already resident in cache costs almost nothing to re-read (shared_blks_hit), while a cold read hits the storage tier and bills as I/O (shared_blks_read). Two runs of the identical statement can differ 100x in real cost purely on cache state, so the model must weight hits and reads separately rather than counting “blocks.”
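Concurrency steals attribution. Time recorded per statement is wall-clock time: a query that waits on locks or I/O logs elapsed time that is not its own compute. The engine’s own accounting (total_exec_time, rows, block counters) is still a better basis than externally observed duration, because it is scoped to the statement and excludes network and client time. Even so, total_exec_time is itself elapsed execution time, so it should be read as an upper bound on CPU-seconds rather than a direct measurement.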
Fingerprint collapse. Chargeback needs a stable identity, but literal-bearing SQL (WHERE id = 4471) produces a new statement every execution. Engines normalize to a queryid/digest; the model must key cost on that fingerprint, not on the raw text, or the attribution table explodes and every query looks like a one-off.
Serverless and burst pricing. On Aurora Serverless v2 or Azure SQL serverless, the effective $r_{cpu}$ is not constant — it tracks ACU/vCore scaling. A query that runs during a scale-up window is genuinely more expensive per CPU-second than the same query at floor capacity, which means the rate table itself is time-varying.
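The time-varying rate in the last point can be represented as a step function over capacity-change timestamps. The following is a minimal sketch under stated assumptions: `RateSchedule` is a hypothetical helper, the timestamps and rates are made-up values, and nothing here calls a provider API.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class RateSchedule:
    """Hypothetical step function: the CPU rate in effect from each timestamp onward."""
    # Parallel lists kept sorted by start timestamp (epoch seconds).
    starts: list[float] = field(default_factory=list)
    rates: list[float] = field(default_factory=list)

    def add(self, start_ts: float, cpu_rate: float) -> None:
        """Record that cpu_rate ($ per CPU-second) applies from start_ts onward."""
        idx = bisect_right(self.starts, start_ts)
        self.starts.insert(idx, start_ts)
        self.rates.insert(idx, cpu_rate)

    def rate_at(self, ts: float) -> float:
        """Return the rate in effect at ts; raise if ts precedes the first entry."""
        idx = bisect_right(self.starts, ts) - 1
        if idx < 0:
            raise LookupError("no rate recorded at or before this timestamp")
        return self.rates[idx]


# Illustrative values only: floor capacity, a scale-up window, then back to floor.
schedule = RateSchedule()
schedule.add(0.0, 0.00002)
schedule.add(600.0, 0.00005)
schedule.add(900.0, 0.00002)

# The same 10 CPU-seconds priced at two different moments.
cost_at_floor = 10.0 * schedule.rate_at(120.0)
cost_in_burst = 10.0 * schedule.rate_at(700.0)
```

Pricing each window at the rate in effect when it closed keeps the cost function unchanged; only the lookup of $r_{cpu}$ becomes a function of time.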

Telemetry Extraction & Metric Normalization

Extraction means pulling the engine’s cumulative counters, diffing them across an interval to get per-window deltas, and joining each fingerprint to a rate table. PostgreSQL exposes everything needed through the pg_stat_statements extension (official documentation); MySQL exposes an equivalent through performance_schema.events_statements_summary_by_digest. Reading those system views efficiently is a discipline of its own — the broader patterns for polling and diffing them live under system view querying patterns.

The counters are cumulative since the last stats reset, so a raw read is meaningless on its own; you snapshot, wait, snapshot again, and subtract. The following collector pulls the raw per-fingerprint row from PostgreSQL:

import psycopg
from psycopg.rows import dict_row

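EXTRACT_SQL = """
    SELECT queryid,
           sum(calls)::bigint               AS calls,
           sum(total_exec_time)             AS total_exec_time,   -- milliseconds, cumulative
           sum(shared_blks_hit)::bigint     AS shared_blks_hit,   -- cache hits (cheap)
           sum(shared_blks_read)::bigint    AS shared_blks_read,  -- physical reads (billable I/O)
           sum(shared_blks_written)::bigint AS shared_blks_written,
           sum(temp_blks_read)::bigint      AS temp_blks_read,
           sum(temp_blks_written)::bigint   AS temp_blks_written,
           sum(rows)::bigint                AS rows
    FROM pg_stat_statements
    WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
      AND queryid IS NOT NULL  -- NULL when the role lacks pg_read_all_stats
    -- The view keeps one row per (userid, dbid, queryid). Summing them gives one
    -- row per fingerprint, so keying the snapshot by queryid loses nothing.
    GROUP BY queryid
    ORDER BY sum(total_exec_time) DESC
    LIMIT %(limit)s;
"""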

def snapshot(dsn: str, limit: int = 500) -> dict[int, dict]:
    """Return the current cumulative counters keyed by queryid."""
    with psycopg.connect(dsn, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute(EXTRACT_SQL, {"limit": limit})
            return {row["queryid"]: row for row in cur.fetchall()}
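For MySQL, a comparable snapshot could be read from the digest summary table mentioned above. The block below is a sketch, not part of the PostgreSQL collector: `MYSQL_EXTRACT_SQL` and `to_common_row` are hypothetical names, the `%(limit)s` placeholder assumes a driver that accepts pyformat parameters, and because this table carries no block-level counters the I/O fields are left at zero.

```python
# Hypothetical counterpart to EXTRACT_SQL for MySQL's performance_schema.
MYSQL_EXTRACT_SQL = """
    SELECT DIGEST,
           COUNT_STAR,
           SUM_TIMER_WAIT,       -- picoseconds, cumulative
           SUM_ROWS_EXAMINED,
           SUM_ROWS_SENT
    FROM performance_schema.events_statements_summary_by_digest
    WHERE SCHEMA_NAME = DATABASE()
      AND DIGEST IS NOT NULL
    ORDER BY SUM_TIMER_WAIT DESC
    LIMIT %(limit)s;
"""

PICOSECONDS_PER_MS = 1_000_000_000


def to_common_row(row: dict) -> dict:
    """Map one digest row onto the keys the PostgreSQL snapshot uses.

    Assumption: rows sent stands in for the PostgreSQL rows counter, and the
    block counters are zero because this table does not expose them.
    """
    return {
        "queryid": row["DIGEST"],  # hex string, not an integer as in PostgreSQL
        "calls": int(row["COUNT_STAR"]),
        "total_exec_time": row["SUM_TIMER_WAIT"] / PICOSECONDS_PER_MS,  # ms
        "rows": int(row["SUM_ROWS_SENT"]),
        "rows_examined": int(row["SUM_ROWS_EXAMINED"]),
        "shared_blks_hit": 0,
        "shared_blks_read": 0,
        "shared_blks_written": 0,
        "temp_blks_read": 0,
        "temp_blks_written": 0,
    }


example = to_common_row({
    "DIGEST": "ab12cd34",
    "COUNT_STAR": 4,
    "SUM_TIMER_WAIT": 2_000_000_000,
    "SUM_ROWS_EXAMINED": 1000,
    "SUM_ROWS_SENT": 10,
})
```

With the I/O fields zeroed, a MySQL fingerprint priced this way reflects elapsed time only, so its cost is an underestimate until an I/O signal is added from another source.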

Normalization then diffs two snapshots and converts the deltas into the physical dimensions of the cost function. The block size is fixed by the server’s block_size (8 KiB by default), so block counts convert cleanly to bytes and then to the provider’s I/O unit:

BLOCK_BYTES = 8192  # PostgreSQL default block_size

def normalize(prev: dict, curr: dict) -> list[dict]:
    """Diff two cumulative snapshots into per-window physical metrics."""
    windows = []
    for qid, now in curr.items():
        before = prev.get(qid)
        if before is None:
            continue  # first sight of this fingerprint; no baseline yet
        calls = now["calls"] - before["calls"]
        if calls <= 0:
            continue  # no executions in this window (or stats reset)
        # total_exec_time is elapsed execution time in ms (it includes lock and
        # I/O waits), so treat this as an upper bound on CPU-seconds.
        cpu_seconds = (now["total_exec_time"] - before["total_exec_time"]) / 1000.0
        phys_reads = now["shared_blks_read"] - before["shared_blks_read"]
        temp_io = ((now["temp_blks_read"] - before["temp_blks_read"])
                   + (now["temp_blks_written"] - before["temp_blks_written"]))
        windows.append({
            "queryid": qid,
            "calls": calls,
            "cpu_seconds": max(cpu_seconds, 0.0),
            "io_bytes": max(phys_reads, 0) * BLOCK_BYTES,
            "temp_io_bytes": max(temp_io, 0) * BLOCK_BYTES,
            "rows": max(now["rows"] - before["rows"], 0),  # clamped like the other deltas
        })
    return windows

Two normalization hazards dominate. First, stats resets: pg_stat_statements_reset(), a server restart, or a fingerprint aging out of the fixed-size hash table and being re-created makes curr < prev, producing negative deltas. The calls <= 0 guard above drops such a window outright, and the remaining deltas are clamped at zero so a partially reset row can never emit a negative cost. Second, rate alignment: the $r$ coefficients must come from the same normalized rate model that the rest of the platform uses, not a hard-coded constant, because providers price identical vCPU differently by region and instance family. Sourcing those rates through the canonical schema in multi-cloud cost normalization is what keeps a PostgreSQL-on-RDS cost comparable to a MySQL-on-Cloud-SQL cost.
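One way to make the reset case observable, rather than merely survivable, is to count how many known fingerprints went backwards between snapshots. The heuristic below is a sketch: `looks_like_reset` and its 50% threshold are assumptions, not something the extension provides.

```python
def looks_like_reset(prev: dict, curr: dict, threshold: float = 0.5) -> bool:
    """Guess whether counters were reset between two snapshots.

    prev and curr map queryid -> row with a cumulative "calls" counter.
    Returns True when at least `threshold` of the fingerprints present in
    prev either vanished or show fewer calls than before.
    """
    if not prev:
        return False  # nothing to compare against on the first sweep
    regressed = 0
    for qid, before in prev.items():
        now = curr.get(qid)
        if now is None or now["calls"] < before["calls"]:
            regressed += 1
    return regressed / len(prev) >= threshold


prev = {1: {"calls": 100}, 2: {"calls": 40}}
after_reset = {1: {"calls": 3}, 2: {"calls": 1}}
steady = {1: {"calls": 120}, 2: {"calls": 40}}

reset_detected = looks_like_reset(prev, after_reset)
steady_detected = looks_like_reset(prev, steady)
```

A fingerprint can also vanish simply because it dropped out of the snapshot's row limit, so a signal like this is better suited to raising an alert than to discarding data automatically.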

Python Automation Patterns

A production collector runs on a schedule, holds the previous snapshot, applies the rate model, and hands attributed records downstream. The pricing step is a pure function so it can be unit-tested against known counters and cached rates:

from dataclasses import dataclass

@dataclass(frozen=True)
class RateModel:
    cpu_per_second: float     # $ per vCPU-second
    io_per_gb: float          # $ per GB of physical read
    mem_per_gb_second: float  # $ per GB-second of working memory

def price_window(window: dict, rates: RateModel, mem_gb: float = 0.0) -> dict:
    """Apply the cost function C = t_cpu*r_cpu + io*r_io + mem*r_mem."""
    cpu_cost = window["cpu_seconds"] * rates.cpu_per_second
    io_gb = (window["io_bytes"] + window["temp_io_bytes"]) / 1_073_741_824
    io_cost = io_gb * rates.io_per_gb
    mem_cost = mem_gb * window["cpu_seconds"] * rates.mem_per_gb_second
    total = round(cpu_cost + io_cost + mem_cost, 6)
    return {
        "queryid": window["queryid"],
        "calls": window["calls"],
        "cost_usd": total,
        "cost_per_call": round(total / window["calls"], 8),
    }
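A quick arithmetic check shows the kind of known-counter case such a unit test could pin down. The rates and counters below are illustrative assumptions, not published prices, and the memory term is omitted, which corresponds to leaving mem_gb at zero.

```python
# Illustrative rates only (assumptions, not published prices).
CPU_RATE = 0.00004       # $ per vCPU-second
IO_RATE_PER_GIB = 0.10   # $ per GiB of physical I/O
GIB = 1_073_741_824

# One normalized window for a single fingerprint.
cpu_seconds = 12.5
io_bytes = 4 * GIB       # shared physical reads in the window
temp_io_bytes = 1 * GIB  # temp-file spill in the window
calls = 50

cpu_cost = cpu_seconds * CPU_RATE
io_cost = (io_bytes + temp_io_bytes) / GIB * IO_RATE_PER_GIB
total = round(cpu_cost + io_cost, 6)
cost_per_call = round(total / calls, 8)

print(total, cost_per_call)
```

With these numbers the I/O term is roughly a thousand times the CPU term, which is the typical shape of a query that spills to disk or scans cold data.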

Because the collector polls many database instances, the fan-out should be concurrent but bounded — an unbounded gather across hundreds of endpoints will exhaust connection limits and trip the provider’s rate ceilings. The bounded, semaphore-controlled pattern is documented in full under async usage parsing workflows; the shape for this collector is:

import asyncio

async def collect_instance(dsn: str, prev_store: dict, rates: RateModel) -> list[dict]:
    curr = await asyncio.to_thread(snapshot, dsn)
    prev = prev_store.get(dsn, {})
    prev_store[dsn] = curr
    return [price_window(w, rates) for w in normalize(prev, curr)]

async def collect_fleet(dsns: list[str], rates: RateModel, prev_store: dict,
                        max_concurrency: int = 8) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)  # cap simultaneous DB connections

    async def guarded(dsn: str) -> list[dict]:
        async with sem:
            return await collect_instance(dsn, prev_store, rates)

    results = await asyncio.gather(*(guarded(d) for d in dsns),
                                   return_exceptions=True)
    priced = []
    for r in results:
        if isinstance(r, Exception):
            continue  # instance-level failure is isolated, not fatal to the fleet
        priced.extend(r)
    return priced

When the pricing step depends on a live rate lookup rather than a static RateModel, wrap that call so a transient billing-API outage degrades gracefully to the last cached rate instead of stalling the whole loop — the fallback routing pattern for cost APIs covers that circuit-breaker path. Aggregating the priced windows into a per-fingerprint chargeback table is a straightforward pandas group-by:

import pandas as pd

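def attribute(priced: list[dict], fingerprint_owner: dict[int, str]) -> pd.DataFrame:
    if not priced:
        # The first sweep has no baseline, so an empty result is normal;
        # return an empty frame with the expected columns instead of raising.
        return pd.DataFrame(columns=["owner", "queryid", "calls", "cost_usd"])
    df = pd.DataFrame(priced)
    df["owner"] = df["queryid"].map(fingerprint_owner).fillna("unattributed")
    return (df.groupby(["owner", "queryid"], as_index=False)
              .agg(calls=("calls", "sum"), cost_usd=("cost_usd", "sum"))
              .sort_values("cost_usd", ascending=False))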

The owner join is the attribution seam. Fingerprints map to owners through application tags, pg_stat_statements userid, or an application_name convention; anything unmatched lands in unattributed, which should be monitored, not silently absorbed.
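Monitoring the unattributed bucket can be as simple as tracking its share of total cost. The sketch below works on plain records (for example, the output of the chargeback table's `to_dict("records")`); `unattributed_share` is a hypothetical helper, and the 5% alert threshold is the figure used in the troubleshooting notes on this page.

```python
def unattributed_share(rows: list[dict], label: str = "unattributed") -> float:
    """Fraction of total cost whose owner is `label` (0.0 when there is no cost)."""
    total = sum(r["cost_usd"] for r in rows)
    if total <= 0:
        return 0.0
    orphaned = sum(r["cost_usd"] for r in rows if r["owner"] == label)
    return orphaned / total


rows = [
    {"owner": "team-reports", "queryid": 101, "cost_usd": 9.0},
    {"owner": "unattributed", "queryid": 202, "cost_usd": 1.0},
]
share = unattributed_share(rows)
needs_alert = share > 0.05  # alert threshold is a tunable assumption
```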

Quota Enforcement Integration

An attributed cost table is only useful if a decision hangs off it. Per-query cost feeds two enforcement surfaces. The first is a run-rate boundary: sum the priced windows per owner over a rolling interval, project forward, and compare against the tenant’s budget. The projection and the soft/hard tiering are exactly the boundary decision described in database quota boundary design — this model supplies the per-query granularity that lets the boundary point at a specific statement rather than a vague “your compute is high.”

def enforce(attributed: pd.DataFrame, budgets: dict[str, float],
            soft: float = 0.8) -> list[dict]:
    """Map each owner's accrued query cost to an enforcement action."""
    actions = []
    per_owner = attributed.groupby("owner")["cost_usd"].sum()
    for owner, spent in per_owner.items():
        cap = budgets.get(owner)
        if cap is None:
            continue
        ratio = spent / cap
        if ratio >= 1.0:
            tier, action = "hard", "throttle_or_deny"
        elif ratio >= soft:
            tier, action = "soft", "alert"
        else:
            continue
        top = (attributed[attributed["owner"] == owner]
               .nlargest(1, "cost_usd")["queryid"].iloc[0])
        actions.append({
            "owner": owner, "tier": tier, "action": action,
            "ratio": round(ratio, 3), "worst_query": int(top),
        })
    return actions
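The enforce function compares spend accrued so far; the forward projection described above can be layered in front of it. The following is a minimal sketch assuming simple linear extrapolation, with `project_spend` as a hypothetical helper; a real run-rate model might weight recent windows more heavily.

```python
def project_spend(spent_so_far: float, elapsed_seconds: float,
                  period_seconds: float) -> float:
    """Linearly extrapolate spend accrued so far to the end of the budget period."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return spent_so_far * (period_seconds / elapsed_seconds)


DAY = 86_400

# Ten days into a 30-day period with $40 accrued against a $100 budget.
projected = project_spend(40.0, 10 * DAY, 30 * DAY)
projected_ratio = projected / 100.0
```

Feeding the projected figure into the tiering logic, instead of the accrued one, lets a soft alert fire while there is still budget left to protect.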

The second surface is per-query anomaly gating: a single fingerprint whose cost_per_call regresses sharply between deploys is flagged before it accrues budget, catching a dropped index or a plan flip early. Both surfaces carry sensitive workload and tenant metadata, so the service accounts that read query stats and write throttle actions must run under the least-privilege model in access control for cost data — a compromised cost-reader must never be able to move a quota. Sibling attribution work for the specific engines is detailed in modeling CPU time vs query cost in PostgreSQL and using EXPLAIN ANALYZE for cost attribution in MySQL.
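The per-query gate could be sketched as a comparison of cost_per_call between a baseline window and the current one. `flag_regressions`, the 3x factor, and the minimum-cost floor are all assumptions chosen for illustration.

```python
def flag_regressions(baseline: dict[int, float], current: dict[int, float],
                     factor: float = 3.0, min_cost: float = 1e-6) -> list[dict]:
    """Return fingerprints whose cost_per_call grew by at least `factor`.

    baseline and current map queryid -> cost_per_call for two windows,
    for example the windows before and after a deploy.
    """
    flagged = []
    for qid, now in current.items():
        before = baseline.get(qid)
        if before is None or before < min_cost:
            continue  # new or near-free fingerprint: no meaningful ratio
        ratio = now / before
        if ratio >= factor:
            flagged.append({"queryid": qid, "before": before,
                            "after": now, "ratio": round(ratio, 2)})
    # Worst regression first.
    return sorted(flagged, key=lambda f: f["ratio"], reverse=True)


baseline = {101: 0.002, 202: 0.010}
current = {101: 0.020, 202: 0.011, 303: 0.500}
flagged = flag_regressions(baseline, current)
```

Here fingerprint 101 is flagged, 202 stays within tolerance, and 303 is skipped because it has no baseline; new fingerprints need a separate rule, such as an absolute cost ceiling.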

Failure Modes & Troubleshooting

The collector fails in a handful of recognizable ways; the signature is most of the fix.

pg_stat_statements returns no rows or truncated text. The extension isn’t loaded via shared_preload_libraries, or pg_stat_statements.max is too small and high-churn fingerprints are being evicted. Resolution: confirm CREATE EXTENSION pg_stat_statements, raise max, and treat a rising eviction count (pg_stat_statements_info.dealloc) as a coverage gap that under-attributes cost.
Negative or absurd deltas. A stats reset, restart, or fingerprint eviction made curr < prev. Resolution: clamp deltas at zero and drop the window (as in normalize); alert if resets happen often enough to blind the collector.
Everything lands in unattributed. The fingerprint-to-owner join is missing — no application_name convention, or the mapping table is stale after a deploy that changed query text. Resolution: alert when unattributed / total crosses ~5%, dump the offending queryids, and backfill the owner map rather than folding them into a catch-all tenant.
ThrottlingException / rate-limit errors from the pricing API. The rate lookup exceeded the provider’s Cost/Pricing API QPS during a wide fleet sweep. Resolution: keep the semaphore ceiling, cache rate snapshots per region so pricing is a local lookup, and fall back to cached rates on error instead of stalling.
Costs that don’t reconcile to the invoice. Per-query modeling estimates marginal compute cost; it will not match the invoice’s fixed provisioned-capacity and storage floors. Resolution: reconcile query cost against the compute portion of the compute versus storage breakdown, not against the whole bill, and treat the two as complementary views rather than the same number.
Cache-warmth swings distort per-query cost. Identical statements price wildly differently because one run hit cache and the next hit storage. Resolution: report cost as a windowed average across enough executions to smooth cache state, and weight shared_blks_hit and shared_blks_read separately so a cache miss is visibly, not silently, more expensive.
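The separate weighting called for in the last item might look like the sketch below. `io_cost_split` is a hypothetical helper, the rates are illustrative, and the default of zero cost for cache hits is an assumption that follows the earlier point that a resident buffer costs almost nothing to re-read.

```python
BLOCK_BYTES = 8192  # PostgreSQL default block_size
GIB = 1_073_741_824


def io_cost_split(hit_blocks: int, read_blocks: int,
                  read_rate_per_gib: float, hit_rate_per_gib: float = 0.0) -> dict:
    """Price cache hits and physical reads separately and report the hit ratio."""
    hit_cost = hit_blocks * BLOCK_BYTES / GIB * hit_rate_per_gib
    read_cost = read_blocks * BLOCK_BYTES / GIB * read_rate_per_gib
    touched = hit_blocks + read_blocks
    return {
        "hit_cost": hit_cost,
        "read_cost": read_cost,
        "hit_ratio": hit_blocks / touched if touched else None,
    }


# The same 1 GiB of blocks (131,072 x 8 KiB), fully cached versus fully cold.
warm = io_cost_split(hit_blocks=131_072, read_blocks=0, read_rate_per_gib=0.10)
cold = io_cost_split(hit_blocks=0, read_blocks=131_072, read_rate_per_gib=0.10)
```

Reporting hit_ratio next to the cost makes it clear when a price swing comes from cache state rather than from a change in the query itself.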

Related

Modeling CPU time vs query cost in PostgreSQL — correcting the dimensionless planner cost against real vCPU-seconds.
Using EXPLAIN ANALYZE for cost attribution in MySQL — turning rows_examined and actual runtime into per-statement cost.
Compute vs Storage Cost Breakdowns — the compute figure this model attributes back to individual queries.
Database Quota Boundary Design — turning per-query cost into hard and soft enforcement tiers.
Multi-Cloud Cost Normalization — the canonical rate model that makes cross-engine query costs comparable.
