Compute vs Storage Cost Breakdowns

Disaggregating a managed database bill into separate compute and storage cost dimensions so that each can be attributed, budgeted, and quota-enforced independently.

Modern cloud database engines (Aurora, Azure SQL Hyperscale, Cloud SQL, and their serverless variants) deliberately decouple the compute tier from the storage tier so each can scale on its own axis. That architecture is efficient to run but hostile to accounting: the invoice arrives blended, and a per-cluster total that mixes line items such as USW2-Aurora:ServerlessUsage with volume and I/O charges tells you nothing about whether the money went to CPU seconds or to provisioned volume. For Cloud DBA teams and FinOps engineers, the first job is to split that blended figure back into its physical drivers before any chargeback, budget, or enforcement policy can be trusted. This page covers the billing model, the extraction pipeline, the Python that runs it, and how the resulting signals feed quota boundary policies.

[Figure: a blended billing statement disaggregated into distinct compute and storage cost dimensions.]

Billing Model & Attribution Challenges

Cloud providers meter compute and storage on fundamentally different clocks, and the mismatch is where attribution breaks down.

Compute is billed against allocated capacity over active time: vCPU-hours (provisioned) or Aurora Capacity Units and compute-seconds (serverless), plus the memory that rides along with the instance class. It is bursty, elastic, and frequently the largest single driver during peak workload windows.

Storage is billed against provisioned or consumed capacity in GB-months, layered with snapshot/backup retention (often priced separately from the primary volume) and an I/O component — provisioned IOPS on io1/io2, or per-request I/O charges on Aurora’s distributed storage layer. Storage cost grows monotonically and slowly; it rarely spikes but rarely shrinks either.

The core attribution problem is that these two families collapse into overlapping billing dimensions that no single provider field cleanly separates:

Serverless conflation. Aurora Serverless v2 bills compute as ServerlessUsage (ACU-hours) while the storage volume and I/O bill under distinct USAGE_TYPE values on the same resource. A naive sum over the resource folds all of them together, so the total gets read as if the ACU line covered both compute and storage.
vCore licensing in Azure SQL. In the vCore purchasing model the compute charge bundles the SQL Server license, and the managed disk tier bills separately — the two must be parsed with distinct meter logic, as detailed in separating compute and storage costs in Azure SQL.
Burst credits distorting the storage baseline. gp2 burst balances and provisioned-IOPS charges (io1/io2, or gp3 above its included baseline) make the I/O portion of storage swing independently of provisioned GB, which is why IOPS spend has to be tracked against the baseline volume separately rather than folded into one storage number.
Snapshot lifecycle drift. Backups retained past their window keep billing under a storage meter long after the source instance is gone, so a resource-scoped query misses them entirely.

Formally, the goal is to resolve one blended figure $C_{blended}$ into two attributable components:

$$C_{compute} = \sum_i \left( \text{vCPU-hours}_i \cdot r_{vCPU} + \text{ACU-hours}_i \cdot r_{ACU} \right)$$

$$C_{storage} = \sum_i \left( \text{GB-months}_i \cdot r_{GB} + \text{IO-requests}_i \cdot r_{IO} + C_{snapshot,i} \right)$$

where $i$ indexes resources, each $r$ is the unit rate for the corresponding meter, and each term is keyed by a resource and a cost-allocation tag. Without that explicit split, chargeback falls back to an arbitrary allocation key and every downstream budget inherits the error.
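
As a worked instance of the two sums above, the sketch below prices a two-resource fleet for one month. The `ResourceUsage` type, the `RATES` table, and every quantity in it are hypothetical placeholders chosen for easy arithmetic, not provider list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceUsage:
    """One resource's metered quantities for a billing month (hypothetical shape)."""
    vcpu_hours: float = 0.0
    acu_hours: float = 0.0
    gb_months: float = 0.0
    io_requests: float = 0.0
    snapshot_cost: float = 0.0


# Placeholder unit rates in USD; substitute the real SKU rates for your region.
RATES = {"vcpu_hour": 0.05, "acu_hour": 0.12, "gb_month": 0.10, "io_request": 0.0000002}


def compute_cost(resources: list[ResourceUsage]) -> float:
    """Sum of vCPU-hour and ACU-hour charges across resources."""
    return sum(
        r.vcpu_hours * RATES["vcpu_hour"] + r.acu_hours * RATES["acu_hour"]
        for r in resources
    )


def storage_cost(resources: list[ResourceUsage]) -> float:
    """Sum of GB-month, I/O request, and snapshot charges across resources."""
    return sum(
        r.gb_months * RATES["gb_month"]
        + r.io_requests * RATES["io_request"]
        + r.snapshot_cost
        for r in resources
    )


fleet = [
    # A serverless cluster: ACU-hours, volume, I/O, and a retained snapshot.
    ResourceUsage(acu_hours=1000, gb_months=500, io_requests=50_000_000, snapshot_cost=12.0),
    # A provisioned instance: vCPU-hours and volume only.
    ResourceUsage(vcpu_hours=1440, gb_months=200),
]

print(round(compute_cost(fleet), 2))  # 192.0
print(round(storage_cost(fleet), 2))  # 92.0
```

With these placeholder numbers the blended figure of 284.00 resolves into 192.00 of compute and 92.00 of storage, which is the split every later stage consumes.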

Telemetry Extraction & Metric Normalization

Attribution starts with deterministic extraction. No provider exposes “compute cost” and “storage cost” as first-class queryable fields, so the pipeline must pull raw usage records, classify each by its meter/usage-type, enrich with resource tags, and map into a canonical schema before aggregating. The classification vectors are:

Compute: vCPU-hours, memory-GiB-hours, serverless-compute-seconds / ACU-hours, connection-pool-utilization, query-execution-time
Storage: provisioned-GB, snapshot-retention-GB, read/write-IO-operations, throughput-GBps, backup-lifecycle-cost

On AWS the source is Cost Explorer (GroupBy on USAGE_TYPE) or the Cost and Usage Report; on Azure it is the Cost Management query.usage API grouped by MeterCategory/MeterSubCategory; on GCP it is billing export grouped by SKU. Each provider names the same physical thing differently, so the classifier maps them into one dimension set — the same discipline described in normalizing provider billing exports into a unified schema. The regex/lookup table that decides “is this usage type compute or storage” is the single most load-bearing piece of the pipeline and must be version-controlled and tested.
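
One way to keep that lookup table reviewable is to store it as data, keyed by provider, with table-driven checks beside it. The sketch below is illustrative only: the `RULES` table, the `classify` helper, and the sample keys are assumptions, and the patterns are far from exhaustive. Replace the sample strings with keys taken from your own billing exports.

```python
import re

# Hypothetical rule table: provider -> ordered (pattern, dimension) pairs.
# Storage rules are listed first so they win when a key could match both.
RULES = {
    "aws": [
        (re.compile(r"Storage|IOPS|IOUsage|Snapshot|Backup", re.I), "storage"),
        (re.compile(r"ServerlessUsage|InstanceUsage|Multi-AZUsage", re.I), "compute"),
    ],
    "azure": [
        (re.compile(r"Storage|Backup", re.I), "storage"),
        (re.compile(r"Compute|vCore", re.I), "compute"),
    ],
    "gcp": [
        (re.compile(r"Storage|Backup", re.I), "storage"),
        (re.compile(r"vCPU|RAM", re.I), "compute"),
    ],
}


def classify(provider: str, key: str) -> str:
    """Map a provider-specific meter key onto a canonical dimension."""
    for pattern, dimension in RULES.get(provider, []):
        if pattern.search(key):
            return dimension
    return "unattributed"


# Table-driven cases that would live next to the rules and run in CI.
# The key strings are illustrative samples, not an authoritative catalogue.
CASES = [
    ("aws", "USW2-Aurora:ServerlessUsage", "compute"),
    ("aws", "USW2-Aurora:StorageUsage", "storage"),
    ("aws", "USW2-Aurora:StorageIOUsage", "storage"),
    ("azure", "General Purpose - Compute Gen5", "compute"),
    ("azure", "General Purpose - Storage", "storage"),
    ("gcp", "Cloud SQL: vCPU in Americas", "compute"),
    ("aws", "USW2-SomeNewMeter", "unattributed"),
]

for provider, key, expected in CASES:
    assert classify(provider, key) == expected, (provider, key)
```

Keeping the rules ordered and explicit means a new provider meter shows up as a failing or missing test case rather than a silent misclassification.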

Normalization also has to reconcile cadence and units: Cost Explorer is daily, some meters are hourly, GB-months must be prorated to daily snapshots, and I/O is a raw count that only becomes cost after applying the SKU rate. When correlating execution behaviour with the compute figure, query execution cost modeling supplies the mapping from plans, wait stats, and lock contention to measurable spend, so an expensive USAGE_TYPE can be traced to the workload that caused it. Ingestion must be idempotent — Cost Explorer restates the trailing several days as charges finalize — so records are upserted on (date, resource_id, dimension) rather than appended. Payloads should be validated on the way in with strict typing on billing records, rejecting rows with missing allocation tags before they poison an aggregate.
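
The validation and upsert steps described above can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's actual storage layer: the `CostRecord` fields, the `daily_cost` table, and the use of SQLite are assumptions, and the `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date

DIMENSIONS = {"compute", "storage", "unattributed"}


@dataclass(frozen=True)
class CostRecord:
    """Hypothetical canonical row; field names are illustrative."""
    day: date
    resource_id: str
    dimension: str
    amount_usd: float
    tenant_tag: str

    def __post_init__(self):
        # Reject malformed rows at the boundary instead of aggregating them.
        if not isinstance(self.day, date):
            raise TypeError("day must be a datetime.date")
        if isinstance(self.amount_usd, bool) or not isinstance(self.amount_usd, (int, float)):
            raise TypeError("amount_usd must be numeric")
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension!r}")
        if not self.resource_id or not self.tenant_tag:
            raise ValueError("resource_id and tenant_tag are required")


def init_store(conn: sqlite3.Connection) -> None:
    """Create the table whose primary key is the idempotency key."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_cost (
               day TEXT NOT NULL,
               resource_id TEXT NOT NULL,
               dimension TEXT NOT NULL,
               amount_usd REAL NOT NULL,
               tenant_tag TEXT NOT NULL,
               PRIMARY KEY (day, resource_id, dimension))"""
    )


def upsert(conn: sqlite3.Connection, rec: CostRecord) -> None:
    """Insert a row, or overwrite the amount if the key already exists."""
    conn.execute(
        """INSERT INTO daily_cost (day, resource_id, dimension, amount_usd, tenant_tag)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT (day, resource_id, dimension)
           DO UPDATE SET amount_usd = excluded.amount_usd,
                         tenant_tag = excluded.tenant_tag""",
        (rec.day.isoformat(), rec.resource_id, rec.dimension, rec.amount_usd, rec.tenant_tag),
    )


conn = sqlite3.connect(":memory:")
init_store(conn)
upsert(conn, CostRecord(date(2024, 6, 1), "db-1", "compute", 10.0, "team-a"))
# A later run sees the restated figure for the same day and resource.
upsert(conn, CostRecord(date(2024, 6, 1), "db-1", "compute", 12.5, "team-a"))
print(conn.execute("SELECT COUNT(*), SUM(amount_usd) FROM daily_cost").fetchone())  # (1, 12.5)
```

Because the primary key matches the upsert key, re-running the collector over a restated window replaces amounts in place, and a row with an empty tag never reaches the table.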

Python Automation Patterns

The extractor is a thin wrapper over the provider SDK plus a deterministic classifier. Below is a synchronous boto3 implementation that pulls Cost Explorer data for a date range at daily granularity, splits each USAGE_TYPE into a compute or storage bucket, and returns the totals per dimension.

import re
import boto3

# Version-controlled classifier: substring -> canonical dimension.
_COMPUTE = re.compile(r"(vCPU|ServerlessUsage|ACU|InstanceUsage|Compute)", re.I)
_STORAGE = re.compile(r"(Storage|VolumeUsage|IOPS|IO-Requests|Snapshot|Backup)", re.I)


def classify_usage_type(usage_type: str) -> str:
    """Map a raw AWS USAGE_TYPE onto a canonical cost dimension."""
    if _STORAGE.search(usage_type):
        return "storage"
    if _COMPUTE.search(usage_type):
        return "compute"
    return "unattributed"


def fetch_breakdown(start: str, end: str, region: str = "us-east-1") -> dict:
    """Return {'compute': float, 'storage': float, 'unattributed': float} for a date range."""
    ce = boto3.client("ce", region_name=region)
    totals = {"compute": 0.0, "storage": 0.0, "unattributed": 0.0}
    next_token = None

    while True:
        params = {
            "TimePeriod": {"Start": start, "End": end},   # YYYY-MM-DD, end exclusive
            "Granularity": "DAILY",
            "Metrics": ["UnblendedCost"],
            "Filter": {"Dimensions": {"Key": "SERVICE",
                                       "Values": ["Amazon Relational Database Service"]}},
            "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
        }
        if next_token:
            params["NextPageToken"] = next_token

        resp = ce.get_cost_and_usage(**params)
        for day in resp["ResultsByTime"]:
            for group in day["Groups"]:
                usage_type = group["Keys"][0]
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                totals[classify_usage_type(usage_type)] += amount

        next_token = resp.get("NextPageToken")
        if not next_token:
            break

    return totals

Cost Explorer is rate-limited and occasionally throttles, so any production caller wraps the SDK call in a retry decorator with exponential backoff and jitter. Keeping the retry policy in one decorator — the same pattern used across the pipeline for retry logic on failed metric pulls — keeps it testable and consistent:

import random
import time
import functools
from botocore.exceptions import ClientError

def retry_throttled(max_attempts: int = 5, base: float = 0.5):
    """Retry only on throttling errors with full-jitter exponential backoff."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except ClientError as exc:
                    code = exc.response["Error"]["Code"]
                    if code not in ("ThrottlingException", "LimitExceededException"):
                        raise
                    if attempt == max_attempts - 1:
                        raise
                    sleep = random.uniform(0, base * (2 ** attempt))
                    time.sleep(sleep)
            return None
        return wrapper
    return decorator
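
To make the backoff window concrete, the sketch below lists the upper bound of each sleep interval under the decorator's defaults (`max_attempts=5`, `base=0.5`). The helper name `backoff_caps` is introduced here for illustration only.

```python
def backoff_caps(max_attempts: int = 5, base: float = 0.5) -> list[float]:
    """Upper bound of the full-jitter sleep before each retry.

    The final attempt re-raises instead of sleeping, so there are
    max_attempts - 1 sleep intervals.
    """
    return [base * (2 ** attempt) for attempt in range(max_attempts - 1)]


caps = backoff_caps()
print(caps)       # [0.5, 1.0, 2.0, 4.0]
print(sum(caps))  # 7.5
```

Each actual sleep is drawn uniformly between zero and its cap, so 7.5 seconds is the worst-case total delay before the last attempt, and the average is half of that.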

When a fleet spans many accounts or regions, the synchronous loop becomes the bottleneck. The async variant fans the per-account calls out under bounded concurrency — the async semaphore-controlled workflow pattern — so a slow account never stalls the batch and the provider rate limit is respected globally:

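import asyncio
import aioboto3

async def fetch_account(session, account_id: str, role_arn: str,
                        start: str, end: str, sem: asyncio.Semaphore) -> dict:
    async with sem:  # cap concurrent Cost Explorer calls across the whole fan-out
        # Assume the account's role so the query reads that account's costs
        # rather than the caller's. The session name is an arbitrary label.
        async with session.client("sts", region_name="us-east-1") as sts:
            assumed = await sts.assume_role(RoleArn=role_arn,
                                            RoleSessionName="cost-breakdown")
        creds = assumed["Credentials"]
        async with session.client(
            "ce", region_name="us-east-1",
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        ) as ce:
            resp = await ce.get_cost_and_usage(
                TimePeriod={"Start": start, "End": end},
                Granularity="DAILY",
                Metrics=["UnblendedCost"],
                # Same RDS scope as the synchronous version.
                Filter={"Dimensions": {"Key": "SERVICE",
                                       "Values": ["Amazon Relational Database Service"]}},
                GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
            )
            # Pagination via NextPageToken is omitted here for brevity.
            totals = {"compute": 0.0, "storage": 0.0, "unattributed": 0.0}
            for day in resp["ResultsByTime"]:
                for group in day["Groups"]:
                    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                    totals[classify_usage_type(group["Keys"][0])] += amount
            return {"account_id": account_id, **totals}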


async def fetch_all(accounts: list[dict], start: str, end: str) -> list[dict]:
    session = aioboto3.Session()
    sem = asyncio.Semaphore(5)  # honour the Cost Explorer request rate
    tasks = [fetch_account(session, a["id"], a["role_arn"], start, end, sem)
             for a in accounts]
    return await asyncio.gather(*tasks)

On Azure the same classifier feeds a different SDK, azure-mgmt-costmanagement, where the split is driven by MeterSubCategory rather than USAGE_TYPE:

from azure.identity import DefaultAzureCredential
from azure.mgmt.costmanagement import CostManagementClient
from azure.mgmt.costmanagement.models import (
    QueryDefinition, QueryDataset, QueryAggregation, QueryGrouping, TimeframeType,
)

def azure_breakdown(subscription_id: str) -> dict:
    client = CostManagementClient(DefaultAzureCredential())
    scope = f"/subscriptions/{subscription_id}"
    query = QueryDefinition(
        type="ActualCost",
        timeframe=TimeframeType.MONTH_TO_DATE,
        dataset=QueryDataset(
            granularity="Daily",
            aggregation={"total": QueryAggregation(name="Cost", function="Sum")},
            grouping=[QueryGrouping(type="Dimension", name="MeterSubCategory")],
        ),
    )
    result = client.query.usage(scope, query)
    totals = {"compute": 0.0, "storage": 0.0, "unattributed": 0.0}
    for row in result.rows:
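        # Assumes the row layout [Cost, UsageDate, MeterSubCategory, Currency];
        # check result.columns if the granularity or grouping changes.
        cost, sub_category = float(row[0]), str(row[2])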
        totals[classify_usage_type(sub_category)] += cost
    return totals

Quota Enforcement Integration

A clean compute/storage split is only valuable if it drives action. Because the two dimensions behave differently, they map to different enforcement primitives, and feeding them separately into database quota boundary design is what makes budgets actionable rather than merely observed.

Compute cost is elastic and reversible, so it belongs on a fast control loop: a soft boundary at, say, 80% of the monthly compute budget triggers alerting and Aurora min/max-ACU tightening, while a hard boundary throttles new connections or caps ServerlessV2ScalingConfiguration before spend runs away within a billing hour. Storage cost is near-irreversible on the same timescale, so its boundaries target growth velocity — a soft trigger fires when the GB-month run-rate projects past budget by month-end, prompting snapshot pruning or tier review, long before a hard cap is relevant.
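
The storage soft trigger can be sketched as a linear run-rate projection. The function names and the straight-line assumption are illustrative: the sketch assumes month-to-date spend already includes the `as_of` day, and a volume that is still growing will overshoot a linear estimate.

```python
import calendar
from datetime import date


def project_month_end(spend_to_date: float, as_of: date) -> float:
    """Average daily spend so far, extended over every day in the month."""
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return spend_to_date / as_of.day * days_in_month


def storage_soft_trigger(spend_to_date: float, budget: float, as_of: date) -> bool:
    """True when the projected month-end storage spend exceeds the budget."""
    return project_month_end(spend_to_date, as_of) > budget


# Ten days into a 30-day month with 400.00 already spent on storage.
print(project_month_end(400.0, date(2024, 6, 10)))              # 1200.0
print(storage_soft_trigger(400.0, 1000.0, date(2024, 6, 10)))   # True
```

Firing on the projection rather than on spend to date gives the team most of the month to prune snapshots or change tiers before the budget is actually crossed.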

Concretely, each normalized record is compared against a per-tenant budget and emits a decision the enforcement layer consumes:

def evaluate_boundaries(record: dict, budget: dict) -> list[dict]:
    """Emit boundary decisions from a normalized compute/storage record."""
    decisions = []
    for dimension in ("compute", "storage"):
        spend, cap = record[dimension], budget[dimension]
        ratio = spend / cap if cap else 0.0
        if ratio >= 1.0:
            tier = "hard"
        elif ratio >= budget.get("soft_threshold", 0.8):
            tier = "soft"
        else:
            continue
        decisions.append({
            "tenant": record["tenant"],
            "dimension": dimension,
            "tier": tier,
            "ratio": round(ratio, 3),
            "action": "throttle" if tier == "hard" else "alert",
        })
    return decisions

Those decisions become alert-threshold events and policy-as-code inputs (Open Policy Agent, or cloud-native IAM conditions on scale-up actions). Because the records carry sensitive workload and tenant metadata, the automation service accounts that read cost data and write throttle actions must run under the least-privilege model described in access control for cost data — a compromised cost-reader should never be able to change a quota.

Failure Modes & Troubleshooting

The pipeline fails in a handful of characteristic ways. Recognizing the signature is most of the fix.

ThrottlingException / LimitExceededException from Cost Explorer. The account exceeded the request rate (Cost Explorer allows only a low steady-state QPS, and each call is billable). Resolution: the retry_throttled decorator above, plus the semaphore ceiling in the async fan-out; never remove the concurrency cap to “go faster.”
unattributed bucket growing over time. A new USAGE_TYPE or MeterSubCategory shipped by the provider isn’t matched by the classifier — classic schema drift. Resolution: alert when unattributed / total crosses ~2%, dump the offending keys, and extend the regex table (with a test) rather than silently folding them into compute or storage.
Compute and storage totals that don’t reconcile to the invoice. Usually credits, RIs, or Savings Plans applied at the account level, or snapshots billing on a now-deleted resource. Resolution: query UnblendedCost (not BlendedCost) for attribution, reconcile against the Cost and Usage Report monthly, and query snapshots by tag rather than by live-resource scope.
Missing cost-allocation tags. Untagged resources land in unattributed and can’t be charged back. Cost-allocation tags also backfill only from activation date and can lag up to 24h. Resolution: enforce tags at provision time and treat untagged spend as a policy violation, not a rounding error.
Duplicated rows on re-run. Cost Explorer restates the trailing days, so an append-only writer double-counts. Resolution: upsert on (date, resource_id, dimension); make the collector idempotent. When the billing API is unavailable entirely, the extractor should degrade to the last good cached breakdown rather than emit zeros — the fallback routing pattern for cost APIs covers that graceful-degradation path.
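
The schema-drift check from the list above can be sketched as a small report over classified rows. The function name, the row shape, and the stand-in classifier are assumptions for illustration; the 2% default mirrors the threshold suggested above.

```python
from collections import defaultdict
from typing import Callable, Iterable


def unattributed_report(rows: Iterable[tuple[str, float]],
                        classify: Callable[[str], str],
                        threshold: float = 0.02) -> dict:
    """Measure the unattributed share of spend and list the keys behind it."""
    total = 0.0
    offenders: dict[str, float] = defaultdict(float)
    for key, amount in rows:
        total += amount
        if classify(key) == "unattributed":
            offenders[key] += amount
    share = sum(offenders.values()) / total if total else 0.0
    return {
        "share": share,
        "breached": share > threshold,
        # Largest offenders first, so the regex table is extended where it matters most.
        "offenders": sorted(offenders.items(), key=lambda kv: kv[1], reverse=True),
    }


def demo_classify(key: str) -> str:
    """Stand-in for the pipeline's real classifier."""
    if "Storage" in key:
        return "storage"
    if "ServerlessUsage" in key:
        return "compute"
    return "unattributed"


rows = [
    ("USW2-Aurora:ServerlessUsage", 90.0),
    ("USW2-Aurora:StorageUsage", 5.0),
    ("USW2-NewMeter", 5.0),  # hypothetical usage type the classifier has not seen
]
report = unattributed_report(rows, demo_classify)
print(report["breached"], report["offenders"])  # True [('USW2-NewMeter', 5.0)]
```

Emitting the offending keys alongside the ratio turns the alert into a ready-made list of test cases for the next classifier change.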

Related

How to separate compute and storage costs in Azure SQL — vCore meter parsing that splits the bundled license from the managed disk tier.
Tracking IOPS vs baseline storage spend in RDS — isolating the provisioned-IOPS component from the baseline volume.
Database Quota Boundary Design — turning the compute/storage split into hard and soft enforcement tiers.
Multi-Cloud Cost Normalization — the canonical schema that unifies AWS, Azure, and GCP usage dimensions.
Query Execution Cost Modeling — attributing the compute figure back to the workloads that drive it.
