Handling Rate Limits When Pulling Database Metrics

This page walks through the exact Python you need to pull cloud database metrics at scale without tripping provider throttling, so a 429 at 2,000 databases becomes a scheduled backoff instead of a hole in your chargeback ledger.


Cloud database telemetry is the foundational input for every downstream stage in your Metric Extraction & Aggregation Pipelines. When a provider throttles a metric endpoint, the consequence is not merely dropped telemetry — it is billing drift, misaligned chargeback models, and silent quota exhaustion. The fix is to treat rate limits as deterministic state transitions rather than transient network errors: parse the server’s own backoff signal, gate concurrency so you never overshoot the quota, and make every retry idempotent so an aggressive re-pull cannot double-count a metric window. This page implements that pattern on the same async semaphore-controlled concurrency the parsing tier is built on.

Prerequisites

Before running the fetcher, confirm the following are in place.

Cloud permissions: the execution role needs read-only access to the metric endpoints you pull. For an AWS CloudWatch/Cost Explorer footprint, scope to least privilege:
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyMetricPull",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "ce:GetCostAndUsage"
      ],
      "Resource": "*"
    }
  ]
}
```
cloudwatch:GetMetricData and ce:GetCostAndUsage do not support resource-level ARNs, so Resource stays "*"; constrain the account and region with an IAM condition or an SCP instead. Azure Monitor pulls need the built-in Monitoring Reader role on the target subscription.
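Python: 3.10 or newer (the code relies on current asyncio behavior, where locks and semaphores no longer bind to an event loop at construction, and on built-in generic type hints such as list[str]).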
Libraries: install the async HTTP client and the retry helper.
```
pip install "httpx>=0.27" "tenacity>=8.2"
```

Step-by-Step Implementation

The fetcher parses provider backoff headers into a single delay, protects the quota proactively with a local token bucket, gates in-flight requests with a semaphore, and layers jitter-aware exponential backoff on top for the throttling the bucket does not catch. Build it in four steps.

Step 1 — Normalize provider backoff headers into one delay

Every provider signals recovery differently, so the first job is a pure function that maps a throttled response to a single float delay. Blind retry logic — sleeping a hardcoded interval and hammering the endpoint again — exhausts the quota exponentially. Compliant servers return an explicit recovery window per the HTTP 429 status specification; parse it before scheduling the next attempt.

```
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

import httpx

def extract_backoff(response: httpx.Response) -> Optional[float]:
    """Map a 429 to a delay in seconds; None if not throttled or no usable header."""
    if response.status_code != 429:
        return None
    now = datetime.now(timezone.utc)
    # RFC 7231 Retry-After: delta-seconds or an HTTP-date.
    retry_after = response.headers.get("Retry-After")
    if retry_after:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number: try the HTTP-date form
        try:
            when = parsedate_to_datetime(retry_after)
            if when.tzinfo is None:  # a "-0000" zone parses as naive; treat as UTC
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - now).total_seconds())
        except (TypeError, ValueError):
            pass  # unparseable: fall through to the other headers
    # Azure Monitor / ARM signals milliseconds.
    ms_delay = response.headers.get("x-ms-retry-after-ms")
    if ms_delay:
        try:
            return max(0.0, float(ms_delay) / 1000.0)
        except ValueError:
            pass
    # Fall back to the reset epoch some gateways expose.
    reset = response.headers.get("X-RateLimit-Reset")
    if reset:
        try:
            return max(0.0, float(reset) - now.timestamp())
        except ValueError:
            pass
    return None
```

Never hardcode a sleep interval — always derive it from a server signal, capped downstream by a maximum so a misbehaving header cannot stall the run indefinitely.
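The cap can be applied in one place, right after parsing. The following is a minimal sketch, not part of the fetcher: `capped_delay` and the 60-second ceiling are hypothetical names and values chosen for illustration, and the helper works on a bare `Retry-After` string so it can be exercised without a live response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_BACKOFF_S = 60.0  # assumption: illustrative ceiling, tune to your pull interval

def capped_delay(retry_after: str, now: Optional[datetime] = None,
                 cap: float = MAX_BACKOFF_S) -> float:
    """Parse a Retry-After value (seconds or HTTP-date) and clamp it to [0, cap]."""
    now = now or datetime.now(timezone.utc)
    try:
        delay = float(retry_after)                 # delta-seconds form
    except ValueError:
        when = parsedate_to_datetime(retry_after)  # HTTP-date form
        delay = (when - now).total_seconds()
    return min(max(delay, 0.0), cap)

# A fixed "now" keeps the HTTP-date branch deterministic.
now = datetime(2026, 7, 5, 14, 0, 0, tzinfo=timezone.utc)
print(capped_delay("2", now))                              # short wait passes through
print(capped_delay("86400", now))                          # absurd wait is clamped
print(capped_delay("Sun, 05 Jul 2026 14:00:30 GMT", now))  # date form becomes seconds
```

Clamping at the parsing boundary means every caller sees a bounded delay, whichever header produced it.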

Step 2 — Add a preemptive token bucket

Header-driven backoff is reactive: you only learn you are over the limit after the request is rejected. A local token bucket that mirrors the provider’s published rate blocks the coroutine before it dispatches a doomed request, eliminating wasted round-trips and stabilizing a distributed worker pool.

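```
import asyncio
import time

class AsyncTokenBucket:
    """Refills at `rate` tokens/sec up to `capacity`; awaits when empty."""

    def __init__(self, rate: float, capacity: float):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, tokens: float = 1.0) -> None:
        if tokens > self.capacity:
            # Without this guard the loop below would never terminate.
            raise ValueError("cannot acquire more tokens than the bucket holds")
        # The lock is held across the sleep so waiters are served in arrival order.
        async with self._lock:
            while True:
                now = time.monotonic()
                self.tokens = min(
                    self.capacity, self.tokens + (now - self.updated) * self.rate
                )
                self.updated = now
                if self.tokens >= tokens:
                    self.tokens -= tokens
                    return
                deficit = tokens - self.tokens
                await asyncio.sleep(deficit / self.rate)
```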

Set rate to the provider’s documented steady-state limit (for example, Cost Explorer’s per-second ceiling) and capacity to the burst it tolerates.
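To sanity-check a rate and capacity pair before a run, it helps to estimate how long the bucket takes to admit a whole fleet. The sketch below assumes an ideal token bucket that starts full and ignores network latency; `admission_time` is a hypothetical helper, and the figures (8 tokens/sec, capacity 16, 2,000 databases) are illustrative rather than any provider's published limit.

```python
def admission_time(requests: int, rate: float, capacity: float) -> float:
    """Seconds until an ideal, initially full token bucket has admitted `requests` calls."""
    if rate <= 0 or capacity <= 0:
        raise ValueError("rate and capacity must be positive")
    beyond_burst = max(0.0, requests - capacity)  # calls that must wait for a refill
    return beyond_burst / rate

print(admission_time(2000, rate=8.0, capacity=16.0))  # 248.0
print(admission_time(10, rate=8.0, capacity=16.0))    # 0.0
```

If the estimate exceeds the pull interval, the fleet cannot be covered in one cycle at that rate, and the run needs sharding or a longer interval.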

Step 3 — Build the semaphore-gated async fetcher

Unbounded concurrency is the primary cause of throttling in metric extraction. A semaphore caps in-flight requests to a safe threshold; tenacity supplies jitter-aware exponential backoff for the throttling that slips past the bucket. The semaphore follows Python’s asyncio synchronization primitives.

```
import asyncio
import logging
from typing import Any, Dict

import httpx
from tenacity import (
    before_sleep_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

logger = logging.getLogger("db_metrics.rate_limited_fetcher")

class RateLimitError(Exception):
    """Provider signalled hard throttling; safe to retry after backoff."""

class MetricFetchError(Exception):
    """Non-recoverable API failure; do not retry."""

class AsyncMetricFetcher:
    def __init__(
        self,
        base_url: str,
        api_key: str,
        rate: float = 8.0,
        capacity: float = 16.0,
        max_concurrency: int = 8,
        timeout: float = 15.0,
    ):
        self.client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=timeout,
        )
        self.bucket = AsyncTokenBucket(rate=rate, capacity=capacity)
        self.semaphore = asyncio.Semaphore(max_concurrency)

    async def _fetch_once(self, endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
        await self.bucket.acquire()          # preemptive: block before dispatch
        async with self.semaphore:           # cap concurrent in-flight requests
            response = await self.client.get(endpoint, params=params)
            if response.status_code == 429:
                # A 429 is always retryable, even when no backoff header is usable.
                # Honour the server's window (capped at 60s) before tenacity's own
                # wait fires. The sleep holds a semaphore slot, which also eases
                # pressure on the throttled provider.
                waited = min(extract_backoff(response) or 0.0, 60.0)
                await asyncio.sleep(waited)
                raise RateLimitError(f"throttled on {endpoint}; waited {waited:.1f}s")
            if response.status_code >= 400:
                raise MetricFetchError(f"HTTP {response.status_code} on {endpoint}")
            return response.json()

    @retry(
        stop=stop_after_attempt(5),
        wait=wait_exponential_jitter(initial=1, max=60, jitter=2.0),
        retry=retry_if_exception_type(RateLimitError),
        before_sleep=before_sleep_log(logger, logging.WARNING),
        reraise=True,
    )
    async def fetch_metric(self, endpoint: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """Fetch one metric window, retrying only on provider throttling."""
        return await self._fetch_once(endpoint, params)

    async def aclose(self) -> None:
        await self.client.aclose()
```

Classifying RateLimitError separately from MetricFetchError is what keeps the retry loop honest: a malformed request or an expired credential fails fast instead of burning five attempts, while genuine throttling backs off. This is the same discipline covered in depth under retry logic for failed metric pulls.

Step 4 — Orchestrate concurrent, idempotent pulls

Fan the fetcher across every database with asyncio.gather, then deduplicate on the window key so a retried pull cannot double-count. Aggressive retries that append rather than upsert are the classic way to inflate projected spend.

```
async def pull_all(fetcher: AsyncMetricFetcher, db_ids: list[str]) -> dict[str, dict]:
    """Pull one metric window per database; failed pulls are logged and omitted."""
    async def one(db_id: str) -> tuple[str, Optional[dict]]:
        try:
            data = await fetcher.fetch_metric(
                "/metrics", {"resource_id": db_id, "period": "PT1H"}
            )
            return db_id, data
        except (RateLimitError, MetricFetchError) as exc:
            logger.error("giving up on %s: %s", db_id, exc)
            return db_id, None

    results = await asyncio.gather(*(one(d) for d in db_ids))
    # One entry per db_id: re-running the pull overwrites rather than appends.
    # Downstream, merge with an upsert keyed by (resource_id, window).
    return {db_id: payload for db_id, payload in results if payload is not None}
```

Expected output is one record per database that responded, keyed for an idempotent merge into the parsing tier:

```
{
  "db-prod-01": {"resource_id": "db-prod-01", "window": "2026-07-05T14:00Z", "cpu_credits": 42.0},
  "db-prod-02": {"resource_id": "db-prod-02", "window": "2026-07-05T14:00Z", "cpu_credits": 17.5}
}
```
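The merge into the parsing tier can be sketched as an upsert keyed on (resource_id, window). This is a minimal illustration under assumptions: `upsert` and the in-memory `ledger` dict are hypothetical stand-ins for whatever store the pipeline actually uses, and the record shape is copied from the expected output on this page.

```python
from typing import Any, Dict, Tuple

WindowKey = Tuple[str, str]  # (resource_id, window)

def upsert(ledger: Dict[WindowKey, Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Insert or overwrite the record for its (resource_id, window) key."""
    key = (record["resource_id"], record["window"])
    ledger[key] = record  # overwrite, never append: a re-pull replaces the old value

ledger: Dict[WindowKey, Dict[str, Any]] = {}
first = {"resource_id": "db-prod-01", "window": "2026-07-05T14:00Z", "cpu_credits": 42.0}
upsert(ledger, first)
upsert(ledger, dict(first))  # a retried pull delivers the same window again
total = sum(r["cpu_credits"] for r in ledger.values())
print(len(ledger), total)    # 1 42.0
```

Because the key includes the window, a second pull of the same hour replaces the first, while a pull of the next hour adds a new entry.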

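In sequence, a single metric pull first waits on the token bucket, then takes a semaphore slot, and only then dispatches. When the provider returns a 429, the pull honours the server's backoff window, raises RateLimitError, and re-enters through the jitter-aware backoff path.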

Verification

Confirm the fetcher respects the limit before you schedule it against production.

Assert no dropped windows on the happy path. Every requested database that responded must appear exactly once in the result map.

```
requested = ["db-prod-01", "db-prod-02"]
fetcher = AsyncMetricFetcher(base_url="https://metrics.example", api_key="…")
records = asyncio.run(pull_all(fetcher, requested))
assert set(records) == set(requested)             # each database present exactly once
assert all(r["window"] for r in records.values()) # every record carries its window
```

Force a 429 and confirm backoff, not failure. Point the client at a mock endpoint that returns 429 with Retry-After: 2 on the first two calls, then 200. The run should succeed on the third attempt with two WARNING lines logged by before_sleep_log.
Watch the effective request rate. Log a timestamp on each dispatch and confirm that the count in any one-second window never exceeds capacity + rate (a full bucket permits an initial burst) and that the sustained rate settles at or below rate. That is the proof the preemptive limiter is doing its job.
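The rate check can be automated against the dispatch log. The sketch below is one way to do it, assuming timestamps in seconds from a monotonic clock; `max_in_window` is a hypothetical helper. For a token bucket, the count in any window of length T is bounded by capacity + rate × T, so that is the figure to assert against.

```python
from bisect import bisect_left
from typing import Sequence

def max_in_window(timestamps: Sequence[float], window: float = 1.0) -> int:
    """Largest number of dispatches inside any half-open span [t, t + window)."""
    ts = sorted(timestamps)
    best = 0
    for i, start in enumerate(ts):
        end = bisect_left(ts, start + window)  # first index at or after start + window
        best = max(best, end - i)
    return best

# Synthetic log: a burst of 3 at t=0, then one dispatch every 0.5 s.
log = [0.0, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
rate, capacity = 2.0, 3.0                      # illustrative bucket settings
print(max_in_window(log))                      # 4
assert max_in_window(log) <= capacity + rate * 1.0
```

Only windows that start at a dispatch need checking, because sliding a window forward to its first dispatch never drops one.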

Gotchas & Edge Cases

Retry-After can be an HTTP-date, not a number. RFC 7231 allows either form. Parsing it only as a float raises ValueError on a date string; if that exception is swallowed, the backoff collapses to zero and the retry fires immediately. Step 1 handles both forms deliberately.
Azure and AWS disagree on units. Azure Monitor and ARM return x-ms-retry-after-ms in milliseconds; a naive Retry-After reader treats 2000 as a 2,000-second wait and stalls the run for over half an hour. Normalize units at the header boundary.
Cap the honoured backoff. A misconfigured gateway can emit an absurd Retry-After. Always clamp the effective wait (here, min(backoff, 60.0)) so one bad header cannot freeze a worker pool.
The token bucket must be shared, not per-task. Instantiate one bucket per provider and pass it to every coroutine. A bucket created inside each task enforces nothing, because each task sees a full bucket.
Idempotency is non-negotiable under retries. Keying results on (resource_id, window) and upserting is what prevents a retried pull from double-counting a metric window and inflating the aggregate that feeds hard and soft quota boundaries.
A sustained throttle is a signal to shed load. If the bucket empties for an entire reset window, do not just wait — log the X-RateLimit-Reset timestamp, pause the affected provider, and let the pipeline degrade gracefully rather than queue unbounded work, exactly as covered under graceful degradation when billing APIs are down.

Frequently Asked Questions

Do I still need a token bucket if I already retry on 429?

Yes. Retrying on 429 is reactive — you pay a full round-trip and a rejection before backing off, and at high fan-out those rejections themselves count against the limit. The token bucket is proactive: it holds a coroutine back before it ever dispatches a request that would be throttled, which keeps the effective rate below the ceiling and cuts wasted calls dramatically.

Should I set the semaphore or the token bucket to the provider’s limit?

They govern different things. The token bucket caps the request rate (calls per second), which is what most providers actually meter. The semaphore caps concurrent in-flight requests, which protects against connection exhaustion and memory blowup. Set the bucket to the documented rate limit and the semaphore to a smaller number that keeps memory and open sockets bounded — typically single digits.
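A rough way to size the semaphore is Little's law: requests in flight equal the request rate times the average latency. The sketch below assumes a steady rate; `min_concurrency` is a hypothetical helper and the latencies are made-up inputs, not measurements.

```python
import math

def min_concurrency(rate_per_s: float, avg_latency_s: float) -> int:
    """Smallest semaphore size that can sustain `rate_per_s` at the given latency."""
    return max(1, math.ceil(rate_per_s * avg_latency_s))

print(min_concurrency(8.0, 0.25))  # 2
print(min_concurrency(8.0, 1.5))   # 12
```

If the result is larger than the semaphore you configure, the semaphore rather than the bucket sets the effective rate, at roughly concurrency divided by latency.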

How do I tune exponential backoff so retries do not stampede?

Add jitter. Pure exponential backoff makes every throttled worker wake at the same instant, producing a synchronized retry storm — the thundering herd. wait_exponential_jitter spreads wake-ups across a random window, so recovering workers rejoin gradually. Cap the maximum wait (60s here) so a deep backoff never exceeds your pull interval.
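To see how jitter separates workers, the schedule can be reproduced with the standard library. This sketch assumes the wait is min(initial × 2^(attempt − 1) + U(0, jitter), max); that formula is an assumption about how wait_exponential_jitter behaves, so check the tenacity documentation for the version you install. `jittered_wait` is a hypothetical re-implementation, not tenacity's code.

```python
import random
from typing import Optional

def jittered_wait(attempt: int, initial: float = 1.0, cap: float = 60.0,
                  jitter: float = 2.0, rng: Optional[random.Random] = None) -> float:
    """Wait before retry `attempt` (1-based): exponential term plus uniform jitter, capped."""
    rng = rng or random.Random()
    return min(initial * 2 ** (attempt - 1) + rng.uniform(0.0, jitter), cap)

# Two workers throttled at the same instant, each with its own random stream.
rng_a, rng_b = random.Random(1), random.Random(2)
worker_a = [jittered_wait(n, rng=rng_a) for n in range(1, 5)]
worker_b = [jittered_wait(n, rng=rng_b) for n in range(1, 5)]
for n, (a, b) in enumerate(zip(worker_a, worker_b), start=1):
    print(f"retry {n}: worker A waits {a:.2f}s, worker B waits {b:.2f}s")
```

Under that assumption, with the settings used on this page, the four sleeps between five attempts fall in the ranges 1 to 3, 2 to 4, 4 to 6, and 8 to 10 seconds.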

What happens to metric windows that fail every retry?

They should be logged and excluded from the result map, never silently appended as partial data. A None payload after five attempts means the window is missing, and a downstream reconciliation job should re-pull it rather than let the aggregate under-report. Enforcing the record contract at this boundary is where schema validation for billing data catches the gap.
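The input to that reconciliation job is simply the set of requested databases that produced no record. A minimal sketch, assuming the result map is keyed by database ID as in Step 4; `missing_windows` is a hypothetical helper name.

```python
def missing_windows(requested: list[str], records: dict[str, dict]) -> list[str]:
    """Database IDs that were requested but produced no record, in request order."""
    return [db_id for db_id in requested if db_id not in records]

requested = ["db-prod-01", "db-prod-02", "db-prod-03"]
records = {
    "db-prod-01": {"window": "2026-07-05T14:00Z"},
    "db-prod-03": {"window": "2026-07-05T14:00Z"},
}
print(missing_windows(requested, records))  # ['db-prod-02']
```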

Does this pattern work for historical backfills as well as live pulls?

The concurrency and backoff logic is identical, but backfills add archival-retrieval quotas that are far stricter than live endpoints. Chunk the time range and lower the bucket rate accordingly, as covered under batch processing for historical metrics; for sub-minute live telemetry, reuse the same header parsing to throttle subscription channels in a real-time metric streaming setup.
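Chunking the time range for a backfill can be sketched with the standard library alone. `chunk_range` is a hypothetical helper, and the one-day step is an illustrative choice; pick a step that fits the archival quota you are working against.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

def chunk_range(start: datetime, end: datetime,
                step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [chunk_start, chunk_end) spans covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)  # the final chunk may be shorter than step
        yield cursor, nxt
        cursor = nxt

start = datetime(2026, 7, 1, tzinfo=timezone.utc)
end = datetime(2026, 7, 3, 12, tzinfo=timezone.utc)
chunks = list(chunk_range(start, end, timedelta(days=1)))
print(len(chunks))  # 3
```

Half-open spans share no instant between neighbours, so pulling each chunk once cannot count a boundary window twice.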

Related

Building async Python parsers for AWS Cost Explorer: the sibling extraction pattern that consumes the rate-limited records this fetcher produces.
Implementing retry logic for failed metric pulls: the error-classification and retry discipline that keeps this backoff loop honest.
Graceful degradation when billing APIs are down: what to do when throttling turns into a full outage.
Async Usage Parsing Workflows: the parent topic covering concurrent ingestion, normalization, and idempotent aggregation.
