Class LlmRateLimitHeaders

java.lang.Object
org.mockserver.llm.LlmRateLimitHeaders

public final class LlmRateLimitHeaders extends Object
Pure, deterministic helper that produces the provider-specific rate-limit HTTP headers that real LLM providers send. Client SDKs (e.g. the OpenAI Python SDK, Anthropic SDK) read these headers to drive retry/backoff logic, so emitting them faithfully lets that logic be exercised against MockServer.

The standard Retry-After header is intentionally not produced here. It is a generic HTTP header rather than a provider-specific one, and it is owned solely by HttpLlmResponseActionHandler.applyRateLimitHeaders(...), which emits it for every provider, including those with no provider-specific headers (Gemini, Bedrock, and Ollama). Keeping Retry-After in one place avoids sending a duplicate header on the wire.
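The division of responsibility described above can be sketched as follows. This is an illustrative stand-in, not the actual body of HttpLlmResponseActionHandler.applyRateLimitHeaders(...): the class name RetryAfterMergeSketch, the method withRetryAfter, and the choice to add Retry-After only when the response is rate-limited and resetSeconds is non-null are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RetryAfterMergeSketch {

    /**
     * Hypothetical handler step: copies the provider-specific headers and adds
     * Retry-After exactly once. Assumption: the header is added only for a
     * rate-limited response with a known reset interval.
     */
    public static Map<String, String> withRetryAfter(Map<String, String> providerHeaders,
                                                     Long resetSeconds,
                                                     boolean limited) {
        // Copy so the caller's map is not mutated; LinkedHashMap keeps insertion order.
        Map<String, String> merged = new LinkedHashMap<>(providerHeaders);
        if (limited && resetSeconds != null) {
            // Delay-seconds form of Retry-After: a non-negative integer.
            merged.put("Retry-After", Long.toString(Math.max(0L, resetSeconds)));
        }
        return merged;
    }

    public static void main(String[] args) {
        // A provider with provider-specific headers (value chosen for illustration).
        Map<String, String> openAi = new LinkedHashMap<>();
        openAi.put("x-ratelimit-remaining-requests", "0");
        System.out.println(withRetryAfter(openAi, 6L, true));

        // A provider with no provider-specific headers still gets Retry-After.
        Map<String, String> none = new LinkedHashMap<>();
        System.out.println(withRetryAfter(none, 6L, true));
    }
}
```

Because the provider-specific map never contains Retry-After, the merge step cannot produce the header twice.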

Provider header reference

  • OPENAI / OPENAI_RESPONSES / AZURE_OPENAI (source: OpenAI docs "Rate limits" page) — x-ratelimit-limit-requests, x-ratelimit-remaining-requests, x-ratelimit-reset-requests (duration, e.g. "6s").
  • ANTHROPIC (source: Anthropic docs "Rate limits" page) — anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-requests-reset (RFC 3339 timestamp).
  • GEMINI / BEDROCK — no provider-specific rate-limit headers; on a 429 only the standard Retry-After header (added by the handler) is exposed.
  • OLLAMA — none. Ollama is a local inference engine with no rate-limit concept.
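The reference list above amounts to a lookup from provider to header names, which might be sketched like this. The nested Provider enum is a local stand-in for MockServer's own Provider type (only the constant names are taken from the list), and ProviderHeaderNamesSketch and headerNames are hypothetical names introduced for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ProviderHeaderNamesSketch {

    /** Local stand-in for MockServer's Provider type; constants mirror the list above. */
    public enum Provider { OPENAI, OPENAI_RESPONSES, AZURE_OPENAI, ANTHROPIC, GEMINI, BEDROCK, OLLAMA }

    /** Returns the provider-specific rate-limit header names, or an empty list if there are none. */
    public static List<String> headerNames(Provider provider) {
        switch (provider) {
            case OPENAI:
            case OPENAI_RESPONSES:
            case AZURE_OPENAI:
                return Arrays.asList(
                        "x-ratelimit-limit-requests",
                        "x-ratelimit-remaining-requests",
                        "x-ratelimit-reset-requests");
            case ANTHROPIC:
                return Arrays.asList(
                        "anthropic-ratelimit-requests-limit",
                        "anthropic-ratelimit-requests-remaining",
                        "anthropic-ratelimit-requests-reset");
            default:
                // GEMINI, BEDROCK, OLLAMA: no provider-specific rate-limit headers.
                return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        for (Provider p : Provider.values()) {
            System.out.println(p + " -> " + headerNames(p));
        }
    }
}
```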

All methods are static, deterministic, and pure: they read no clocks and use no randomness. The caller supplies resetSeconds and, for RFC 3339 timestamps, the current epoch second.
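As a rough illustration of why passing the clock in keeps the output reproducible, the two reset formats could be derived as below. ResetValueSketch and both method names are hypothetical, and the exact strings the real class emits are not documented here; the "Ns" duration form and the use of java.time.Instant for the timestamp are assumptions of this sketch.

```java
import java.time.Instant;

public class ResetValueSketch {

    /** OpenAI-style reset value as a duration in seconds, e.g. "6s" (assumed format). */
    public static String openAiReset(long resetSeconds) {
        return resetSeconds + "s";
    }

    /**
     * Anthropic-style reset value: the instant at which the window resets.
     * Instant.toString() yields an ISO-8601 UTC timestamp, which is also valid RFC 3339.
     */
    public static String anthropicReset(long nowEpochSecond, long resetSeconds) {
        return Instant.ofEpochSecond(nowEpochSecond + resetSeconds).toString();
    }

    public static void main(String[] args) {
        long now = 1704067200L; // 2024-01-01T00:00:00Z, supplied by the caller
        System.out.println(openAiReset(6L));
        System.out.println(anthropicReset(now, 6L));
    }
}
```

Since neither method calls Instant.now(), the same arguments always produce the same header values, which makes assertions in tests straightforward.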

  • Method Details

    • headersFor

      public static Map<String,String> headersFor(Provider provider, Integer requestLimit, Integer requestRemaining, Long resetSeconds, long nowEpochSecond, boolean limited)
      Produces the provider-specific rate-limit headers (excluding Retry-After, which the caller emits).
      Parameters:
      provider - the LLM provider
      requestLimit - quota limit (requests per window); may be null
      requestRemaining - requests remaining in the window; may be null
      resetSeconds - seconds until the window resets; may be null
      nowEpochSecond - current epoch second, used to compute the Anthropic RFC 3339 reset timestamp
      limited - true when this is a rate-limit error (429); false for a successful response with quota info
      Returns:
      an insertion-ordered map of header-name to header-value; empty if the provider has no provider-specific rate-limit headers (Gemini, Bedrock, Ollama)