Class LlmQuotaRegistry

java.lang.Object
org.mockserver.llm.LlmQuotaRegistry

public class LlmQuotaRegistry extends Object
Process-wide, stateful request quota for LLM responses, implemented as a fixed-window rate limiter. Unlike the probabilistic 429 in LlmChaosProfile, this quota is deterministic: it counts how many requests have hit a named quota within the current time window and reports when the limit is exceeded, so a test can drive an agent into a hard rate limit (e.g. "the 4th call in 60s gets 429").

Quotas are keyed by name, so several expectations that share a quotaName share one counter (which models an upstream account limit), while distinct names are independent. State is held in a ConcurrentHashMap and each acquire is an atomic per-key update, so the registry is safe under concurrent requests.

The time source is injectable so window behaviour is unit-testable without sleeping; production uses System.currentTimeMillis().
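As a rough illustration of why an injectable clock helps, the sketch below drives a window past its expiry by advancing a fake clock instead of sleeping. FakeClockSketch and its nested Window class are hypothetical stand-ins written for this example, not part of MockServer, and treating a window as expired once exactly windowMillis have elapsed is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class FakeClockSketch {

    /** Hypothetical single-quota window; stands in for the registry so the sketch is self-contained. */
    public static final class Window {
        private final LongSupplier clock;
        private long windowStart = -1; // -1 means no window has started yet
        private long count;

        public Window(LongSupplier clock) {
            this.clock = clock;
        }

        public synchronized boolean tryAcquire(long limit, long windowMillis) {
            long now = clock.getAsLong();
            // Assumption: the window counts as expired once windowMillis have fully elapsed.
            if (windowStart < 0 || now - windowStart >= windowMillis) {
                windowStart = now;
                count = 0;
            }
            count++;
            return count <= limit;
        }
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong(0);   // fake clock, advanced by hand
        Window window = new Window(now::get); // production code would pass System::currentTimeMillis

        System.out.println(window.tryAcquire(1, 60_000)); // first call in the window
        System.out.println(window.tryAcquire(1, 60_000)); // second call, over the limit of 1
        now.addAndGet(60_001);                            // jump past the window without sleeping
        System.out.println(window.tryAcquire(1, 60_000)); // first call of a fresh window
    }
}
```

Because the clock is just a LongSupplier, the test controls time completely, so a 60-second window can be exercised in microseconds.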

  • Constructor Details

    • LlmQuotaRegistry

      public LlmQuotaRegistry(LongSupplier clock)
  • Method Details

    • getInstance

      public static LlmQuotaRegistry getInstance()
    • tryAcquire

      public boolean tryAcquire(String name, int limit, long windowMillis)
      Record one request against the named quota and report whether it is allowed.

      Fixed-window semantics: the first request in a window starts it; the window expires windowMillis after it started, after which the next request starts a fresh window. A request is allowed when the in-window count (including itself) is at or below limit.

      Returns:
      true if the request is within the quota, false if it exceeds the limit for the current window.
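The fixed-window semantics described above can be sketched roughly as follows. FixedWindowSketch is a hypothetical class written for this example, not the MockServer source; it only assumes what the description states (a ConcurrentHashMap with an atomic per-key update), and the exact expiry boundary is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

public class FixedWindowSketch {

    /** Immutable snapshot of one named window (hypothetical helper type). */
    private static final class State {
        final long windowStart;
        final long count;

        State(long windowStart, long count) {
            this.windowStart = windowStart;
            this.count = count;
        }
    }

    private final ConcurrentHashMap<String, State> windows = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    public FixedWindowSketch(LongSupplier clock) {
        this.clock = clock;
    }

    public boolean tryAcquire(String name, int limit, long windowMillis) {
        long now = clock.getAsLong();
        // compute() runs atomically per key, so concurrent callers cannot lose an increment.
        State updated = windows.compute(name, (key, current) -> {
            if (current == null || now - current.windowStart >= windowMillis) {
                return new State(now, 1); // first request starts a fresh window
            }
            return new State(current.windowStart, current.count + 1);
        });
        return updated.count <= limit; // the count includes this request
    }

    public static void main(String[] args) {
        // Frozen clock: every call lands in the same window.
        FixedWindowSketch quotas = new FixedWindowSketch(() -> 0L);
        for (int call = 1; call <= 4; call++) {
            System.out.println("account-a call " + call + ": "
                + quotas.tryAcquire("account-a", 3, 60_000));
        }
        // A distinct name has its own counter and is unaffected by account-a.
        System.out.println("account-b call 1: " + quotas.tryAcquire("account-b", 3, 60_000));
    }
}
```

With a limit of 3, the first three calls on "account-a" are allowed and the fourth is refused, which matches the "4th call in 60s gets 429" scenario, while "account-b" still starts from zero.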
    • tryAcquire

      public boolean tryAcquire(String name, long limit, long windowMillis, long amount)
      Record amount units (e.g. tokens) against the named quota and report whether the cumulative total is within the limit.

      Uses the same fixed-window semantics as tryAcquire(String, int, long), but the counter increments by amount instead of 1, and the limit is a long to support large token-based quotas such as tokens per minute (TPM) or tokens per day (TPD).

      Parameters:
      name - shared counter key
      limit - maximum allowed units per window (must be >= 0)
      windowMillis - window length in milliseconds (must be > 0)
      amount - units to consume (must be >= 0)
      Returns:
      true if the cumulative in-window total (including this call) is at or below limit, false otherwise.
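A small worked example may make the amount-based accounting concrete. TokenQuotaSketch is a hypothetical single-counter class written for this example, and the current time is passed in explicitly to keep it short. Two behaviours here are assumptions rather than documented facts: rejected calls still add to the running total (suggested by the "record amount units" wording), and invalid arguments raise IllegalArgumentException.

```java
public class TokenQuotaSketch {
    private long windowStart = -1; // -1 means no window has started yet
    private long total;

    public synchronized boolean tryAcquire(long now, long limit, long windowMillis, long amount) {
        // Assumption: the documented preconditions are enforced by throwing.
        if (limit < 0 || windowMillis <= 0 || amount < 0) {
            throw new IllegalArgumentException("limit >= 0, windowMillis > 0 and amount >= 0 required");
        }
        if (windowStart < 0 || now - windowStart >= windowMillis) {
            windowStart = now; // this call starts a fresh window
            total = 0;
        }
        total += amount;       // assumption: recorded even when the call ends up over the limit
        return total <= limit; // the total includes this call
    }

    public static void main(String[] args) {
        TokenQuotaSketch quota = new TokenQuotaSketch();
        long limit = 1_000;    // e.g. 1000 tokens per minute
        long window = 60_000;

        System.out.println(quota.tryAcquire(0, limit, window, 400));      // total 400
        System.out.println(quota.tryAcquire(1_000, limit, window, 500));  // total 900
        System.out.println(quota.tryAcquire(2_000, limit, window, 200));  // total 1100, over the limit
        System.out.println(quota.tryAcquire(61_000, limit, window, 200)); // new window, total 200
    }
}
```

Under these assumptions, the call that pushes the total from 900 to 1100 is the first one reported as over the limit, and the counter starts again from zero once the window has expired.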
    • reset

      public void reset()
      Clear all quota state. Called on server reset and for test isolation.