Class ChaosAutoHaltMonitor

java.lang.Object
org.mockserver.mock.action.http.ChaosAutoHaltMonitor

public class ChaosAutoHaltMonitor extends Object
Safety circuit-breaker for service-scoped chaos. When the number of error-class chaos faults (synthetic 5xx errors, dropped connections, and quota-limit responses) within a configurable sliding window exceeds a threshold, all active service-scoped chaos profiles are automatically halted (disabled) via ServiceChaosRegistry.reset().

Only destructive fault types contribute to the window: "error" (synthetic 5xx), "drop" (connection kill), and "quota" (429/503). Benign fault types such as "latency", "slow", "truncate", "malformed", and "graphql" do not count, so a latency-only experiment will never auto-halt. This is by design: the circuit-breaker exists to stop experiments that generate errors.

This prevents a chaos experiment from driving a cascading outage — the "steady-state guardrail" SREs expect.

The monitor is evaluated on every chaos-fault injection: recordError(String) is called from Metrics.incrementHttpChaosInjected(String). It does not block the event loop, because the sliding window is maintained in a lock-free ConcurrentLinkedDeque of timestamps.
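The sliding-window idea described above can be illustrated with a minimal sketch. This is not the MockServer implementation: the class name SlidingWindowSketch, the explicit nowMillis parameter, the strict "greater than threshold" comparison, and clearing the window after a trip are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for illustration only; not the actual monitor.
class SlidingWindowSketch {
    private final ConcurrentLinkedDeque<Long> timestamps = new ConcurrentLinkedDeque<>();
    private final AtomicLong haltCount = new AtomicLong();
    private final int threshold;
    private final long windowMillis;

    SlidingWindowSketch(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    /** Records one error at nowMillis; returns true if this call tripped the breaker. */
    boolean record(long nowMillis) {
        timestamps.addLast(nowMillis);

        // Evict timestamps that have aged out of the window. Removing by value
        // means a concurrent caller can only ever remove an expired entry.
        long cutoff = nowMillis - windowMillis;
        Long oldest;
        while ((oldest = timestamps.peekFirst()) != null && oldest < cutoff) {
            timestamps.removeFirstOccurrence(oldest);
        }

        // Assumption: "exceeds a threshold" is read as strictly greater than.
        if (timestamps.size() > threshold) {
            timestamps.clear();            // assumption: start a fresh window after a trip
            haltCount.incrementAndGet();
            return true;                   // a real monitor would halt chaos profiles here
        }
        return false;
    }

    int currentWindowSize() {
        return timestamps.size();
    }

    long getHaltCount() {
        return haltCount.get();
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic for tests; a production monitor would more likely read the current time itself. Note that ConcurrentLinkedDeque.size() traverses the deque, which is acceptable for a window bounded by a small threshold.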

Configuration (all read dynamically from ConfigurationProperties):

  • chaosAutoHaltEnabled - master switch (default false, which leaves the monitor inert)
  • chaosAutoHaltErrorThreshold - error count that triggers a halt (default 50)
  • chaosAutoHaltWindowMillis - sliding window length in milliseconds (default 60000)

The singleton instance is shared process-wide, consistent with ServiceChaosRegistry's singleton pattern.

  • Method Details

    • getInstance

      public static ChaosAutoHaltMonitor getInstance()
      Returns the process-wide singleton instance.
    • recordError

      public void recordError(String faultType)
      Record a chaos-injected fault and evaluate the circuit-breaker. Called after each chaos fault injection (from Metrics.incrementHttpChaosInjected).

      Only destructive fault types ("error", "drop", "quota") contribute to the sliding window. Benign faults ("latency", "slow", "truncate", "malformed", "graphql") are ignored — a latency-only experiment will never auto-halt.

      When the feature is disabled (chaosAutoHaltEnabled is false), this method is a no-op — no timestamps are recorded, no evaluation occurs.

      Parameters:
      faultType - the fault type string (e.g. "error", "drop", "latency")
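      The filtering rules above can be sketched as a small predicate. This is an illustration under stated assumptions, not the library's code: the class FaultFilterSketch, the method countsTowardWindow, and the explicit enabled flag (standing in for the chaosAutoHaltEnabled lookup) are hypothetical names.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of MockServer.
class FaultFilterSketch {
    // The three destructive fault types named in the documentation.
    private static final Set<String> DESTRUCTIVE = Set.of("error", "drop", "quota");

    /**
     * Returns true only when the feature is enabled and the fault type is destructive.
     * The null check is needed because Set.of(...).contains(null) throws.
     */
    static boolean countsTowardWindow(boolean enabled, String faultType) {
        return enabled && faultType != null && DESTRUCTIVE.contains(faultType);
    }
}
```

      With this predicate, countsTowardWindow(true, "latency") is false, so a latency-only experiment never adds a timestamp to the window, and any call with enabled set to false is a no-op.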
    • getHaltCount

      public long getHaltCount()
      Returns the total number of times the auto-halt circuit-breaker has triggered since the process started (or since the last reset()).
    • currentWindowSize

      public int currentWindowSize()
      Returns the number of error timestamps currently in the sliding window.
    • reset

      public void reset()
      Reset the monitor state, including the halt count reported by getHaltCount(). Called on server reset and for test isolation.