Class ChaosExperimentOrchestrator
C1 auto-halt integration: If ChaosAutoHaltMonitor halts
chaos mid-experiment (by calling ServiceChaosRegistry.reset()), the
orchestrator detects the empty registry at the next stage advance and stops
the experiment. An experiment stopped by auto-halt is reported with status
"halted_by_auto_halt".
Safety limits:
- Maximum 50 stages per experiment
- Maximum stage duration: 86400000L ms (24 hours)
- Only one experiment may be active at a time
- Stopping an experiment is idempotent
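The two numeric limits above could be enforced by a check along the following lines. This is a minimal sketch, not the orchestrator's actual validation code: the class ExperimentLimits, the method validate, and the use of a plain list of stage durations are all hypothetical stand-ins. It follows the start convention of returning an error message for an invalid definition and null on success.

```java
import java.util.List;

// Hypothetical helper; the names here are illustrative and not part of the documented API.
final class ExperimentLimits {
    static final int MAX_STAGES = 50;
    static final long MAX_STAGE_DURATION_MS = 86_400_000L; // 24 hours

    /**
     * Returns an error message if the stage durations break a documented limit,
     * or null if they pass. Only the two documented upper bounds are checked;
     * any lower bounds the real validator applies are not covered here.
     */
    static String validate(List<Long> stageDurationsMs) {
        if (stageDurationsMs.size() > MAX_STAGES) {
            return "too many stages: " + stageDurationsMs.size()
                    + " (max " + MAX_STAGES + ")";
        }
        for (int i = 0; i < stageDurationsMs.size(); i++) {
            long durationMs = stageDurationsMs.get(i);
            if (durationMs > MAX_STAGE_DURATION_MS) {
                return "stage " + i + " duration " + durationMs
                        + " ms exceeds max " + MAX_STAGE_DURATION_MS + " ms";
            }
        }
        return null; // null signals success
    }
}
```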
The orchestrator uses a single-thread ScheduledExecutorService
for non-blocking stage advancement. It never blocks the Netty event loop.
Time is measured via a pluggable LongSupplier clock (defaults to
TimeService.currentTimeMillis()) so tests can drive advancement
deterministically without wall-clock sleeps.
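To illustrate why a pluggable clock makes tests deterministic, here is a small sketch of clock-driven stage advancement. StageTimer and its methods are hypothetical and far simpler than the real orchestrator; only the use of java.util.function.LongSupplier as the time source comes from the description above.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration of clock injection; not the orchestrator's real class.
final class StageTimer {
    private final LongSupplier clock;
    private final long stageDurationMs;
    private long stageStartedAtMs;
    private int stageIndex;

    StageTimer(LongSupplier clock, long stageDurationMs) {
        this.clock = clock;
        this.stageDurationMs = stageDurationMs;
        this.stageStartedAtMs = clock.getAsLong();
    }

    /** Moves to the next stage once the current one has run its full duration. */
    boolean tick() {
        long now = clock.getAsLong();
        if (now - stageStartedAtMs >= stageDurationMs) {
            stageIndex++;
            stageStartedAtMs = now;
            return true;
        }
        return false;
    }

    int stageIndex() {
        return stageIndex;
    }
}
```

A test can pass a supplier backed by a mutable value, for example `long[] now = {0L}; new StageTimer(() -> now[0], 1000L)`, and set `now[0]` directly instead of sleeping. Production code would pass the real time source; `System::currentTimeMillis` works as a stand-in in this sketch.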
Shared-registry exclusivity: A running experiment takes exclusive ownership
of ServiceChaosRegistry. Manual service-chaos registrations are overwritten
at the next stage advance (which calls registry.reset() then re-applies the
stage profiles). A manual reset() of the registry is detected as an auto-halt
condition at the next stage boundary (see the entries().isEmpty() check in
advanceStage(RunningExperiment)). Users should stop the experiment before
making manual service-chaos changes.
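The stage-boundary behavior described above can be sketched roughly as follows. Everything in this block is an assumption made for illustration: Registry is a tiny stand-in for ServiceChaosRegistry (a host-to-profile map), advance() stands in for advanceStage(RunningExperiment), and looping is left out. The sketch also assumes every stage applies at least one profile, so an empty registry can only mean that something outside the experiment called reset().

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the stage-advance logic; not the real implementation.
final class StageAdvanceSketch {
    /** Minimal stand-in for ServiceChaosRegistry: host -> profile name. */
    static final class Registry {
        private final Map<String, String> entries = new LinkedHashMap<>();
        void register(String host, String profile) { entries.put(host, profile); }
        void reset() { entries.clear(); }
        Map<String, String> entries() { return entries; }
    }

    private final Registry registry;
    private final List<Map<String, String>> stages;
    private int nextStage = 0;
    private String status = "running"; // "running" is an assumed label

    StageAdvanceSketch(Registry registry, List<Map<String, String>> stages) {
        this.registry = registry;
        this.stages = stages;
        advance(); // apply the first stage immediately
    }

    /** Returns true if the experiment is still running after this advance. */
    boolean advance() {
        if (!status.equals("running")) {
            return false;
        }
        // After the first stage, an empty registry means someone reset it.
        if (nextStage > 0 && registry.entries().isEmpty()) {
            status = "halted_by_auto_halt";
            return false;
        }
        registry.reset(); // discards manual registrations as well
        if (nextStage >= stages.size()) {
            status = "completed";
            return false;
        }
        stages.get(nextStage).forEach(registry::register);
        nextStage++;
        return true;
    }

    String status() {
        return status;
    }
}
```

In this model a manual registration made mid-stage disappears at the next advance, and a manual reset() is indistinguishable from an auto-halt, which is why the experiment should be stopped before the registry is changed by hand.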
The singleton instance is shared process-wide, consistent with
ServiceChaosRegistry's singleton pattern.
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
- static class: An experiment definition: name, ordered stages, and whether to loop.
- static class: Snapshot of the current experiment status.
- static class: A single stage: profiles to apply to specific hosts for a duration.
Method Summary
Modifier and Type, Method, Description
- static ChaosExperimentOrchestrator getInstance
- getStatus(): Returns the current experiment status.
- void reset(): Resets the orchestrator: stops any running experiment, clears chaos, and clears the terminal status so getStatus() returns null.
- start: Starts an experiment.
- void stop(): Stops the current experiment and clears all chaos from the registry.
Method Details
getInstance
start
Starts an experiment. Returns a validation error message if the definition is invalid, or null on success. Only one experiment may be active at a time; starting a new one while one is running stops the previous one.
stop
public void stop()
Stops the current experiment and clears all chaos from the registry. Idempotent: no-op if no experiment is running.
reset
public void reset()
Resets the orchestrator: stops any running experiment, clears chaos, and clears the terminal status so getStatus() returns null. Called on server reset.
getStatus
Returns the current experiment status. If no experiment is currently running but one recently terminated, returns a snapshot carrying its terminal status (halted_by_auto_halt, completed, or stopped). Returns null only when no experiment has ever run (or after reset).
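The status lifecycle described for start, stop, reset, and getStatus can be modeled with a small state holder. This is only a sketch under assumptions: StatusLifecycleSketch is hypothetical, the status is reduced to a plain string rather than the real snapshot class, and the "running" label is an assumed value that the documentation does not name.

```java
// Hypothetical model of the documented status transitions; not the real orchestrator.
final class StatusLifecycleSketch {
    private boolean running;
    private String terminalStatus; // null until an experiment terminates

    void start() {
        if (running) {
            stop(); // starting a new experiment stops the previous one
        }
        running = true;
        terminalStatus = null;
    }

    void stop() {
        if (!running) {
            return; // idempotent: no-op when nothing is running
        }
        running = false;
        terminalStatus = "stopped";
    }

    void reset() {
        running = false;
        terminalStatus = null; // getStatus() returns null again
    }

    String getStatus() {
        // "running" is an assumed label for the active state.
        return running ? "running" : terminalStatus;
    }
}
```

The terminal status survives repeated stop() calls and is cleared only by reset() or by the next start().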