Observability | MockServer

MockServer provides two observability channels: Prometheus metrics for counters, gauges, and histograms; and OpenTelemetry (OTLP) for trace and metric export. Both are opt-in and have zero overhead when disabled.

Prometheus metrics
LLM token and cost metrics
OpenTelemetry (OTLP) export
GenAI spans
W3C trace context propagation
Configuration reference

Prometheus Metrics

Enable Prometheus metrics by setting metricsEnabled to true. MockServer then exposes a scrape endpoint at /mockserver/metrics in Prometheus text exposition format. When metrics are disabled, this endpoint returns 404.

# Start MockServer with metrics enabled
docker run --rm -p 1080:1080 \
  -e MOCKSERVER_METRICS_ENABLED=true \
  mockserver/mockserver:7.4.0

# Scrape metrics
curl http://localhost:1080/mockserver/metrics
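The scrape response is plain text, so it is easy to inspect in a test harness. The following is a minimal sketch of a parser for the Prometheus text exposition format. The sample payload is illustrative only (real output from /mockserver/metrics will contain many more series), and the helper name parse_exposition is hypothetical.

```python
import re

# Illustrative payload only; real output from /mockserver/metrics will differ.
SAMPLE = """\
# HELP requests_received_count Total requests received
# TYPE requests_received_count gauge
requests_received_count 42.0
# TYPE mock_server_http_chaos_injected_total counter
mock_server_http_chaos_injected_total{fault_type="latency"} 3.0
mock_server_http_chaos_injected_total{fault_type="drop"} 1.0
"""

# metric name, optional {label="value",...} block, then the sample value
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # tolerate lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_exposition(SAMPLE)
```

For production use, an existing Prometheus client library parser is likely a better choice than a hand-rolled regular expression.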

Available metrics

Naming convention: the core request-tracking gauges are exposed with unprefixed names (e.g. requests_received_count, expectations_not_matched_count). Counter and histogram metrics, and the operational gauges (mock_server_active_service_chaos, mock_server_expectations_by_type, mock_server_build_info), all use a mock_server_ prefix (e.g. mock_server_request_duration_seconds). Counter names also gain a _total suffix in the exposition output, so mock_server_http_chaos_injected appears as mock_server_http_chaos_injected_total on the /mockserver/metrics endpoint, in PromQL queries, and in Grafana.

Request and expectation matching

Metric	Type	Description
requests_received_count	Gauge	Total requests received
expectations_not_matched_count	Gauge	Requests that did not match any expectation
response_expectations_matched_count	Gauge	Requests matched to a response expectation
forward_expectations_matched_count	Gauge	Requests matched to a forward expectation

The four gauges above, together with llm_chaos_injected_count, only ever increase, so they behave like counters even though they are typed as gauges. Their names are read by the dashboard UI and by existing Grafana dashboards, so they are kept unchanged. Alongside each one, MockServer also publishes a proper Prometheus Counter with a _total suffix. Use the _total counters in PromQL rate() and increase() queries, because those functions are designed for counter semantics:

Legacy gauge (unchanged)	Counter for rate()/increase()
requests_received_count	mock_server_requests_received_total
expectations_not_matched_count	mock_server_expectations_not_matched_total
response_expectations_matched_count	mock_server_response_expectations_matched_total
forward_expectations_matched_count	mock_server_forward_expectations_matched_total
llm_chaos_injected_count	mock_server_llm_chaos_injected_total

# requests-per-second over the last 5 minutes
rate(mock_server_requests_received_total[5m])
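To see why counter semantics matter, the sketch below approximates what increase() and rate() compute from raw scrapes, including the handling of a counter reset. It is a simplification: Prometheus also extrapolates to the edges of the query window, which is omitted here. The scrape values are hypothetical.

```python
def increase(samples):
    """Sum the growth between consecutive (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example after a
    restart), so the post-reset value counts as new growth from zero.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


def rate(samples):
    """Average per-second increase across the sampled window."""
    if len(samples) < 2:
        return 0.0
    window = samples[-1][0] - samples[0][0]
    return increase(samples) / window if window > 0 else 0.0


# Hypothetical scrapes of mock_server_requests_received_total at 15 s
# intervals; the value drops between the third and fourth scrape.
scrapes = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 20.0), (60, 50.0)]
total_increase = increase(scrapes)
per_second = rate(scrapes)
```

With these samples the increase is 110 requests over 60 seconds, even though the last raw value (50) is lower than the first (100).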

Per-expectation match counter (opt-in)

When perExpectationMetricsEnabled is true (alongside metricsEnabled), MockServer registers an additional counter:

Metric	Type	Labels	Description
mock_server_expectation_matched	Counter	expectation_id	Total matches (and served responses) for each expectation, labelled by the stable expectation id. Appears in scrape output as mock_server_expectation_matched_total{expectation_id="..."}.

This counter is off by default: each active expectation adds one Prometheus label value, so cardinality grows with the number of registered expectations. Enable it only when you need per-expectation visibility. See Per-Expectation Match Counters configuration.

Action execution (one per action type)

Metric	Description
response_actions_count	Response actions executed
forward_actions_count	Forward actions executed
sse_response_actions_count	SSE response actions executed
llm_response_actions_count	LLM response actions executed
error_actions_count	Error actions executed
grpc_stream_response_actions_count	gRPC stream response actions executed

Additional action counters exist for template, callback, and other action types. Scrape the /mockserver/metrics endpoint to see the full list.

Request latency histogram

mock_server_request_duration_seconds is a Prometheus histogram of request handling duration (receipt to response), with buckets from 0.5 ms to 10 s. Use it to derive latency percentiles:

histogram_quantile(0.95, sum by (le) (rate(mock_server_request_duration_seconds_bucket[1m])))
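The query above estimates a percentile by interpolating linearly inside the bucket that contains the target rank. The sketch below shows that calculation on cumulative bucket counts. The bucket boundaries and counts are hypothetical and are not MockServer's actual bucket layout.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one): fall back
                # to the highest bound seen so far.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts: 50 requests took <= 10 ms, 90 took <= 50 ms,
# and all 100 took <= 100 ms.
example_buckets = [(0.01, 50), (0.05, 90), (0.1, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
```

Here the 95th-ranked request falls halfway through the 50 ms to 100 ms bucket, so the estimate is roughly 75 ms. The result is only as precise as the bucket boundaries allow.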

Build info

mock_server_build_info is a gauge with labels version, major_minor_version, group_id, artifact_id, and git_hash.

JVM runtime

When metrics are enabled, MockServer also exposes JVM health gauges:

Metric	Labels	Description
jvm_memory_used_bytes	area = heap / nonheap	Memory currently used
jvm_memory_committed_bytes	area	Memory committed by the JVM
jvm_memory_max_bytes	area	Max memory (-1 if undefined)
jvm_threads_current	—	Live thread count
jvm_threads_daemon	—	Daemon thread count
jvm_gc_collection_count	—	Total GC collections
jvm_gc_collection_seconds_sum	—	Total GC time in seconds

Chaos metrics

When chaos testing is active, additional metrics track fault injection:

mock_server_http_chaos_injected_total — counter with a fault_type label (drop, error, latency, truncate, malformed, slow, quota, rateLimit, graphql)
mock_server_active_service_chaos — gauge per fault_type of currently active chaos profiles
mock_server_chaos_auto_halt_total — counter that increments each time the chaos auto-halt circuit-breaker triggers

Cluster metrics

MockServer also exposes the size of the cluster it belongs to:

mock_server_cluster_members — gauge of the number of members in the MockServer cluster, read live at scrape time. It reads 1 for a single-node deployment (the default in-memory backend, or Infinispan in LOCAL mode) and the real fleet size when clustering is enabled. The same membership detail is available as JSON from the GET /mockserver/cluster control-plane endpoint.

LLM Token and Cost Metrics

When both metricsEnabled and llmMetricsEnabled are true, three additional Prometheus counters track LLM usage:

Metric	Labels	Description
mock_server_llm_input_tokens_total	provider, model	Cumulative input tokens
mock_server_llm_output_tokens_total	provider, model	Cumulative output tokens
mock_server_llm_cost_usd_total	provider, model	Cumulative estimated cost in USD

These counters are incremented on both the mock path (when MockServer serves an httpLlmResponse) and the forward/proxy path (when MockServer forwards requests to a real LLM provider). Cost estimation uses an internal pricing table and is approximate.
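As a rough illustration of how a token-based cost estimate of this kind can be derived, the sketch below multiplies token counts by per-token prices. Everything in it is an assumption: the pricing entries, the per-million-token unit, the provider and model names, and the function name are hypothetical, and MockServer's internal pricing table is not reproduced here.

```python
# Hypothetical prices in USD per one million tokens. MockServer's internal
# pricing table is not shown in this document and its values may differ.
PRICING = {
    ("example-provider", "example-model"): {"input": 2.00, "output": 8.00},
}


def estimate_cost_usd(provider, model, input_tokens, output_tokens):
    """Return an approximate cost, or 0.0 when the model has no known price."""
    price = PRICING.get((provider, model))
    if price is None:
        return 0.0
    weighted = input_tokens * price["input"] + output_tokens * price["output"]
    return weighted / 1_000_000


cost = estimate_cost_usd("example-provider", "example-model", 1500, 500)
```

Because any such table lags real provider pricing, treat the resulting counter as a trend indicator rather than a billing figure.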

The cost-budget circuit-breaker (mock_server_llm_cost_budget_tripped_total counter) is documented in LLM Response Mocking → Cost Budget.

# Example: total LLM cost rate per hour
sum(rate(mock_server_llm_cost_usd_total[1h]))

Three further gauges expose the headline figures of the latest LLM optimisation report. They carry no labels (each is a single global gauge) and reflect the most recently built report. They read 0 until a report has been built, and again after a server reset, so build the report periodically (via the dashboard, the REST endpoint, or the export_optimisation_report MCP tool) to keep them fresh.

Metric	Description
mock_server_llm_estimated_waste_usd	Estimated recoverable LLM spend (USD) from the latest optimisation report
mock_server_llm_cache_hit_ratio	Cache-hit ratio (0–1) from the latest optimisation report
mock_server_llm_one_shot_rate	One-shot rate (0–1, fraction of non-retry calls) from the latest optimisation report

OpenTelemetry (OTLP) Export

MockServer can push metrics and traces to an OpenTelemetry Collector (or any OTLP-compatible backend) via OTLP HTTP/protobuf. Set the collector endpoint and enable the signals you want:

docker run --rm -p 1080:1080 \
  -e MOCKSERVER_OTEL_ENDPOINT=http://otel-collector:4318 \
  -e MOCKSERVER_OTEL_METRICS_ENABLED=true \
  -e MOCKSERVER_OTEL_TRACES_ENABLED=true \
  -e MOCKSERVER_METRICS_ENABLED=true \
  mockserver/mockserver:7.4.0

otelEndpoint is the base URL of the OTLP HTTP collector. MockServer appends /v1/metrics and /v1/traces automatically. If MOCKSERVER_OTEL_ENDPOINT is not set, MockServer falls back to the standard OpenTelemetry OTEL_EXPORTER_OTLP_ENDPOINT environment variable, so existing OTel deployments work without extra configuration. See otelEndpoint in the configuration reference for details.
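The endpoint resolution described above can be sketched as follows. This covers only the two environment variables named in the text. The function name is hypothetical, and stripping a trailing slash before appending the signal path is an assumption rather than documented behaviour.

```python
def resolve_otlp_urls(env):
    """Return the metrics and traces URLs, or None when no endpoint is set."""
    # MOCKSERVER_OTEL_ENDPOINT takes precedence; the standard OpenTelemetry
    # variable is used only as a fallback.
    base = env.get("MOCKSERVER_OTEL_ENDPOINT") or env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not base:
        return None
    base = base.rstrip("/")  # assumption: avoid a double slash before /v1/...
    return {"metrics": base + "/v1/metrics", "traces": base + "/v1/traces"}


urls = resolve_otlp_urls({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318"})
```

In a real deployment you would pass os.environ instead of a literal dictionary.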

Metrics export interval: otelMetricsExportIntervalSeconds controls how often metrics are pushed (default 60 seconds, minimum 1 second).

GenAI Spans

When otelTracesEnabled is true, MockServer emits OpenTelemetry GenAI semantic-convention spans for LLM completions. Each span includes:

gen_ai.system — the provider (e.g. openai, anthropic)
gen_ai.request.model — the model identifier
Token usage attributes (input and output tokens)
Finish reason

GenAI spans fire on two paths:

Mock path — when MockServer serves an httpLlmResponse
Forward/proxy path — when MockServer forwards requests to a real LLM provider. The provider is detected from the target host (e.g. api.openai.com maps to OpenAI, api.anthropic.com maps to Anthropic).
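Host-based provider detection on the forward path can be pictured as a simple lookup. The sketch below includes only the two mappings named above. The usage and finish-reason attribute keys follow the OpenTelemetry GenAI semantic conventions, but whether MockServer emits exactly those keys is an assumption, and the function names and model name are hypothetical.

```python
from urllib.parse import urlsplit

# Only the two mappings named in the text; any others would be guesses.
HOST_TO_PROVIDER = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
}


def detect_provider(url):
    """Return the gen_ai.system value for a forward target, or None if unknown."""
    return HOST_TO_PROVIDER.get(urlsplit(url).hostname)


def genai_span_attributes(provider, model, input_tokens, output_tokens, finish_reason):
    """Build the attribute map for one GenAI span."""
    # The usage and finish-reason keys are taken from the OpenTelemetry GenAI
    # semantic conventions; MockServer's exact keys are an assumption.
    return {
        "gen_ai.system": provider,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": [finish_reason],
    }


provider = detect_provider("https://api.openai.com/v1/chat/completions")
attributes = genai_span_attributes(provider, "example-model", 120, 45, "stop")
```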

W3C Trace Context Propagation

MockServer can extract and propagate W3C traceparent and tracestate headers across requests and responses. This enables distributed tracing correlation when MockServer sits in a service mesh or test harness.

otelPropagateTraceContext (default false) — when enabled, MockServer copies the incoming trace context headers to the response, so downstream tracing tooling can correlate the mock response with the original request trace.
otelGenerateTraceId (default false) — when enabled, MockServer generates a new random W3C trace ID for requests that arrive without a traceparent header.
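For reference, a W3C traceparent header has four hyphen-separated hex fields: version, trace ID (32 characters), parent ID (16 characters), and flags. The sketch below validates an incoming header and builds a fresh one. It is not MockServer's implementation: the function names are hypothetical, only the four-field version 00 layout is accepted, and setting the sampled flag on generated headers is an assumption.

```python
import re
import secrets

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None when the header is invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return trace_id, parent_id, flags


def new_traceparent():
    """Build a version-00 traceparent with random IDs and the sampled flag set."""
    return "00-{}-{}-01".format(secrets.token_hex(16), secrets.token_hex(8))


parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
generated = new_traceparent()
```

A test harness can use parse_traceparent to assert that the trace ID on a mock response matches the one it sent.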

Configuration Reference

Prometheus

Property	Env var	Default	Description
mockserver.metricsEnabled	MOCKSERVER_METRICS_ENABLED	false	Enable Prometheus metrics and the /mockserver/metrics endpoint
mockserver.llmMetricsEnabled	MOCKSERVER_LLM_METRICS_ENABLED	false	Enable LLM token/cost counters (requires metricsEnabled)

OpenTelemetry

Property	Env var	Default	Description
mockserver.otelEndpoint	MOCKSERVER_OTEL_ENDPOINT	(empty)	OTLP collector base URL (e.g. http://collector:4318)
mockserver.otelMetricsEnabled	MOCKSERVER_OTEL_METRICS_ENABLED	false	Push metrics to OTLP
mockserver.otelTracesEnabled	MOCKSERVER_OTEL_TRACES_ENABLED	false	Export GenAI spans via OTLP
mockserver.otelMetricsExportIntervalSeconds	MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS	60	OTLP metrics push interval in seconds (minimum 1)
mockserver.otelPropagateTraceContext	MOCKSERVER_OTEL_PROPAGATE_TRACE_CONTEXT	false	Copy W3C trace context headers to responses
mockserver.otelGenerateTraceId	MOCKSERVER_OTEL_GENERATE_TRACE_ID	false	Generate trace IDs for requests without traceparent

Configuration Properties — full reference for all observability properties
LLM Response Mocking — LLM mocking and cost budget
Chaos Testing & Fault Injection — chaos metrics and auto-halt
Scalability & Latency — performance benchmarks and tuning