Observability
MockServer provides two observability channels: a Prometheus scrape endpoint that exposes counters, gauges, and histograms, and OpenTelemetry (OTLP) export for traces and metrics. Both are opt-in and have zero overhead when disabled.
- Prometheus metrics
- LLM token and cost metrics
- OpenTelemetry (OTLP) export
- GenAI spans
- W3C trace context propagation
- Configuration reference
Prometheus Metrics
Enable Prometheus metrics by setting metricsEnabled to true. MockServer then exposes a scrape endpoint at /mockserver/metrics in Prometheus text exposition format. When metrics are disabled, this endpoint returns 404.
# Start MockServer with metrics enabled
docker run --rm -p 1080:1080 \
-e MOCKSERVER_METRICS_ENABLED=true \
mockserver/mockserver:7.1.0
# Scrape metrics
curl http://localhost:1080/mockserver/metrics
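For scripted checks, the scrape output can be parsed into a dictionary. The following is a minimal Python sketch, not part of MockServer: `parse_metrics`, `scrape`, and the `SAMPLE` text are hypothetical names and data introduced for illustration, and the parser covers only the simple case (no escaped characters in label values, no timestamps).

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition into {(name, labels): value}.

    Simplified: assumes no escaped quotes or commas inside label values
    and no trailing timestamps on sample lines.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        series, _, value = line.rpartition(" ")
        name, _, label_part = series.partition("{")
        labels = []
        for pair in label_part.rstrip("}").split(","):
            if pair:
                key, _, raw = pair.partition("=")
                labels.append((key.strip(), raw.strip().strip('"')))
        samples[(name, tuple(sorted(labels)))] = float(value)
    return samples


def scrape(base_url="http://localhost:1080"):
    """Fetch and parse /mockserver/metrics from a running MockServer.

    urlopen raises HTTPError on a 404, which is what the endpoint
    returns when metrics are disabled.
    """
    with urllib.request.urlopen(base_url + "/mockserver/metrics") as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical sample of endpoint output, used so the sketch runs offline.
SAMPLE = """\
# TYPE requests_received_count gauge
requests_received_count 12.0
mock_server_http_chaos_injected_total{fault_type="latency"} 3.0
"""

parsed = parse_metrics(SAMPLE)
print(parsed[("requests_received_count", ())])
```

Against a live server, `scrape()` would replace the `SAMPLE` text. A production script would more likely use an existing Prometheus client parser than this hand-rolled one.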
Available metrics
Naming convention: the core request-tracking gauges are exposed with unprefixed names (e.g. requests_received_count, expectations_not_matched_count). Counters, histograms, and the operational gauges (mock_server_active_service_chaos, mock_server_expectations_by_type, mock_server_build_info) all use a mock_server_ prefix (e.g. mock_server_request_duration_seconds). Counter names also gain a _total suffix in the exposition output, so mock_server_http_chaos_injected appears as mock_server_http_chaos_injected_total on the /mockserver/metrics endpoint, in PromQL queries, and in Grafana.
Request and expectation matching
| Metric | Type | Description |
|---|---|---|
| requests_received_count | Gauge | Total requests received |
| expectations_not_matched_count | Gauge | Requests that did not match any expectation |
| response_expectations_matched_count | Gauge | Requests matched to a response expectation |
| forward_expectations_matched_count | Gauge | Requests matched to a forward expectation |
Action execution (one per action type)
| Metric | Description |
|---|---|
| response_actions_count | Response actions executed |
| forward_actions_count | Forward actions executed |
| sse_response_actions_count | SSE response actions executed |
| llm_response_actions_count | LLM response actions executed |
| error_actions_count | Error actions executed |
| grpc_stream_response_actions_count | gRPC stream response actions executed |
Additional action counters exist for template, callback, and other action types. Scrape the endpoint to see the full list.
Request latency histogram
mock_server_request_duration_seconds is a Prometheus histogram of request handling duration (receipt to response), with buckets from 0.5 ms to 10 s. Use it to derive latency percentiles:
histogram_quantile(0.95, sum by (le) (rate(mock_server_request_duration_seconds_bucket[1m])))
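To make the query above less opaque, here is a rough Python sketch of the linear interpolation that a histogram quantile estimate performs inside the bucket containing the target rank. The bucket bounds and counts are hypothetical, chosen only to make the arithmetic easy to follow. They are not MockServer's actual bucket layout. The sketch works on raw cumulative counts, whereas the PromQL query applies rate() first.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The list must include a +Inf bucket holding the total count.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the overflow bucket: report the highest finite bound.
                return prev_bound
            width = count - prev_count
            if width == 0:
                return bound
            # Interpolate linearly between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / width
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts: (upper bound in seconds, observations <= bound).
BUCKETS = [(0.005, 50), (0.01, 80), (0.05, 95), (0.1, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, BUCKETS)
print(f"p95 = {p95:.4f} s")
```

Because the estimate is interpolated, its precision is limited by bucket width: a reported percentile can be anywhere inside the bucket that contains it.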
Build info
mock_server_build_info is a gauge with labels version, major_minor_version, group_id, artifact_id, and git_hash.
JVM runtime
When metrics are enabled, MockServer also exposes JVM health gauges:
| Metric | Labels | Description |
|---|---|---|
| jvm_memory_used_bytes | area = heap / nonheap | Memory currently used |
| jvm_memory_committed_bytes | area | Memory committed by the JVM |
| jvm_memory_max_bytes | area | Max memory (-1 if undefined) |
| jvm_threads_current | — | Live thread count |
| jvm_threads_daemon | — | Daemon thread count |
| jvm_gc_collection_count | — | Total GC collections |
| jvm_gc_collection_seconds_sum | — | Total GC time in seconds |
Chaos metrics
When chaos testing is active, additional metrics track fault injection:
- mock_server_http_chaos_injected_total — counter with a fault_type label (drop, error, latency, truncate, malformed, slow, quota, graphql)
- mock_server_active_service_chaos — gauge per fault_type of currently active chaos profiles
- mock_server_chaos_auto_halt_total — counter that increments each time the chaos auto-halt circuit-breaker triggers
LLM Token and Cost Metrics
When both metricsEnabled and llmMetricsEnabled are true, three additional Prometheus counters track LLM usage:
| Metric | Labels | Description |
|---|---|---|
| mock_server_llm_input_tokens_total | provider, model | Cumulative input tokens |
| mock_server_llm_output_tokens_total | provider, model | Cumulative output tokens |
| mock_server_llm_cost_usd_total | provider, model | Cumulative estimated cost in USD |
These counters are incremented on both the mock path (when MockServer serves an httpLlmResponse) and the forward/proxy path (when MockServer forwards requests to a real LLM provider). Cost estimation uses an internal pricing table and is approximate.
The cost-budget circuit-breaker (mock_server_llm_cost_budget_tripped_total counter) is documented in LLM Response Mocking → Cost Budget.
# Example: estimated LLM spend in USD over the last hour
sum(increase(mock_server_llm_cost_usd_total[1h]))
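The cost counter can be thought of as token counts multiplied by per-token prices and accumulated per provider and model. The Python sketch below illustrates that idea only. The pricing values, the `example-model` name, and the helper functions are hypothetical, and MockServer's internal pricing table and handling of unknown models may differ.

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens: (input, output).
# These are illustrative values, not MockServer's pricing table.
PRICING_USD_PER_MILLION = {
    ("openai", "example-model"): (2.50, 10.00),
}

# Stand-in for a counter labelled by (provider, model).
cost_usd_total = defaultdict(float)


def estimate_cost_usd(provider, model, input_tokens, output_tokens):
    """Return the estimated cost of one completion in USD."""
    price = PRICING_USD_PER_MILLION.get((provider, model))
    if price is None:
        return 0.0  # assumption: models without a price contribute no cost
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def record_completion(provider, model, input_tokens, output_tokens):
    """Add one completion's estimated cost to the running total."""
    cost = estimate_cost_usd(provider, model, input_tokens, output_tokens)
    cost_usd_total[(provider, model)] += cost
    return cost


record_completion("openai", "example-model", 1000, 500)
record_completion("openai", "example-model", 2000, 0)
print(cost_usd_total[("openai", "example-model")])
```

Since the total only ever grows, it behaves like a Prometheus counter, which is why spend over a window is read with rate() or increase() rather than from the raw value.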
OpenTelemetry (OTLP) Export
MockServer can push metrics and traces to an OpenTelemetry Collector (or any OTLP-compatible backend) via OTLP HTTP/protobuf. Set the collector endpoint and enable the signals you want:
docker run --rm -p 1080:1080 \
-e MOCKSERVER_OTEL_ENDPOINT=http://otel-collector:4318 \
-e MOCKSERVER_OTEL_METRICS_ENABLED=true \
-e MOCKSERVER_OTEL_TRACES_ENABLED=true \
-e MOCKSERVER_METRICS_ENABLED=true \
mockserver/mockserver:7.1.0
otelEndpoint is the base URL of the OTLP HTTP collector. MockServer appends /v1/metrics and /v1/traces automatically.
Metrics export interval: otelMetricsExportIntervalSeconds controls how often metrics are pushed (default 60 seconds, minimum 1 second).
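The endpoint and interval rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not MockServer's code: stripping a trailing slash and clamping values below the minimum are guesses, and MockServer may instead reject such values or handle them differently.

```python
def otlp_urls(otel_endpoint):
    """Derive the per-signal OTLP HTTP URLs from the collector base URL."""
    base = otel_endpoint.rstrip("/")  # assumption: a trailing slash is tolerated
    return {
        "metrics": base + "/v1/metrics",
        "traces": base + "/v1/traces",
    }


def effective_export_interval(seconds=60):
    """Apply the documented minimum of 1 second to the export interval."""
    return max(1, int(seconds))  # assumption: too-small values are clamped


urls = otlp_urls("http://otel-collector:4318")
print(urls["metrics"])
print(urls["traces"])
print(effective_export_interval(0))
```

The practical consequence is that otelEndpoint should be the base URL only. Including /v1/metrics or /v1/traces in it would duplicate the path segment.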
GenAI Spans
When otelTracesEnabled is true, MockServer emits OpenTelemetry GenAI semantic-convention spans for LLM completions. Each span includes:
- gen_ai.system — the provider (e.g. openai, anthropic)
- gen_ai.request.model — the model identifier
- Token usage attributes (input and output tokens)
- Finish reason
GenAI spans fire on two paths:
- Mock path — when MockServer serves an httpLlmResponse
- Forward/proxy path — when MockServer forwards requests to a real LLM provider. The provider is detected from the target host (e.g. api.openai.com maps to OpenAI, api.anthropic.com maps to Anthropic).
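As a rough picture of what such a span carries, the Python sketch below maps a forward target to a provider and assembles an attribute dictionary. The two host mappings and the gen_ai.system and gen_ai.request.model keys come from the text above. The token and finish-reason keys follow the OpenTelemetry GenAI semantic conventions, and it is an assumption that MockServer emits exactly those names. The function names are hypothetical.

```python
from urllib.parse import urlsplit

# Host-to-provider mapping taken from the examples in this section.
HOST_TO_PROVIDER = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
}


def detect_provider(target_url):
    """Return the provider for a forward target, or None if unrecognised."""
    host = urlsplit(target_url).hostname or ""
    return HOST_TO_PROVIDER.get(host)


def genai_span_attributes(provider, model, input_tokens, output_tokens, finish_reason):
    """Build the attribute set for one LLM completion span."""
    return {
        "gen_ai.system": provider,
        "gen_ai.request.model": model,
        # Assumed key names, per the OpenTelemetry GenAI semantic conventions.
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": [finish_reason],
    }


provider = detect_provider("https://api.openai.com/v1/chat/completions")
attrs = genai_span_attributes(provider, "example-model", 120, 45, "stop")
print(attrs)
```

When checking spans in a tracing backend, filtering on gen_ai.system is a simple way to separate LLM completions from other traffic.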
W3C Trace Context Propagation
MockServer can extract and propagate W3C traceparent and tracestate headers across requests and responses. This enables distributed tracing correlation when MockServer sits in a service mesh or test harness.
- otelPropagateTraceContext (default false) — when enabled, MockServer copies the incoming trace context headers to the response, so downstream tracing tooling can correlate the mock response with the original request trace.
- otelGenerateTraceId (default false) — when enabled, MockServer generates a new random W3C trace ID for requests that arrive without a traceparent header.
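For reference, a version 00 traceparent value has four dash-separated lowercase hex fields: version, a 32-character trace ID, a 16-character parent (span) ID, and 2-character flags. The Python sketch below validates and generates such values. It is an illustration of the header format, not MockServer's implementation, and the function names are hypothetical.

```python
import re
import secrets

# version "00": 2 hex - 32 hex trace id - 16 hex parent id - 2 hex flags
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is invalid."""
    match = TRACEPARENT_RE.match(value.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def new_traceparent():
    """Generate a traceparent with random trace and span IDs, sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
fields = parse_traceparent(incoming)
print(fields["trace_id"])
print(new_traceparent())
```

In a test harness, asserting that the trace ID parsed from the response matches the one sent in the request is a straightforward way to confirm propagation is switched on.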
Configuration Reference
Prometheus
| Property | Env var | Default | Description |
|---|---|---|---|
| mockserver.metricsEnabled | MOCKSERVER_METRICS_ENABLED | false | Enable Prometheus metrics and the /mockserver/metrics endpoint |
| mockserver.llmMetricsEnabled | MOCKSERVER_LLM_METRICS_ENABLED | false | Enable LLM token/cost counters (requires metricsEnabled) |
OpenTelemetry
| Property | Env var | Default | Description |
|---|---|---|---|
| mockserver.otelEndpoint | MOCKSERVER_OTEL_ENDPOINT | (empty) | OTLP collector base URL (e.g. http://collector:4318) |
| mockserver.otelMetricsEnabled | MOCKSERVER_OTEL_METRICS_ENABLED | false | Push metrics to OTLP |
| mockserver.otelTracesEnabled | MOCKSERVER_OTEL_TRACES_ENABLED | false | Export GenAI spans via OTLP |
| mockserver.otelMetricsExportIntervalSeconds | MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS | 60 | OTLP metrics push interval in seconds (minimum 1) |
| mockserver.otelPropagateTraceContext | MOCKSERVER_OTEL_PROPAGATE_TRACE_CONTEXT | false | Copy W3C trace context headers to responses |
| mockserver.otelGenerateTraceId | MOCKSERVER_OTEL_GENERATE_TRACE_ID | false | Generate trace IDs for requests without traceparent |
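The property and environment variable names in the tables above follow a regular pattern: dots become underscores, camelCase boundaries gain an underscore, and the result is upper-cased. The Python sketch below encodes that pattern and builds `docker run` arguments from a settings dictionary. The helper names are hypothetical, and the pattern is only verified against the rows listed here, so treat it as an assumption for other properties.

```python
import re


def property_to_env_var(prop):
    """Convert e.g. mockserver.otelGenerateTraceId to MOCKSERVER_OTEL_GENERATE_TRACE_ID."""
    flat = prop.replace(".", "_")
    # Insert "_" at each lowercase/digit to uppercase boundary.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", flat)
    return snake.upper()


def docker_env_args(settings):
    """Turn {property: value} into a flat list of docker -e arguments."""
    args = []
    for prop, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["-e", f"{property_to_env_var(prop)}={text}"]
    return args


args = docker_env_args({
    "mockserver.metricsEnabled": True,
    "mockserver.otelEndpoint": "http://otel-collector:4318",
    "mockserver.otelMetricsExportIntervalSeconds": 30,
})
print(" ".join(args))
```

Generating the arguments from one dictionary keeps local runs and CI configuration in step, since both read the same property names.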
Related Pages
- Configuration Properties — full reference for all observability properties
- LLM Response Mocking — LLM mocking and cost budget
- Chaos Testing & Fault Injection — chaos metrics and auto-halt
- Scalability & Latency — performance benchmarks and tuning