Inspect AI Agent Traffic - LLM & MCP Proxy
When you use an AI coding agent such as Claude Code or OpenCode, or an AI library such as LangChain or LangGraph, you often have no visibility into the HTTPS traffic it sends to LLM APIs (Anthropic, OpenAI, etc.) and MCP servers. Running MockServer as an HTTPS proxy gives you a complete record of every request and response — including model selection, token counts, tool calls, and streamed completions — with no changes to application code.
Because MockServer performs TLS man-in-the-middle (MITM) interception, it can decrypt and log HTTPS traffic while forwarding it transparently. Streaming responses (Server-Sent Events used by LLM APIs) are relayed incrementally so the agent remains fully responsive — there is no buffering delay.
How It Works
- Start MockServer as a proxy on a local port
- Trust the MockServer CA certificate so TLS connections succeed
- Configure your AI tool to send traffic through the proxy
- View the captured traffic in the dashboard or via the retrieve API
1. Start MockServer as a Proxy
Any MockServer instance acts as a transparent HTTPS proxy — no special mode is required. Start one using your preferred method:
MockServer is flexible and supports numerous usage patterns.
MockServer can be run:
- programmatically via a Java API in an @Before or @After method
- using a JUnit 4 @Rule via a @Rule annotated field in a JUnit 4 test
- using a JUnit 5 Test Extension via a @ExtendWith annotated JUnit 5 class
- using a Spring Test Execution Listener via a @MockServerTest annotated test class
- as a Docker container in any Docker enabled environment
- via a Helm chart in any Kubernetes environment
- from the command line as a stand-alone process in a test environment
- via a Maven Plugin as part of a Maven build cycle
- as a Node.js (npm) module from any Node.js code
- as a Grunt plugin as part of a Grunt build cycle
- as a deployable WAR to an existing application server
To simplify configuration all versions (except the deployable WAR) use a single port to support the control plane and data plane in HTTP, HTTPS or SOCKS.
MockServer is available in the following formats:
- java dependency
- Docker container
- Helm chart for Kubernetes
- executable jar
- Homebrew package
- maven plugin
- npm plugin
- Grunt plugin
- deployable WAR that runs on JEE web servers
It is also possible to build and run MockServer directly from source code
The quickest option for local use:
docker run -d --rm -p 1080:1080 mockserver/mockserver
Or download and run the executable JAR:
java -jar mockserver-netty-no-dependencies-6.0.0.jar -serverPort 1080
Once started, MockServer listens on port 1080. Any HTTPS request sent through it will be intercepted, logged, and forwarded to the real destination.
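Before pointing a tool at the proxy, it can help to confirm that something is actually accepting connections on the port. The following is a minimal sketch using only the Python standard library; the function name is illustrative, and a successful TCP connection only shows that the port is open, not that the listener is MockServer.

```python
import socket


def is_listening(host: str = "localhost", port: int = 1080, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Proxy port reachable:", is_listening())
```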
2. Trust the MockServer CA Certificate
MockServer intercepts TLS by dynamically generating certificates for each upstream hostname, signed by its own Certificate Authority (CA). Your AI tool must trust this CA, otherwise TLS handshakes will fail.
The MockServer CA certificate (PEM format) is available at:
- From the running container or JAR: http://localhost:1080/mockserver/ca.pem (download this once MockServer is running)
- From the GitHub repository: CertificateAuthorityCertificate.pem
- From the classpath (if using the Java API): /org/mockserver/socket/CertificateAuthorityCertificate.pem
Download it to a local file:
curl -s http://localhost:1080/mockserver/ca.pem -o mockserver-ca.pem
See HTTPS & TLS — Ensure MockServer Certificates Are Trusted for full details on adding the CA to operating systems, JVMs, and HTTP clients.
Security note: The default CA private key is public knowledge (it is in the MockServer git repository), which means the default CA should only be used in isolated development environments. For shared or semi-permanent setups, enable dynamicallyCreateCertificateAuthorityCertificate to generate a unique local CA whose private key is never published.
3. Configure Your AI Tool
Most AI tools and libraries route HTTPS traffic through the proxy and CA specified by standard environment variables. Set these before starting your tool:
export HTTPS_PROXY=http://localhost:1080
export NODE_EXTRA_CA_CERTS=/path/to/mockserver-ca.pem # Node.js tools (Claude Code, OpenCode)
export SSL_CERT_FILE=/path/to/mockserver-ca.pem # Python tools (LangChain, httpx)
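If you would rather not export these variables into your whole shell, one option is to set them only for the child process that runs the tool. The sketch below assumes a hypothetical helper named proxy_env; it simply copies the current environment and adds the three variables shown above.

```python
import os
from pathlib import Path


def proxy_env(ca_path, proxy_url="http://localhost:1080", base=None):
    """Return a copy of the environment with the proxy and CA variables set.

    ca_path   -- path to the downloaded MockServer CA certificate (PEM)
    proxy_url -- address of the running MockServer instance
    base      -- environment to start from (defaults to os.environ)
    """
    env = dict(os.environ if base is None else base)
    ca = str(Path(ca_path).expanduser())
    env["HTTPS_PROXY"] = proxy_url
    env["NODE_EXTRA_CA_CERTS"] = ca  # read by Node.js tools
    env["SSL_CERT_FILE"] = ca        # read by Python tools
    return env


# Example (not executed here): launch a tool with the proxy settings applied.
#   import subprocess
#   subprocess.run(["claude"], env=proxy_env("~/mockserver-ca.pem"))
```

Scoping the variables to one child process keeps other programs in the same shell from being routed through the proxy unintentionally.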
Select your tool for specific instructions:
Claude Code is a Node.js process. It honours HTTPS_PROXY for outbound traffic and NODE_EXTRA_CA_CERTS to extend the Node.js CA trust store without replacing it.
export HTTPS_PROXY=http://localhost:1080
export NODE_EXTRA_CA_CERTS=/path/to/mockserver-ca.pem
Start Claude Code in a terminal where these variables are set:
claude
All HTTPS calls to api.anthropic.com and any MCP servers using HTTP/streamable-HTTP transport will now flow through MockServer. Streaming completions are relayed incrementally — Claude Code remains fully responsive.
To make the configuration persist across shell sessions, add the exports to your shell profile (~/.zshrc, ~/.bashrc, etc.).
OpenCode is also a Node.js process and uses the same standard variables:
export HTTPS_PROXY=http://localhost:1080
export NODE_EXTRA_CA_CERTS=/path/to/mockserver-ca.pem
Start OpenCode in a terminal where these variables are set:
opencode
Alternatively, set them in your OpenCode launch configuration or shell profile to apply them globally.
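Python's httpx library (used by the Anthropic and OpenAI Python SDKs) honours HTTPS_PROXY and SSL_CERT_FILE. The requests library also honours HTTPS_PROXY, but it reads its CA bundle from REQUESTS_CA_BUNDLE rather than SSL_CERT_FILE:
export HTTPS_PROXY=http://localhost:1080
export SSL_CERT_FILE=/path/to/mockserver-ca.pem # httpx and the standard ssl module
export REQUESTS_CA_BUNDLE=/path/to/mockserver-ca.pem # only needed if your code uses requests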
If you instantiate an httpx client directly, pass the proxy and CA bundle explicitly:
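import anthropic
import httpx

# Route the SDK's HTTP traffic through MockServer and trust its CA.
client = anthropic.Anthropic(
    http_client=httpx.Client(
        proxy="http://localhost:1080",        # MockServer proxy address
        verify="/path/to/mockserver-ca.pem",  # CA used to verify MockServer's certificates
    )
)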
For the OpenAI Python SDK:
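import openai
import httpx

# Same approach as above: proxy through MockServer and trust its CA.
client = openai.OpenAI(
    http_client=httpx.Client(
        proxy="http://localhost:1080",        # MockServer proxy address
        verify="/path/to/mockserver-ca.pem",  # CA used to verify MockServer's certificates
    )
)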
LangChain and LangGraph applications use whatever HTTP client the underlying SDK uses, so setting HTTPS_PROXY and SSL_CERT_FILE at the process level is normally sufficient. The same applies to any other Python AI framework or SDK.
4. View the Captured Traffic
Dashboard Traffic View
Open the MockServer dashboard in a browser:
http://localhost:1080/mockserver/dashboard
Click Traffic in the navigation bar to open the Traffic view. Unlike the standard Dashboard panels, the Traffic view shows all captured requests in one list — both requests that matched a mock expectation and requests that were forwarded to a real upstream.
The Traffic view shows a master list of every captured request/response pair. Click any row to open a detail pane on the right.
LLM Usage Strip
For any LLM request, a thin strip appears above the detail tabs showing the LLM provider, model name, token counts (input and output), estimated cost, and stop reason. This lets you check usage figures without switching to a different tab.
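This page does not describe how the estimated cost is computed. As a rough illustration only, a figure of this kind can be derived from the token counts and per-token rates. The sketch below assumes per-million-token rates supplied by the caller; the rates in the example are hypothetical placeholders, not real provider pricing and not MockServer's own pricing data.

```python
def estimate_cost(input_tokens, output_tokens, usd_per_million_input, usd_per_million_output):
    """Estimate the cost in USD of one request from its token counts.

    The rates are supplied by the caller; nothing here reflects actual pricing.
    """
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


# Hypothetical rates: 3.00 USD per million input tokens, 15.00 per million output tokens.
cost = estimate_cost(1200, 350, usd_per_million_input=3.00, usd_per_million_output=15.00)
print(f"estimated cost: ${cost:.5f}")
```

With these placeholder rates the arithmetic is (1200 × 3.00 + 350 × 15.00) / 1,000,000 = 8850 / 1,000,000 = 0.00885 USD.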
Detail Tabs by Traffic Kind
The detail pane adapts to the type of traffic detected:
| Traffic kind | Detail tabs |
|---|---|
| Anthropic, OpenAI, OpenAI Responses, Gemini, or Ollama | Messages: the request body (system prompt, messages/contents, and tools definition). Conversation: a chat-transcript view (see below). Scripted Turns: shown when scripted conversation expectations are active. SSE Timeline: decoded Server-Sent Events for streamed responses (shown when stream data is present). Raw JSON: the raw request and response JSON. |
| MCP JSON-RPC (Content-Type: application/json with a jsonrpc field) | MCP: decoded JSON-RPC (method, id, params, and result or error). Raw JSON: the raw request and response JSON. |
| Any other HTTP traffic | Raw JSON only |
Conversation View
The Conversation tab renders LLM exchanges as a chat transcript, making it easy to read multi-turn interactions at a glance. The Conversation tab is available for all five supported LLM providers: Anthropic, OpenAI, OpenAI Responses API, Gemini, and Ollama.
- User messages appear left-aligned; assistant messages appear right-aligned, styled as WhatsApp-style chat bubbles
- System prompts appear as a distinct banner above the conversation
- Tool calls (requests to use a tool) and tool results (the tool's output) each appear as their own labelled bubbles
SSE Timeline
The SSE Timeline tab is shown for streamed LLM responses. It displays each decoded Server-Sent Event as a separate row with the elapsed time since the first chunk arrived, making it easy to spot latency spikes mid-stream. The final reassembled message is also shown.
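To make the idea of elapsed time since the first chunk concrete, here is a small sketch that turns a list of arrival timestamps into timeline rows and flags large gaps between consecutive events. The function name, row fields, and the one-second threshold are illustrative assumptions, not part of MockServer.

```python
def timeline(arrivals, gap_threshold=1.0):
    """Build timeline rows from (timestamp_seconds, event_name) pairs in arrival order.

    Each row carries the elapsed time since the first event, the gap since the
    previous event, and a flag marking gaps larger than gap_threshold.
    """
    if not arrivals:
        return []
    first = previous = arrivals[0][0]
    rows = []
    for timestamp, name in arrivals:
        gap = timestamp - previous
        rows.append({
            "event": name,
            "elapsed": timestamp - first,
            "gap": gap,
            "spike": gap > gap_threshold,
        })
        previous = timestamp
    return rows


rows = timeline([
    (10.0, "message_start"),
    (10.5, "content_block_delta"),
    (12.5, "content_block_delta"),  # arrives 2.0 s after the previous event
    (12.75, "message_stop"),
])
for row in rows:
    print(row)
```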
The Proxied Requests panel in the standard Dashboard view also shows all forwarded request/response pairs with full JSON body inspection.
Sessions View
Click Sessions in the navigation bar to see captured LLM traffic grouped into conversation swim-lanes. Each swim-lane is labelled with the scenario name and isolation value (for example, weather-agent / agent-A) and shows chips for each captured turn. Click a chip to open the Conversation view for that turn.
An Unscoped requests strip at the bottom collects requests that did not match any isolated session. The Sessions view requires that LLM conversation expectations were set up with a per-session isolation key — see LLM Conversation Mocking for details.
Download HAR and Export
To export captured traffic, open the Library view and click the Export sub-tab. Pick what to export from the dropdown — either the registered expectations or the captured requests — in one of five formats: MockServer JSON, HAR, OpenAPI 3, Postman v2.1 collection, or Bruno collection (.zip). Click Download. OpenAPI / Postman / Bruno round-trip into Swagger UI, Postman, and Bruno respectively. Streamed LLM responses are included as readable text in every format.
The same exports are also available via the retrieve API.
Retrieve API
Retrieve all proxied request-response pairs as JSON:
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES"
Export in any of the supported formats by adding a format= query parameter (case-insensitive):
# HAR (HTTP Archive) — for browser DevTools or HAR analysers
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES&format=HAR" -o traffic.har
# OpenAPI 3 spec — derived from observed traffic
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES&format=OPENAPI" -o traffic.openapi.json
# Postman collection v2.1 — each captured request as an item with example response
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES&format=POSTMAN" -o traffic.postman.json
# Bruno collection — zip archive of .bru files + bruno.json manifest
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES&format=BRUNO" -o traffic.bruno.zip
The same five formats are available for type=ACTIVE_EXPECTATIONS too — substitute it in the URL to export the registered matchers instead of captured traffic.
Filter to a specific host (e.g. the Anthropic API):
curl -s -X PUT "http://localhost:1080/mockserver/retrieve?type=REQUEST_RESPONSES" \
  -d '{"headers": {"host": [{"value": "api.anthropic.com"}]}}'
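The same calls can be made from a script. The sketch below uses only the Python standard library and mirrors the curl examples above: a PUT to /mockserver/retrieve with type and optional format query parameters, plus an optional host filter in the body. The function names are illustrative, and the filter body is copied from the curl example rather than from a schema.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_retrieve_request(base_url="http://localhost:1080", type_="REQUEST_RESPONSES",
                           format_=None, host=None):
    """Build (but do not send) a retrieve request like the curl examples above."""
    query = {"type": type_}
    if format_:
        query["format"] = format_
    url = f"{base_url}/mockserver/retrieve?{urlencode(query)}"
    body = None
    headers = {}
    if host:
        # Same filter shape as the curl example that filters by host.
        body = json.dumps({"headers": {"host": [{"value": host}]}}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=body, headers=headers, method="PUT")


def fetch(request):
    """Send the request and return the raw response bytes (needs a running MockServer)."""
    with urlopen(request) as response:
        return response.read()


# Example (not executed here): save Anthropic traffic as a HAR file.
#   har = fetch(build_retrieve_request(format_="HAR", host="api.anthropic.com"))
#   open("traffic.har", "wb").write(har)
```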
Streamed Response Capture
LLM APIs stream completions as Server-Sent Events. MockServer relays each chunk immediately to the client (so the agent sees live output) and simultaneously captures up to 256 KB of the stream body in the event log. When a streamed response body exceeds this limit, the logged body is truncated and marked with the response header x-mockserver-stream-truncated: true; the full stream still reaches the client.
The streaming relay applies to HTTP/1.1 proxied responses only. Streaming is auto-detected from the Content-Type: text/event-stream response header. Ordinary chunked responses (without text/event-stream) are aggregated normally. Non-streaming responses are handled identically to before — this feature adds no overhead for ordinary JSON responses.
Relevant configuration properties:
- streamingResponsesEnabled: enable or disable streaming relay (default: true)
- maxStreamingCaptureBytes: maximum bytes captured per stream (default: 262144)
- streamIdleTimeoutSeconds: idle timeout between chunks (default: 60s)
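The relationship between relaying and capped capture can be sketched in a few lines. This is an illustration of the behaviour described above, not MockServer's implementation: every chunk is passed on in full, while the captured copy stops growing once the limit is reached and a truncation flag is set. The function name is hypothetical.

```python
def relay_and_capture(chunks, max_capture_bytes=262144):
    """Relay every chunk while capturing at most max_capture_bytes for the log.

    Returns (relayed_bytes, captured_bytes, truncated).
    """
    relayed = []
    captured = bytearray()
    truncated = False
    for chunk in chunks:
        relayed.append(chunk)  # the client always receives the full chunk
        room = max_capture_bytes - len(captured)
        if len(chunk) <= room:
            captured += chunk
        else:
            captured += chunk[:room]  # keep only what still fits
            truncated = True
    return b"".join(relayed), bytes(captured), truncated


# Tiny limit so the effect is visible: 10 bytes are relayed, 6 are captured.
sent, logged, was_truncated = relay_and_capture([b"abcd", b"efgh", b"ij"], max_capture_bytes=6)
print(sent, logged, was_truncated)
```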
MCP Transport Caveat
stdio MCP servers cannot be proxied. MCP servers running over the stdio transport communicate through local process pipes, not over the network. There is no HTTP traffic to intercept.
MCP servers using the HTTP or streamable-HTTP transport communicate over HTTPS and will be captured by MockServer exactly like any other HTTPS call. If you want to inspect MCP traffic, choose an MCP server that supports HTTP or streamable-HTTP transport, or connect to a remote MCP server over HTTPS.
LLM Record & Replay
After capturing LLM/MCP traffic through MockServer's proxy, you can snapshot it into a fixture file for deterministic, offline replay. This enables AI application tests that are free (no metered API calls), fast, and reproducible.
How It Works
- Record: run your AI application through MockServer's proxy as described above. MockServer logs every forwarded request/response pair, including SSE streaming responses.
- Snapshot: call the record_llm_fixtures MCP tool (or REST API equivalent) to write the captured traffic to a JSON fixture file. Secrets carried in headers (API keys, auth tokens, cookies) are automatically redacted. SSE streaming responses are converted to MockServer's SSE response format for faithful event-by-event replay.
- Commit: add the fixture file to version control. Its sensitive headers are already redacted, and it uses MockServer's standard expectation JSON format.
- Replay: in your test suite, start MockServer and load the fixture file with load_expectations_from_file or via the initializationJsonPath configuration property. Your application now talks to MockServer instead of the real API and receives the same responses (including SSE streaming) deterministically.
Recording via MCP
If you have an AI agent connected to MockServer's MCP control plane, use the record_llm_fixtures tool:
{
  "method": "tools/call",
  "params": {
    "name": "record_llm_fixtures",
    "arguments": {
      "path": "./fixtures/anthropic-chat.json",
      "requestPath": "/v1/messages"
    }
  }
}
Optional filters:
- requestPath: only include traffic matching this request path
- host: only include traffic matching this host header
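The example above shows only the method and params fields. MCP messages are JSON-RPC 2.0, so a client that builds the call by hand also needs the jsonrpc and id members. The sketch below only constructs the payload; how it is delivered (transport and endpoint) is not covered on this page, so that part is left out. The helper name is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing ids for this illustration


def tools_call(name, arguments, request_id=None):
    """Build a JSON-RPC 2.0 'tools/call' request for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids) if request_id is None else request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


message = tools_call(
    "record_llm_fixtures",
    {
        "path": "./fixtures/anthropic-chat.json",
        "requestPath": "/v1/messages",  # optional filter
        "host": "api.anthropic.com",    # optional filter
    },
    request_id=7,
)
print(json.dumps(message, indent=2))
```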
Loading Fixtures for Replay
Load the fixture file into MockServer at test startup:
{
  "method": "tools/call",
  "params": {
    "name": "load_expectations_from_file",
    "arguments": {
      "path": "./fixtures/anthropic-chat.json"
    }
  }
}
Alternatively, use MockServer's initializationJsonPath configuration property to load fixtures automatically on startup:
java -jar mockserver-netty-no-dependencies-6.0.0.jar \
-serverPort 1080 \
-initializationJsonPath ./fixtures/anthropic-chat.json
Secret Redaction
The record_llm_fixtures tool automatically redacts the following sensitive headers in both requests and responses, replacing their values with ***REDACTED***:
- Authorization (Bearer tokens, Basic auth)
- x-api-key / api-key
- Cookie / Set-Cookie
- Proxy-Authorization
This means fixture files are safe to commit to public or shared repositories without leaking credentials stored in headers.
Request and response bodies are not redacted. The automatic redaction covers sensitive headers only. If your application places credentials, API keys, or other secrets in request or response bodies (for example, in a JSON login payload or an OAuth token response), those values will appear in the fixture file. Review fixture files before committing them to version control to ensure no secrets remain in body content.
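Part of that review can be automated. The following is a rough pre-commit sketch that walks a fixture's JSON and reports strings that look like credentials. The patterns and field names are illustrative heuristics chosen for this example; they will miss some secrets and flag some harmless values, so treat the output as a prompt for manual review rather than a guarantee.

```python
import json
import re

REDACTED = "***REDACTED***"

# Heuristic value patterns (assumptions for this sketch, not an exhaustive list).
VALUE_PATTERNS = {
    "bearer token": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{16,}"),
    "sk- style API key": re.compile(r"\bsk-[A-Za-z0-9_\-]{16,}"),
}
# Field names that commonly hold secrets in JSON bodies (also an assumption).
SENSITIVE_KEYS = {"password", "client_secret", "access_token", "refresh_token", "api_key"}


def find_suspects(node, path="$"):
    """Return (json_path, reason) pairs for values that may be secrets."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in SENSITIVE_KEYS and isinstance(value, str) and value != REDACTED:
                hits.append((child, "sensitive field name"))
            hits.extend(find_suspects(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_suspects(value, f"{path}[{index}]"))
    elif isinstance(node, str) and node != REDACTED:
        for reason, pattern in VALUE_PATTERNS.items():
            if pattern.search(node):
                hits.append((path, reason))
    return hits


def scan_file(fixture_path):
    """Load a fixture file and return its suspect values."""
    with open(fixture_path, encoding="utf-8") as handle:
        return find_suspects(json.load(handle))
```

A typical use would be to call scan_file("./fixtures/anthropic-chat.json") in a pre-commit hook and fail the commit if the returned list is not empty.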
SSE Streaming Replay
When the recorded response was an SSE stream (from APIs like Anthropic Claude or OpenAI's streaming mode), the fixture converter automatically:
- Detects the stream via the Content-Type: text/event-stream response header
- Parses the captured SSE body into individual events (event type, data, id, retry)
- Produces an HttpSseResponse action that replays each event with a small inter-event delay (50ms)
On replay, your application receives SSE events one by one, just like from the real API. This is important for testing streaming token rendering, progress indicators, and partial-response handling.
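To illustrate the parsing step, here is a simplified sketch of splitting an SSE body into events with the four fields mentioned above. It follows the basic rules of the Server-Sent Events format (blank line ends an event, lines starting with a colon are comments, multiple data lines are joined with newlines) but is not MockServer's converter, and it skips some details of the specification such as remembering the last event id across events.

```python
import re


def parse_sse(body):
    """Parse an SSE body into a list of dicts with event, data, id and retry."""
    def fresh():
        return {"event": None, "data": [], "id": None, "retry": None}

    events = []
    current = fresh()
    for line in re.split(r"\r\n|\n|\r", body):
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if current["data"]:
                events.append({
                    "event": current["event"] or "message",  # default event type
                    "data": "\n".join(current["data"]),
                    "id": current["id"],
                    "retry": current["retry"],
                })
            current = fresh()
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            current["event"] = value
        elif field == "data":
            current["data"].append(value)
        elif field == "id":
            current["id"] = value
        elif field == "retry" and value.isascii() and value.isdigit():
            current["retry"] = int(value)
    return events


sample = (
    'event: message_start\ndata: {"type":"message_start"}\n\n'
    ': keep-alive\n'
    'event: content_block_delta\ndata: {"delta":"Hel"}\nid: 2\nretry: 3000\n\n'
)
for event in parse_sse(sample):
    print(event)
```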
If the SSE body was truncated during capture (when it exceeded maxStreamingCaptureBytes), the fixture falls back to a static response with a warning header. Increase maxStreamingCaptureBytes to capture longer streams.
Related Pages
- HTTPS & TLS — full details on trusting the MockServer CA certificate
- Getting Started Proxying — general proxy setup and client configuration
- Debugging with AI — using AI assistants to analyse captured traffic via MCP
- Proxying Configuration — all proxy-related configuration properties
- MockServer UI — dashboard overview