Scalability & Latency
average of 1.58ms and p99 of 4ms for 150 parallel clients sending 95,228 requests per second
MockServer is build to support massive scale from a single instance:
- Apache Benchmark tested up to 6,000 parallel clients and shows MockServer has an average of 1.58ms and p99 of 4ms for 150 parallel clients sending 95,228 requests per second
- Locust tested up to 3,000 parallel clients and shows MockServer has a p99 of 4ms for 150 parallel clients
Both performance testing frameworks show similar results up to 2,000 parallel clients at which point Locust reports warnings and the figures are no longer consistent with Apache Benchmark.
The following frameworks & techniques are used to maximise scalability:
- Netty an asynchronous event-driven network application framework to maximise the scalability of HTTP and TLS
- LMAX Disruptor a high performance inter-thread messaging library to maximise the scalability of recording events (i.e. state) and logging
- ScheduledThreadPoolExecutor a thread pool that can scheduled delayed tasks is used to execute delay response to avoid blocking threads
Performance Tests
MockServer has been performance tested using Apache Benchmark and Locust with the following scenario:
- four basic expectations, including method, path and headers
- basic GET request matching third expectation (i.e. three matches are attempted for each request)
During the test MockServer was run on a Java 13 JVM, with the following command:
java -Xmx500m -Dmockserver.logLevel=WARN -Dmockserver.disableLogging=true -jar ~/.m2/repository/org/mock-server/mockserver-netty/6.1.0/mockserver-netty-6.1.0-no-dependencies.jar -serverPort 1080
Note: The benchmark uses -Dmockserver.disableLogging=true to suppress system-out log output and -Dmockserver.logLevel=WARN to reduce diagnostic log entries, minimising I/O and formatting overhead. Request/response log entries are still stored in memory for verification. For sustained high-throughput operation, the most effective tuning is to reduce maxLogEntries — see the troubleshooting section below for guidance.
The following graph shows how the p99 increases as the number of parallel clients increase.
Apache Bench Performance Test Results
Apache Benchmark was executed as follows:
ab -k -n 10000000 -c <parallel clients> http://127.0.0.1:1080/simple
The test results are:
| parallel clients | 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | requests/s | mean | |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 77,122 | 0.13 | |
| 50 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 85,765 | 0.58 | |
| 100 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 92,846 | 1.08 | |
| 150 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 4 | 95,228 | 1.58 | |
| 250 | 3 | 3 | 3 | 3 | 4 | 5 | 7 | 8 | 86,470 | 2.89 | |
| 500 | 6 | 6 | 6 | 6 | 7 | 7 | 8 | 9 | 83,209 | 6.01 | |
| 750 | 9 | 9 | 10 | 10 | 11 | 11 | 12 | 15 | 75,554 | 9.93 | |
| 1000 | 11 | 12 | 13 | 13 | 14 | 16 | 17 | 21 | 75,423 | 13.26 | |
| 2000 | 24 | 24 | 25 | 26 | 27 | 29 | 31 | 35 | 82,191 | 24.33 | |
| 3000 | 37 | 39 | 40 | 40 | 43 | 46 | 51 | 58 | 78,171 | 38.38 | |
| 4000 | 52 | 55 | 57 | 59 | 64 | 70 | 82 | 91 | 73,552 | 54.38 | |
| 5000 | 65 | 67 | 70 | 71 | 75 | 79 | 90 | 102 | 74,065 | 67.51 | |
| 6000 | 80 | 84 | 88 | 90 | 97 | 104 | 122 | 137 | 70,432 | 85.19 |
Locust Performance Test Results
Apache Benchmark was executed as follows:
locust --loglevel=WARNING --headless --only-summary -u <parallel clients> -r 100 -t 180 --host=http://127.0.0.1:1080
The test results are:
| parallel clients | 50% | 66% | 75% | 80% | 90% | 95% | 98% | 99% | 99.90% | 99.99% | requests/s | mean |
| 10 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 5 | 5 | 11 | 0 |
| 50 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | 5 | 50 | 0 |
| 100 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 8 | 100 | 0 |
| 150 | 1 | 1 | 1 | 1 | 2 | 3 | 3 | 4 | 5 | 6 | 149 | 0 |
| 250 | 2 | 3 | 3 | 4 | 5 | 6 | 7 | 8 | 15 | 46 | 245 | 2 |
| 500 | 2 | 2 | 3 | 3 | 4 | 5 | 6 | 7 | 9 | 46 | 479 | 2 |
| 750 | 3 | 4 | 5 | 6 | 8 | 10 | 12 | 14 | 29 | 34 | 699 | 3 |
| 1000 | 3 | 4 | 6 | 6 | 8 | 10 | 13 | 16 | 36 | 52 | 909 | 3 |
| 2000 | 4 | 7 | 10 | 12 | 22 | 34 | 49 | 59 | 87 | 110 | 1626.14 | 8 |
| 3000 | 51 | 78 | 99 | 110 | 160 | 180 | 220 | 240 | 290 | 310 | 2629.92 | 54 |
The following locustfile.py was used
import locust.stats
from locust import task, between
locust.stats.CONSOLE_STATS_INTERVAL_SEC = 60
from locust.contrib.fasthttp import FastHttpLocust
class UserBehavior(FastHttpUser):
wait_time = between(1, 1)
@task
def request(self):
self.client.get("/simple", verify=False)
Clustering MockServer
MockServer supports a very high request throughput, however if a higher request per second rate is required it is possible to cluster MockServer so that all nodes share expectations.
Although expectations are clustered, currently there is no support for clustering the MockServer log therefore request verifications will only work against the node that received the request.
To create a MockServer cluster all instances need to:
- share a read-write file system i.e. same physical / virtual machine, NFS, AWS EFS, Azure Files, etc
- configure identical expectation initialiser and expectation persistence
- bind to a free port i.e. separate ports if on same physical / virtual machine
Each node could be configured as follows (adjusting the port as necessary):
MOCKSERVER_WATCH_INITIALIZATION_JSON=true \
MOCKSERVER_INITIALIZATION_JSON_PATH=mockserverInitialization.json \
MOCKSERVER_PERSIST_EXPECTATIONS=true \
MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=mockserverInitialization.json \
java -jar ~/Downloads/mockserver-netty-6.1.0-no-dependencies.jar -serverPort 1080 -logLevel INFO
or
java \
-Dmockserver.watchInitializationJson=true \
-Dmockserver.initializationJsonPath=mockserverInitialization.json \
-Dmockserver.persistExpectations=true \
-Dmockserver.persistedExpectationsPath=mockserverInitialization.json \
-jar ~/Downloads/mockserver-netty-6.1.0-no-dependencies.jar -serverPort 1080 -logLevel INFO
Memory Tuning Guide
MockServer stores expectations and log entries in memory using ring buffers. The two most important settings that affect memory usage are maxExpectations and maxLogEntries. Each HTTP request processed by MockServer generates 2-3 log entries (the request itself, the expectation match result, and the response).
Both settings have automatic defaults based on available JVM heap space. The table below provides recommended values if you want to override the defaults for different heap sizes:
| JVM Heap Size | maxExpectations | maxLogEntries | Approx HTTP Requests Retained |
|---|---|---|---|
| 256 MB | 1,000 | 5,000 | ~1,500 - 2,500 |
| 512 MB | 2,000 | 15,000 | ~5,000 - 7,500 |
| 1 GB | 5,000 | 40,000 | ~13,000 - 20,000 |
These are conservative estimates assuming typical request/response sizes of 1-5 KB. Large request or response bodies will consume more memory per entry.
Tips for reducing memory usage:
- Reduce maxLogEntries to limit the number of stored request/response log entries
- Reduce maxExpectations if expectations contain large response bodies
- Increase JVM heap size (-Xmx) to give the garbage collector more headroom
- Use outputMemoryUsageCsv to monitor actual heap usage and tune values accordingly
Docker example with memory-constrained settings:
docker run -d --rm -p 1080:1080 \
--env MOCKSERVER_MAX_EXPECTATIONS=1000 \
--env MOCKSERVER_MAX_LOG_ENTRIES=5000 \
mockserver/mockserver
Introduces a delay (in milliseconds) before protocol detection on new TCP connections. This can be used to simulate slow connection establishment, such as when testing client timeout handling or connection pooling behaviour under latency.
Type: long Default: 0
Java Code:
ConfigurationProperties.connectionDelayMillis(long millis)
Configuration.connectionDelay(Delay delay)
System Property:
-Dmockserver.connectionDelayMillis=...
Environment Variable:
MOCKSERVER_CONNECTION_DELAY_MILLIS=...
Property File:
mockserver.connectionDelayMillis=...
Example:
-Dmockserver.connectionDelayMillis="500"
Troubleshooting: MockServer Becomes Slow or Unresponsive
If MockServer appears to freeze, hang, or become progressively slower under sustained load, the most likely cause is memory pressure from log entry accumulation. This section explains why it happens and how to fix it.
Why Does MockServer Slow Down?
Every HTTP request that MockServer processes generates 2-3 log entries that are stored in memory regardless of the configured log level. These entries record the received request, expectation match result, and response — they are always stored to support request verification. Each log entry consumes approximately 4-10 KB of heap for small request/response bodies, scaling proportionally for larger bodies. Under sustained high-throughput load, log entry allocation drives significant GC pressure:
| Request Rate | Response Body Size | Log Data Generated Per Minute |
|---|---|---|
| 1 req/s | 1 KB | ~1 MB |
| 10 req/s | 1 KB | ~12 MB |
| 10 req/s | 100 KB | ~120 MB |
| 1 req/s | 1 MB+ | ~120 MB |
Log entries are stored in a bounded circular queue (maxLogEntries), so total memory usage does not grow indefinitely. However, the constant allocation and eviction of log entries creates GC pressure. When the JVM heap fills up, the garbage collector runs more frequently and for longer, causing pauses that make MockServer appear to freeze. In extreme cases, the JVM may spend almost all of its time in garbage collection, effectively halting request processing.
Large response bodies amplify this problem significantly. A single expectation returning a 10 MB response at 1 request per second generates over 600 MB of log data per minute — far more than a default heap can handle. Even with ring buffer eviction, the JVM must allocate and then garbage-collect these large objects continuously.
Note: Expectations with large response bodies also consume heap proportionally (e.g., a 50 KB response body results in ~55-75 KB per stored expectation). If you have many expectations with large bodies, reduce maxExpectations as well.
How To Fix It
Apply one or more of the following, depending on your use case:
| Fix | When To Use | Trade-off |
|---|---|---|
| Increase JVM heap size | Always recommended for large responses or high request rates | Uses more container/host memory |
| Reduce maxLogEntries | The single most effective fix — fewer entries means less memory and less GC pressure | Fewer requests available for verification |
| Reduce maxExpectations | When expectations contain large response bodies | Fewer expectations can be stored simultaneously |
Note on logLevel and disableLogging: Setting logLevel to WARN reduces diagnostic TRACE/DEBUG log entries but does not prevent request/response recording — the memory-intensive log entries (received requests, matched expectations, and responses) are always stored regardless of log level, as they are required for verification. Similarly, disableLogging only suppresses system-out output and does not reduce memory usage. To reduce memory, lower maxLogEntries or increase heap size.
Recommended Configurations
High-throughput with small responses (e.g., API mocking at >10 req/s with <10 KB bodies):
docker run -d --rm -p 1080:1080 \
-e MOCKSERVER_MAX_LOG_ENTRIES=5000 \
mockserver/mockserver
Large response bodies (e.g., responses >100 KB):
docker run -d --rm -p 1080:1080 \
-e JAVA_TOOL_OPTIONS="-Xmx1g" \
-e MOCKSERVER_MAX_LOG_ENTRIES=1000 \
-e MOCKSERVER_MAX_EXPECTATIONS=100 \
mockserver/mockserver
Maximum throughput, minimal memory (verification limited to most recent requests):
docker run -d --rm -p 1080:1080 \
-e JAVA_TOOL_OPTIONS="-Xmx512m" \
-e MOCKSERVER_MAX_LOG_ENTRIES=100 \
mockserver/mockserver
Configuring JVM Heap Size
The default JVM heap in the MockServer Docker image is determined by the JVM's container-aware defaults (typically 25% of the container's memory limit). To set an explicit heap size, use the JAVA_TOOL_OPTIONS environment variable:
docker run -d --rm -p 1080:1080 \
-e JAVA_TOOL_OPTIONS="-Xmx512m" \
mockserver/mockserver
When running with docker compose:
services:
mockServer:
image: mockserver/mockserver
ports:
- "1080:1080"
environment:
JAVA_TOOL_OPTIONS: "-Xmx512m"
MOCKSERVER_MAX_LOG_ENTRIES: "5000"
When running as a standalone JAR:
java -Xmx512m -jar mockserver-netty.jar -serverPort 1080
Monitoring Memory Usage
To diagnose memory issues, enable CSV memory tracking:
docker run -d --rm -p 1080:1080 \
-e MOCKSERVER_OUTPUT_MEMORY_USAGE_CSV=true \
-e MOCKSERVER_MEMORY_USAGE_CSV_DIRECTORY=/config \
-v $(pwd):/config \
mockserver/mockserver
This creates a memoryUsage_<date>.csv file that records heap usage, log entry count, and expectation count over time. If you see heap usage consistently near the maximum, increase -Xmx or reduce maxLogEntries.
Scalability Configuration:
Number of threads for main event loop
These threads are used for fast non-blocking activities such as:
- reading and de-serialise all requests
- serialising and writing control plane responses
- adding, updating or removing expectations
- verifying requests or request sequences
- retrieving logs
Expectation actions are handled in a separate thread pool to ensure slow object or class callbacks and response / forward delays do not impact the main event loop.
Type: int Default: 5
Java Code:
ConfigurationProperties.nioEventLoopThreadCount(int count)
System Property:
-Dmockserver.nioEventLoopThreadCount=...
Environment Variable:
MOCKSERVER_NIO_EVENT_LOOP_THREAD_COUNT=...
Property File:
mockserver.nioEventLoopThreadCount=...
Example:
-Dmockserver.nioEventLoopThreadCount="5"
Number of threads for the action handler thread pool
These threads are used for handling actions such as:
- serialising and writing expectation or proxied responses
- handling response delays in a non-blocking way (i.e. using a scheduler)
- executing class callbacks
- handling method / closure callbacks (using web sockets)
Type: int Default: maximum of 5 or available processors count
Java Code:
ConfigurationProperties.actionHandlerThreadCount(int count)
System Property:
-Dmockserver.actionHandlerThreadCount=...
Environment Variable:
MOCKSERVER_ACTION_HANDLER_THREAD_COUNT=...
Property File:
mockserver.actionHandlerThreadCount=...
Example:
-Dmockserver.actionHandlerThreadCount="5"
Number of threads for client event loop when calling downstream
These threads are used for fast non-blocking activities such as, reading and de-serialise all requests and responses
Type: int Default: 5
Java Code:
ConfigurationProperties.clientNioEventLoopThreadCount(int count)
System Property:
-Dmockserver.clientNioEventLoopThreadCount=...
Environment Variable:
MOCKSERVER_CLIENT_NIO_EVENT_LOOP_THREAD_COUNT=...
Property File:
mockserver.clientNioEventLoopThreadCount=...
Example:
-Dmockserver.clientNioEventLoopThreadCount="5"
Number of threads for each expectation with a method / closure callback (i.e. web socket client) in the org.mockserver.client.MockServerClient
This setting only effects the Java client and how requests each method / closure callbacks it can handle, the default is 5 which should be suitable except in extreme cases.
Type: int Default: 5
Java Code:
ConfigurationProperties.webSocketClientEventLoopThreadCount(int count)
System Property:
-Dmockserver.webSocketClientEventLoopThreadCount=...
Environment Variable:
MOCKSERVER_WEB_SOCKET_CLIENT_EVENT_LOOP_THREAD_COUNT=...
Property File:
mockserver.webSocketClientEventLoopThreadCount=...
Example:
-Dmockserver.webSocketClientEventLoopThreadCount="5"
Maximum time allowed in milliseconds for any future to wait, for example when waiting for a response over a web socket callback.
Type: long Default: 90000
Java Code:
ConfigurationProperties.maxFutureTimeout(long milliseconds)
System Property:
-Dmockserver.maxFutureTimeout=...
Environment Variable:
MOCKSERVER_MAX_FUTURE_TIMEOUT=...
Property File:
mockserver.maxFutureTimeout=...
Example:
-Dmockserver.maxFutureTimeout="90000"
If true (the default) request matchers will fail on the first non-matching field, if false request matchers will compare all fields.
Set to false when debugging matching issues to see all mismatching fields in a single log entry. See Troubleshooting Matching for a step-by-step guide.
Type: boolean Default: true
Java Code:
ConfigurationProperties.matchersFailFast(boolean enable)
System Property:
-Dmockserver.matchersFailFast=...
Environment Variable:
MOCKSERVER_MATCHERS_FAIL_FAST=...
Property File:
mockserver.matchersFailFast=...
Example:
-Dmockserver.matchersFailFast="false"
The the minimum level of logs to record in the event log and to output to system out (if system out log output is not disabled). The lower the log level the more log entries will be captured, particularly at TRACE level logging.
Type: string Default: INFO
Java Code:
ConfigurationProperties.logLevel(String level)
System Property:
-Dmockserver.logLevel=...
Environment Variable:
MOCKSERVER_LOG_LEVEL=...
Property File:
mockserver.logLevel=...
The log level, which can be TRACE, DEBUG, INFO, WARN, ERROR, OFF, FINEST, FINE, INFO, WARNING, SEVERE
Example:
-Dmockserver.logLevel="DEBUG"
Disable logging to the system output
Type: boolean Default: false
Java Code:
ConfigurationProperties.disableSystemOut(boolean disableSystemOut)
System Property:
-Dmockserver.disableSystemOut=...
Environment Variable:
MOCKSERVER_DISABLE_SYSTEM_OUT=...
Property File:
mockserver.disableSystemOut=...
Example:
-Dmockserver.disableSystemOut="true"
Disable logging output to system out. Request/response log entries are still recorded in memory for verification.
Type: boolean Default: false
Java Code:
ConfigurationProperties.disableLogging(boolean disableLogging)
System Property:
-Dmockserver.disableLogging=...
Environment Variable:
MOCKSERVER_DISABLE_LOGGING=...
Property File:
mockserver.disableLogging=...
Example:
-Dmockserver.disableLogging="true"
Maximum request body size in bytes that conversation-aware LLM matchers will parse. LLM conversation matchers (whenLatestMessageContains, whenContainsToolResultFor, etc.) parse the inbound request body as JSON to extract the message history. For deep conversation histories or large tool results, this parse step is proportional to body size.
Requests whose body exceeds this cap skip conversation-aware matching and are treated as a no-match for conversation predicates (the scenario state machine is unaffected). Increase this value only when your LLM conversations regularly include very large tool results or long message histories. Reduce it in memory-constrained environments to bound the maximum allocation per matching attempt.
Type: int Default: 1048576 (1 MiB) Range: 16384 (16 KiB) — 67108864 (64 MiB)
Java Code:
ConfigurationProperties.maxLlmConversationBodySize(int size)
Configuration.maxLlmConversationBodySize(Integer size)
System Property:
-Dmockserver.maxLlmConversationBodySize=...
Environment Variable:
MOCKSERVER_MAX_LLM_CONVERSATION_BODY_SIZE=...
Property File:
mockserver.maxLlmConversationBodySize=...
Example:
-Dmockserver.maxLlmConversationBodySize="4194304"