First Things First: Coordinated Omission

p99 = 1 ms — flip one switch — p99 = 195 ms

Same service. Same pause pattern. Same nominal target rate. One change in the client model — p99 jumps 182×. Not a system failure. A measurement failure.

Design can lie. The environment can lie. Fix both — the benchmark looks solid, the percentiles look clean. Too clean. The measurement method itself can lie — a systematic omission baked into how the test collects data.

All code in this post: clone, build, run. Numbers below were measured on dual Xeon E5-2697 v2 — run the companion code on your hardware for your own results. Different hardware, different numbers — that’s half the lesson.

Convention: charts use milliseconds; tables reproduce raw simulation output. Histograms are approximate visualizations of the recorded latency distribution — the percentile tables are the authoritative data.

Send, wait, measure, repeat

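// Requires: using System.Diagnostics;
public static LatencyReport Run(SimulatedService service, int ratePerSec, int durationSec)
{
    // ratePerSec only sizes the run; nothing below paces the requests.
    int totalRequests = ratePerSec * durationSec;
    var recorder = new LatencyRecorder();

    for (int i = 0; i < totalRequests; i++)
    {
        long start = Stopwatch.GetTimestamp();   // clock starts at the actual send
        service.Process();                       // blocks until the response arrives
        long elapsed = Stopwatch.GetTimestamp() - start;
        recorder.Record(elapsed);                // value passed in Stopwatch ticks
    }

    return recorder.GetReport();
}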

Closed-loop client — full source in companion code.

Send a request. Wait for the response. Measure the elapsed time. Send the next one. The client and the service take turns — a lockstep conversation where neither moves without the other. This pattern has a name: closed-loop.¹ Most load test frameworks default to it. Most dashboards assume it.

What does your test do when the system slows down?

The comfortable picture

The system under test: a simulated service with ~1 ms baseline latency (calibrated SpinWait) and a 200 ms pause every 500th request — modeling GC, compaction, or any periodic maintenance event. Target rate: 450 req/sec over 30 seconds (13,500 total). Average service time: (499 × 1 ms + 1 × 200 ms) / 500 = 1.4 ms. At 450 req/sec the service needs 630 ms of work per second — ~63% utilization, with headroom to spare. The pauses are the problem, not the capacity.
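The utilization arithmetic above generalizes to other pause patterns. A minimal sketch, assuming a hypothetical helper class `WorkloadMath` that is not part of the companion code:

```csharp
using System;

// Hypothetical helper (not from the companion code): reproduces the
// average-service-time and utilization arithmetic for a periodic pause.
public static class WorkloadMath
{
    // Mean service time in ms when every pauseEvery-th request takes
    // pauseMs and all other requests take baseMs.
    public static double AverageServiceMs(double baseMs, double pauseMs, int pauseEvery)
        => ((pauseEvery - 1) * baseMs + pauseMs) / pauseEvery;

    // Fraction of each second the service spends working at the given rate.
    public static double Utilization(int ratePerSec, double averageServiceMs)
        => ratePerSec * averageServiceMs / 1000.0;
}
```

With the parameters from this post, `WorkloadMath.AverageServiceMs(1.0, 200.0, 500)` gives 1.398 ms, and `WorkloadMath.Utilization(450, 1.398)` gives about 0.629. The text rounds the average to 1.4 ms, which is where the 630 ms figure comes from.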

The closed-loop client has no rate limiter and no inter-request delay: totalRequests is just a count (rate × duration) chosen to match the open-loop run’s output volume. The effective rate is whatever the service delivers. During normal processing (~1 ms per request) that is roughly 1,000 req/sec, well above the nominal 450. During a 200 ms pause it is zero. The arrival rate follows the system. When the system slows, the test slows with it.

| Metric | Closed-loop  |
|--------|-------------:|
| Count  |       13,500 |
| p50    |      1.00 ms |
| p90    |      1.00 ms |
| p99    |      1.07 ms |
| p99.9  |    200.15 ms |
| max    |    200.28 ms |

The dashboard looks clean. 99th percentile: 1 ms. Only p99.9 shows any trouble — and that’s 27 requests out of 13,500, the ones that directly hit a pause. Every other request: ~1 ms, tight distribution, no tail. You read the numbers and move on.
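Why p99 stays clean while p99.9 does not: 27 slow requests out of 13,500 occupy only the top 0.2% of the sorted sample. The sketch below shows this with a nearest-rank percentile over an idealized sample (exactly 1 ms or 200 ms per request). Both the class name and the percentile method are assumptions for illustration; the post does not show how `LatencyRecorder` computes its percentiles.

```csharp
using System;
using System.Linq;

// Hypothetical illustration, not the companion code's LatencyRecorder.
public static class PercentileSketch
{
    // Nearest-rank percentile over an ascending-sorted array.
    public static double NearestRank(double[] sortedAscending, double percentile)
    {
        int rank = (int)Math.Ceiling(percentile / 100.0 * sortedAscending.Length);
        int index = Math.Clamp(rank - 1, 0, sortedAscending.Length - 1);
        return sortedAscending[index];
    }

    // Idealized closed-loop sample: every pauseEvery-th request takes
    // pauseMs, every other request takes baseMs. Returned sorted ascending.
    public static double[] IdealClosedLoopSample(int total, int pauseEvery, double baseMs, double pauseMs)
    {
        return Enumerable.Range(1, total)
            .Select(i => i % pauseEvery == 0 ? pauseMs : baseMs)
            .OrderBy(ms => ms)
            .ToArray();
    }
}
```

For `IdealClosedLoopSample(13500, 500, 1.0, 200.0)`, `NearestRank(sample, 99)` returns 1.0 and `NearestRank(sample, 99.9)` returns 200.0. The slow requests only become visible above the 99.8th percentile.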

The dashboard maps what the test recorded — not what users experienced.

Hume (1739): no finite set of observations guarantees the next. A thousand closed-loop measurements say p99 = 1 ms. The thousand-and-first doesn’t have to agree. Induction from data that systematically omits the worst moments is induction from a sample that excludes its own counterexamples.

Flip one switch

Same service. Same pause injector. Same nominal target rate. One change: the client sends on a fixed schedule, regardless of whether the previous request came back.

public static LatencyReport Run(SimulatedService service, int ratePerSec, int durationSec)
{
    var recorder = new LatencyRecorder();
    long intervalTicks = Stopwatch.Frequency / ratePerSec;
    long deadline = Stopwatch.GetTimestamp() + (long)durationSec * Stopwatch.Frequency;
    long nextSend = Stopwatch.GetTimestamp();

    while (Stopwatch.GetTimestamp() < deadline)
    {
        long intendedStart = nextSend;
        nextSend += intervalTicks;

        service.Process();

        long now = Stopwatch.GetTimestamp();
        long latency = now - intendedStart;  // ← intended, not actual
        recorder.Record(latency);

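        // Wait for the next slot. When behind schedule this exits immediately,
        // so queued requests run back-to-back until the schedule recovers.
        while (Stopwatch.GetTimestamp() < nextSend)
            Thread.SpinWait(10);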
    }

    return recorder.GetReport();
}

Open-loop client — full source in companion code. Note: intervalTicks uses integer division, introducing sub-microsecond step quantization at 450 req/sec — negligible for this demonstration.

The line that matters: now - intendedStart instead of now - actualStart. The loop also gained a schedule (nextSend), but the numbers change because the clock now starts at the intended send time. The user’s clock starts when they click, not when the server gets around to processing their request. When the service pauses, requests that should have been sent during the pause pile up, each measured from when it was supposed to start, because that’s when the user started waiting.

Bimodal. A peak at ~1 ms and a wide spread from 50–200 ms. Two different experiences on the same chart.

| Metric | Closed-loop  |    Open-loop |     Ratio |
|--------|-------------:|-------------:|----------:|
| Count  |       13,500 |       13,500 |           |
| p50    |      1.00 ms |      1.00 ms |      1.0x |
| p90    |      1.00 ms |    137.89 ms |    137.9x |
| p99    |      1.07 ms |    194.64 ms |    182.4x |
| p99.9  |    200.15 ms |    200.15 ms |      1.0x |
| max    |    200.28 ms |    200.41 ms |      1.0x |

Ratios computed from raw data before rounding to displayed precision.

Same system. Same load. Same pause. One variable: whether the test waits for a response before sending the next request.

Closed-loop p99 = 1 ms. Open-loop p99 = 195 ms. 182× on this workload.

The mechanism — coordinated omission

During a 200 ms pause, the closed-loop client waits. While waiting, it sends no new requests: it coordinates with the system, slowing down exactly when the system slows down. At the nominal rate, 200 ms × 450 req/sec = 90 requests should have been sent but weren’t. They don’t appear in the histogram. They don’t exist in the data. The dashboard stays clean.

The open-loop client doesn’t coordinate. It tracks what the schedule should have been. After the pause resolves:

Request N+1: intended at T+2 ms, completed at T+201 ms → latency = 199 ms
Request N+2: intended at T+4 ms, completed at T+202 ms → latency = 198 ms
Request N+3: intended at T+7 ms, completed at T+203 ms → latency = 196 ms
…catch-up continues for ~160 requests until the schedule recovers

Each pause contaminates ~160 subsequent requests with elevated latency. 27 pauses × ~160 requests = ~4,300 requests — roughly a third of all traffic — experiencing latency between 2 ms and 200 ms. That’s why the open-loop p90 is 138 ms: the top 10% of requests (1,350 out of 13,500) fall squarely in that contaminated range.
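The catch-up effect can be reproduced without a real clock. The following is a virtual-time sketch under idealized assumptions (exactly 1 ms service time, exactly 200 ms pauses, no jitter); `VirtualTimeModel` is a hypothetical name and not part of the companion code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Virtual-time model (no real clock). All times are in milliseconds.
public static class VirtualTimeModel
{
    // Service time of the i-th request (1-based): every pauseEvery-th one stalls.
    private static double ServiceMs(int i, int pauseEvery, double baseMs, double pauseMs)
        => i % pauseEvery == 0 ? pauseMs : baseMs;

    // Closed loop: the recorded latency is just each request's service time.
    public static List<double> ClosedLoop(int total, int pauseEvery, double baseMs, double pauseMs)
        => Enumerable.Range(1, total)
            .Select(i => ServiceMs(i, pauseEvery, baseMs, pauseMs))
            .ToList();

    // Open loop, single-threaded like the client above: a request starts at
    // its intended time or when the previous one finishes, whichever is
    // later, and latency is measured from the intended time.
    public static List<double> OpenLoop(int total, int ratePerSec, int pauseEvery, double baseMs, double pauseMs)
    {
        double intervalMs = 1000.0 / ratePerSec;
        double clock = 0.0; // time at which the client becomes free
        var latencies = new List<double>(total);
        for (int i = 1; i <= total; i++)
        {
            double intended = (i - 1) * intervalMs;
            double start = Math.Max(clock, intended);
            clock = start + ServiceMs(i, pauseEvery, baseMs, pauseMs);
            latencies.Add(clock - intended);
        }
        return latencies;
    }

    // Number of requests whose latency exceeds the threshold.
    public static int CountAbove(List<double> latencies, double thresholdMs)
        => latencies.Count(ms => ms > thresholdMs);
}
```

With the parameters from this post (13,500 requests, 450 req/sec, a 200 ms pause every 500th request), `CountAbove(ClosedLoop(...), 1.5)` reports 27 slow requests, while `CountAbove(OpenLoop(...), 1.5)` reports a little over 4,200. That is about 163 per pause in this idealized model; the last pause falls on the final request, so nothing queues behind it.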

The closed-loop client sees 27 bad requests. The open-loop client sees 4,300. Same service. Same pauses.

The worse the failure, the more requests the closed-loop client skips, and the cleaner the dashboard looks. What the test records shrinks as the problem grows. At 450 req/sec, a 200 ms pause omits 90 measurements. A 2-second pause omits 900. A 10-second stop-the-world GC omits 4,500. The worst event your system can produce is the one your test is least likely to record.
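The scaling is plain multiplication, sketched here with a hypothetical helper (`OmissionMath` is an assumed name, not from the companion code) so you can plug in your own rate and pause length:

```csharp
using System;

public static class OmissionMath
{
    // Requests a fixed-rate schedule would have issued during a stall of the
    // given length. A closed-loop client issues none of them.
    public static long OmittedRequests(TimeSpan pause, int ratePerSec)
        => (long)Math.Round(pause.TotalSeconds * ratePerSec);
}
```

`OmissionMath.OmittedRequests(TimeSpan.FromMilliseconds(200), 450)` returns 90; the same call with `TimeSpan.FromSeconds(2)` and `TimeSpan.FromSeconds(10)` returns 900 and 4500.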

Gil Tene named this Coordinated Omission — the test coordinates with the system’s failures, omitting measurements precisely when they would be most damning.²

Baudrillard (1981): the third phase of the simulacrum — the image masks the absence of reality. The closed-loop benchmark doesn’t distort measurements. It masks their nonexistence. Those 90 requests during the pause aren’t poorly measured. They don’t exist. The dashboard is a simulacrum — it doesn’t lie about the system. It replaces it.

How to stop coordinating

| Property | Closed-loop | Open-loop |
|----------|-------------|-----------|
| Request timing | After previous response | Fixed schedule, independent of response |
| What it measures | Response time of sent requests (omits unsent) | Response time from intended start (incl. queuing) |
| During a pause | Stops sending → omits measurements | Tracks intended schedule → captures queuing |
| p99 under pauses | Looks clean (only direct hits visible) | Shows full impact (queued requests visible) |
| Best for | Throughput measurement, saturation testing | Latency measurement, SLA validation |

Four rules for latency measurement:

1. Open-loop by default for latency load tests. Closed-loop is still useful for throughput and saturation testing, that is, for finding the breaking point. But if your SLAs are latency percentiles, you need open-loop. Closed-loop tells you the system can handle the load; open-loop tells you what users experience while it does.¹
2. Measure from intended time, not actual time. latency = now - intendedStart, not now - actualStart. The user’s clock starts when they click, not when the server gets around to reading their request.
3. Record the full tail. p50 and p99 are not enough. Report p99.9 and max. Coordinated omission hides in the gap between p99 and p99.9: the range where closed-loop sees nothing and open-loop sees the damage.
4. Use histograms that can handle it. HdrHistogram³ records values across a wide dynamic range with configurable precision, from sub-millisecond to multi-second latencies in the same histogram. Fixed-bucket histograms clip the tail.

Tools that get it right

| Tool | Open-loop | CO correction | Notes |
|------|-----------|---------------|-------|
| wrk2⁴ | Yes | Built-in | Constant-rate HTTP benchmark, HdrHistogram output |
| Gatling | Yes | Configurable | Open-loop mode available, reports percentiles |
| k6 | Partial | Manual | Constant-rate via scenarios, no auto-correction |
| Custom (this post) | Yes | By design | `intendedStart` tracking, HdrHistogram.NET |

Capabilities and defaults vary by tool version and configuration; verify settings in your release.
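For readers unfamiliar with the "CO correction" column: one common form is backfilling. When a recorded value exceeds the expected interval between requests, the recorder also adds the latencies that evenly spaced requests arriving during the stall would have seen. The sketch below illustrates that idea in plain C#, similar in spirit to the expected-interval recording that HdrHistogram offers. `CoCorrectionSketch` is a hypothetical name, values are assumed to be in microseconds, and this is not the code any of the listed tools actually run.

```csharp
using System;
using System.Collections.Generic;

public static class CoCorrectionSketch
{
    // For each recorded value longer than the expected interval, also emit
    // value - interval, value - 2*interval, ... for as long as the result
    // is still at least one interval.
    public static List<long> Backfill(IEnumerable<long> recordedMicros, long expectedIntervalMicros)
    {
        if (expectedIntervalMicros <= 0)
            throw new ArgumentOutOfRangeException(nameof(expectedIntervalMicros));

        var corrected = new List<long>();
        foreach (long value in recordedMicros)
        {
            corrected.Add(value); // the measurement that was actually taken
            for (long missing = value - expectedIntervalMicros;
                 missing >= expectedIntervalMicros;
                 missing -= expectedIntervalMicros)
            {
                corrected.Add(missing); // synthetic sample for an omitted request
            }
        }
        return corrected;
    }
}
```

`Backfill(new long[] { 1000, 200000 }, 2222)` leaves the 1 ms sample alone and expands the 200 ms sample into 90 values (the original plus 89 synthetic ones), matching the 90 requests a 450 req/sec schedule would have issued during the pause. The correction is an estimate that assumes a constant arrival rate; measuring from the intended start time avoids the need for it.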

Run it yourself

git clone https://github.com/0x3f-blog/companion-code.git
cd companion-code/first-things-first/coordinated-omission
dotnet run -c Release

Benchmark environment

| Component | Value |
|-----------|-------|
| CPU | 2× Intel Xeon E5-2697 v2 @ 2.70 GHz (24 cores / 48 threads) |
| RAM | ~115 GB DDR3-1866 (quad-channel per socket) |
| OS | Fedora Linux 42 (kernel 6.17) |
| Runtime | .NET 9.0.11 (RyuJIT AVX) |
| SDK | .NET SDK 10.0.102 |
| HdrHistogram | HdrHistogram.NET 2.5.0 |
| Simulation | 450 req/sec, 30 sec, 200 ms pause every 500 requests |

Not BenchmarkDotNet — this is a custom in-process simulation. SpinWait calibrated at startup for ~1 ms baseline on current hardware (binary search, 50 samples, median). Fresh SimulatedService instance per client — no counter contamination.

Limitations: In-process simulation — no HTTP, no network stack, no kernel-level queuing. The open-loop client is single-threaded and blocks on Process(), so it tracks the intended schedule rather than dispatching concurrently (a real open-loop system like wrk2 or Gatling sends requests asynchronously). These simplifications isolate the coordinated omission mechanism from transport noise — the measurement effect is the same, but absolute numbers would differ in a networked setup.
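For contrast with the single-threaded client, here is a rough sketch of what asynchronous open-loop dispatch could look like. It assumes a hypothetical `sendAsync` delegate standing in for a real non-blocking call (HTTP, RPC); it is not how wrk2 or Gatling are implemented, and `AsyncOpenLoopSketch` is not part of the companion code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncOpenLoopSketch
{
    // Dispatches `count` requests on a fixed schedule without waiting for
    // earlier ones to finish. Returns latencies in Stopwatch ticks, each
    // measured from the request's intended start.
    // Caveat: if sendAsync blocks synchronously before its first await,
    // the dispatcher stalls with it.
    public static async Task<long[]> RunAsync(Func<Task> sendAsync, int ratePerSec, int count)
    {
        long intervalTicks = Stopwatch.Frequency / ratePerSec;
        long firstSend = Stopwatch.GetTimestamp();
        var inFlight = new Task<long>[count];

        for (int i = 0; i < count; i++)
        {
            long intendedStart = firstSend + i * intervalTicks;
            while (Stopwatch.GetTimestamp() < intendedStart)
                Thread.SpinWait(10);

            // Fire and keep going: the schedule does not depend on the response.
            inFlight[i] = MeasureAsync(sendAsync, intendedStart);
        }

        return await Task.WhenAll(inFlight);
    }

    private static async Task<long> MeasureAsync(Func<Task> sendAsync, long intendedStart)
    {
        await sendAsync().ConfigureAwait(false);
        return Stopwatch.GetTimestamp() - intendedStart;
    }
}
```

A call such as `await AsyncOpenLoopSketch.RunAsync(() => Task.Delay(5), 1000, 20)` issues 20 requests one millisecond apart while earlier ones are still in flight. Here a slow response delays only itself, whereas in the single-threaded client it delays everything queued behind it.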

Popper (1934): a meaningful test must be capable of producing a negative result. The closed-loop client cannot falsify the hypothesis “the system is healthy” — it hides the counterexamples. Measurements that would disprove it don’t exist. Open-loop is the falsification instrument: it doesn’t ask the system whether it’s ready. It measures regardless.

Each layer of deception sits closer to you. Design — visible in the code. Environment — visible in the configuration. The method of collection — buried in an assumption you never questioned. Data collected correctly. But what do the data mean?

A metric that looks better the worse the system performs isn’t a metric. It’s anesthesia.
