Skip to content

Commit f90c53b

Browse files
mfleaderclaude
andcommitted
feat(benchmarking): add API latency comparison experiment pipeline
Adds a data collection pipeline under benchmarking/api_latency_comparison/ for comparing per-request API latency between two OGX versions. The orchestrator sets up git worktrees for each version, generates a randomized complete block design experiment matrix, starts servers with CPU pinning via mirakuru, runs Locust against each version, and records per-request response times. A third "comparison control" group runs the same code as comparison to catch false positives from environmental noise. First of two commits. Follow-up adds model fitting and a CI workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Matthew F Leader <mleader@redhat.com>
1 parent 16c0ad8 commit f90c53b

14 files changed

Lines changed: 1656 additions & 3 deletions

benchmarking/__init__.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
# Copyright (c) The OGX Contributors.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the terms described in the LICENSE file in
5+
# the root directory of this source tree.
Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# API Latency Comparison Benchmark
2+
3+
Measures per-request latency of two OGX versions under a controlled
4+
agentic workload. Compares an older release against a newer commit by
5+
running both through a mocked agentic workload and recording
6+
per-request response times.
7+
8+
Analysis and model fitting are added in a follow-up PR.
9+
10+
## Overview
11+
12+
The experiment is a randomized complete block design with three trials
13+
(treatment combinations). Each trial is replicated multiple times, and
14+
each replicate is a **run**: one row in the design matrix. The design
15+
matrix generator randomizes run order to guard against temporal
16+
confounding.
17+
18+
The three trials are:
19+
20+
- **Baseline**: the older version (e.g., latest release tag)
21+
- **Comparison**: the newer version under test
22+
- **Comparison control**: same commit as comparison, run independently as
23+
a negative control for false positive detection
24+
25+
Each run starts a fresh OGX server against a mock backend, sends
26+
agentic requests (with web_search tool calls) via Locust for a fixed
27+
duration, and records per-request latencies. The false positive
28+
detection runs the negative control (same code as comparison, run
29+
independently) to verify the experiment isn't producing spurious
30+
differences.
31+
32+
Components:
33+
34+
- **Mock server** (`experiment/mock_server.py`): canned OpenAI + Brave Search responses
35+
- **Locust** (`experiment/locustfile_responses.py`): load generator, 1 concurrent user
36+
- **Experiment orchestrator** (`experiment/benchmark.py`): run execution with CPU pinning
37+
- **Worktree setup** (`experiment/setup-worktree.sh`): isolated git worktrees per version
38+
- **Design matrix** (`experiment/generate_design_matrix.py`): randomized experiment design
39+
40+
## Prerequisites
41+
42+
```bash
43+
# Benchmark experiment dependencies (Locust, mirakuru)
44+
uv sync --group api-latency-comparison
45+
```
46+
47+
## Quick Start
48+
49+
The orchestrator handles worktree setup, matrix generation, and
50+
experiment execution in one command:
51+
52+
```bash
53+
uv run python -m benchmarking.api_latency_comparison.experiment.benchmark \
54+
--baseline-ref v1.1.0 --comparison-ref HEAD --replicates 5
55+
```
56+
57+
Output lands in an auto-timestamped directory under `results/`.
58+
59+
## Configuration
60+
61+
| Environment variable | Default | Description |
62+
|---|---|---|
63+
| `RESULTS_DIR` | auto-timestamped | Where to write results |
64+
| `MATRIX_CSV` | `$RESULTS_DIR/experiment-matrix.csv` | Experiment matrix |
65+
| `RUN_DURATION` | 10 | Seconds per run |
66+
| `MOCK_PORT` | 8080 | Mock server port |
67+
| `STACK_PORT` | 8321 | OGX server port |
68+
| `CPU_OGX` | 0 | Core for OGX server |
69+
| `CPU_LOCUST` | 1 | Core for Locust |
70+
| `CPU_MOCK` | 2 | Core for mock server |
71+
72+
## Implementation Notes
73+
74+
**CPU pinning**: Processes are pinned via `os.sched_setaffinity()` in
75+
`preexec_fn` callbacks, applied at fork before exec. Pinning is verified
76+
per run via `os.sched_getaffinity(pid)` after each server start.
77+
78+
**Brave Search patching**: Older OGX versions don't have the `base_url`
79+
field on `BraveSearchToolConfig`. The setup script patches it via `sed`
80+
so the mock server can serve search results locally.
Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
version: 2
2+
distro_name: vertical-scaling-responses-agentic
3+
apis:
4+
- inference
5+
- responses
6+
- vector_io
7+
- tool_runtime
8+
- files
9+
providers:
10+
inference:
11+
- provider_id: openai
12+
provider_type: remote::openai
13+
config:
14+
api_key: ${env.OPENAI_API_KEY:=fake-token}
15+
base_url: ${env.OPENAI_BASE_URL:=http://localhost:8080/v1}
16+
responses:
17+
- provider_id: builtin
18+
provider_type: inline::builtin
19+
config:
20+
persistence:
21+
agent_state:
22+
namespace: agents
23+
backend: kv_default
24+
responses:
25+
table_name: responses
26+
backend: sql_default
27+
vector_io:
28+
- provider_id: faiss
29+
provider_type: inline::faiss
30+
config:
31+
kvstore:
32+
namespace: vector_io::faiss
33+
backend: kv_default
34+
persistence:
35+
namespace: vector_io::faiss_persistence
36+
backend: kv_default
37+
tool_runtime:
38+
- provider_id: brave-search
39+
provider_type: remote::brave-search
40+
config:
41+
api_key: "fake-benchmark-key"
42+
max_results: 1
43+
base_url: ${env.MOCK_SEARCH_URL:=http://localhost:8080}
44+
files:
45+
- provider_id: builtin-files
46+
provider_type: inline::localfs
47+
config:
48+
storage_dir: /tmp/ogx-benchmark/files
49+
metadata_store:
50+
table_name: files_metadata
51+
backend: sql_default
52+
storage:
53+
backends:
54+
kv_default:
55+
type: kv_sqlite
56+
db_path: /tmp/ogx-benchmark/kvstore.db
57+
sql_default:
58+
type: sql_sqlite
59+
db_path: /tmp/ogx-benchmark/sql_store.db
60+
stores:
61+
metadata:
62+
namespace: registry
63+
backend: kv_default
64+
registered_resources:
65+
models:
66+
- model_id: mock-model
67+
provider_id: openai
68+
model_type: llm
69+
server:
70+
port: 8321
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
# Copyright (c) The OGX Contributors.
2+
# All rights reserved.
3+
#
4+
# This source code is licensed under the terms described in the LICENSE file in
5+
# the root directory of this source tree.

0 commit comments

Comments
 (0)