Problem
The proxy exposes a `GET /health` endpoint that returns basic status information:

```json
{
  "status": "ok",
  "backends_configured": 3,
  "backends_connected": 2,
  "active_clients": 1,
  "tools": 42,
  "version": "0.4.3"
}
```
This is sufficient for liveness checks, but provides no observability into the proxy's runtime behavior over time. In a production Kubernetes deployment, operators need to answer questions like:
Questions that cannot be answered today
- Throughput — How many tool calls per second is the proxy handling? What's the breakdown by backend? By tool? By identity?
- Latency — What's the p50/p95/p99 latency for tool calls? Which backends are slow? Are latencies increasing over time?
- Error rates — What percentage of tool calls are failing? Is the error rate spiking? Which backends have the highest error rates?
- Backend pool health — How often are backends being connected/disconnected by the idle reaper? How long do backend connections live? How often does lazy reconnection happen?
- Connection management — How many SSE/Streamable HTTP sessions are active? What's the connection churn rate? How often do TCP keepalive timeouts fire?
- Resource usage — How large is the in-memory tool cache? How many child processes are running (for stdio backends)?
- Auth/ACL — How many requests are rejected by ACL? What's the breakdown by identity and denied tool?
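Each of these questions maps onto labeled time series. As a rough sketch (the metric and label names below are hypothetical; the proxy does not emit any of these today), a scrape could return something like:

```text
# HELP mcp_tool_calls_total Tool calls handled, by backend, tool, and outcome.
# TYPE mcp_tool_calls_total counter
mcp_tool_calls_total{backend="github",tool="search_issues",status="ok"} 1432
mcp_tool_calls_total{backend="github",tool="search_issues",status="error"} 7

# HELP mcp_backends_connected Backends currently connected in the pool.
# TYPE mcp_backends_connected gauge
mcp_backends_connected 2
```

With series like these, standard PromQL answers the questions above, e.g. `rate(mcp_tool_calls_total[5m])` for throughput and error-rate breakdowns, or `histogram_quantile(0.95, ...)` over a duration histogram for tail latency.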
Why this matters
The proxy is designed to be shared infrastructure — a single proxy serving multiple AI clients. The `/health` endpoint is a point-in-time snapshot with no history, no aggregation, and no percentiles. Operators relying on it can only tell "is it up right now?" but not "is it degrading?" or "should I scale?"
The standard solution in the Kubernetes ecosystem is a Prometheus-compatible `/metrics` endpoint exposing counters, gauges, and histograms that are scraped by Prometheus/Grafana, Datadog, or similar monitoring stacks.
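To make the histogram piece concrete, here is a minimal, self-contained sketch of how a Prometheus-style latency histogram works: cumulative `le` buckets plus `_sum` and `_count`, rendered in the text exposition format. This is illustrative only; it uses std atomics rather than the `prometheus` crate, and the metric name is invented, not something the proxy defines.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Prometheus-style histogram: per-bucket counters plus _sum and _count.
/// Illustrative sketch only; a real implementation would come from a crate.
struct LatencyHistogram {
    bounds: Vec<f64>,        // bucket upper bounds in seconds, ascending
    buckets: Vec<AtomicU64>, // one counter per bound, plus an implicit +Inf
    sum_micros: AtomicU64,   // total observed time, in microseconds
    count: AtomicU64,        // total number of observations
}

impl LatencyHistogram {
    fn new(bounds: Vec<f64>) -> Self {
        let buckets = (0..bounds.len() + 1).map(|_| AtomicU64::new(0)).collect();
        Self { bounds, buckets, sum_micros: AtomicU64::new(0), count: AtomicU64::new(0) }
    }

    /// Record one observation (e.g. a tool-call duration) in seconds.
    fn observe(&self, secs: f64) {
        // First bucket whose upper bound covers the value; else +Inf.
        let idx = self.bounds.iter().position(|&b| secs <= b).unwrap_or(self.bounds.len());
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
        self.sum_micros.fetch_add((secs * 1e6) as u64, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Render in the Prometheus text exposition format; bucket counts
    /// are cumulative, as the format requires.
    fn render(&self, name: &str) -> String {
        let mut out = String::new();
        let mut cumulative = 0u64;
        for (i, bound) in self.bounds.iter().enumerate() {
            cumulative += self.buckets[i].load(Ordering::Relaxed);
            out.push_str(&format!("{name}_bucket{{le=\"{bound}\"}} {cumulative}\n"));
        }
        cumulative += self.buckets[self.bounds.len()].load(Ordering::Relaxed);
        out.push_str(&format!("{name}_bucket{{le=\"+Inf\"}} {cumulative}\n"));
        out.push_str(&format!("{name}_sum {}\n",
            self.sum_micros.load(Ordering::Relaxed) as f64 / 1e6));
        out.push_str(&format!("{name}_count {}\n", self.count.load(Ordering::Relaxed)));
        out
    }
}
```

In practice a crate such as `prometheus` or `metrics` provides this machinery; the point is that the proxy only needs to increment a few atomics on each tool call and render them on scrape, so the overhead is negligible.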
Data sources already in the code
The proxy already tracks much of this data internally but doesn't expose it:
- Audit log — records every tool call with duration, success, identity, server, and tool name (`src/audit.rs`)
- Backend pool — tracks connected/configured/idle backends with timestamps (`src/serve.rs`)
- Active clients — maintained as a counter in `AppState` (`src/serve.rs`)
- Tool cache — holds all discovered tools with their server associations (`src/serve.rs`)
- ACL decisions — grant/deny/classify happen on every request (`src/server_auth/`)
- Idle reaper — runs every 30s and logs disconnections (`src/serve.rs`)
The data exists — it's just not exposed in a scrapeable format.
Related issues
Expected behavior
The proxy should expose runtime metrics in a format consumable by standard monitoring infrastructure (Prometheus, OpenTelemetry, or similar), covering at minimum: request throughput, latency distribution, error rates, and backend pool status.