A machine learning inference API built with Go for dynamically serving ONNX models with Prometheus metrics and structured logging.
- Dynamic ONNX Model Upload: POST /models/upload accepts arbitrary ONNX files at runtime with automatic metadata extraction
- Pure-Go Protobuf Parser: Extracts model metadata (inputs, outputs, dtypes, shapes) without Python dependencies using raw protowire decoding
- ONNX Runtime Inference: Real-time predictions with full dtype support (float32, float64, int32, int64)
- Thread-Safe Model Registry: Concurrent-safe model storage with sync.RWMutex
- Prometheus Metrics: HTTP request counts/latency, prediction counts, error rates, models loaded
- Structured Logging: JSON logs with request ID tracing via slog
- Request IDs: Auto-generated UUID per request, propagated to all logs and response headers
- Graceful Shutdown: Drains in-flight requests with 15s timeout on SIGTERM
- Docker Support: Multi-stage builds with CGO for ONNX Runtime
- Railway Deployment: Live production deployment
- Health Checks: /health endpoint for monitoring
- CORS Enabled: Ready for web frontends
Built using Clean Architecture principles with clear separation of concerns:
┌─────────────────────────────────────────────────────────┐
│ HTTP Request │
├─────────────────────────────────────────────────────────┤
│ Middleware Stack (execution order): │
│ 1. Metrics (outermost) → 2. RequestID → 3. CORS │
├─────────────────────────────────────────────────────────┤
│ Handler Layer → Request validation & response │
│ Service Layer → Business logic (PredictionService) │
│ Repository Layer → Model registry (in-memory map) │
│ ONNX Predictor → Inference via ONNX Runtime │
├─────────────────────────────────────────────────────────┤
│ JSON Response │
└─────────────────────────────────────────────────────────┘
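The middleware order in the diagram can be illustrated with a small standard-library sketch. The names `chain`, `tag`, and `executionOrder` are hypothetical and are not taken from the project; the placeholder middleware only records its name where the real metrics, request-ID, and CORS logic would run.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with extra behavior.
type Middleware func(http.Handler) http.Handler

// chain applies middlewares so that the first one listed is the outermost:
// it runs first on the way in and last on the way out.
func chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a placeholder middleware that records its name when invoked.
// Real metrics, request-ID, and CORS middleware would do their work here.
func tag(name string, trace *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// executionOrder sends one in-memory request through the stack and
// reports the order in which each layer ran.
func executionOrder() string {
	var trace []string
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "handler")
	})
	stack := chain(handler,
		tag("metrics", &trace),
		tag("request_id", &trace),
		tag("cors", &trace),
	)
	stack.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/health", nil))
	return strings.Join(trace, " -> ")
}

func main() {
	fmt.Println(executionOrder()) // metrics -> request_id -> cors -> handler
}
```

Keeping metrics outermost means the latency histogram covers the time spent in every inner layer, which is presumably why the diagram lists it first.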
Key Patterns:
- Repository Pattern for model management
- Dependency Injection throughout
- Interface-based design for testability
- Sidecar .model_info.json files for metadata persistence
- Language: Go 1.24
- ML Runtime: ONNX Runtime (CGO)
- Protobuf Parsing: google.golang.org/protobuf/encoding/protowire (pure-Go, no Python)
- Observability: Prometheus, structured logging (slog)
- Deployment: Docker, Railway
- Go 1.24 or higher
- ONNX Runtime library (see installation below)
Linux:
wget https://github.com/microsoft/onnxruntime/releases/download/v1.24.1/onnxruntime-linux-x64-1.24.1.tgz
tar -xzf onnxruntime-linux-x64-1.24.1.tgz
sudo cp onnxruntime-linux-x64-1.24.1/lib/libonnxruntime.so.1.24.1 /usr/lib/libonnxruntime.so
macOS:
brew install onnxruntime
Windows:
- Download onnxruntime-win-x64-1.24.1.zip from releases
- Extract to C:\onnxruntime\
- Set ONNX_LIBRARY_PATH=C:\onnxruntime\lib\onnxruntime.dll
git clone https://github.com/kevo-1/model-nexus.git
cd model-nexus
go mod download
Create a .env file (optional):
ONNX_LIBRARY_PATH=/usr/lib/libonnxruntime.so
PORT=8080
# Build
go build -o server ./cmd/server
# Run
./server
The server starts on http://localhost:8080 with dynamic model upload enabled.
POST /models/upload
Upload an ONNX model file for dynamic serving.
Request: multipart/form-data
- file: .onnx model file (required)
- id: Unique model identifier (required)
- name: Human-readable model name (required)
- version: Model version string (required)
Response (201 Created):
{
"model": {
"id": "my_classifier",
"name": "My Custom Classifier",
"version": "v1.0.0",
"path": "models/my_classifier.onnx"
},
"info": {
"inputs": [
{
"name": "input",
"dtype": 1,
"shape": [1, 4]
}
],
"outputs": [
{
"name": "output",
"dtype": 7,
"shape": [1, 3]
}
]
}
}
Error Responses:
- 400 Bad Request: Missing fields or invalid file
- 409 Conflict: Model ID already registered
- 500 Internal Server Error: Failed to parse or load model
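The numeric dtype values in the upload response (1 for the input, 7 for the output) look like ONNX TensorProto.DataType codes, where 1 is FLOAT and 7 is INT64. Assuming that is what the API returns, a client could translate them with a small lookup like the one below. The map covers only the four dtypes the feature list mentions, and `dtypeName` is a hypothetical helper, not part of the project.

```go
package main

import "fmt"

// onnxDTypeNames maps a subset of ONNX TensorProto.DataType codes
// to Go-style type names. Only the dtypes named in the feature list
// are included here.
var onnxDTypeNames = map[int]string{
	1:  "float32", // FLOAT
	6:  "int32",   // INT32
	7:  "int64",   // INT64
	11: "float64", // DOUBLE
}

// dtypeName returns a readable name for a dtype code, or a marker
// string for codes outside the subset above.
func dtypeName(code int) string {
	if name, ok := onnxDTypeNames[code]; ok {
		return name
	}
	return fmt.Sprintf("unknown(%d)", code)
}

func main() {
	fmt.Println(dtypeName(1), dtypeName(7), dtypeName(99)) // float32 int64 unknown(99)
}
```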
Example:
curl -X POST http://localhost:8080/models/upload \
-F "file=@my_model.onnx" \
-F "id=my_classifier" \
-F "name=My Classifier" \
-F "version=v1.0.0"
POST /predict
Make a prediction using a registered model.
Request:
{
"model_id": "my_classifier",
"features": [5.1, 3.5, 1.4, 0.2]
}
Response (200 OK):
{
"model_id": "my_classifier",
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"prediction": [0.0],
"latency_ms": 12.5,
"timestamp": "2026-04-05T10:30:00Z"
}
Error Responses:
- 400 Bad Request: Invalid input (wrong feature count, invalid JSON)
- 404 Not Found: Model not found
- 500 Internal Server Error: Prediction failed
Example:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"model_id": "my_classifier", "features": [5.1, 3.5, 1.4, 0.2]}'
GET /health
Response (200 OK):
{
"status": "ok",
"timestamp": "2026-04-05T10:30:00Z"
}
GET /models
List all registered models.
Response (200 OK):
{
"models": [
{
"id": "my_classifier",
"name": "My Classifier",
"version": "v1.0.0",
"path": "models/my_classifier.onnx"
}
]
}
GET /models/info?id=<model_id>
Get detailed metadata for a specific model.
Response (200 OK):
{
"id": "my_classifier",
"name": "My Classifier",
"version": "v1.0.0",
"inputs": [
{"name": "input", "dtype": 1, "shape": [1, 4]}
],
"outputs": [
{"name": "output", "dtype": 7, "shape": [1, 3]}
]
}
GET /metrics
Prometheus metrics endpoint.
Key Metrics:
- http_requests_total: Total HTTP requests by endpoint and status
- http_request_duration_seconds: Request latency histogram
- model_predictions_total: Total predictions by model and status
- model_inference_duration_seconds: Model inference latency histogram
- models_loaded: Number of models currently loaded
├── cmd/server/main.go # Application entry point
├── internal/
│ ├── domain/ # Core types, interfaces, errors
│ ├── handler/http/ # HTTP handlers, middleware, routes
│ ├── logger/ # Structured logging (slog)
│ ├── metrics/ # Prometheus metrics
│ ├── repository/ # Model registry (in-memory)
│ └── service/ # Business logic (prediction, model upload)
├── pkg/onnx/
│ ├── onnx_parser.go # Pure-Go protobuf metadata extractor
│ ├── onnx_predictor.go # ONNX Runtime integration
│ ├── predictor.go # ModelPredictor interface
│ └── model_metadata.go # ModelInfo, TensorInfo types
├── models/ # Uploaded .onnx files stored here
├── Dockerfile
├── index.html # Web frontend
└── scripts/ # Model training utilities
All logs are JSON with request tracing:
{
"time": "2026-04-05T10:30:00Z",
"level": "INFO",
"msg": "prediction completed",
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"model_id": "my_classifier",
"latency_ms": 12.5,
"status": "success"
}
Every request gets a unique request_id that appears in:
- Response header: X-Request-ID
- All log entries
This enables end-to-end request tracing across the stack.
The pkg/onnx/onnx_parser.go file implements a raw protobuf parser using protowire to extract model metadata (inputs, outputs, dtypes, shapes) without Python or generated proto structs.
Why? Avoiding a Python runtime in the Docker image keeps it minimal and Go-only. The trade-off is fragility: the parser depends on hard-coded protobuf field numbers, so if the ONNX spec changes them, it could break silently.
Limitations:
- Only extracts metadata needed for inference, not full model validation
- Assumes ONNX v1.0+ spec field numbers from onnx.proto
- If parsing fails, use the official ONNX Python package to inspect model metadata
- Model drift detection integration
- Accuracy and confidence measurement
- Model versioning and A/B testing
- Batch prediction support
- gRPC support for high-performance scenarios
- E2E test suite with Playwright
MIT