Model Nexus

A machine learning inference API written in Go that serves ONNX models uploaded at runtime, with Prometheus metrics and structured logging.

Features

Core Functionality

Dynamic ONNX Model Upload — POST /models/upload accepts arbitrary ONNX files at runtime with automatic metadata extraction
Pure-Go Protobuf Parser — Extracts model metadata (inputs, outputs, dtypes, shapes) without Python dependencies using raw protowire decoding
ONNX Runtime Inference — Real-time predictions with full dtype support (float32, float64, int32, int64)
Thread-Safe Model Registry — Concurrent-safe model storage with sync.RWMutex

Observability

Prometheus Metrics — HTTP request counts/latency, prediction counts, error rates, models loaded
Structured Logging — JSON logs with request ID tracing via slog
Request IDs — Auto-generated UUID per request, propagated to all logs and response headers

Production Ready

Graceful Shutdown — Drains in-flight requests with 15s timeout on SIGTERM
Docker Support — Multi-stage builds with CGO for ONNX Runtime
Railway Deployment — Deployed to production on Railway
Health Checks — /health endpoint for monitoring
CORS Enabled — Ready for web frontends
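
The graceful shutdown behaviour listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual main.go: serveUntil, run, and drainTimeout are hypothetical names, the listen address is a throwaway loopback port, and a short timer stands in for SIGTERM so the example terminates on its own.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// drainTimeout mirrors the 15s drain window mentioned in the feature list.
const drainTimeout = 15 * time.Second

// serveUntil runs srv until ctx is cancelled, then gives in-flight requests
// up to timeout to finish before returning.
func serveUntil(ctx context.Context, srv *http.Server, timeout time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // the listener failed before any shutdown was requested
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err // drain window elapsed with requests still in flight
	}
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// run cancels the server context on SIGTERM or Ctrl+C. stopAfter is a
// demo-only timer that plays the role of the signal.
func run(stopAfter time.Duration) error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, stopAfter)
	defer cancel()

	srv := &http.Server{Addr: "127.0.0.1:0", Handler: http.NewServeMux()}
	return serveUntil(ctx, srv, drainTimeout)
}

func main() {
	if err := run(200 * time.Millisecond); err != nil {
		log.Fatal(err)
	}
	log.Println("server drained and stopped")
}
```

In a long-running server the demo timer would be dropped, so the context lives until a signal arrives.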

Architecture

Built using Clean Architecture principles with clear separation of concerns:

┌─────────────────────────────────────────────────────────┐
│                    HTTP Request                          │
├─────────────────────────────────────────────────────────┤
│  Middleware Stack (execution order):                     │
│    1. Metrics (outermost) → 2. RequestID → 3. CORS       │
├─────────────────────────────────────────────────────────┤
│  Handler Layer     → Request validation & response       │
│  Service Layer     → Business logic (PredictionService)  │
│  Repository Layer  → Model registry (in-memory map)      │
│  ONNX Predictor    → Inference via ONNX Runtime          │
├─────────────────────────────────────────────────────────┤
│                    JSON Response                         │
└─────────────────────────────────────────────────────────┘
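
The middleware execution order in the diagram can be illustrated with plain net/http wrappers. The sketch below is an assumption about how such a stack could be composed, not the project's routing code: Chain, tag, and runOrder are hypothetical helpers, and each middleware only records its name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Middleware wraps a handler with extra behaviour.
type Middleware func(http.Handler) http.Handler

// Chain applies middlewares so that the first one listed is the outermost:
// it sees the request first and the response last.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a placeholder middleware that records its name when it runs.
func tag(name string, trace *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// runOrder sends one request through the stack and returns the order in
// which each layer was entered.
func runOrder() []string {
	var trace []string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "handler")
	})
	h := Chain(final, tag("metrics", &trace), tag("request_id", &trace), tag("cors", &trace))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/health", nil))
	return trace
}

func main() {
	fmt.Println(runOrder()) // [metrics request_id cors handler]
}
```

Putting the metrics layer outermost means its timing covers every other layer, including the request ID and CORS handling.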

Key Patterns:

Repository Pattern for model management
Dependency Injection throughout
Interface-based design for testability
Sidecar .model_info.json files for metadata persistence
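
To make the repository pattern and the sync.RWMutex registry concrete, here is a small sketch. The interface, type, and error names are illustrative assumptions and may not match internal/repository; the Model fields mirror the JSON shown in the API responses.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Model mirrors the fields returned by the API (id, name, version, path).
type Model struct {
	ID, Name, Version, Path string
}

var (
	ErrModelExists   = errors.New("model already registered")
	ErrModelNotFound = errors.New("model not found")
)

// ModelRepository is a hypothetical interface that handlers and services
// could depend on, so tests can swap in a fake.
type ModelRepository interface {
	Register(m Model) error
	Get(id string) (Model, error)
	List() []Model
}

// InMemoryRegistry guards a map with an RWMutex: many concurrent readers,
// one writer at a time.
type InMemoryRegistry struct {
	mu     sync.RWMutex
	models map[string]Model
}

var _ ModelRepository = (*InMemoryRegistry)(nil) // compile-time interface check

func NewInMemoryRegistry() *InMemoryRegistry {
	return &InMemoryRegistry{models: make(map[string]Model)}
}

func (r *InMemoryRegistry) Register(m Model) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.models[m.ID]; ok {
		return ErrModelExists
	}
	r.models[m.ID] = m
	return nil
}

func (r *InMemoryRegistry) Get(id string) (Model, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	m, ok := r.models[id]
	if !ok {
		return Model{}, ErrModelNotFound
	}
	return m, nil
}

// List returns a copy sorted by ID so callers never touch the guarded map.
func (r *InMemoryRegistry) List() []Model {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]Model, 0, len(r.models))
	for _, m := range r.models {
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	var repo ModelRepository = NewInMemoryRegistry()
	m := Model{ID: "my_classifier", Name: "My Classifier", Version: "v1.0.0", Path: "models/my_classifier.onnx"}
	fmt.Println(repo.Register(m))
	fmt.Println(repo.Register(m))
	fmt.Println(len(repo.List()))
}
```

A duplicate Register call is the case that would map naturally onto the 409 Conflict response of the upload endpoint.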

Tech Stack

Language: Go 1.24
ML Runtime: ONNX Runtime (CGO)
Protobuf Parsing: google.golang.org/protobuf/encoding/protowire (pure-Go, no Python)
Observability: Prometheus, structured logging (slog)
Deployment: Docker, Railway

Getting Started

Prerequisites

Go 1.24 or higher
ONNX Runtime library (see installation below)

Installing ONNX Runtime

Linux:

wget https://github.com/microsoft/onnxruntime/releases/download/v1.24.1/onnxruntime-linux-x64-1.24.1.tgz
tar -xzf onnxruntime-linux-x64-1.24.1.tgz
sudo cp onnxruntime-linux-x64-1.24.1/lib/libonnxruntime.so.1.24.1 /usr/lib/libonnxruntime.so

macOS:

brew install onnxruntime

Windows:

Download onnxruntime-win-x64-1.24.1.zip from the ONNX Runtime GitHub releases page
Extract to C:\onnxruntime\
Set ONNX_LIBRARY_PATH=C:\onnxruntime\lib\onnxruntime.dll

Installation

git clone https://github.com/kevo-1/model-nexus.git
cd model-nexus
go mod download

Configuration

Create a .env file (optional):

ONNX_LIBRARY_PATH=/usr/lib/libonnxruntime.so
PORT=8080

Running

# Build
go build -o server ./cmd/server

# Run
./server

Server starts on http://localhost:8080 with dynamic model upload enabled.

API Endpoints

Upload Model

POST /models/upload

Upload an ONNX model file for dynamic serving.

Request: multipart/form-data

file — .onnx model file (required)
id — Unique model identifier (required)
name — Human-readable model name (required)
version — Model version string (required)

Response (201 Created):

{
  "model": {
    "id": "my_classifier",
    "name": "My Custom Classifier",
    "version": "v1.0.0",
    "path": "models/my_classifier.onnx"
  },
  "info": {
    "inputs": [
      {
        "name": "input",
        "dtype": 1,
        "shape": [1, 4]
      }
    ],
    "outputs": [
      {
        "name": "output",
        "dtype": 7,
        "shape": [1, 3]
      }
    ]
  }
}

Error Responses:

400 Bad Request — Missing fields or invalid file
409 Conflict — Model ID already registered
500 Internal Server Error — Failed to parse or load model

Example:

curl -X POST http://localhost:8080/models/upload \
  -F "file=@my_model.onnx" \
  -F "id=my_classifier" \
  -F "name=My Classifier" \
  -F "version=v1.0.0"
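
The same upload can be built from Go with mime/multipart. This is a client-side sketch under the field names documented above; newUploadRequest is a hypothetical helper, and the model bytes in main are a placeholder, not a valid ONNX file.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newUploadRequest builds a multipart/form-data POST for /models/upload
// carrying the id, name, and version fields plus the model file.
func newUploadRequest(baseURL, id, name, version, filename string, model []byte) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)

	for key, val := range map[string]string{"id": id, "name": name, "version": version} {
		if err := mw.WriteField(key, val); err != nil {
			return nil, err
		}
	}
	part, err := mw.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(model); err != nil {
		return nil, err
	}
	// Close writes the terminating boundary; without it the body is invalid.
	if err := mw.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/models/upload", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newUploadRequest("http://localhost:8080", "my_classifier",
		"My Classifier", "v1.0.0", "my_model.onnx", []byte("placeholder bytes"))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
	// A successful upload is documented to return 201 Created.
}
```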

Prediction

POST /predict

Make a prediction using a registered model.

Request:

{
  "model_id": "my_classifier",
  "features": [5.1, 3.5, 1.4, 0.2]
}

Response (200 OK):

{
  "model_id": "my_classifier",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "prediction": [0.0],
  "latency_ms": 12.5,
  "timestamp": "2026-04-05T10:30:00Z"
}

Error Responses:

400 Bad Request — Invalid input (wrong feature count, invalid JSON)
404 Not Found — Model not found
500 Internal Server Error — Prediction failed

Example:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"model_id": "my_classifier", "features": [5.1, 3.5, 1.4, 0.2]}'
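
The 400 case for a wrong feature count suggests a check of the request against the model's input shape. The sketch below shows one way that validation could work; it is an assumption, not the handler's actual code. PredictRequest mirrors the documented JSON body, while ErrInvalidInput, expectedCount, and checkBody are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// PredictRequest mirrors the documented /predict request body.
type PredictRequest struct {
	ModelID  string    `json:"model_id"`
	Features []float64 `json:"features"`
}

// ErrInvalidInput marks errors a handler could translate to 400 Bad Request.
var ErrInvalidInput = errors.New("invalid input")

func decodePredictRequest(body io.Reader) (PredictRequest, error) {
	var req PredictRequest
	if err := json.NewDecoder(body).Decode(&req); err != nil {
		return req, fmt.Errorf("%w: %v", ErrInvalidInput, err)
	}
	if req.ModelID == "" {
		return req, fmt.Errorf("%w: model_id is required", ErrInvalidInput)
	}
	return req, nil
}

// expectedCount multiplies the dimensions of an input shape. ok is false
// when a dimension is dynamic (<= 0), in which case no count can be derived.
func expectedCount(shape []int64) (n int64, ok bool) {
	n = 1
	for _, d := range shape {
		if d <= 0 {
			return 0, false
		}
		n *= d
	}
	return n, true
}

func validateFeatures(req PredictRequest, shape []int64) error {
	want, ok := expectedCount(shape)
	if ok && int64(len(req.Features)) != want {
		return fmt.Errorf("%w: got %d features, model expects %d",
			ErrInvalidInput, len(req.Features), want)
	}
	return nil
}

// checkBody decodes a raw JSON body and validates it against shape.
func checkBody(body string, shape []int64) error {
	req, err := decodePredictRequest(strings.NewReader(body))
	if err != nil {
		return err
	}
	return validateFeatures(req, shape)
}

func main() {
	shape := []int64{1, 4} // input shape from the upload response example
	fmt.Println(checkBody(`{"model_id": "my_classifier", "features": [5.1, 3.5, 1.4, 0.2]}`, shape))
	fmt.Println(checkBody(`{"model_id": "my_classifier", "features": [5.1, 3.5]}`, shape))
}
```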

Health Check

GET /health

Response (200 OK):

{
  "status": "ok",
  "timestamp": "2026-04-05T10:30:00Z"
}

List Models

GET /models

List all registered models.

Response (200 OK):

{
  "models": [
    {
      "id": "my_classifier",
      "name": "My Classifier",
      "version": "v1.0.0",
      "path": "models/my_classifier.onnx"
    }
  ]
}

Model Info

GET /models/info?id=<model_id>

Get detailed metadata for a specific model.

Response (200 OK):

{
  "id": "my_classifier",
  "name": "My Classifier",
  "version": "v1.0.0",
  "inputs": [
    {"name": "input", "dtype": 1, "shape": [1, 4]}
  ],
  "outputs": [
    {"name": "output", "dtype": 7, "shape": [1, 3]}
  ]
}
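
The numeric dtype values in these responses follow the ONNX TensorProto.DataType enum, where 1 is FLOAT (float32) and 7 is INT64. A small lookup for the four dtypes listed under Core Functionality could look like this; the constant and function names are illustrative, not taken from pkg/onnx.

```go
package main

import "fmt"

// ONNX TensorProto.DataType codes for the four supported dtypes.
const (
	DTypeFloat32 int32 = 1  // FLOAT
	DTypeInt32   int32 = 6  // INT32
	DTypeInt64   int32 = 7  // INT64
	DTypeFloat64 int32 = 11 // DOUBLE
)

// DTypeName maps an ONNX dtype code to a Go-style type name.
func DTypeName(code int32) string {
	switch code {
	case DTypeFloat32:
		return "float32"
	case DTypeInt32:
		return "int32"
	case DTypeInt64:
		return "int64"
	case DTypeFloat64:
		return "float64"
	default:
		return fmt.Sprintf("unsupported(%d)", code)
	}
}

func main() {
	// The example model takes a float32 input and returns an int64 output.
	fmt.Println(DTypeName(1), DTypeName(7)) // float32 int64
}
```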

Metrics

GET /metrics

Prometheus metrics endpoint.

Key Metrics:

http_requests_total — Total HTTP requests by endpoint and status
http_request_duration_seconds — Request latency histogram
model_predictions_total — Total predictions by model and status
model_inference_duration_seconds — Model inference latency histogram
models_loaded — Number of models currently loaded
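
Counting requests by endpoint and status requires capturing the status code the handler wrote. The sketch below shows that capture step with a mutex-guarded map standing in for a Prometheus counter vector, so it runs with the standard library only. It is not the project's metrics code, and statusRecorder, requestCounter, and demoCounts are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// requestCounter is a stand-in for a counter labelled by endpoint and status.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func key(endpoint string, status int) string {
	return fmt.Sprintf("%s %d", endpoint, status)
}

func (c *requestCounter) inc(endpoint string, status int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key(endpoint, status)]++
}

func (c *requestCounter) get(endpoint string, status int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key(endpoint, status)]
}

// Middleware records one count per request after the handler has run.
// Using the raw URL path as a label is fine for a demo but can create
// unbounded label values in a real metrics system.
func (c *requestCounter) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Handlers that never call WriteHeader implicitly send 200.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		c.inc(r.URL.Path, rec.status)
	})
}

// demoCounts sends two /health requests and one unknown path.
func demoCounts() (ok, notFound int) {
	c := &requestCounter{counts: map[string]int{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"ok"}`)
	})
	h := c.Middleware(mux)
	for _, path := range []string{"/health", "/health", "/missing"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
	return c.get("/health", http.StatusOK), c.get("/missing", http.StatusNotFound)
}

func main() {
	fmt.Println(demoCounts()) // 2 1
}
```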

Project Structure

├── cmd/server/main.go          # Application entry point
├── internal/
│   ├── domain/                  # Core types, interfaces, errors
│   ├── handler/http/            # HTTP handlers, middleware, routes
│   ├── logger/                  # Structured logging (slog)
│   ├── metrics/                 # Prometheus metrics
│   ├── repository/              # Model registry (in-memory)
│   └── service/                 # Business logic (prediction, model upload)
├── pkg/onnx/
│   ├── onnx_parser.go           # Pure-Go protobuf metadata extractor
│   ├── onnx_predictor.go        # ONNX Runtime integration
│   ├── predictor.go             # ModelPredictor interface
│   └── model_metadata.go        # ModelInfo, TensorInfo types
├── models/                      # Uploaded .onnx files stored here
├── Dockerfile
├── index.html                   # Web frontend
└── scripts/                     # Model training utilities

Monitoring & Observability

Structured Logging

All logs are JSON with request tracing:

{
  "time": "2026-04-05T10:30:00Z",
  "level": "INFO",
  "msg": "prediction completed",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "model_id": "my_classifier",
  "latency_ms": 12.5,
  "status": "success"
}
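
A log line of this shape can be produced with log/slog's JSON handler. The sketch below assumes the attribute names from the example above; newLogger, logPrediction, and captureFields are hypothetical helpers rather than the contents of internal/logger.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log/slog"
	"os"
)

// newLogger returns a logger that writes one JSON object per line to w.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logPrediction emits the fields shown in the example log entry.
func logPrediction(l *slog.Logger, requestID, modelID string, latencyMs float64) {
	l.Info("prediction completed",
		slog.String("request_id", requestID),
		slog.String("model_id", modelID),
		slog.Float64("latency_ms", latencyMs),
		slog.String("status", "success"),
	)
}

// captureFields logs into a buffer and decodes the line back into a map,
// which is a convenient way to assert on log output in tests.
func captureFields() (map[string]any, error) {
	var buf bytes.Buffer
	logPrediction(newLogger(&buf), "550e8400-e29b-41d4-a716-446655440000", "my_classifier", 12.5)
	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	logPrediction(newLogger(os.Stdout), "550e8400-e29b-41d4-a716-446655440000", "my_classifier", 12.5)
}
```

The handler adds the time, level, and msg keys itself, so only the request-specific attributes need to be passed in.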

Request Tracing

Every request gets a unique request_id, which appears in:

Response header: X-Request-ID
All log entries

This enables end-to-end request tracing across the stack.

Design Decisions

Pure-Go ONNX Metadata Extraction

The pkg/onnx/onnx_parser.go file implements a raw protobuf parser using protowire to extract model metadata (inputs, outputs, dtypes, shapes) without Python or generated proto structs.

Why? Avoiding a Python runtime in the Docker image keeps it minimal and Go-only. The trade-off is parsing fragility — if the ONNX spec changes field numbers, this parser could break silently.

Limitations:

Only extracts metadata needed for inference, not full model validation
Assumes ONNX v1.0+ spec field numbers from onnx.proto
If parsing fails, use the official ONNX Python package to inspect model metadata
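
To show what raw wire-level parsing involves, the sketch below walks protobuf fields by hand and follows ModelProto.graph (field 7), GraphProto.input (field 11), and ValueInfoProto.name (field 1) to list input names. It uses encoding/binary as a standard-library stand-in for protowire, covers only this one path, and is not the code in onnx_parser.go. The walk, inputNames, and sampleModel names are hypothetical, and the sample bytes are hand-built for the demo.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated protobuf message")

// field is one decoded field of a message.
type field struct {
	num   uint64 // field number from the .proto definition
	wire  uint64 // 0 varint, 1 64-bit, 2 length-delimited, 5 32-bit
	value uint64 // payload when wire == 0
	bytes []byte // payload when wire == 2
}

// walk decodes each field of msg in order and hands it to visit.
func walk(msg []byte, visit func(field) error) error {
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 {
			return errTruncated
		}
		msg = msg[n:]
		f := field{num: tag >> 3, wire: tag & 7}
		switch f.wire {
		case 0:
			v, n := binary.Uvarint(msg)
			if n <= 0 {
				return errTruncated
			}
			f.value, msg = v, msg[n:]
		case 1:
			if len(msg) < 8 {
				return errTruncated
			}
			msg = msg[8:]
		case 2:
			l, n := binary.Uvarint(msg)
			if n <= 0 || uint64(len(msg)-n) < l {
				return errTruncated
			}
			f.bytes, msg = msg[n:n+int(l)], msg[n+int(l):]
		case 5:
			if len(msg) < 4 {
				return errTruncated
			}
			msg = msg[4:]
		default:
			return fmt.Errorf("unsupported wire type %d", f.wire)
		}
		if err := visit(f); err != nil {
			return err
		}
	}
	return nil
}

// inputNames collects the name of every graph input in a serialized model.
func inputNames(model []byte) ([]string, error) {
	var names []string
	err := walk(model, func(m field) error {
		if m.num != 7 || m.wire != 2 { // ModelProto.graph
			return nil
		}
		return walk(m.bytes, func(g field) error {
			if g.num != 11 || g.wire != 2 { // GraphProto.input
				return nil
			}
			return walk(g.bytes, func(v field) error {
				if v.num == 1 && v.wire == 2 { // ValueInfoProto.name
					names = append(names, string(v.bytes))
				}
				return nil
			})
		})
	})
	return names, err
}

// sampleModel hand-encodes a tiny ModelProto with one input named "input".
func sampleModel() []byte {
	valueInfo := append([]byte{0x0A, 0x05}, "input"...)
	graph := append([]byte{0x5A, byte(len(valueInfo))}, valueInfo...)
	model := []byte{0x08, 0x08, 0x3A, byte(len(graph))} // ir_version = 8, then graph
	return append(model, graph...)
}

func main() {
	fmt.Println(inputNames(sampleModel()))
}
```

Because the field numbers are hard-coded, a parser built this way skips fields it does not recognise but cannot notice if a number is later reassigned, which is the fragility described above.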

Future Enhancements

Model drift detection integration
Accuracy and confidence measuring
Model versioning and A/B testing
Batch prediction support
gRPC support for high-performance scenarios
E2E test suite with Playwright

License

MIT
