ogx-k8s-operator

This repo hosts a Kubernetes operator that creates and manages OGX (Open GenAI Stack) servers.

Features

- Automated deployment of OGX servers
- Support for multiple distributions (including Ollama, vLLM, and others)
- Customizable server configurations
- Volume management for model storage
- Kubernetes-native resource management

Quick Start

Installation

You can install the operator directly from a released version or the latest main branch using kubectl apply -f.

To install the latest version from the main branch:

kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/main/release/operator.yaml

To install a specific released version (e.g., v1.0.0), replace main with the desired tag:

kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/v1.0.0/release/operator.yaml

Deploying the OGX Server

First, deploy an inference provider server (Ollama or vLLM).

Ollama Examples:

Deploy Ollama with the default model (llama3.2:1b):

./hack/deploy-quickstart.sh

Deploy Ollama with a different model:

./hack/deploy-quickstart.sh --provider ollama --model llama3.2:3b

vLLM Examples:

The vLLM examples require a Secret named "hf-token-secret" in the "vllm-dist" namespace, created in advance, that holds a HuggingFace token (needed to download models).
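
As a minimal sketch of this prerequisite, the snippet below writes a Secret manifest to a local file. The Secret name and namespace are the ones stated above; the key name `token` and the file name `hf-token-secret.yaml` are assumptions for illustration, so check `./hack/deploy-quickstart.sh` for the key it actually reads.

```shell
# Write a Secret manifest holding the Hugging Face token.
# Assumption: the key name "token" is illustrative, not confirmed by the operator docs.
cat > hf-token-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
  namespace: vllm-dist
type: Opaque
stringData:
  token: "<your-huggingface-token>"
EOF
```

Replace the placeholder with your token, create the namespace if it does not exist yet (`kubectl create namespace vllm-dist`), and apply the file with `kubectl apply -f hf-token-secret.yaml`.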

Deploy vLLM with default model (meta-llama/Llama-3.2-1B):

./hack/deploy-quickstart.sh --provider vllm

Deploy vLLM with GPU support:

./hack/deploy-quickstart.sh --provider vllm --runtime-env "VLLM_TARGET_DEVICE=gpu,CUDA_VISIBLE_DEVICES=0"

Then create an OGXServer custom resource (CR) to start the server. For example:

apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: ogxserver-sample
spec:
  distribution:
    name: starter
  workload:
    replicas: 1
    storage:
      size: "20Gi"
      mountPath: "/.ogx"
    overrides:
      env:
      - name: OLLAMA_INFERENCE_MODEL
        value: "llama3.2:1b"
      - name: OLLAMA_URL
        value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"

Verify that the server pod is running in the namespace where you created the OGXServer, for example with kubectl get pods -n <namespace>.

Local Vector Storage (inline::milvus)

To enable the inline::milvus local vector storage provider, set ENABLE_INLINE_MILVUS in spec.workload.overrides.env. This is only supported in single-worker, single-replica deployments, because Milvus-Lite uses SQLite internally and does not support concurrent access from multiple processes.
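
A sketch of what this could look like, reusing the fields from the OGXServer sample earlier in this README. The resource name, the file name, and the value "true" are assumptions; only the variable name ENABLE_INLINE_MILVUS and its location under spec.workload.overrides.env come from the text above.

```shell
# Write an OGXServer manifest that turns on the inline::milvus provider.
# Assumptions: the resource name and the value "true" are illustrative.
cat > ogxserver-inline-milvus.yaml <<'EOF'
apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: ogxserver-inline-milvus
spec:
  distribution:
    name: starter
  workload:
    replicas: 1   # inline::milvus is supported with a single replica only
    overrides:
      env:
      - name: ENABLE_INLINE_MILVUS
        value: "true"
EOF
```

Apply it with `kubectl apply -f ogxserver-inline-milvus.yaml`, and keep replicas at 1 for as long as the provider is enabled.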

Using a ConfigMap for config.yaml configuration

A ConfigMap can be used to store the config.yaml configuration for each OGXServer. Updating the ConfigMap restarts the Pod so that it loads the new configuration.

The following sample creates a config.yaml ConfigMap and an OGXServer that references it:

kubectl apply -f config/samples/example-with-configmap.yaml

Enabling Network Policies

Network policies are enabled by default for each CR. Configure them through spec.network.policy:

apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: my-ogxserver
spec:
  distribution:
    name: starter
  network:
    externalAccess:
      enabled: true
      hostname: my-ogx.example.com
    policy:
      enabled: true
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: my-app-namespace
          ports:
            - protocol: TCP
              port: 8321

| Field | Description |
| --- | --- |
| `network.externalAccess.enabled` | When `true`, enables external access configuration for the server |
| `network.externalAccess.hostname` | Hostname used for external access (for example, Ingress host) |
| `network.policy.enabled` | When `true`, the operator creates a `NetworkPolicy` for the OGXServer workload |
| `network.policy.ingress` | Ingress rules for the policy (for example, allowed sources and ports) |
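
Because policies are on by default, opting out is the other case worth illustrating. The sketch below assumes that setting `network.policy.enabled` to `false` stops the operator from creating a `NetworkPolicy`, which the field description implies but this README does not state outright. The file name is hypothetical.

```shell
# Write an OGXServer manifest that opts out of the operator-managed NetworkPolicy.
# Assumption: enabled: false disables policy creation for this CR.
cat > ogxserver-no-policy.yaml <<'EOF'
apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: my-ogxserver
spec:
  distribution:
    name: starter
  network:
    policy:
      enabled: false
EOF
```

After applying the file, `kubectl get networkpolicy -n <namespace>` is one way to check whether a policy exists for the server.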

Image Mapping Overrides

The operator supports ConfigMap-driven image updates for OGX distribution images. This allows independent patching for security fixes or bug fixes without requiring a new operator version.

Configuration

Create or update the operator ConfigMap with an image-overrides key:

image-overrides: |
  starter-gpu: quay.io/custom/ogx:starter-gpu
  starter: quay.io/custom/ogx:starter

Configuration Format

Use the distribution name directly as the key (e.g., starter-gpu, starter). The operator applies these overrides automatically.

Example Usage

To update the OGX distribution image for all starter distributions:

kubectl patch configmap ogx-operator-config -n ogx-k8s-operator-system --type merge -p '{"data":{"image-overrides":"starter: quay.io/ogx-ai/ogx-server:latest"}}'

This will cause all OGXServer resources using the starter distribution to restart with the new image.
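
When several distributions need overriding at once, the inline JSON becomes hard to read. One option, sketched here, is to keep the patch in a file. The ConfigMap name, namespace, and override entries are taken from this section; the file name `image-overrides-patch.yaml` is an assumption.

```shell
# Write a merge patch that sets several image overrides at once.
# The distribution names and images are the ones from the Configuration section.
cat > image-overrides-patch.yaml <<'EOF'
data:
  image-overrides: |
    starter-gpu: quay.io/custom/ogx:starter-gpu
    starter: quay.io/custom/ogx:starter
EOF
```

Apply it with `kubectl patch configmap ogx-operator-config -n ogx-k8s-operator-system --type merge --patch-file image-overrides-patch.yaml`. The `--patch-file` flag needs a reasonably recent kubectl; with older clients, pass the same content through `-p` instead.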

Developer Guide

Prerequisites

- Kubernetes cluster (v1.20 or later)
- Go 1.24
- operator-sdk v1.39.2 (v4 layout) or newer
- kubectl configured to access your cluster
- A running inference server. For local development, you can use the provided script: ./hack/deploy-quickstart.sh

Building the Operator

Prepare release files with specific versions:
```
make release VERSION=0.2.1 LLAMASTACK_VERSION=0.2.12
```
This command updates distribution configurations and generates release manifests with the specified versions.
A custom operator image can be built from your local repository:
```
make image IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
```
If you do not pass IMG to make image, the default image quay.io/ogx-ai/ogx-k8s-operator:latest is used. To override the default values set in the Makefile, create a local file named local.mk that defines the variables you want to change.
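
A minimal sketch of such a local.mk follows. IMG and PLATFORMS are the variables this README passes to make on the command line; the image reference is a placeholder, and whether a given variable can be overridden this way depends on how the Makefile defines it.

```shell
# Write a local.mk that overrides Makefile defaults for local builds.
# Assumption: the image reference below is a placeholder, not a real repository.
cat > local.mk <<'EOF'
IMG = quay.io/example-user/ogx-k8s-operator:dev
PLATFORMS = linux/amd64,linux/arm64
EOF
```

Run this from the repository root. With the file in place, a plain `make image` should pick up the IMG value without it being passed on the command line.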
Building multi-architecture images (ARM64, AMD64, etc.)

The operator supports building for multiple architectures including ARM64. To build and push multi-arch images:
```
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
```
By default, this builds for linux/amd64,linux/arm64. You can customize the platforms by setting the PLATFORMS variable:
```
# Build for specific platforms
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag> PLATFORMS=linux/amd64,linux/arm64

# Add more architectures (e.g., for future support)
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag> PLATFORMS=linux/amd64,linux/arm64,linux/s390x,linux/ppc64le
```
Note:
- The image-buildx target works with both Docker and Podman. It will automatically detect which tool is being used.
- Native builds in CI: CI workflows use a matrix strategy with native runners for each architecture (AMD64 and ARM64). Each architecture is built on its own runner, avoiding QEMU emulation entirely. Per-architecture images are pushed separately, then combined into a single multi-arch manifest list. This ensures CGO_ENABLED=1 with full OpenSSL FIPS support for all architectures.
- Local cross-compilation: For local development, the Dockerfile uses --platform=$BUILDPLATFORM to run Go compilation natively on the build host. When cross-compiling (e.g., building ARM64 on an AMD64 host), CGO_ENABLED=0 is used with pure Go FIPS (via GOEXPERIMENT=strictfipsruntime). Native local builds use CGO_ENABLED=1 with full OpenSSL FIPS support.
- FIPS adherence: All CI-produced images use CGO_ENABLED=1 with full OpenSSL FIPS support via native builds on architecture-matched runners.
- For Docker: Multi-arch builds require Docker Buildx. Ensure Docker Buildx is set up:
```
docker buildx create --name x-builder --use
```
- For Podman: Podman 4.0+ supports podman buildx (experimental). If buildx is unavailable, the Makefile will automatically fall back to using podman's native manifest-based multi-arch build approach.
- The resulting images are multi-arch manifest lists, which means Kubernetes will automatically select the correct architecture when pulling the image.
CI Build Targets:

The CI workflows use the following Makefile targets for the matrix-based build strategy:
```
# Build and push a single-arch image (used by each matrix job on its native runner)
make image-build-push-single PLATFORM=linux/amd64 IMG=quay.io/<username>/ogx-k8s-operator:<tag>-amd64

# Create a multi-arch manifest from per-arch images (used by the final manifest job)
make image-create-manifest IMG=quay.io/<username>/ogx-k8s-operator:<tag> \
  ARCH_IMGS="quay.io/<username>/ogx-k8s-operator:<tag>-amd64 quay.io/<username>/ogx-k8s-operator:<tag>-arm64"
```

Building ARM64-only images

To build a single ARM64 image (useful for testing or ARM-native systems):

make image-build-arm IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
make image-push IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>

This works with both Docker and Podman.

Once the image is built, the operator can be deployed directly. Each deployment method requires an exported kubeconfig:
```
export KUBECONFIG=<path to kubeconfig>
```

Deployment

Deploying on vanilla Kubernetes (cert-manager)

Deploy the image to your cluster with the following command:

make deploy IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>

To remove the resources created during installation, run:
```
make undeploy
```

Deploying on OpenShift

OpenShift clusters use the built-in service-serving-cert-signer for webhook TLS (no cert-manager required):

make deploy-openshift IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>

To remove resources:
```
make undeploy-openshift
```

Running E2E Tests

The operator includes end-to-end (E2E) tests to verify the complete functionality of the operator. To run the E2E tests:

1. Ensure you have a running Kubernetes cluster.
2. Run the E2E tests using one of the following commands:
- If you want to deploy the operator and run tests:
```
make deploy test-e2e
```
- If the operator is already deployed:
```
make test-e2e
```

The make target handles the prerequisites, including deploying an Ollama server.

API Overview

Please refer to the API documentation.
