This repo hosts a Kubernetes operator that creates and manages OGX (Open GenAI Stack) servers.
- Automated deployment of OGX servers
- Support for multiple distributions (includes Ollama, vLLM, and others)
- Customizable server configurations
- Volume management for model storage
- Kubernetes-native resource management
You can install the operator directly from a released version or the latest main branch using kubectl apply -f.
To install the latest version from the main branch:
kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/main/release/operator.yaml
To install a specific released version (e.g., v1.0.0), replace main with the desired tag:
kubectl apply -f https://raw.githubusercontent.com/ogx-ai/ogx-k8s-operator/v1.0.0/release/operator.yaml
- Deploy the inference provider server (ollama, vllm)
Ollama Examples:
Deploy Ollama with the default model llama3.2:1b:
./hack/deploy-quickstart.sh
Deploy Ollama with another model:
./hack/deploy-quickstart.sh --provider ollama --model llama3.2:7b
vLLM Examples:
The vLLM examples require a secret named hf-token-secret in the vllm-dist namespace, created in advance. It holds the HuggingFace token, which is required for downloading models.
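The secret can be created with kubectl before running the vLLM examples. The helper below is a minimal sketch: the function name create_hf_secret and the key name token inside the secret are assumptions, so check which key the quickstart script actually reads. It also assumes the vllm-dist namespace already exists.

```shell
# Hypothetical helper: create the HuggingFace token secret expected by the vLLM examples.
# Assumption: the token is stored under the key "token"; the real key name may differ.
create_hf_secret() {
  hf_token="$1"
  kubectl create secret generic hf-token-secret \
    --namespace vllm-dist \
    --from-literal=token="$hf_token"
}

# Usage (with the token exported in your shell):
#   create_hf_secret "$HF_TOKEN"
```

Passing the token as an argument keeps it out of any manifest file on disk.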
Deploy vLLM with default model (meta-llama/Llama-3.2-1B):
./hack/deploy-quickstart.sh --provider vllm
Deploy vLLM with GPU support:
./hack/deploy-quickstart.sh --provider vllm --runtime-env "VLLM_TARGET_DEVICE=gpu,CUDA_VISIBLE_DEVICES=0"
- Create an OGXServer CR to get the server running. Example:
apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: ogxserver-sample
spec:
  distribution:
    name: starter
  workload:
    replicas: 1
    storage:
      size: "20Gi"
      mountPath: "/.ogx"
    overrides:
      env:
        - name: OLLAMA_INFERENCE_MODEL
          value: "llama3.2:1b"
        - name: OLLAMA_URL
          value: "http://ollama-server-service.ollama-dist.svc.cluster.local:11434"
- Verify that the server pod is running in the user-defined namespace.
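One way to do the verification step is to block until the pods report Ready. This is a sketch only: wait_for_pods is a hypothetical helper name, and it waits on every pod in the namespace rather than selecting the OGXServer pod by label, because the labels the operator applies are not documented here.

```shell
# Hypothetical helper: wait until all pods in a namespace are Ready.
# $1 = namespace, $2 = optional timeout (defaults to 120s).
wait_for_pods() {
  ns="$1"
  timeout="${2:-120s}"
  kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"
}

# Usage:
#   wait_for_pods my-namespace 180s
```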
To enable the inline::milvus local vector storage provider, set ENABLE_INLINE_MILVUS in spec.workload.overrides.env. This is only supported in single-worker, single-replica deployments. Milvus-Lite uses SQLite internally and does not support concurrent access from multiple processes.
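As an illustration, the snippet below writes a single-replica OGXServer manifest with the variable set. The resource name ogxserver-milvus and the value "true" are assumptions; the source only states that the variable must be set. The single-worker requirement is not covered by this sketch.

```shell
# Sketch: generate an OGXServer manifest that enables the inline Milvus provider.
# Assumptions: the name "ogxserver-milvus" and the value "true" are illustrative.
cat > ogxserver-milvus.yaml <<'EOF'
apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: ogxserver-milvus
spec:
  distribution:
    name: starter
  workload:
    replicas: 1            # inline Milvus is only supported with a single replica
    overrides:
      env:
        - name: ENABLE_INLINE_MILVUS
          value: "true"    # assumed value
EOF

# Review the file, then apply it:
#   kubectl apply -f ogxserver-milvus.yaml
```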
A ConfigMap can be used to store config.yaml configuration for each OGXServer. Updates to the ConfigMap will restart the Pod to load the new data.
For example, to create a config.yaml ConfigMap and an OGXServer that references it:
kubectl apply -f config/samples/example-with-configmap.yaml
Network policies are enabled by default per-CR. Configure via spec.network.policy:
apiVersion: ogx.io/v1beta1
kind: OGXServer
metadata:
  name: my-ogxserver
spec:
  distribution:
    name: starter
  network:
    externalAccess:
      enabled: true
      hostname: my-ogx.example.com
    policy:
      enabled: true
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: my-app-namespace
          ports:
            - protocol: TCP
              port: 8321

| Field | Description |
|---|---|
| network.externalAccess.enabled | When true, enables external access configuration for the server |
| network.externalAccess.hostname | Hostname used for external access (for example, Ingress host) |
| network.policy.enabled | When true, the operator creates a NetworkPolicy for the OGXServer workload |
| network.policy.ingress | Ingress rules for the policy (for example, allowed sources and ports) |
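To check what the operator actually created, you can list the NetworkPolicy objects in the namespace of the OGXServer. This is a minimal sketch; show_network_policies is a hypothetical helper name, and the name of the generated policy is not documented here, so the helper lists all policies in the namespace.

```shell
# Hypothetical helper: list NetworkPolicy objects in a namespace.
# $1 = namespace (defaults to "default").
show_network_policies() {
  ns="${1:-default}"
  kubectl get networkpolicy --namespace "$ns" -o wide
}

# Usage:
#   show_network_policies my-namespace
```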
The operator supports ConfigMap-driven image updates for OGX distribution images. This allows independent patching for security fixes or bug fixes without requiring a new operator version.
Create or update the operator ConfigMap with an image-overrides key:
image-overrides: |
  starter-gpu: quay.io/custom/ogx:starter-gpu
  starter: quay.io/custom/ogx:starter
Use the distribution name directly as the key (e.g., starter-gpu, starter). The operator will apply these overrides automatically.
To update the OGX distribution image for all starter distributions:
kubectl patch configmap ogx-operator-config -n ogx-k8s-operator-system --type merge -p '{"data":{"image-overrides":"starter: quay.io/ogx-ai/ogx-server:latest"}}'
This will cause all OGXServer resources using the starter distribution to restart with the new image.
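Because image-overrides is a single string value, a merge patch replaces the whole string, so a patch that names only one distribution drops any other entries. The sketch below builds a patch payload containing several entries at once. build_image_overrides_patch is a hypothetical helper, not part of the operator, and it does no JSON escaping, so it assumes image references without quotes or backslashes.

```shell
# Hypothetical helper: build a merge-patch payload for the image-overrides key.
# Each argument has the form "distribution=image".
build_image_overrides_patch() {
  body=""
  for pair in "$@"; do
    name="${pair%%=*}"    # text before the first "="
    image="${pair#*=}"    # text after the first "="
    # Append "name: image" followed by a JSON-escaped newline (backslash + n).
    body="${body}${name}: ${image}\\n"
  done
  printf '{"data":{"image-overrides":"%s"}}\n' "$body"
}

# Usage:
#   kubectl patch configmap ogx-operator-config -n ogx-k8s-operator-system --type merge \
#     -p "$(build_image_overrides_patch starter=quay.io/custom/ogx:starter starter-gpu=quay.io/custom/ogx:starter-gpu)"
```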
- Kubernetes cluster (v1.20 or later)
- Go version go1.24
- operator-sdk v1.39.2 (v4 layout) or newer
- kubectl configured to access your cluster
- A running inference server:
  - For local development, you can use the provided script: ./hack/deploy-quickstart.sh
- Prepare release files with specific versions:
make release VERSION=0.2.1 LLAMASTACK_VERSION=0.2.12
This command updates distribution configurations and generates release manifests with the specified versions.
- Build a custom operator image from your local repository:
make image IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
If no IMG argument is passed to make image, the default image quay.io/ogx-ai/ogx-k8s-operator:latest is used. To override the default values set in the Makefile, create a local file named local.mk that defines the corresponding variables.
- Building multi-architecture images (ARM64, AMD64, etc.)
The operator supports building for multiple architectures including ARM64. To build and push multi-arch images:
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
By default, this builds for linux/amd64,linux/arm64. You can customize the platforms by setting the PLATFORMS variable. To build for specific platforms:
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag> PLATFORMS=linux/amd64,linux/arm64
To add more architectures (e.g., for future support):
make image-buildx IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag> PLATFORMS=linux/amd64,linux/arm64,linux/s390x,linux/ppc64le
Note:
- The image-buildx target works with both Docker and Podman. It will automatically detect which tool is being used.
- Native builds in CI: CI workflows use a matrix strategy with native runners for each architecture (AMD64 and ARM64). Each architecture is built on its own runner, avoiding QEMU emulation entirely. Per-architecture images are pushed separately, then combined into a single multi-arch manifest list. This ensures CGO_ENABLED=1 with full OpenSSL FIPS support for all architectures.
- Local cross-compilation: For local development, the Dockerfile uses --platform=$BUILDPLATFORM to run Go compilation natively on the build host. When cross-compiling (e.g., building ARM64 on an AMD64 host), CGO_ENABLED=0 is used with pure Go FIPS (via GOEXPERIMENT=strictfipsruntime runtime). Native local builds use CGO_ENABLED=1 with full OpenSSL FIPS support.
- FIPS adherence: All CI-produced images use CGO_ENABLED=1 with full OpenSSL FIPS support via native builds on architecture-matched runners.
- For Docker: Multi-arch builds require Docker Buildx. Ensure Docker Buildx is set up:
docker buildx create --name x-builder --use
- For Podman: Podman 4.0+ supports podman buildx (experimental). If buildx is unavailable, the Makefile will automatically fall back to using podman's native manifest-based multi-arch build approach.
- The resulting images are multi-arch manifest lists, which means Kubernetes will automatically select the correct architecture when pulling the image.
CI Build Targets:
The CI workflows use the following Makefile targets for the matrix-based build strategy:
Build and push a single-arch image (used by each matrix job on its native runner):
make image-build-push-single PLATFORM=linux/amd64 IMG=quay.io/<username>/ogx-k8s-operator:<tag>-amd64
Create a multi-arch manifest from per-arch images (used by the final manifest job):
make image-create-manifest IMG=quay.io/<username>/ogx-k8s-operator:<tag> \
  ARCH_IMGS="quay.io/<username>/ogx-k8s-operator:<tag>-amd64 quay.io/<username>/ogx-k8s-operator:<tag>-arm64"
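The two CI targets can be chained by hand to mirror the matrix flow. The function below is a sketch under stated assumptions: build_multiarch is a hypothetical wrapper, and the per-arch tag suffix (-amd64, -arm64) simply follows the naming used in the commands above. Unlike CI, running every architecture on one host does not give each build a native runner.

```shell
# Hypothetical wrapper: build one image per architecture, then combine them
# into a single multi-arch manifest list.
# $1 = final image reference, remaining arguments = architectures (e.g. amd64 arm64).
build_multiarch() {
  img="$1"
  shift
  arch_imgs=""
  for arch in "$@"; do
    make image-build-push-single PLATFORM="linux/${arch}" IMG="${img}-${arch}" || return 1
    # Collect the per-arch image references, separated by spaces.
    arch_imgs="${arch_imgs:+${arch_imgs} }${img}-${arch}"
  done
  make image-create-manifest IMG="${img}" ARCH_IMGS="${arch_imgs}"
}

# Usage:
#   build_multiarch quay.io/<username>/ogx-k8s-operator:<tag> amd64 arm64
```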
- Building ARM64-only images
To build a single ARM64 image (useful for testing or ARM-native systems):
make image-build-arm IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
make image-push IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
This works with both Docker and Podman.
- Once the image is built, the operator can be deployed directly. Each deployment method requires a kubeconfig to be exported:
export KUBECONFIG=<path to kubeconfig>
Deploying on vanilla Kubernetes (cert-manager)
- Deploy the created image in your cluster using the following command:
make deploy IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
- To remove resources created during installation, use:
make undeploy
Deploying on OpenShift
OpenShift clusters use the built-in service-serving-cert-signer for webhook TLS (no cert-manager required):
make deploy-openshift IMG=quay.io/<username>/ogx-k8s-operator:<custom-tag>
- To remove resources:
make undeploy-openshift
The operator includes end-to-end (E2E) tests to verify the complete functionality of the operator. To run the E2E tests:
- Ensure you have a running Kubernetes cluster
- Run the E2E tests using one of the following commands:
  - If you want to deploy the operator and run tests:
    make deploy test-e2e
  - If the operator is already deployed:
    make test-e2e
The make target handles the prerequisites, including deploying the Ollama server.
Please refer to the API documentation.