📋 Navigation: 🏠 Main README • 🎯 Goals & Vision • 🚀 Getting Started • 📖 Usage Guide • 🏗️ Architecture • 🤖 AI Assistant
This guide provides step-by-step instructions for running interactive demonstrations of the Inference-in-a-Box platform capabilities.
🎯 Demo Context: These demonstrations showcase the capabilities outlined in GOALS.md.

🚀 Setup Required: Ensure the platform is deployed using the Getting Started Guide.
Before running the demos, ensure:

- The platform is bootstrapped: `./scripts/bootstrap.sh`
- All models are deployed and ready
- Required observability tools are running
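A quick scripted check can confirm the models are ready before you start. This is a sketch, not part of the demo scripts: the `not_ready` helper and the assumption that READY is the fourth column of `kubectl get inferenceservice -A --no-headers` output are both mine; adjust the column index to match your kubectl version.

```bash
# Pure helper: reads whitespace-separated rows on stdin and prints any row whose
# READY column (field position passed as $1) is not "True"; exits non-zero if found.
not_ready() {
  awk -v col="$1" '$col != "True" { print $1 "/" $2; bad = 1 } END { exit bad }'
}

# Check every InferenceService across namespaces
# (READY assumed to be column 4 in `kubectl get inferenceservice -A --no-headers`).
check_models_ready() {
  kubectl get inferenceservice --all-namespaces --no-headers | not_ready 4
}
```

Run `check_models_ready` after bootstrapping; a non-zero exit (and a printed `namespace/name`) means something is still coming up.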
## Running the Demos

```bash
# Run the interactive demo menu
./scripts/demo.sh

# Or run specific demos directly
./scripts/demo.sh --demo security
./scripts/demo.sh --demo autoscaling
./scripts/demo.sh --demo canary
./scripts/demo.sh --demo multitenancy
./scripts/demo.sh --demo observability
```

## 🔒 Security & Authentication Demo

Demonstrates: JWT-based authentication, tenant isolation, and zero-trust networking
What it shows:
- JWT token structure and validation
- Tenant-specific access control
- Cross-tenant access prevention
- mTLS communication between services
Demo Flow:
- Displays JWT token contents for different tenants
- Makes authorized request to sklearn-iris model with correct tenant token
- Attempts unauthorized cross-tenant access (should fail)
- Shows security policies in action
Expected Outcome:
- ✅ Tenant A can access sklearn-iris model
- ❌ Tenant A cannot access Tenant B resources
- 🔒 All communication is encrypted with mTLS
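To poke at this flow by hand, something along these lines works. It is a sketch: the model hostname comes from the defaults listed later in this guide, and `jwt_claims` and `predict_status` are hypothetical helpers, not functions from the demo scripts. Note that raw JWT segments are unpadded base64url, which is why the decoder restores padding before calling `base64 -d`.

```bash
MODEL_URL="http://sklearn-iris-predictor.tenant-a.127.0.0.1.sslip.io:8080/v1/models/sklearn-iris:predict"

# Decode a JWT's payload (the second dot-separated segment).
# Raw JWT segments use base64url with '=' padding stripped, so restore both.
jwt_claims() {
  seg=$(printf '%s' "$1" | cut -d '.' -f2 | tr '_-' '/+')
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
  printf '%s' "$seg" | base64 -d
}

# Send a prediction request with the given token; print only the HTTP status.
# Expected: a tenant-a token yields 200, a tenant-b token is rejected.
predict_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' \
    "$MODEL_URL"
}
```

Comparing `jwt_claims "$TOKEN_A"` and `jwt_claims "$TOKEN_B"` makes the tenant claim differences visible before you send any traffic.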
## ⚡ Auto-scaling Demo

Demonstrates: Serverless model serving with scale-to-zero and auto-scaling
What it shows:
- Scale from zero pods to multiple replicas
- Load-based scaling decisions
- Scale-down after load removal
- Knative Serving integration
Demo Flow:
- Shows initial pod state (likely zero pods)
- Generates sustained load for 60 seconds
- Monitors pod scaling events in real-time
- Observes scale-down after load stops
Expected Outcome:
- 📈 Pods scale up from 0 to N based on load
- ⚡ Fast response times during scaling
- 📉 Automatic scale-down after load removal
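If you want to drive the scaling yourself rather than rely on the demo's built-in load, a simple loop is enough. This is a sketch: `generate_load` is not one of the demo scripts' functions, the payload assumes the sklearn-iris model, and `$TOKEN` must hold a valid tenant token.

```bash
# generate_load URL SECONDS: fire prediction requests for a fixed duration.
generate_load() {
  url=$1
  stop=$(( $(date +%s) + $2 ))
  while [ "$(date +%s)" -lt "$stop" ]; do
    curl -s -o /dev/null -H "Authorization: Bearer $TOKEN" \
      -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' "$url" &
    sleep 0.2   # roughly 5 requests/sec; tune to taste
  done
  wait
}

# Usage:
#   generate_load "http://sklearn-iris-predictor.tenant-a.127.0.0.1.sslip.io:8080/v1/models/sklearn-iris:predict" 60
# ...and in another terminal, watch the pods scale:
#   watch "kubectl get pods -n tenant-a -l serving.kserve.io/inferenceservice=sklearn-iris"
```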
## 🚦 Canary Deployment Demo

Demonstrates: Advanced traffic management and progressive deployment
What it shows:
- Canary deployment creation
- Traffic splitting between versions
- Istio virtual service configuration
- Progressive rollout capabilities
Demo Flow:
- Checks for existing sklearn-iris model
- Deploys canary version (v2) with 10% traffic
- Configures Istio virtual service for traffic splitting
- Makes multiple requests to demonstrate traffic distribution
- Shows promotion options
Expected Outcome:
- 🔄 90% traffic to main version, 10% to canary
- 📊 Visible traffic distribution in responses
- 🎛️ Easy promotion/rollback options
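One way to see the 90/10 split with your own eyes is to tally responses over a batch of requests. This is a sketch: it assumes the two versions can be distinguished by something in the response body, and the `.version` field used here is a placeholder — how you extract a version label depends on how the canary is configured.

```bash
# tally: count occurrences of each distinct line on stdin (e.g., version labels).
tally() { sort | uniq -c | sort -rn; }

# Send 50 requests to the given URL and tally which version answered.
canary_split() {
  for _ in $(seq 1 50); do
    curl -s -H "Authorization: Bearer $TOKEN" \
      -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}' "$1" | jq -r '.version // "unknown"'
  done | tally
}
```

With a 10% canary weight you should see roughly 45 responses from the main version and 5 from the canary, with normal statistical jitter.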
## 🌐 Multi-tenant Isolation Demo

Demonstrates: Secure multi-tenancy with resource isolation
What it shows:
- Namespace-based tenant separation
- Network policies for isolation
- Resource quotas and limits
- Independent model deployments
Demo Flow:
- Lists all tenant namespaces and labels
- Shows network policies enforcing isolation
- Displays resource quotas per tenant
- Lists models deployed in each tenant
Expected Outcome:
- 🏢 Clear tenant boundaries
- 🔒 Network isolation between tenants
- 📊 Resource governance in place
- 🎯 Independent model lifecycles
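The same checks the demo performs can be run by hand. This is a sketch wrapped in a function so it can be sourced safely; the tenant namespace names match this guide, but exactly how namespaces are labeled is platform-specific, so treat the commands as a starting point.

```bash
# Inspect tenant isolation by hand.
inspect_tenants() {
  # Tenant namespaces and their labels
  kubectl get namespaces --show-labels | grep tenant

  # Network policies enforcing isolation, per tenant
  kubectl get networkpolicy -n tenant-a
  kubectl get networkpolicy -n tenant-b
  kubectl get networkpolicy -n tenant-c

  # Resource quotas per tenant
  kubectl get resourcequota --all-namespaces

  # Models deployed in each tenant
  kubectl get inferenceservice --all-namespaces
}
```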
## 📊 Observability Demo

Demonstrates: Comprehensive monitoring and observability
What it shows:
- Real-time metrics collection
- Distributed tracing (when available)
- Service mesh visualization
- Custom dashboards
Demo Flow:
- Starts port-forwarding for observability tools
- Generates sample traffic for metrics
- Provides access URLs for tools
- Shows recommended dashboards
Expected Outcome:
- 📈 Live metrics in Prometheus
- 📊 Rich dashboards in Grafana
- 🗺️ Service topology in Kiali
- 🔍 Request traces (if Jaeger deployed)
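If you need the dashboards outside of a demo run, the port-forwards can be started manually. This is a sketch: the service names and the `monitoring`/`istio-system` namespaces are assumptions based on a typical kube-prometheus-stack and Kiali install, so verify them with `kubectl get svc` first.

```bash
# Start background port-forwards for the observability stack.
forward_observability() {
  kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
  kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090 &
  kubectl port-forward -n istio-system svc/kiali 20001:20001 &
  echo "Grafana:    http://localhost:3000"
  echo "Prometheus: http://localhost:9090"
  echo "Kiali:      http://localhost:20001"
}
# Stop them later with: pkill -f "port-forward"
```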
- Envoy AI Gateway: `http://localhost:8080` (via Istio Gateway port-forward)
- JWT Server: `http://localhost:8081` (for token retrieval)
- Model Endpoints: use hostname-based routing (e.g., `http://sklearn-iris-predictor.tenant-a.127.0.0.1.sslip.io:8080`)
- JWT Tokens: dynamically retrieved from the JWT server during demos
- sklearn-iris (tenant-a): Iris classification model
- pytorch-resnet (tenant-c): ResNet image classification model
- Grafana: `http://localhost:3000` (admin/prom-operator)
- Prometheus: `http://localhost:9090`
- Kiali: `http://localhost:20001`
- tenant-a: Scikit-learn models
- tenant-b: Reserved for TensorFlow models
- tenant-c: PyTorch models
Running `./scripts/demo.sh` presents the following menu:

```text
🚀 Inference-in-a-Box Demo Menu

1) 🔒 Security & Authentication Demo
2) ⚡ Auto-scaling Demo
3) 🚦 Canary Deployment Demo
4) 🌐 Multi-tenant Isolation Demo
5) 📊 Observability Demo
6) 🧪 Run All Demos
7) 🚪 Exit
```
Each demo script automatically handles:

- Port-forwarding for required services
- Model deployment validation
- Service readiness checks
- Cleanup on exit

Error handling includes:

- Graceful fallbacks for missing components
- Informative error messages
- Automatic retries where appropriate
- Clean exit handling
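The cleanup-on-exit behavior boils down to a standard trap pattern. This is a sketch of the idea, not the scripts' actual code: background PIDs are tracked as they are spawned and killed on any exit path.

```bash
# Track background port-forward PIDs and kill them on any exit path.
PF_PIDS=""
cleanup() {
  for pid in $PF_PIDS; do
    kill "$pid" 2>/dev/null || true
  done
}
trap cleanup EXIT INT TERM

# Example of registering a process for cleanup:
#   kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
#   PF_PIDS="$PF_PIDS $!"
```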
Useful commands while a demo is running:

```bash
# Launch the menu and select a demo (e.g., option 1)
./scripts/demo.sh

# The auto-scaling demo generates load automatically.
# Watch scaling in real time:
watch "kubectl get pods -n tenant-a -l serving.kserve.io/inferenceservice=sklearn-iris"

# During the canary demo, monitor the virtual service:
kubectl get virtualservice -n tenant-a -o yaml

# Observability tools are automatically port-forwarded during the demo.
# Access them at:
# - Grafana: http://localhost:3000
# - Prometheus: http://localhost:9090
# - Kiali: http://localhost:20001
```

## Troubleshooting

Port conflicts:

```bash
# Kill existing port-forwards
pkill -f "port-forward"
```

Model not ready:

```bash
# Check model status
kubectl get inferenceservice --all-namespaces
kubectl describe inferenceservice sklearn-iris -n tenant-a
```

Authentication failures:

```bash
# Verify a JWT is well-formed by decoding its payload segment.
# Note: raw JWT segments omit base64 padding, so base64 may warn on some tokens.
echo "TOKEN" | cut -d "." -f2 | base64 -d | jq .
```

General health checks:

```bash
# Check platform health
kubectl get pods --all-namespaces
kubectl get inferenceservice --all-namespaces

# Verify the observability stack
kubectl get pods -n monitoring
kubectl get svc -n monitoring

# Test model access
curl -H "Authorization: Bearer TOKEN" \
  http://sklearn-iris-predictor.tenant-a.127.0.0.1.sslip.io:8080/v1/models/sklearn-iris:predict \
  -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'
```

The demo system is split into separate files for better maintainability:
- `scripts/demo.sh` - Main entry point with interactive menu
- `scripts/demo-security.sh` - Security & authentication demo
- `scripts/demo-autoscaling.sh` - Auto-scaling demo
- `scripts/demo-canary.sh` - Canary deployment demo
- `scripts/demo-multitenancy.sh` - Multi-tenant isolation demo
- `scripts/demo-observability.sh` - Observability demo
- Modular design with separate scripts for each demo
- Interactive menu system for user-friendly navigation
- Automated cleanup to prevent resource conflicts
- Comprehensive logging for debugging and monitoring
- Error handling with graceful fallbacks
- Correct URL patterns using hostname-based routing
- README.md - Platform overview and quick start
- CLAUDE.md - Development and operational guidance
- Architecture Documentation - Technical deep dive
- Troubleshooting Guide - Common issues and solutions
To add new demo scenarios:
- Create a new `scripts/demo-<name>.sh` file following the existing patterns
- Add a new function in `scripts/demo.sh` that calls your script
- Add a menu option in the `main()` function
- Update this documentation
- Test thoroughly across different environments
```bash
#!/bin/bash
# <Demo Name> Demo
# Description of what this demo shows

set -e

# Include common functions and logging
# ... (see existing demo scripts for patterns)

# Main demo function
demo_<name>() {
    log "Running <Demo Name> Demo"
    # Your demo logic here
    success "<Demo Name> Demo completed"
}

# Run the demo
demo_<name>
```

The Inference-in-a-Box demo showcases enterprise-grade AI/ML infrastructure capabilities through interactive, hands-on scenarios. Each demo is designed to highlight specific aspects of the platform while providing practical experience with real-world use cases.