A session-scoped AI orchestrator for Claude Code that turns Claude into a senior video production technical director. It routes natural-language requests to 12 specialized skill modules covering the full pipeline: character image generation, video production, voice synthesis, LoRA training, and publishing -- all driven by ComfyUI.
Producing AI-generated video with ComfyUI involves juggling dozens of models, custom nodes, prompt styles, and hardware constraints. VideoAgent wraps all of that domain knowledge into a structured skill system that Claude reads on demand, so you can say things like:
- "Generate a photorealistic portrait of a dog using InstantID"
- "Make a talking head video where she says 'Welcome to my channel'"
- "Train a LoRA from these 20 reference images"
- "Check for new ComfyUI video models released this month"
...and get validated, hardware-aware ComfyUI workflows without memorizing node names or VRAM budgets.
```
video-agent.bat
 |
 |-- Writes state/session.json (active project, ComfyUI URL)
 |-- cd to this repo
 |-- Launches: claude
 |
 |-- Claude Code auto-loads CLAUDE.md (the orchestrator)
 |-- Local hooks fire (staleness check)
 |-- User's first message triggers foundation reads
 |-- Each request is routed to the right skill file
```
Key design decisions:
| Decision | Rationale |
|---|---|
| Session-scoped, not global | Skills stay in this repo. Other Claude Code sessions are unaffected. |
| `CLAUDE.md` is the orchestrator | The only file Claude auto-loads from a project root. Contains the routing table and behavioral instructions. |
| Skills are read-on-demand markdown | No build step, no registration. Claude reads `skills/{name}/SKILL.md` when the routing table says to. |
| REST polling over WebSocket | Claude Code can't hold persistent connections. Polling every 5s works fine for minute-long video generation. |
| Research is user-triggered | No cron jobs. A session-start hook reminds you when data is stale. |
- Claude Code installed and authenticated
- ComfyUI installed (local or remote)
- FFmpeg on PATH (for video assembly)
- PowerShell 7+ (for utility scripts)
- Windows (the launcher is a `.bat` file; WSL/Linux adaptation is straightforward)
```
git clone https://github.com/MCKRUZ/ComfyUI-Expert.git
cd ComfyUI-Expert
video-agent.bat
```

With options:

```
video-agent.bat --project "my-video"                  # Set active project
video-agent.bat --comfyui "http://<remote-ip>:8188"   # Remote ComfyUI
video-agent.bat --resume                              # Resume last session
```

First time (or after installing new models/nodes), tell the agent:

```
Scan my ComfyUI installation at C:\ComfyUI
```

Or run the script directly:

```
pwsh -File scripts/scan-inventory.ps1 -ComfyUIPath "C:\ComfyUI"
```

This creates `state/inventory.json` -- a cache of every model, custom node, and VRAM detail. The agent validates every workflow against this inventory before execution.
- "Generate a photorealistic portrait using FLUX"
- "Create a new project called 'Character Showcase'"
- "Add a character named Spot - German Shepherd, shaggy hair, dark fur"
- "Train a LoRA from these reference images"
- "Research the latest ComfyUI video models"
VideoAgent loads context incrementally to stay within Claude's context window:
| Tier | Files | Loaded When | Size |
|---|---|---|---|
| 1: Foundation | `foundation/*.md` | Session start (first interaction) | ~2K tokens |
| 2: Working | `projects/{name}/*` | When working on a specific project | Varies |
| 3: Reference | `references/*.md` | Only when a skill explicitly needs detail | Large |
```
CLAUDE.md (orchestrator - always loaded)
 |
 |-- Foundation Skills (no dependencies)
 |    |-- comfyui-api               REST API connection
 |    |-- comfyui-inventory        Model/node discovery
 |    |-- project-manager          Project & character state
 |
 |-- Research (independent)
 |    |-- comfyui-research         Self-updating knowledge base
 |
 |-- Core Creation (depend on inventory)
 |    |-- comfyui-prompt-engineer  Model-specific prompt optimization
 |    |-- comfyui-workflow-builder Validated workflow JSON generation
 |
 |-- Production (depend on creation)
 |    |-- comfyui-video-pipeline   Wan 2.2 / FramePack / AnimateDiff
 |    |-- comfyui-voice-pipeline   Chatterbox / F5-TTS / lip-sync
 |    |-- comfyui-lora-training    Dataset prep, training, evaluation
 |
 |-- Output (depend on production)
 |    |-- video-assembly           FFmpeg + Remotion composition
 |    |-- video-publisher          YouTube metadata & upload
 |
 |-- Support
      |-- comfyui-troubleshooter   Error diagnosis & fixes
```
When you make a request, CLAUDE.md routes it to the right skill:
| You Say | Skill Loaded | What Happens |
|---|---|---|
| "Generate a character portrait" | comfyui-workflow-builder | Checks inventory, builds workflow JSON, queues via API |
| "Craft a better prompt" | comfyui-prompt-engineer | Model-specific optimization (FLUX vs SDXL vs Wan) |
| "Create a video from this image" | comfyui-video-pipeline | Selects engine (Wan/FramePack/AnimateDiff), builds pipeline |
| "Clone this voice / make her talk" | comfyui-voice-pipeline | Voice synthesis + lip-sync pipeline |
| "Train a LoRA" | comfyui-lora-training | Dataset prep, training config, checkpoint evaluation |
| "Build a raw workflow" | comfyui-workflow-builder | Direct workflow construction with inventory validation |
| "Check for new models" | comfyui-research | Scans YouTube/GitHub/HuggingFace, updates references |
| "Something broke" | comfyui-troubleshooter | Error pattern matching, fix suggestions |
| "Assemble the final video" | video-assembly | FFmpeg or Remotion-based composition |
| "Upload to YouTube" | video-publisher | Metadata generation + upload delegation |
| "Create a new project" | project-manager | Project manifests, character profiles |
| "Connect to ComfyUI" | comfyui-api | Connection test, system info |
comfyui-api -- Connects to ComfyUI's REST API (default http://127.0.0.1:8188). Queues workflows, polls for results at 5-second intervals, handles image/model uploads, cancellations, and VRAM management. Supports online mode (live API) and offline mode (JSON export).
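The queue-and-poll loop is simple enough to sketch. A minimal Python version against ComfyUI's standard `/prompt` and `/history` endpoints (the URL and 5-second interval mirror the skill's defaults; retries and error handling omitted):

```python
import json
import time
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # the skill's default endpoint

def build_payload(workflow: dict) -> bytes:
    """Wrap a workflow JSON in the request body shape /prompt expects."""
    return json.dumps({"prompt": workflow}).encode()

def queue_workflow(workflow: dict) -> str:
    """POST the workflow to /prompt and return the server's prompt_id."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]

def poll_result(prompt_id: str, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll /history every `interval` seconds until the finished job appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{COMFYUI_URL}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:  # present only after execution completes
            return history[prompt_id]["outputs"]
        time.sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")
```

A minute-long video generation then reduces to `poll_result(queue_workflow(wf))`, which is why polling beats holding a WebSocket open from a tool-call context.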
comfyui-inventory -- Discovers every installed model, custom node, and VRAM configuration. Works online (API queries) or offline (directory scanning via scan-inventory.ps1). Caches results to state/inventory.json. Maps node classes to packages (e.g., ApplyInstantID -> ComfyUI_InstantID).
project-manager -- Creates project structures with YAML manifests and character profiles. Tracks generation history (what settings worked), manages character identity (appearance, voice, LoRA, reference images), and updates defaults after successful runs.
comfyui-research -- Monitors 7 YouTube channels, 11 GitHub repos, and HuggingFace trending models. Extracts knowledge from tutorials (via transcript analysis), tracks releases, and generates staleness reports. Models older than 90 days and nodes older than 60 days get flagged.
comfyui-prompt-engineer -- Model-specific prompt optimization for FLUX, SDXL, SD1.5, and Wan. Adjusts prompts for identity methods (InstantID, PuLID, IP-Adapter, LoRA), recommends CFG scales per model, and provides negative prompt templates. Integrates with character profiles for context.
comfyui-workflow-builder -- Generates ComfyUI workflow JSON from natural language. Validates every model and node against inventory before output. Supports text-to-image, identity-preserved generation, video (Wan/AnimateDiff), upscaling, and inpainting patterns. Includes VRAM estimation per component.
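The validation step can be sketched as a pure function over the inventory cache. This is a simplified stand-in: the `nodes` and `models` keys are hypothetical flattenings of whatever `state/inventory.json` actually stores.

```python
def validate_workflow(workflow: dict, inventory: dict) -> list[str]:
    """Return a list of problems: unknown node classes or missing models.

    Assumes a simplified inventory shape with "nodes" (installed node
    class names) and "models" (checkpoint filenames).
    """
    problems = []
    known_nodes = set(inventory.get("nodes", []))
    known_models = set(inventory.get("models", []))
    for node_id, node in workflow.items():
        cls = node.get("class_type", "")
        if cls not in known_nodes:
            problems.append(f"node {node_id}: unknown class {cls}")
        ckpt = node.get("inputs", {}).get("ckpt_name")
        if ckpt and ckpt not in known_models:
            problems.append(f"node {node_id}: model not in inventory: {ckpt}")
    return problems
```

An empty return list means every referenced node and checkpoint exists locally, so the workflow is safe to queue.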
comfyui-video-pipeline -- Orchestrates three video engines based on requirements:
- Wan 2.2 MoE 14B: Film-level quality, 5-10 sec clips, 24GB+ VRAM
- FramePack: Long videos (60+ sec), VRAM-invariant (works on 6GB)
- AnimateDiff V3: Fast iteration, motion LoRAs, 4-8 step Lightning
Includes post-processing (RIFE frame interpolation, face enhancement, deflicker, color correction) and a dedicated talking-head pipeline.
comfyui-voice-pipeline -- Six voice synthesis tools (Chatterbox, F5-TTS, TTS Audio Suite, IndexTTS-2, RVC, ElevenLabs) and four lip-sync methods (Wav2Lip, SadTalker, LivePortrait, LatentSync 1.6). Three complete pipelines: Quick (image-to-talk), Quality (image-to-video-to-lip-sync), and Premium (expression transfer).
comfyui-lora-training -- Training tools (AI-Toolkit for FLUX, Kohya_ss for SDXL, FluxGym/SimpleTuner for low VRAM). Covers dataset preparation (15-30 images, captioning strategy), hyperparameter guidance, checkpoint evaluation, and LoRA + zero-shot method combination.
video-assembly -- Two modes: FFmpeg (concatenation, audio mixing, subtitles, transitions) and Remotion (animated captions, motion graphics, React-based templates). Audio normalization to -16 LUFS for YouTube. Quality presets (CRF 15-28).
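As a rough sketch of the FFmpeg mode, a hypothetical helper like this could assemble the encode-and-normalize command, using FFmpeg's real `loudnorm` filter for the -16 LUFS target and x264 CRF for the quality preset (the exact filter chain the skill builds may differ):

```python
import subprocess

def normalize_cmd(src: str, dst: str, crf: int = 18, lufs: float = -16.0) -> list[str]:
    """Build (but don't run) an ffmpeg command that re-encodes `src` at the
    given CRF and normalizes audio loudness to `lufs` LUFS via loudnorm."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-crf", str(crf),
        "-af", f"loudnorm=I={lufs}:TP=-1.5:LRA=11",  # EBU R128 normalization
        "-c:a", "aac",
        dst,
    ]

# To actually run it:
# subprocess.run(normalize_cmd("draft.mp4", "final.mp4"), check=True)
```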
video-publisher -- Thin orchestrator that delegates to global YouTube skills for research, title/thumbnail optimization, upload, and analytics. Generates platform-specific metadata (YouTube, Shorts, Instagram Reels, TikTok).
comfyui-troubleshooter -- Diagnoses four error categories (server, workflow, quality, performance). Covers the top 10 common errors (OOM, missing nodes, precision mismatch, burned faces, etc.) with quick fixes. Includes a quality decision tree and missing-dependency resolution.
The agent tracks the top models across five categories:

**Character image generation**

| Model | Best For | VRAM |
|---|---|---|
| FLUX.1-dev | Photorealism, highest quality | 16GB+ |
| FLUX Kontext | Iterative character editing | 12-32GB |
| RealVisXL V5.0 | Fast SDXL photorealism | 8GB+ |

**Identity preservation**

| Method | Best For | VRAM |
|---|---|---|
| InfiniteYou | Highest identity fidelity | 24GB |
| FLUX Kontext | Edit without retraining | 12-32GB |
| PuLID Flux II | Dual characters, no pollution | 24-40GB |

**Video generation**

| Model | Best For | VRAM |
|---|---|---|
| Wan 2.2 MoE | Film-level quality | 24GB+ |
| FramePack | Long videos, low VRAM | 6GB+ |
| AnimateDiff V3 | Fast iteration, motion LoRAs | 8GB+ |

**Voice synthesis**

| Tool | Best For | License |
|---|---|---|
| TTS Audio Suite | 23 languages, unified platform | Multi |
| Chatterbox | Emotion tags; preferred over ElevenLabs in 63.8% of comparisons | MIT |
| F5-TTS | Zero-shot cloning, fastest | MIT |

**Lip-sync**

| Tool | Best For |
|---|---|
| LatentSync 1.6 | Highest accuracy (ByteDance) |
| Wav2Lip | Proven, works with any face |
| SadTalker | Head movement + expressions |
Full specs and download links are in references/models.md.
VideoAgent is configured for an RTX 5090 (32GB VRAM) but works with any GPU. The agent adjusts recommendations based on available VRAM.
| Workload | 32GB Status | Notes |
|---|---|---|
| FLUX.1-dev FP16 | Native | No quantization needed |
| Wan 2.2 14B | Native | Full quality |
| FramePack | Overkill | Designed for 6GB |
| PuLID Flux II | Native | Dual-character generation |
| InfiniteYou | Native | Both SIM and AES variants |
| LoRA Training (FLUX) | Native | No quantization needed |
Recommended ComfyUI launch flags: `--highvram --fp8_e4m3fn-unet`
```
ComfyUI-Expert/
 |-- video-agent.bat            Launcher (writes session config, opens Claude)
 |-- CLAUDE.md                  Orchestrator (routing table, behavior, rules)
 |-- .claude/
 |    +-- settings.local.json   Project-local hooks & permissions
 |
 |-- foundation/                Tier 1: Quick reference (~2K tokens)
 |    |-- agent-persona.md      Communication style & principles
 |    |-- api-quick-ref.md      ComfyUI REST API cheat sheet
 |    |-- hardware-profile.md   GPU specs, VRAM capabilities
 |    |-- model-landscape.md    Top 3 models per category
 |    +-- skill-registry.md     Skill list & dependency map
 |
 |-- skills/                    12 skill modules (read on demand)
 |    |-- comfyui-api/
 |    |-- comfyui-inventory/
 |    |-- comfyui-lora-training/
 |    |-- comfyui-prompt-engineer/
 |    |-- comfyui-research/
 |    |-- comfyui-troubleshooter/
 |    |-- comfyui-video-pipeline/
 |    |-- comfyui-voice-pipeline/
 |    |-- comfyui-workflow-builder/
 |    |-- project-manager/
 |    |-- video-assembly/
 |    +-- video-publisher/
 |
 |-- references/                Tier 3: Deep reference (loaded on demand)
 |    |-- models.md             Full model catalog & download links
 |    |-- workflows.md          Complete workflow node configurations
 |    |-- lora-training.md      Training parameters & best practices
 |    |-- voice-synthesis.md    Voice tools in depth
 |    |-- prompt-templates.md   Model-specific prompt strategies
 |    |-- troubleshooting.md    Error database with solutions
 |    |-- research-2025.md      Full technique survey
 |    |-- staleness-report.md   Freshness tracking for all entries
 |    +-- evolution.md          Update protocol & changelog
 |
 |-- projects/                  Per-project state (gitignored)
 |-- state/                     Runtime state (gitignored)
 |    |-- session.json          Active project & ComfyUI URL
 |    +-- inventory.json        Cached models/nodes from scan
 |
 |-- scripts/                   Utility scripts
 |    |-- scan-inventory.ps1    Offline ComfyUI directory scanner
 |    |-- connect-comfyui.ps1   Connection test & diagnostics
 |    |-- staleness-check.ps1   Session-start hook (checks research age)
 |    +-- deploy.ps1            Sync references to global skill
 |
 |-- agent/
 |    +-- AGENT.md              Extended orchestration spec
 |
 |-- openclaw/                  OpenClaw compatibility layer
 |    |-- AGENTS.md             Orchestration rules (CLAUDE.md equivalent)
 |    |-- SOUL.md               Agent persona
 |    |-- TOOLS.md              Available tools & API reference
 |    |-- setup.ps1             Install skills into OpenClaw workspace
 |    +-- openclaw.example.json Config template
 |
 +-- docs/
      |-- architecture.md      System design decisions
      +-- getting-started.md   Quick start guide
```
1. "Create a new project called Character Showcase"
2. "Add a character named Sage - auburn hair, green eyes, freckles"
3. "Generate a photorealistic portrait of Sage using InstantID"
The agent: reads inventory -> loads workflow-builder skill -> loads prompt-engineer skill -> generates optimized prompt -> builds validated workflow JSON -> queues via ComfyUI API -> polls for result.
1. "Make Sage say 'Hello everyone, welcome to my channel'"
The agent orchestrates a multi-step pipeline: voice synthesis (Chatterbox/F5-TTS) -> video generation (Wan 2.2) -> lip-sync (LatentSync) -> face enhancement -> assembly.
1. "Check for new ComfyUI models and techniques"
The agent: loads research skill -> checks YouTube channels, GitHub repos, HuggingFace trending -> extracts knowledge from tutorials -> updates reference files -> generates staleness report.
VideoAgent uses Claude Code's hooks system for lightweight automation:
| Hook | Event | What It Does |
|---|---|---|
| Staleness check | SessionStart | Warns if research data is older than 2 weeks |
Configured in .claude/settings.local.json (project-local, doesn't affect other sessions).
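For orientation, a session-start hook of this kind looks roughly like the following (an abridged sketch following Claude Code's hooks schema; check the repo's actual `settings.local.json` for the exact command and any matchers):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "pwsh -File scripts/staleness-check.ps1" }
        ]
      }
    ]
  }
}
```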
| Script | Purpose | Usage |
|---|---|---|
| `scan-inventory.ps1` | Scan ComfyUI models & nodes offline | `pwsh -File scripts/scan-inventory.ps1 -ComfyUIPath "C:\ComfyUI"` |
| `connect-comfyui.ps1` | Test ComfyUI connection & show diagnostics | `pwsh -File scripts/connect-comfyui.ps1` |
| `staleness-check.ps1` | Check research freshness (session hook) | Runs automatically at session start |
| `deploy.ps1` | Sync references to global `comfyui-character-gen` skill | `pwsh -File scripts/deploy.ps1` |
Edit foundation/hardware-profile.md with your GPU specs. The agent reads this at session start and adjusts VRAM recommendations accordingly.
Pass it at launch:

```
video-agent.bat --comfyui "http://<remote-ip>:8188"
```

Or edit the default in `video-agent.bat` (line 18).
Edit foundation/model-landscape.md (top 3 quick reference) and references/models.md (full catalog). Or just ask the agent to run a research update.
Replace `video-agent.bat` with a shell script that:

- Writes `state/session.json`
- `cd`s to the repo
- Runs `claude`

Update the PowerShell script paths in `.claude/settings.local.json` to use `pwsh` (which is cross-platform).
| Issue | Solution |
|---|---|
| Claude doesn't act as VideoAgent | Launch via video-agent.bat, not plain claude |
| "Model not found" in workflow | Run inventory scan, then ask to regenerate |
| ComfyUI won't connect | Run pwsh -File scripts/connect-comfyui.ps1 |
| Staleness hook not firing | Check .claude/settings.local.json is valid JSON |
| Skills leaking to other sessions | They shouldn't -- skills are local files, not globally installed |
VideoAgent skills also work with OpenClaw. Both platforms follow the AgentSkills specification, so the skill files are cross-compatible. The openclaw/ directory contains everything needed.
| Claude Code | OpenClaw Equivalent | Notes |
|---|---|---|
| `CLAUDE.md` (orchestrator) | `AGENTS.md` + `SOUL.md` + `TOOLS.md` | Split across three workspace files |
| `video-agent.bat` (launcher) | OpenClaw daemon | OpenClaw runs as a persistent service |
| `.claude/settings.local.json` (hooks) | `openclaw.json` (config) | Different config format |
| Auto-loads from project root | Skills in `~/.openclaw/workspace/skills/` | Must be installed to workspace |
| `{skill}/SKILL.md` frontmatter | Same + `metadata.openclaw` block | Already added to all 12 skills |
First, run the setup script (copies skills + workspace files into OpenClaw):

```
pwsh -File openclaw/setup.ps1

# Or use symlinks to keep them in sync with the repo
pwsh -File openclaw/setup.ps1 -Symlink

# Or specify a custom OpenClaw workspace path
pwsh -File openclaw/setup.ps1 -OpenClawDir "~/.openclaw/workspace"
```

Next, add skill config to your `~/.openclaw/openclaw.json`. See `openclaw/openclaw.example.json` for the full template -- at minimum:

```json
{
  "skills": {
    "entries": {
      "comfyui-api": {
        "enabled": true,
        "env": { "COMFYUI_URL": "http://127.0.0.1:8188" }
      },
      "comfyui-inventory": {
        "enabled": true,
        "env": { "COMFYUI_PATH": "C:\\ComfyUI" }
      }
    }
  }
}
```

Finally, restart OpenClaw to pick up the new skills.

The setup script:

- Copies (or symlinks) all 12 skill folders into `~/.openclaw/workspace/skills/`
- Copies `AGENTS.md`, `SOUL.md`, `TOOLS.md` into the workspace root
- Copies `foundation/` and `references/` alongside skills for reference access
- Skill routing: Claude Code uses a routing table in `CLAUDE.md`. OpenClaw uses keyword matching on the `description` field in each skill's frontmatter -- the descriptions are already written to support this.
- Requirements gating: OpenClaw validates `metadata.openclaw.requires` at load time (checks that binaries/env vars exist). Skills that fail requirements are excluded from the session.
- No session hook: The staleness-check hook is Claude Code specific. In OpenClaw, ask the agent to check for stale research manually, or set up a cron job.
- File references: OpenClaw skills can use `{baseDir}` to reference files relative to the skill folder. The current skills use relative paths that work in both environments.
```
openclaw/
 |-- AGENTS.md              Orchestration rules (CLAUDE.md equivalent)
 |-- SOUL.md                Agent persona
 |-- TOOLS.md               Available tools and API reference
 |-- setup.ps1              Installation script
 +-- openclaw.example.json  Config template for ~/.openclaw/openclaw.json
```
MIT