ristponex/wan-video-skill

Wan Video Skill

An open AI agent skill for generating videos with Alibaba's Wan models through the Atlas Cloud API. It covers text-to-video, image-to-video, and video-to-video, all from your terminal. Wan 2.1/2.2 are open source (Apache 2.0); Wan 2.5/2.6 are closed-source APIs. Wan 2.2 Spicy adds LoRA support and NSFW-capable generation.

Built for the open agent skills ecosystem: works with Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, OpenCode, Kiro, and 15+ AI coding agents.

Features

  • Text-to-Video: Generate video from a text prompt
  • Image-to-Video: Animate a static image into a video
  • Video-to-Video: Transform existing video with new styles/content
  • Wan Model Family: Built on Alibaba's Wan video generation models (Wan 2.1/2.2 are open source under Apache 2.0; Wan 2.5/2.6 are closed-source commercial APIs)
  • LoRA Support: Use custom LoRA weights with Wan 2.2 Spicy for fine-tuned results
  • NSFW Mode: Wan 2.2 Spicy for uncensored, unrestricted content generation
  • Multiple Resolutions: 480p, 720p, and 1080p options (the models render up to 720p natively; see the 1080p entry in the FAQ)
  • Variable Duration: 3s to 10s video generation
  • Affordable Pricing: Starting from $0.03/s with Atlas Cloud

Model Variants

| Model | Mode | Starting Price per Second | Resolution | Duration | Notes |
|---|---|---|---|---|---|
| Wan 2.6 T2V | Text-to-Video | from $0.07/s | Up to 720p | 5s | Latest generation, best quality |
| Wan 2.6 I2V | Image-to-Video | from $0.07/s | Up to 720p | 5s | Animate any image |
| Wan 2.6 V2V | Video-to-Video | from $0.07/s | Up to 720p | 5s | Transform existing videos |
| Wan 2.5 T2V | Text-to-Video | from $0.05/s | Up to 720p | 5s | Balanced quality and cost |
| Wan 2.2 Spicy | Image-to-Video | from $0.03/s | Up to 480p | 5s | NSFW/uncensored content |
| Wan 2.2 Spicy LoRA | Image-to-Video | from $0.03/s | Up to 480p | 5s | NSFW + custom LoRA weights |

Prices shown are starting prices. Higher resolution or longer duration may cost more.
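
Because pricing is per second of generated video, a lower bound on the cost of a clip is the starting price multiplied by the duration. The sketch below illustrates that arithmetic; the function and table names are hypothetical and not part of the CLI, and actual billing may be higher, as noted above.

```typescript
// Starting prices in US cents per second, copied from the Model Variants table.
// Integer cents avoid floating-point rounding in the multiplication.
const STARTING_CENTS_PER_SECOND: Record<string, number> = {
  "wan-2.6": 7,
  "wan-2.5": 5,
  "wan-spicy": 3,
  "wan-spicy-lora": 3,
};

// Returns the minimum expected cost in cents for one clip.
// This is only a lower bound: higher resolution or longer duration may cost more.
function estimateMinCostCents(model: string, durationSeconds: number): number {
  const rate = STARTING_CENTS_PER_SECOND[model];
  if (rate === undefined) {
    throw new Error("No starting price known for model: " + model);
  }
  return rate * durationSeconds;
}

// A 10-second Wan 2.6 clip starts at 70 cents ($0.70).
console.log(estimateMinCostCents("wan-2.6", 10));
```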

Install

Requirements: Bun

# Clone the repo
git clone https://github.com/thoughtincode/wan-video-skill.git ~/tools/wan-video-skill
cd ~/tools/wan-video-skill

# Install dependencies
bun install

# Link globally (no sudo needed - uses Bun's global bin)
bun link

# Set up your API key
cp .env.example .env
# Edit .env and add your Atlas Cloud API key

Get an Atlas Cloud API key at Atlas Cloud.

Now you can use wan-video from anywhere.

As an Agent Skill

When installed as an agent skill (via Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Kiro, and more), just say /init and your AI agent will clone the repo, install deps, and link the command for you. Then use it by saying "generate a video of..." and the agent handles the rest.

Fallback (if bun link doesn't work)

mkdir -p ~/.local/bin
ln -sf ~/tools/wan-video-skill/src/cli.ts ~/.local/bin/wan-video
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Usage

# Basic text-to-video
wan-video "a cat walking across a sunny garden"

# Custom output name
wan-video "ocean waves crashing on rocks" -o waves

# Higher resolution
wan-video "futuristic city flyover" --resolution 720p

# Longer duration
wan-video "time-lapse of flowers blooming" --duration 10

# Custom output directory
wan-video "sunset timelapse" -o sunset -d ~/Videos

Modes

# Text-to-Video (default)
wan-video "your prompt"

# Image-to-Video: animate a static image
wan-video "make this character wave hello" --mode i2v --image character.png

# Video-to-Video: transform existing video
wan-video "convert to anime style" --mode v2v --image input.mp4

Models

# Default: Wan 2.6 (latest, best quality)
wan-video "your prompt"

# Wan 2.5: balanced quality and cost
wan-video "your prompt" --model wan-2.5

# Wan 2.6 Image-to-Video
wan-video "animate this scene" --model wan-2.6-i2v --image scene.png

# Wan 2.6 Video-to-Video
wan-video "add rain effect" --model wan-2.6-v2v --image input.mp4

| Alias | Model ID | Best For |
|---|---|---|
| wan-2.6, wan-2.6-t2v | alibaba/wan-2.6/text-to-video | Best quality text-to-video |
| wan-2.6-i2v | alibaba/wan-2.6/image-to-video | Animating static images |
| wan-2.6-v2v | alibaba/wan-2.6/video-to-video | Transforming existing videos |
| wan-2.5 | alibaba/wan-2.5/text-to-video | Cost-effective generation |
| wan-spicy | alibaba/wan-2.2-spicy/image-to-video | NSFW/uncensored content |
| wan-spicy-lora | alibaba/wan-2.2-spicy/image-to-video-lora | NSFW + custom LoRA |
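
One way to picture the alias table is as a plain lookup from alias to model ID. The following is a minimal sketch, not the CLI's actual source: `resolveModel` is a hypothetical name, and treating any value that contains a slash as a full model ID is an assumption.

```typescript
// Alias-to-ID mapping, copied from the table above.
const MODEL_ALIASES: Record<string, string> = {
  "wan-2.6": "alibaba/wan-2.6/text-to-video",
  "wan-2.6-t2v": "alibaba/wan-2.6/text-to-video",
  "wan-2.6-i2v": "alibaba/wan-2.6/image-to-video",
  "wan-2.6-v2v": "alibaba/wan-2.6/video-to-video",
  "wan-2.5": "alibaba/wan-2.5/text-to-video",
  "wan-spicy": "alibaba/wan-2.2-spicy/image-to-video",
  "wan-spicy-lora": "alibaba/wan-2.2-spicy/image-to-video-lora",
};

// Resolves the value of --model to a full model ID.
function resolveModel(aliasOrId: string): string {
  const known = MODEL_ALIASES[aliasOrId];
  if (known !== undefined) {
    return known;
  }
  // Assumption: a value containing "/" is already a full model ID and is passed through.
  if (aliasOrId.indexOf("/") !== -1) {
    return aliasOrId;
  }
  throw new Error("Model not found: " + aliasOrId);
}

console.log(resolveModel("wan-2.6-t2v"));
```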

NSFW Mode

Use the --nsfw flag to switch to Wan 2.2 Spicy for uncensored content generation:

# NSFW image-to-video (requires --image)
wan-video "your nsfw prompt" --nsfw --image input.png

# NSFW with LoRA weights
wan-video "your nsfw prompt" --nsfw --lora --image input.png

The --nsfw flag automatically selects the Wan 2.2 Spicy model. Add --lora to use the LoRA variant for custom fine-tuned styles.

Important: NSFW mode requires an input image (--image flag). Wan 2.2 Spicy only supports image-to-video generation.

Resolution and Duration

# 480p (fastest, cheapest)
wan-video "quick concept video" --resolution 480p

# 720p (default, balanced)
wan-video "product demo video" --resolution 720p

# Short clip
wan-video "logo animation" --duration 3

# Long clip
wan-video "nature documentary scene" --duration 10

Options

| Option | Default | Description |
|---|---|---|
| -o, --output | wan-gen-{timestamp} | Output filename (no extension) |
| --model | wan-2.6 | Model alias or full model ID |
| --mode | t2v | Generation mode: t2v, i2v, v2v |
| --duration | 5 | Video duration in seconds (3-10) |
| --resolution | 720p | Output resolution: 480p, 720p, 1080p |
| --nsfw | false | Use Wan 2.2 Spicy for NSFW content |
| --lora | false | Use LoRA variant (only with --nsfw) |
| --image | - | Input image/video path (for i2v/v2v modes) |
| -d, --dir | current directory | Output directory |
| --api-key | - | Atlas Cloud API key (overrides env/file) |
| -h, --help | - | Show help |
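
The constraints in the table (duration range, allowed resolutions, `--lora` only with `--nsfw`, an input file for NSFW and i2v/v2v modes) can be checked before any request is sent. This is an illustrative sketch under those stated rules; `WanOptions` and `validateOptions` are hypothetical names, and the real CLI may validate differently or word its errors differently.

```typescript
interface WanOptions {
  mode: "t2v" | "i2v" | "v2v";
  duration: number;
  resolution: string;
  nsfw: boolean;
  lora: boolean;
  image?: string;
}

const RESOLUTIONS = ["480p", "720p", "1080p"];

// Returns a list of human-readable problems; an empty list means the options are consistent.
function validateOptions(o: WanOptions): string[] {
  const errors: string[] = [];
  if (o.duration < 3 || o.duration > 10) {
    errors.push("--duration must be between 3 and 10 seconds");
  }
  if (RESOLUTIONS.indexOf(o.resolution) === -1) {
    errors.push("--resolution must be one of " + RESOLUTIONS.join(", "));
  }
  if (o.lora && !o.nsfw) {
    errors.push("--lora only applies together with --nsfw");
  }
  if (o.nsfw && !o.image) {
    errors.push("NSFW mode requires --image");
  }
  if (!o.nsfw && o.mode !== "t2v" && !o.image) {
    errors.push("--mode " + o.mode + " requires --image");
  }
  return errors;
}

// The table's defaults pass validation.
console.log(validateOptions({ mode: "t2v", duration: 5, resolution: "720p", nsfw: false, lora: false }));
```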

API Key Configuration

The CLI resolves the Atlas Cloud API key in priority order:

  1. --api-key flag on the command line
  2. ATLAS_API_KEY environment variable
  3. .env file in the current working directory
  4. .env file in the repo root (next to src/)
  5. ~/.wan-video/.env

# Option 1: Environment variable
export ATLAS_API_KEY=your_key_here

# Option 2: .env file in current directory
echo "ATLAS_API_KEY=your_key_here" > .env

# Option 3: Global config
mkdir -p ~/.wan-video
echo "ATLAS_API_KEY=your_key_here" > ~/.wan-video/.env

# Option 4: Pass directly
wan-video "your prompt" --api-key your_key_here
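
The priority order above amounts to "first non-empty source wins". The sketch below shows that idea with the file reading left to the caller, so it stays free of filesystem calls; `KeySources`, `parseEnvKey`, and `resolveApiKey` are hypothetical names, and the simple `.env` parsing is an assumption rather than the CLI's actual behavior.

```typescript
// Extracts ATLAS_API_KEY from the text of a .env file, if present.
function parseEnvKey(contents: string): string | undefined {
  const lines = contents.split("\n");
  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (line.charAt(0) === "#") {
      continue; // skip comment lines
    }
    const match = /^ATLAS_API_KEY=(.*)$/.exec(line);
    if (match) {
      // Strip one pair of optional surrounding quotes.
      const value = match[1].trim().replace(/^["']|["']$/g, "");
      if (value !== "") {
        return value;
      }
    }
  }
  return undefined;
}

interface KeySources {
  flag?: string; // value of --api-key
  env?: string; // value of the ATLAS_API_KEY environment variable
  // Contents of the .env files in priority order (cwd, repo root, ~/.wan-video);
  // undefined marks a file that does not exist.
  envFiles: (string | undefined)[];
}

// Returns the first key found, following the documented priority order.
function resolveApiKey(sources: KeySources): string | undefined {
  if (sources.flag) return sources.flag;
  if (sources.env) return sources.env;
  for (const contents of sources.envFiles) {
    if (contents === undefined) continue;
    const key = parseEnvKey(contents);
    if (key) return key;
  }
  return undefined;
}

console.log(resolveApiKey({ env: "from_env", envFiles: ["ATLAS_API_KEY=from_file"] }));
```

A result of `undefined` corresponds to the "API key not found" case described under Troubleshooting.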

How It Works

The CLI uses the Atlas Cloud API to interface with Alibaba's Wan video generation models:

  1. Submit Request: Sends your prompt (and optional image/video) to the Atlas Cloud API
  2. Poll for Completion: Checks the prediction status every 5 seconds until the video is ready
  3. Download Result: Downloads the generated video to your specified output location

API Flow

POST /api/v1/model/prediction
  → Returns request_id

GET /api/v1/model/prediction/{request_id}
  → Poll until status: "completed"
  → Download output video URL
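
The polling step can be sketched as a small decision function plus a loop around it. This is an illustration, not the CLI's code: the 5-second interval and 5-minute timeout come from this README, while `Prediction`, its `videoUrl` field, `checkStatus`, and `sleep` are hypothetical. The README does not name any failure statuses, so the sketch does not handle them.

```typescript
type PollAction = "done" | "wait" | "timeout";

const POLL_INTERVAL_MS = 5000; // status is checked every 5 seconds
const TIMEOUT_MS = 5 * 60 * 1000; // the CLI gives up after 5 minutes

// Decides what to do after one status check.
function nextPollAction(status: string, elapsedMs: number): PollAction {
  if (status === "completed") return "done";
  if (elapsedMs >= TIMEOUT_MS) return "timeout";
  return "wait";
}

// Hypothetical response shape; only `status` is taken from the flow above.
interface Prediction {
  status: string;
  videoUrl?: string;
}

// Polls until the prediction completes or the timeout elapses.
// `checkStatus` would wrap GET /api/v1/model/prediction/{request_id}.
// Elapsed time is approximated by summing the sleep intervals.
async function waitForVideo(
  checkStatus: () => Promise<Prediction>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<Prediction> {
  let elapsedMs = 0;
  for (;;) {
    const prediction = await checkStatus();
    const action = nextPollAction(prediction.status, elapsedMs);
    if (action === "done") return prediction;
    if (action === "timeout") throw new Error("Request timed out");
    await sleep(POLL_INTERVAL_MS);
    elapsedMs += POLL_INTERVAL_MS;
  }
}
```

Injecting `checkStatus` and `sleep` keeps the loop independent of any particular HTTP client and lets it be exercised without network access.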

Model Selection Logic

Default:         alibaba/wan-2.6/text-to-video
--mode i2v:      alibaba/wan-2.6/image-to-video
--mode v2v:      alibaba/wan-2.6/video-to-video
--model wan-2.5: alibaba/wan-2.5/text-to-video
--nsfw:          alibaba/wan-2.2-spicy/image-to-video
--nsfw --lora:   alibaba/wan-2.2-spicy/image-to-video-lora
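
Read as code, the selection rules above form a short chain of checks. The sketch below is one possible reading: `Flags` and `selectModel` are hypothetical names, and the precedence between flags (here `--nsfw` first, then `--model wan-2.5`, then `--mode`) is an assumption, since the list above does not say how combinations are resolved.

```typescript
interface Flags {
  mode?: "t2v" | "i2v" | "v2v";
  model?: string;
  nsfw?: boolean;
  lora?: boolean;
}

// Maps CLI flags to a model ID following the documented selection rules.
function selectModel(flags: Flags): string {
  // Assumption: --nsfw overrides --mode and --model.
  if (flags.nsfw) {
    return flags.lora
      ? "alibaba/wan-2.2-spicy/image-to-video-lora"
      : "alibaba/wan-2.2-spicy/image-to-video";
  }
  if (flags.model === "wan-2.5") {
    return "alibaba/wan-2.5/text-to-video";
  }
  if (flags.mode === "i2v") {
    return "alibaba/wan-2.6/image-to-video";
  }
  if (flags.mode === "v2v") {
    return "alibaba/wan-2.6/video-to-video";
  }
  // Default: Wan 2.6 text-to-video.
  return "alibaba/wan-2.6/text-to-video";
}

console.log(selectModel({ nsfw: true, lora: true }));
```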

Wan Model Deep Dive

Wan 2.6 (Latest)

The newest generation of Alibaba's Wan video models. Wan 2.6 delivers the best quality across text-to-video, image-to-video, and video-to-video tasks. It supports up to 720p resolution and 5-second clips, with significantly improved temporal consistency and prompt adherence compared to earlier versions.

  • Best for: Production-quality video generation, commercial content, marketing materials
  • Strengths: Superior motion quality, accurate prompt following, clean outputs
  • Price: from $0.07/s, below the Kling 2.0 ($0.10+) and Runway Gen-3 ($0.50+) prices listed in the comparison table

Wan 2.5

The previous generation model offering a good balance of quality and cost. Wan 2.5 produces solid results at a lower price point, making it ideal for prototyping and high-volume generation.

  • Best for: Prototyping, batch generation, cost-sensitive workflows
  • Strengths: Fast generation, reliable quality, lower cost
  • Price: from $0.05/s

Wan 2.2 Spicy (NSFW)

An uncensored variant of the Wan model specifically designed for adult content generation. Wan 2.2 Spicy removes all content filters and safety restrictions, allowing generation of explicit content.

  • Best for: Adult content platforms, unrestricted creative work
  • Strengths: No content filters, NSFW output, LoRA support for custom styles
  • Price: from $0.03/s, the lowest starting price among the models listed here
  • Note: Image-to-video only; requires an input image

Wan 2.2 Spicy LoRA

The LoRA variant of Wan 2.2 Spicy adds support for custom fine-tuned weights. This allows you to apply specific styles, characters, or aesthetics to your NSFW generations.

  • Best for: Custom character generation, specific art styles, branded content
  • Strengths: All Spicy features + custom LoRA weight support
  • Price: from $0.03/s

Use Cases

  • Marketing Videos: Product demos, social media clips, ad creatives
  • Prototyping: Quick video concepts before full production
  • Content Creation: YouTube thumbnails-to-video, blog post illustrations
  • Game Development: Cutscene concepts, trailer storyboards
  • Animation: Animate static artwork, character concepts in motion
  • Style Transfer: Transform existing videos into new visual styles
  • Adult Content: NSFW generation with Wan 2.2 Spicy (uncensored)
  • Video Production: Visual elements for Remotion/video compositions

Agent Skill Integration

When installed as an agent skill, the skill triggers on phrases like:

  • "generate a video"
  • "create a video clip"
  • "animate this image"
  • "make a video of"
  • "convert this to video"

Your AI agent will construct the appropriate wan-video command based on your request, handling model selection, resolution, duration, NSFW mode, and output configuration automatically. Works across Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Kiro, and more.

Example Skill Interactions

User: "Generate a 10-second video of a dragon flying over mountains"
Agent: wan-video "a dragon flying over mountains, cinematic aerial shot" --duration 10

User: "Animate this character image"
Agent: wan-video "character performing idle animation, smooth motion" --mode i2v --image character.png

User: "Make an NSFW video from this image"
Agent: wan-video "your prompt" --nsfw --image input.png

Comparison with Other Video Models

| Feature | Wan 2.6 | Kling 2.0 | Runway Gen-3 | Sora |
|---|---|---|---|---|
| Price | from $0.07/s | $0.10+ | $0.50+ | N/A |
| Open Source | 2.1/2.2 only | No | No | No |
| NSFW Support | Yes (Spicy) | No | No | No |
| LoRA Support | Yes (Spicy) | No | No | No |
| API Access | Atlas Cloud | Limited | Runway API | Waitlist |
| Text-to-Video | Yes | Yes | Yes | Yes |
| Image-to-Video | Yes | Yes | Yes | Yes |
| Video-to-Video | Yes | No | No | No |

Prompt Engineering Tips

Writing effective prompts is key to getting great results from Wan video models. Here are some best practices:

Be Specific About Motion

Instead of vague descriptions, describe exactly what should move and how:

# Vague (unpredictable results)
wan-video "a person in a park"

# Specific (much better results)
wan-video "a woman walking slowly through a sunlit park, camera follows from behind, gentle breeze moving tree leaves"

Include Camera Instructions

Wan models respond well to cinematographic language:

# Dolly shot
wan-video "slow dolly forward through a dimly lit hallway, dust particles in light beams"

# Aerial shot
wan-video "drone aerial shot flying over a vast mountain range at golden hour, clouds below"

# Tracking shot
wan-video "tracking shot following a cyclist through city streets, shallow depth of field"

# Static shot
wan-video "static wide shot of a waterfall in a tropical forest, mist rising"

Describe Lighting and Atmosphere

wan-video "neon-lit cyberpunk alley at night, rain reflections on wet pavement, fog rolling in"
wan-video "soft morning light streaming through curtains, dust particles visible in light beams"
wan-video "harsh midday sun casting sharp shadows on a desert landscape"

Use Style Keywords

# Cinematic
wan-video "cinematic shot of a train crossing a bridge at sunset, 35mm film grain, anamorphic lens flare"

# Documentary
wan-video "documentary style close-up of hands crafting pottery, natural lighting, handheld camera"

# Music video
wan-video "stylized music video shot, artist performing on rooftop, city skyline background, dramatic lighting"

Prompt Structure

A good prompt follows this pattern:

[subject] [action] [environment] [camera/style] [lighting/mood]

Example:

wan-video "a golden retriever (subject) running through shallow water (action) on a beach at sunset (environment), slow motion tracking shot (camera), warm golden hour lighting with lens flare (mood)"
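
If prompts are assembled programmatically, the pattern above can be expressed as a small helper. This is only a sketch: `PromptParts` and `buildPrompt` are hypothetical names, and joining the optional camera and mood parts with commas is a stylistic assumption rather than a requirement of the Wan models.

```typescript
interface PromptParts {
  subject: string;
  action: string;
  environment: string;
  camera?: string;
  mood?: string;
}

// Assembles a prompt in the order: subject, action, environment, camera/style, lighting/mood.
function buildPrompt(parts: PromptParts): string {
  const core = parts.subject + " " + parts.action + " " + parts.environment;
  // Keep only the optional parts that were actually provided.
  const extras = [parts.camera, parts.mood].filter(
    (part): part is string => Boolean(part),
  );
  return [core, ...extras].join(", ");
}

const prompt = buildPrompt({
  subject: "a golden retriever",
  action: "running through shallow water",
  environment: "on a beach at sunset",
  camera: "slow motion tracking shot",
  mood: "warm golden hour lighting with lens flare",
});
console.log(prompt);
```

The resulting string can be passed to wan-video as the prompt argument.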

Advanced Usage

Batch Generation

Generate multiple videos with a simple shell loop:

# Generate variations of the same concept
for i in 1 2 3 4 5; do
  wan-video "abstract fluid art, colorful paint mixing in water, macro shot" \
    -o "fluid-art-$i" --duration 5
done

Pipeline with Image-to-Video

Use AI-generated images as input for Wan I2V:

# Step 1: Generate an image (using any image generation tool)
# Step 2: Animate it with Wan
wan-video "the character slowly turns to face the camera and smiles" \
  --mode i2v --image generated-character.png -o animated-character

Combining with FFmpeg

Post-process your generated videos:

# Generate a video
wan-video "cinematic landscape timelapse" -o landscape

# Add slow motion (2x slower)
ffmpeg -i landscape.mp4 -filter:v "setpts=2.0*PTS" landscape-slow.mp4

# Create a loop
ffmpeg -stream_loop 3 -i landscape.mp4 -c copy landscape-looped.mp4

# Convert to GIF
ffmpeg -i landscape.mp4 -vf "fps=15,scale=480:-1" -loop 0 landscape.gif

# Add audio track
ffmpeg -i landscape.mp4 -i music.mp3 -shortest -c:v copy landscape-audio.mp4

Environment-Specific Configurations

# Production environment
ATLAS_API_KEY=prod_key wan-video "product demo" --resolution 720p --model wan-2.6

# Development (cheaper model for testing)
ATLAS_API_KEY=dev_key wan-video "test video" --model wan-2.5

# NSFW pipeline
ATLAS_API_KEY=nsfw_key wan-video "prompt" --nsfw --image input.png

Frequently Asked Questions

How long does video generation take?

Typical generation times:

  • Wan 2.6: 30-90 seconds depending on resolution and duration
  • Wan 2.5: 20-60 seconds
  • Wan 2.2 Spicy: 15-45 seconds

The CLI polls automatically and will time out after 5 minutes if the request hasn't completed.

What image formats are supported for I2V?

PNG, JPEG, WebP, and GIF are supported for image-to-video mode. For best results, use PNG or JPEG at the target resolution.

What video formats are supported for V2V?

MP4, MOV, and AVI are supported for video-to-video mode. The output is always MP4.
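
A script that wraps the CLI could reject unsupported inputs before spending an API call. The sketch below checks only the file extension, which is an assumption (the README does not say how the CLI detects formats); `isSupportedInput` is a hypothetical helper, and accepting both `.jpg` and `.jpeg` for JPEG is likewise assumed.

```typescript
// Extensions for the formats listed in the two FAQ answers above.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp", "gif"];
const VIDEO_EXTENSIONS = ["mp4", "mov", "avi"];

// Returns true when the file extension matches a format supported by the given mode.
function isSupportedInput(mode: "i2v" | "v2v", path: string): boolean {
  const dot = path.lastIndexOf(".");
  if (dot === -1) {
    return false; // no extension at all
  }
  const extension = path.slice(dot + 1).toLowerCase();
  const allowed = mode === "i2v" ? IMAGE_EXTENSIONS : VIDEO_EXTENSIONS;
  return allowed.indexOf(extension) !== -1;
}

console.log(isSupportedInput("i2v", "character.PNG"));
```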

Can I use this commercially?

Yes. Wan 2.1/2.2 models are open source (Apache 2.0). Wan 2.5/2.6 are closed-source commercial APIs. This CLI is MIT licensed. You own the generated content. Check Atlas Cloud's terms for API usage specifics.

How do I get the best quality?

  1. Use Wan 2.6 (latest model)
  2. Write detailed, specific prompts
  3. Use 720p resolution (best quality/speed balance)
  4. Include camera and lighting descriptions in your prompt

What's the maximum video duration?

Currently 3-10 seconds per generation. For longer videos, generate multiple clips and concatenate them with FFmpeg:

wan-video "scene 1" -o clip1 --duration 10
wan-video "scene 2" -o clip2 --duration 10
# Write a concat list file, then join the clips without re-encoding
printf "file 'clip1.mp4'\nfile 'clip2.mp4'\n" > clips.txt
ffmpeg -f concat -i clips.txt -c copy final.mp4
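
For a longer target length, the total has to be split into clips that each fit the 3-10 second range. One possible way to plan that split is sketched below; `planClips` is a hypothetical helper, not part of the CLI, and it assumes whole-second durations. It uses the fewest clips possible and spreads the seconds evenly, so no clip falls below the 3-second minimum.

```typescript
// Splits a total duration (whole seconds) into clip lengths between 3 and 10 seconds.
function planClips(totalSeconds: number): number[] {
  if (Math.floor(totalSeconds) !== totalSeconds || totalSeconds < 3) {
    throw new Error("totalSeconds must be a whole number of at least 3");
  }
  const count = Math.ceil(totalSeconds / 10); // fewest clips that can cover the total
  const base = Math.floor(totalSeconds / count); // even share per clip
  const extra = totalSeconds % count; // leftover seconds, one per leading clip
  const clips: number[] = [];
  for (let i = 0; i < count; i++) {
    clips.push(i < extra ? base + 1 : base);
  }
  return clips;
}

// 25 seconds is covered by three clips of 9, 8, and 8 seconds.
console.log(planClips(25));
```

Each planned length would be passed as --duration for one clip before concatenating the results.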

Can I generate in 1080p?

Wan models currently support up to 720p natively. For 1080p output, generate at 720p and upscale with FFmpeg or an AI upscaler.

Troubleshooting

"API key not found"

Make sure your Atlas Cloud API key is set. See API Key Configuration.

"Request timed out"

Video generation can take 30-120 seconds depending on the model and resolution. The CLI will poll automatically. If it times out after 5 minutes, try again or use a lower resolution.

"Model not found"

Check that you're using a valid model alias. Run wan-video --help to see all available models.

"NSFW mode requires --image"

Wan 2.2 Spicy only supports image-to-video. You must provide an input image with --image.

🚀 Take This to Production Today

This workflow is optimized for Atlas Cloud. Move from experiment to enterprise-ready scale.

  • Production-Ready: Wan 2.6 from $0.07/s, the lowest price in the comparison above
  • NSFW Support: Wan 2.2 Spicy for uncensored content from $0.03/s
  • Enterprise Security: SOC I & II Certified | HIPAA Compliant
  • Zero Maintenance: Serverless architecture, so you can focus on your product, not the servers

👉 Start Building

License

MIT
