Add video input file support #260
Changes from all commits
aa6d971
9604320
f8c4655
228ab17
6a45b51
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,11 +43,12 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format. | |
|
||
```ruby | ||
# Just ask questions | ||
chat = RubyLLM.chat | ||
chat = RubyLLM.chat(model: "gemini-2.0-flash") | ||
chat.ask "What's the best way to learn Ruby?" | ||
|
||
# Analyze images, audio, documents, and text files | ||
# Analyze images, videos, audio, documents, and text files | ||
chat.ask "What's in this image?", with: "ruby_conf.jpg" | ||
chat.ask "What's happening in this video?", with: "presentation.mp4" | ||
Review comment: let's call the example video
||
chat.ask "Describe this meeting", with: "meeting.wav" | ||
chat.ask "Summarize this document", with: "contract.pdf" | ||
chat.ask "Explain this code", with: "app.rb" | ||
|
@@ -88,7 +89,8 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)" | |
## Core Capabilities | ||
|
||
* 💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`. | ||
* 👁️ **Vision:** Analyze images within chats. | ||
* 👁️ **Vision:** Analyze images and documents within chats. | ||
* 🎞️ **Video:** Analyze videos within chats. | ||
* 🔊 **Audio:** Transcribe and understand audio content. | ||
* 📄 **Document Analysis:** Extract information from PDFs, text files, and other documents. | ||
* 🖼️ **Image Generation:** Create images with `RubyLLM.paint`. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -119,7 +119,7 @@ RubyLLM manages a registry of known models and their capabilities. For detailed | |
|
||
## Multi-modal Conversations | ||
|
||
Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method. | ||
Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, videos, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method. | ||
|
||
### Working with Images | ||
|
||
|
@@ -144,6 +144,30 @@ puts response.content | |
|
||
RubyLLM handles converting the image source into the format required by the specific provider API. | ||
|
||
### Working with Videos | ||
|
||
You can also analyze video files or URLs with vision-capable models. RubyLLM will automatically detect video files and handle them appropriately. | ||
|
||
```ruby | ||
# Ask about a local video file | ||
chat = RubyLLM.chat(model: 'gemini-2.0-flash') | ||
Review comment: change the video model to
||
response = chat.ask "What happens in this video?", with: "path/to/demo.mp4" | ||
puts response.content | ||
|
||
# Ask about a video from a URL | ||
response = chat.ask "Summarize the main events in this video.", with: "https://example.com/demo_video.mp4" | ||
puts response.content | ||
|
||
# Combine videos with other file types | ||
response = chat.ask "Analyze these files for visual content.", with: ["diagram.png", "demo.mp4", "notes.txt"] | ||
puts response.content | ||
``` | ||
|
||
**Notes:** | ||
- Supported video formats include .mp4, .mov, .avi, .webm, and others (provider-dependent). | ||
- Only Google Gemini models currently support video input; check the [Available Models Guide]({% link guides/available-models.md %}) for details. | ||
- Large video files may be subject to size or duration limits imposed by the provider. | ||
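
The notes above can be turned into a quick check before a video is handed to `chat.ask`. The following is a minimal sketch, not part of RubyLLM: `VIDEO_EXTENSIONS`, `video_source?`, and `video_preflight` are hypothetical names, the extension list is copied from the first note, and the `max_bytes` default is an illustrative placeholder rather than a documented provider limit.

```ruby
# Hypothetical pre-flight helpers; these are NOT RubyLLM APIs.
# Extensions taken from the "Supported video formats" note above.
VIDEO_EXTENSIONS = %w[.mp4 .mov .avi .webm].freeze

# True when a local path or URL ends in one of the listed video extensions.
def video_source?(source)
  # Drop any URL query string or fragment before reading the extension.
  path = source.to_s.sub(/[?#].*\z/m, '')
  VIDEO_EXTENSIONS.include?(File.extname(path).downcase)
end

# Returns a list of detected problems (empty when none were found).
# max_bytes is an assumed placeholder; real limits depend on the provider.
def video_preflight(source, provider:, max_bytes: 20 * 1024 * 1024)
  problems = []
  problems << 'not a recognized video extension' unless video_source?(source)
  # Per the notes above, only Gemini models currently accept video input.
  problems << "provider #{provider} is not known to accept video" unless provider == :gemini
  if File.file?(source.to_s) && File.size(source.to_s) > max_bytes
    problems << "file is larger than #{max_bytes} bytes"
  end
  problems
end
```

A caller might run `video_preflight("path/to/demo.mp4", provider: :gemini)` and only send the file when the returned list is empty. The size check applies to local files only; URLs are skipped because their size is not known without a network request.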
|
||
### Working with Audio | ||
|
||
Provide audio file paths to audio-capable models (like `gpt-4o-audio-preview`). | ||
|
@@ -224,6 +248,7 @@ response = chat.ask "What's in this image?", with: { image: "photo.jpg" } | |
|
||
**Supported file types:** | ||
- **Images:** .jpg, .jpeg, .png, .gif, .webp, .bmp | ||
- **Videos:** .mp4, .mov, .avi, .webm | ||
- **Audio:** .mp3, .wav, .m4a, .ogg, .flac | ||
- **Documents:** .pdf, .txt, .md, .csv, .json, .xml | ||
- **Code:** .rb, .py, .js, .html, .css (and many others) | ||
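
As an illustration of how the list above maps extensions to attachment kinds, here is a minimal sketch in plain Ruby. It is not RubyLLM's actual detection logic: `SUPPORTED_TYPES`, `attachment_category`, and `group_attachments` are hypothetical names, and the table only covers the extensions listed above (the "and many others" for code files are not enumerated).

```ruby
# Extension table copied from the "Supported file types" list above.
# Hypothetical helper code, not RubyLLM internals.
SUPPORTED_TYPES = {
  image:    %w[.jpg .jpeg .png .gif .webp .bmp],
  video:    %w[.mp4 .mov .avi .webm],
  audio:    %w[.mp3 .wav .m4a .ogg .flac],
  document: %w[.pdf .txt .md .csv .json .xml],
  code:     %w[.rb .py .js .html .css]
}.freeze

# Returns the category symbol for a path, or :unknown if the
# extension is not in the table. Matching is case-insensitive.
def attachment_category(path)
  ext = File.extname(path.to_s).downcase
  match = SUPPORTED_TYPES.find { |_category, exts| exts.include?(ext) }
  match ? match.first : :unknown
end

# Groups a list of paths by category, e.g. before building a `with:` array.
def group_attachments(paths)
  paths.group_by { |path| attachment_category(path) }
end
```

For the mixed-file example used earlier in this guide, `group_attachments(["diagram.png", "demo.mp4", "notes.txt"])` puts each file under `:image`, `:video`, and `:document` respectively.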
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,11 +69,12 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format. | |
|
||
```ruby | ||
# Just ask questions | ||
chat = RubyLLM.chat | ||
chat = RubyLLM.chat(model: "gemini-2.0-flash") | ||
Review comment: let's not change this.
||
chat.ask "What's the best way to learn Ruby?" | ||
|
||
# Analyze images, audio, documents, and text files | ||
# Analyze images, videos, audio, documents, and text files | ||
chat.ask "What's in this image?", with: "ruby_conf.jpg" | ||
chat.ask "What's happening in this video?", with: "presentation.mp4" | ||
|
||
chat.ask "Describe this meeting", with: "meeting.wav" | ||
chat.ask "Summarize this document", with: "contract.pdf" | ||
chat.ask "Explain this code", with: "app.rb" | ||
|
@@ -114,7 +115,8 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)" | |
## Core Capabilities | ||
|
||
* 💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`. | ||
* 👁️ **Vision:** Analyze images within chats. | ||
* 👁️ **Vision:** Analyze images and documents within chats. | ||
* 🎞️ **Video:** Analyze videos within chats. | ||
* 🔊 **Audio:** Transcribe and understand audio content. | ||
* 📄 **Document Analysis:** Extract information from PDFs, text files, and other documents. | ||
* 🖼️ **Image Generation:** Create images with `RubyLLM.paint`. | ||
|
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -157,6 +157,10 @@ | |
{ provider: :ollama, model: 'qwen3' } | ||
].freeze | ||
|
||
VIDEO_MODELS = [ | ||
{ provider: :gemini, model: 'gemini-2.0-flash' } | ||
|
||
].freeze | ||
|
||
AUDIO_MODELS = [ | ||
{ provider: :openai, model: 'gpt-4o-mini-audio-preview' } | ||
].freeze |
Review comment: let's not change this.