57 commits
f3136e3
Support modalities for gemini-2.0-flash-preview-image-generation
tpaulshippy Jun 14, 2025
6e3128c
Extract images from chat response
tpaulshippy Jun 14, 2025
28cf942
Rubocop
tpaulshippy Jun 14, 2025
357ea8a
Set modalities from capabilities
tpaulshippy Jun 14, 2025
0fed244
Merge branch 'main' into image-to-image
tpaulshippy Jul 20, 2025
9a1eeb8
Attach output image to message content
tpaulshippy Jul 20, 2025
1f60caa
Update comment
tpaulshippy Jul 20, 2025
98097d4
Refine image in conversation
tpaulshippy Jul 20, 2025
1b58d43
Merge branch 'main' into image-to-image
tpaulshippy Jul 24, 2025
d14d1e9
Remove duplicate SEO tags from docs
crmne Jul 28, 2025
460108c
Updated models
crmne Jul 28, 2025
730a8c8
fix: add missing blank lines for improved readability in generator an…
crmne Jul 28, 2025
1c50176
fix: Rails integration with_context now works without global config
crmne Jul 30, 2025
2afc6b2
Anthropic: Fix system prompt (use plain text instead of serialized JS…
MichaelHoste Jul 30, 2025
af0ead4
Provide access to raw response object from Faraday (#304)
tpaulshippy Jul 30, 2025
98dabdb
Add Chat#on_tool_call callback (#299)
bryan-ash Jul 30, 2025
cbb4276
Added proper handling of streaming error responses across both Farada…
dansingerman Jul 30, 2025
8626a77
Add message ordering guidance to Rails docs (#288)
crmne Jul 30, 2025
b6095a5
Bump version to 1.4.0 and update VCR cassettes
crmne Jul 30, 2025
8b0809b
Update model pricing and capabilities in JSON configuration
crmne Jul 30, 2025
598b584
Fix Action Cable capitalization in Rails guide
crmne Jul 31, 2025
20ae7a5
Update README and docs with comprehensive feature list
crmne Jul 31, 2025
7595a4e
Update Rails guide with instant message display pattern
crmne Jul 31, 2025
8f0ba07
Add Perplexity provider support
crmne Jul 31, 2025
7842a0b
Move available models guide to top-level navigation
crmne Jul 31, 2025
fe9d9d9
Fix broken links to available-models guide after relocation
crmne Jul 31, 2025
b6f9c13
Add Mistral AI provider support
crmne Jul 31, 2025
81f0a8c
Update specs to disable additional RuboCop checks for multi-turn conv…
crmne Jul 31, 2025
c5e059a
docs: add mistral provider
crmne Jul 31, 2025
9ae1018
reorder providers alphabetically
crmne Jul 31, 2025
2336483
Bust cache of gem version badge in README
crmne Jul 31, 2025
bf8c096
Fix Rails generator migration order and PostgreSQL detection
crmne Jul 31, 2025
2ff42aa
Removed unnecessary rubocop disable comments after last commit
crmne Jul 31, 2025
5a19a41
Fix Mistral models created_at timestamps
crmne Jul 31, 2025
a405e9e
Version bump to 1.5.0
crmne Jul 31, 2025
07444e5
Fix model capabilities format and imagen output modality
crmne Aug 1, 2025
a6dcd40
Automatically generate appraisal gemfiles
crmne Aug 1, 2025
b3b4684
Update JRuby version in CI matrix to jruby-10.0.1.0
crmne Aug 1, 2025
98f0cd1
Bump version to 1.5.1
crmne Aug 1, 2025
db1d563
Bust cache for gem badge in README
crmne Aug 1, 2025
43afe2f
Bust cache again for gem badge
crmne Aug 1, 2025
4f7a163
Wire up on_tool_call when using acts_as_chat rails integration (#318)
agarcher Aug 1, 2025
f291744
Resolve rubocop offenses
tpaulshippy Aug 3, 2025
76c7714
Merge branch 'main' into image-to-image
tpaulshippy Aug 3, 2025
84a939f
Update guides
tpaulshippy Aug 3, 2025
5c2c5d2
Merge branch 'main' into image-to-image
tpaulshippy Aug 7, 2025
a68483a
Merge branch 'main' into image-to-image
tpaulshippy Aug 25, 2025
061a8a5
Merge branch 'main' into image-to-image
tpaulshippy Aug 28, 2025
9bf9e3e
Refactor image to image specs
tpaulshippy Aug 28, 2025
9704fe1
Support attachments when accumulating streams
tpaulshippy Aug 28, 2025
a6e8ce1
Support attachments when accumulating streams
tpaulshippy Aug 28, 2025
bd71bef
Merge branch 'main' into image-to-image
tpaulshippy Aug 30, 2025
6a64d78
Failing specs for #7 and #8
tpaulshippy Aug 29, 2025
9c492e5
Failing spec for #9
tpaulshippy Aug 29, 2025
71e12f2
Do not merge duplicate attachments
tpaulshippy Aug 29, 2025
88f9af7
Make messages with images attached serializable
tpaulshippy Aug 30, 2025
1016d1c
Rename some spec variables
tpaulshippy Aug 30, 2025
30 changes: 29 additions & 1 deletion docs/_core_features/chat.md
@@ -129,7 +129,7 @@ Many modern AI models can process multiple types of input beyond just text. Ruby

### Working with Images

Vision-capable models can analyze images, answer questions about visual content, and even compare multiple images. Common vision models include `gpt-4o`, `claude-3-opus`, and `gemini-1.5-pro`.
Vision-capable models can analyze images, answer questions about visual content, and even compare multiple images. Some specialized models can also generate and edit images. Common vision models include `gpt-4o`, `claude-3-opus`, and `gemini-1.5-pro`.

```ruby
# Ensure you select a vision-capable model
@@ -150,6 +150,34 @@ puts response.content

RubyLLM automatically handles image encoding and formatting for each provider's API. Local images are read and encoded as needed, while URLs are passed directly when supported by the provider.

### Image Generation with Chat

While most vision models analyze images, some specialized models can generate and edit images through the chat interface. This approach is ideal for image editing workflows and iterative refinement:

```ruby
# Use a model capable of image generation
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# Edit an existing image
response = chat.ask('make this look more futuristic', with: 'current_design.png')

# Access generated images from attachments
if response.content.attachments.any?
  generated_image = response.content.attachments.first.image
  puts "Generated image: #{generated_image.mime_type}"

  # Save the generated image
  generated_image.save('futuristic_design.png')
end

# Continue refining in the same conversation
response = chat.ask('add some neon lighting effects')
refined_image = response.content.attachments.first.image
refined_image.save('futuristic_with_neon.png')
```

For simple text-to-image generation without existing images, see the [Image Generation Guide]({% link guides/image-generation.md %}).

### Working with Audio

Audio-capable models can transcribe speech, analyze audio content, and answer questions about what they hear. Currently, models like `gpt-4o-audio-preview` support audio input.
86 changes: 81 additions & 5 deletions docs/_core_features/image-generation.md
@@ -24,6 +24,8 @@ redirect_from:
After reading this guide, you will know:

* How to generate images from text prompts.
* How to edit and modify existing images.
* How to refine images through multi-turn conversations.
* How to select different image generation models.
* How to specify image sizes (for supported models).
* How to access and save generated image data (URL or Base64).
@@ -98,6 +100,75 @@ end

Refer to the [Working with Models Guide]({% link _advanced/models.md %}) and the [Available Models Guide]({% link _reference/available-models.md %}) to find image models.

## Image Editing & Modification

Beyond generating images from text prompts, you can also edit and modify existing images using capable models like `gemini-2.0-flash-preview-image-generation`. This approach uses the chat interface rather than the `paint` method.

### Basic Image Editing

Use the chat interface with image generation models to edit existing images:

```ruby
# Start a chat with an image generation model
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# Edit an existing image
response = chat.ask('put this in a ring', with: 'path/to/ruby.png')

# Access the generated image from the response
image = response.content.attachments.first.image

# Check image properties
puts "Generated image: #{image.mime_type}"
puts "Base64 encoded: #{image.base64?}"
puts "Data size: ~#{image.data.length} bytes" if image.base64?

# Save the edited image
saved_path = image.save('ruby_with_ring.png')
puts "Saved to: #{saved_path}"
```
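
As a rough illustration of what saving a Base64-delivered image involves, the sketch below decodes the payload and writes the bytes to disk. `InlineImage` is a hypothetical stand-in written for this note, not the library's `RubyLLM::Image`, and it assumes `data` holds a Base64 string like the one Gemini returns in `inlineData`.

```ruby
require 'tmpdir'

# Hypothetical stand-in for an image delivered as Base64 (not RubyLLM::Image itself).
InlineImage = Struct.new(:data, :mime_type, keyword_init: true) do
  # Decode the Base64 payload into raw bytes ('m' is the Base64 directive).
  def to_blob
    data.unpack1('m')
  end

  # Write the decoded bytes to disk and return the path that was written.
  def save(path)
    File.binwrite(path, to_blob)
    path
  end
end

# Usage: round-trip a few fake bytes through Base64 and back.
raw_bytes = "\x89PNG fake image bytes".b
image = InlineImage.new(data: [raw_bytes].pack('m0'), mime_type: 'image/png')

Dir.mktmpdir do |dir|
  saved_path = image.save(File.join(dir, 'example.png'))
  puts "Saved #{File.size(saved_path)} bytes to #{saved_path}"
end
```

Note that the length of the Base64 string is larger than the decoded byte count, so a size estimate based on `data.length` is only approximate.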

### Multi-turn Image Refinement

One of the powerful features of using the chat interface is the ability to refine generated images through conversation:

```ruby
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# First edit - add a ring to the ruby image
chat.ask('put this in a ring', with: 'path/to/ruby.png')

# Refine the result in the same conversation
response = chat.ask('change the background to blue')

# The model will modify the previously generated image
refined_image = response.content.attachments.first.image
refined_image.save('ruby_ring_blue_background.png')

# Continue refining
response = chat.ask('make the ring more ornate and golden')
final_image = response.content.attachments.first.image
final_image.save('ruby_ornate_golden_ring.png')
```

### Chat vs Paint Methods

RubyLLM provides two approaches for image generation:

- **`RubyLLM.paint`**: Best for simple text-to-image generation from scratch
- **`RubyLLM.chat` with image models**: Best for image editing, refinement, and complex workflows

Use the chat interface for:
- Editing existing images
- Multi-turn image refinement and iteration
- Complex image generation workflows
- When you need conversation context and memory

Use the paint method for:
- Simple text-to-image generation
- One-off image creation
- When you don't need conversation context

## Image Sizes

Some models, like DALL-E 3, allow you to specify the desired image dimensions via the `size:` argument.
@@ -124,7 +195,7 @@ image_portrait = RubyLLM.paint(

## Working with Generated Images

The `RubyLLM::Image` object provides access to the generated image data and metadata.
The `RubyLLM::Image` object provides access to the generated image data and metadata, whether the image was created using `RubyLLM.paint` or retrieved from a chat response.

### Accessing Image Data

@@ -138,10 +209,15 @@ The `RubyLLM::Image` object provides access to the generated image data and meta
The `save` method works regardless of whether the image was delivered via URL or Base64. It fetches the data if necessary and writes it to the specified file path.

```ruby
# Generate an image
# Generate an image using paint method
image = RubyLLM.paint("A steampunk mechanical owl")

# Save the image to a local file
# Or get an image from a chat response
# chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
# response = chat.ask("Create a steampunk mechanical owl")
# image = response.content.attachments.first.image

# Save the image to a local file (works the same for both methods)
begin
  saved_path = image.save("steampunk_owl.png")
  puts "Image saved to #{saved_path}"
@@ -275,6 +351,6 @@ Image generation can take several seconds (typically 5-20 seconds depending on t

## Next Steps

* [Chatting with AI Models]({% link _core_features/chat.md %}): Learn about conversational AI.
* [Chatting with AI Models]({% link _core_features/chat.md %}): Learn about conversational AI and using chat for advanced image workflows.
* [Embeddings]({% link _core_features/embeddings.md %}): Explore text vector representations.
* [Error Handling]({% link _advanced/error-handling.md %}): Master handling API errors.
* [Error Handling]({% link _advanced/error-handling.md %}): Master handling API errors.
5 changes: 5 additions & 0 deletions lib/ruby_llm/content.rb
@@ -18,6 +18,11 @@ def add_attachment(source, filename: nil)
  self
end

def attach(attachment)
  @attachments << attachment
  self
end

def format
  if @text && @attachments.empty?
    @text
22 changes: 22 additions & 0 deletions lib/ruby_llm/image_attachment.rb
@@ -0,0 +1,22 @@
# frozen_string_literal: true

module RubyLLM
  # A class representing a file attachment that is an image generated by an LLM.
  class ImageAttachment < Attachment
    attr_reader :image, :content

    def initialize(data:, mime_type:, model_id:)
      super(nil, filename: nil)
      @image = Image.new(data:, mime_type:, model_id:)
      @mime_type = mime_type
    end

    def image?
      true
    end

    def encoded
      image.data
    end
  end
end
2 changes: 1 addition & 1 deletion lib/ruby_llm/message.rb
@@ -44,7 +44,7 @@ def tool_results
def to_h
{
role: role,
content: content,
content: content.is_a?(Content) ? content.to_h : content,
tool_calls: tool_calls,
tool_call_id: tool_call_id,
input_tokens: input_tokens,
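
To illustrate why `to_h` now converts `Content` objects, here is a minimal sketch of the same branching with stand-in classes. `SketchContent` and `SketchMessage` are hypothetical names, and the hash shape returned by `SketchContent#to_h` is an assumption; the real `Content#to_h` is not shown in this diff and may differ.

```ruby
require 'json'

# Hypothetical, simplified stand-in for RubyLLM::Content.
class SketchContent
  attr_reader :text, :attachments

  def initialize(text = nil)
    @text = text
    @attachments = []
  end

  # Mirrors the chainable attach method added in this PR.
  def attach(attachment)
    @attachments << attachment
    self
  end

  # Assumed shape; the real Content#to_h may differ.
  def to_h
    { text: text, attachments: attachments }
  end
end

# Hypothetical stand-in for RubyLLM::Message, reduced to two fields.
SketchMessage = Struct.new(:role, :content, keyword_init: true) do
  def to_h
    { role: role, content: content.is_a?(SketchContent) ? content.to_h : content }
  end
end

plain = SketchMessage.new(role: :assistant, content: 'hello')
rich = SketchMessage.new(
  role: :assistant,
  content: SketchContent.new('here you go').attach({ mime_type: 'image/png', data: 'aGk=' })
)

# Both messages now reduce to plain hashes that JSON can serialize.
puts JSON.generate(plain.to_h)
puts JSON.generate(rich.to_h)
```

String content passes through untouched, so existing callers that only deal with text should see the same hash as before.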
3 changes: 3 additions & 0 deletions lib/ruby_llm/providers/gemini/capabilities.rb
@@ -219,6 +219,9 @@ def modalities_for(model_id)

modalities[:input] << 'audio' if model_id.match?(/audio/)
modalities[:output] << 'embeddings' if model_id.match?(/embedding|gemini-embedding/)

modalities[:output] << 'image' if model_id.match?(/image-generation/)

modalities[:output] = ['image'] if model_id.match?(/imagen/)

modalities
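
The output-modality rules in this hunk can be exercised in isolation with a small sketch. `sketch_output_modalities` is a hypothetical helper, and it assumes every model starts with text output, since the method's defaults are not visible in this diff. In the PR, the resulting list is what `render_payload` passes as `generationConfig.responseModalities`.

```ruby
# Simplified sketch of the output-modality rules shown in the hunk above.
# Assumption: every model starts with text output; the real defaults are not shown in this diff.
def sketch_output_modalities(model_id)
  output = ['text']
  output << 'embeddings' if model_id.match?(/embedding|gemini-embedding/)
  output << 'image' if model_id.match?(/image-generation/)
  # Imagen models only produce images, so this rule replaces the list.
  output = ['image'] if model_id.match?(/imagen/)
  output
end

%w[
  gemini-2.0-flash
  gemini-2.0-flash-preview-image-generation
  imagen-3.0-generate-002
].each do |model_id|
  puts "#{model_id}: #{sketch_output_modalities(model_id).inspect}"
end
```

The order of the rules matters: the `imagen` check runs last so it can override anything appended earlier.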
4 changes: 3 additions & 1 deletion lib/ruby_llm/providers/gemini/chat.rb
@@ -15,7 +15,9 @@ def render_payload(messages, tools:, temperature:, model:, stream: false, schema
@model = model.id
payload = {
contents: format_messages(messages),
generationConfig: {}
generationConfig: {
responseModalities: capabilities.modalities_for(model.id)[:output]
}
}

payload[:generationConfig][:temperature] = temperature unless temperature.nil?
16 changes: 15 additions & 1 deletion lib/ruby_llm/providers/gemini/streaming.rb
@@ -34,7 +34,21 @@ def extract_content(data)
  return nil unless parts

  text_parts = parts.select { |p| p['text'] }
  text_parts.map { |p| p['text'] }.join if text_parts.any?
  image_parts = parts.select { |p| p['inlineData'] }

  content = RubyLLM::Content.new(text_parts.map { |p| p['text'] }.join)

  image_parts.map do |p|
    content.attach(
      ImageAttachment.new(
        data: p['inlineData']['data'],
        mime_type: p['inlineData']['mimeType'],
        model_id: data['modelVersion']
      )
    )
  end

  content
end

def extract_input_tokens(data)
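
The new `extract_content` logic can be tried against a hand-written response with the sketch below. `SketchAttachment`, `SketchParsed`, and `sketch_extract_content` are hypothetical stand-ins for `ImageAttachment`, `Content`, and the real method. How `parts` is obtained is not shown in this hunk, so the `candidates[0].content.parts` path is an assumption based on the Gemini response format.

```ruby
# Hypothetical stand-ins; the real code builds RubyLLM::Content and ImageAttachment objects.
SketchAttachment = Struct.new(:data, :mime_type, :model_id, keyword_init: true)
SketchParsed = Struct.new(:text, :attachments, keyword_init: true)

# Assumption: parts live under candidates[0].content.parts in the response hash.
def sketch_extract_content(data)
  parts = data.dig('candidates', 0, 'content', 'parts')
  return nil unless parts

  # Concatenate every text part in order.
  text = parts.filter_map { |part| part['text'] }.join

  # Turn every inlineData part into an attachment; other parts are skipped.
  attachments = parts.filter_map do |part|
    inline = part['inlineData']
    next unless inline

    SketchAttachment.new(data: inline['data'], mime_type: inline['mimeType'], model_id: data['modelVersion'])
  end

  SketchParsed.new(text: text, attachments: attachments)
end

sample = {
  'modelVersion' => 'gemini-2.0-flash-preview-image-generation',
  'candidates' => [
    { 'content' => { 'parts' => [
      { 'text' => 'Here is your image.' },
      { 'inlineData' => { 'mimeType' => 'image/png', 'data' => 'aGVsbG8=' } }
    ] } }
  ]
}

parsed = sketch_extract_content(sample)
puts parsed.text
puts parsed.attachments.map(&:mime_type).inspect
```

The sketch collects attachments with `filter_map` rather than attaching inside a `map` block as the diff does; the resulting content is the same.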
50 changes: 47 additions & 3 deletions lib/ruby_llm/stream_accumulator.rb
@@ -6,7 +6,7 @@ class StreamAccumulator
attr_reader :content, :model_id, :tool_calls

def initialize
@content = +''
@content = nil
@tool_calls = {}
@input_tokens = 0
@output_tokens = 0
@@ -20,7 +20,7 @@ def add(chunk)
if chunk.tool_call?
accumulate_tool_calls chunk.tool_calls
else
@content << (chunk.content || '')
accumulate_content(chunk.content)
end

count_tokens chunk
@@ -30,7 +30,7 @@ def add(chunk)
def to_message(response)
Message.new(
role: :assistant,
content: content.empty? ? nil : content,
content: final_content,
model_id: model_id,
tool_calls: tool_calls_from_stream,
input_tokens: @input_tokens.positive? ? @input_tokens : nil,
@@ -41,6 +41,50 @@ def to_message(response)

private

def accumulate_content(new_content)
  return unless new_content

  if @content.nil?
    @content = new_content.is_a?(String) ? +new_content : new_content
  else
    case [@content.class, new_content.class]
    when [String, String]
      @content << new_content
    when [String, Content]
      @content = Content.new(@content)
      merge_content(new_content)
    when [Content, String]
      @content.instance_variable_set(:@text, (@content.text || '') + new_content)
    when [Content, Content]
      merge_content(new_content)
    end
  end
end

def merge_content(new_content)
  current_text = @content.text || ''
  new_text = new_content.text || ''
  @content.instance_variable_set(:@text, current_text + new_text)

  existing_encoded = @content.attachments.map(&:encoded)
  new_content.attachments.each do |attachment|
    @content.attach(attachment) unless existing_encoded.include?(attachment.encoded)
  end
end

def final_content
  case @content
  when nil
    nil
  when String
    @content.empty? ? nil : @content
  when Content
    @content.text.nil? && @content.attachments.empty? ? nil : @content
  else
    @content
  end
end

def tool_calls_from_stream
tool_calls.transform_values do |tc|
arguments = if tc.arguments.is_a?(String) && !tc.arguments.empty?
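
The merge rule used by `merge_content` (append text, skip attachments whose encoded payload is already present) can be sketched on its own. `SketchChunkAttachment` and `SketchStreamContent` are hypothetical stand-ins for the attachment and `Content` classes, and `merge!` is a name made up for this example.

```ruby
# Hypothetical stand-in for an attachment that exposes its encoded payload.
SketchChunkAttachment = Struct.new(:encoded)

# Hypothetical stand-in for RubyLLM::Content, reduced to text plus attachments.
class SketchStreamContent
  attr_reader :text, :attachments

  def initialize(text = nil, attachments = [])
    @text = text
    @attachments = attachments.dup
  end

  # Append the other chunk's text and attach only payloads not already present.
  def merge!(other)
    @text = (text || '') + (other.text || '')
    existing_encoded = attachments.map(&:encoded)
    other.attachments.each do |attachment|
      @attachments << attachment unless existing_encoded.include?(attachment.encoded)
    end
    self
  end
end

# Usage: three streamed chunks, the last one repeating the same image payload.
accumulated = SketchStreamContent.new('Here ')
accumulated.merge!(SketchStreamContent.new('you go', [SketchChunkAttachment.new('aGVsbG8=')]))
accumulated.merge!(SketchStreamContent.new(nil, [SketchChunkAttachment.new('aGVsbG8=')]))

puts accumulated.text
puts accumulated.attachments.size
```

As in the diff, the set of existing payloads is computed once per merge, so duplicates are detected against earlier chunks rather than within a single chunk.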

Large diffs are not rendered by default.