@iraszl commented Sep 2, 2025

What this does

This PR adds content moderation functionality to RubyLLM, allowing developers to identify potentially harmful content before sending it to LLM providers. This helps prevent API key bans and ensures safer user interactions.

New Features

  • Content Moderation API: New RubyLLM.moderate() method for screening text content
  • Safety Categories: Detects sexual, hate, harassment, violence, self-harm, and other harmful content types
  • Convenience Methods: Easy-to-use helpers like flagged?, flagged_categories, and category_scores
  • Provider Integration: Currently supports OpenAI's moderation API with extensible architecture for future providers

Usage Examples

```ruby
# Basic usage
result = RubyLLM.moderate("User input text")
puts result.flagged?  # => true/false

# Get flagged categories
puts result.flagged_categories  # => ["harassment", "hate"]

# Integration pattern: screen before chat
def safe_chat(user_input)
  moderation = RubyLLM.moderate(user_input)
  return "Content not allowed" if moderation.flagged?

  RubyLLM.chat.ask(user_input)
end
```
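Beyond the boolean `flagged?` check, the per-category scores support custom thresholds. A minimal sketch, assuming a hash of scores like the one `category_scores` returns (the `strict_flag?` helper and the stubbed score hash are hypothetical, for illustration only):

```ruby
# Hypothetical helper: flag content at a stricter threshold than the
# provider's default, using the per-category scores.
def strict_flag?(scores, threshold: 0.5)
  scores.any? { |_category, score| score >= threshold }
end

# Stubbed hash standing in for result.category_scores
scores = { "harassment" => 0.62, "hate" => 0.08, "violence" => 0.01 }

puts strict_flag?(scores)                  # => true
puts strict_flag?(scores, threshold: 0.7)  # => false
```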

Changes Made

Core Implementation

  • New Class: RubyLLM::Moderate - Main moderation interface following existing patterns
  • Provider Method: Added moderate() to base Provider class
  • OpenAI Integration: OpenAI::Moderation module with API implementation
  • Main Module: Added RubyLLM.moderate() method for global access
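To illustrate the shape of the result object these pieces produce, here is a self-contained sketch of a wrapper over an OpenAI-style moderation response (field names follow OpenAI's moderation API; the class name and exact internals are assumptions, not the PR's actual implementation):

```ruby
# Sketch: wrap a raw moderation response and expose the convenience helpers.
class ModerationResult
  def initialize(raw)
    # OpenAI returns a "results" array; take the first entry
    @result = raw.fetch("results").first
  end

  def flagged?
    @result["flagged"]
  end

  # Names of categories the provider marked true
  def flagged_categories
    @result["categories"].select { |_name, hit| hit }.keys
  end

  def category_scores
    @result["category_scores"]
  end
end

# Example response in OpenAI's moderation format
raw = {
  "model" => "omni-moderation-latest",
  "results" => [{
    "flagged" => true,
    "categories" => { "harassment" => true, "hate" => false },
    "category_scores" => { "harassment" => 0.91, "hate" => 0.02 }
  }]
}

result = ModerationResult.new(raw)
puts result.flagged?            # => true
puts result.flagged_categories  # => ["harassment"]
```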

Configuration

  • Default Model: Added default_moderation_model configuration option (defaults to omni-moderation-latest)
  • API Requirements: Requires OpenAI API key (follows existing provider pattern)
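Configuration would presumably follow RubyLLM's existing `configure` block pattern; a sketch (the `default_moderation_model` key is the option this PR adds):

```ruby
RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
  # Optional: override the default of omni-moderation-latest
  config.default_moderation_model = "omni-moderation-latest"
end
```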

Documentation

  • Complete Guide: New moderation.md with examples
  • Integration Patterns: Real-world usage examples including Rails integration
  • Best Practices: Performance considerations and user experience guidelines

Testing

  • Test Suite: moderation_spec.rb with 4 test cases
  • VCR Cassettes: Mock API responses for testing
  • Tests Passing: No regressions in existing functionality

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

N/A

@iraszl changed the title from Moderate to Add Content Moderation Feature on Sep 2, 2025