@iraszl commented Sep 2, 2025

What this does

This PR adds content moderation functionality to RubyLLM, allowing developers to identify potentially harmful content before sending it to LLM providers. This helps prevent API key bans and ensures safer user interactions.

New Features

  • Content Moderation API: New RubyLLM.moderate() method for screening text content
  • Safety Categories: Detects sexual, hate, harassment, violence, self-harm, and other harmful content types
  • Convenience Methods: Easy-to-use helpers like flagged?, flagged_categories, and category_scores
  • Provider Integration: Currently supports OpenAI's moderation API with extensible architecture for future providers

Usage Examples

```ruby
# Basic usage
result = RubyLLM.moderate("User input text")
puts result.flagged?  # => true/false

# Get flagged categories
puts result.flagged_categories  # => ["harassment", "hate"]

# Integration pattern: screen before chat
def safe_chat(user_input)
  moderation = RubyLLM.moderate(user_input)
  return "Content not allowed" if moderation.flagged?

  RubyLLM.chat.ask(user_input)
end
```
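Beyond the boolean `flagged?` check, the per-category scores support custom thresholds. A minimal sketch, assuming a hash of scores like the one `category_scores` returns (the `strict_flag?` helper and the stubbed score hash are hypothetical, for illustration only):

```ruby
# Hypothetical helper: flag content at a stricter threshold than the
# provider's default, using the per-category scores.
def strict_flag?(scores, threshold: 0.5)
  scores.any? { |_category, score| score >= threshold }
end

# Stubbed hash standing in for result.category_scores
scores = { "harassment" => 0.62, "hate" => 0.08, "violence" => 0.01 }

puts strict_flag?(scores)                  # => true
puts strict_flag?(scores, threshold: 0.7)  # => false
```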

Changes Made

Core Implementation

  • New Class: RubyLLM::Moderate - Main moderation interface following existing patterns
  • Provider Method: Added moderate() to base Provider class
  • OpenAI Integration: OpenAI::Moderation module with API implementation
  • Main Module: Added RubyLLM.moderate() method for global access
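To illustrate the shape of the result object these pieces produce, here is a self-contained sketch of a wrapper over an OpenAI-style moderation response (field names follow OpenAI's moderation API; the class name and exact internals are assumptions, not the PR's actual implementation):

```ruby
# Sketch: wrap a raw moderation response and expose the convenience helpers.
class ModerationResult
  def initialize(raw)
    # OpenAI returns a "results" array; take the first entry
    @result = raw.fetch("results").first
  end

  def flagged?
    @result["flagged"]
  end

  # Names of categories the provider marked true
  def flagged_categories
    @result["categories"].select { |_name, hit| hit }.keys
  end

  def category_scores
    @result["category_scores"]
  end
end

# Example response in OpenAI's moderation format
raw = {
  "model" => "omni-moderation-latest",
  "results" => [{
    "flagged" => true,
    "categories" => { "harassment" => true, "hate" => false },
    "category_scores" => { "harassment" => 0.91, "hate" => 0.02 }
  }]
}

result = ModerationResult.new(raw)
puts result.flagged?            # => true
puts result.flagged_categories  # => ["harassment"]
```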

Configuration

  • Default Model: Added default_moderation_model configuration option (defaults to omni-moderation-latest)
  • API Requirements: Requires OpenAI API key (follows existing provider pattern)
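Configuration would presumably follow RubyLLM's existing `configure` block pattern; a sketch (the `default_moderation_model` key is the option this PR adds):

```ruby
RubyLLM.configure do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
  # Optional: override the default of omni-moderation-latest
  config.default_moderation_model = "omni-moderation-latest"
end
```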

Documentation

  • Complete Guide: New moderation.md with examples
  • Integration Patterns: Real-world usage examples including Rails integration
  • Best Practices: Performance considerations and user experience guidelines

Testing

  • Test Suite: moderation_spec.rb with 4 test cases
  • VCR Cassettes: Mock API responses for testing
  • Tests Passing: No regressions in existing functionality

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

N/A

@iraszl changed the title from Moderate to Add Content Moderation Feature on Sep 2, 2025