📊 Built-in evaluation harness with recommended metrics and benchmark datasets #722
CharnaParkey started this conversation in Ideas
Today, you can connect any evaluation harness to OpenRAG through OpenTelemetry. We're also considering a built-in, out-of-the-box evaluation harness so you can start evaluating right away without wiring one up yourself.
What would be included:
1. Recommended Metrics - Curated set of metrics you should be tracking:
2. Benchmark Datasets - Standard datasets to measure against:
3. Synthetic Data Generation - Generate test data on demand:
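To make the idea concrete, here is a minimal sketch of what an evaluation harness does at its core: run a retriever over labeled queries and aggregate per-query scores. Everything in it is an assumption for illustration. `EvalCase`, `evaluate`, and `toy_retriever` are hypothetical names, not OpenRAG APIs, and recall@k and mean reciprocal rank are common retrieval metrics picked as examples, not the recommended set this proposal refers to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class EvalCase:
    """One labeled example: a query plus the IDs of documents known to be relevant."""
    query: str
    relevant_ids: Set[str]


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    retriever: Callable[[str], List[str]],
    cases: List[EvalCase],
    k: int = 3,
) -> Dict[str, float]:
    """Run the retriever on every case and average the per-query scores."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    recalls = []
    rranks = []
    for case in cases:
        retrieved = retriever(case.query)
        recalls.append(recall_at_k(retrieved, case.relevant_ids, k))
        rranks.append(reciprocal_rank(retrieved, case.relevant_ids))
    return {
        "recall_at_k": sum(recalls) / len(recalls),
        "mrr": sum(rranks) / len(rranks),
    }


# Hypothetical stand-in for a real retrieval pipeline: canned ranked results.
_CANNED = {
    "q1": ["a", "x", "b"],
    "q2": ["y", "c", "z"],
}


def toy_retriever(query: str) -> List[str]:
    return _CANNED.get(query, [])


cases = [
    EvalCase(query="q1", relevant_ids={"a", "b"}),
    EvalCase(query="q2", relevant_ids={"c"}),
]

scores = evaluate(toy_retriever, cases, k=2)
print(scores)  # {'recall_at_k': 0.75, 'mrr': 0.75}
```

In a real setup, the retriever callable would wrap your actual pipeline, and the cases would come from a benchmark or synthetically generated dataset rather than being hard-coded.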
How it would work:
Why this matters:
Questions for the community:
Vote with 👍 if you'd use this feature!
Tell us: How do you evaluate your RAG applications today? What makes it painful?