Open
Description
What component(s) are affected?
- Opik Python SDK
- Opik Typescript SDK
- Opik Agent Optimizer SDK
- Opik UI
- Opik Server
- Documentation
Opik version
- Opik version: 1.9.48
Describe the problem
I'm using opik.evaluation.evaluator.evaluate together with opik.evaluation.metrics.ragas_metric.RagasMetricWrapper and track=True to evaluate my RAG application and compute Ragas metrics.
Expected behavior
After evaluation, each experiment item should have one associated trace. That trace should consist of spans that show the steps used to calculate the metrics.
Actual behavior
After evaluation, each experiment item does have one associated trace. However:
- Some traces are empty and contain no spans.
- Other traces contain spans that belong to other traces.
I also noticed that the number of non-empty traces is equal to the number of task threads.
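To illustrate why that count looks suspicious, here is a toy model in plain Python. It does not use Opik or Ragas, and it is only a hypothesis about the kind of mechanism that could produce this symptom, not a confirmed diagnosis of the SDK's internals. The assumption is that the parent trace for metric spans is remembered once per worker thread and never refreshed; `simulate`, `score_item`, and `parent_trace` are hypothetical names made up for this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(num_items, num_threads):
    """Toy model: spans attach to a parent trace cached once per worker thread."""
    local = threading.local()  # hypothetical stand-in for per-thread tracer state
    lock = threading.Lock()
    spans_by_trace = {}
    worker_ids = set()

    def score_item(item_id):
        # Every item gets its own trace, which starts out empty.
        trace_id = f"trace-{item_id}"
        with lock:
            spans_by_trace[trace_id] = []
            worker_ids.add(threading.get_ident())
        # Simulated bug (assumption): the parent trace is stored on first use
        # in this thread and reused for every later item on the same thread.
        if not hasattr(local, "parent_trace"):
            local.parent_trace = trace_id
        with lock:
            spans_by_trace[local.parent_trace].append(f"span-for-item-{item_id}")

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(score_item, range(num_items)))
    return spans_by_trace, worker_ids


spans_by_trace, worker_ids = simulate(num_items=6, num_threads=2)
non_empty = [t for t, spans in spans_by_trace.items() if spans]
print(len(non_empty), "non-empty traces;", len(worker_ids), "worker threads used")
```

In this model the number of non-empty traces always equals the number of worker threads that actually ran items, and every other trace stays empty while its spans land in an unrelated trace. That matches both symptoms listed above, but whether the SDK behaves this way is something I have not verified.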
Screenshots
An empty trace:
A trace with unrelated spans:
Reproduction steps and code snippets
Code snippet that I'm using to evaluate the RAG application:
import opik
from opik.evaluation.evaluator import evaluate
from opik.evaluation.metrics.ragas_metric import RagasMetricWrapper

def get_ragas_metrics():
    ...

def run_rag(input_data):
    ...

opik_client = opik.Opik()
ragas_metrics = get_ragas_metrics()
# Wrap each Ragas metric with tracking enabled.
scoring_metrics = [RagasMetricWrapper(metric, track=True) for metric in ragas_metrics]
num_task_threads = 2
evaluation_result = evaluate(
    opik_client.get_dataset("Dataset Name"),
    run_rag,
    scoring_metrics=scoring_metrics,
    experiment_name=None,
    verbose=2,
    task_threads=num_task_threads,
)
Error logs or stack trace
No response
Healthcheck results
No response