[Bug]: Spans associated with experiment items are collected under incorrect traces #4483

@kraftn

What component(s) are affected?

  • Opik Python SDK
  • Opik Typescript SDK
  • Opik Agent Optimizer SDK
  • Opik UI
  • Opik Server
  • Documentation

Opik version

  • Opik version: 1.9.48

Describe the problem

I'm using opik.evaluation.evaluator.evaluate together with opik.evaluation.metrics.ragas_metric.RagasMetricWrapper and track=True to evaluate my RAG application and compute Ragas metrics.

Expected behavior

After evaluation, each experiment item should have one associated trace. That trace should consist of the spans that show the steps used to calculate the metrics for that item.

Actual behavior

After evaluation, each experiment item does have one associated trace. However:

  • Some traces are empty and contain no spans.
  • Other traces contain spans that belong to other traces.

I also noticed that the number of non-empty traces is equal to the number of task threads.
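A minimal sketch of how this observation could be checked offline, assuming the traces and spans of an experiment have already been exported to plain Python structures. The `summarize_traces` helper and the dict shape with a `trace_id` key are hypothetical and not part of the Opik SDK, and the data below is made up to mirror the symptom rather than taken from a real run.

```python
from collections import defaultdict


def summarize_traces(trace_ids, spans):
    """Split traces into empty and non-empty ones.

    trace_ids: list of trace IDs, one per experiment item.
    spans: list of dicts with a "trace_id" key (hypothetical export shape).
    """
    spans_by_trace = defaultdict(list)
    for span in spans:
        spans_by_trace[span["trace_id"]].append(span)

    # A trace is empty when no exported span points at it.
    empty = [t for t in trace_ids if t not in spans_by_trace]
    non_empty = [t for t in trace_ids if t in spans_by_trace]
    return empty, non_empty


# Made-up data shaped like the symptom: 4 experiment items, 2 task threads,
# and every span attached to only 2 of the 4 traces.
num_task_threads = 2
trace_ids = ["t1", "t2", "t3", "t4"]
spans = [
    {"trace_id": "t1", "name": "step-a"},
    {"trace_id": "t1", "name": "step-b"},
    {"trace_id": "t3", "name": "step-a"},
    {"trace_id": "t3", "name": "step-b"},
]

empty, non_empty = summarize_traces(trace_ids, spans)
print("empty traces:", empty)
print("non-empty traces:", non_empty)
print("matches task thread count:", len(non_empty) == num_task_threads)
```

Running the same check on real exports for several `task_threads` values would show whether the number of non-empty traces keeps following the thread count.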

Screenshots

An empty trace:

Image

A trace with unrelated spans:

Image

Reproduction steps and code snippets

Code snippet that I'm using to evaluate the RAG application:

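import opik
from opik.evaluation.evaluator import evaluate
from opik.evaluation.metrics.ragas_metric import RagasMetricWrapper


def get_ragas_metrics():
    # Returns the Ragas metrics to compute (body omitted).
    ...


def run_rag(input_data):
    # Evaluation task: runs the RAG application on one dataset item (body omitted).
    ...


opik_client = opik.Opik()
ragas_metrics = get_ragas_metrics()
# track=True so that the steps of each metric calculation are recorded as spans.
scoring_metrics = [RagasMetricWrapper(metric, track=True) for metric in ragas_metrics]
num_task_threads = 2

evaluation_result = evaluate(
    opik_client.get_dataset("Dataset Name"),
    run_rag,
    scoring_metrics=scoring_metrics,
    experiment_name=None,
    verbose=2,
    task_threads=num_task_threads,
)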

Error logs or stack trace

No response

Healthcheck results

No response
