
Introduce evaluators for agentic workflows #6514


Merged
merged 15 commits on Jun 13, 2025

Conversation

@shyamnamboodiripad (Contributor) commented Jun 11, 2025

Introduces the following new evaluators as part of the Quality package: ToolCallAccuracyEvaluator, TaskAdherenceEvaluator, and IntentResolutionEvaluator. Since the package is already GA, the new evaluators are currently marked [Experimental]. (A usage sketch appears below.)

Also includes the following changes:

  • Fixes a regex bug that prevented reasoning and chain-of-thought outputs in the evaluation response from being parsed correctly into the corresponding metrics.
  • Adds support for rendering tool calls and tool results in the conversation shown in the report. This fixes [AI Evaluation] Tool messages (or anything without 'Text' contents) are not rendered in reports #6370
  • Adds support for displaying JSON content both in the conversation and in the context (along with a new settings toggle that controls pretty printing of the displayed JSON).
  • Adds tests for the new evaluators.

Fixes #6350
Fixes #6370
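
For context, here is a minimal sketch of how one of the new evaluators might be wired up, assuming the existing `IEvaluator.EvaluateAsync` pattern from Microsoft.Extensions.AI.Evaluation. The `EvaluateTaskAdherenceAsync` helper and the `judgeClient` parameter are hypothetical, and the metric-name constant is assumed to follow the library's existing `<Metric>MetricName` convention.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

public static class AgentQualityChecks
{
    // Hypothetical helper: scores how well an agent's response adhered to the
    // task it was given, using TaskAdherenceEvaluator (one of the evaluators
    // introduced in this PR). 'judgeClient' is any IChatClient pointed at the
    // LLM that should act as the judge.
    public static async Task<EvaluationResult> EvaluateTaskAdherenceAsync(
        IChatClient judgeClient,
        IEnumerable<ChatMessage> conversation,
        ChatResponse agentResponse,
        CancellationToken cancellationToken = default)
    {
        var chatConfiguration = new ChatConfiguration(judgeClient);

        // The new evaluators are marked [Experimental], so callers may need to
        // suppress the corresponding experimental-API diagnostic.
        IEvaluator evaluator = new TaskAdherenceEvaluator();

        EvaluationResult result = await evaluator.EvaluateAsync(
            conversation,
            agentResponse,
            chatConfiguration,
            cancellationToken: cancellationToken);

        // Assumption: the metric name is exposed as a constant on the evaluator,
        // following the library's <Metric>MetricName naming convention.
        NumericMetric adherence =
            result.Get<NumericMetric>(TaskAdherenceEvaluator.TaskAdherenceMetricName);
        System.Console.WriteLine($"Task adherence score: {adherence.Value}");

        return result;
    }
}
```

In a real run you would typically plug the resulting EvaluationResult into the reporting APIs, which is where the tool-call and JSON rendering changes described above become visible.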

Microsoft Reviewers: Open in CodeFlow

@shyamnamboodiripad shyamnamboodiripad requested a review from a team as a code owner June 11, 2025 08:01
@github-actions github-actions bot added the area-ai-eval (Microsoft.Extensions.AI.Evaluation and related) label Jun 11, 2025
@peterwald (Member) left a comment


:shipit:

This was referenced Jul 25, 2025