
Introduce evaluators for agentic workflows #6514


Merged
merged 15 commits on Jun 13, 2025

Conversation

@shyamnamboodiripad (Contributor) commented Jun 11, 2025

Introduces the following new evaluators as part of the Quality package: ToolCallAccuracyEvaluator, TaskAdherenceEvaluator, and IntentResolutionEvaluator. Since the package is already GA, the new evaluators are currently marked [Experimental]. (A usage sketch appears below.)

Also includes the following changes:

  • Fixes a regex bug that prevented reasoning and chain-of-thought outputs in the evaluation response from being parsed correctly into the corresponding metrics.
  • Adds support for rendering tool calls and tool results in the conversation shown in the report. This fixes [AI Evaluation] Tool messages (or anything without 'Text' contents) are not rendered in reports #6370
  • Adds support for displaying JSON content both in the conversation and in the context (along with a new settings toggle that controls pretty printing of the displayed JSON).
  • Adds tests for the new evaluators.

Fixes #6350
Fixes #6370
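
For context, here is a minimal sketch of how one of the new evaluators might be wired up, assuming the existing `IEvaluator.EvaluateAsync` pattern from Microsoft.Extensions.AI.Evaluation. The `EvaluateTaskAdherenceAsync` helper and the `judgeClient` parameter are hypothetical, and the metric-name constant is assumed to follow the library's existing `<Metric>MetricName` convention.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

public static class AgentQualityChecks
{
    // Hypothetical helper: scores how well an agent's response adhered to the
    // task it was given, using TaskAdherenceEvaluator (one of the evaluators
    // introduced in this PR). 'judgeClient' is any IChatClient pointed at the
    // LLM that should act as the judge.
    public static async Task<EvaluationResult> EvaluateTaskAdherenceAsync(
        IChatClient judgeClient,
        IEnumerable<ChatMessage> conversation,
        ChatResponse agentResponse,
        CancellationToken cancellationToken = default)
    {
        var chatConfiguration = new ChatConfiguration(judgeClient);

        // The new evaluators are marked [Experimental], so callers may need to
        // suppress the corresponding experimental-API diagnostic.
        IEvaluator evaluator = new TaskAdherenceEvaluator();

        EvaluationResult result = await evaluator.EvaluateAsync(
            conversation,
            agentResponse,
            chatConfiguration,
            cancellationToken: cancellationToken);

        // Assumption: the metric name is exposed as a constant on the evaluator,
        // following the library's <Metric>MetricName naming convention.
        NumericMetric adherence =
            result.Get<NumericMetric>(TaskAdherenceEvaluator.TaskAdherenceMetricName);
        System.Console.WriteLine($"Task adherence score: {adherence.Value}");

        return result;
    }
}
```

In a real run you would typically plug the resulting EvaluationResult into the reporting APIs, which is where the tool-call and JSON rendering changes described above become visible.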

Microsoft Reviewers: Open in CodeFlow

@shyamnamboodiripad shyamnamboodiripad requested a review from a team as a code owner June 11, 2025 08:01
@github-actions github-actions bot added the area-ai-eval (Microsoft.Extensions.AI.Evaluation and related) label Jun 11, 2025
@peterwald (Member) left a comment


:shipit:

This was referenced Jul 25, 2025