This benchmark compares four data formats (EDI, TOON, JSON, YAML) to determine which is most efficient for LLM processing. It measures real token counts, processing times, and thinking times using Azure OpenAI models.
The benchmark tests 10 invoices in each of four formats:
- EDI - Structured electronic data interchange format
- TOON - Custom compact format
- JSON - Standard format
- YAML - Human-readable data serialization format
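For a sense of how the representations differ, the same hypothetical invoice record (field names here are illustrative, not the benchmark's actual schema) looks like this in JSON and YAML:

```json
{
  "id": "INV-1",
  "region": "EU",
  "terms": "NET45",
  "paid": true,
  "declared_line_count": 2,
  "lines": [
    { "sku": "A100", "qty": 3, "total": 150.0 },
    { "sku": "B200", "qty": 1, "total": 50.0 }
  ]
}
```

```yaml
id: INV-1
region: EU
terms: NET45
paid: true
declared_line_count: 2
lines:
  - sku: A100
    qty: 3
    total: 150.0
  - sku: B200
    qty: 1
    total: 50.0
```

YAML drops the braces and quoting that JSON carries, which is one of the per-format differences in token count the benchmark quantifies.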
Each format is tested with two prompt strategies:
- Inferred (-INF) - Detailed prompts with field mappings and format descriptions
- Raw (-RAW) - Minimal prompts, model figures out the structure itself
The benchmark asks complex questions requiring:
- Filtering: Count invoices by region, segment, payment status
- Structure validation: Verify `declared_line_count` matches the actual number of line items
- Conditional logic: Find invoices matching multiple criteria (e.g. NET45 terms AND paid)
- Aggregation: Calculate totals and averages
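The kinds of checks the model is asked to perform can be sketched in plain Python over hypothetical invoice records (field names such as `region`, `terms`, and `paid` are assumptions, not the benchmark's real schema):

```python
# Hypothetical invoice records with the same shape the benchmark questions imply.
invoices = [
    {"id": "INV-1", "region": "EU", "terms": "NET45", "paid": True,
     "declared_line_count": 2, "lines": [{"total": 100.0}, {"total": 50.0}]},
    {"id": "INV-2", "region": "US", "terms": "NET30", "paid": False,
     "declared_line_count": 3, "lines": [{"total": 75.0}]},
]

# Filtering: count invoices per region.
by_region = {}
for inv in invoices:
    by_region[inv["region"]] = by_region.get(inv["region"], 0) + 1

# Structure validation: declared_line_count vs. actual number of lines.
mismatched = [inv["id"] for inv in invoices
              if inv["declared_line_count"] != len(inv["lines"])]

# Conditional logic: NET45 terms AND paid.
net45_paid = [inv["id"] for inv in invoices
              if inv["terms"] == "NET45" and inv["paid"]]

# Aggregation: grand total across all line items.
grand_total = sum(line["total"] for inv in invoices for line in inv["lines"])

print(by_region, mismatched, net45_paid, grand_total)
# {'EU': 1, 'US': 1} ['INV-2'] ['INV-1'] 225.0
```

The benchmark asks the model to do this reasoning directly over the serialized text, so accuracy depends on how well each format exposes this structure.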
The benchmark runs these combinations across two models (gpt-5-chat and o1), measuring token usage, speed, and accuracy. It generates a markdown report with analysis and recommendations.
- .NET 8 SDK
- Azure OpenAI endpoint with `gpt-5-chat` and `o1` deployments
Set your API credentials:
```bash
# Linux/macOS
export OPENAI_API_KEY="your-key"
export OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/openai/v1"
```

```powershell
# Windows PowerShell
$env:OPENAI_API_KEY = "your-key"
$env:OPENAI_ENDPOINT = "https://your-endpoint.openai.azure.com/openai/v1"
```

Then run the benchmark:

```bash
# Quick test (1 iteration = 10 invoices)
dotnet run

# Statistical test (20 iterations recommended for stable results = 200 invoices)
dotnet run 20
```

The benchmark generates `benchmark-report.md` with complete results and analysis.
16 combinations total:
- EDI-INF, EDI-RAW (both models)
- TOON-INF, TOON-RAW (both models)
- JSON-INF, JSON-RAW (both models)
- YAML-INF, YAML-RAW (both models)
Each iteration takes 2-60 seconds depending on model and format. Total time for 20 iterations: ~2-2.5 hours.
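The 2-2.5 hour estimate is consistent with roughly 25 seconds per API call (an assumed midpoint of the 2-60 second range, not a measured figure):

```python
# Rough runtime arithmetic for the statistical run, using numbers from this README.
combinations = 16                      # 4 formats x 2 prompt styles x 2 models
iterations = 20
calls = combinations * iterations      # total API calls in a 20-iteration run

avg_seconds_per_call = 25              # assumption: midpoint-ish of the 2-60 s range
est_hours = calls * avg_seconds_per_call / 3600

print(calls, round(est_hours, 1))      # 320 calls, ~2.2 hours
```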
The markdown report includes:
- Performance metrics table (tokens, time, accuracy)
- AI responses for each format
- Key insights (fastest, most efficient)
- Expert interpretation and recommendations