jmfloreszazo/dotnet_llm_toon_format_demo

LLM Format Comparison Benchmark

This benchmark compares four data formats (EDI, TOON, JSON, YAML) to determine which is most efficient for LLM processing. It measures real token counts, processing times, and thinking times using Azure OpenAI models.

What It Does

Tests 10 invoices in four formats:

  • EDI - Structured electronic data interchange format
  • TOON - Custom compact format
  • JSON - Standard format
  • YAML - Human-readable data serialization format
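
To illustrate the size differences between formats, here is a small Python sketch that renders one hypothetical invoice as JSON, as a YAML-style layout, and as a TOON-like compact line. TOON is this repo's custom format, so the compact syntax below is only an assumption, as are the field names; character counts stand in as a rough proxy for token counts.

```python
import json

# Hypothetical invoice record (field names are assumptions, not the repo's schema).
invoice = {"id": "INV-001", "region": "EMEA", "segment": "SMB",
           "terms": "NET45", "paid": True, "total": 1250.50}

as_json = json.dumps(invoice)

# Hand-rolled YAML-style rendering (avoids an external yaml dependency).
as_yaml = "\n".join(f"{k}: {v}" for k, v in invoice.items())

# TOON-like compact form: pipe-delimited values (assumed syntax).
as_toon = "|".join(str(v) for v in invoice.values())

# Character count as a crude stand-in for tokenizer counts.
for name, text in [("JSON", as_json), ("YAML", as_yaml), ("TOON", as_toon)]:
    print(f"{name}: {len(text)} chars")
```

The compact form trades self-description for size: it only pays off if the model can still answer questions about it correctly, which is exactly what the RAW strategy tests.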

Each format is tested with two prompt strategies:

  • Inferred (-INF) - Detailed prompts with field mappings and format descriptions
  • Raw (-RAW) - Minimal prompts; the model infers the structure itself
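
The two strategies can be sketched as prompt builders; the wording and field list below are illustrative, not the repo's actual prompts:

```python
def build_prompt(data: str, strategy: str) -> str:
    """Build a benchmark prompt; wording here is illustrative, not the repo's."""
    question = "How many invoices are NET45 AND paid?"
    if strategy == "INF":
        # Inferred: describe the format and field mappings up front.
        schema_hint = ("Each record is an invoice with fields: "
                       "id, region, segment, terms, paid, total.")
        return f"{schema_hint}\n\nData:\n{data}\n\n{question}"
    # Raw: minimal prompt; the model infers the structure itself.
    return f"Data:\n{data}\n\n{question}"

print(build_prompt("INV-001|EMEA|SMB|NET45|true|1250.50", "RAW"))
```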

The benchmark asks complex questions requiring:

  • Filtering: Count invoices by region, segment, payment status
  • Structure validation: Verify that declared_line_count matches the actual number of lines
  • Conditional logic: Find invoices matching multiple criteria (NET45 AND paid)
  • Aggregation: Calculate totals and averages

The benchmark runs these combinations across two models (gpt-5-chat and o1), measuring token usage, speed, and accuracy. It generates a markdown report with analysis and recommendations.

Requirements

  • .NET 8 SDK
  • Azure OpenAI endpoint with gpt-5-chat and o1 deployments

Setup

Set your API credentials:

# Linux/macOS
export OPENAI_API_KEY="your-key"
export OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/openai/v1"

# Windows PowerShell
$env:OPENAI_API_KEY = "your-key"
$env:OPENAI_ENDPOINT = "https://your-endpoint.openai.azure.com/openai/v1"

Run

# Quick test (1 iteration = 10 invoices)
dotnet run

# Statistical test (20 iterations = 200 invoices; recommended for stable results)
dotnet run 20

The benchmark generates benchmark-report.md with complete results and analysis.

What Gets Tested

16 combinations total:

  • EDI-INF, EDI-RAW (both models)
  • TOON-INF, TOON-RAW (both models)
  • JSON-INF, JSON-RAW (both models)
  • YAML-INF, YAML-RAW (both models)

Each iteration takes 2-60 seconds depending on model and format. Total time for 20 iterations: ~2-2.5 hours.
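
The estimate follows from simple arithmetic: 16 combinations × 20 iterations = 320 model calls, and the quoted ~2-2.5 hours implies an average of roughly 22-28 seconds per call, consistent with the observed 2-60 second range. A quick sanity check:

```python
combinations = 16      # 4 formats x 2 prompt strategies x 2 models
iterations = 20
calls = combinations * iterations      # 320 model calls total

# Quoted wall-clock range for 20 iterations: ~2-2.5 hours.
avg_low = 2.0 * 3600 / calls           # implied average seconds per call
avg_high = 2.5 * 3600 / calls

print(f"{calls} calls, avg {avg_low:.1f}-{avg_high:.1f} s per call")
```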

Output

The markdown report includes:

  • Performance metrics table (tokens, time, accuracy)
  • AI responses for each format
  • Key insights (fastest, most efficient)
  • Expert interpretation and recommendations
