This benchmark compares four data formats (EDI, TOON, JSON, YAML) to determine which is most efficient for LLM processing. It measures real token counts, processing times, and thinking times using Azure OpenAI models.
The benchmark tests 10 invoices in each of four formats:
- EDI - Structured electronic data interchange format
- TOON - Custom compact format
- JSON - Standard format
- YAML - Human-readable data serialization format
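For a sense of how the representations differ, the same hypothetical invoice record (field names here are illustrative, not the benchmark's actual schema) looks like this in JSON and YAML:

```json
{
  "id": "INV-1",
  "region": "EU",
  "terms": "NET45",
  "paid": true,
  "declared_line_count": 2,
  "lines": [
    { "sku": "A100", "qty": 3, "total": 150.0 },
    { "sku": "B200", "qty": 1, "total": 50.0 }
  ]
}
```

```yaml
id: INV-1
region: EU
terms: NET45
paid: true
declared_line_count: 2
lines:
  - sku: A100
    qty: 3
    total: 150.0
  - sku: B200
    qty: 1
    total: 50.0
```

YAML drops the braces and quoting that JSON carries, which is one of the per-format differences in token count the benchmark quantifies.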
Each format is tested with two prompt strategies:
- Inferred (-INF) - Detailed prompts with field mappings and format descriptions
- Raw (-RAW) - Minimal prompts, model figures out the structure itself
The benchmark asks complex questions requiring:
- Filtering: Count invoices by region, segment, payment status
- Structure validation: Verify `declared_line_count` matches the actual number of line items
- Conditional logic: Find invoices matching multiple criteria (e.g. NET45 terms AND paid)
- Aggregation: Calculate totals and averages
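The kinds of checks the model is asked to perform can be sketched in plain Python over hypothetical invoice records (field names such as `region`, `terms`, and `paid` are assumptions, not the benchmark's real schema):

```python
# Hypothetical invoice records with the same shape the benchmark questions imply.
invoices = [
    {"id": "INV-1", "region": "EU", "terms": "NET45", "paid": True,
     "declared_line_count": 2, "lines": [{"total": 100.0}, {"total": 50.0}]},
    {"id": "INV-2", "region": "US", "terms": "NET30", "paid": False,
     "declared_line_count": 3, "lines": [{"total": 75.0}]},
]

# Filtering: count invoices per region.
by_region = {}
for inv in invoices:
    by_region[inv["region"]] = by_region.get(inv["region"], 0) + 1

# Structure validation: declared_line_count vs. actual number of lines.
mismatched = [inv["id"] for inv in invoices
              if inv["declared_line_count"] != len(inv["lines"])]

# Conditional logic: NET45 terms AND paid.
net45_paid = [inv["id"] for inv in invoices
              if inv["terms"] == "NET45" and inv["paid"]]

# Aggregation: grand total across all line items.
grand_total = sum(line["total"] for inv in invoices for line in inv["lines"])

print(by_region, mismatched, net45_paid, grand_total)
# {'EU': 1, 'US': 1} ['INV-2'] ['INV-1'] 225.0
```

The benchmark asks the model to do this reasoning directly over the serialized text, so accuracy depends on how well each format exposes this structure.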
The benchmark runs these combinations across two models (gpt-5-chat and o1), measuring token usage, speed, and accuracy. It generates a markdown report with analysis and recommendations.
- .NET 8 SDK
- Azure OpenAI endpoint with `gpt-5-chat` and `o1` deployments
Set your API credentials:
```bash
# Linux/macOS
export OPENAI_API_KEY="your-key"
export OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/openai/v1"
```

```powershell
# Windows PowerShell
$env:OPENAI_API_KEY = "your-key"
$env:OPENAI_ENDPOINT = "https://your-endpoint.openai.azure.com/openai/v1"
```

Then run the benchmark:

```bash
# Quick test (1 iteration = 10 invoices)
dotnet run

# Statistical test (20 iterations recommended for stable results = 200 invoices)
dotnet run 20
```

The benchmark generates `benchmark-report.md` with complete results and analysis.
16 combinations total:
- EDI-INF, EDI-RAW (both models)
- TOON-INF, TOON-RAW (both models)
- JSON-INF, JSON-RAW (both models)
- YAML-INF, YAML-RAW (both models)
Each iteration takes 2-60 seconds depending on model and format. Total time for 20 iterations: ~2-2.5 hours.
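The 2-2.5 hour estimate is consistent with roughly 25 seconds per API call (an assumed midpoint of the 2-60 second range, not a measured figure):

```python
# Rough runtime arithmetic for the statistical run, using numbers from this README.
combinations = 16                      # 4 formats x 2 prompt styles x 2 models
iterations = 20
calls = combinations * iterations      # total API calls in a 20-iteration run

avg_seconds_per_call = 25              # assumption: midpoint-ish of the 2-60 s range
est_hours = calls * avg_seconds_per_call / 3600

print(calls, round(est_hours, 1))      # 320 calls, ~2.2 hours
```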
The markdown report includes:
- Performance metrics table (tokens, time, accuracy)
- AI responses for each format
- Key insights (fastest, most efficient)
- Expert interpretation and recommendations