RouteProfile is a general framework for designing LLM profiles for routing. It formulates LLM profiling as a structured information integration problem over heterogeneous interaction histories, enabling more principled and effective routing across queries, domains, and models.
Highlights:
- General profile design space: Define LLM profiles along four dimensions: organizational form, representation type, aggregation depth, and learning configuration.
- Comprehensive evaluation: Evaluate LLM profiles across three representative routers under both standard routing and new-LLM routing settings.
Step 1: Data Collection → `profile_data/` (manual / provided)
Step 2: Build Data Graph → `results/result_data_graph/{mode}/`
Step 3: Build Profile → `results/model_profile_result/{mode}/`
Step 4: Route & Evaluate → `results/routing_result/{mode}/`
Two routing settings:
| Mode | Description |
|---|---|
| `standard` | Standard routing with a known set of candidate LLMs |
| `newllm` | Generalisation to newly introduced, unseen LLMs |
```shell
pip install routeprofile
```

For Text-GNN profiles (requires vLLM):

```shell
pip install "routeprofile[text-gnn]"
```

Install from source (editable):

```shell
git clone https://github.com/your-org/RouteProfile.git
cd RouteProfile
pip install -e .
```

| Method | File | Org. form | Repr. type | Agg. depth | Learning |
|---|---|---|---|---|---|
| `flat` | `flat.npz` | Flat | Text | 0 | Training-free |
| `index` | `index.npz` | Flat | Embedding | 0 | Training-free |
| `emb_gnn` | `emb_gnn.npz` | Structured | Embedding | Multi-hop | Training-free |
| `text_gnn` | `text_gnn.npz` | Structured | Text | Multi-hop | Training-free |
| `trainable` | `trainable_gnn.npz` | Structured | Embedding | Multi-hop | Trainable |
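The "Agg. depth" column controls how many hops of graph neighbourhood are folded into a profile. As a rough illustration of training-free multi-hop aggregation (a sketch of the idea behind `emb_gnn`, not the package's implementation), using the same `"sym"` / `"rw"` / `"none"` normalisation choices:

```python
import numpy as np

def khop_propagate(A, X, K=2, norm="sym"):
    """Smooth node features X over K hops of the graph with adjacency A."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    deg = A_hat.sum(axis=1)
    if norm == "sym":                        # D^-1/2 (A + I) D^-1/2
        d = 1.0 / np.sqrt(deg)
        P = (A_hat * d[None, :]) * d[:, None]
    elif norm == "rw":                       # D^-1 (A + I), random-walk
        P = A_hat / deg[:, None]
    else:                                    # "none": raw adjacency
        P = A_hat
    H = X
    for _ in range(K):
        H = P @ H                            # one hop of neighbour aggregation
    return H

# Toy undirected graph on 4 nodes, one-hot features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = khop_propagate(A, np.eye(4), K=2, norm="sym")
print(H.shape)  # (4, 4)
```

After K rounds, each row mixes information from nodes up to K hops away, which is what "Multi-hop" aggregation depth refers to.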
All functions are importable directly from `routeprofile`:

```python
import routeprofile

print(routeprofile.__version__)  # "0.1.0"
```

```python
from routeprofile import (
    build_task_graph,
    build_query_graph,
    build_query_task_graph,
    build_task_domain_graph,
    build_query_task_domain_graph,
)

# Uses default profile_data/ inputs; outputs to results/result_data_graph/standard/
build_task_graph(mode="standard")

# Override any input/output path
build_query_task_domain_graph(
    mode="standard",
    json="profile_data/model_feature_standard.json",
    arch="profile_data/model_family_feature.json",
    dataset="profile_data/task_feature.json",
    query="profile_data/task_queries_standard.json",
    domain_map="profile_data/domain_task_map.json",
    domain_feat="profile_data/domain_feature.json",
    save="results/result_data_graph/standard/query_task_domain_graph_full.pt",
)
```

```python
from routeprofile import (
    build_flat_profile,
    build_emb_gnn_profile,
    build_index_profile,
    build_text_gnn_profile,
)

# Flat: Longformer encoding of model text + sampled neighbours
build_flat_profile(mode="standard")
# → results/model_profile_result/standard/flat.npz

# Index: random vector baseline (no text or graph)
build_index_profile(mode="standard")
# → results/model_profile_result/standard/index.npz

# Emb-GNN: K-hop neighbourhood propagation (training-free)
build_emb_gnn_profile(
    mode="standard",
    graph="results/result_data_graph/standard/task_graph_full.pt",
    K=2,
    norm="sym",  # "sym" | "rw" | "none"
    save="results/model_profile_result/standard/emb_gnn.npz",
)

# Text-GNN: LLM-based text aggregation per hop (requires vLLM)
build_text_gnn_profile(
    mode="standard",
    graph="results/result_data_graph/standard/query_task_domain_graph_full.pt",
    K=1,
    model="Qwen/Qwen2.5-7B-Instruct",
    tp=1,  # tensor parallel size (number of GPUs)
    gpu_memory_utilization=0.6,  # fraction of GPU memory for vLLM
    keep=[],  # [] = save all models; None = TARGET_MODELS only
    emb_save="results/model_profile_result/standard/text_gnn.npz",
)
```

```python
from routeprofile import build_trainable_gnn_profile

build_trainable_gnn_profile(
    mode="standard",
    graph="results/result_data_graph/standard/task_graph_full.pt",
    hidden_dim=256,
    out_dim=128,
    epochs=100,
    save_emb="results/model_profile_result/standard/trainable_gnn.npz",
    save_ckpt="results/trained_trainable_gnn/standard/pretrain_ckpt.pt",
)
```

```python
from routeprofile import call_simrouter, call_mlprouter, call_graphrouter

# SimRouter: training-free cosine similarity routing
call_simrouter(
    model_profile_path="results/model_profile_result/standard/flat.npz",
    routing_data_path="route_data/routing_test_data.json",
    output_path="results/routing_result/standard/SimRouter_results.json",
)

# MLPRouter: pairwise-ranking MLP
call_mlprouter(
    model_profile_path="results/model_profile_result/standard/emb_gnn.npz",
    training_data_path="route_data/pairwise_training_data_standard.json",
    testing_data_path="route_data/routing_test_data.json",
    output_path="results/routing_result/standard/MLPRouter_results.json",
    save_ckpt="results/trained_MLPRouter/standard/mlp_router_ckpt.pt",
    epochs=50,
)

# GraphRouter: bipartite GAT
call_graphrouter(
    model_profile_path="results/model_profile_result/standard/trainable_gnn.npz",
    training_data_path="route_data/pairwise_training_data_standard.json",
    testing_data_path="route_data/routing_test_data.json",
    output_path="results/routing_result/standard/GraphRouter_results.json",
    save_ckpt="results/trained_GraphRouter/standard/graphrouter_ckpt.pt",
    epochs=50,
)
```

You can also import the router classes directly:

```python
from routeprofile import SimRouter, MLPRouter, GraphRouter
```

After installation, every step is available as a command-line tool:
```shell
# Step 2: Build graphs (outputs to results/result_data_graph/{mode}/)
routeprofile-build-task-graph --mode standard
routeprofile-build-query-graph --mode standard
routeprofile-build-query-task-graph --mode standard
routeprofile-build-task-domain-graph --mode standard
routeprofile-build-query-task-domain-graph --mode standard

# Step 3a: Training-free profiles (outputs to results/model_profile_result/{mode}/)
routeprofile-flat-profile --mode standard
routeprofile-index-profile --mode standard
routeprofile-emb-gnn-profile --mode standard --K 2

# Step 3b: Trainable profile
routeprofile-trainable-gnn-profile --mode standard --epochs 100

# Step 4: Routing (outputs to results/routing_result/{mode}/)
routeprofile-sim-router \
  --model-profile-path results/model_profile_result/standard/flat.npz \
  --routing-data-path route_data/routing_test_data.json
routeprofile-mlp-router \
  --model-profile-path results/model_profile_result/standard/emb_gnn.npz \
  --training-data-path route_data/pairwise_training_data_standard.json \
  --testing-data-path route_data/routing_test_data.json \
  --save-ckpt results/trained_MLPRouter/standard/mlp_router_ckpt.pt
routeprofile-graph-router \
  --model-profile-path results/model_profile_result/standard/trainable_gnn.npz \
  --training-data-path route_data/pairwise_training_data_standard.json \
  --testing-data-path route_data/routing_test_data.json \
  --save-ckpt results/trained_GraphRouter/standard/graphrouter_ckpt.pt
```

All commands accept `--help` for full usage.
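Every profile step writes a NumPy `.npz` archive, so intermediate outputs can be sanity-checked directly. A minimal sketch of inspecting one; the key names (`model_names`, `embeddings`) are hypothetical here and may differ from the package's actual layout:

```python
import numpy as np

# Write a toy profile archive (the key names are hypothetical).
np.savez(
    "toy_profile.npz",
    model_names=np.array(["qwen2.5-7b-instruct", "gemma-2-9b-it"]),
    embeddings=np.zeros((2, 128), dtype=np.float32),
)

# Inspect it the same way you would inspect a real profile .npz.
with np.load("toy_profile.npz") as data:
    print(sorted(data.files))        # ['embeddings', 'model_names']
    print(data["embeddings"].shape)  # (2, 128)
```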
```shell
# Build all graphs (standard mode)
bash routeprofile/scripts/step2_build_data_graph.sh standard

# All training-free profiles
bash routeprofile/scripts/step3a_training_free_profile.sh standard all

# Text-GNN (requires vLLM + GPU)
bash routeprofile/scripts/step3a_training_free_profile.sh standard text_gnn

# Trainable GNN
bash routeprofile/scripts/step3b_trainable_profile.sh standard

# Routing evaluation
bash routeprofile/scripts/step4_routing_evaluation.sh standard sim flat.npz
bash routeprofile/scripts/step4_routing_evaluation.sh standard all flat.npz
```

```
RouteProfile/
├── profile_data/                              # Input data (read-only)
│   ├── model_feature_standard.json            # Model metadata (standard routing)
│   ├── model_feature_newllm.json              # Model metadata (newllm routing)
│   ├── model_family_feature.json              # Architecture family descriptions
│   ├── task_queries_standard.json             # Queries per benchmark (standard)
│   ├── task_queries_newllm.json               # Queries per benchmark (newllm)
│   ├── task_feature.json                      # Benchmark task descriptions
│   ├── domain_feature.json                    # Task domain descriptions
│   ├── domain_task_map.json                   # Domain → benchmark mapping
│   └── candidate_models.json                  # Candidate LLM metadata
│
├── route_data/                                # Pre-computed routing data
│   ├── routing_test_data.json                 # Test queries with model responses
│   ├── pairwise_training_data_standard.json   # Pairwise training data (standard)
│   └── pairwise_training_data_newllm.json     # Pairwise training data (newllm)
│
├── routeprofile/                              # Library source
│   ├── build_data_graph/                      # Step 2: graph construction
│   ├── get_model_profile/
│   │   ├── training_free/                     # flat, index, emb_gnn, text_gnn
│   │   └── trainable/                         # HANConv self-supervised
│   ├── routing_evaluation/                    # SimRouter, MLPRouter, GraphRouter
│   └── scripts/                               # Shell scripts for batch runs
│
├── results/                                   # All generated outputs (git-ignored)
│   ├── result_data_graph/{standard,newllm}/   # Built graphs (.pt)
│   ├── model_profile_result/{standard,newllm}/  # Model profiles (.npz)
│   ├── routing_result/{standard,newllm}/      # Routing evaluation results (.json)
│   ├── trained_trainable_gnn/{standard,newllm}/ # HANConv checkpoints
│   ├── trained_MLPRouter/{standard,newllm}/   # MLP router checkpoints
│   └── trained_GraphRouter/{standard,newllm}/ # Graph router checkpoints
│
├── tests/                                     # pytest test suite
└── pyproject.toml
```
`model_feature_standard.json` / `model_feature_newllm.json` — main model metadata and the primary input to all graph builders:

```json
{
  "model-name": {
    "size": "7B",
    "feature": "Natural language description of the model...",
    "architecture": "Qwen2ForCausalLM",
    "detailed_scores": {
      "ifeval": 75.85, "bbh": 53.94, "math": 50.0,
      "gpqa": 29.11, "musr": 40.2, "mmlu_pro": 42.87
    },
    "parameters": 7.616,
    "input_price": 0.2,
    "output_price": 0.2,
    "model": "qwen/qwen2.5-7b-instruct",
    "service": "NVIDIA",
    "api_endpoint": "https://integrate.api.nvidia.com/v1",
    "average_score": 35.2
  }
}
```

`model_family_feature.json` — architecture family descriptions used as architecture node features:
```json
{
  "Qwen2ForCausalLM": "A family of decoder-only Transformer-based large language models developed by Alibaba Cloud...",
  "LlamaForCausalLM": "A family of autoregressive large language models developed by Meta AI..."
}
```

`task_feature.json` — natural language description of each benchmark task:
```json
{
  "ifeval": "IFEval (Instruction-Following Evaluation) is a benchmark designed to evaluate the ability of large language models to follow explicit natural language instructions...",
  "bbh": "BBH (BIG-Bench Hard) is a challenging subset of the BIG-Bench benchmark..."
}
```

`domain_task_map.json` — maps broad task domains to specific benchmarks:
```json
{
  "knowledge": ["mmlu", "mmlu_pro", "C-Eval", "AGIEval English", "SQuAD", "gpqa"],
  "reasoning": ["bbh", "TheoremQA", "WinoGrande"],
  "math": ["math", "gsm8k", "TheoremQA"],
  "coding": ["human_eval", "mbpp"]
}
```

`domain_feature.json` — natural language description of each task domain:
```json
{
  "knowledge": "Knowledge tasks test factual recall and information retrieval...",
  "reasoning": "Reasoning tasks require multi-step logical inference...",
  "math": "Math tasks evaluate quantitative and symbolic problem solving..."
}
```

`candidate_models.json` — candidate model metadata, including API endpoints and aggregate scores:
```json
{
  "qwen2.5-7b-instruct": {
    "size": "7B",
    "feature": "Qwen2.5-7B-Instruct represents an upgraded version...",
    "input_price": 0.2,
    "output_price": 0.2,
    "model": "qwen/qwen2.5-7b-instruct",
    "service": "NVIDIA",
    "api_endpoint": "https://integrate.api.nvidia.com/v1",
    "average_score": 35.2,
    "detailed_scores": { "ifeval": 75.85, "bbh": 53.94 },
    "parameters": 7.616,
    "architecture": "Qwen2ForCausalLM"
  }
}
```

`task_queries_standard.json` / `task_queries_newllm.json` — per-benchmark query lists used to build query nodes:
```json
{
  "ifeval": ["Instruction 1...", "Instruction 2...", ...],
  "bbh": ["Question 1...", "Question 2...", ...]
}
```

`routing_test_data.json` — pre-computed model responses for test queries:
```json
[
  {
    "task_name": "ifeval",
    "query": "Follow these instructions...",
    "ground_truth": "A",
    "metric": "em_mc",
    "choices": "{'text': ['A', 'B', 'C', 'D'], 'labels': ['A', 'B', 'C', 'D']}",
    "model_performance": {
      "qwen2.5-7b-instruct": { "response": "A", "task_performance": 1.0, "success": true }
    }
  }
]
```

`pairwise_training_data_standard.json` / `pairwise_training_data_newllm.json` — pairwise training data for MLPRouter and GraphRouter; each entry records which model outperforms which on a given query:
```json
{
  "task_data_count": {
    "agentverse-logicgrid": 1352,
    "gsm8k": 741
  },
  "pairwise_data": [
    {
      "task_name": "agentverse-logicgrid",
      "query": "Q: There are 4 houses...",
      "ground_truth": "B",
      "metric": "em_mc",
      "choices": "{'text': ['1', '2', '3', '4'], 'labels': ['A', 'B', 'C', 'D']}",
      "task_id": null,
      "better_model": "mistral-small-24b-instruct-2501-bf16",
      "worse_model": "mixtral-8x22b-instruct-v0.1"
    }
  ]
}
```

Note: use `pairwise_training_data_{mode}.json` as `training_data_path` for MLPRouter and GraphRouter; `routing_test_data.json` is used as `testing_data_path`.
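MLPRouter learns from these `better_model` / `worse_model` pairs with a pairwise ranking objective. An illustrative margin ranking loss over scalar model scores (a common choice for this kind of data; the package's exact loss may differ):

```python
import numpy as np

def margin_ranking_loss(score_better, score_worse, margin=1.0):
    """Penalise pairs where the 'better' model is not scored
    at least `margin` higher than the 'worse' model."""
    return np.maximum(0.0, margin - (score_better - score_worse))

# Scores a router might assign to the two models of a pairwise entry
# on two different queries (values are made up for illustration).
s_better = np.array([2.5, 0.3])  # e.g. mistral-small-24b-instruct-2501-bf16
s_worse = np.array([1.0, 0.8])   # e.g. mixtral-8x22b-instruct-v0.1
print(margin_ranking_loss(s_better, s_worse))  # [0.  1.5]
```

The first pair already satisfies the margin (zero loss); the second is ranked the wrong way, so the loss pushes the scores apart during training.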
The default set of 8 candidate models:
| Model | Size | Architecture |
|---|---|---|
| `qwen2.5-7b-instruct` | 7B | Qwen2ForCausalLM |
| `gemma-2-9b-it` | 9B | Gemma2ForCausalLM |
| `llama-3.1-8b-instruct` | 8B | LlamaForCausalLM |
| `mixtral-8x7b-instruct-v0.1` | 46.7B | MixtralForCausalLM |
| `mixtral-8x22b-instruct-v0.1` | 141B | MixtralForCausalLM |
| `llama-3.2-3b-instruct` | 3B | LlamaForCausalLM |
| `mistral-small-24b-instruct-2501-bf16` | 24B | MistralForCausalLM |
| `llama-3.3-70b-instruct` | 70B | LlamaForCausalLM |
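Each candidate's metadata also carries `input_price` and `output_price` fields, which enable cost-aware comparisons between these models. A toy per-call cost estimate, assuming (this is an assumption; the metadata does not state units) that prices are USD per million tokens:

```python
def query_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimated USD cost of one call, with prices per 1M tokens (assumed)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6

# qwen2.5-7b-instruct: input_price = output_price = 0.2 (from the metadata)
cost = query_cost(input_tokens=1_200, output_tokens=300,
                  input_price=0.2, output_price=0.2)
print(f"{cost:.6f}")  # 0.000300
```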
| Router | Type | Description |
|---|---|---|
| `SimRouter` | Training-free | Cosine similarity between query and model embeddings |
| `MLPRouter` | Trainable | Pairwise ranking loss; query + model encoders |
| `GraphRouter` | Trainable | Bipartite GAT with edge prediction (BCE loss) |
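As a schematic of SimRouter's training-free strategy (an illustration, not the package's implementation): embed the query, score each model profile by cosine similarity, and route to the argmax:

```python
import numpy as np

def route_by_cosine(query_emb, model_embs, model_names):
    """Pick the model whose profile embedding is most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    M = model_embs / np.linalg.norm(model_embs, axis=1, keepdims=True)
    sims = M @ q  # cosine similarity per model
    return model_names[int(np.argmax(sims))]

# Toy 2-D profile embeddings for two of the candidate models.
names = ["qwen2.5-7b-instruct", "llama-3.1-8b-instruct"]
model_embs = np.array([[1.0, 0.0],
                       [0.6, 0.8]])
print(route_by_cosine(np.array([0.5, 0.9]), model_embs, names))
# llama-3.1-8b-instruct
```

Because both sides are normalised, the dot product is exactly cosine similarity, so the router is invariant to embedding magnitude.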
If you use RouteProfile in your research, please cite:
```bibtex
@article{routeprofile2025,
  title={RouteProfile: Elucidating the Design Space of LLM Profiles for Routing},
  year={2025}
}
```
