"description": "Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content and importing scores from Artificial Analysis API.",
3
+
"version": "1.3.0",
4
+
"description": "Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval/inspect-ai.",
hf_model_evaluation/skills/hugging-face-evaluation-manager/SKILL.md (+223 −7)
@@ -1,29 +1,50 @@
 ---
 name: hugging-face-evaluation-manager
-description: Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content and importing scores from Artificial Analysis API. Works with the model-index metadata format.
+description: Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
 ---
 
 # Overview
-This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports two primary methods for adding evaluation data: extracting existing evaluation tables from README content and importing benchmark scores from Artificial Analysis.
+This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:
+- Extracting existing evaluation tables from README content
+- Importing benchmark scores from Artificial Analysis
+- Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)
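
For context on the model-index metadata these methods ultimately write, here is a minimal sketch using huggingface_hub's card-data helpers. The repo name, dataset, and score are hypothetical placeholders, not values produced by this skill.

```python
# Minimal sketch: building model-index metadata with huggingface_hub.
# The model name, dataset, and score below are hypothetical placeholders.
from huggingface_hub import EvalResult, ModelCardData

card_data = ModelCardData(
    model_name="my-org/my-model",  # hypothetical repo id
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="mmlu",       # dataset id in the model-index spec
            dataset_name="MMLU",
            metric_type="accuracy",
            metric_value=0.65,         # placeholder score
        )
    ],
)
# Emits the `model-index:` YAML block that belongs in the card's frontmatter.
print(card_data.to_yaml())
```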
 
 ## Integration with HF Ecosystem
 - **Model Cards**: Updates model-index metadata for leaderboard integration
 - **Artificial Analysis**: Direct API integration for benchmark imports
 - **Papers with Code**: Compatible with their model-index specification
 - **Jobs**: Run evaluations directly on Hugging Face Jobs with `uv` integration
+- **vLLM**: Efficient GPU inference for custom model evaluation
+- **lighteval**: Hugging Face's evaluation library with vLLM/accelerate backends
+- **inspect-ai**: UK AI Safety Institute's evaluation framework
 
 # Version
-1.2.0
+1.3.0
 
 # Dependencies
+
+## Core Dependencies
 - huggingface_hub>=0.26.0
 - markdown-it-py>=3.0.0
 - python-dotenv>=1.2.1
 - pyyaml>=6.0.3
 - requests>=2.32.5
-- inspect-ai>=0.3.0
 - re (built-in)
 
+## Inference Provider Evaluation
+- inspect-ai>=0.3.0
+- inspect-evals
+- openai
+
+## vLLM Custom Model Evaluation (GPU required)
+- lighteval[accelerate,vllm]>=0.6.0
+- vllm>=0.4.0
+- torch>=2.0.0
+- transformers>=4.40.0
+- accelerate>=0.30.0
+
+Note: vLLM dependencies are installed automatically via PEP 723 script headers when using `uv run`.
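
To illustrate the PEP 723 mechanism mentioned in the note above: a script declares its dependencies in an inline header, and `uv run` resolves them before execution. The file name and dependency pins here are assumptions for illustration, not the skill's actual scripts.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "lighteval[accelerate,vllm]>=0.6.0",
#     "vllm>=0.4.0",
#     "torch>=2.0.0",
# ]
# ///
# Hypothetical entry point: `uv run eval_vllm.py` reads the header above,
# builds an isolated environment with those packages, then runs the script.

def main() -> None:
    print("dependencies resolved by uv via the PEP 723 header")

if __name__ == "__main__":
    main()
```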
+
 # IMPORTANT: Using This Skill
 
 ## ⚠️ CRITICAL: Check for Existing PRs Before Creating New Ones

+### Method 4: Run Custom Model Evaluation with vLLM
+
+Evaluate custom HuggingFace models directly on GPU using vLLM or accelerate backends. These scripts are **separate from inference provider scripts** and run models locally on the job's hardware.
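
To make "runs models locally on the job's hardware" concrete, here is a minimal vLLM inference sketch. It is not the skill's evaluation script (which, per the dependencies above, drives vLLM through lighteval); the model id and prompt are placeholders.

```python
# Minimal sketch of local GPU inference with vLLM; a real evaluation run
# would iterate over a benchmark dataset rather than a single toy prompt.
from vllm import LLM, SamplingParams

llm = LLM(model="my-org/my-model")  # hypothetical HF model id, loaded onto the job's GPU
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Question: What is 2 + 2?\nAnswer:"], params)
print(outputs[0].outputs[0].text)
```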
+
+#### When to Use vLLM Evaluation (vs Inference Providers)