# NeMo Evaluator Launcher

A comprehensive evaluation platform for large language models (LLMs) that supports multiple benchmarks and execution environments.

> **Submit bugs**: please help us improve by submitting bug reports and improvement ideas at http://nv/eval.issue!

> The instructions below apply to version `0.3.0+`.

## Installation

Install both the `internal` and the public package using pip:

```bash
pip install nemo-evaluator-launcher --index-url <TODO: add URL>
```

### Optional Exporters

To use the result exporters, install the optional dependencies separately:

```bash
# Install with MLflow exporter
pip install nemo-evaluator-launcher-internal[mlflow] --index-url <TODO: add URL>

# Install with Weights & Biases exporter
pip install nemo-evaluator-launcher-internal[wandb] --index-url <TODO: add URL>

# Install with Google Sheets exporter
pip install nemo-evaluator-launcher-internal[gsheets] --index-url <TODO: add URL>

# Install with multiple exporters
pip install nemo-evaluator-launcher-internal[mlflow,wandb,gsheets] --index-url <TODO: add URL>
```

**Supported Exporters:**
- **MLflow**: Track experiments and metrics in MLflow
- **Weights & Biases**: Log results to W&B for experiment tracking
- **Google Sheets**: Export results to Google Sheets for analysis

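Once an exporter is installed, results from a finished run can be pushed to it. A minimal sketch, assuming the `export` subcommand and its `--dest` flag are available in your version:

```bash
# Export the results of a finished invocation to W&B
# (<invocation_id> is printed by `nv-eval run`)
nv-eval export <invocation_id> --dest wandb
```
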
### Lepton AI Execution

For Lepton AI execution, install `leptonai` and configure credentials:

```bash
pip install leptonai
lep login -c <workspace_id>:<token>
```

## Quick Start

### 1. List Available Benchmarks

View all available evaluation benchmarks:

```bash
nv-eval ls
```

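The list is long, so it composes well with standard shell tools (plain `grep` here, nothing launcher-specific):

```bash
# Show only benchmarks whose names mention "mmlu"
nv-eval ls | grep -i mmlu
```
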
**TODO(public release)**: change the reference to the `nemo-evaluator` readme

### 2. Run Evaluations

NV-Eval uses Hydra for configuration management. You can run evaluations using predefined configurations or create your own.

#### Using Example Configurations

The [examples/](examples/) directory contains ready-to-use configurations:

- **Local execution**: [local_llama_3_1_8b_instruct.yaml](examples/local_llama_3_1_8b_instruct.yaml)
- **Slurm cluster execution**: [slurm_llama_3_1_8b_instruct.yaml](examples/slurm_llama_3_1_8b_instruct.yaml)
- **Lepton AI execution**: [lepton_nim_llama_3_1_8b_instruct.yaml](examples/lepton_nim_llama_3_1_8b_instruct.yaml), [lepton_vllm_llama_3_1_8b_instruct.yaml](examples/lepton_vllm_llama_3_1_8b_instruct.yaml), [lepton_none_llama_3_1_8b_instruct.yaml](examples/lepton_none_llama_3_1_8b_instruct.yaml)

Run a local evaluation (requires Docker):
```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
```

Run a Slurm cluster evaluation:
```bash
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_DIR_ON_CLUSTER>
```

Run a Lepton AI evaluation (requires `leptonai` and Lepton credentials):
```bash
# Deploy NIM model and run evaluation
nv-eval run --config-dir examples --config-name lepton_nim_llama_3_1_8b_instruct

# Deploy vLLM model and run evaluation
nv-eval run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct

# Use existing endpoint for evaluation
nv-eval run --config-dir examples --config-name lepton_none_llama_3_1_8b_instruct
```

#### Lepton Execution Strategy

The Lepton executor provides **parallel endpoint deployment** for resource isolation and performance:

- **Dedicated Endpoints**: Each evaluation task gets its own endpoint serving the same model
- **Parallel Deployment**: All endpoints are created simultaneously (~3x faster than sequential deployment)
- **Resource Isolation**: Tasks run independently without interference
- **Storage Isolation**: Each evaluation gets an isolated directory, `/shared/nv-eval-workspace/{invocation_id}`
- **Simple Cleanup**: A single command removes all endpoints and storage

**Architecture Diagram:**

```mermaid
graph TD
    A["nv-eval run"] --> B["Load Tasks"]

    B --> D["Endpoints Deployment"]

    D --> E1["Deployment 1: Create Endpoint 1"]
    D --> E2["Deployment 2: Create Endpoint 2"]
    D --> E3["Deployment 3: Create Endpoint 3"]

    E1 --> F["Wait for All Ready"]
    E2 --> F
    E3 --> F

    F --> G["Mount Storage per Task"]

    G --> H["Parallel Tasks Creation as Jobs in Lepton"]

    H --> J1["Task 1: Job 1 Evaluation"]
    H --> J2["Task 2: Job 2 Evaluation"]
    H --> J3["Task 3: Job 3 Evaluation"]

    J1 --> K["Execute in Parallel"]
    J2 --> K
    J3 --> K

    K --> L["Finish"]

    style D fill:#e1f5fe
    style G fill:#fff3e0
    style H fill:#f3e5f5
    style K fill:#e8f5e8
    style A fill:#fff3e0
```

**Example Configuration:**
```yaml
evaluation:
  tasks:
    - name: gpqa_diamond   # Gets endpoint: nim-gpqa-d-0-abc123
    - name: hellaswag      # Gets endpoint: nim-hellas-1-abc123
    - name: winogrande     # Gets endpoint: nim-winogr-2-abc123
```

Generate all the example configs:
```bash
python scripts/generate_configs.py
```

#### Creating Custom Configurations

1. Create your own configuration directory:
```bash
mkdir my_configs
```

2. Copy an example configuration as a starting point:
```bash
cp examples/local_llama_3_1_8b_instruct.yaml my_configs/my_evaluation.yaml
```

3. Modify the configuration to suit your needs (a sketch of the most commonly edited fields follows this list):
   - Change the model endpoint
   - Adjust evaluation parameters
   - Select different benchmarks
   - Configure execution settings

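For orientation, the fields you will most often touch look roughly like this. This is a trimmed sketch, not a complete config; `url` and the `model_id` value are illustrative assumptions, so check your copied example for the exact schema:

```yaml
execution:
  output_dir: my_results                      # where results are written

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct      # illustrative model name
    url: https://my-endpoint.example.com/v1/chat/completions  # assumed field: endpoint URL

evaluation:
  tasks:
    - name: gpqa_diamond                      # benchmark names from `nv-eval ls`
    - name: hellaswag
```
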
4. Run your custom configuration:
```bash
nv-eval run --config-dir my_configs --config-name my_evaluation
```

#### Configuration Overrides

You can override configuration values from the command line (`-o` can be used multiple times; the notation follows Hydra's override syntax):

```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=my_results \
    -o target.api_endpoint.model_id=model/another/one
```

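Because the notation is Hydra's, the rest of Hydra's override grammar should also work; for example, prefixing a key with `+` adds one that is not present in the base config (this assumes overrides are passed straight through to Hydra, and `extra_label` is a hypothetical key used only to show the syntax):

```bash
# Add a new key rather than overriding an existing one
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o +execution.extra_label=smoke-test
```
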
#### Environment Variables in Deployment

The platform supports passing environment variables to deployment containers in a Hydra-extensible way:

**Direct Values:**
```yaml
deployment:
  type: vllm
  envs:
    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
    OMP_NUM_THREADS: "1"
    VLLM_USE_FLASH_ATTN: "1"
```

**Environment Variable References:**
```yaml
deployment:
  type: sglang
  envs:
    HF_TOKEN: ${oc.env:HF_TOKEN}  # References host environment variable
    NGC_API_KEY: ${oc.env:NGC_API_KEY}
```

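The `${oc.env:...}` resolver is standard OmegaConf, so it also accepts a default value after a comma, which is handy for optional variables. A sketch, with `HF_HOME` chosen purely for illustration:

```yaml
deployment:
  type: vllm
  envs:
    HF_HOME: ${oc.env:HF_HOME,/tmp/hf_cache}  # falls back to /tmp/hf_cache if HF_HOME is unset
```
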
**Supported Executors:**
- **SLURM**: Environment variables are exported in the sbatch script before the deployment commands run
- **Lepton**: Environment variables are passed to the container specification
- **Local**: Environment variables are passed to Docker containers (when deployment support is added)

**Example with SLURM:**
```yaml
deployment:
  type: vllm
  envs:
    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
    HF_TOKEN: ${oc.env:HF_TOKEN}
    VLLM_USE_V2_BLOCK_MANAGER: "1"
  command: vllm serve /checkpoint --port 8000
```

This will generate an sbatch script that exports these variables before running the deployment command.

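Conceptually, the exported section of the generated script has this shape (a sketch of the idea, not the literal output):

```bash
#!/bin/bash
#SBATCH ...                        # resource directives come from the execution config

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
export HF_TOKEN="<resolved from the host environment>"
export VLLM_USE_V2_BLOCK_MANAGER="1"

vllm serve /checkpoint --port 8000
```
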
### 3. Check Evaluation Status

Monitor the status of your evaluation jobs:

```bash
nv-eval status <job_id_or_invocation_id>
```

You can check:
- **Individual job status**: `nv-eval status <job_id>`
- **All jobs in an invocation**: `nv-eval status <invocation_id>`

The status command returns JSON output with job status information.

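Since the output is JSON, it composes well with `jq` (inspect the raw output for the exact schema in your version):

```bash
# Pretty-print the full status report for an invocation
nv-eval status <invocation_id> | jq .
```
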
## Using the Python API

Consider checking out the [Python notebooks](./examples/notebooks).

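If you just want to script launches without opening the notebooks, one portable approach is to drive the documented CLI from Python; this sketch deliberately avoids assuming any particular Python API surface:

```python
import subprocess

# Launch an evaluation via the documented CLI and capture its output.
# Paths and config names mirror the Quick Start examples above.
result = subprocess.run(
    [
        "nv-eval", "run",
        "--config-dir", "examples",
        "--config-name", "local_llama_3_1_8b_instruct",
        "-o", "execution.output_dir=my_results",
    ],
    capture_output=True,
    text=True,
    check=True,  # raise if the launcher exits non-zero
)
print(result.stdout)
```
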
## Troubleshooting

### View Full Configuration

To see the complete resolved configuration:

```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --dry-run
```

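A dry run should also compose with overrides, so you can verify that a value resolves the way you expect before submitting anything:

```bash
# Check that the output directory override lands in the resolved config
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    --dry-run -o execution.output_dir=my_results
```
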
## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on contributing to the project.

For complete documentation, please see: [docs/nemo-evaluator-launcher/index.md](../../docs/nemo-evaluator-launcher/index.md)