
Commit f8549b1

Fgalko/docs (#129)

Fix README files and add links to configuration examples

Signed-off-by: fgalko <[email protected]>

1 parent d74c844 · commit f8549b1

File tree

5 files changed: +59 −263 lines


README.md

Lines changed: 3 additions & 1 deletion
@@ -72,7 +72,7 @@ To use out-of-the-box build.nvidia.com APIs, you need an API key:
 Run a small evaluation on your local machine. The launcher automatically pulls the correct container and executes the benchmark. The list of benchmarks is configured directly in the YAML file.
 
 ```bash
-nemo-evaluator-launcher run --config-dir examples --config-name nvidia-nemotron-nano-9b-v2 --override execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
+nemo-evaluator-launcher run --config-dir examples --config-name local_nvidia_nemotron_nano_9b_v2 --override execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
 ```
 
 Upon running this command, you will see a job_id, which can then be used to track the job.
@@ -84,6 +84,8 @@ Results, logs, and run configurations are saved locally. Inspect the status of t
 nemo-evaluator-launcher status <job_id_or_invocation_id>
 ```
 
+**Configuration Examples**: Explore ready-to-use configuration files in [`packages/nemo-evaluator-launcher/examples/`](./packages/nemo-evaluator-launcher/examples/) for local, Lepton, and Slurm deployments with various model hosting options (vLLM, NIM, hosted endpoints).
+
 #### Next Steps
 
 - List all supported benchmarks:
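Taken together, the two hunks above document a run-then-track loop. As a minimal sketch of that loop using only the commands shown in the updated README (the output directory is a placeholder):

```bash
# Launch a small local evaluation; the launcher pulls the benchmark container.
nemo-evaluator-launcher run \
  --config-dir examples \
  --config-name local_nvidia_nemotron_nano_9b_v2 \
  --override execution.output_dir=./results

# The run prints a job_id (and an invocation_id); use either to track progress.
nemo-evaluator-launcher status <job_id_or_invocation_id>
```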

docs/nemo-evaluator-launcher/quickstart.md

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ pip install nemo-evaluator-launcher
 
 NeMo Evaluator sends OpenAI-compatible requests to your model during evaluation. You must have an endpoint that accepts either chat or completions API calls and can handle the evaluation load.
 
+**Configuration Examples**: Explore ready-to-use configuration files in [`packages/nemo-evaluator-launcher/examples/`](./packages/nemo-evaluator-launcher/examples/) for local, Lepton, and Slurm deployments with various model hosting options (vLLM, NIM, hosted endpoints).
+
 Hosted endpoints (fastest):
 
 - [build.nvidia.com](https://build.nvidia.com) (ready-to-use hosted models):
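Before pointing the launcher at a hosted endpoint, it can help to confirm the endpoint answers OpenAI-compatible chat requests. A hedged sketch using the build.nvidia.com endpoint and model from the example config added in this commit; `API_KEY` is assumed to hold a valid key:

```bash
# Assumption: API_KEY holds a build.nvidia.com key with access to the model.
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/nemotron-nano-9b-v2",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 16
      }'
```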
packages/nemo-evaluator-launcher/README.md

Lines changed: 2 additions & 262 deletions
@@ -1,263 +1,3 @@
-# NeMo-Evaluator-Launcher
+# NeMo Evaluator Launcher
 
-A comprehensive evaluation platform for large language models (LLMs) that supports multiple benchmarks and execution environments.
-
-> **Submit bugs**: please help us improve by submitting bugs and improvements http://nv/eval.issue!
-
-> Below applies to version `0.3.0+`
-
-## Installation
-
-Install both the `internal` and public packages using pip:
-
-```bash
-pip install nemo-evaluator-launcher --index-url <TODO: add URL>
-```
-
-### Optional Exporters
-
-To use the result exporters, install the optional dependencies separately:
-
-```bash
-# Install with MLflow exporter
-pip install nemo-evaluator-launcher-internal[mlflow] --index-url <TODO: add URL>
-
-# Install with Weights & Biases exporter
-pip install nemo-evaluator-launcher-internal[wandb] --index-url <TODO: add URL>
-
-# Install with Google Sheets exporter
-pip install nemo-evaluator-launcher-internal[gsheets] --index-url <TODO: add URL>
-
-# Install with multiple exporters
-pip install nemo-evaluator-launcher-internal[mlflow,wandb,gsheets] --index-url <TODO: add URL>
-```
-
-**Supported Exporters:**
-- **MLflow**: Track experiments and metrics in MLflow
-- **Weights & Biases**: Log results to W&B for experiment tracking
-- **Google Sheets**: Export results to Google Sheets for analysis
-
-### Lepton AI Execution
-
-For Lepton AI execution, install leptonai and configure credentials:
-
-```bash
-pip install leptonai
-lep login -c <workspace_id>:<token>
-```
-
-## Quick Start
-
-### 1. List Available Benchmarks
-
-View all available evaluation benchmarks:
-
-```bash
-nv-eval ls
-```
-
-**TODO(public release)**: change reference to the `nemo-evaluator` readme
-
-### 2. Run Evaluations
-
-NV-Eval uses Hydra for configuration management. You can run evaluations using predefined configurations or create your own.
-
-#### Using Example Configurations
-
-The [examples/](examples/) directory contains ready-to-use configurations:
-
-- **Local execution**: [local_llama_3_1_8b_instruct.yaml](examples/local_llama_3_1_8b_instruct.yaml)
-- **Slurm cluster execution**: [slurm_llama_3_1_8b_instruct.yaml](examples/slurm_llama_3_1_8b_instruct.yaml)
-- **Lepton AI execution**: [lepton_nim_llama_3_1_8b_instruct.yaml](examples/lepton_nim_llama_3_1_8b_instruct.yaml), [lepton_vllm_llama_3_1_8b_instruct.yaml](examples/lepton_vllm_llama_3_1_8b_instruct.yaml), [lepton_none_llama_3_1_8b_instruct.yaml](examples/lepton_none_llama_3_1_8b_instruct.yaml)
-
-Run a local evaluation (requires Docker):
-```bash
-nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
-```
-
-Run a Slurm cluster evaluation:
-```bash
-nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_DIR_ON_CLUSTER>
-```
-
-Run a Lepton AI evaluation (requires leptonai and Lepton credentials):
-```bash
-# Deploy NIM model and run evaluation
-nv-eval run --config-dir examples --config-name lepton_nim_llama_3_1_8b_instruct
-
-# Deploy vLLM model and run evaluation
-nv-eval run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct
-
-# Use existing endpoint for evaluation
-nv-eval run --config-dir examples --config-name lepton_none_llama_3_1_8b_instruct
-```
-
-#### Lepton Execution Strategy
-
-The Lepton executor provides **parallel endpoint deployment** for optimal resource isolation and performance:
-
-- **Dedicated Endpoints**: Each evaluation task gets its own endpoint of the same model
-- **Parallel Deployment**: All endpoints are created simultaneously (~3x faster than sequential)
-- **Resource Isolation**: Tasks run independently without interference
-- **Storage Isolation**: Each evaluation gets an isolated directory `/shared/nv-eval-workspace/{invocation_id}`
-- **Simple Cleanup**: A single command removes all endpoints and storage
-
-**Architecture Diagram:**
-
-```mermaid
-graph TD
-    A["nv-eval run"] --> B["Load Tasks"]
-
-    B --> D["Endpoints Deployment"]
-
-    D --> E1["Deployment 1: Create Endpoint 1"]
-    D --> E2["Deployment 2: Create Endpoint 2"]
-    D --> E3["Deployment 3: Create Endpoint 3"]
-
-    E1 --> F["Wait for All Ready"]
-    E2 --> F
-    E3 --> F
-
-    F --> G["Mount Storage per Task"]
-
-    G --> H["Parallel Tasks Creation as Jobs in Lepton"]
-
-    H --> J1["Task 1: Job 1 Evaluation"]
-    H --> J2["Task 2: Job 2 Evaluation"]
-    H --> J3["Task 3: Job 3 Evaluation"]
-
-    J1 --> K["Execute in Parallel"]
-    J2 --> K
-    J3 --> K
-
-    K --> L["Finish"]
-
-    style D fill:#e1f5fe
-    style G fill:#fff3e0
-    style H fill:#f3e5f5
-    style K fill:#e8f5e8
-    style A fill:#fff3e0
-```
-
-**Example Configuration:**
-```yaml
-evaluation:
-  tasks:
-    - name: gpqa_diamond  # Gets endpoint: nim-gpqa-d-0-abc123
-    - name: hellaswag     # Gets endpoint: nim-hellas-1-abc123
-    - name: winogrande    # Gets endpoint: nim-winogr-2-abc123
-```
-
-Generate all the configs:
-```bash
-python scripts/generate_configs.py
-```
-
-#### Creating Custom Configurations
-
-1. Create your own configuration directory:
-```bash
-mkdir my_configs
-```
-
-2. Copy an example configuration as a starting point:
-```bash
-cp examples/local_llama_3_1_8b_instruct.yaml my_configs/my_evaluation.yaml
-```
-
-3. Modify the configuration to suit your needs:
-   - Change the model endpoint
-   - Adjust evaluation parameters
-   - Select different benchmarks
-   - Configure execution settings
-
-4. Run your custom configuration:
-```bash
-nv-eval run --config-dir my_configs --config-name my_evaluation
-```
-
-#### Configuration Overrides
-
-You can override configuration values from the command line (`-o` can be used multiple times; the notation follows Hydra):
-
-```bash
-nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
-  -o execution.output_dir=my_results \
-  -o target.api_endpoint.model_id=model/another/one
-```
-
-#### Environment Variables in Deployment
-
-The platform supports passing environment variables to deployment containers in a Hydra-extensible way:
-
-**Direct Values:**
-```yaml
-deployment:
-  type: vllm
-  envs:
-    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
-    OMP_NUM_THREADS: "1"
-    VLLM_USE_FLASH_ATTN: "1"
-```
-
-**Environment Variable References:**
-```yaml
-deployment:
-  type: sglang
-  envs:
-    HF_TOKEN: ${oc.env:HF_TOKEN}  # References host environment variable
-    NGC_API_KEY: ${oc.env:NGC_API_KEY}
-```
-
-**Supported Executors:**
-- **SLURM**: Environment variables are exported in the sbatch script before running deployment commands
-- **Lepton**: Environment variables are passed to the container specification
-- **Local**: Environment variables are passed to Docker containers (when deployment support is added)
-
-**Example with SLURM:**
-```yaml
-deployment:
-  type: vllm
-  envs:
-    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
-    HF_TOKEN: ${oc.env:HF_TOKEN}
-    VLLM_USE_V2_BLOCK_MANAGER: "1"
-  command: vllm serve /checkpoint --port 8000
-```
-
-This will generate an sbatch script that exports these variables before running the deployment command.
-
-### 3. Check Evaluation Status
-
-Monitor the status of your evaluation jobs:
-
-```bash
-nv-eval status <job_id_or_invocation_id>
-```
-
-You can check:
-- **Individual job status**: `nv-eval status <job_id>`
-- **All jobs in an invocation**: `nv-eval status <invocation_id>`
-
-The status command returns JSON output with job status information.
-
-## Using the Python API
-
-Consider checking out the [Python notebooks](./examples/notebooks).
-
-## Troubleshooting
-
-### View Full Configuration
-
-To see the complete resolved configuration:
-
-```bash
-nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --dry-run
-```
-
-## Contributing
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on contributing to the project.
+For complete documentation, please see: [docs/nemo-evaluator-launcher/index.md](../../docs/nemo-evaluator-launcher/index.md)
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# How to use: copy this file locally into a directory, say `examples`, and run
+# this config with `nemo-evaluator-launcher run --config-dir examples --config-name nvidia_nemotron_nano_9b_v2`.
+
+# This is a TEST CONFIGURATION that limits all benchmarks to use only 10 samples total.
+# This allows you to test your setup end-to-end quickly without running full evaluations.
+#
+# ⚠️ WARNING: Results from this test run should NEVER be used to compare models or
+# report benchmark performance. This is solely for testing configuration and setup.
+# Always run full evaluations (without limit_samples) for actual benchmark results.
+
+# specify default configs for execution and deployment
+defaults:
+  - execution: local
+  - deployment: none
+  - _self_
+
+execution:
+  output_dir: nvidia_nemotron_nano_9b_v2_results
+
+target:
+  api_endpoint:
+    model_id: nvidia/nemotron-nano-9b-v2
+    url: https://integrate.api.nvidia.com/v1/chat/completions
+    api_key_name: API_KEY  # API key with access to build.nvidia.com
+
+# specify the benchmarks to evaluate
+evaluation:
+  overrides:  # these overrides apply to all tasks; for task-specific settings, use a task's own `overrides` field
+    config.params.request_timeout: 3600
+    config.params.limit_samples: 10  # TEST ONLY: limits all benchmarks to 10 samples total for quick testing
+    target.api_endpoint.adapter_config.use_reasoning: false  # if true, strips reasoning tokens
+    target.api_endpoint.adapter_config.use_system_prompt: false
+  tasks:
+    - name: ifeval
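A short usage sketch for this test config, assuming it has been copied into a local `examples/` directory as its header comment suggests (the key value is a placeholder):

```bash
# Assumption: the config above is saved as a YAML file under ./examples.
# API_KEY must hold a build.nvidia.com key, matching api_key_name above.
export API_KEY=<your_build_nvidia_com_key>

# Runs the 10-sample smoke test; results land in the configured output_dir
# and must not be reported as real benchmark numbers.
nemo-evaluator-launcher run --config-dir examples --config-name nvidia_nemotron_nano_9b_v2
```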

packages/nemo-evaluator/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# NeMo Evaluator
+
+For complete documentation, please see: [docs/nemo-evaluator/index.md](../../docs/nemo-evaluator/index.md)
