# NeMo Evaluator Launcher

A comprehensive evaluation platform for large language models (LLMs) that supports multiple benchmarks and execution environments.

> **Submit bugs**: please help us improve by submitting bug reports and improvement ideas at http://nv/eval.issue!

> The instructions below apply to version `0.3.0+`.

## Installation

Install both the `internal` and the public package using pip:

```bash
pip install nemo-evaluator-launcher --index-url <TODO: add URL>
```

### Optional Exporters

To use the result exporters, install the optional dependencies separately:

```bash
# Install with MLflow exporter
pip install nemo-evaluator-launcher-internal[mlflow] --index-url <TODO: add URL>

# Install with Weights & Biases exporter
pip install nemo-evaluator-launcher-internal[wandb] --index-url <TODO: add URL>

# Install with Google Sheets exporter
pip install nemo-evaluator-launcher-internal[gsheets] --index-url <TODO: add URL>

# Install with multiple exporters
pip install nemo-evaluator-launcher-internal[mlflow,wandb,gsheets] --index-url <TODO: add URL>
```

**Supported Exporters:**
- **MLflow**: Track experiments and metrics in MLflow
- **Weights & Biases**: Log results to W&B for experiment tracking
- **Google Sheets**: Export results to Google Sheets for analysis

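Once an exporter is installed, results from a finished run can be pushed to it. A minimal sketch, assuming the `export` subcommand and its `--dest` flag are available in your version:

```bash
# Export the results of a finished invocation to W&B
# (<invocation_id> is printed by `nv-eval run`)
nv-eval export <invocation_id> --dest wandb
```
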
### Lepton AI Execution

For Lepton AI execution, install `leptonai` and configure credentials:

```bash
pip install leptonai
lep login -c <workspace_id>:<token>
```

## Quick Start

### 1. List Available Benchmarks

View all available evaluation benchmarks:

```bash
nv-eval ls
```

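The list is long, so it composes well with standard shell tools (plain `grep` here, nothing launcher-specific):

```bash
# Show only benchmarks whose names mention "mmlu"
nv-eval ls | grep -i mmlu
```
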
**TODO(public release)**: change the reference to the `nemo-evaluator` readme

### 2. Run Evaluations

NV-Eval uses Hydra for configuration management. You can run evaluations using predefined configurations or create your own.

#### Using Example Configurations

The [examples/](examples/) directory contains ready-to-use configurations:

- **Local execution**: [local_llama_3_1_8b_instruct.yaml](examples/local_llama_3_1_8b_instruct.yaml)
- **Slurm cluster execution**: [slurm_llama_3_1_8b_instruct.yaml](examples/slurm_llama_3_1_8b_instruct.yaml)
- **Lepton AI execution**: [lepton_nim_llama_3_1_8b_instruct.yaml](examples/lepton_nim_llama_3_1_8b_instruct.yaml), [lepton_vllm_llama_3_1_8b_instruct.yaml](examples/lepton_vllm_llama_3_1_8b_instruct.yaml), [lepton_none_llama_3_1_8b_instruct.yaml](examples/lepton_none_llama_3_1_8b_instruct.yaml)

Run a local evaluation (requires Docker):
```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_LOCAL_DIR>
```

Run a Slurm cluster evaluation:
```bash
nv-eval run --config-dir examples --config-name slurm_llama_3_1_8b_instruct --override execution.output_dir=<YOUR_OUTPUT_DIR_ON_CLUSTER>
```

Run a Lepton AI evaluation (requires `leptonai` and Lepton credentials):
```bash
# Deploy NIM model and run evaluation
nv-eval run --config-dir examples --config-name lepton_nim_llama_3_1_8b_instruct

# Deploy vLLM model and run evaluation
nv-eval run --config-dir examples --config-name lepton_vllm_llama_3_1_8b_instruct

# Use existing endpoint for evaluation
nv-eval run --config-dir examples --config-name lepton_none_llama_3_1_8b_instruct
```

#### Lepton Execution Strategy

The Lepton executor provides **parallel endpoint deployment** for resource isolation and performance:

- **Dedicated Endpoints**: Each evaluation task gets its own endpoint serving the same model
- **Parallel Deployment**: All endpoints are created simultaneously (~3x faster than sequential deployment)
- **Resource Isolation**: Tasks run independently without interference
- **Storage Isolation**: Each evaluation gets an isolated directory, `/shared/nv-eval-workspace/{invocation_id}`
- **Simple Cleanup**: A single command removes all endpoints and storage

**Architecture Diagram:**

```mermaid
graph TD
    A["nv-eval run"] --> B["Load Tasks"]

    B --> D["Endpoints Deployment"]

    D --> E1["Deployment 1: Create Endpoint 1"]
    D --> E2["Deployment 2: Create Endpoint 2"]
    D --> E3["Deployment 3: Create Endpoint 3"]

    E1 --> F["Wait for All Ready"]
    E2 --> F
    E3 --> F

    F --> G["Mount Storage per Task"]

    G --> H["Parallel Tasks Creation as Jobs in Lepton"]

    H --> J1["Task 1: Job 1 Evaluation"]
    H --> J2["Task 2: Job 2 Evaluation"]
    H --> J3["Task 3: Job 3 Evaluation"]

    J1 --> K["Execute in Parallel"]
    J2 --> K
    J3 --> K

    K --> L["Finish"]

    style D fill:#e1f5fe
    style G fill:#fff3e0
    style H fill:#f3e5f5
    style K fill:#e8f5e8
    style A fill:#fff3e0
```

**Example Configuration:**
```yaml
evaluation:
  tasks:
    - name: gpqa_diamond   # Gets endpoint: nim-gpqa-d-0-abc123
    - name: hellaswag      # Gets endpoint: nim-hellas-1-abc123
    - name: winogrande     # Gets endpoint: nim-winogr-2-abc123
```

Generate all the example configs:
```bash
python scripts/generate_configs.py
```

#### Creating Custom Configurations

1. Create your own configuration directory:
```bash
mkdir my_configs
```

2. Copy an example configuration as a starting point:
```bash
cp examples/local_llama_3_1_8b_instruct.yaml my_configs/my_evaluation.yaml
```

3. Modify the configuration to suit your needs (a sketch of the most commonly edited fields follows this list):
   - Change the model endpoint
   - Adjust evaluation parameters
   - Select different benchmarks
   - Configure execution settings

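For orientation, the fields you will most often touch look roughly like this. This is a trimmed sketch, not a complete config; `url` and the `model_id` value are illustrative assumptions, so check your copied example for the exact schema:

```yaml
execution:
  output_dir: my_results                      # where results are written

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct      # illustrative model name
    url: https://my-endpoint.example.com/v1/chat/completions  # assumed field: endpoint URL

evaluation:
  tasks:
    - name: gpqa_diamond                      # benchmark names from `nv-eval ls`
    - name: hellaswag
```
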
4. Run your custom configuration:
```bash
nv-eval run --config-dir my_configs --config-name my_evaluation
```

#### Configuration Overrides

You can override configuration values from the command line (`-o` can be used multiple times; the notation follows Hydra's override syntax):

```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o execution.output_dir=my_results \
    -o target.api_endpoint.model_id=model/another/one
```

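Because the notation is Hydra's, the rest of Hydra's override grammar should also work; for example, prefixing a key with `+` adds one that is not present in the base config (this assumes overrides are passed straight through to Hydra, and `extra_label` is a hypothetical key used only to show the syntax):

```bash
# Add a new key rather than overriding an existing one
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    -o +execution.extra_label=smoke-test
```
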
#### Environment Variables in Deployment

The platform supports passing environment variables to deployment containers in a Hydra-extensible way:

**Direct Values:**
```yaml
deployment:
  type: vllm
  envs:
    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
    OMP_NUM_THREADS: "1"
    VLLM_USE_FLASH_ATTN: "1"
```

**Environment Variable References:**
```yaml
deployment:
  type: sglang
  envs:
    HF_TOKEN: ${oc.env:HF_TOKEN}  # References host environment variable
    NGC_API_KEY: ${oc.env:NGC_API_KEY}
```

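The `${oc.env:...}` resolver is standard OmegaConf, so it also accepts a default value after a comma, which is handy for optional variables. A sketch, with `HF_HOME` chosen purely for illustration:

```yaml
deployment:
  type: vllm
  envs:
    HF_HOME: ${oc.env:HF_HOME,/tmp/hf_cache}  # falls back to /tmp/hf_cache if HF_HOME is unset
```
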
**Supported Executors:**
- **SLURM**: Environment variables are exported in the sbatch script before the deployment commands run
- **Lepton**: Environment variables are passed to the container specification
- **Local**: Environment variables are passed to Docker containers (when deployment support is added)

**Example with SLURM:**
```yaml
deployment:
  type: vllm
  envs:
    CUDA_VISIBLE_DEVICES: "0,1,2,3,4,5,6,7"
    HF_TOKEN: ${oc.env:HF_TOKEN}
    VLLM_USE_V2_BLOCK_MANAGER: "1"
  command: vllm serve /checkpoint --port 8000
```

This will generate an sbatch script that exports these variables before running the deployment command.

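Conceptually, the exported section of the generated script has this shape (a sketch of the idea, not the literal output):

```bash
#!/bin/bash
#SBATCH ...                        # resource directives come from the execution config

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
export HF_TOKEN="<resolved from the host environment>"
export VLLM_USE_V2_BLOCK_MANAGER="1"

vllm serve /checkpoint --port 8000
```
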
### 3. Check Evaluation Status

Monitor the status of your evaluation jobs:

```bash
nv-eval status <job_id_or_invocation_id>
```

You can check:
- **Individual job status**: `nv-eval status <job_id>`
- **All jobs in an invocation**: `nv-eval status <invocation_id>`

The status command returns JSON output with job status information.

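Since the output is JSON, it composes well with `jq` (inspect the raw output for the exact schema in your version):

```bash
# Pretty-print the full status report for an invocation
nv-eval status <invocation_id> | jq .
```
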
## Using the Python API

Consider checking out the [Python notebooks](./examples/notebooks).

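If you just want to script launches without opening the notebooks, one portable approach is to drive the documented CLI from Python; this sketch deliberately avoids assuming any particular Python API surface:

```python
import subprocess

# Launch an evaluation via the documented CLI and capture its output.
# Paths and config names mirror the Quick Start examples above.
result = subprocess.run(
    [
        "nv-eval", "run",
        "--config-dir", "examples",
        "--config-name", "local_llama_3_1_8b_instruct",
        "-o", "execution.output_dir=my_results",
    ],
    capture_output=True,
    text=True,
    check=True,  # raise if the launcher exits non-zero
)
print(result.stdout)
```
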
## Troubleshooting

### View Full Configuration

To see the complete resolved configuration:

```bash
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct --dry-run
```

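A dry run should also compose with overrides, so you can verify that a value resolves the way you expect before submitting anything:

```bash
# Check that the output directory override lands in the resolved config
nv-eval run --config-dir examples --config-name local_llama_3_1_8b_instruct \
    --dry-run -o execution.output_dir=my_results
```
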
## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on contributing to the project.

For complete documentation, please see: [docs/nemo-evaluator-launcher/index.md](../../docs/nemo-evaluator-launcher/index.md)