Commit 7a419fd

Authored by wprazuch, marta-sd, fgalko-oss, and ko3n1g

(fix) Docs Sphinx build warnings + use 25.08.1 container version (#122)

Signed-off-by: Wojciech Prazuch <[email protected]>
Signed-off-by: Marta Stepniewska-Dziubinska <[email protected]>
Signed-off-by: fgalko <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Marta Stepniewska-Dziubinska <[email protected]>
Co-authored-by: fgalko <[email protected]>
Co-authored-by: oliver könig <[email protected]>

1 parent f8549b1 · commit 7a419fd

File tree

35 files changed (+284, -275 lines)

README.md

Lines changed: 18 additions & 18 deletions
@@ -20,7 +20,7 @@ The platform consists of two main components:

Most users only need to interact with `nemo-evaluator-launcher` as a universal gateway to different benchmarks and harnesses. It is, however, possible to interact directly with `nemo-evaluator` by following this [guide](./docs/nemo-evaluator/workflows/using-containers.md).

-```mermaid
+```{mermaid}
graph TD
A[User] --> B{NeMo Evaluator Launcher};
B -- " " --> C{Local};
@@ -104,23 +104,23 @@ NeMo Evaluator provides pre-built evaluation containers for different evaluation

| Container | Description | NGC Catalog | Latest Tag | Supported benchmarks |
|-----------|-------------|-------------|------------|----------------------|
-| **agentic_eval** | Agentic AI evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/agentic_eval) | `25.08.0` | Agentic Eval Topic Adherence, Agentic Eval Tool Call, Agentic Eval Goal and Answer Accuracy |
-| **bfcl** | Function calling | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl) | `25.08.0` | BFCL v2 and v3 |
-| **bigcode-evaluation-harness** | Code generation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bigcode-evaluation-harness) | `25.08.0` | MBPP, MBPP-Plus, HumanEval, HumanEval+, Multiple (cpp, cs, d, go, java, jl, js, lua, php, pl, py, r, rb, rkt, rs, scala, sh, swift, ts) |
-| **garak** | Safety and vulnerability testing | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/garak) | `25.08.0` | Garak |
-| **helm** | Holistic evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/helm) | `25.08.0` | MedHelm |
-| **hle** | Academic knowledge and problem solving | [Link]() | `25.08.0` | HLE |
-| **ifbench** | Instruction following | [Link]() | `25.08.0` | IFBench |
-| **livecodebench** | Coding | [Link]() | `25.08.0` | LiveCodeBench (v1-v6, 0724_0125, 0824_0225) |
-| **lm-evaluation-harness** | Language model benchmarks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/lm-evaluation-harness) | `25.08.0` | ARC Challenge (also multilingual), GSM8K, HumanEval, HumanEval+, MBPP, MINERVA MMMLU-Pro, RACE, TruthfulQA, AGIEval, BBH, BBQ, CSQA, Frames, Global MMMLU, GPQA-D, HellaSwag (also multilingual), IFEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-ProX (de, es, fr, it, ja), MMLU-Redux, MUSR, OpenbookQA, Piqa, Social IQa, TruthfulQA, WikiLingua, WinoGrande |
-| **mmath** | Multilingual math reasoning | [Link]() | `25.08.0` | EN, ZH, AR, ES, FR, JA, KO, PT, TH, VI |
-| **mtbench** | Multi-turn conversation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/mtbench) | `25.08.0` | MT-Bench |
-| **rag_retriever_eval** | RAG system evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/rag_retriever_eval) | `25.08.0` | RAG, Retriever |
-| **safety-harness** | Safety and bias evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/safety-harness) | `25.08.0` | Aegis v2, BBQ, WildGuard |
-| **scicode** | Coding for scientific research | [Link]() | `25.08.0` | SciCode |
-| **simple-evals** | Common evaluation tasks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/simple-evals) | `25.08.0` | GPQA-D, MATH-500, AIME 24 & 25, HumanEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-lite (AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, MY, PT, SW, YO, ZH), SimpleQA |
-| **tooltalk** | Tool usage evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/tooltalk) | `25.08.0` | ToolTalk |
-| **vlmevalkit** | Vision-language model evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/vlmevalkit) | `25.08.0` | AI2D, ChartQA, OCRBench, SlideVQA |
+| **agentic_eval** | Agentic AI evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/agentic_eval) | `25.08.1` | Agentic Eval Topic Adherence, Agentic Eval Tool Call, Agentic Eval Goal and Answer Accuracy |
+| **bfcl** | Function calling | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl) | `25.08.1` | BFCL v2 and v3 |
+| **bigcode-evaluation-harness** | Code generation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bigcode-evaluation-harness) | `25.08.1` | MBPP, MBPP-Plus, HumanEval, HumanEval+, Multiple (cpp, cs, d, go, java, jl, js, lua, php, pl, py, r, rb, rkt, rs, scala, sh, swift, ts) |
+| **garak** | Safety and vulnerability testing | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/garak) | `25.08.1` | Garak |
+| **helm** | Holistic evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/helm) | `25.08.1` | MedHelm |
+| **hle** | Academic knowledge and problem solving | Link: TBD | `25.08.1` | HLE |
+| **ifbench** | Instruction following | Link: TBD | `25.08.1` | IFBench |
+| **livecodebench** | Coding | Link: TBD | `25.08.1` | LiveCodeBench (v1-v6, 0724_0125, 0824_0225) |
+| **lm-evaluation-harness** | Language model benchmarks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/lm-evaluation-harness) | `25.08.1` | ARC Challenge (also multilingual), GSM8K, HumanEval, HumanEval+, MBPP, MINERVA MMMLU-Pro, RACE, TruthfulQA, AGIEval, BBH, BBQ, CSQA, Frames, Global MMMLU, GPQA-D, HellaSwag (also multilingual), IFEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-ProX (de, es, fr, it, ja), MMLU-Redux, MUSR, OpenbookQA, Piqa, Social IQa, TruthfulQA, WikiLingua, WinoGrande |
+| **mmath** | Multilingual math reasoning | Link: TBD | `25.08.1` | EN, ZH, AR, ES, FR, JA, KO, PT, TH, VI |
+| **mtbench** | Multi-turn conversation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/mtbench) | `25.08.1` | MT-Bench |
+| **rag_retriever_eval** | RAG system evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/rag_retriever_eval) | `25.08.1` | RAG, Retriever |
+| **safety-harness** | Safety and bias evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/safety-harness) | `25.08.1` | Aegis v2, BBQ, WildGuard |
+| **scicode** | Coding for scientific research | Link: TBD | `25.08.1` | SciCode |
+| **simple-evals** | Common evaluation tasks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/simple-evals) | `25.08.1` | GPQA-D, MATH-500, AIME 24 & 25, HumanEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-lite (AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, MY, PT, SW, YO, ZH), SimpleQA |
+| **tooltalk** | Tool usage evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/tooltalk) | `25.08.1` | ToolTalk |
+| **vlmevalkit** | Vision-language model evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/vlmevalkit) | `25.08.1` | AI2D, ChartQA, OCRBench, SlideVQA |


docs/conf.py

Lines changed: 10 additions & 0 deletions
@@ -52,8 +52,11 @@
    "deflist",  # Supports definition lists with term: definition format
    "fieldlist",  # Enables field lists for metadata like :author: Name
    "tasklist",  # Adds support for GitHub-style task lists with [ ] and [x]
+    "html_image",  # Enables HTML image tags
]
myst_heading_anchors = 5  # Generates anchor links for headings up to level 5
+myst_auto_link_extensions = []  # Disable automatic link conversion
+myst_url_schemes = ["http", "https", "mailto"]  # Only convert these URL schemes

# -- Options for Autodoc2 ---------------------------------------------------
sys.path.insert(0, os.path.abspath(".."))
@@ -103,10 +106,17 @@
nitpicky = False
suppress_warnings = [
    "ref.python",  # Suppress ambiguous cross-reference warnings
+    "toc.not_included",  # Suppress toctree warnings for myst-based docs
+    "myst.header",  # Suppress header level warnings
+    "myst.directive_unknown",  # Suppress unknown directive warnings
+    "myst.xref_missing",  # Suppress missing cross-reference warnings
+    "ref.doc",  # Suppress document reference warnings
]

# GitHub links are now getting rate-limited from GitHub Actions
linkcheck_ignore = [
    ".*github\\.com.*",
    ".*githubusercontent\\.com.*",
+    ".*catalog\\.ngc\\.nvidia\\.com.*",  # Temporary: NGC catalog links that may not be publicly accessible
+    ".*platform\\.openai\\.com.*",  # To diagnose: OpenAI platform links that may require authentication
]
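The `linkcheck_ignore` entries above are regular expressions that Sphinx's linkcheck builder tests against each outgoing URL with `re.match`, so the two new patterns skip the NGC catalog and OpenAI platform hosts entirely. A quick, self-contained way to sanity-check a pattern before committing it (the URL below is illustrative):

```python
import re

# Patterns copied from the linkcheck_ignore list in this diff.
ignore = [r".*catalog\.ngc\.nvidia\.com.*", r".*platform\.openai\.com.*"]

# Illustrative URL; any NGC catalog link matches the same way.
url = "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl"
print(any(re.match(pattern, url) for pattern in ignore))  # True -> linkcheck skips it
```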

docs/nemo-evaluator-launcher/configuration/deployment/index.md

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ deployment:

## Configuration Files

-See all available deployment configurations: [Deployment Configs](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/tree/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment?ref_type=heads)
+See all available deployment configurations: [Deployment Configs](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment)

docs/nemo-evaluator-launcher/configuration/deployment/nim.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ NIM (NVIDIA Inference Microservices) provides optimized inference microservices

## Configuration

-See the complete configuration structure in the [NIM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml?ref_type=heads).
+See the complete configuration structure in the [NIM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml).

## Key Settings

@@ -17,10 +17,10 @@ Tips:
- You do not need to adjust params like tensor/data parallelism; NIM should pick the best setup based on your hardware.

Examples:
-- [Lepton NIM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_nim_llama_3_1_8b_instruct.yaml?ref_type=heads) - NIM deployment on Lepton platform
+- [Lepton NIM Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_nim_llama_3_1_8b_instruct.yaml) - NIM deployment on Lepton platform

## Reference

- [NIM Documentation](https://docs.nvidia.com/nim/)
- [NIM Deployment Guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html#)
-- [NIM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml?ref_type=heads)
+- [NIM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/none.md

Lines changed: 6 additions & 6 deletions
@@ -27,11 +27,11 @@ target:
- If your model does not require an API key, you can skip the `api_key` field entirely

Examples:
-- [Local None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_llama_3_1_8b_instruct.yaml?ref_type=heads) - Local evaluation with existing endpoint
-- [Lepton None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_none_llama_3_1_8b_instruct.yaml?ref_type=heads) - Lepton evaluation with existing endpoint
-- [Slurm None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_no_deployment_llama_3_1_8b_instruct.yaml?ref_type=heads) - Slurm evaluation with existing endpoint
-- [Local with Metadata](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_with_user_provided_metadata.yaml?ref_type=heads) - Local evaluation with custom metadata
-- [Auto Export Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_auto_export_llama_3_1_8b_instruct.yaml?ref_type=heads) - Local evaluation with automatic result export
+- [Local None Example](../../../../packages/nemo-evaluator-launcher/examples/local_llama_3_1_8b_instruct.yaml) - Local evaluation with existing endpoint
+- [Lepton None Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_none_llama_3_1_8b_instruct.yaml) - Lepton evaluation with existing endpoint
+- [Slurm None Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_no_deployment_llama_3_1_8b_instruct.yaml) - Slurm evaluation with existing endpoint
+- [Local with Metadata](../../../../packages/nemo-evaluator-launcher/examples/local_with_user_provided_metadata.yaml) - Local evaluation with custom metadata
+- [Auto Export Example](../../../../packages/nemo-evaluator-launcher/examples/local_auto_export_llama_3_1_8b_instruct.yaml) - Local evaluation with automatic result export

## Use Cases

@@ -43,4 +43,4 @@ This deployment option is useful when:

## Reference

-- [None Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/none.yaml?ref_type=heads)
+- [None Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/none.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/sglang.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ SGLang is a fast serving framework for large language models and vision language

## Configuration

-See the complete configuration structure in the [SGLang Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml?ref_type=heads).
+See the complete configuration structure in the [SGLang Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml).

## Key Settings

@@ -30,4 +30,4 @@ Tips:
## Reference

- [SGLang Documentation](https://docs.sglang.ai/)
-- [SGLang Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml?ref_type=heads)
+- [SGLang Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/vllm.md

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ vLLM is a fast and easy-to-use library for LLM inference and serving.

## Configuration

-See the complete configuration structure in the [vLLM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml?ref_type=heads).
+See the complete configuration structure in the [vLLM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml).

## Key Settings

@@ -27,13 +27,13 @@ Tips:
- If `checkpoint_path` is provided instead, use that local path

Examples:
-- [Lepton vLLM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_vllm_llama_3_1_8b_instruct.yaml?ref_type=heads) - vLLM deployment on Lepton platform
-- [Slurm vLLM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_llama_3_1_8b_instruct.yaml?ref_type=heads) - vLLM deployment on Slurm cluster
-- [Slurm vLLM HF Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_llama_3_1_8b_instruct_hf.yaml?ref_type=heads) - vLLM with Hugging Face model
-- [Notebook API Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/notebooks/nv-eval-api.ipynb?ref_type=heads) - Python API usage with vLLM
+- [Lepton vLLM Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_vllm_llama_3_1_8b_instruct.yaml) - vLLM deployment on Lepton platform
+- [Slurm vLLM Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_llama_3_1_8b_instruct.yaml) - vLLM deployment on Slurm cluster
+- [Slurm vLLM HF Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_llama_3_1_8b_instruct_hf.yaml) - vLLM with Hugging Face model
+- [Notebook API Example](../../../../packages/nemo-evaluator-launcher/examples/notebooks/nv-eval-api.ipynb) - Python API usage with vLLM


## Reference

- [vLLM Documentation](https://docs.vllm.ai/en/latest/)
-- [vLLM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml?ref_type=heads)
+- [vLLM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml)
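All of the link rewrites in this commit follow one pattern: absolute `gitlab-master.nvidia.com` URLs, which are unreachable for external readers and trip the link checker, become repo-relative paths that climb from `docs/` into `packages/`. A small sketch for verifying that such a link resolves from its source page, using paths taken from this diff (run from a repository checkout root):

```python
from pathlib import Path

# Source doc and the relative link it now carries (both appear in this diff).
doc = Path("docs/nemo-evaluator-launcher/configuration/deployment/vllm.md")
link = ("../../../../packages/nemo-evaluator-launcher/src/"
        "nemo_evaluator_launcher/configs/deployment/vllm.yaml")

target = (doc.parent / link).resolve()
print(target, target.exists())  # expect the config path and True in a checkout
```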
