Commit 7a419fd

Authored by wprazuch, marta-sd, fgalko-oss, and ko3n1g

(fix) Docs Sphinx build warnings + use 25.08.1 container version (#122)

Signed-off-by: Wojciech Prazuch <[email protected]>
Signed-off-by: Marta Stepniewska-Dziubinska <[email protected]>
Signed-off-by: fgalko <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Marta Stepniewska-Dziubinska <[email protected]>
Co-authored-by: fgalko <[email protected]>
Co-authored-by: oliver könig <[email protected]>

1 parent f8549b1 · commit 7a419fd

File tree

35 files changed (+284, -275 lines)

README.md

Lines changed: 18 additions & 18 deletions
@@ -20,7 +20,7 @@ The platform consists of two main components:

Most users only need to interact with `nemo-evaluator-launcher` as a universal gateway to different benchmarks and harnesses. It is, however, possible to interact directly with `nemo-evaluator` by following this [guide](./docs/nemo-evaluator/workflows/using-containers.md).

-```mermaid
+```{mermaid}
graph TD
A[User] --> B{NeMo Evaluator Launcher};
B -- " " --> C{Local};
@@ -104,23 +104,23 @@ NeMo Evaluator provides pre-built evaluation containers for different evaluation

| Container | Description | NGC Catalog | Latest Tag | Supported benchmarks |
|-----------|-------------|-------------|------------|----------------------|
-| **agentic_eval** | Agentic AI evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/agentic_eval) | `25.08.0` | Agentic Eval Topic Adherence, Agentic Eval Tool Call, Agentic Eval Goal and Answer Accuracy |
-| **bfcl** | Function calling | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl) | `25.08.0` | BFCL v2 and v3 |
-| **bigcode-evaluation-harness** | Code generation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bigcode-evaluation-harness) | `25.08.0` | MBPP, MBPP-Plus, HumanEval, HumanEval+, Multiple (cpp, cs, d, go, java, jl, js, lua, php, pl, py, r, rb, rkt, rs, scala, sh, swift, ts) |
-| **garak** | Safety and vulnerability testing | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/garak) | `25.08.0` | Garak |
-| **helm** | Holistic evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/helm) | `25.08.0` | MedHelm |
-| **hle** | Academic knowledge and problem solving | [Link]() | `25.08.0` | HLE |
-| **ifbench** | Instruction following | [Link]() | `25.08.0` | IFBench |
-| **livecodebench** | Coding | [Link]() | `25.08.0` | LiveCodeBench (v1-v6, 0724_0125, 0824_0225) |
-| **lm-evaluation-harness** | Language model benchmarks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/lm-evaluation-harness) | `25.08.0` | ARC Challenge (also multilingual), GSM8K, HumanEval, HumanEval+, MBPP, MINERVA MMMLU-Pro, RACE, TruthfulQA, AGIEval, BBH, BBQ, CSQA, Frames, Global MMMLU, GPQA-D, HellaSwag (also multilingual), IFEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-ProX (de, es, fr, it, ja), MMLU-Redux, MUSR, OpenbookQA, Piqa, Social IQa, TruthfulQA, WikiLingua, WinoGrande |
-| **mmath** | Multilingual math reasoning | [Link]() | `25.08.0` | EN, ZH, AR, ES, FR, JA, KO, PT, TH, VI |
-| **mtbench** | Multi-turn conversation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/mtbench) | `25.08.0` | MT-Bench |
-| **rag_retriever_eval** | RAG system evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/rag_retriever_eval) | `25.08.0` | RAG, Retriever |
-| **safety-harness** | Safety and bias evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/safety-harness) | `25.08.0` | Aegis v2, BBQ, WildGuard |
-| **scicode** | Coding for scientific research | [Link]() | `25.08.0` | SciCode |
-| **simple-evals** | Common evaluation tasks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/simple-evals) | `25.08.0` | GPQA-D, MATH-500, AIME 24 & 25, HumanEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-lite (AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, MY, PT, SW, YO, ZH), SimpleQA |
-| **tooltalk** | Tool usage evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/tooltalk) | `25.08.0` | ToolTalk |
-| **vlmevalkit** | Vision-language model evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/vlmevalkit) | `25.08.0` | AI2D, ChartQA, OCRBench, SlideVQA |
+| **agentic_eval** | Agentic AI evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/agentic_eval) | `25.08.1` | Agentic Eval Topic Adherence, Agentic Eval Tool Call, Agentic Eval Goal and Answer Accuracy |
+| **bfcl** | Function calling | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl) | `25.08.1` | BFCL v2 and v3 |
+| **bigcode-evaluation-harness** | Code generation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bigcode-evaluation-harness) | `25.08.1` | MBPP, MBPP-Plus, HumanEval, HumanEval+, Multiple (cpp, cs, d, go, java, jl, js, lua, php, pl, py, r, rb, rkt, rs, scala, sh, swift, ts) |
+| **garak** | Safety and vulnerability testing | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/garak) | `25.08.1` | Garak |
+| **helm** | Holistic evaluation framework | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/helm) | `25.08.1` | MedHelm |
+| **hle** | Academic knowledge and problem solving | Link: TBD | `25.08.1` | HLE |
+| **ifbench** | Instruction following | Link: TBD | `25.08.1` | IFBench |
+| **livecodebench** | Coding | Link: TBD | `25.08.1` | LiveCodeBench (v1-v6, 0724_0125, 0824_0225) |
+| **lm-evaluation-harness** | Language model benchmarks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/lm-evaluation-harness) | `25.08.1` | ARC Challenge (also multilingual), GSM8K, HumanEval, HumanEval+, MBPP, MINERVA MMMLU-Pro, RACE, TruthfulQA, AGIEval, BBH, BBQ, CSQA, Frames, Global MMMLU, GPQA-D, HellaSwag (also multilingual), IFEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-ProX (de, es, fr, it, ja), MMLU-Redux, MUSR, OpenbookQA, Piqa, Social IQa, TruthfulQA, WikiLingua, WinoGrande |
+| **mmath** | Multilingual math reasoning | Link: TBD | `25.08.1` | EN, ZH, AR, ES, FR, JA, KO, PT, TH, VI |
+| **mtbench** | Multi-turn conversation evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/mtbench) | `25.08.1` | MT-Bench |
+| **rag_retriever_eval** | RAG system evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/rag_retriever_eval) | `25.08.1` | RAG, Retriever |
+| **safety-harness** | Safety and bias evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/safety-harness) | `25.08.1` | Aegis v2, BBQ, WildGuard |
+| **scicode** | Coding for scientific research | Link: TBD | `25.08.1` | SciCode |
+| **simple-evals** | Common evaluation tasks | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/simple-evals) | `25.08.1` | GPQA-D, MATH-500, AIME 24 & 25, HumanEval, MGSM, MMMLU, MMMLU-Pro, MMMLU-lite (AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, MY, PT, SW, YO, ZH), SimpleQA |
+| **tooltalk** | Tool usage evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/tooltalk) | `25.08.1` | ToolTalk |
+| **vlmevalkit** | Vision-language model evaluation | [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/vlmevalkit) | `25.08.1` | AI2D, ChartQA, OCRBench, SlideVQA |


docs/conf.py

Lines changed: 10 additions & 0 deletions
@@ -52,8 +52,11 @@
    "deflist",  # Supports definition lists with term: definition format
    "fieldlist",  # Enables field lists for metadata like :author: Name
    "tasklist",  # Adds support for GitHub-style task lists with [ ] and [x]
+    "html_image",  # Enables HTML image tags
]
myst_heading_anchors = 5  # Generates anchor links for headings up to level 5
+myst_auto_link_extensions = []  # Disable automatic link conversion
+myst_url_schemes = ["http", "https", "mailto"]  # Only convert these URL schemes

# -- Options for Autodoc2 ---------------------------------------------------
sys.path.insert(0, os.path.abspath(".."))
@@ -103,10 +106,17 @@
nitpicky = False
suppress_warnings = [
    "ref.python",  # Suppress ambiguous cross-reference warnings
+    "toc.not_included",  # Suppress toctree warnings for myst-based docs
+    "myst.header",  # Suppress header level warnings
+    "myst.directive_unknown",  # Suppress unknown directive warnings
+    "myst.xref_missing",  # Suppress missing cross-reference warnings
+    "ref.doc",  # Suppress document reference warnings
]

# GitHub links are now getting rate-limited from GitHub Actions
linkcheck_ignore = [
    ".*github\\.com.*",
    ".*githubusercontent\\.com.*",
+    ".*catalog\\.ngc\\.nvidia\\.com.*",  # Temporary: NGC catalog links that may not be publicly accessible
+    ".*platform\\.openai\\.com.*",  # To diagnose: OpenAI platform links that may require authentication
]
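The `linkcheck_ignore` entries above are regular expressions that Sphinx's linkcheck builder tests against each outgoing URL with `re.match`, so the two new patterns skip the NGC catalog and OpenAI platform hosts entirely. A quick, self-contained way to sanity-check a pattern before committing it (the URL below is illustrative):

```python
import re

# Patterns copied from the linkcheck_ignore list in this diff.
ignore = [r".*catalog\.ngc\.nvidia\.com.*", r".*platform\.openai\.com.*"]

# Illustrative URL; any NGC catalog link matches the same way.
url = "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/eval-factory/containers/bfcl"
print(any(re.match(pattern, url) for pattern in ignore))  # True -> linkcheck skips it
```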

docs/nemo-evaluator-launcher/configuration/deployment/index.md

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ deployment:

## Configuration Files

-See all available deployment configurations: [Deployment Configs](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/tree/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment?ref_type=heads)
+See all available deployment configurations: [Deployment Configs](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment)

docs/nemo-evaluator-launcher/configuration/deployment/nim.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ NIM (NVIDIA Inference Microservices) provides optimized inference microservices

## Configuration

-See the complete configuration structure in the [NIM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml?ref_type=heads).
+See the complete configuration structure in the [NIM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml).

## Key Settings

@@ -17,10 +17,10 @@ Tips:
- You do not need to adjust params like tensor/data parallelism; NIM should pick the best setup based on your hardware.

Examples:
-- [Lepton NIM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_nim_llama_3_1_8b_instruct.yaml?ref_type=heads) - NIM deployment on Lepton platform
+- [Lepton NIM Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_nim_llama_3_1_8b_instruct.yaml) - NIM deployment on Lepton platform

## Reference

- [NIM Documentation](https://docs.nvidia.com/nim/)
- [NIM Deployment Guide](https://docs.nvidia.com/nim/large-language-models/latest/deployment-guide.html#)
-- [NIM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml?ref_type=heads)
+- [NIM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/nim.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/none.md

Lines changed: 6 additions & 6 deletions
@@ -27,11 +27,11 @@ target:
- If your model does not require an API key, you can skip the `api_key` field entirely

Examples:
-- [Local None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_llama_3_1_8b_instruct.yaml?ref_type=heads) - Local evaluation with existing endpoint
-- [Lepton None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_none_llama_3_1_8b_instruct.yaml?ref_type=heads) - Lepton evaluation with existing endpoint
-- [Slurm None Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_no_deployment_llama_3_1_8b_instruct.yaml?ref_type=heads) - Slurm evaluation with existing endpoint
-- [Local with Metadata](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_with_user_provided_metadata.yaml?ref_type=heads) - Local evaluation with custom metadata
-- [Auto Export Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/local_auto_export_llama_3_1_8b_instruct.yaml?ref_type=heads) - Local evaluation with automatic result export
+- [Local None Example](../../../../packages/nemo-evaluator-launcher/examples/local_llama_3_1_8b_instruct.yaml) - Local evaluation with existing endpoint
+- [Lepton None Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_none_llama_3_1_8b_instruct.yaml) - Lepton evaluation with existing endpoint
+- [Slurm None Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_no_deployment_llama_3_1_8b_instruct.yaml) - Slurm evaluation with existing endpoint
+- [Local with Metadata](../../../../packages/nemo-evaluator-launcher/examples/local_with_user_provided_metadata.yaml) - Local evaluation with custom metadata
+- [Auto Export Example](../../../../packages/nemo-evaluator-launcher/examples/local_auto_export_llama_3_1_8b_instruct.yaml) - Local evaluation with automatic result export

## Use Cases

@@ -43,4 +43,4 @@ This deployment option is useful when:

## Reference

-- [None Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/none.yaml?ref_type=heads)
+- [None Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/none.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/sglang.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ SGLang is a fast serving framework for large language models and vision language

## Configuration

-See the complete configuration structure in the [SGLang Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml?ref_type=heads).
+See the complete configuration structure in the [SGLang Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml).

## Key Settings

@@ -30,4 +30,4 @@ Tips:
## Reference

- [SGLang Documentation](https://docs.sglang.ai/)
-- [SGLang Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml?ref_type=heads)
+- [SGLang Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/sglang.yaml)

docs/nemo-evaluator-launcher/configuration/deployment/vllm.md

Lines changed: 6 additions & 6 deletions
@@ -4,7 +4,7 @@ vLLM is a fast and easy-to-use library for LLM inference and serving.

## Configuration

-See the complete configuration structure in the [vLLM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml?ref_type=heads).
+See the complete configuration structure in the [vLLM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml).

## Key Settings

@@ -27,13 +27,13 @@ Tips:
- If `checkpoint_path` is provided instead, use that local path

Examples:
-- [Lepton vLLM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/lepton_vllm_llama_3_1_8b_instruct.yaml?ref_type=heads) - vLLM deployment on Lepton platform
-- [Slurm vLLM Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_llama_3_1_8b_instruct.yaml?ref_type=heads) - vLLM deployment on Slurm cluster
-- [Slurm vLLM HF Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/slurm_llama_3_1_8b_instruct_hf.yaml?ref_type=heads) - vLLM with Hugging Face model
-- [Notebook API Example](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/examples/notebooks/nv-eval-api.ipynb?ref_type=heads) - Python API usage with vLLM
+- [Lepton vLLM Example](../../../../packages/nemo-evaluator-launcher/examples/lepton_vllm_llama_3_1_8b_instruct.yaml) - vLLM deployment on Lepton platform
+- [Slurm vLLM Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_llama_3_1_8b_instruct.yaml) - vLLM deployment on Slurm cluster
+- [Slurm vLLM HF Example](../../../../packages/nemo-evaluator-launcher/examples/slurm_llama_3_1_8b_instruct_hf.yaml) - vLLM with Hugging Face model
+- [Notebook API Example](../../../../packages/nemo-evaluator-launcher/examples/notebooks/nv-eval-api.ipynb) - Python API usage with vLLM


## Reference

- [vLLM Documentation](https://docs.vllm.ai/en/latest/)
-- [vLLM Config File](https://gitlab-master.nvidia.com/dl/JoC/competitive_evaluation/nv-eval-platform/-/blob/main/nemo_evaluator_launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml?ref_type=heads)
+- [vLLM Config File](../../../../packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/configs/deployment/vllm.yaml)
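All of the link rewrites in this commit follow one pattern: absolute `gitlab-master.nvidia.com` URLs, which are unreachable for external readers and trip the link checker, become repo-relative paths that climb from `docs/` into `packages/`. A small sketch for verifying that such a link resolves from its source page, using paths taken from this diff (run from a repository checkout root):

```python
from pathlib import Path

# Source doc and the relative link it now carries (both appear in this diff).
doc = Path("docs/nemo-evaluator-launcher/configuration/deployment/vllm.md")
link = ("../../../../packages/nemo-evaluator-launcher/src/"
        "nemo_evaluator_launcher/configs/deployment/vllm.yaml")

target = (doc.parent / link).resolve()
print(target, target.exists())  # expect the config path and True in a checkout
```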
