-
Notifications
You must be signed in to change notification settings - Fork 1.7k
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) #4656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
infra: [TRTLLM-5250] Add sanity check stage for ngc-release images (Build wheels for devel image) #4656
Conversation
3e98c7a
to
20bbacb
Compare
/bot run --stage-list "Build-Docker-Images" |
PR_Github #6754 [ run ] triggered by Bot |
PR_Github #6754 [ run ] completed with state |
85176ed
to
213ea3e
Compare
/bot run --stage-list "Build-Docker-Images" |
PR_Github #6865 [ run ] triggered by Bot |
PR_Github #6865 [ run ] completed with state |
PR_Github #6869 [ run ] triggered by Bot |
PR_Github #6869 [ run ] completed with state |
PR_Github #6872 [ run ] triggered by Bot |
PR_Github #6872 [ run ] completed with state |
PR_Github #6899 [ run ] triggered by Bot |
PR_Github #6899 [ run ] completed with state |
PR_Github #6904 [ run ] triggered by Bot |
PR_Github #6906 [ run ] triggered by Bot |
PR_Github #6904 [ run ] completed with state |
PR_Github #6906 [ run ] completed with state |
PR_Github #6911 [ run ] triggered by Bot |
PR_Github #6911 [ run ] completed with state |
PR_Github #6918 [ run ] triggered by Bot |
PR_Github #6918 [ run ] completed with state |
PR_Github #6919 [ run ] triggered by Bot |
PR_Github #6919 [ run ] completed with state |
PR_Github #6990 [ run ] triggered by Bot |
PR_Github #6990 [ run ] completed with state |
PR_Github #7006 [ run ] triggered by Bot |
PR_Github #7009 [ run ] triggered by Bot |
PR_Github #7006 [ run ] completed with state |
Signed-off-by: ZhanruiSunCh <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]>
Signed-off-by: ZhanruiSunCh <[email protected]>
3b37261
to
4584177
Compare
/bot run --extra-stage "Build-Docker-Images" --skip-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
jenkins/BuildDockerImage.groovy (1)
516-574
: Review artifact polling implementation for potential improvements.The polling logic is functional but could benefit from some refinements:
- Magic numbers: The 60-minute timeout and 60-second polling interval are hardcoded
- Error handling: The wget command uses basic error handling but could be more robust
- Resource efficiency: The polling approach is reasonable but consider if there are webhook alternatives
Consider extracting the timeout and polling interval as parameters:
- def maxWaitMinutes = 60 - def pollIntervalSeconds = 60 + def maxWaitMinutes = params.artifactWaitTimeoutMinutes ?: 60 + def pollIntervalSeconds = params.artifactPollIntervalSeconds ?: 60Additionally, consider adding more detailed logging for debugging:
+ echo "Checking ${fileUrl}..." def exitCode = sh( script: "wget --spider --quiet --timeout=30 --tries=1 '${fileUrl}'", returnStatus: true ) + echo "Exit code: ${exitCode}"
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
jenkins/BuildDockerImage.groovy
(8 hunks)jenkins/L0_MergeRequest.groovy
(2 hunks)jenkins/L0_Test.groovy
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- jenkins/L0_MergeRequest.groovy
- jenkins/L0_Test.groovy
🧰 Additional context used
🧠 Learnings (1)
jenkins/BuildDockerImage.groovy (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (9)
jenkins/BuildDockerImage.groovy (9)
15-15
: LGTM: Environment variable initialization follows established pattern.The ARTIFACT_PATH variable initialization follows the same pattern as UPLOAD_PATH, providing a fallback value when the environment variable is not set.
29-29
: LGTM: Boolean flag initialization is appropriate.The RUN_SANITY_CHECK flag provides a clean way to conditionally enable the new sanity check functionality.
43-50
: LGTM: Global variable declaration aligns with pipeline coordination needs.The addition of IMAGE_KEY_TO_TAG field and its initialization in globalVars follows the established pattern and supports the coordination with test pipelines mentioned in the AI summary.
212-217
: LGTM: Conditional logic for post-merge builds is correctly implemented.The logic correctly gates the NGC artifact usage to post-merge builds only, which prevents potential issues with development builds that might not have the required NGC artifacts available.
271-273
: LGTM: imageKeyToTag assignment moved to appropriate location.Moving the imageKeyToTag assignment inside the build step ensures that the mapping is only created when the build actually succeeds, which aligns with the past review comment about ensuring imageKeyToTag only contains successful builds.
298-300
: LGTM: Consistent with the devel image assignment pattern.The imageKeyToTag assignment for the NGC release image follows the same pattern as the devel image, ensuring consistency in the tagging logic.
440-448
: LGTM: Common parameters function reduces code duplication.The getCommonParameters function appropriately centralizes the parameter mapping that's used for triggering downstream jobs, following DRY principles.
575-608
: LGTM: Sanity check stage implementation follows Jenkins best practices.The stage correctly:
- Uses conditional execution with
when
expression- Updates globalVars with imageKeyToTag before triggering downstream job
- Properly handles job parameters and propagation
- Includes appropriate error handling for failed downstream jobs
- Uses descriptive logging messages
609-609
: LGTM: Stage name updated to reflect security focus.The stage name change from generic "Register NGC Images" to "Register NGC Images for Security Checks" better describes the purpose of this stage.
PR_Github #12309 [ run ] triggered by Bot |
PR_Github #12309 [ run ] completed with state |
/bot skip --comment "Sanity check code passed, failed due to flaky network issue on NSPECT check" |
/bot skip --comment "Sanity check code passed, failed due to flaky network issue on NSPECT check" |
PR_Github #12429 [ skip ] triggered by Bot |
PR_Github #12424 [ skip ] triggered by Bot |
PR_Github #12429 [ skip ] completed with state |
PR_Github #12424 [ skip ] completed with state |
…uild wheels for devel image) (NVIDIA#4656) Signed-off-by: ZhanruiSunCh <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]> Co-authored-by: Yanchao Lu <[email protected]>
…uild wheels for devel image) (NVIDIA#4656) Signed-off-by: ZhanruiSunCh <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]> Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
…uild wheels for devel image) (NVIDIA#4656) Signed-off-by: ZhanruiSunCh <[email protected]> Signed-off-by: Zhanrui Sun <[email protected]> Co-authored-by: Yanchao Lu <[email protected]> Signed-off-by: Ransiki Zhang <[email protected]>
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Enhancements