@adelbertc
Contributor

@adelbertc adelbertc commented Nov 24, 2025

Description

apk add may fail for reasons such as transient upstream unavailability. Prior to this change, a failed install left the init container stuck in an infinite loop trying to invoke a curl binary that was never installed. This change takes the simplest approach: fail when the install fails and let K8s handle restarting the init container.
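
The fail-fast behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Quick Start script: the function name run_init, the way the install and polling commands are passed in, and the one-second sleep are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a fail-fast init step.
# run_init INSTALL_CMD POLL_CMD
#   INSTALL_CMD: command that installs the tools (e.g. an apk add invocation)
#   POLL_CMD:    command that succeeds once the token exists
# Both arguments are stand-ins; the real script may be structured differently.
run_init() {
    install_cmd=$1
    poll_cmd=$2

    # If the install fails, stop here with a non-zero status instead of
    # falling through to a polling loop that depends on the missing tools.
    if ! $install_cmd; then
        echo "package install failed, exiting so the container can be restarted" >&2
        return 1
    fi

    # Only reached when the install succeeded.
    until $poll_cmd; do
        echo "Token not found, waiting..."
        sleep 1
    done
}
```

In an init container this might be invoked as `run_init "apk add --no-cache curl jq" check_token || exit 1`, where check_token is a hypothetical function wrapping the curl call. The non-zero exit is what lets Kubernetes restart the container rather than leaving it looping.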

Alternatives considered were adding a retry loop to the command or switching to an image that ships with curl and jq, such as netshoot, but for a Quick Start local deployment we decided that simplicity wins.
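
For comparison, the retry-loop alternative that was not adopted might look something like the sketch below. The helper name retry, its argument order, and the attempt and delay values in the usage line are assumptions for illustration only; nothing like this is part of the change.

```shell
#!/bin/sh
# Hypothetical retry helper, shown only to illustrate the rejected alternative.
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2

    attempt=1
    # Re-run the command until it succeeds or the attempt budget is spent.
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A call such as `retry 5 2 apk add --no-cache curl jq || exit 1` would retry the install in-process. It still needs the final non-zero exit for the case where every attempt fails, which is part of why failing fast and leaning on the Kubernetes restart policy is the simpler option.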

Before (simulated failure):

fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/aarch64/APKINDEX.tar.gz
ERROR: unable to select packages:
  curl3 (no such package):
    required by: world[curl3]
Waiting for backend-operator-token to be created...
sh: curl: not found
Token backend-operator-token not found, waiting...
Token backend-operator-token not found, waiting...
sh: curl: not found
sh: curl: not found
Token backend-operator-token not found, waiting...
sh: curl: not found
Token backend-operator-token not found, waiting...
sh: curl: not found
...

After:

fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/aarch64/APKINDEX.tar.gz
ERROR: unable to select packages:
  curl3 (no such package):
    required by: world[curl3]

and

$ k get po --watch
NAME   READY   STATUS              RESTARTS     AGE
test   0/1     ContainerCreating   0            4s
test   1/1     Running             0            5s
test   0/1     Error               0            6s
test   0/1     Error               1 (2s ago)   7s
test   0/1     CrashLoopBackOff    1 (1s ago)   8s

Issue #79

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@adelbertc adelbertc requested a review from a team November 24, 2025 03:44
@adelbertc adelbertc enabled auto-merge (squash) November 24, 2025 03:48
Contributor

@elookpotts-nvidia elookpotts-nvidia left a comment

Good call, thanks for fixing this!

@adelbertc adelbertc merged commit 5e5e6e1 into main Nov 24, 2025
7 checks passed
@adelbertc adelbertc deleted the adelbertc/quick-start-fail-fast branch November 24, 2025 15:40
adelbertc added a commit that referenced this pull request Nov 26, 2025
…tiversion links (#93)

* Add prerequisite for Git LFS to develop OSMO (#64)

* Remove trigger response (#65)

* Fix readme docs link (#68)

* Enforce PR description in CI (#69)

* Enforce PR description in CI

* Add pull_request edit event

* Move to its own file so checks don't run for every edit

* formatting

* add reopened event

* Ethany/isaac lab examples (#66)

* Update Isaac Sim Workflow

* Add Isaac Lab Multi GPU/Node Samples

* #71 - Add a root documentation page (#72)

* #71 - Add a root documentation page

* simplify subdir overrides

* fix font

* restore fallback fonts

* fix deployment options card height

* reorganize user guide landing

* Make the SVG on the docs root page smaller (#74)

* Adds multiversion support for docs (#40)

* Use sphinx-multiversion so multiple versions of the docs are built

---------

Co-authored-by: Fernando Luo <[email protected]>

* docs: fix doc links after multiversion support (#76)

* #78 - docs: temp fix for multiversion links (#80)

* docs: fix broken user_guide multiversion links

* Revert img src change

Co-authored-by: RyaliNvidia <[email protected]>

---------

Co-authored-by: RyaliNvidia <[email protected]>

* quick-start: make apk fail fast (#84)

apk add may fail due to things like transient upstream unavailability.
Prior to this change we would get stuck in an infinite loop of trying
to invoke curl. This change makes the simplest change to make it fail
and let K8s handle restarting the init container.

* Make docs root page use a grid (#85)

* Make docs root page use a grid

* remove main branch specifier

* Remove unused file from root docs glob (#87)

* Add ci-docs step to run bazel tests on docs/ changes (#88)

* Add ci-docs step to run bazel tests on docs/ changes

* only run build - not test

* Resolve broken/redirect links in readme and documentation (#89)

* #82 - Add link checking to documentation (#83)

* Add link checking to documentation

* Verified check

* Fix artifacts upload

* Resolve issue

* update workflows logic

* docs: unify sphinx build, fix multiversion links

* Unify Sphinx build to build as one project instead
  of separate User Guide and Deployment Guide builds
* Change hard-coded URLs to use Sphinx roles so
  the correct multiversion link can be interpolated
  at build time

The goal of this change is to make the links in our docs
version-aware after moving to multiversion docs in
#40. Normally the
way to do this is to reference roles like `:doc:` and
`:ref:` instead of hard-coding URLs, but since we have
cross-guide links we also have to unify the Sphinx build
to make the builds aware of roles across all our docs.

* fix hard-coded links

* prepend main to some other hard-coded links

* fix missing sidebar

* cleanup multibuild state

* fix sidebar toc by overriding nvidia-sphinx-theme

* only linkcheck root now

* add note in sidebar-nav-bs

* restore custom index css to remove sidebar

* fix broken images

* fix multiversion build

* add copyright

* move usages from doc to ref

---------

Co-authored-by: Ethan Look-Potts <[email protected]>
Co-authored-by: Fernando L <[email protected]>
Co-authored-by: RyaliNvidia <[email protected]>
Co-authored-by: ethany-nv <[email protected]>

4 participants