Merged
6 changes: 5 additions & 1 deletion .github/workflows/ci.yaml
@@ -27,7 +27,11 @@ jobs:

golang:
uses: ./.github/workflows/golang.yaml


docs-check:
uses: ./.github/workflows/docs_check.yaml
secrets: inherit

image:
uses: ./.github/workflows/image.yaml
needs: [golang, code-scanning]
38 changes: 38 additions & 0 deletions .github/workflows/docs_check.yaml
@@ -0,0 +1,38 @@
## Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.##
## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at
##
## http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.
##

name: Docs

on:
workflow_call:

jobs:
lint:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.5'

- name: Install mdl
run: gem install mdl -v 0.13.0

- name: Run Markdown lint
run: |
find docs/ -path docs/vendor -prune -false -o -name '*.md' | xargs mdl -s docs/mdl-style.rb
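Note on the `find` expression above: `-path docs/vendor -prune -false` stops `find` from descending into `docs/vendor` and keeps that directory itself out of the output, while `-o -name '*.md'` selects Markdown files everywhere else. A rough Go sketch of the same selection logic follows, for readers who want to reproduce it outside the shell. The helper names and the demo file names are hypothetical, and unlike `find` this version only reports non-directory entries.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// markdownFiles walks root and returns every non-directory entry whose name
// ends in ".md". The skip directory is pruned, so nothing below it is visited.
func markdownFiles(root, skip string) ([]string, error) {
	var files []string
	skip = filepath.Clean(skip)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if filepath.Clean(path) == skip {
				return filepath.SkipDir // same effect as find's -prune
			}
			return nil
		}
		if strings.HasSuffix(d.Name(), ".md") {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// demoTree builds a throwaway docs/ tree (hypothetical file names) and returns
// the paths markdownFiles selects, relative to the temporary directory.
func demoTree() ([]string, error) {
	tmp, err := os.MkdirTemp("", "mdlint-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(tmp)

	for _, name := range []string{"docs/a.md", "docs/guides/b.md", "docs/vendor/c.md", "docs/notes.txt"} {
		full := filepath.Join(tmp, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, []byte("# demo\n"), 0o644); err != nil {
			return nil, err
		}
	}

	files, err := markdownFiles(filepath.Join(tmp, "docs"), filepath.Join(tmp, "docs", "vendor"))
	if err != nil {
		return nil, err
	}
	for i, f := range files {
		rel, err := filepath.Rel(tmp, f)
		if err != nil {
			return nil, err
		}
		files[i] = filepath.ToSlash(rel)
	}
	return files, nil
}

func main() {
	files, err := demoTree()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(files) // [docs/a.md docs/guides/b.md]
}
```

The file under `docs/vendor` and the non-Markdown file are both left out, which is the set the workflow step hands to `mdl`.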
11 changes: 10 additions & 1 deletion Makefile
@@ -8,8 +8,9 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
.PHONY: build fmt verify release lint vendor mod-tidy mod-vendor mod-verify check-vendor
.PHONY: build fmt verify release lint vendor mod-tidy mod-vendor mod-verify check-vendor mdlint

CONTAINER_RUN_CMD ?= docker run
GO_CMD ?= go
GO_FMT ?= gofmt
GO_SRC := $(shell find . -type f -name '*.go' -not -path "./vendor/*")
@@ -88,6 +89,14 @@ coverage: test
cat $(COVERAGE_FILE) | grep -v "_mock.go" > $(COVERAGE_FILE).no-mocks
go tool cover -func=$(COVERAGE_FILE).no-mocks

mdlint:
${CONTAINER_RUN_CMD} \
--rm \
--volume "${PWD}:/workdir:ro,z" \
--workdir /workdir \
ruby:slim \
/workdir/scripts/mdlint.sh

release:
@rm -rf bin
@mkdir -p bin
204 changes: 70 additions & 134 deletions README.md
@@ -2,170 +2,106 @@

> * Tech preview, under heavy development *

A tool for creating and managing GPU ready Cloud test environments.
A tool for creating and managing GPU-ready Cloud test environments.

## Installation
---

## 📖 Documentation

- [Quick Start](docs/quick-start.md)
- [Prerequisites](docs/prerequisites.md)
- [Commands Reference](docs/commands/)
- [Contributing Guide](docs/contributing/)
- [Examples](docs/examples/)

---

## 🚀 Quick Start

See [docs/quick-start.md](docs/quick-start.md) for a full walkthrough.

```bash
make build
mv ./bin/holodeck /usr/local/bin/holodeck
sudo mv ./bin/holodeck /usr/local/bin/holodeck
holodeck --help
```

### Prerequisites
---

If utilizing the AWS provider, a valid AWS credentials must be available in the environment.
## 🛠️ Prerequisites

```yaml
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
metadata:
name: holodeck
description: "Devel infra environment"
spec:
provider: aws
```
- Go 1.20+
- (For AWS) Valid AWS credentials in your environment
- (For SSH) Reachable host and valid SSH key

If utilizing the SSH provider, a valid SSH key must and reachable host must be available in the environment file.

```yaml
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
metadata:
name: holodeck
description: "Devel infra environment"
spec:
provider: aws
auth:
keyName: user
privateKey: "/Users/user/.ssh/user.pem"
instance:
hostUrl: "<some-reachable-host-ip>"
```
See [docs/prerequisites.md](docs/prerequisites.md) for details.

---

## 📝 How to Contribute

See [docs/contributing/](docs/contributing/) for full details.

### Main Makefile Targets

- `make build` – Build the holodeck binary
- `make test` – Run all tests
- `make lint` – Run linters
- `make clean` – Remove build artifacts

---

## Usage
## 🧑‍💻 Usage

See [docs/commands/](docs/commands/) for detailed command documentation and examples.

```bash
holodeck --help
```

### The Environment CRD

```yaml
apiVersion: holodeck.nvidia.com/v1alpha1
kind: Environment
metadata:
name: holodeck
description: "Devel infra environment"
spec:
provider: aws # or ssh currently supported
auth:
keyName: user
privateKey: "/Users/user/.ssh/user.pem"
instance: # if provider is ssh you need to define here the hostUrl
type: g4dn.xlarge
region: eu-north-1
ingressIpRanges:
- 192.168.1.0/26
image:
architecture: amd64
imageId: ami-0fe8bec493a81c7da # Ubuntu 22.04 image
containerRuntime:
install: true
name: containerd
version: 1.6.24
kubernetes:
install: true
installer: kubeadm # supported installers: kubeadm, kind, microk8s
version: v1.28.5
```

The dependencies are resolved automatically, from top to bottom. Following the
pattern:
### Example: Create an environment

> Kubernetes -> Container Runtime -> Container Toolkit -> NVDriver
```bash
holodeck create -f ./examples/v1alpha1_environment.yaml
```

If Kubernetes is requested, and no container runtime is requested, a default
container runtime will be added to the environment..
### Example: List environments

If Container Toolkit is requested, and no container runtime is requested, a
default container runtime will be added to the environment.
```bash
holodeck list
```

### Create an environment
### Example: Delete an environment

```bash
$ holodeck create -f ./examples/v1alpha1_environment.yaml
...
holodeck delete <instance-id>
```

### Delete an environment
### Example: Check status

```bash
$ holodeck delete -f ./examples/v1alpha1_environment.yaml
...
holodeck status <instance-id>
```

### Dry Run
### Example: Dry Run

```bash
$ holodeck dryrun -f ./examples/v1alpha1_environment.yaml
Dryrun environment holodeck 🔍
✔ Checking if instance type g4dn.xlarge is supported in region eu-north-1
✔ Checking if image ami-0fe8bec493a81c7da is supported in region eu-north-1
✔ Resolving dependencies 📦
Dryrun succeeded 🎉
holodeck dryrun -f ./examples/v1alpha1_environment.yaml
```

## Supported Cuda-Drivers
---

Supported Nvidia drivers are:
## 📦 Supported Cuda-Drivers

```yaml
nvidiaDriver:
install: true
version: <version>
```
Where `<version>` can be a prefix of any package version. The following are example package versions:

- 570.86.15-0ubuntu1
- 570.86.10-0ubuntu1
- 565.57.01-0ubuntu1
- 560.35.05-0ubuntu1
- 560.35.03-1
- 560.28.03-1
- 555.42.06-1
- 555.42.02-1
- 550.144.03-0ubuntu1
- 550.127.08-0ubuntu1
- 550.127.05-0ubuntu1
- 550.90.12-0ubuntu1
- 550.90.07-1
- 550.54.15-1
- 550.54.14-1
- 545.23.08-1
- 545.23.06-1
- 535.230.02-0ubuntu1
- 535.216.03-0ubuntu1
- 535.216.01-0ubuntu1
- 535.183.06-1
- 535.183.01-1
- 535.161.08-1
- 535.161.07-1
- 535.154.05-1
- 535.129.03-1
- 535.104.12-1
- 535.104.05-1
- 535.86.10-1
- 535.54.03-1
- 530.30.02-1
- 525.147.05-1
- 525.125.06-1
- 525.105.17-1
- 525.85.12-1
- 525.60.13-1
- 520.61.05-1
- 515.105.01-1
- 515.86.01-1
- 515.65.07-1
- 515.65.01-1
- 515.48.07-1
- 515.43.04-1
See [docs/prerequisites.md](docs/prerequisites.md#supported-cuda-drivers) for the full list and usage.

---

## 📂 More

- [Examples](docs/examples/)
- [Guides](docs/guides/)

---

For more information, see the [docs/](docs/) directory.
2 changes: 1 addition & 1 deletion cmd/cli/create/create.go
@@ -171,7 +171,7 @@ func (m command) run(c *cli.Context, opts *options) error {
}
}

m.log.Info("Created instance %s", instanceID)
m.log.Info("\nCreated instance %s", instanceID)
return nil
}

39 changes: 19 additions & 20 deletions cmd/cli/delete/delete.go
@@ -44,7 +44,7 @@ func (m command) build() *cli.Command {
// Create the 'delete' command
delete := cli.Command{
Name: "delete",
Usage: "Delete a Holodeck instance",
Usage: "Delete one or more Holodeck instances",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "cachepath",
@@ -53,37 +53,36 @@ func (m command) build() *cli.Command {
Destination: &m.cachePath,
Value: filepath.Join(os.Getenv("HOME"), ".cache", "holodeck"),
},
&cli.StringFlag{
Name: "instance-id",
Aliases: []string{"i"},
Usage: "Instance ID to delete",
},
},

Action: func(c *cli.Context) error {
// Delete using instance ID
instanceID := c.String("instance-id")
return m.run(c, instanceID)
if c.NArg() == 0 {
return fmt.Errorf("at least one instance ID is required")
}
return m.run(c)
},
}

return &delete
}

func (m command) run(c *cli.Context, instanceID string) error {
func (m command) run(c *cli.Context) error {
manager := instances.NewManager(m.log, m.cachePath)

// First check if the instance exists
instance, err := manager.GetInstance(instanceID)
if err != nil {
return fmt.Errorf("failed to get instance: %v", err)
}
// Process each instance ID provided as an argument
for _, instanceID := range c.Args().Slice() {
// First check if the instance exists
instance, err := manager.GetInstance(instanceID)
if err != nil {
return fmt.Errorf("failed to get instance %s: %v", instanceID, err)
}
Comment on lines +74 to +77
Collaborator


Problem: The code retrieves an instance but doesn't check if it exists before attempting to delete it. The error returned might be due to the instance not existing, which should be handled differently than other errors.

Suggested Change: Add specific handling for the case where the instance doesn't exist to provide better error messaging and avoid confusion.

Severity (1 - 4): 3 - MAJOR

Lines: 74-77

Suggested change
instance, err := manager.GetInstance(instanceID)
if err != nil {
return fmt.Errorf("failed to get instance %s: %v", instanceID, err)
}
instance, err := manager.GetInstance(instanceID)
if err != nil {
if os.IsNotExist(err) {
return fmt.Errorf("instance %s does not exist", instanceID)
}
return fmt.Errorf("failed to get instance %s: %v", instanceID, err)
}

Generated by Claude 3.5 Sonnet
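
A caveat on the suggested change: `os.IsNotExist` predates `errors.Is` and does not unwrap errors. If `GetInstance` wraps the underlying filesystem error (this diff does not show whether it does), the new branch would never fire. Below is a minimal sketch of the same idea using `errors.Is(err, fs.ErrNotExist)`. Here `getInstance`, `describe`, and the `<cachePath>/<id>.yaml` layout are hypothetical stand-ins, not the real `instances.Manager` API.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// getInstance is a hypothetical stand-in for manager.GetInstance. It assumes
// each instance is cached as <cachePath>/<id>.yaml, which the diff does not confirm.
func getInstance(cachePath, id string) error {
	if _, err := os.Stat(filepath.Join(cachePath, id+".yaml")); err != nil {
		// Wrapping with %w keeps the underlying error reachable via errors.Is.
		return fmt.Errorf("failed to read cache file: %w", err)
	}
	return nil
}

// describe maps a lookup error to the message the delete command would return.
func describe(id string, err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, fs.ErrNotExist):
		return fmt.Sprintf("instance %s does not exist", id)
	default:
		return fmt.Sprintf("failed to get instance %s: %v", id, err)
	}
}

func main() {
	dir, err := os.MkdirTemp("", "holodeck-cache")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	err = getInstance(dir, "i-missing")
	fmt.Println(os.IsNotExist(err))             // false: the wrapped error is not unwrapped
	fmt.Println(errors.Is(err, fs.ErrNotExist)) // true
	fmt.Println(describe("i-missing", err))     // instance i-missing does not exist
}
```

If `GetInstance` reports a missing instance through its own sentinel error rather than a filesystem error, the same `errors.Is` pattern applies with that sentinel in place of `fs.ErrNotExist`.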



// Delete the instance
if err := manager.DeleteInstance(instanceID); err != nil {
return fmt.Errorf("failed to delete instance %s: %v", instanceID, err)
}

// Delete the instance
if err := manager.DeleteInstance(instanceID); err != nil {
return fmt.Errorf("failed to delete instance: %v", err)
m.log.Info("Successfully deleted instance %s (%s)", instanceID, instance.Name)
}

m.log.Info("Successfully deleted instance %s (%s)", instanceID, instance.Name)
return nil
}
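
Note on the new loop in `run`: it returns on the first failing ID, so any IDs later on the command line are left untouched. The sketch below illustrates that behavior, assuming a hypothetical in-memory `fakeManager` in place of `instances.Manager`, whose real implementation is not part of this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeManager is a hypothetical in-memory stand-in for instances.Manager.
type fakeManager struct {
	instances map[string]string // instance ID -> environment name
}

// GetInstance returns the name of a known instance or an error.
func (m *fakeManager) GetInstance(id string) (string, error) {
	name, ok := m.instances[id]
	if !ok {
		return "", errors.New("not found")
	}
	return name, nil
}

// DeleteInstance removes the instance from the in-memory map.
func (m *fakeManager) DeleteInstance(id string) error {
	delete(m.instances, id)
	return nil
}

// deleteAll mirrors the loop in run: it stops at the first failure and
// reports which IDs were deleted before that point.
func deleteAll(m *fakeManager, ids []string) ([]string, error) {
	var deleted []string
	for _, id := range ids {
		if _, err := m.GetInstance(id); err != nil {
			return deleted, fmt.Errorf("failed to get instance %s: %v", id, err)
		}
		if err := m.DeleteInstance(id); err != nil {
			return deleted, fmt.Errorf("failed to delete instance %s: %v", id, err)
		}
		deleted = append(deleted, id)
	}
	return deleted, nil
}

func main() {
	m := &fakeManager{instances: map[string]string{"a": "env-a", "c": "env-c"}}
	deleted, err := deleteAll(m, []string{"a", "b", "c"})
	fmt.Println(deleted, err) // [a] failed to get instance b: not found
}
```

Instance `c` is never reached here. Whether `holodeck delete` should stop at the first error or continue and report all failures is a design decision this PR leaves implicit.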