@t0mmylam
Collaborator

Summary

Adds CLI commands to control Skyhook lifecycle via annotations:

  • skyhook pause <name> - Stops processing new nodes (sets skyhook.nvidia.com/pause)
  • skyhook resume <name> - Resumes a paused Skyhook
  • skyhook disable <name> - Completely disables a Skyhook (sets skyhook.nvidia.com/disable)
  • skyhook enable <name> - Re-enables a disabled Skyhook
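The four commands pair up on two annotations: pause/resume toggle skyhook.nvidia.com/pause, and disable/enable toggle skyhook.nvidia.com/disable. Below is a minimal sketch of that mapping. It assumes each command is sent as a JSON merge patch (RFC 7386) on the Skyhook's metadata, that the "set" value is the string "true", and that resume/enable remove the key rather than set it to "false". None of these three details is stated in the PR description, and `lifecyclePatch` is a hypothetical helper, not the code in operator/internal/cli/.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation keys named in the PR description.
const (
	pauseAnnotation   = "skyhook.nvidia.com/pause"
	disableAnnotation = "skyhook.nvidia.com/disable"
)

// lifecyclePatch returns a JSON merge patch body for one CLI command.
// pause/disable set the annotation (value "true" is an assumption);
// resume/enable send null, which a merge patch treats as "delete this key".
func lifecyclePatch(command string) ([]byte, error) {
	var key string
	var value any
	switch command {
	case "pause":
		key, value = pauseAnnotation, "true"
	case "resume":
		key, value = pauseAnnotation, nil
	case "disable":
		key, value = disableAnnotation, "true"
	case "enable":
		key, value = disableAnnotation, nil
	default:
		return nil, fmt.Errorf("unknown lifecycle command %q", command)
	}
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]any{key: value},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// Print the patch body each command would send.
	for _, cmd := range []string{"pause", "resume", "disable", "enable"} {
		body, err := lifecyclePatch(cmd)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %s\n", cmd, body)
	}
}
```

Using a merge patch means the CLI only sends the one annotation it changes, so it cannot overwrite unrelated metadata on the Skyhook.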

Changes

  • New commands in operator/internal/cli/
  • Shared annotation helpers in utils/utils.go
  • Unit tests for all commands
  • Chainsaw e2e tests in k8s-tests/chainsaw/cli/lifecycle/

Comment on lines +38 to +50
- script:
    timeout: 30s
    content: |
      echo "Waiting for Skyhook to be created..."
      for i in $(seq 1 15); do
        if kubectl get skyhook cli-lifecycle-test 2>/dev/null; then
          echo "Skyhook created"
          exit 0
        fi
        sleep 2
      done
      echo "Timeout waiting for Skyhook"
      exit 1
Collaborator

Comment on lines 38 to 52
- script:
    timeout: 60s
    content: |
      NODE=$(kubectl get nodes -l skyhook.nvidia.com/test-node=skyhooke2e -o jsonpath='{.items[0].metadata.name}')
      echo "Waiting for Skyhook activity on node $NODE..."
      for i in $(seq 1 30); do
        ANNOTATION=$(kubectl get node "$NODE" -o jsonpath='{.metadata.annotations.skyhook\.nvidia\.com/nodeState_cli-node-test}' 2>/dev/null)
        if [ -n "$ANNOTATION" ]; then
          echo "Skyhook activity detected"
          exit 0
        fi
        sleep 2
      done
      echo "Timeout waiting for activity"
      exit 1
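The activity check reads a per-Skyhook node annotation, skyhook.nvidia.com/nodeState_<skyhook-name>. The dots are escaped (`skyhook\.nvidia\.com`) only because kubectl's JSONPath treats `.` as a path separator; in Go the key is a plain map key. A minimal sketch of the same test, where `nodeStateKey` and `hasSkyhookActivity` are hypothetical helper names and the annotation value is a placeholder:

```go
package main

import "fmt"

// nodeStateKey builds the node annotation key the e2e scripts query,
// e.g. "skyhook.nvidia.com/nodeState_cli-node-test".
func nodeStateKey(skyhookName string) string {
	return "skyhook.nvidia.com/nodeState_" + skyhookName
}

// hasSkyhookActivity mirrors the script's `[ -n "$ANNOTATION" ]` test:
// the annotation must exist and be non-empty. Reading a missing key
// (or from a nil map) yields "", which counts as no activity.
func hasSkyhookActivity(annotations map[string]string, skyhookName string) bool {
	return annotations[nodeStateKey(skyhookName)] != ""
}

func main() {
	annotations := map[string]string{
		nodeStateKey("cli-node-test"): "placeholder-state", // hypothetical value
	}
	fmt.Println(hasSkyhookActivity(annotations, "cli-node-test"))
	fmt.Println(hasSkyhookActivity(annotations, "cli-package-test"))
}
```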
Collaborator

same as before

Comment on lines 37 to 51
- script:
    timeout: 60s
    content: |
      NODE=$(kubectl get nodes -l skyhook.nvidia.com/test-node=skyhooke2e -o jsonpath='{.items[0].metadata.name}')
      echo "Waiting for Skyhook activity on node $NODE..."
      for i in $(seq 1 30); do
        ANNOTATION=$(kubectl get node "$NODE" -o jsonpath='{.metadata.annotations.skyhook\.nvidia\.com/nodeState_cli-package-test}' 2>/dev/null)
        if [ -n "$ANNOTATION" ]; then
          echo "Skyhook activity detected"
          exit 0
        fi
        sleep 2
      done
      echo "Timeout waiting for activity"
      exit 1
Collaborator

same

delete(node.Labels, v1alpha1.NodeIgnoreLabel)
}

_, err := kubeClient.Kubernetes().CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
Collaborator

I think we should do a patch for this change.

// Remove the Skyhook annotation
delete(node.Annotations, annotationKey)

_, err := kubeClient.Kubernetes().CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
Collaborator

We should do a merge patch for this as well; a merge patch is more efficient.
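Both review comments point the same way: `Update` sends the whole Node object back, and it can fail with a 409 Conflict if the node's resourceVersion changed since it was read, while a merge patch carries only the key being removed. A minimal sketch of building such a patch body, assuming a JSON merge patch where `null` deletes the key; `removeMetadataKeyPatch` is a hypothetical helper, and the keys in `main` are placeholders because the excerpt does not show the values of `v1alpha1.NodeIgnoreLabel` or `annotationKey`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// removeMetadataKeyPatch builds a JSON merge patch that deletes one key from
// metadata.labels or metadata.annotations. A null value means "remove this
// key"; everything else on the Node is left untouched by the server.
func removeMetadataKeyPatch(field, key string) ([]byte, error) {
	if field != "labels" && field != "annotations" {
		return nil, fmt.Errorf("unsupported metadata field %q", field)
	}
	return json.Marshal(map[string]any{
		"metadata": map[string]any{
			field: map[string]any{key: nil},
		},
	})
}

func main() {
	// Placeholder keys for illustration only.
	cases := [][2]string{
		{"labels", "example.com/ignore-label"},
		{"annotations", "example.com/skyhook-annotation"},
	}
	for _, tc := range cases {
		body, err := removeMetadataKeyPatch(tc[0], tc[1])
		if err != nil {
			panic(err)
		}
		// With client-go the body would be sent roughly as:
		// kubeClient.Kubernetes().CoreV1().Nodes().Patch(ctx, node.Name,
		//     types.MergePatchType, body, metav1.PatchOptions{})
		fmt.Println(string(body))
	}
}
```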

@t0mmylam t0mmylam changed the base branch from main to cli-tests December 16, 2025 00:17
@t0mmylam t0mmylam merged commit ac9018f into cli-tests Dec 16, 2025
1 check passed
@t0mmylam t0mmylam deleted the lifecycle-cmds branch December 16, 2025 23:13
t0mmylam added a commit that referenced this pull request Dec 17, 2025
* feat(cli): Add node management commands and ignore label support

* feat: Consolidate CLI e2e tests with proper assertions and CI integration

* feat(cli): Add lifecycle management commands (pause, resume, disable, enable) (#127)

* feat(cli): Add lifecycle management commands (pause, resume, disable, enable)

* update k8s tests

* change to patch

Labels

None yet

Projects

None yet


3 participants