-
Notifications
You must be signed in to change notification settings - Fork 433
Add a Jenkins cloud GC job to ensure timely cloud resource deletion #7638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d763f2c to
b0e0e8c
Compare
| --aws-region us-west-2 | ||
| sudo ./ci/test-conformance-gke.sh --gc-cluster --gc-cluster-age-hours 3 \ | ||
| --gke-zone us-west1-a | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these region and zone always be the value? I saw that the region and zone can be overridden in the jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with Lan.
Would be possible to traverse through a list of regions and do cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. It should be possible to implement this feature by enhancing the jenkins builder, I will investigate and verify it.
| --aws-region us-west-2 | ||
| sudo ./ci/test-conformance-gke.sh --gc-cluster --gc-cluster-age-hours 3 \ | ||
| --gke-zone us-west1-a | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with Lan.
Would be possible to traverse through a list of regions and do cleanup?
|
|
||
| if [ -z "$creation_epoch" ]; then | ||
| echo "WARNING: Could not parse creation time for cluster $cluster" | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe introduce a failed_count for such scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added failed_count to enhance the result.
| if [ $stack_age_seconds -gt $GC_CLUSTER_AGE_SECONDS ]; then | ||
| echo "Found old CloudFormation stack: $stack_name (age: ${stack_age_hours}h)" | ||
| echo "Deleting CloudFormation stack: $stack_name" | ||
| aws cloudformation delete-stack --region ${REGION} --stack-name ${stack_name} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also have retry for deleting stacks?
ci/test-conformance-eks.sh
Outdated
|
|
||
| if [[ "$RUN_GARBAGE_COLLECTION" == true ]]; then | ||
| trap "kill -9 $timeout_watcher_pid 2>/dev/null || true" EXIT | ||
| garbage_collection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be clusters_gc here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be clusters_gc, updated.
|
It would be good to explain why we need this in the pull request and commit message. |
Add a periodic garbage collection job to handle orphaned cloud clusters and prevent resource leakage when primary cleanup steps fail. Signed-off-by: Shuyang Xin <[email protected]>
b0e0e8c to
819343d
Compare
No description provided.