fix(helm_release): Preserve Terraform state on failed Helm operations (#1669) #1734
desek wants to merge 4 commits into hashicorp:main from
Conversation

@jrhouston / @jaylonmcshan19-x - any chance you can have a look at this?

Merge please

Fix issues in acceptance tests added by upstream PR hashicorp#1734:

1. TestAccResourceRelease_updateExistingFailed:
   - Fix expected revision from "3" to "4", since each failed upgrade (Step 2 and Step 3) increments the Helm revision
2. TestAccResourceRelease_statePreservedDuringRefresh:
   - Remove Config from the RefreshState step, as terraform-plugin-testing does not allow Config and RefreshState in the same TestStep
3. TestAccResourceRelease_refreshPreservesFailedState:
   - Same fix: remove Config from the RefreshState step

Hi @desek, thank you for this important fix! I've cherry-picked your changes into my fork and found a few issues in the test cases that cause CI failures: 1.

- Fix expected revision in Step 4 from "3" to "4" (failed upgrades increment revision)
- Remove Config from RefreshState steps (terraform-plugin-testing disallows combining them)
- Replace hardcoded "FAILED" with release.StatusFailed.String() (Helm returns lowercase)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@schnell3526 can you check whether it passes the tests now?

Rollback Plan
If a change needs to be reverted, we will publish an updated version of the library.
Changes to Security Controls
No changes to security controls (access controls, encryption, logging) in this pull request.
Description
This PR fixes a critical bug where helm_release resources are randomly removed from Terraform state after failed deployments or during refresh operations. This issue causes subsequent terraform apply runs to fail with the error "cannot re-use a name that is still in use" because Terraform attempts to recreate releases that already exist in the Kubernetes cluster but are no longer tracked in state.

Root Causes Addressed:

1. Update Function (Primary Fix): When a Helm upgrade fails, the function now saves state before returning an error. Previously, it returned immediately without updating state, causing state loss when combined with subsequent Read operations.
2. Create Function: Added state persistence before returning an error on a failed create, to prevent orphaning releases from Terraform tracking.
3. Read Function: Improved error handling order and added informative logging when removing resources from state.
4. resourceReleaseExists Function: Completely rewritten to use action.List instead of getRelease to detect releases in ALL states (deployed, failed, pending-install, pending-upgrade, etc.). This is more comprehensive and prevents false negatives that led to state removal.

Changes Summary:

- helm/resource_helm_release.go: Enhanced error handling in Create, Read, and Update functions to preserve state on failures; rewrote resourceReleaseExists for comprehensive release detection
- helm/resource_helm_release_test.go: Added 8 new acceptance tests covering various failure scenarios

Fixes #1669
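
The behavior described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the provider's code: `Release`, `Cluster`, `State`, `existsDeployedOnly`, `existsAnyStatus`, `update`, and `refresh` are hypothetical names invented for illustration, and a plain map stands in for Helm's release storage. `existsDeployedOnly` is only a contrived example of a check that can produce a false negative; it makes no claim about how `getRelease` behaved.

```go
package main

import (
	"errors"
	"fmt"
)

// Release is a hypothetical stand-in for a Helm release record.
type Release struct {
	Name     string
	Status   string // for example "deployed" or "failed"
	Revision int
}

// Cluster is a hypothetical in-memory stand-in for Helm's release storage.
type Cluster struct{ releases map[string]*Release }

// State is a hypothetical stand-in for the Terraform resource state.
// An empty ID means the resource is not tracked.
type State struct {
	ID       string
	Status   string
	Revision int
}

// existsDeployedOnly models a narrow check that misses failed releases.
func (c *Cluster) existsDeployedOnly(name string) bool {
	rel, ok := c.releases[name]
	return ok && rel.Status == "deployed"
}

// existsAnyStatus models a check that sees releases in every status.
func (c *Cluster) existsAnyStatus(name string) bool {
	_, ok := c.releases[name]
	return ok
}

// install creates a release at revision 1.
func (c *Cluster) install(name string) *Release {
	rel := &Release{Name: name, Status: "deployed", Revision: 1}
	c.releases[name] = rel
	return rel
}

// upgrade bumps the revision on every attempt, including failed ones.
func (c *Cluster) upgrade(name string, fail bool) (*Release, error) {
	rel, ok := c.releases[name]
	if !ok {
		return nil, errors.New("release not found")
	}
	rel.Revision++
	if fail {
		rel.Status = "failed"
		return rel, errors.New("upgrade failed")
	}
	rel.Status = "deployed"
	return rel, nil
}

// record copies the release's current status and revision into state.
func (st *State) record(rel *Release) {
	st.ID, st.Status, st.Revision = rel.Name, rel.Status, rel.Revision
}

// update saves what the cluster now holds BEFORE returning any error.
func update(c *Cluster, st *State, fail bool) error {
	rel, err := c.upgrade(st.ID, fail)
	if rel != nil {
		st.record(rel)
	}
	return err
}

// refresh drops the resource from state only if no release exists in any status.
func refresh(c *Cluster, st *State) {
	if !c.existsAnyStatus(st.ID) {
		*st = State{}
	}
}

func main() {
	c := &Cluster{releases: map[string]*Release{}}
	st := &State{}
	st.record(c.install("demo"))

	_ = update(c, st, true) // first failed upgrade
	_ = update(c, st, true) // second failed upgrade
	refresh(c, st)
	fmt.Println(st.ID, st.Status, st.Revision)                           // demo failed 3
	fmt.Println(c.existsDeployedOnly("demo"), c.existsAnyStatus("demo")) // false true

	_ = update(c, st, false) // recovery upgrade
	fmt.Println(st.ID, st.Status, st.Revision) // demo deployed 4
}
```

In this model the release stays tracked through two failed upgrades and a refresh, and the recovery upgrade lands on revision 4, which matches the revision count discussed in the conversation. If `refresh` used the narrower check instead, the failed release would be dropped from state while still existing in the cluster.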
Acceptance tests
New Acceptance Tests:
- TestAccResourceRelease_updateExistingFailed - Tests failed deployment preserves state and recovery
- TestAccResourceRelease_statePreservedDuringRefresh - Tests state not removed during refresh
- TestAccResourceRelease_refreshPreservesFailedState - Tests refresh preserves failed release state
- TestAccResourceRelease_comprehensiveReleaseDetection - Tests release detection in all states
- TestAccResourceRelease_failedInitialDeployPreservesState - Tests failed initial deployment preserves state
- TestAccResourceRelease_failedInitialDeployAtomicNoState - Tests atomic=true behavior on failed installation
- TestAccResourceRelease_deleteOperationCorrectBehavior - Tests delete operation maintains correct behavior
- TestAccResourceRelease_deleteAlreadyRemovedRelease - Tests delete handles already-removed release gracefully

Release Note
Release note for CHANGELOG:
References
Community Note