Skip to content

fix(helm_release): Preserve Terraform state on failed Helm operations (#1669)#1734

Open
desek wants to merge 4 commits intohashicorp:mainfrom
desek:fix/issue-1669-helm-release-state-removal
Open

fix(helm_release): Preserve Terraform state on failed Helm operations (#1669)#1734
desek wants to merge 4 commits intohashicorp:mainfrom
desek:fix/issue-1669-helm-release-state-removal

Conversation

@desek
Copy link

@desek desek commented Dec 15, 2025

Rollback Plan

If a change needs to be reverted, we will publish an updated version of the library.

Changes to Security Controls

No changes to security controls (access controls, encryption, logging) in this pull request.

Description

This PR fixes a critical bug where helm_release resources are randomly removed from Terraform state after failed deployments or during refresh operations. This issue causes subsequent terraform apply runs to fail with the error "cannot re-use a name that is still in use" because Terraform attempts to recreate releases that already exist in the Kubernetes cluster but are no longer tracked in state.

Root Causes Addressed:

  1. Update Function (Primary Fix): When a Helm upgrade fails, the function now saves state before returning an error. Previously, it returned immediately without updating state, causing state loss when combined with subsequent Read operations.

  2. Create Function: Added state persistence before returning error on failed create to prevent orphaning releases from Terraform tracking.

  3. Read Function: Improved error handling order and added informative logging when removing resources from state.

  4. resourceReleaseExists Function: Completely rewritten to use action.List instead of getRelease to detect releases in ALL states (deployed, failed, pending-install, pending-upgrade, etc.). This is more comprehensive and prevents false negatives that led to state removal.

Changes Summary:

  • helm/resource_helm_release.go: Enhanced error handling in Create, Read, and Update functions to preserve state on failures; rewrote resourceReleaseExists for comprehensive release detection
  • helm/resource_helm_release_test.go: Added 8 new acceptance tests covering various failure scenarios

Fixes #1669

Acceptance tests

  • Have you added an acceptance test for the functionality being added?

New Acceptance Tests:

  • TestAccResourceRelease_updateExistingFailed - Tests failed deployment preserves state and recovery
  • TestAccResourceRelease_statePreservedDuringRefresh - Tests state not removed during refresh
  • TestAccResourceRelease_refreshPreservesFailedState - Tests refresh preserves failed release state
  • TestAccResourceRelease_comprehensiveReleaseDetection - Tests release detection in all states
  • TestAccResourceRelease_failedInitialDeployPreservesState - Tests failed initial deployment preserves state
  • TestAccResourceRelease_failedInitialDeployAtomicNoState - Tests atomic=true behavior on failed installation
  • TestAccResourceRelease_deleteOperationCorrectBehavior - Tests delete operation maintains correct behavior
  • TestAccResourceRelease_deleteAlreadyRemovedRelease - Tests delete handles already-removed release gracefully

Release Note

Release note for CHANGELOG:

bug: Fix helm_release resources being randomly removed from Terraform state after failed deployments or during refresh operations (#1669)

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

@desek desek requested a review from a team as a code owner December 15, 2025 13:43
@hashicorp-cla-app
Copy link

hashicorp-cla-app bot commented Dec 15, 2025

CLA assistant check
All committers have signed the CLA.

@desek
Copy link
Author

desek commented Jan 12, 2026

@jrhouston / @jaylonmcshan19-x - any chance you can have a look at this?

@CRASH-Tech
Copy link

Merge please

schnell3526 added a commit to schnell3526/terraform-provider-helm that referenced this pull request Jan 31, 2026
Fix issues in acceptance tests added by upstream PR hashicorp#1734:

1. TestAccResourceRelease_updateExistingFailed:
   - Fix expected revision from "3" to "4" since each failed upgrade
     (Step 2 and Step 3) increments the Helm revision

2. TestAccResourceRelease_statePreservedDuringRefresh:
   - Remove Config from RefreshState step as terraform-plugin-testing
     does not allow Config and RefreshState in the same TestStep

3. TestAccResourceRelease_refreshPreservesFailedState:
   - Same fix: Remove Config from RefreshState step
@schnell3526
Copy link

Hi @desek, thank you for this important fix! I've cherry-picked your changes into my fork and found a few issues in the test cases that cause CI failures:

1. TestAccResourceRelease_updateExistingFailed

The expected revision in Step 4 should be "4" instead of "3", because each failed upgrade (Step 2 and Step 3) increments the Helm revision:

  • Step 1: revision 1 (initial deploy)
  • Step 2: revision 2 (failed upgrade)
  • Step 3: revision 3 (failed upgrade again)
  • Step 4: revision 4 (successful recovery)

2. TestAccResourceRelease_statePreservedDuringRefresh and TestAccResourceRelease_refreshPreservesFailedState

These tests fail with:

TestStep cannot have Config or ConfigDirectory or ConfigFile and RefreshState

The terraform-plugin-testing framework does not allow Config and RefreshState to be used in the same TestStep. The fix is to remove Config from the RefreshState steps.

3. Hardcoded "FAILED" status

Several tests use hardcoded "FAILED" string, but Helm returns lowercase "failed". The fix is to use release.StatusFailed.String() for consistency with other tests in this file.


I've fixed these issues in my fork. Here are the commits if they help:

- Fix expected revision in Step 4 from "3" to "4" (failed upgrades increment revision)
- Remove Config from RefreshState steps (terraform-plugin-testing disallows combining them)
- Replace hardcoded "FAILED" with release.StatusFailed.String() (Helm returns lowercase)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@desek
Copy link
Author

desek commented Mar 13, 2026

@schnell3526 can you have a look if it passes the test now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v3 Provider appears to randomly remove helm_release resources from TF state - Hard to reproduce

3 participants