Skip to content

CI: ensure teardown runs regardless of test outcome#391

Merged
rkoster merged 1 commit into
masterfrom
fix/ensure-teardown-on-test-failure
Jun 4, 2026
Merged

CI: ensure teardown runs regardless of test outcome#391
rkoster merged 1 commit into
masterfrom
fix/ensure-teardown-on-test-failure

Conversation

@neddp

@neddp neddp commented Jun 1, 2026

Copy link
Copy Markdown
Member

Problem

When run-bats or run-int fails, the teardown-infrastructure job never triggers because Concourse's passed constraint only matches successful builds (it queries the successful_build_outputs table internally). This leaves the BOSH director VM and its static external IP allocated, causing the next pipeline run to fail with:

CPI 'create_vm' method responded with error: CmdError{
  "type":"Bosh::Clouds::VMCreationFailed",
  "message":"VM failed to create: googleapi: Error 400: Invalid resource usage:
    'External IP address: xxx.xxx.xxx.xxx is already in-use.'., invalidResourceUsage",
  "ok_to_retry":true
}

This requires manual cleanup every time a test fails.

Solution

Merge deploy-ubuntu, run-bats, run-int, and teardown-infrastructure into a single deploy-ubuntu-and-run-tests job with a job-level ensure block that guarantees cleanup.

Key design decisions:

Aspect Approach
Test parallelism run-bats and run-int execute in parallel via in_parallel (fail_fast defaults to false, so both run to completion even if one fails)
Guaranteed cleanup Job-level ensure always runs: deletes orphaned VMs, tears down the director, and destroys terraform infrastructure
Partial failures Teardown wrapped in try so that infrastructure destroy proceeds even if the director state is missing (e.g., setup-director failed before creating the VM)
Sole-tenancy cleanup Preserved via nested ensure on run-int (unchanged behavior)

Pipeline structure (before → after)

Before:

test-unit → setup-infrastructure → deploy-ubuntu → run-bats ──┐
                                                   run-int  ──┤→ teardown-infrastructure → pre-release-fan-in

teardown-infrastructure only ran if both test jobs succeeded.

After:

test-unit → setup-infrastructure → deploy-ubuntu-and-run-tests → pre-release-fan-in
                                        │
                                        ├─ setup-director
                                        ├─ in_parallel: [run-bats, run-int]
                                        └─ ensure: teardown (always runs)

Trade-offs

  • Separate run-bats / run-int jobs are no longer visible as individual boxes in the pipeline UI, but individual task pass/fail is still visible within the job
  • The job is longer-running since it encompasses deploy + tests + teardown, but total wall-clock time is similar (tests still run in parallel)

Implemented with ClaudeAI.

Previously, teardown-infrastructure only triggered when both run-bats
and run-int passed (via Concourse 'passed' constraints which only match
successful builds). If either test failed, the director VM and its
static IP remained allocated, causing subsequent runs to fail with
'External IP address is already in-use'.

Fix by merging deploy-ubuntu, run-bats, run-int, and
teardown-infrastructure into a single job:
- Tests run in parallel via in_parallel (fail_fast defaults to false,
  so both complete even if one fails)
- Job-level 'ensure' guarantees teardown always runs
- Teardown wrapped in 'try' so infra destroy proceeds even if director
  state is missing (e.g. setup-director failed before creating the VM)
- sole-tenancy infrastructure cleanup preserved via nested ensure
@neddp neddp changed the title ci: ensure teardown runs regardless of test outcome CI: ensure teardown runs regardless of test outcome Jun 1, 2026
@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This pull request consolidates the Concourse pipeline job structure for the bosh-google-cpi-release by combining four separate sequential jobs (deploy-ubuntu, run-bats, run-int, teardown-infrastructure) into a single deploy-ubuntu-and-run-tests job. The refactored job adds required resource inputs, orchestrates BATS and integration test execution in parallel branches, expands infrastructure cleanup logic in the job-level ensure block, and updates downstream dependencies to reference the new consolidated job.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly addresses the main change: ensuring teardown runs regardless of test outcome via a job-level ensure block.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description thoroughly explains the problem (teardown not running on test failure), the solution (merging jobs with ensure block), and provides detailed design decisions and trade-offs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ensure-teardown-on-test-failure

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the Concourse pipeline to guarantee infrastructure teardown even when BATS or integration tests fail, preventing leaked GCP resources (e.g., static external IPs) that break subsequent runs.

Changes:

  • Merged deploy-ubuntu, run-bats, run-int, and teardown-infrastructure into a single deploy-ubuntu-and-run-tests job.
  • Runs BATS + integration tests in parallel within the combined job.
  • Adds an ensure-style cleanup sequence intended to always delete orphaned VMs, tear down the director, and destroy Terraform-managed infrastructure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ci/pipeline.yml

@aramprice aramprice left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see cleanup be more resilliant.

@rkoster rkoster merged commit 13c208e into master Jun 4, 2026
3 checks passed
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Jun 4, 2026
@rkoster rkoster deleted the fix/ensure-teardown-on-test-failure branch June 4, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

4 participants