[release-20.0] CI: deflakes, fork runner fallback, MySQL apt key, codecov gating #20195
arthurschreiber wants to merge 14 commits into
Forks don't have access to the `gh-hosted-runners-16cores-1-24.04` runner pool, so workflows that hardcoded it would never schedule. Gate the runner selection on `github.repository` so forks fall back to `ubuntu-24.04` automatically.
The three e2e templates are updated and regenerated; the hand-maintained workflows (codecov, unit_race*, upgrade_downgrade_*, local/region examples) get the same expression inline. `docker_build_images.yml` is left as-is because both of its jobs already gate on `if: github.repository == 'vitessio/vitess'`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
The MySQL apt repo GPG key shipped in mysql-apt-config_0.8.33-1
(and 0.8.29-1) has expired, causing every CI job that installs
MySQL to fail with `EXPKEYSIG B7B3B788A8D3785C` during apt-get
update. The 0.8.35-1 package, used by release-23.0/release-24.0,
ships an updated key.
Backport the relevant pieces of the newer branches'
.github/actions/setup-mysql composite action into the existing
release-20.0 templates and hand-maintained workflows:
- Bump mysql-apt-config to 0.8.35-1 across all templates and
workflows that install MySQL.
- Uninstall the MySQL pre-installed on the ubuntu-24.04 runner
image before installing our own, so the package install
doesn't conflict.
- Recreate an empty apparmor profile before disabling /
reloading it (the profile is removed along with the
pre-installed MySQL packages).
- Pull libaio1 / libtinfo5 from archive.ubuntu.com instead of
mirrors.kernel.org, matching the newer composite action.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
The `percona-release setup ps80` shortcut no longer enables a repo that ships `percona-xtrabackup-80`, so the xb_backup / xb_recovery / backup_pitr_xtrabackup jobs were failing with:
    E: Unable to locate package percona-xtrabackup-80
Match what release-23.0 and release-24.0 do: set up the pdps8.0 distribution repo and pxb-80 (XtraBackup 8.0) repo, and re-enable the ps-80 release repo for percona-server packages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Signed-off-by: Matt Lord <mattalord@gmail.com> (cherry picked from commit 9290e31)
When the server's `VerifyPeerCertificate` returns "Certificate revoked", Go's TLS sends a `bad_certificate` alert and then closes. Whether the client reads the alert or the TCP RST first depends on kernel TCP flush timing, so the test would sometimes see `remote error: tls: bad certificate` and sometimes `connection reset by peer` / `broken pipe`.
Both outcomes mean the revoked certificate was rejected, which is what the test cares about. Accept any of the three error strings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
When a workflow declares `on: [push, pull_request]` (or the multi-line equivalent with bare `push:`/`pull_request:`), every commit pushed to a branch with an open PR triggers two runs of the workflow: once for the push, once for the pull_request event.
Match what was done on main / release-21.0 / release-22.0 (PR vitessio#18649): restrict the push trigger to `main`, release branches, and tags, and keep `pull_request` for all branches. Push-only paths/filters on the vtadmin_web workflows are preserved.
The longer "skip-workflow" step in the templates is left in place; that PR's other simplification (removing the redundant skip check) is out of scope here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
…itessio#19199) Signed-off-by: Matt Lord <mattalord@gmail.com> (cherry picked from commit 3839bd4)
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> (cherry picked from commit 135a6a8)
After SIGHUPing the static auth server to force a config reload, the
test slept a fixed 100ms (or 20ms) and then asserted the new entries
were live. On a slow CI runner the signal handler hasn't finished
processing yet, and the test fails with:
Expected nil, but got: []*mysql.AuthServerStaticEntry{...}
Match the fix from PR vitessio#19388: replace the fixed sleep with
require.EventuallyWithT polling, with a generous 30s deadline so
slower runners still pass.
Backport of the go/mysql/auth_server_static_test.go slice of vitessio#19388
(the rest of that PR is unrelated zkctl/zk2topo/tabletserver work
that doesn't apply to this branch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
TestTrackerNoLock pushes 500,000 messages onto a channel and asserts each send completes within 10ms. Under CI load that's tight enough to flake regularly, surfacing as:
    tracker_test.go:199: failed to send health check to tracker
Match the fix from PR vitessio#18317: bump the per-send timeout to 50ms.
Backport of the go/vt/vtgate/schema/tracker_test.go slice of vitessio#18317 (the materializer_test.go slice is for an unrelated test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
The previous deflake commit (e3cd779) ported the EventuallyWithT callback verbatim from upstream's PR vitessio#19388, which uses `require.X(c, ...)`. That works on upstream's testify v1.11+, but this branch is pinned to testify v1.9, where `CollectT.FailNow` is implemented as `panic("Assertion failed")` and `EventuallyWithT` doesn't recover from it, so the first failed poll crashes the goroutine. The job log showed exactly that:
    panic: Assertion failed
    testify/assert.(*CollectT).FailNow
    ...
    EventuallyWithT.func1
    ...
    FAIL vitess.io/vitess/go/mysql
Replace `require.X(c, ...)` with `assert.X(c, ...)` (which just flags the CollectT instead of panicking) and guard the `entries[0]` indexing on `assert.NotEmpty`, otherwise a `nil[0]` slice access escapes the same way.
Hoisted the polling loop into a `waitForReload` helper since both hupTest and hupTestWithRotation now use the same body.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
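The difference between the two assertion styles can be modeled with a few lines of plain Go. This is a stdlib-only model of the behavior the commit message describes, not testify's implementation: `collector`, `check`, `mustCheck`, `pollOnce`, and `panics` are all hypothetical names introduced for the illustration.

```go
package main

import "fmt"

// collector is a hypothetical stand-in for a per-poll failure recorder.
type collector struct{ failed bool }

// check models the assert style: on failure it flags the collector and
// returns false so the caller can skip dependent checks.
func check(c *collector, cond bool) bool {
	if !cond {
		c.failed = true
	}
	return cond
}

// mustCheck models the require style as described for the pinned version:
// on failure it panics instead of flagging the collector.
func mustCheck(c *collector, cond bool) {
	if !cond {
		panic("Assertion failed")
	}
}

// pollOnce runs one poll attempt and reports whether every check passed.
func pollOnce(entries []string, want string) bool {
	c := &collector{}
	// Guard the index: entries may still be nil before the reload lands.
	if check(c, len(entries) > 0) {
		check(c, entries[0] == want)
	}
	return !c.failed
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("poll before reload passes:", pollOnce(nil, "newpass"))
	fmt.Println("poll after reload passes:", pollOnce([]string{"newpass"}, "newpass"))
	fmt.Println("require-style check panics:", panics(func() { mustCheck(&collector{}, false) }))
}
```

With the flagging style, a failed poll is just a failed attempt and the loop can retry; with the panicking style, the first failed attempt ends the process unless the polling loop recovers, which is the situation the commit describes for this branch.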
The Upload step in codecov.yml has `fail_ci_if_error: true`, so when the workflow runs on a fork (or anywhere else without `secrets.CODECOV_TOKEN`) the upload returns:
    Token required - not valid tokenless upload
    ==> Failed to create-commit
…and the whole job goes red even though the test suite passed.
Gate the entire job on `secrets.CODECOV_TOKEN != ''` so forks skip both the test run and the upload; running unit tests just to throw away the coverage report is wasted CI time. Anyone who actually wants the coverage can opt in by configuring the secret on their fork.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
(cherry picked from commit dea0555)
The previous commit (51bb5c8) gated the Code Coverage job on `secrets.CODECOV_TOKEN != ''`. That breaks if upstream relies on tokenless / OIDC upload: they wouldn't have the secret set, and the job would skip on `vitessio/vitess` too.
Switch to the same pattern we already use for runner selection: `if: github.repository == 'vitessio/vitess'`. Coverage runs on upstream unconditionally, and forks skip without burning ~16 minutes of unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Pull request overview
This PR stabilizes CI for the EOL release-20.0 branch so fork/backport workflows can run more reliably without changing production code.
Changes:
- Restricts workflow `push` triggers and adds fork runner fallbacks for custom 16-core runners.
- Updates MySQL/Percona installation steps for current apt keys/repos and Ubuntu 24.04 runner images.
- Deflakes selected Go/end-to-end tests by replacing fixed sleeps or overly strict assertions with polling/relaxed checks.
Reviewed changes
Copilot reviewed 101 out of 101 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/templates/unit_test.tpl | Updates generated unit-test workflow triggers and MySQL install setup. |
| test/templates/cluster_vitess_tester.tpl | Updates generated Vitess tester dependency setup. |
| test/templates/cluster_endtoend_test_docker.tpl | Adds trigger restrictions and fork runner fallback. |
| go/vt/vtgate/schema/tracker_test.go | Increases channel-send timeout to reduce flakiness. |
| go/test/endtoend/onlineddl/flow/onlineddl_flow_test.go | Replaces fixed wait with DML-progress polling. |
| go/test/endtoend/backup/vtbackup/backup_only_test.go | Matches redo-log messages instead of version-sensitive error codes. |
| go/mysql/server_test.go | Deflakes server stats and revoked TLS certificate assertions. |
| go/mysql/auth_server_static_test.go | Polls for auth config reload instead of fixed sleeps. |
| .github/workflows/codecov.yml | Adds trigger restrictions, runner fallback, MySQL apt update, and repository gate. |
| .github/workflows/codeql_analysis.yml | Updates MySQL apt config package. |
| .github/workflows/vtadmin_web_unit_tests.yml | Restricts push triggers. |
| .github/workflows/vtadmin_web_lint.yml | Restricts push triggers. |
| .github/workflows/vtadmin_web_build.yml | Restricts push triggers. |
| .github/workflows/check_make_vtadmin_web_proto.yml | Restricts push triggers. |
| .github/workflows/check_make_vtadmin_authz_testgen.yml | Restricts push triggers. |
| .github/workflows/unit_test_mysql80.yml | Regenerated unit workflow with trigger/MySQL updates. |
| .github/workflows/unit_test_mysql57.yml | Regenerated unit workflow with trigger/MySQL updates. |
| .github/workflows/unit_test_evalengine_mysql80.yml | Regenerated evalengine unit workflow with trigger/MySQL updates. |
| .github/workflows/unit_test_evalengine_mysql57.yml | Regenerated evalengine unit workflow with trigger/MySQL updates. |
| .github/workflows/unit_race.yml | Adds trigger restrictions and runner fallback. |
| .github/workflows/unit_race_evalengine.yml | Adds trigger restrictions and runner fallback. |
| .github/workflows/endtoend.yml | Restricts push triggers. |
| .github/workflows/e2e_race.yml | Restricts triggers and updates MySQL apt config. |
| .github/workflows/local_example.yml | Adds trigger restrictions and runner fallback. |
| .github/workflows/region_example.yml | Adds trigger restrictions and runner fallback. |
| .github/workflows/docker_test_cluster_10.yml | Restricts push triggers. |
| .github/workflows/docker_test_cluster_25.yml | Restricts push triggers. |
| .github/workflows/vitess_tester_vtgate.yml | Regenerated tester workflow dependency setup. |
| .github/workflows/upgrade_downgrade_test_*.yml | Adds trigger restrictions, runner fallback, and MySQL apt config updates across upgrade/downgrade jobs. |
| .github/workflows/cluster_endtoend_*.yml | Regenerated cluster workflows with trigger restrictions, MySQL/Percona install fixes, and apparmor handling. |
    # configured on forks, and Codecov no longer allows tokenless uploads
    # ("Token required - not valid tokenless upload"). Without this gate
    # we'd burn ~16 minutes on the test suite just to red-fail the upload.
    if: ${{ github.repository == 'vitessio/vitess' }}
This is fine. We only want to skip codecov in PRs that are opened in fork repositories, not for PRs in vitessio/vitess that originate from a fork.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##           release-20.0   #20195      +/-   ##
================================================
+ Coverage         66.45%   68.77%   +2.32%
================================================
  Files              1543     1543
  Lines            244950   198737   -46213
================================================
- Hits             162774   136677   -26097
+ Misses            82176    62060   -20116
The vitessio/vitess action policy requires every `uses:` to reference a full 40-char commit SHA, but a handful of workflows on release-20.0 still pin by major tag (or by `@master`). Any PR that touches CI on this branch fails at the `Prepare all required actions` step with:
    The action <name>@<ref> is not allowed in vitessio/vitess because all actions must be pinned to a full-length commit SHA.
Pin the remaining references to the same SHAs upstream uses, keeping the major version unchanged:
    actions/setup-go@v5 → 0a12ed9d # v5.0.2
    actions/setup-node@v4 → 1e60f620 # v4.0.3
    actions/setup-python@v5 → 39cd1495 # v5.1.1
    actions/stale@v5 → f7176fd3 # v5.2.1
    fossa-contrib/fossa-action@v3 → 3d2ef181 # v3.0.1
    Gamesight/slack-workflow-status@master → 68bf00d0 # v1.3.0
    github/codeql-action/init@v3 → 4bdb89f4 # v3.28.18
    github/codeql-action/analyze@v3 → 4bdb89f4 # v3.28.18
    peter-evans/create-pull-request@v4 → 38e0b6e6 # v4.2.4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Description
`release-20.0` is long EOL and not supported. However, some users still run it internally, or have to run it for a short window as part of an upgrade path to a supported Vitess version. Those users sometimes need to backport fixes from newer branches into their own forks, and right now that's painful: CI on the `release-20.0` branch is broken in several ways for any repo that isn't `vitessio/vitess` (custom runners that forks can't schedule on, an expired MySQL apt key, a Percona repo that no longer ships the package we install, a Code Coverage job that goes red without an upload token, a handful of flaky tests, plus several `uses:` references that don't satisfy the org's SHA-pin policy).

This PR bundles the CI-only fixes needed to get `release-20.0` green again so those backports can land. No production code is changed; everything is `.github/`, test helpers, or test code.

To be clear: merging this is not a change in support status. `release-20.0` remains EOL. This is a courtesy to make life easier for users still on the branch by accident or by upgrade-path necessity.

The same set of fixes was opened against `release-21.0` in #20196 and `release-22.0` in #20197.

Fork-runner / infra fallbacks
- `ci: fall back to ubuntu-24.04 outside vitessio/vitess`: forks can't schedule on `gh-hosted-runners-16cores-1-24.04`; gate on `github.repository`.
- `ci: fix MySQL install on ubuntu-24.04 runners`: the GPG key shipped in `mysql-apt-config_0.8.33-1` expired; bump to `0.8.35-1` (matching release-23.0/24.0) and uninstall the runner image's pre-installed MySQL before installing ours.
- `ci: enable pxb-80 repo for percona-xtrabackup-80 install`: `percona-release setup ps80` no longer ships `percona-xtrabackup-80`; set up the pdps8.0 + pxb-80 repos like release-23.0/24.0 do.
- `ci: don't run workflows twice for the same commit`: backport of "Simplify workflow files." #18649: restrict the `push` trigger to `main`, release branches, and tags so PR pushes don't double-fire.

Code Coverage gating
- `ci: skip Code Coverage job when CODECOV_TOKEN isn't available` (cherry-pick of dea0555).
- `ci: gate Code Coverage on github.repository instead of token presence`: follow-up so tokenless/OIDC uploads on `vitessio/vitess` aren't accidentally disabled.

Test deflakes (backports / cherry-picks)
- `Flakes: Address TestServerStats flakiness (#16991)`: cherry-pick.
- `go/mysql: relax TestTLSRequired revoked-cert assertion`: accept `connection reset by peer` / `broken pipe` alongside `bad certificate`; all three mean the revoked cert was rejected.
- `go/mysql: deflake TestStaticConfigHUP`: backport of the `auth_server_static_test.go` slice of "CI: Deflake Code Coverage workflow" #19388 (poll with `EventuallyWithT` instead of a fixed sleep).
- `go/mysql: stop TestStaticConfigHUP panicking inside EventuallyWithT`: follow-up: this branch is pinned to testify v1.9, where `CollectT.FailNow` panics and `EventuallyWithT` doesn't recover. Use `assert.X(c, ...)` instead of `require.X(c, ...)`.
- `go/vt/vtgate/schema: bump TestTrackerNoLock channel-send timeout`: backport of the `tracker_test.go` slice of "flaky test fix TestTrackerNoLock and TestCreateLookupVindexMultipleCreate" #18317 (10ms → 50ms).
- `CI: wait-for rather than 'assume' in Online DDL flow (#16210)`: cherry-pick.
- `CI: Look for expected log message rather than code in Backup tests (#19199)`: cherry-pick.

Action SHA pinning
- `ci: pin all GitHub Actions to full-length commit SHAs`: the vitessio/vitess action policy requires every `uses:` to reference a full 40-char commit SHA. Pin the nine remaining `@v*` / `@master` references (setup-go, setup-node, setup-python, stale, fossa-action, slack-workflow-status, codeql-action init/analyze, peter-evans/create-pull-request) at the same major versions they were on, using the SHAs upstream uses on newer branches.

Related Issue(s)
None — these are CI-only stabilization fixes.
Checklist
Deployment Notes
None — CI-only changes.