update go toolchain to 1.22 #20

iliana · 2025-04-18T03:59:16Z

Related to #10, but not quite fully fixing it.

This moves the Go toolchain from 1.17 to 1.22, a modernization of about 2.5 years, and a year behind current Go. Most of the changes here are related to fixing new lints from go vet and other tools, although I've only made fixes where it's not intrusive to do so. Some descriptions of fixes are called out here for context to reviewers, and to future maintainers (hello, me!) of this repo.

Why update?

Particularly when it comes to security protocols (such as TLS), Go has a very batteries-included standard library. Keeping on a currently-supported version of the Go runtime is very important, while also being somewhat painful for a codebase like Cockroach for reasons we'll get into.

While Oxide doesn't currently have any sensitivity to the Go TLS stack in the control plane, it is our intent to get internal services talking to each other over authenticated connections on the underlay network, and failing to do this maintenance work while it's relatively small creates a pretty big risk that we can't use built-in TLS support we already have.

(This is far from the only reason to keep the Go toolchain up to date but it's the one that feels most pertinent to us.)

Philosophy on new lints

We run the Cockroach linting test suite in CI, and changing the toolchain means updating Go's built-in lint tools (and some external tools that need updates to work on newer versions of Go).

I am generally of the opinion that linters and formatters are good. However, blindly applying 1.19's gofmt doc comment changes would have resulted in an unreviewable diff, in the literal "GitHub will not load the page" sense of unreviewable. (Some of that can be seen in the autogenerated code that was updated in this PR.) Similarly, some lints that have been introduced or improved to work in more cases since Go 1.17 that would only serve to introduce unnecessary code churn; I've opted to add rules to ignore them. Where new lints are particularly useful is when they tell us the Go runtime has meaningfully improved, which I'll go into detail on in the next section.

This philosophy is summarized near where we add the rules to ignore certain lints:

cockroach/pkg/testutils/lint/lint_test.go

Lines 1703 to 1709 in d256fc8

    
           // [Oxide specific] Ignore a wide variety of deprecations and style suggestions. 
        
           // Generally we should investigate why something is deprecated: if it's because 
        
           // there is a genuine improvement (such as the new default math/rand RNG in 
        
           // Go 1.22 that cannot be used if rand.Seed is used), we should pay attention 
        
           // and follow those recommendations; if it's because of standard library 
        
           // reorganization (such as all of "io/ioutil" moving elsewhere), maybe it's not 
        
           // worth the code churn on a maintenance branch.

If I don't go into a specific change below, it probably hit a new/improved lint. Most of the ones that should flag reviewer attention are because the result of something was never actually used; these are the sorts of embarrassing mistakes we're used to rustc or Clippy catching for us, so it's nice to see a lot of these linter improvements.

math/rand deprecations

In Go 1.20:

The math/rand package now automatically seeds the global random number generator (used by top-level functions like Float64 and Int) with a random value, and the top-level Seed function has been deprecated. Programs that need a reproducible sequence of random numbers should prefer to allocate their own random source, using rand.New(rand.NewSource(seed)).

I did not know Go had a global random number generator and I'm glad I learned this, I guess?

Go has made the PRNG that backs the global RNG better too, but for the sake of backwards compatibility if you ever seed it it reverts to the old PRNG. But if you weren't seeding the RNG and used it for things you wanted to actually be random, you were going to have a bad day, so you always seeded it with either crypto/rand or the current UNIX timestamp. In several places (most notably in pkg/cli, the entrypoint to the compiled program) we remove that seeding so that we default to the new PRNG implementation.

A lot of the deprecated uses in test suites are kept in place; these have explicit //lint:ignore directives. Cockroach has test infrastructure for picking a random seed and printing it out on test failures so you can attempt to reproduce the issue locally. This capability might be removed in a future version of the Go runtime; in that situation it's probably best to disable that infrastructure rather than try to plumb a new "global" RNG into all the test interfaces.

pkg/util/uuid is one place where I made more invasive changes as part of correcting this deprecation lint. Previously there were two ways of generating a UUID with this package, either by passing in an io.Reader that hopefully read random bytes, or by using fastGen which used math/rand by way of a struct that implemented the io.Reader interface. I ported it to use math/rand/v2 instead (which uses the newer PRNG) while maintaining compatibility for using a given io.Reader (such as crypto/rand.Reader).

pkg/util/goschedstats

Cockroach uses the number of goroutines in the "runnable" state, i.e. ready and waiting to run, as an analog for system load; more ready-and-waiting goroutines means you're possibly overloaded. This metric is used for admission control of new work to a cluster. There's just one problem: Go doesn't actually tell you how many goroutines are in the "runnable" state. (A Stack Overflow comment suggests this is because "runnable" is an implementation detail of the Go runtime.)

I have had the misfortune of learning that Cockroach simply links to various parts of the Go runtime's scheduler and reads the data it's looking for. To do this, it keeps a copy of the relevant internal types in-tree along with a comment explaining that you must update them for any changes in the Go runtime.

If you're curious why this PR doesn't move us to 1.23 or 1.24, it's because Go 1.23 now disallows //go:linkname from linking to arbitrary parts of the Go runtime with a few exceptions that would have otherwise broken large parts of the Go ecosystem. This particular part of the scheduler is not one of those exceptions. I am not exactly sure yet how I want to approach this problem, but I didn't want to delay moving to a relatively-recent runtime for it.

Sorting

Go 1.19 switched the sorting algorithm used in the standard library to pdqsort. Cockroach has an in-tree copy of Go 1.17's sorting algorithm with an additional context parameter that permits early cancellation of a sort operation (in case a query is cancelled). A test case in pkg/sql/rowcontainer ensured (among other things) that sorting algorithm is consistent with the Go runtime and caught this change for me, because both sorting algorithms are unstable (equivalent rows do not necessarily maintain the same position relative to each other), and otherwise-equivalent rows were returning in different orders.

Instead of copying in the new sorting algorithm, I have made the less risky decision to make the test slightly more dubious by testing the abstract row sorting machinery against the in-tree cancelable sort functions.

gcassert

gcassert is a linting tool developed by Cockroach developers to be able to check certain things about compiled code. Go has poor language support for controlling inlining and bounds checking, so the best you can do is make a tool that parses compiler diagnostic output to ensure your assumptions hold. Of course, compiler behavior changes between releases and there's often nothing you can do about it, so whether this is valuable is dubious to me.

The version of gcassert previously used only processed directives without a space, such as //gcassert:bce, not // gcassert:bce, similar to Go build directives. The gofmt change in 1.19, of course, broke this behavior by limiting this to only Go build directives, and newer versions of gcassert support handling the variant with a space. Some // gcassert: directives (with the space) were removed because their assertions were never correct and never properly checked. In other cases where //gcassert was properly spelled, I made an attempt to correct any issues flagged; //gcassert:bce is used in a few places to ensure bounds checking is elided in tight loops by running the bounds check once before the loop.

(For context, the code where gcassert is used is in a place where "execgen" is also used, which performs code generation, inlining, etc. Those are the .eg.go files that GitHub is hiding by default; they generally come from similarly named _tmpl.go files.)

Node runtime update

Node.js is used to build the console frontend UI that is baked into the binary. The UI failed to build on versions of Node later than 16 due to Webpack's use of the md4 algorithm (used for cache-busting filenames) through Node's native OpenSSL bindings, which were removed in Node 18 and later. Fortunately Webpack 4 and terser-webpack-plugin made final maintenance releases to use a different md4 implementation that is compatible with later releases of Node, and that was all that was necessary to get off of Node 16. (It was also fortunately simple enough to stop vendoring npm packages via a Git submodule.)

We now pin to Node 20.11.1, the latest version available in Helios right now. Next time we bump those packages I'll move to the new latest release.

But does it work?

I've run the Omicron test suite locally with this version of Cockroach and it passes all the tests. I still need to verify that nothing about the on-disk format has changed, although that would be very unexpected. I intend to test this by deploying Omicron release 14 to a system and then upgrading the version of Cockroach, as well as deploying a version with a new Cockroach and then downgrading it back to the version we currently have; this way if we find a problem we at least are confident we can simply roll back.

Since Go 1.22, the default unseeded global RNG is more secure: https://go.dev/blog/chacha8rand However when the RNG is seeded it falls back to the old implementation. One call to rand.Seed remains in randutil.SeedForTests. This is part of the COCKROACH_RANDOM_SEED infrastructure for recording what seed was used for a particular test failure so it could be reproduced. (Had the global RNG not been seedable in the first place, perhaps there would be more sensible infrastructure in place here.) We will leave this in place until Go 1.24 when rand.Seed becomes a no-op by default.

sudomateo · 2025-04-18T04:33:33Z

Thanks for working on this iliana. I've only read the write up so far and I don't have any reservations about moving to the new Go runtime. You've covered all the major points (e.g., v2 packages, linking) that I was going to mention if they weren't already mentioned.

I'll try to carve out some time to read the code next week. Meanwhile I'll subscribe to this so I stay current with what's happening.

sudomateo

Thank you for working on this! I gave each file a brief look and I didn't see anything out of the ordinary. I left some non-blocking comments and questions.

I'll note that I did not run a build locally to verify the changes but instead kept my review to just that, a review.

.github/workflows/build-macos.yml

pkg/cmd/roachprod-stress/main.go

pkg/col/coldata/bytes.go

pkg/rpc/auth_test.go

iliana · 2025-05-09T03:43:24Z

I am going to leave this open while I'm out for the next few weeks, since I didn't get a chance to do the full integration testing (upgrade/downgrade testing) I wanted to do with Omicron, and we're getting pretty far into the R15 cycle anyway. When I'm back we should be at the end of the R15 cycle and I can put in the work to land this as soon as R15 branches so it can get as much soak time for R16 as possible.

Anyone else can feel free to do the upgrade/downgrade testing while I'm gone though! The two tests I want to do are:

Rack init on the go1.16 version, thrash the database, mupdate to the go1.22 version, thrash the database, mupdate back to the go1.16 version, thrash the database
Rack init on the go1.22 version, thrash the database, mupdate to the go1.16 version, thrash the database

to ensure that whether or not rack init occurs on the go1.16 or go1.22 versions, that we can move into either version safely without any unexpected incompatibility in what's stored on-disk. Probably the best way to thrash the database is to run sled add/remove in a loop while also launching and shutting off instances.

iliana · 2025-07-14T22:30:46Z

I've finally had the chance to do the first test above (go1.16 -> go1.22 -> go1.16) on a test rack and things are working as expected. Due to Reasons it will be easier to test rack init on the go1.22 version after merging this and getting a repo built in Omicron CI but I have tested initializing a cluster on the go1.22 version and then replacing with the go1.16 version, so I have extreme doubt that would discover an issue.

I'm going to go ahead and merge today, and then I'll work on burning down the Dependabot alerts and possibly getting beyond go1.22.

iliana added 26 commits November 15, 2024 23:04

build on 1.22

7d8174d

wip

8f7cf62

remove workaround

96e07a0

bump to go1.22.9

e9aa26c

bump golang.org/x/tools to fix segfault?

d5dc1f9

a few more fixes

2b3f27b

small fixes

56a79ea

consistently apply the same sort algorithm

643b660

try to fix make lint

504b2f5

try to make TestMakePriority flake less

099365b

disable TestGofmtSimplify

146a4aa

set GOTOOLCHAIN=local in CI

8e884ef

update Staticcheck

65031b5

fix unkeyedLiteralExceptions in TestRoachVet

dc820f3

resolve or ignore staticcheck lints

bb2d0b5

try re-enabling errcheck

2a0757a

fix ui builds on node >=18

fccafd3

update gcassert; remove previously-ignored and wrong asserts

d9de79e

should have added this in fccafd3

ab7056f

go1.22.11

b1f3496

drop panicparse fork

403ffeb

correctly fix roachtest/prometheus test

a9fc834

re-evaluate gcassert

79a272b

remove a comment explaining a removed option

6fd86a0

reword IgnoreLint

d256fc8

iliana linked an issue Apr 18, 2025 that may be closed by this pull request

errcheck lint disabled due to OOM #4

Closed

iliana added 2 commits April 29, 2025 17:43

empty commit to make sure CI still works

ffccec3

add an order-only prereq to keep go from finding cmake compile tests

01ee450

i hate make

37bec85

sudomateo reviewed May 9, 2025

View reviewed changes

.github/workflows/build-macos.yml Show resolved Hide resolved

pkg/cmd/roachprod-stress/main.go Show resolved Hide resolved

pkg/col/coldata/bytes.go Show resolved Hide resolved

pkg/rpc/auth_test.go Show resolved Hide resolved

comment / add more to make ordering hack

3af489d

iliana mentioned this pull request Jul 2, 2025

build(deps): bump pbkdf2 from 3.1.2 to 3.1.3 in /pkg/ui #21

Merged

poke ci to make sure it still works

fcd91a0

iliana merged commit 712357d into release-22.1-oxide Jul 14, 2025
13 checks passed

iliana deleted the iliana/go-1.22 branch July 14, 2025 23:19

iliana mentioned this pull request Aug 12, 2025

update cockroachdb build (go1.22 + small dep updates) oxidecomputer/omicron#8831

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

update go toolchain to 1.22 #20

update go toolchain to 1.22 #20

iliana commented Apr 18, 2025 •

edited

Loading

Uh oh!

sudomateo commented Apr 18, 2025

Uh oh!

sudomateo left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

iliana commented May 9, 2025 •

edited

Loading

Uh oh!

iliana commented Jul 14, 2025

Uh oh!

Uh oh!

Uh oh!

	// [Oxide specific] Ignore a wide variety of deprecations and style suggestions.
	// Generally we should investigate why something is deprecated: if it's because
	// there is a genuine improvement (such as the new default math/rand RNG in
	// Go 1.22 that cannot be used if rand.Seed is used), we should pay attention
	// and follow those recommendations; if it's because of standard library
	// reorganization (such as all of "io/ioutil" moving elsewhere), maybe it's not
	// worth the code churn on a maintenance branch.

update go toolchain to 1.22 #20

update go toolchain to 1.22 #20

Conversation

iliana commented Apr 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why update?

Philosophy on new lints

math/rand deprecations

pkg/util/goschedstats

Sorting

gcassert

Node runtime update

But does it work?

Uh oh!

sudomateo commented Apr 18, 2025

Uh oh!

sudomateo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

iliana commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

iliana commented Jul 14, 2025

Uh oh!

Uh oh!

Uh oh!

iliana commented Apr 18, 2025 •

edited

Loading

iliana commented May 9, 2025 •

edited

Loading