Overview of the Issue
In #11604 we added support for dynamic tablet throttler configuration.
The problem with the implementation we landed on is a general one because we stored the configuration directly in the cell local SrvKeyspace records.
The Keyspace record is the persistent and (mostly) static keyspace configuration stored in the global topo. The "Keyspace [Serving] Graph" is stored in the SrvKeyspace records across the cells and that is dynamic and ephemeral, regularly built and rebuilt from the global topo info and the live state. We build and rebuild it internally per cell and across all cells when certain operations happen, and a user can rebuild them as well using the RebuildKeyspaceGraph client command. Currently in release-16.0 and main, anytime a SrvKeyspace record in a cell is built/rebuilt, it WILL NOT have any previously configured ThrottlerConfig value. So what tablets have what config will be undefined and chaotic (not to mention the Vitess Operator’s SrvKeyspace pruning behavior).
We can instead store the configuration in the global topo Keyspace record and propagate that to the SrvKeyspace records on build/rebuild — leaving the rest of the work largely unchanged (see draft PR for an example).
There were also some more minor issues:
- The
IsEnabled property is no longer a static property of the throttler via tablet flags but is dynamic. There was no way to observe whether or not the throttler was currently active/enabled on a tablet.
- We did not wait for the throttler to become enabled on all relevant tablets in the endtoend tests before proceeding (which was difficult to do because of the first issue).
- We did not return the cells we actually updated when we had partial results for updating the topo.
Reproduction Steps
An easy way to demonstrate the general problem:
git checkout main
make build
cd examples/local
./101_initial_cluster.sh
vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
vtctldclient UpdateThrottlerConfig --enable commerce
vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
vtctldclient RebuildKeyspaceGraph commerce
vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
You will see that the throttler config is wiped out after the rebuild:
$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
null
$ vtctldclient UpdateThrottlerConfig --enable commerce
$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
{
"enabled": true
}
$ vtctldclient RebuildKeyspaceGraph commerce
$ vtctlclient TopoCat -- --cell=zone1 --decode_proto_json keyspaces/commerce/SrvKeyspace | jq '.[0].throttlerConfig'
null
Binary Version
v16.0.0 and v17.0.0-SNAPSHOT
Operating System and Environment details
Log Fragments
Overview of the Issue
In #11604 we added support for dynamic tablet throttler configuration.
The problem with the implementation we landed on is a general one because we stored the configuration directly in the cell local
SrvKeyspacerecords.The
Keyspacerecord is the persistent and (mostly) static keyspace configuration stored in the global topo. The "Keyspace [Serving] Graph" is stored in theSrvKeyspacerecords across the cells and that is dynamic and ephemeral, regularly built and rebuilt from the global topo info and the live state. We build and rebuild it internally per cell and across all cells when certain operations happen, and a user can rebuild them as well using theRebuildKeyspaceGraphclient command. Currently inrelease-16.0andmain, anytime aSrvKeyspacerecord in a cell is built/rebuilt, it WILL NOT have any previously configuredThrottlerConfigvalue. So what tablets have what config will be undefined and chaotic (not to mention the Vitess Operator’sSrvKeyspacepruning behavior).We can instead store the configuration in the global topo
Keyspacerecord and propagate that to theSrvKeyspacerecords on build/rebuild — leaving the rest of the work largely unchanged (see draft PR for an example).There were also some more minor issues:
IsEnabledproperty is no longer a static property of the throttler via tablet flags but is dynamic. There was no way to observe whether or not the throttler was currently active/enabled on a tablet.Reproduction Steps
An easy way to demonstrate the general problem:
You will see that the throttler config is wiped out after the rebuild:
Binary Version
Operating System and Environment details
Log Fragments