Use prefix in all vtorc check and recover logs #17526

Merged
GuptaManan100 merged 4 commits into vitessio:main from ejortegau:ejortegau/vtorc_recovery_log_prefix on Jan 23, 2025

Conversation

@ejortegau
Contributor

@ejortegau ejortegau commented Jan 15, 2025

Description

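This change prefixes every log line emitted during a VTOrc check-and-recover cycle with the analysis and the keyspace/shard, so that recovery actions are easier to identify in the logs. See #17465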

Some examples of recoveries, when running the local example cluster:

Initial recovery for cluster with no primary
I0117 11:19:11.970564  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Starting checkAndRecover
I0117 11:19:11.970622  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: executeCheckAndRecoverFunction: proceeding with ClusterHasNoPrimary detection on zone1-0000000100; isActionable?: true
I0117 11:19:11.984405  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: executeCheckAndRecoverFunction: Proceeding with ClusterHasNoPrimary recovery on zone1-0000000100 validation after acquiring shard lock.
I0117 11:19:11.985613  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Force refreshing all shard tablets
I0117 11:19:12.002617  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: executeCheckAndRecoverFunction: proceeding with recovery on zone1-0000000100; isRecoverable?: true
I0117 11:19:12.002932  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: Analysis: ClusterHasNoPrimary, will elect a new primary for commerce:0
W0117 11:19:12.014369  231305 log.go:153] Recovery for ClusterHasNoPrimary on commerce/0: PRS - no replication statue from zone1-0000000101, using empty gtid set
W0117 11:19:12.014626  231305 log.go:153] Recovery for ClusterHasNoPrimary on commerce/0: PRS - no replication statue from zone1-0000000100, using empty gtid set
I0117 11:19:12.970190  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Starting checkAndRecover
I0117 11:19:12.970230  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: executeCheckAndRecoverFunction: proceeding with ClusterHasNoPrimary detection on zone1-0000000100; isActionable?: true
E0117 11:19:12.971729  231305 log.go:168] Recovery for ClusterHasNoPrimary on commerce/0: Failed to lock shard, aborting recovery: node already exists: lock already exists at path keyspaces/commerce/shards/0
I0117 11:19:13.004532  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Recovery succeeded
I0117 11:19:13.005549  231305 log.go:138] Recovery for ClusterHasNoPrimary on commerce/0: Topology recovery: {"ID":1,"AnalysisEntry":{"AnalyzedInstanceAlias":"zone1-0000000100","AnalyzedInstancePrimaryAlias":"\u003cnil\u003e","TabletType":2,"PrimaryTimeStamp":"0001-01-01T00:00:00Z","ClusterDetails":{"Keyspace":"commerce","Shard":"0"},"AnalyzedKeyspace":"commerce","AnalyzedShard":"0","ShardPrimaryTermTimestamp":"","AnalyzedInstanceBinlogCoordinates":{"LogFile":"vt-0000000100-bin.000001","LogPos":157,"Type":0},"IsPrimary":true,"IsClusterPrimary":false,"LastCheckValid":true,"LastCheckPartialSuccess":true,"CountReplicas":0,"CountValidReplicas":0,"CountValidReplicatingReplicas":0,"ReplicationStopped":true,"ErrantGTID":"","ReplicaNetTimeout":0,"HeartbeatInterval":0,"Analysis":"ClusterHasNoPrimary","Description":"Cluster has no primary","StructureAnalysis":["NoWriteablePrimaryStructureWarning"],"OracleGTIDImmediateTopology":false,"BinlogServerImmediateTopology":false,"SemiSyncPrimaryEnabled":false,"SemiSyncPrimaryStatus":false,"SemiSyncPrimaryWaitForReplicaCount":1,"SemiSyncPrimaryClients":0,"SemiSyncReplicaEnabled":false,"CountSemiSyncReplicasEnabled":0,"CountLoggingReplicas":0,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":0,"CountDistinctMajorVersionsLoggingReplicas":0,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"RecoveryId":1,"GTIDMode":"ON","MinReplicaGTIDMode":"","MaxReplicaGTIDMode":"","MaxReplicaGTIDErrant":"","IsReadOnly":true},"SuccessorAlias":"zone1-0000000100","IsSuccessful":true,"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","DetectionID":0}
I0117 11:19:13.005686  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Forcing refresh of all tablets post recovery
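With the prefix in place, the lines belonging to one recovery can be picked out mechanically. The following is a minimal sketch, assuming the prefix always has the shape seen in the sample above (`Recovery for <analysis> on <keyspace>/<shard>: <message>`). The names `recoveryPrefix`, `recoveryLine`, and `parseRecoveryLine` are made up for this illustration and are not part of Vitess.

```go
package main

import (
	"fmt"
	"regexp"
)

// recoveryPrefix matches the prefix seen in the sample log lines:
// "Recovery for <analysis> on <keyspace>/<shard>: <message>".
// The exact shape is an assumption taken from those samples.
var recoveryPrefix = regexp.MustCompile(`Recovery for (\S+) on ([^/\s]+)/([^:\s]+): (.*)$`)

// recoveryLine holds the fields extracted from one prefixed log line.
type recoveryLine struct {
	Analysis, Keyspace, Shard, Message string
}

// parseRecoveryLine returns the parsed fields and true when the line carries
// the recovery prefix, or a zero value and false otherwise.
func parseRecoveryLine(line string) (recoveryLine, bool) {
	m := recoveryPrefix.FindStringSubmatch(line)
	if m == nil {
		return recoveryLine{}, false
	}
	return recoveryLine{Analysis: m[1], Keyspace: m[2], Shard: m[3], Message: m[4]}, true
}

func main() {
	line := `I0117 11:19:11.970564  231305 log.go:133] Recovery for ClusterHasNoPrimary on commerce/0: Starting checkAndRecover`
	if r, ok := parseRecoveryLine(line); ok {
		fmt.Printf("%s %s/%s: %s\n", r.Analysis, r.Keyspace, r.Shard, r.Message)
	}
}
```

A plain `grep 'Recovery for DeadPrimary on commerce/0'` gives the same filtering on the command line. The parser is only useful when the fields are needed separately, for example to group lines per shard.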
Recovering from a stopped MySQL in primary tablet
I0117 11:20:44.971064  231305 log.go:133] Recovery for DeadPrimary on commerce/0: Starting checkAndRecover
I0117 11:20:44.971142  231305 log.go:138] Recovery for DeadPrimary on commerce/0: executeCheckAndRecoverFunction: proceeding with DeadPrimary detection on zone1-0000000100; isActionable?: true
I0117 11:20:44.983679  231305 log.go:138] Recovery for DeadPrimary on commerce/0: executeCheckAndRecoverFunction: Proceeding with DeadPrimary recovery on zone1-0000000100 validation after acquiring shard lock.
I0117 11:20:44.984512  231305 log.go:133] Recovery for DeadPrimary on commerce/0: Force refreshing all shard tablets
I0117 11:20:44.996572  231305 log.go:138] Recovery for DeadPrimary on commerce/0: executeCheckAndRecoverFunction: proceeding with recovery on zone1-0000000100; isRecoverable?: true
I0117 11:20:44.996823  231305 log.go:138] Recovery for DeadPrimary on commerce/0: Analysis: DeadPrimary, RecoverDeadPrimary zone1-0000000100
I0117 11:20:44.997167  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - will initiate emergency reparent shard in keyspace - commerce, shard - 0
I0117 11:20:44.997934  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - Getting a new durability policy for semi_sync
I0117 11:20:45.000444  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - getting replication position from zone1-0000000101
I0117 11:20:45.000465  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - getting replication position from zone1-0000000102
I0117 11:20:45.000463  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - getting replication position from zone1-0000000100
W0117 11:20:45.002324  231305 log.go:153] Recovery for DeadPrimary on commerce/0: ERS - failed to get replication status from zone1-0000000100: rpc error: code = Unknown desc = TabletManager.StopReplicationAndGetStatus on zone1-0000000100: before status failed: net.Dial(/home/eduardo.ortega/vitess_sandbox/v22-dev/examples/local/vtdataroot/vt_0000000100/mysql.sock) to local server failed: dial unix /home/eduardo.ortega/vitess_sandbox/v22-dev/examples/local/vtdataroot/vt_0000000100/mysql.sock: connect: no such file or directory (errno 2002) (sqlstate HY000)
I0117 11:20:45.006592  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - started finding the intermediate source
I0117 11:20:45.006785  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - finding intermediate source - sorted replica: cell:"zone1"  uid:102
I0117 11:20:45.006887  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - finding intermediate source - sorted replica: cell:"zone1"  uid:101
I0117 11:20:45.006991  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - intermediate source selected - cell:"zone1"  uid:102
I0117 11:20:45.007065  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - found better candidate - cell:"zone1"  uid:102
I0117 11:20:45.007129  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - intermediate source is ideal candidate- true
I0117 11:20:45.007516  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - starting promotion for the new primary - zone1-0000000102
I0117 11:20:45.007552  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - setting new primary on replica zone1-0000000100
I0117 11:20:45.007560  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - setting new primary on replica zone1-0000000101
I0117 11:20:45.074764  231305 log.go:138] Recovery for DeadPrimary on commerce/0: ERS - populating reparent journal on new primary zone1-0000000102
I0117 11:20:45.080035  231305 log.go:133] Recovery for DeadPrimary on commerce/0: Recovery succeeded
I0117 11:20:45.080105  231305 log.go:138] Recovery for DeadPrimary on commerce/0: Topology recovery: {"ID":6,"AnalysisEntry":{"AnalyzedInstanceAlias":"zone1-0000000100","AnalyzedInstancePrimaryAlias":"\u003cnil\u003e","TabletType":1,"PrimaryTimeStamp":"0001-01-01T00:00:00Z","ClusterDetails":{"Keyspace":"commerce","Shard":"0"},"AnalyzedKeyspace":"commerce","AnalyzedShard":"0","ShardPrimaryTermTimestamp":"2025-01-17 10:19:12.022420359 +0000 UTC","AnalyzedInstanceBinlogCoordinates":{"LogFile":"vt-0000000100-bin.000001","LogPos":18489,"Type":0},"IsPrimary":true,"IsClusterPrimary":true,"LastCheckValid":false,"LastCheckPartialSuccess":false,"CountReplicas":2,"CountValidReplicas":2,"CountValidReplicatingReplicas":0,"ReplicationStopped":true,"ErrantGTID":"","ReplicaNetTimeout":0,"HeartbeatInterval":0,"Analysis":"DeadPrimary","Description":"Primary cannot be reached by vtorc and none of its replicas is replicating","StructureAnalysis":null,"OracleGTIDImmediateTopology":true,"BinlogServerImmediateTopology":false,"SemiSyncPrimaryEnabled":true,"SemiSyncPrimaryStatus":true,"SemiSyncPrimaryWaitForReplicaCount":1,"SemiSyncPrimaryClients":2,"SemiSyncReplicaEnabled":true,"CountSemiSyncReplicasEnabled":2,"CountLoggingReplicas":2,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":2,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"RecoveryId":19,"GTIDMode":"ON","MinReplicaGTIDMode":"ON","MaxReplicaGTIDMode":"ON","MaxReplicaGTIDErrant":"","IsReadOnly":false},"SuccessorAlias":"zone1-0000000102","IsSuccessful":true,"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","DetectionID":0}
I0117 11:20:45.080127  231305 log.go:133] Recovery for DeadPrimary on commerce/0: Forcing refresh of all tablets post recovery
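The `Topology recovery:` lines carry a JSON payload after the prefix, so the outcome of a recovery can be read back from the logs. Below is a rough sketch, assuming the payload keeps the field names visible in the samples above (`SuccessorAlias`, `IsSuccessful`, `AnalysisEntry.Analysis`, and so on). The `topologyRecovery` struct is a hand-written subset for illustration and not the type VTOrc uses, and the sample line in `main` is abbreviated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// topologyRecovery is a hypothetical subset of the JSON payload shown in the
// sample logs; fields that are not listed here are ignored when decoding.
type topologyRecovery struct {
	ID            int
	AnalysisEntry struct {
		AnalyzedInstanceAlias string
		AnalyzedKeyspace      string
		AnalyzedShard         string
		Analysis              string
	}
	SuccessorAlias string
	IsSuccessful   bool
}

// marker is the text that precedes the JSON payload in the sample lines.
const marker = "Topology recovery: "

// decodeTopologyRecovery extracts and decodes the JSON that follows the marker.
func decodeTopologyRecovery(line string) (topologyRecovery, error) {
	var tr topologyRecovery
	i := strings.Index(line, marker)
	if i < 0 {
		return tr, fmt.Errorf("no %q marker in line", marker)
	}
	err := json.Unmarshal([]byte(line[i+len(marker):]), &tr)
	return tr, err
}

func main() {
	// Abbreviated version of the DeadPrimary sample line.
	line := `I0117 11:20:45.080105  231305 log.go:138] Recovery for DeadPrimary on commerce/0: Topology recovery: {"ID":6,"AnalysisEntry":{"AnalyzedInstanceAlias":"zone1-0000000100","AnalyzedKeyspace":"commerce","AnalyzedShard":"0","Analysis":"DeadPrimary"},"SuccessorAlias":"zone1-0000000102","IsSuccessful":true}`
	tr, err := decodeTopologyRecovery(line)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s on %s/%s: successor=%s successful=%t\n",
		tr.AnalysisEntry.Analysis, tr.AnalysisEntry.AnalyzedKeyspace,
		tr.AnalysisEntry.AnalyzedShard, tr.SuccessorAlias, tr.IsSuccessful)
}
```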

Related Issue(s)

#17465

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

n/a

This is meant to make recovery actions more easily identified from the logs.
See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
@vitess-bot
Contributor

vitess-bot Bot commented Jan 15, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot Bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jan 15, 2025
@github-actions github-actions Bot added this to the v22.0.0 milestone Jan 15, 2025
@codecov

codecov Bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 0% with 140 lines in your changes missing coverage. Please review.

Project coverage is 67.67%. Comparing base (5468f5d) to head (ee34454).
Report is 25 commits behind head on main.

Files with missing lines                  Patch %   Lines
go/vt/vtorc/logic/topology_recovery.go    0.00%     89 Missing ⚠️
go/vt/log/log.go                          0.00%     51 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17526      +/-   ##
==========================================
- Coverage   67.70%   67.67%   -0.04%     
==========================================
  Files        1584     1585       +1     
  Lines      254718   254905     +187     
==========================================
+ Hits       172463   172508      +45     
- Misses      82255    82397     +142     


Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
@ejortegau ejortegau marked this pull request as ready for review January 16, 2025 08:37
@timvaillancourt timvaillancourt added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VTOrc Vitess Orchestrator integration and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jan 16, 2025
@timvaillancourt
Contributor

@ejortegau is it possible to get a sanitized example of what this looks like?

* Improved PrefixedLogger consistency between formatted & unformatted logs.
* Use PrefixedLogger in more places during vtorc recoveries.

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
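
The commit above mentions keeping formatted and unformatted logs consistent. A minimal sketch of that idea follows, built on the standard library `log` package and not on the Vitess logger: both entry points build the message through the same prefix, so `Info` and `Infof` cannot drift apart. The type `prefixedLogger` and its methods are hypothetical stand-ins written for this example and do not reproduce the `PrefixedLogger` API in `go/vt/log`.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// prefixedLogger is a hypothetical stand-in, not the Vitess PrefixedLogger.
// It prepends one fixed prefix to every message it emits.
type prefixedLogger struct {
	prefix string
	out    *log.Logger
}

// newRecoveryLogger builds the prefix in the shape seen in the sample logs:
// "Recovery for <analysis> on <keyspace>/<shard>: ".
func newRecoveryLogger(analysis, keyspace, shard string) *prefixedLogger {
	return &prefixedLogger{
		prefix: fmt.Sprintf("Recovery for %s on %s/%s: ", analysis, keyspace, shard),
		out:    log.New(os.Stderr, "", log.LstdFlags),
	}
}

// render builds an unformatted message behind the prefix.
func (p *prefixedLogger) render(args ...any) string {
	return p.prefix + fmt.Sprint(args...)
}

// renderf builds a formatted message behind the same prefix.
func (p *prefixedLogger) renderf(format string, args ...any) string {
	return p.prefix + fmt.Sprintf(format, args...)
}

// Info logs unformatted arguments with the prefix.
func (p *prefixedLogger) Info(args ...any) { p.out.Print(p.render(args...)) }

// Infof logs a formatted message with the prefix.
func (p *prefixedLogger) Infof(format string, args ...any) {
	p.out.Print(p.renderf(format, args...))
}

func main() {
	logger := newRecoveryLogger("DeadPrimary", "commerce", "0")
	logger.Info("Starting checkAndRecover")
	logger.Infof("ERS - starting promotion for the new primary - %s", "zone1-0000000102")
}
```

Creating one logger per recovery and passing it down to the helpers that run during that recovery is what lets every line carry the same prefix, including lines emitted from shared reparenting code.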
@ejortegau
Contributor Author

is it possible to get a sanitized example of what this looks like?

@timvaillancourt, I added them to the PR's description.

Contributor

@GuptaManan100 GuptaManan100 left a comment


Changes LGTM!

@GuptaManan100 GuptaManan100 merged commit 91dd79d into vitessio:main Jan 23, 2025
ejortegau added a commit to slackhq/vitess that referenced this pull request Jan 23, 2025
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
ejortegau added a commit to slackhq/vitess that referenced this pull request Jan 24, 2025
This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
timvaillancourt pushed a commit to slackhq/vitess that referenced this pull request Feb 19, 2025
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Feb 20, 2025
* Move to native sqlite3 queries (vitessio#17124)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Improve efficiency of `vtorc` topo calls  (vitessio#17071)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>

* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Avoid flaky topo concurrency test (vitessio#17407)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* `vtorc`: improve handling of partial cell topo results (vitessio#17718)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add stats for shards watched by VTOrc

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add more tests

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* cleanup

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix test for v21

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Use prefix in all vtorc check and recover logs (vitessio#17526)

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

---------

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
twthorn pushed a commit to slackhq/vitess that referenced this pull request Mar 17, 2025
This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
makinje16 pushed a commit to slackhq/vitess that referenced this pull request Mar 20, 2025
This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
tanjinx added a commit to slackhq/vitess that referenced this pull request Mar 24, 2025
…d Journal Events (#585)

* VTGate VStream: Ensure reasonable delivery time for reshard journal event  (vitessio#16639)

Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>

* Backport sqlparser patch for v15->v19 upgrade: 14763 Fix accepting bind variables in time related function calls (#590)

* Fix accepting bind variables in time related function calls. (vitessio#14763)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* fix test

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* Upgrade vitess addons to 0.19.8 (#591)

This upgrade allows us to control whether vtorc raises problems or not
via an environment variable.

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

* Use prefix in all vtorc check and recover logs (vitessio#17526) (#592)

This is a backport of vitessio#17526 . Original PR description below:

Description
This is meant to make recovery actions more easily identified from the logs. See vitessio#17465

Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>

* `slack-19.0`: various backports for `vtorc`, part 2 (#596)

* Ensure all topo read calls consider `--topo_read_concurrency` (vitessio#17276)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Revert "add keyrange support for vtorc clusters_to_watch (#457)"

This reverts commit 45c2199.

* [release-19.0] `vtorc`: require topo for `Healthy: true` in `/debug/health` (vitessio#17129) (vitessio#17351)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>

* `vtorc`: fetch all tablets from cells once + filter during refresh (vitessio#17388)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Support KeyRange in `--clusters_to_watch` flag (vitessio#17604)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* missing func

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add api end point to print the current database state in VTOrc (vitessio#15485)

Signed-off-by: Manan Gupta <manan@planetscale.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* `slack-19.0`: `vtorc`: improve handling of partial cell topo results (#599)

* `vtorc`: improve handling of partial cell topo results

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add unit test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* improve test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add comments

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* move sort to test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* goimports

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `slack-19.0`: skip tests that will fail on v15 downgrade testing (#605)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `slack-19.0`: Add stats for shards watched by VTOrc (#606)

* Add stats for shards watched by VTOrc

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Use len() in make

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add `GetServerStatus` RPC to use in PRS (vitessio#16022) (#607)

Signed-off-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* backport/patch connection pool bug/perf fixes (#604)

* [release-19.0] smartconnpool: do not allow connections to starve (vitessio#17675) (vitessio#17683)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>

* smartconnpool: Better handling for idle expiration (vitessio#17756)

Signed-off-by: Vicent Marti <vmg@strn.cat>

---------

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>

* pool: reopen connection closed by idle timeout (vitessio#17818) (#609)

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Harshit Gangal <harshit@planetscale.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>

* VReplication: Support excluding lagging tablets and use this in vstream manager (vitessio#17835) (#612)

* `slack-19.0`: backport v22 VTOrc optimizations, part 2 (#613)

* `vtorc`: remove duplicate instance read from backend (vitessio#17834)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add index for `inst.ReadInstanceClusterAttributes` table scan

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Add stats for shards watched by VTOrc, purge stale shards (vitessio#17815) (#616)

* --consolidator-query-waiter-cap to set the max number of waiter for consolidated query (vitessio#17244) (#614)

Signed-off-by: Jun Wang <jun.wang@demonware.net>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: jwang <121262788+jwangace@users.noreply.github.com>
Co-authored-by: Jun Wang <jun.wang@demonware.net>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* missing import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* get count from backend

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* rm unused map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* simplify

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* use map of map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* gofmt lint

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix plural in names

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix releasing the global read lock when mysqlshell backup fails (vitessio#17000) (#623)

Signed-off-by: Renan Rangel <rrangel@slack-corp.com>

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593) (#620)

* VStream API: allow keyspace-level heartbeats to be streamed (vitessio#16593)

Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>

* `slack-19.0` backport v22 `vtorc` optimizations + stats, part 3 (#618)

* Remove unused code in discovery queue creation (vitessio#17515)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* vtorc: Cleanup unused code (vitessio#15508)

Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

* `vtorc`: cleanup discover queue, add concurrency flag (vitessio#17825)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add tablets watched stats

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix missing merge conflict update

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: skip unnecessary `inst.ReadTablet` in `logic.LockShard(...)`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: use `errgroup` in keyspace/shard discovery

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix ineffassign

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* missing import

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* `vtorc`: add stats for discovery workers

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* get count from backend

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* rm unused map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>

* Bp pr 17558 pr 17858.slack19.0 (#615)

* VReplication: Improve error handling in VTGate VStreams (vitessio#17558)

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Backport vitessio#17858

---------

Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* `slack-19.0`: re-backport tweaks from vitessio#17911 (#621)

* fix bug in reverse `if`

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* simplify

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* add `ReadTabletCountsByShard` test

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* use map of map

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* capitalize Cell

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* gofmt lint

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* fix plural in names

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Tom Thornton <thomaswilliamthornton@gmail.com>

* Increase health check channel buffer (vitessio#17821) (#625)

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>

* VStream: Allow for automatic resume after Reshard across VStreams (vitessio#15393) (#627)

Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>

---------

Signed-off-by: Malcolm Akinje <malcolm.akinje@gmail.com>
Signed-off-by: Malcolm Akinje <makinje@slack-corp.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Signed-off-by: Jun Wang <jun.wang@demonware.net>
Signed-off-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Signed-off-by: Renan Rangel <rrangel@slack-corp.com>
Signed-off-by: Tanjin Xu <tanjin.xu@slack-corp.com>
Co-authored-by: Tanjin Xu <109303790+tanjinx@users.noreply.github.com>
Co-authored-by: Manan Gupta <35839558+GuptaManan100@users.noreply.github.com>
Co-authored-by: Eduardo J. Ortega U. <5791035+ejortegau@users.noreply.github.com>
Co-authored-by: Tim Vaillancourt <tim@timvaillancourt.com>
Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
Co-authored-by: Manan Gupta <manan@planetscale.com>
Co-authored-by: Vicent Martí <42793+vmg@users.noreply.github.com>
Co-authored-by: Harshit Gangal <harshit@planetscale.com>
Co-authored-by: Tom Thornton <thomaswilliamthornton@gmail.com>
Co-authored-by: jwang <121262788+jwangace@users.noreply.github.com>
Co-authored-by: Jun Wang <jun.wang@demonware.net>
Co-authored-by: Dirkjan Bussink <d.bussink@gmail.com>
Co-authored-by: Renan Rangel <rvrangel@users.noreply.github.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>

Labels

Component: VTOrc Vitess Orchestrator integration Type: Enhancement Logical improvement (somewhere between a bug and feature)

4 participants