Overview of the Issue
We've seen vttablet crashing in CI endtoend tests; @harshit-gangal was able to reproduce and to obtain the following log:
fatal error: concurrent map iteration and map write
goroutine 39594 [running]:
vitess.io/vitess/go/vt/sqlparser.(*CommentDirectives).CachedSize(0x140005c4a10, 0xe0?)
vitess.io/vitess/go/vt/sqlparser/cached_size.go:787 +0x220
vitess.io/vitess/go/vt/sqlparser.(*ParsedComments).CachedSize(0x140024bcb80, 0x1)
vitess.io/vitess/go/vt/sqlparser/cached_size.go:3072 +0xc0
vitess.io/vitess/go/vt/sqlparser.(*RevertMigration).CachedSize(0x140015b97e8, 0x1)
vitess.io/vitess/go/vt/sqlparser/cached_size.go:3579 +0x48
vitess.io/vitess/go/vt/vttablet/tabletserver/planbuilder.(*Plan).CachedSize(0x14001fc8800, 0x1)
vitess.io/vitess/go/vt/vttablet/tabletserver/planbuilder/cached_size.go:72 +0x214
vitess.io/vitess/go/vt/vttablet/tabletserver.(*TabletPlan).CachedSize(0x140015c3a40, 0x1)
vitess.io/vitess/go/vt/vttablet/tabletserver/cached_size.go:31 +0x34
vitess.io/vitess/go/cache.NewDefaultCacheImpl.func1({0x105026c20?, 0x140015c3a40})
vitess.io/vitess/go/cache/cache.go:60 +0x44
vitess.io/vitess/go/cache/ristretto.(*Cache).processItems(0x1400014c180)
vitess.io/vitess/go/cache/ristretto/cache.go:466 +0x114
created by vitess.io/vitess/go/cache/ristretto.(*Cache).Clear
vitess.io/vitess/go/cache/ristretto/cache.go:362 +0x128
As it turns out, we have a read/write concurrency issue for RevertMigration.Comments, and it's subtle.
On one hand, vttablet's plan builder caches a RevertMigration plan:
|
case *sqlparser.RevertMigration: |
|
plan, err = &Plan{PlanID: PlanRevertMigration, FullStmt: stmt}, nil |
On the other hand, we assign RevertMigration.Comments here:
|
case *sqlparser.RevertMigration: |
|
comments = stmt.Comments |
(called by onlineddl/Executor.go)
And then, we iterate those comments. The subtlety is that reading those comments actually modifies them. This is a bit unexpected:
|
func (c *ParsedComments) Directives() *CommentDirectives { |
|
if c == nil { |
|
return nil |
|
} |
|
if c._directives == nil { |
|
c._directives = &CommentDirectives{m: make(map[string]string)} |
|
|
|
for _, commentStr := range c.comments { |
|
if commentStr[0:5] != commentDirectivePreamble { |
|
continue |
|
} |
|
|
|
// Split on whitespace and ignore the first and last directive |
|
// since they contain the comment start/end |
|
directives := strings.Fields(commentStr) |
|
for i := 1; i < len(directives)-1; i++ { |
|
directive, val, ok := strings.Cut(directives[i], "=") |
|
if !ok { |
|
val = "true" |
|
} |
|
c._directives.m[strings.ToLower(directive)] = val |
|
} |
|
} |
|
} |
|
return c._directives |
|
} |
We should address this concurrency issue.
Reproduction Steps
Run enough of REVERT VITESS_MIGRATION commands concurrently; or just re-run onlineddl_revert CI job.
Binary Version
Operating System and Environment details
Log Fragments
No response
Overview of the Issue
We've seen
vttabletcrashing in CIendtoendtests; @harshit-gangal was able to reproduce and to obtain the following log:As it turns out, we have a read/write concurrency issue for
RevertMigration.Comments, and it's subtle.On one hand,
vttablet's plan builder caches aRevertMigrationplan:vitess/go/vt/vttablet/tabletserver/planbuilder/plan.go
Lines 226 to 227 in f97c369
On the other hand, we assign
RevertMigration.Commentshere:vitess/go/vt/schema/online_ddl.go
Lines 271 to 272 in f97c369
(called by
onlineddl/Executor.go)And then, we iterate those comments. The subtlety is that reading those comments actually modifies them. This is a bit unexpected:
vitess/go/vt/sqlparser/comments.go
Lines 233 to 258 in f97c369
We should address this concurrency issue.
Reproduction Steps
Run enough of
REVERT VITESS_MIGRATIONcommands concurrently; or just re-runonlineddl_revertCI job.Binary Version
Operating System and Environment details
Log Fragments
No response