Skip to content

Bug Report: vttablet crash due to read/write concurrency issue for RevertMigration.Comments #13699

@shlomi-noach

Description

@shlomi-noach

Overview of the Issue

We've seen vttablet crashing in CI endtoend tests; @harshit-gangal was able to reproduce and to obtain the following log:

fatal error: concurrent map iteration and map write
​
goroutine 39594 [running]:
vitess.io/vitess/go/vt/sqlparser.(*CommentDirectives).CachedSize(0x140005c4a10, 0xe0?)
	vitess.io/vitess/go/vt/sqlparser/cached_size.go:787 +0x220
vitess.io/vitess/go/vt/sqlparser.(*ParsedComments).CachedSize(0x140024bcb80, 0x1)
	vitess.io/vitess/go/vt/sqlparser/cached_size.go:3072 +0xc0
vitess.io/vitess/go/vt/sqlparser.(*RevertMigration).CachedSize(0x140015b97e8, 0x1)
	vitess.io/vitess/go/vt/sqlparser/cached_size.go:3579 +0x48
vitess.io/vitess/go/vt/vttablet/tabletserver/planbuilder.(*Plan).CachedSize(0x14001fc8800, 0x1)
	vitess.io/vitess/go/vt/vttablet/tabletserver/planbuilder/cached_size.go:72 +0x214
vitess.io/vitess/go/vt/vttablet/tabletserver.(*TabletPlan).CachedSize(0x140015c3a40, 0x1)
	vitess.io/vitess/go/vt/vttablet/tabletserver/cached_size.go:31 +0x34
vitess.io/vitess/go/cache.NewDefaultCacheImpl.func1({0x105026c20?, 0x140015c3a40})
	vitess.io/vitess/go/cache/cache.go:60 +0x44
vitess.io/vitess/go/cache/ristretto.(*Cache).processItems(0x1400014c180)
	vitess.io/vitess/go/cache/ristretto/cache.go:466 +0x114
created by vitess.io/vitess/go/cache/ristretto.(*Cache).Clear
	vitess.io/vitess/go/cache/ristretto/cache.go:362 +0x128
​

As it turns out, we have a read/write concurrency issue for RevertMigration.Comments, and it's subtle.

On one hand, vttablet's plan builder caches a RevertMigration plan:

case *sqlparser.RevertMigration:
plan, err = &Plan{PlanID: PlanRevertMigration, FullStmt: stmt}, nil

On the other hand, we assign RevertMigration.Comments here:

case *sqlparser.RevertMigration:
comments = stmt.Comments

(called by onlineddl/Executor.go)

And then, we iterate those comments. The subtlety is that reading those comments actually modifies them. This is a bit unexpected:

func (c *ParsedComments) Directives() *CommentDirectives {
if c == nil {
return nil
}
if c._directives == nil {
c._directives = &CommentDirectives{m: make(map[string]string)}
for _, commentStr := range c.comments {
if commentStr[0:5] != commentDirectivePreamble {
continue
}
// Split on whitespace and ignore the first and last directive
// since they contain the comment start/end
directives := strings.Fields(commentStr)
for i := 1; i < len(directives)-1; i++ {
directive, val, ok := strings.Cut(directives[i], "=")
if !ok {
val = "true"
}
c._directives.m[strings.ToLower(directive)] = val
}
}
}
return c._directives
}

We should address this concurrency issue.

Reproduction Steps

Run enough of REVERT VITESS_MIGRATION commands concurrently; or just re-run onlineddl_revert CI job.

Binary Version

-

Operating System and Environment details

-

Log Fragments

No response

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions