Skip to content

Bug Report: Unexpected transaction rollback on client session that previously executed GET_LOCK #10130

@vkovacik

Description

@vkovacik

Overview of the Issue

We have been observing an unexpected ROLLBACK that is injected by vttablet into the client transaction. The simplified client transaction would look like (for the failure case):

BEGIN
INSERT INTO Test(i) VALUES(1)    - succeeds
ROLLBACK                                     - unexpected, injected by vttablet
<some other queries>
SELECT FROM Test where i=1       - returns no data (expected)
<some other quries>
COMMIT

When application detects an error (no data), we can see that SELECT returned no rows for the data inserted in the same transaction. When looking into vttablet query logs we found that vttablet executed ROLLBACK right after INSERT statement. Application was not aware that the transaction was affected and continued with SELECT which returned with no error but with no data.

We found that "unexpected ROLLBACK" was triggered when vtgate failed to execute lock heartbeat related to the same client connection.

scatter_conn.go:280] Locking heartbeat failed, held locks might be released: held locks released: vttablet: rpc error: code = Aborted desc = transaction 1650647255709171310: ended at 2022-04-22 21:21:27.249 UTC (exceeded timeout: 30s) (CallerID: userData1)

Error message transaction ... ended at 2022-04-22 21:21:27.249 UTC told us that something happened earlier on this client session which now affects our transaction. vttablet log shows that "some" other transaction had been killed on this connection before.

tx_pool.go:126] killing transaction (exceeded timeout: 30s): 1650647255709171310

We found that transaction 1650647255709171310 is related GET_LOCK/RELEASE_LOCK that was executed on the same client connection few minutes before our transaction.

UNEXPECTED BEHAVIOUR:

It seems that vttablet does not correctly clean up reserved connection (from GET_LOCK) after RELEASE_LOCK. 30 seconds later (queryserver-config-transaction-timeout) when client transaction is executed it will be unexpectedly rolled back by Vitess.

Reproduction Steps

  1. Deploy the following vschema in a sharded keyspace.
{
    "sharded": true,
    "vindexes": {
        "hash_vdx": {
            "type": "hash"
        }
    },
    "tables": {
        "Test": { "column_vindexes": [ { "column": "i", "name": "hash_vdx" } ] }
    }
}
  1. Create table.
CREATE TABLE Test(i int);
  1. Open mysql client session
SELECT GET_LOCK('my-lock', 5);
<wait 1s>
SELECT RELEASE_LOCK('my-lock');

<wait 35s>

BEGIN;
INSERT INTO Test(i) VALUES (1);
SELECT * FROM Test WHERE i=1;
COMMIT;

Unexpected behaviour:

mysql> SELECT * FROM Test WHERE i=1;
Empty set (0.01 sec)

Expected behaviour:

mysql> SELECT * FROM Test WHERE i=1;
+------+
| i    |
+------+
|    1 |
+------+
1 row in set (0.01 sec)

Binary Version

$ /vt/bin/vtgate --version
Version: 13.0.0 (Git revision bc4a9606e1 branch 'heads/v13.0.0') built on Tue Feb 22 14:23:16 UTC 2022 by vitess@buildkitsandbox using go1.17.6 linux/amd64

Default vttablet/mysqld configuration (timeouts not changed).

Operating System and Environment details

Kubernetes deployment, MySQL Percona version 5.7.36-39

Log Fragments

Logs from the above reproduction steps:

GET_LOCK/RELEASE_LOCK (vttablet query log):

Reserve	172.18.0.5:52250:/queryservice.Query/ReserveExecute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:20:50.672964	2022-04-22 21:20:50.674007	0.001044		""	map[#maxLimit:type:INT64 value:"10001" vtg1:type:VARCHAR value:"my-lock" vtg2:type:INT64 value:"5"]	0	""	none	0.000000	0.000000	0	0	0	""	
Execute	172.18.0.5:52250:/queryservice.Query/ReserveExecute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:20:50.674048	2022-04-22 21:20:50.674729	0.000683	Select	"select GET_LOCK(:vtg1, :vtg2) from dual"	map[#maxLimit:type:INT64 value:"10001" vtg1:type:VARCHAR value:"my-lock" vtg2:type:INT64 value:"5"]	1	"select GET_LOCK('my-lock', 5) from dual limit 10001"	mysql	0.000591	0.000000	0	1650647255709171310	1	""	
Execute	172.18.0.5:52250:/queryservice.Query/Execute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:20:56.729215	2022-04-22 21:20:56.729856	0.000641	Select	"select RELEASE_LOCK(:vtg1) from dual"	map[#maxLimit:type:INT64 value:"10001" vtg1:type:VARCHAR value:"my-lock"]	1	"select RELEASE_LOCK('my-lock') from dual limit 10001"	mysql	0.000471	0.000000	0	0	1	""	

vtgate failed to execute lock heartbeat (vtgate error log):

W0422 21:21:45.931273    5651 scatter_conn.go:280] Locking heartbeat failed, held locks might be released: held locks released: vttablet: rpc error: code = Aborted desc = transaction 1650647255709171310: ended at 2022-04-22 21:21:27.249 UTC (exceeded timeout: 30s) (CallerID: userData1)

lock heartbeat failed on vttablet with:

W0422 21:21:45.930192    2515 tabletserver.go:1391] Code: ABORTED
transaction 1650647255709171310: ended at 2022-04-22 21:21:27.249 UTC (exceeded timeout: 30s) (CallerID: userData1)
: Sql: "select 1", BindVars: {}

likely because lock transaction (reserved connection on shard #1) was already killed (vttablet error log):

W0422 21:21:27.249041    2515 tx_pool.go:126] killing transaction (exceeded timeout: 30s): 1650647255709171310	

Client transaction - note ROLLBACK after INSERT (vttablet query log):

Begin	172.18.0.5:52250:/queryservice.Query/BeginExecute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:21:45.929774	2022-04-22 21:21:45.930670	0.000896		"begin"	map[]	0	""	none	0.000000	0.000000	01650647255709171311	0	""	
Execute	172.18.0.5:52250:/queryservice.Query/BeginExecute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:21:45.930691	2022-04-22 21:21:45.931636	0.000945	Insert	"insert into Test(i) values (:_i_0)"	map[_i_0:type:INT64 value:"1" vtg1:type:INT64 value:"1"]	1	"insert into Test(i) values (1)"	mysql	0.000858	0.000000	1	1650647255709171311	0	""	
Rollback	172.18.0.5:52250:/queryservice.Query/Rollback(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:21:45.933583	2022-04-22 21:21:45.952414	0.018831		"rollback"	map[]	0	""	none	0.000000	0.000000	0	1650647255709171311	0	""	
Execute	172.18.0.5:52250:/queryservice.Query/Execute(gRPC)	gRPC	'userData1'	'root'	2022-04-22 21:21:45.960452	2022-04-22 21:21:45.961422	0.000970	Select	"select * from Test where i = :vtg1"	map[#maxLimit:type:INT64 value:"10001" vtg1:type:INT64 value:"1"]	1	"select * from Test where i = 1 limit 10001"	mysql	0.000747	0.000017	0	0	0	""

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions