You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is an RFC for adding Foreign Key Support in Vitess.
Use Case
The use case is to support cross-shard foreign key constraints for a keyspace to establish relations between different tables.
Current restrictions require the schema to be structured such that rows for tables linked by foreign keys need to live on the same shard. This constrains schema design and sharding key options.
After the suggested changes, we would be able to support adding arbitrary foreign keys irrespective of the vschema / sharding key configuration.
erDiagram
Area {
id int PK
name varchar
zipcode int
}
Product {
id int PK
name varchar
}
Order {
id int PK
produce_id int FK
customer_id int FK
}
Customer {
id int PK
name varchar
area_id int FK
}
Contact {
id int PK
contact varchar
customer_id int FK
}
Product ||--}o Order :""
Customer ||--}o Order :""
Customer ||--}o Contact :""
Area ||--}o Customer :""
Loading
The data we insert for the examples that follow are -
This section dives into more specific details of what we intend to do to support foreign keys.
Basic Design
Foreign keys will be passed down to MySQL and created there. This will give us inherent support for information schema queries since foreign key definitions will exist on MySQL level.
We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
Vitess will be aware of the foreign key constraints and enforce them on all DMLs.
We’ll change Schema Tracking to start using GetSchema RPC call instead of execute Select statements directly. Schema tracking will then be used for getting the create definition calls from GetSchema.
On vtgate we’ll create a graph of foreign key constraints (table-relationship graph). We’ll also need to store default values for all columns. (Required for SET DEFAULT).
Description of the data structure to follow.
We’ll also need to store the uniqueness constraints for the table. (Not required immediately, but will be needed for ON DUPLICATE KEY UPDATE support).
Note - There is a delay between the DDL operations executing and between schema tracking updating vtgate. Needs discussion.
foreign_key_mode is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and put ForeignKeyMode as a VSchema configuration.
Planning INSERTs
While planning inserts, we’ll consult the table-relationship graph and get the foreign key constraints where the column is a child and plan the SELECT validation queries to make sure the corresponding parent row exists. We’ll get a shared lock on the SELECT.
If we’re in AUTOCOMMIT mode, we still have to start a transaction.
Overall steps look like -
Step 1: If there is no transaction, we’ll start one.
Step 2: Execute all the validation queries parallel and get a lock in share mode
Step 3: If any of those validation queries return empty results, we fail and rollback (only if the transaction was started implicitly).
Step 4: Execute the Insert and vindex updates.
Step 5: commit the transaction (if we started one in step 1).
ON DUPLICATE KEY UPDATE
Plan the update and insert. Runtime defines which one you execute.
Run a SELECT statement to know which side to run.
Example
For example, if the user was to execute insert into orders (id, product_id, customer_id) values (4, :a, :b);, the set of steps that Vitess would take -
Start a transaction. START TRANSACTION
Validation queries - SELECT 1 FROM product WHERE ID = :a FOR SHARE, SELECT 1 FROM customer WHERE ID = :b FOR SHARE
If both the queries return a row, then we execute the insert.
COMMIT
Planning UPDATEs and DELETEs
Planning of UPDATE and DELETE depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows
RESTRICT/NO ACTION (default)
In this configuration, MySQL rejects any update or delete to a row which has a column that is a parent in a foreign key relation.
Planning for this, is very similar to INSERTS, in the part that before we run the update or delete we need to verify if there exists a foreign key constraint that references the row.
There are subtle differences between the two though, the first being that here we care for foreign key constraints where the table being updated is the parent in the relation, the second is that here we fail if we find a single row matching our SELECT validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!
Overall, the steps look like the following -
Step 1: If there is no transaction, we’ll start one.
Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
Step 3: Run the SELECT query and use these results to generate the validation queries.
Step 4: Execute all the validation queries parallel and get a lock in share mode
Step 5: If any of those validation queries return any results, we fail and roll back (only if the transaction was started implicitly).
Step 6: Execute the Update/Delete and vindex updates.
Step 7: commit the transaction (if we started one in step 1).
Possible abuse from the user.
If the user runs a query like DELETE FROM customer which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and the SELECT validation query it executes might be extremely large too. This could lead to OOMs. We can add a LIMIT clause to the equivalent SELECT statement of the DELETE and reject such mass updates/deletes if we get more results than the LIMIT allows.
Example
For example, if the user was to execute DELETE FROM customer WHERE area_id = 2;, then Vitess would need to take the following steps -
Start a transaction. START TRANSACTION
Convert the DELETE query into a SELECT with the same WHERE clause. So Vitess would execute SELECT id FROM customer WHERE area_id = 2;. We would get back the following result -
These results will drive the next validation queries we run. Since the customer table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them - SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE and SELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE.
If any of the validation queries return any rows, we fail and rollback.
Execute the delete and vindex updates (if required).
COMMIT.
CASCADE
In this configuration, MySQL cascades any updates/deletes to the children in the foreign key constraints. So a deletion of the parent will trigger the deletion of the child rows that are referenced by it.
Planning for CASCADEs is a little complex since the rows that we need to cascade the delete too will only be available on run time. Those rows themselves could have foreign key constraints that could fail the operation, in which case, the entire transaction has to be rolled back. In the RESTRICT case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).
Overall, the steps look like the following -
Step 1: If there is no transaction, we’ll start one.
Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
Step 3: Run the SELECT query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.
Step 4: Execute the original Update/Delete and vindex updates.
Step 5: Repeat the UPDATE/DELETE for the children rows in the same transaction. Do this until no further cascades are required.
Step 5: If any of them fail, we'll need to ROLLBACK.
Step 6: commit the transaction (if we started one in step 1).
Possible abuse from the user.
Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the LIMIT clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.
SET NULL
The planning for SET NULL is very similar to CASCADE. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.
SET DEFAULT
Not supported in InnoDb. We still need to decide if we need to support this in Vitess.
If we decide to support it, then it is very similar to SET NULL. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.
Planning REPLACE
Currently REPLACE statements are only supported in unsharded mode.
To support REPLACE we'll plan the DELETE and INSERT and execute a SELECT query to decide if a DELETE is necessary.
Important Considerations
Checking for constraints will lead to cross-shard transactions. If the INSERT/UPDATE/DELETE only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be SELECT... FOR SHARE statements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on the COMMIT succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.
Users should still try to design schema such that parent and child row lives in the same shard as it will be more optimized for enforcing FK constraints.
Cyclic foreign key constraints: The design we described above will work fine even when two tables have foreign key references to each other as long as it is not cyclic on the column level. Having a cycle of foreign key constraints where a single column is both a parent and a child in the cycle, then that would mean that no data can be inserted into that column ever! Since to insert data into that column, the data should already exist in that column! We don't plan to worry about this initially, we'll see if we need to add support for this eventually. (It can be done on MySQL. Inserting data has to be done by setting FOREIGN_KEY_CHECKS to 0).
flowchart TD
subgraph This is acceptable
Table1
col1[[col1]]
col2[[col2]]
Table1 --- col1
Table1 --- col2
Table2
col3[[col3]]
col4[[col4]]
Table2 --- col3
Table2 --- col4
col1 -.-> col3
col4 -.-> col2
end
subgraph This won't work
Table3
col5[[col1]]
col6[[col2]]
Table3 --- col5
Table3 --- col6
Table4
col8[[col3]]
col7[[col4]]
Table4 --- col8
Table4 --- col7
col5 -.-> col8
col8 -.-> col5
end
Loading
Data structure to store FK constraints in VSchema
Schema tracking will give us a list of foreign key constraints as part of the SHOW CREATE TABLE output. We want to store this
output in the VSchema struct in a form that gives us the best performance while planning.
We'll need to answer queries of the following sorts -
For a given table, find all the foreign key constraints where it is a child. (Needed for planning INSERTs)
For a given table, find all the foreign key constraints where it is a parent. (Needed for planning DELETE's and UPDATE`s)
The VSchema struct stores a map of KeyspaceSchema for each keyspace. Within a KeyspaceSchema we have a map of Table.
We'll store the foreign key constraints inside this Table struct.
Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.
The ForeignKeyConstraint struct would look something like this -
typeForeignKeyConstraintstruct {
ParentTableTableNameParentColumns []ColumnsChildTableTableNameChildColumns []ColumnsOnDeleteActionint// This will be an enumOnUpdateActionint
}
Performance Improvements
We can get MySQL to run the INSERT, UPDATE/DELETE (with Restrict) checks for us by using FOREIGN_KEY_CHECKS=1 on the connection for unsharded and single-sharded cases.
For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and vreplication operations like Reshard and MoveTables work.
Phases
Minimal Viable Product
Support for basic INSERT, UPDATE and DELETE statements for unsharded.
RESTRICT/NO ACTION, CASCADE, SET NULL mode for foreign key constraints will be supported.
Support for ON DUPLICATE KEY UPDATE in INSERTs for unsharded.
Support for REPLACE for unsharded.
Phase two
INSERT/UPDATE/DELETE ... (SELECT) (SELECT subquery in DMLs) for unsharded.
Introduction
This is an RFC for adding Foreign Key Support in Vitess.
Use Case
Scope of the project
Out of Scope
Schema
The following is the schema we will use for providing examples as we discuss the design of the foreign key support.
MySQL Schema
erDiagram Area { id int PK name varchar zipcode int } Product { id int PK name varchar } Order { id int PK produce_id int FK customer_id int FK } Customer { id int PK name varchar area_id int FK } Contact { id int PK contact varchar customer_id int FK } Product ||--}o Order :"" Customer ||--}o Order :"" Customer ||--}o Contact :"" Area ||--}o Customer :""The data we insert for the examples that follow are -
Design
This section dives into more specific details of what we intend to do to support foreign keys.
Basic Design
FOREIGN_KEY_CHECKS=0on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.SET DEFAULT).ON DUPLICATE KEY UPDATEsupport).foreign_key_modeis a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and putForeignKeyModeas a VSchema configuration.Planning
INSERTsON DUPLICATE KEY UPDATEExample
For example, if the user was to execute
insert into orders (id, product_id, customer_id) values (4, :a, :b);, the set of steps that Vitess would take -START TRANSACTIONSELECT 1 FROM product WHERE ID = :a FOR SHARE,SELECT 1 FROM customer WHERE ID = :b FOR SHARECOMMITPlanning
UPDATEsandDELETEsUPDATEandDELETEdepends on the referential actions that they are configured with. Let's dive into each one that MySQL allowsRESTRICT/NO ACTION(default)SELECTvalidation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!UPDATE/DELETEto aSELECTthat returns the rows that are being updated/deleted.SELECTquery and use these results to generate the validation queries.DELETE FROM customerwhich tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and theSELECTvalidation query it executes might be extremely large too. This could lead to OOMs. We can add aLIMITclause to the equivalentSELECTstatement of theDELETEand reject such mass updates/deletes if we get more results than theLIMITallows.Example
For example, if the user was to execute
DELETE FROM customer WHERE area_id = 2;, then Vitess would need to take the following steps -START TRANSACTIONDELETEquery into aSELECTwith the sameWHEREclause. So Vitess would executeSELECT id FROM customer WHERE area_id = 2;. We would get back the following result -customertable has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them -SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHAREandSELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE.COMMIT.CASCADERESTRICTcase, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).UPDATE/DELETEto aSELECTthat returns the rows that are being updated/deleted.SELECTquery and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.UPDATE/DELETEfor the children rows in the same transaction. Do this until no further cascades are required.ROLLBACK.LIMITclause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.SET NULLSET NULLis very similar toCASCADE. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.SET DEFAULTSET NULL. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.Planning
REPLACEREPLACEstatements are only supported in unsharded mode.REPLACEwe'll plan the DELETE and INSERT and execute aSELECTquery to decide if aDELETEis necessary.Important Considerations
INSERT/UPDATE/DELETEonly touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only beSELECT... FOR SHAREstatements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on theCOMMITsucceeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.FOREIGN_KEY_CHECKSto 0).flowchart TD subgraph This is acceptable Table1 col1[[col1]] col2[[col2]] Table1 --- col1 Table1 --- col2 Table2 col3[[col3]] col4[[col4]] Table2 --- col3 Table2 --- col4 col1 -.-> col3 col4 -.-> col2 end subgraph This won't work Table3 col5[[col1]] col6[[col2]] Table3 --- col5 Table3 --- col6 Table4 col8[[col3]] col7[[col4]] Table4 --- col8 Table4 --- col7 col5 -.-> col8 col8 -.-> col5 endData structure to store FK constraints in VSchema
Schema tracking will give us a list of foreign key constraints as part of the
SHOW CREATE TABLEoutput. We want to store thisoutput in the
VSchemastruct in a form that gives us the best performance while planning.We'll need to answer queries of the following sorts -
INSERTs)DELETE's andUPDATE`s)The
VSchemastruct stores a map ofKeyspaceSchemafor each keyspace. Within aKeyspaceSchemawe have a map ofTable.We'll store the foreign key constraints inside this
Tablestruct.We'll add 2 more fields to the
Tablestruct -Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.
The
ForeignKeyConstraintstruct would look something like this -Performance Improvements
INSERT,UPDATE/DELETE(with Restrict) checks for us by usingFOREIGN_KEY_CHECKS=1on the connection for unsharded and single-sharded cases.Phases
INSERT,UPDATEandDELETEstatements for unsharded.RESTRICT/NO ACTION,CASCADE,SET NULLmode for foreign key constraints will be supported.ON DUPLICATE KEY UPDATEin INSERTs for unsharded.REPLACEfor unsharded.INSERT/UPDATE/DELETE ... (SELECT)(SELECT subquery in DMLs) for unsharded.Prerequisites
INSERTplanning in Gen4. Gen4: move insert planner to gen4 #12934GetSchemaRPC to also work forTableandAlltype of input #13197Tasks
- [ ] On the vttablet, we need FOREIGN_KEY_CHECKS=0.INSERTplanning #13676UPDATEplanning #13762DELETEplanning #13746