RFC: Foreign Key Support in Vitess

# Introduction

This is an RFC for adding Foreign Key Support in Vitess.

# Use Case

- The use case is to support cross-shard foreign key constraints for a keyspace to establish relations between different tables.
- Current restrictions require the schema to be structured such that rows for tables linked by foreign keys need to live on the same shard. This constrains schema design and sharding key options.
- After the suggested changes, we would be able to support adding arbitrary foreign keys irrespective of the vschema / sharding key configuration.

# Scope of the project

- Foreign keys in a single keyspace. 
- Foreign keys will be created in MySQL. [Online DDL with vanilla MySQL](https://vitess.io/blog/2021-06-15-online-ddl-why-no-fk/) will not work if foreign keys are created. 

# Out of Scope
- Cross-keyspace foreign key support.

# Schema

The following is the schema we will use for providing examples as we discuss the design of the foreign key support.

<details><summary>
MySQL Schema

```mysql
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area                      |
| contact                   |
| customer                  |
| orders                    |
| product                   |
+---------------------------+
5 rows in set (0.00 sec)
```
</summary>

```mysql
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                        |
  +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | area  | CREATE TABLE `area` (
                                  `id` int NOT NULL,
                                  `name` varchar(30) DEFAULT NULL,
                                  `zipcode` int DEFAULT NULL,
                                  PRIMARY KEY (`id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                                                                                  |
  +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | contact | CREATE TABLE `contact` (
                                       `id` int NOT NULL,
                                       `contactnum` varchar(10) DEFAULT NULL,
                                       `customer_id` int DEFAULT NULL,
                                       PRIMARY KEY (`id`),
                                       KEY `customer_id` (`customer_id`),
                                       CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                                                                                                          |
  +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | customer | CREATE TABLE `customer` (
                                         `id` int NOT NULL,
                                         `name` varchar(30) DEFAULT NULL,
                                         `area_id` int DEFAULT NULL,
                                         PRIMARY KEY (`id`),
                                         KEY `area_id` (`area_id`),
                                         CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`)
               ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders;
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                              |
  +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | orders | CREATE TABLE `orders` (
                                     `id` int DEFAULT NULL,
                                     `product_id` int DEFAULT NULL,
                                     `customer_id` int DEFAULT NULL,
                                     KEY `product_id` (`product_id`),
                                     KEY `customer_id` (`customer_id`),
                                     CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`),
                                     CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
             ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product;
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                             |
  +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | product | CREATE TABLE `product` (
                                       `id` int NOT NULL,
                                       `name` varchar(30) DEFAULT NULL,
                                       PRIMARY KEY (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```
</details>

```mermaid
erDiagram
    Area {
        id int PK
        name varchar
        zipcode int
    }

    Product {
        id int PK
        name varchar
    }

    Order {
        id int PK
        produce_id int FK
        customer_id int FK
    }

    Customer {
        id int PK
        name varchar
        area_id int FK
    }

    Contact {
        id int PK
        contact varchar
        customer_id int FK
    }

    Product ||--}o Order :""
    Customer ||--}o Order :""
    Customer ||--}o Contact :""
    Area ||--}o Customer :""
```

The data we insert for the examples that follow are -
```mysql
delete from area;
delete from product;
delete from customer;
delete from orders;
delete from contact;
insert into product values ('1', 'Fan'), ('2', 'Cooler'), ('3', 'AC');
insert into area values(1, 'US', '521763'), (2, 'India', '432143');
insert into customer values(1, 'Manan', 2), (2, 'Andres', 1), (3, 'Harshit', 2);
insert into contact values (1, '897876876', 3), (2, '234123453', 1), (3, '789975234', 2), (4, '343214314', 1);
insert into orders values (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 2, 1), (4, 2, 3);
```

# Design

This section dives into more specific details of what we intend to do to support foreign keys.

## Basic Design

- Foreign keys will be passed down to MySQL and created there. This will give us inherent support for information schema queries since foreign key definitions will exist on MySQL level.
- We will set `FOREIGN_KEY_CHECKS=0` on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
- Vitess will be aware of the foreign key constraints and enforce them on all DMLs.
  - We’ll change Schema Tracking to start using GetSchema RPC call instead of execute Select statements directly. Schema tracking will then be used for getting the create definition calls from GetSchema.
  - On vtgate we’ll create a graph of foreign key constraints (table-relationship graph). We’ll also need to store default values for all columns. (Required for `SET DEFAULT`).
  - Description of the data structure to follow.
  - We’ll also need to store the uniqueness constraints for the table. (Not required immediately, but will be needed for `ON DUPLICATE KEY UPDATE` support).
  - **Note - There is a delay between the DDL operations executing and between schema tracking updating vtgate. Needs discussion.**
-  `foreign_key_mode` is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and put `ForeignKeyMode` as a VSchema configuration.

## Planning `INSERTs`

- While planning inserts, we’ll consult the table-relationship graph and get the foreign key constraints where the column is a child and plan the SELECT validation queries to make sure the corresponding parent row exists. We’ll get a shared lock on the SELECT.
- If we’re in AUTOCOMMIT mode, we still have to start a transaction.
- Overall steps look like -
  - Step 1: If there is no transaction, we’ll start one.
  - Step 2: Execute all the validation queries parallel and get a lock in share mode
  - Step 3: If any of those validation queries return empty results, we fail and rollback (only if the transaction was started implicitly).
  - Step 4: Execute the Insert and vindex updates.
  - Step 5: commit the transaction (if we started one in step 1).
- `ON DUPLICATE KEY UPDATE`
  - Plan the update and insert. Runtime defines which one you execute.
  - Run a SELECT statement to know which side to run.
#### Example
For example, if the user was to execute `insert into orders (id, product_id, customer_id) values (4, :a, :b);`, the set of steps that Vitess would take -
1. Start a transaction. `START TRANSACTION`
2. Validation queries - `SELECT 1 FROM product WHERE ID = :a FOR SHARE`, `SELECT 1 FROM customer WHERE ID = :b FOR SHARE`
3. If both the queries return a row, then we execute the insert.
4. `COMMIT`

## Planning `UPDATEs` and `DELETEs`
- Planning of `UPDATE` and `DELETE` depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows

### `RESTRICT`/`NO ACTION` (default)
- In this configuration, MySQL rejects any update or delete to a row which has a column that is a parent in a foreign key relation.
- Planning for this, is very similar to INSERTS, in the part that before we run the update or delete we need to verify if there exists a foreign key constraint that references the row.
- There are subtle differences between the two though, the first being that here we care for foreign key constraints where the table being updated is the parent in the relation, the second is that here we fail if we find a single row matching our `SELECT` validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!
- Overall, the steps look like the following -
  - Step 1: If there is no transaction, we’ll start one.
  - Step 2: Convert the `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
  - Step 3: Run the `SELECT` query and use these results to generate the validation queries.
  - Step 4: Execute all the validation queries parallel and get a lock in share mode
  - Step 5: If any of those validation queries return any results, we fail and roll back (only if the transaction was started implicitly).
  - Step 6: Execute the Update/Delete and vindex updates.
  - Step 7: commit the transaction (if we started one in step 1).
- Possible abuse from the user.
  - If the user runs a query like `DELETE FROM customer` which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and the `SELECT` validation query it executes might be extremely large too. This could lead to OOMs. We can add a `LIMIT` clause to the equivalent `SELECT` statement of the `DELETE` and reject such mass updates/deletes if we get more results than the `LIMIT` allows.
#### Example
For example, if the user was to execute `DELETE FROM customer WHERE area_id = 2;`, then Vitess would need to take the following steps -
1. Start a transaction. `START TRANSACTION`
2. Convert the `DELETE` query into a `SELECT` with the same `WHERE` clause. So Vitess would execute `SELECT id FROM customer WHERE area_id = 2;`. We would get back the following result -
  ```mysql
  +----+
  | id |
  +----+
  |  1 |
  |  3 |
  +----+
  2 rows in set (0.00 sec)
  ``` 
3. These results will drive the next validation queries we run. Since the `customer` table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them - `SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE` and `SELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE`.
4. If any of the validation queries return any rows, we fail and rollback.
5. Execute the delete and vindex updates (if required).
6. `COMMIT`.

### `CASCADE`
- In this configuration, MySQL cascades any updates/deletes to the children in the foreign key constraints. So a deletion of the parent will trigger the deletion of the child rows that are referenced by it.
- Planning for CASCADEs is a little complex since the rows that we need to cascade the delete too will only be available on run time. Those rows themselves could have foreign key constraints that could fail the operation, in which case, the entire transaction has to be rolled back. In the `RESTRICT` case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).
- Overall, the steps look like the following -
  - Step 1: If there is no transaction, we’ll start one.
  - Step 2: Convert the `UPDATE`/`DELETE` to a `SELECT` that returns the rows that are being updated/deleted.
  - Step 3: Run the `SELECT` query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.
  - Step 4: Execute the original Update/Delete and vindex updates.
  - Step 5: Repeat the `UPDATE`/`DELETE` for the children rows in the same transaction. Do this until no further cascades are required.
  - Step 5: If any of them fail, we'll need to `ROLLBACK`.
  - Step 6: commit the transaction (if we started one in step 1).
- Possible abuse from the user.
  - Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the `LIMIT` clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.
### `SET NULL`
- The planning for `SET NULL` is very similar to `CASCADE`. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.
### `SET DEFAULT`
- Not supported in InnoDb. **We still need to decide if we need to support this in Vitess.**
- If we decide to support it, then it is very similar to `SET NULL`. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.

## Planning `REPLACE`
- Currently `REPLACE` statements are only supported in unsharded mode.
- To support `REPLACE` we'll plan the DELETE and INSERT and execute a `SELECT` query to decide if a `DELETE` is necessary.

## Important Considerations

- Checking for constraints will lead to cross-shard transactions. If the `INSERT`/`UPDATE`/`DELETE` only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be `SELECT... FOR SHARE` statements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on the `COMMIT` succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.
- Users should still try to design schema such that parent and child row lives in the same shard as it will be more optimized for enforcing FK constraints.
- Cyclic foreign key constraints: The design we described above will work fine even when two tables have foreign key references to each other as long as it is not cyclic on the column level. Having a cycle of foreign key constraints where a single column is both a parent and a child in the cycle, then that would mean that no data can be inserted into that column ever! Since to insert data into that column, the data should already exist in that column! We don't plan to worry about this initially, we'll see if we need to add support for this eventually. (It can be done on MySQL. Inserting data has to be done by setting `FOREIGN_KEY_CHECKS` to 0).

```mermaid
flowchart TD
    subgraph This is acceptable
    Table1
    col1[[col1]]
    col2[[col2]]
    Table1 --- col1
    Table1 --- col2

    Table2
    col3[[col3]]
    col4[[col4]]
    Table2 --- col3
    Table2 --- col4

    col1 -.-> col3
    col4 -.-> col2
    end

    subgraph This won't work
    Table3
    col5[[col1]]
    col6[[col2]]
    Table3 --- col5
    Table3 --- col6

    Table4
    col8[[col3]]
    col7[[col4]]
    Table4 --- col8
    Table4 --- col7

    col5 -.-> col8
    col8 -.-> col5
    end
```

## Data structure to store FK constraints in VSchema

Schema tracking will give us a list of foreign key constraints as part of the `SHOW CREATE TABLE` output. We want to store this
output in the `VSchema` struct in a form that gives us the best performance while planning.

We'll need to answer queries of the following sorts -
1. For a given table, find all the foreign key constraints where it is a child. (Needed for planning `INSERT`s)
2. For a given table, find all the foreign key constraints where it is a parent. (Needed for planning `DELETE's and `UPDATE`s)

The `VSchema` struct stores a map of `KeyspaceSchema` for each keyspace. Within a `KeyspaceSchema` we have a map of `Table`.
We'll store the foreign key constraints inside this `Table` struct.

We'll add 2 more fields to the `Table` struct -
```go
type Table struct {
	...
	ParentFKs []*ForeignKeyConstraint
	ChildFKs []*ForeingKeyConstraint
}
```
Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.

The `ForeignKeyConstraint` struct would look something like this -
```go
type ForeignKeyConstraint struct {
    ParentTable TableName
    ParentColumns []Columns
    ChildTable TableName
    ChildColumns []Columns
    OnDeleteAction int // This will be an enum
    OnUpdateAction int
}
```

## Performance Improvements
1. We can get MySQL to run the `INSERT`, `UPDATE/DELETE` (with Restrict) checks for us by using `FOREIGN_KEY_CHECKS=1` on the connection for unsharded and single-sharded cases.
2. For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and  vreplication operations like Reshard and MoveTables work.

# Phases
1. Minimal Viable Product
   1. Support for basic `INSERT`, `UPDATE` and `DELETE` statements for unsharded.
   3. `RESTRICT`/`NO ACTION`, `CASCADE`, `SET NULL` mode for foreign key constraints will be supported.
   4. Support for `ON DUPLICATE KEY UPDATE` in INSERTs for unsharded.
   5. Support for `REPLACE` for unsharded.
2. Phase two
   1. `INSERT/UPDATE/DELETE ... (SELECT)` (SELECT subquery in DMLs) for unsharded.
   2. Support for single-shard foreign key constraints.
4. Phase three
   1. Cross-shard support.

# Prerequisites
- [x] `INSERT` planning in Gen4. https://github.com/vitessio/vitess/pull/12934
- [x] Table Schema to be provided through GetSchema vttablet RPC - https://github.com/vitessio/vitess/pull/13197

# Tasks
- [x] Adding a keyspace configuration in the vschema for foreign keys and deprecate the flag on vtgate #13552
~- [ ] On the vttablet, we need FOREIGN_KEY_CHECKS=0.~
- [x] Augment schema-tracking to get foreign keys information and build the data structure in vtgate (in VSchema) #13544 #13657
- [x] Foreign key INSERT planning (shard scoped fk constraint). #13676
- [x] Foreign key UPDATE planning. (shard scoped RESTRICT constraint). #13762
- [x] Foreign key DELETE planning. (shard scoped RESTRICT constraint). #13746
- [x] Add end to end tests for foreign keys with online-ddl, move tables, vreplication and other components of Vitess.
- [x] Benchmarking of writes, for getting the performance impact. https://github.com/vitessio/vitess/pull/14542
- [x] Documentation + Release notes changes
- [x] Foreign key UPDATE planning. (cascade constraint). #13823
- [x] Foreign key DELETE planning. (cascade constraint). #13823

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Foreign Key Support in Vitess #12967

Introduction

Use Case

Scope of the project

Out of Scope

Schema

Design

Basic Design

Planning `INSERTs`

Example

Planning `UPDATEs` and `DELETEs`

`RESTRICT`/`NO ACTION` (default)

Example

`CASCADE`

`SET NULL`

`SET DEFAULT`

Planning `REPLACE`

Important Considerations

Data structure to store FK constraints in VSchema

Performance Improvements

Phases

Prerequisites

Tasks

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

RFC: Foreign Key Support in Vitess #12967

Description

Introduction

Use Case

Scope of the project

Out of Scope

Schema

Design

Basic Design

Planning INSERTs

Example

Planning UPDATEs and DELETEs

RESTRICT/NO ACTION (default)

Example

CASCADE

SET NULL

SET DEFAULT

Planning REPLACE

Important Considerations

Data structure to store FK constraints in VSchema

Performance Improvements

Phases

Prerequisites

Tasks

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Planning `INSERTs`

Planning `UPDATEs` and `DELETEs`

`RESTRICT`/`NO ACTION` (default)

`CASCADE`

`SET NULL`

`SET DEFAULT`

Planning `REPLACE`