Make queries friendly to prepared statements cache #317

Open: wants to merge 3 commits into base: master

dominiquelefevre
This PR fixes two scenarios where GORM quickly overflows pgx's cache of prepared statements and degrades into a mode where every DB request has to do two round-trips for Prepare() + ExecPrepared() instead of just ExecPrepared().

See commit messages for details about the scenarios and their fixes.

Postgres has two extensions to ANSI SQL. One can write `col = ANY(?)`
instead of `col IN (?)`, and `col != ALL(?)` instead of `col NOT IN (?)`.

Semantically, the two forms are the same. The difference between them
lies in their interaction with prepared statements:

 1. A condition .Where("col IN (?)", values) expands to `col IN ($1,$2,...)`
    where the list has len(values) items. Every values[i] is sent to Postgres
    as a separate query argument.
 2. A condition .Where("col = ANY(?)", values) expands to `col = ANY($1)`, and
    values are sent to Postgres as exactly one query argument (of an array type).

Option 1 does not interact well with prepared statements. It produces
a different query text for every distinct len(values), which quickly
overflows the cache of prepared statements. Option 2, on the other hand,
needs only one prepared statement for any len(values).
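
To make the difference concrete, here is a minimal sketch in plain Go that mimics the two expansions and counts how many distinct statement texts each one produces. The helper names (`expandIn`, `expandAny`, `distinctStatements`) are made up for illustration and are not part of GORM or pgx; the real expansion happens inside GORM's clause builder.

```go
package main

import (
	"fmt"
	"strings"
)

// expandIn mimics the expansion of "col IN (?)": one placeholder per value.
// Hypothetical helper, not GORM's actual code.
func expandIn(col string, n int) string {
	placeholders := make([]string, n)
	for i := range placeholders {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
	}
	return fmt.Sprintf("%s IN (%s)", col, strings.Join(placeholders, ","))
}

// expandAny mimics the expansion of "col = ANY(?)": always a single
// array-typed placeholder, regardless of how many values are passed.
func expandAny(col string) string {
	return col + " = ANY($1)"
}

// distinctStatements counts the distinct SQL texts generated for
// list lengths 1..maxLen. Each distinct text needs its own prepared statement.
func distinctStatements(maxLen int, gen func(n int) string) int {
	seen := make(map[string]struct{})
	for n := 1; n <= maxLen; n++ {
		seen[gen(n)] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println(expandIn("col", 3)) // col IN ($1,$2,$3)
	fmt.Println(expandAny("col"))   // col = ANY($1)

	// 100 different list lengths: 100 statements for IN, 1 for ANY.
	fmt.Println(distinctStatements(100, func(n int) string { return expandIn("col", n) }))
	fmt.Println(distinctStatements(100, func(int) string { return expandAny("col") }))
}
```

With list lengths from 1 to 100, the `IN` form yields 100 distinct statement texts while the `ANY` form yields one, which is why only the latter stays within a bounded statement cache.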

This patch does not seek to optimise all instances of `col IN (?)`
because GORM does not really parse SQL clauses. It handles two cases
that I believe to be the most frequent:
1. .Where("col IN (?)", values) and .Where("col NOT IN (?)", values),
2. .Where(map[string]any{"col": values}).
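
As a rough illustration of the first case, the sketch below rewrites a condition only when the whole string is exactly `col IN (?)` or `col NOT IN (?)`, and leaves anything more complex untouched. The function `rewriteInClause` and its regular expression are assumptions made for this example; the PR's actual matching logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// inClause matches a condition that consists solely of "<col> IN (?)" or
// "<col> NOT IN (?)". Anything else (extra AND/OR terms, literals) is
// deliberately not matched, since no real SQL parsing is attempted.
var inClause = regexp.MustCompile(`(?i)^\s*(\S+)\s+(NOT\s+)?IN\s*\(\s*\?\s*\)\s*$`)

// rewriteInClause is a hypothetical helper: it returns the ANY/ALL form and
// true when the condition matches, or the input unchanged and false otherwise.
func rewriteInClause(cond string) (string, bool) {
	m := inClause.FindStringSubmatch(cond)
	if m == nil {
		return cond, false
	}
	if m[2] != "" {
		return m[1] + " != ALL(?)", true // NOT IN becomes != ALL
	}
	return m[1] + " = ANY(?)", true // IN becomes = ANY
}

func main() {
	for _, cond := range []string{
		"col IN (?)",
		"col NOT IN (?)",
		"a IN (?) AND b = ?", // too complex for this sketch: left as-is
	} {
		out, ok := rewriteInClause(cond)
		fmt.Printf("%q -> %q (rewritten: %v)\n", cond, out, ok)
	}
}
```

Matching only the whole-string form keeps the rewrite conservative: a condition that is not recognised simply keeps its original `IN` expansion.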

An INSERT INTO that creates N rows is expanded into the following
query:

  INSERT INTO table(...) VALUES ($1,$2,...), ($K,$K+1,...),...
                                ------------------------------
                                           N tuples

This way, we get a unique query for every value of N, which quickly
overflows the cache of prepared statements. The fix is simple: do not
use prepared statements when doing bulk inserts.
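
The following sketch shows why every N yields a different statement text. `buildBulkInsert` is a hypothetical stand-in for GORM's query builder, and `shouldPrepare` is an assumed policy (prepare only single-row inserts) used to illustrate the idea; the exact condition used in the patch may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBulkInsert produces "INSERT INTO t(a,b) VALUES ($1,$2),($3,$4),..."
// with nRows tuples. Hypothetical helper, not GORM's actual builder.
func buildBulkInsert(table string, cols []string, nRows int) string {
	tuples := make([]string, nRows)
	arg := 1 // placeholders are numbered consecutively across all tuples
	for r := range tuples {
		placeholders := make([]string, len(cols))
		for c := range placeholders {
			placeholders[c] = fmt.Sprintf("$%d", arg)
			arg++
		}
		tuples[r] = "(" + strings.Join(placeholders, ",") + ")"
	}
	return fmt.Sprintf("INSERT INTO %s(%s) VALUES %s",
		table, strings.Join(cols, ","), strings.Join(tuples, ","))
}

// shouldPrepare is an assumed policy: only single-row inserts have a stable
// statement text, so only those are worth caching as prepared statements.
func shouldPrepare(nRows int) bool {
	return nRows <= 1
}

func main() {
	cols := []string{"a", "b"}
	for n := 1; n <= 3; n++ {
		fmt.Printf("N=%d prepare=%v: %s\n", n, shouldPrepare(n), buildBulkInsert("t", cols, n))
	}
}
```

Each value of N produces its own SQL text, so caching these statements only evicts entries that would otherwise be reused.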

Postgres has a better solution with its COPY protocol. It allows us
to use one prepared statement for N = 1 to insert multiple values.
Unfortunately, I have found no good way to use the COPY protocol
when an insert has an ON CONFLICT clause. Let us stick with a simpler
solution for now.