feat: implement hash join #156

Open
RichardKnop wants to merge 1 commit into main from feat/hash-join

@RichardKnop
Owner

Nested loop join (what minisql does today, with index on the join column):

  • Complexity: O(N × log M) with an index, O(N × M) without
  • Best for: small tables, OR when there's a useful index on the inner table's join column
  • Memory: essentially O(1)
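
To make the two nested loop costs concrete, here is a minimal sketch of both variants. It is not minisql's code: `Row`, `Pair`, and both function names are hypothetical, and a binary search over a sorted slice stands in for a real index lookup on the inner table.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical minimal row: a join key plus a payload.
type Row struct {
	Key   int
	Value string
}

// Pair is one joined output row.
type Pair struct {
	Outer, Inner Row
}

// nestedLoopJoin compares every outer row with every inner row:
// O(N × M) comparisons, no extra memory beyond the output.
func nestedLoopJoin(outer, inner []Row) []Pair {
	var out []Pair
	for _, o := range outer {
		for _, i := range inner {
			if o.Key == i.Key {
				out = append(out, Pair{Outer: o, Inner: i})
			}
		}
	}
	return out
}

// indexedNestedLoopJoin assumes sortedInner is sorted by Key.
// The binary search is a stand-in for an index lookup, giving
// O(N × log M) lookups plus the cost of emitting matches.
func indexedNestedLoopJoin(outer, sortedInner []Row) []Pair {
	var out []Pair
	for _, o := range outer {
		// Find the first inner row whose key is >= the outer key.
		j := sort.Search(len(sortedInner), func(k int) bool {
			return sortedInner[k].Key >= o.Key
		})
		// Emit every inner row that shares the key.
		for ; j < len(sortedInner) && sortedInner[j].Key == o.Key; j++ {
			out = append(out, Pair{Outer: o, Inner: sortedInner[j]})
		}
	}
	return out
}

func main() {
	outer := []Row{{1, "a"}, {2, "b"}, {3, "c"}}
	inner := []Row{{2, "x"}, {3, "y"}, {3, "z"}} // already sorted by Key
	fmt.Println(len(nestedLoopJoin(outer, inner)))
	fmt.Println(len(indexedNestedLoopJoin(outer, inner)))
}
```

Both functions return the same pairs for the same input; only the amount of work per outer row differs.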

Hash join (equi-join only):

  • Complexity: O(N + M) — linear
  • Best for: large tables with no useful join-column index, where you'd otherwise pay O(N × M)
  • Memory: O(min(N, M)) — must materialise the smaller ("build") table into a hash map
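
A rough sketch of the build and probe phases, assuming a single integer equi-join key. The names `Row`, `Pair`, and `hashJoin` are illustrative only and do not come from this PR's diff.

```go
package main

import "fmt"

// Row is a hypothetical minimal row: a join key plus a payload.
type Row struct {
	Key   int
	Value string
}

// Pair is one joined output row, always in (left, right) order.
type Pair struct {
	Left, Right Row
}

// hashJoin performs an in-memory equi-join on Key.
// Build phase: materialise the smaller input into a map, O(min(N, M)) memory.
// Probe phase: stream the larger input and look up each key.
// Total work is O(N + M) plus the number of output rows.
func hashJoin(left, right []Row) []Pair {
	build, probe := left, right
	swapped := false
	if len(right) < len(left) {
		build, probe = right, left
		swapped = true
	}

	// Build: group build-side rows by key (keys may repeat).
	table := make(map[int][]Row, len(build))
	for _, b := range build {
		table[b.Key] = append(table[b.Key], b)
	}

	// Probe: one map lookup per probe-side row.
	var out []Pair
	for _, p := range probe {
		for _, b := range table[p.Key] {
			if swapped {
				out = append(out, Pair{Left: p, Right: b})
			} else {
				out = append(out, Pair{Left: b, Right: p})
			}
		}
	}
	return out
}

func main() {
	left := []Row{{1, "a"}, {2, "b"}, {3, "c"}}
	right := []Row{{3, "x"}, {2, "y"}}
	for _, p := range hashJoin(left, right) {
		fmt.Println(p.Left.Value, p.Right.Value)
	}
}
```

The map lookup only works for equality predicates, which is why hash join is limited to equi-joins.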

So hash join is faster than unindexed nested loop on large tables, but it costs memory. The 64MB threshold doesn't mean "switch back to nested loop because it's better there" — it means "beyond this size, the build-side hash table may not fit in RAM, so in-memory hash join is no longer safe to use." A production DB would do grace hash join (spill to disk) instead; for minisql it's reasonable to just fall back to nested loop.

Corrected strategy for minisql:

| Condition | Plan |
| --- | --- |
| Join column has an index | Indexed nested loop (current behaviour) |
| No index, build side ≤ threshold | In-memory hash join, O(N+M) |
| No index, build side > threshold | Nested loop sequential, O(N×M), slow but no memory risk |
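
The decision above could be sketched as a small planner function. Everything here is an assumption for illustration: the `JoinPlan` type, `chooseJoinPlan`, the idea that the planner has a byte estimate for the build side, and reading the 64MB threshold as 64 MiB. None of these names are taken from minisql.

```go
package main

import "fmt"

// JoinPlan enumerates the three strategies discussed above (hypothetical type).
type JoinPlan int

const (
	IndexedNestedLoop JoinPlan = iota
	InMemoryHashJoin
	SequentialNestedLoop
)

func (p JoinPlan) String() string {
	switch p {
	case IndexedNestedLoop:
		return "indexed nested loop"
	case InMemoryHashJoin:
		return "in-memory hash join"
	default:
		return "sequential nested loop"
	}
}

// hashJoinThresholdBytes is the memory guard for the build side.
// Assumption: the 64MB threshold is interpreted as 64 MiB.
const hashJoinThresholdBytes = 64 << 20

// chooseJoinPlan applies the conditions in order:
// an index wins, then hash join if the build side fits under the
// threshold, otherwise the slow but memory-safe nested loop.
func chooseJoinPlan(hasJoinIndex bool, buildSideBytes int64) JoinPlan {
	switch {
	case hasJoinIndex:
		return IndexedNestedLoop
	case buildSideBytes <= hashJoinThresholdBytes:
		return InMemoryHashJoin
	default:
		return SequentialNestedLoop
	}
}

func main() {
	fmt.Println(chooseJoinPlan(true, 1<<30))  // index available
	fmt.Println(chooseJoinPlan(false, 8<<20)) // no index, small build side
	fmt.Println(chooseJoinPlan(false, 1<<30)) // no index, build side too large
}
```

How the build-side size is estimated (row count times average row width, page count, or something else) is left open here.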

The threshold exists to protect memory, not because nested loop is algorithmically better at large scale; without an index it is strictly worse. So the plan should be inverted: use hash join when a full scan is needed and the tables are large enough to justify the memory cost, and keep nested loop for small tables or indexed joins.

@RichardKnop RichardKnop self-assigned this May 6, 2026
@github-actions

github-actions Bot commented May 6, 2026

Code Coverage

Total: 70.2% (threshold: 70%)

| Package | Coverage |
| --- | --- |
| github.com/RichardKnop/minisql | 80.7% |
| github.com/RichardKnop/minisql/e2e_tests | [no |
| github.com/RichardKnop/minisql/internal/minisql | 69.5% |
| github.com/RichardKnop/minisql/internal/parser | 84.2% |
| github.com/RichardKnop/minisql/pkg/bitwise | 100.0% |
| github.com/RichardKnop/minisql/pkg/lrucache | 81.7% |
