⚡ Optimized comment count subquery to avoid full table scan #26624
ref https://linear.app/ghost/issue/ONC-1509/

The `OR` condition between the `parent_id` and `in_reply_to_id` columns in the `direct_replies` count subquery defeats MySQL index usage, causing a full table scan (~77K rows) per comment row. Split into two separate indexed COUNT subqueries summed together, allowing MySQL to use the existing foreign key indexes.
Walkthrough

The comment direct-replies counting was changed from a single OR-based subquery to two correlated, indexed subqueries: r1 counts direct root replies where parent_id matches and in_reply_to_id IS NULL; r2 counts replies-to-child where in_reply_to_id matches. Both subqueries apply status filtering via duplicated parameterized placeholders, and their results are summed.
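The rewrite described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `comments` table name, the helper function names, the exact shape of the original OR predicate, and the contents of `excludedCommentStatuses` are all assumptions made for the example. Only the `r1`/`r2` conditions and the duplicated placeholders come from the description above.

```javascript
// Hypothetical sketch. Table name, helper names, and status values are assumed.
const excludedCommentStatuses = ['hidden', 'deleted'];

// Build one "?" placeholder per value for a parameterized NOT IN list.
function placeholders(values) {
    return values.map(() => '?').join(', ');
}

// Before (assumed shape): one correlated subquery whose OR spans two columns.
function directRepliesWithOr(excluded) {
    const sql = `(
        SELECT COUNT(*) FROM comments r
        WHERE ((r.parent_id = comments.id AND r.in_reply_to_id IS NULL)
            OR r.in_reply_to_id = comments.id)
        AND r.status NOT IN (${placeholders(excluded)})
    )`;
    return {sql, bindings: [...excluded]};
}

// After: two correlated subqueries, each matching on a single column,
// with their counts added together.
function directRepliesSplit(excluded) {
    const sql = `(
        (SELECT COUNT(*) FROM comments r1
         WHERE r1.parent_id = comments.id
         AND r1.in_reply_to_id IS NULL
         AND r1.status NOT IN (${placeholders(excluded)}))
        +
        (SELECT COUNT(*) FROM comments r2
         WHERE r2.in_reply_to_id = comments.id
         AND r2.status NOT IN (${placeholders(excluded)}))
    )`;
    // The status list appears twice in the SQL, so its bindings are duplicated.
    return {sql, bindings: [...excluded, ...excluded]};
}

const before = directRepliesWithOr(excludedCommentStatuses);
const after = directRepliesSplit(excludedCommentStatuses);
console.log(before.bindings.length, after.bindings.length);
```

The two branches cannot both match the same row (r1 requires `in_reply_to_id IS NULL`, r2 requires it to equal the comment id), so adding the two counts should give the same total as the single OR-based count.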
EXPLAIN before/after (15K comments, dev dataset)

Before: the OR condition defeats index usage, causing a full table scan per comment.

After: two separate indexed subqueries.

~730x fewer rows scanned per comment (15,369 → ~21). For a page load on the busiest post (40 comment objects needing [...]

On the reported production dataset (77K comments), this would reduce from ~8.9M row scans per request to ~2,400.
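As a back-of-envelope check of the figures quoted above, the arithmetic can be sketched like this. The variable and function names are illustrative, and the sketch assumes each comment object triggers exactly one count subquery, which the source does not state explicitly.

```javascript
// Figures quoted from the EXPLAIN comparison; names are illustrative only.
const rowsPerCommentBefore = 15369; // full table scan on the dev dataset
const rowsPerCommentAfter = 21;     // approximate, indexed lookups

// Reduction factor per comment row.
const reduction = rowsPerCommentBefore / rowsPerCommentAfter;

// Assumption: one count subquery per comment object in the response.
function rowsScannedPerRequest(commentObjects, rowsPerComment) {
    return commentObjects * rowsPerComment;
}

// Busiest post on the dev dataset: 40 comment objects.
const busiestPostBefore = rowsScannedPerRequest(40, rowsPerCommentBefore); // 614760
const busiestPostAfter = rowsScannedPerRequest(40, rowsPerCommentAfter);   // 840

// Production: ~8.9M scans over a ~77K-row table implies roughly this many
// comment objects per request, assuming a full scan reads every row.
const impliedCommentObjects = 8.9e6 / 77e3;

console.log(Math.round(reduction), busiestPostBefore, busiestPostAfter);
```

Under the same assumption, the implied production request size (a bit over a hundred comment objects) multiplied by ~21 rows is consistent with the quoted ~2,400 figure.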
EvanHahn left a comment:
Love having to work around the query planner...
Co-authored-by: Evan Hahn <evan@ghost.org>
It's a harsh mistress.
🧹 Nitpick comments (1)
ghost/core/core/server/models/comment.js (1)
266-282: Please add/confirm a regression test for `count.direct_replies` vs `count.replies` semantics.

Given this logic rewrite, it would be good to lock behavior with a case that includes descendants and mixed statuses, asserting direct-only counts remain distinct from total descendant counts.

Based on learnings: In Ghost comments API, `count.replies` is the backward-compatible total-descendants alias, while `count.direct_replies` is direct-only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ghost/core/core/server/models/comment.js` around lines 266 - 282, Add a regression test that verifies count.direct_replies returns only immediate child comments while count.replies (the backward-compatible alias) returns total descendant counts; create test data with a comment tree (parent -> child -> grandchild) and mixed statuses (hidden, deleted, published) and assert count__direct_replies (as produced by the direct_replies query/alias) equals only the number of immediate children while count.replies equals all descendants excluding statuses per the excludedCommentStatuses logic used in direct_replies; place the test near existing comment model/query tests and reference the direct_replies behavior (count__direct_replies) and the legacy count.replies alias to lock expected semantics.
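The regression case suggested above could be modeled in memory along these lines. This is a sketch under stated assumptions, not a test from the Ghost codebase: it assumes every reply stores the thread root in `parent_id`, that `in_reply_to_id` is set only when a reply answers another reply, and that `hidden` and `deleted` are the excluded statuses. All function names and fixture ids are hypothetical.

```javascript
// Hypothetical fixture: root -> child -> grandchild, with mixed statuses.
const excluded = new Set(['hidden', 'deleted']);

const comments = [
    {id: 'root', parent_id: null, in_reply_to_id: null, status: 'published'},
    {id: 'child', parent_id: 'root', in_reply_to_id: null, status: 'published'},
    {id: 'child-hidden', parent_id: 'root', in_reply_to_id: null, status: 'hidden'},
    {id: 'grandchild', parent_id: 'root', in_reply_to_id: 'child', status: 'published'},
    {id: 'grandchild-deleted', parent_id: 'root', in_reply_to_id: 'child', status: 'deleted'}
];

const visible = c => !excluded.has(c.status);

// Single OR-based predicate (assumed shape of the original condition).
function countDirectOr(id) {
    return comments.filter(c => visible(c) &&
        ((c.parent_id === id && c.in_reply_to_id === null) || c.in_reply_to_id === id)
    ).length;
}

// Split form: r1 (direct root replies) plus r2 (replies to a reply).
function countDirectSplit(id) {
    const r1 = comments.filter(c => visible(c) && c.parent_id === id && c.in_reply_to_id === null).length;
    const r2 = comments.filter(c => visible(c) && c.in_reply_to_id === id).length;
    return r1 + r2;
}

// Total descendants of a thread root (assumed meaning of count.replies).
function countTotalReplies(id) {
    return comments.filter(c => visible(c) && c.parent_id === id).length;
}

console.log(countDirectSplit('root'), countTotalReplies('root'), countDirectSplit('child'));
```

In this fixture the root has one visible direct reply but two visible descendants, so an assertion that the two counts differ would catch a regression that conflates them. A real test would run the same checks against the model query rather than an array.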
ref https://linear.app/ghost/issue/ONC-1509/

The `OR` condition between `parent_id` and `in_reply_to_id` columns in the `direct_replies` count subquery defeats MySQL index usage, causing a full table scan per comment row. Split into two separate indexed COUNT subqueries summed together, allowing MySQL to use the existing foreign key indexes.

This could easily end up reading several million rows for a single query to get the top 20 comments+replies.