Skip to content

⚡ Optimized comment count subquery to avoid full table scan#26624

Merged
9larsons merged 2 commits intomainfrom
comments-perf-fix
Feb 26, 2026
Merged

⚡ Optimized comment count subquery to avoid full table scan#26624
9larsons merged 2 commits intomainfrom
comments-perf-fix

Conversation

@9larsons
Copy link
Contributor

@9larsons 9larsons commented Feb 26, 2026

ref https://linear.app/ghost/issue/ONC-1509/

The OR condition between parent_id and in_reply_to_id columns in the direct_replies count subquery defeats MySQL index usage, causing a full table scan per comment row. Split into two separate indexed COUNT subqueries summed together, allowing MySQL to use the existing foreign key indexes.

This could easily end up reading several million rows for a single query to get the top 20 comments+replies.

ref https://linear.app/ghost/issue/ONC-1509/

The OR condition between parent_id and in_reply_to_id columns in the
direct_replies count subquery defeats MySQL index usage, causing a full
table scan (~77K rows) per comment row. Split into two separate indexed
COUNT subqueries summed together, allowing MySQL to use the existing
foreign key indexes.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

The comment direct-replies counting was changed from a single OR-based subquery to two correlated, indexed subqueries: r1 counts direct root replies where parent_id matches and in_reply_to_id IS NULL; r2 counts replies-to-child where in_reply_to_id matches. Both subqueries apply status filtering via duplicated parameterized placeholders and their results are summed into count__direct_replies. A comment notes indexing and avoidance of the OR path. No public exports or method signatures were changed; only the internal countRelations.direct_replies implementation was modified.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: optimizing a comment count subquery to avoid full table scans by splitting an OR-based query into separate indexed subqueries.
Description check ✅ Passed The description is clearly related to the changeset, explaining the performance issue, the solution approach, and referencing a specific issue tracker link.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch comments-perf-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@9larsons 9larsons added affects:perf ok to merge for me You can merge this on my behalf if you want. labels Feb 26, 2026
@9larsons 9larsons changed the title ⚡ Optimized count__direct_replies query to avoid full table scan ⚡ Optimized comment count subquery to avoid full table scan Feb 26, 2026
@9larsons
Copy link
Contributor Author

EXPLAIN before/after (15K comments, dev dataset)

Before — OR condition defeats index usage, full table scan per comment:

table type key rows Extra
replies ALL NULL 15,369 Range checked for each record

After — two separate indexed subqueries:

table type key rows Extra
r1 ref comments_parent_id_foreign 17 Using where
r2 ref comments_in_reply_to_id_foreign 4 Using where

~730x fewer rows scanned per comment (15,369 → ~21).

For a page load on the busiest post (40 comment objects needing count__direct_replies): 614,760 → 840 total rows scanned. Both /members/api/comments/post/:id/ and /ghost/api/admin/comments/post/:id/ go through the same model code path.

On the reported production dataset (77K comments), this would reduce from ~8.9M row scans per request to ~2,400.

Copy link
Contributor

@EvanHahn EvanHahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love having to work around the query planner...

Co-authored-by: Evan Hahn <evan@ghost.org>
@9larsons
Copy link
Contributor Author

It's a harsh mistress.

@9larsons 9larsons enabled auto-merge (squash) February 26, 2026 20:17
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ghost/core/core/server/models/comment.js (1)

266-282: Please add/confirm a regression test for count.direct_replies vs count.replies semantics.

Given this logic rewrite, it would be good to lock behavior with a case that includes descendants and mixed statuses, asserting direct-only counts remain distinct from total descendant counts.

Based on learnings: In Ghost comments API, count.replies is the backward-compatible total-descendants alias, while count.direct_replies is direct-only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ghost/core/core/server/models/comment.js` around lines 266 - 282, Add a
regression test that verifies count.direct_replies returns only immediate child
comments while count.replies (the backward-compatible alias) returns total
descendant counts; create test data with a comment tree (parent -> child ->
grandchild) and mixed statuses (hidden, deleted, published) and assert
count__direct_replies (as produced by the direct_replies query/alias) equals
only the number of immediate children while count.replies equals all descendants
excluding statuses per the excludedCommentStatuses logic used in direct_replies;
place the test near existing comment model/query tests and reference the
direct_replies behavior (count__direct_replies) and the legacy count.replies
alias to lock expected semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ghost/core/core/server/models/comment.js`:
- Around line 266-282: Add a regression test that verifies count.direct_replies
returns only immediate child comments while count.replies (the
backward-compatible alias) returns total descendant counts; create test data
with a comment tree (parent -> child -> grandchild) and mixed statuses (hidden,
deleted, published) and assert count__direct_replies (as produced by the
direct_replies query/alias) equals only the number of immediate children while
count.replies equals all descendants excluding statuses per the
excludedCommentStatuses logic used in direct_replies; place the test near
existing comment model/query tests and reference the direct_replies behavior
(count__direct_replies) and the legacy count.replies alias to lock expected
semantics.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0991ddb and bef5455.

📒 Files selected for processing (1)
  • ghost/core/core/server/models/comment.js

@9larsons 9larsons merged commit 156a662 into main Feb 26, 2026
32 checks passed
@9larsons 9larsons deleted the comments-perf-fix branch February 26, 2026 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

affects:perf ok to merge for me You can merge this on my behalf if you want.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants