Improve ROM scan speed with two-phase pipeline and IGDB optimizations by Praeses0 · Pull Request #3165 · rommapp/romm

Praeses0 · 2026-03-23T20:00:41Z

Summary

Split scan into discovery → enrichment phases: ROMs appear in the library within seconds (~14 roms/s), metadata fills in progressively afterward. Previously, each ROM blocked on IGDB API calls before appearing.
Fix IGDB rate limiter lock-during-sleep bug: The token bucket was sleeping while holding the asyncio lock, serializing all concurrent IGDB requests to ~1 req/s instead of the intended 4 req/s.
Reduce IGDB API calls per ROM: Merge the two-phase filtered/unfiltered search into a single API call with local game_type preference. Eliminates redundant expanded searches. Worst case drops from 6 to 3 calls per ROM.
Add per-scan search term dedup cache: Avoids duplicate IGDB API calls when multiple ROMs normalize to the same search term (e.g. regional variants).
Show scan phase in UI: Frontend now displays "Discovering" vs "Fetching metadata" phase indicator in both the scan page and admin task progress panel.
Increase default SCAN_WORKERS from 1 to 10: Better I/O overlap between hash computation, DB writes, and metadata API calls.

Benchmark Results

Tested with 1,469 ROMs (1,100 Game Boy + 368 Game Gear + 1 Switch) using IGDB metadata:

Metric	Before	After	Improvement
ROMs visible in library	~29 min	36 seconds	~50x
End-to-end scan time	~29 min	~9.5 min	~3x
IGDB metadata throughput	0.83 roms/s	2.72 roms/s	3.3x

The hard ceiling is IGDB's 4 req/s rate limit; these optimizations maximize throughput within that constraint.

Test plan

All 714 tests pass (713 existing + 1 new); the only failure is the pre-existing Docker root-user CHD test
New regression test for HASHES scan type persisting ROM hash fields
New unit tests for IGDB search cache semantics (positive hit cached, failures not cached, cache clear)
Benchmarked on 51-ROM and 1,469-ROM sets
trunk fmt and trunk check pass
Verified scan UI shows phase indicator during both discovery and enrichment

AI Disclosure

This PR was authored with the assistance of Claude Code (AI).

- Increase default SCAN_WORKERS from 1 to 5, allowing multiple ROMs to be scanned concurrently via the existing asyncio semaphore
- Add IGDB API rate limiter (4 req/s token bucket) to prevent 429 responses that cause expensive 2-second retry penalties
- Parallelize cover, manual, and screenshot downloads using asyncio.gather instead of sequential awaits

Benchmark results (51 ROMs, quick scan + IGDB):
- Before: 0.83 roms/s, 61.57s total
- After: 2.80 roms/s, 18.23s total (3.4x faster)

AI-assisted: Claude Code
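
A minimal sketch of the sequential-to-parallel download change described in this commit, using `asyncio.gather`. `download_asset` is a hypothetical stand-in that only sleeps; RomM's real cover, manual, and screenshot helpers are not shown on this page.

```python
import asyncio


async def download_asset(kind: str, url: str, delay: float) -> str:
    """Hypothetical stand-in for a cover/manual/screenshot download."""
    await asyncio.sleep(delay)  # simulate network I/O
    return f"{kind}:{url}"


async def download_assets_sequential(urls: dict[str, str], delay: float) -> list[str]:
    # Old shape: each await must finish before the next download starts.
    results = []
    for kind, url in urls.items():
        results.append(await download_asset(kind, url, delay))
    return results


async def download_assets_parallel(urls: dict[str, str], delay: float) -> list[str]:
    # New shape: start every download, then wait for all of them together.
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(
        *(download_asset(kind, url, delay) for kind, url in urls.items())
    )
```

Because `gather` preserves argument order, callers that unpack cover, manual, and screenshot results positionally keep working. By default `gather` propagates the first exception raised; `return_exceptions=True` would instead return exceptions alongside the successful results.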

Fix rate limiter lock-during-sleep bug that serialized concurrent IGDB requests. The acquire() method was sleeping while holding the asyncio lock, blocking all other coroutines from proceeding. It now sleeps outside the lock, enabling true 4 req/s throughput with concurrent workers.

Refactor _search_rom to eliminate redundant IGDB API calls. The old pattern called _search_rom twice (with and without the game_type filter), each call making up to 3 API calls, including an identical expanded-search fallback. The new approach chains: search with filter -> search without filter -> expanded search (once), reducing the worst case from 6 to 4 calls per ROM.

Increase SCAN_WORKERS default from 5 to 10 for better I/O overlap between hash computation, DB writes, and metadata API calls.

Benchmark results (51 ROMs, IGDB metadata):
- Before: 0.83 roms/s (61.57s total)
- After: 3.74 roms/s (13.63s total)
- Speedup: 4.5x

AI-assisted: Claude Code

Adds offset parameter to _request() and list_games() methods in IGDBService, enabling paginated queries through the IGDB Apicalypse API. This is standard IGDB API functionality that was missing from the adapter. AI-assisted: Claude Code
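
A sketch of how an `offset` parameter enables paging through an Apicalypse query. `build_games_query` and `fetch_all` are hypothetical helpers, not the `_request()`/`list_games()` signatures from the PR; the page size of 500 assumes IGDB's documented maximum `limit`, and double quotes are simply stripped from the search term rather than escaped.

```python
from collections.abc import Callable


def build_games_query(search: str, fields: list[str], limit: int, offset: int) -> str:
    """Build an Apicalypse query body with limit/offset paging."""
    term = search.replace('"', "")  # assumption: drop quotes instead of escaping
    return (
        f"fields {','.join(fields)}; "
        f'search "{term}"; '
        f"limit {limit}; "
        f"offset {offset};"
    )


def fetch_all(
    fetch_page: Callable[[str], list[dict]],
    search: str,
    fields: list[str],
    page_size: int = 500,
) -> list[dict]:
    """Request successive pages until a short page signals the end.

    `fetch_page` is a hypothetical callable that sends the query and
    returns the decoded JSON list.
    """
    results: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(build_games_query(search, fields, page_size, offset))
        results.extend(page)
        if len(page) < page_size:
            return results
        offset += page_size
```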

…ference

Previously _search_rom made two separate API calls: first with a game_type filter (main games only), then without it if no match was found. This wastes an API call for every ROM where the filtered search fails. Now a single unfiltered search is made, and the results are split locally into main game types and other types. Main games are tried first, falling back to DLC/bundle/mod types only if no main game matches. This preserves the same matching priority while saving ~25% of API calls.

Also adds:
- Per-scan search term dedup cache to avoid duplicate API calls when multiple ROMs normalize to the same search term
- game_type field to GAMES_FIELDS for local type filtering

Benchmark (1469 ROMs, IGDB metadata):
- Before: 2.24 roms/s (prior commit)
- After: 2.72 roms/s
- Improvement: ~21%

Combined improvement over baseline (0.83 roms/s):
- 51 ROMs: ~4.5x faster
- 1469 ROMs: ~3.3x faster (539s vs ~1770s estimated baseline)

AI-assisted: Claude Code
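
A sketch of the single-search, local-preference matching described above. `pick_match`, the `matches` predicate, and the `main_types` set are hypothetical; RomM's actual name matching and the IGDB `game_type` ids it treats as main games are not shown here.

```python
from collections.abc import Callable, Iterable
from typing import Optional

Game = dict  # a decoded IGDB game object, e.g. {"name": ..., "game_type": ...}


def pick_match(
    results: Iterable[Game],
    main_types: set[int],
    matches: Callable[[Game], bool],
) -> Optional[Game]:
    """Prefer main-game results, then fall back to the rest, from ONE result set."""
    main: list[Game] = []
    other: list[Game] = []
    for game in results:
        (main if game.get("game_type") in main_types else other).append(game)
    # Same priority as the old filtered-then-unfiltered pair of API calls.
    for bucket in (main, other):
        for game in bucket:
            if matches(game):
                return game
    return None
```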

Refactor _identify_rom into two phases:
- Phase 1 (discovery): Create DB entries, hash files, save to DB. No metadata API calls. Runs at ~14 roms/s, so all ROMs appear in the library within seconds.
- Phase 2 (enrichment): Fetch metadata from IGDB/other sources, download covers/screenshots. Rate-limited by the IGDB API at ~2.3 roms/s.

This improves perceived scan speed significantly: users see their entire ROM library immediately instead of waiting for metadata to load one ROM at a time. Metadata fills in progressively afterward.

AI-assisted: Claude Code
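
The two-phase split can be sketched with two `asyncio.gather` passes that share one semaphore, so every ROM finishes discovery before any enrichment starts. `Rom`, `discover`, `enrich`, and `scan` are hypothetical stand-ins for `_discover_rom` and the enrichment step, with sleeps in place of hashing, DB writes, and IGDB calls.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Rom:
    name: str
    discovered: bool = False
    metadata: dict = field(default_factory=dict)


async def discover(rom: Rom) -> None:
    # Phase 1: filesystem + DB work only (hashing, inserting rows). No API calls.
    await asyncio.sleep(0)
    rom.discovered = True


async def enrich(rom: Rom) -> None:
    # Phase 2: metadata lookups and asset downloads (rate-limited upstream).
    await asyncio.sleep(0.01)
    rom.metadata = {"title": rom.name.title()}


async def scan(roms: list[Rom], workers: int) -> list[str]:
    events: list[str] = []
    sem = asyncio.Semaphore(workers)

    async def bounded(step, rom: Rom, label: str) -> None:
        async with sem:
            await step(rom)
            events.append(f"{label}:{rom.name}")

    # All ROMs become visible before the first metadata request is made.
    await asyncio.gather(*(bounded(discover, r, "discovered") for r in roms))
    await asyncio.gather(*(bounded(enrich, r, "enriched") for r in roms))
    return events
```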

- Add scan_phase field ("discovering" / "enriching") to ScanStats so the frontend and benchmark tool can show which phase is active
- Improve phase transition log messages with platform name and ROM count
- Move IGDB search cache from class variable to instance variable
- Clear search cache once at the start of each scan (alongside the gamelist cache), not per platform; the cache remains beneficial within a scan for deduplicating regional variant searches

AI-assisted: Claude Code
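
A sketch of a stats object carrying the new `scan_phase` field. Only `scan_phase` and its two values come from this commit; the dataclass shape, the counter fields, and `to_payload()` are assumptions about how the stats might be serialized for the socket emit.

```python
from dataclasses import asdict, dataclass
from typing import Literal

ScanPhase = Literal["discovering", "enriching"]


@dataclass
class ScanStats:
    # Hypothetical counters; only scan_phase is described in the commit.
    total_roms: int = 0
    scanned_roms: int = 0
    scan_phase: ScanPhase = "discovering"

    def start_enrichment(self) -> None:
        self.scan_phase = "enriching"

    def to_payload(self) -> dict:
        # Plain dict suitable for sending to the frontend.
        return asdict(self)
```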

Add visual phase indicator to both the scan page and admin task progress panel, showing "Discovering" (orange) during the fast filesystem discovery phase and "Fetching metadata" (blue) during the IGDB/metadata enrichment phase.

Changes:
- Extend ScanStats type in scanning store with scan_phase field
- Add phase chip with icon to Scan.vue sticky bottom stats bar
- Add phase label to ScanTaskProgress.vue admin panel
- Add i18n keys for phase labels (en_US, en_GB)

AI-assisted: Claude Code

High: Fix HASHES scan type not persisting ROM-level hash fields. The two-phase split returned early before scan_rom() could write crc_hash/md5_hash/sha1_hash/ra_hash/fs_size_bytes to the ROM record. Now _discover_rom explicitly persists these fields via update_rom() before the HASHES early return.

Medium: Fix search cache memoizing API failures as "no match". Only positive matches are now cached. Transient IGDB errors (timeouts, 5xx responses) no longer suppress all subsequent ROMs with the same search term for the rest of the scan.

Low: Wire i18n into ScanTaskProgress.vue admin panel. The phase labels were hardcoded in English; now they use t("scan.phase-discovering") and t("scan.phase-enriching") like the main scan page.

Tests: Add regression test for HASHES scan persisting ROM hash fields, and unit tests for IGDB search cache semantics (positive cache hit, negative result not cached, cache clear).

AI-assisted: Claude Code
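
The medium-severity fix amounts to memoizing hits only. A minimal sketch, assuming a hypothetical async `search` callable that returns a match dict, `None` for no match, or raises on transport errors; `SearchCache` is not the PR's class.

```python
from collections.abc import Awaitable, Callable
from typing import Optional


class SearchCache:
    """Per-scan memo of search term -> match that only remembers hits."""

    def __init__(self) -> None:
        self._hits: dict[str, dict] = {}

    def clear(self) -> None:
        # Intended to run once at the start of each scan.
        self._hits.clear()

    async def lookup(
        self, term: str, search: Callable[[str], Awaitable[Optional[dict]]]
    ) -> Optional[dict]:
        if term in self._hits:
            return self._hits[term]
        result = await search(term)  # may raise on timeouts / 5xx
        if result is not None:
            # Cache positive matches only; misses and errors are retried next time.
            self._hits[term] = result
        return result
```

Because nothing is stored on `None` or on an exception, a timeout for one regional variant does not block later ROMs with the same term. Two concurrent lookups of the same uncached term can still both reach the API; avoiding that would need per-term in-flight futures.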

Copilot

Pull request overview

This PR improves perceived and end-to-end ROM scan performance by splitting scanning into a fast “discovery” phase (DB + hashes) followed by an async “enrichment” phase (metadata + assets), while also optimizing IGDB request throughput and de-duplicating IGDB searches. It also surfaces scan phase in the UI and increases concurrency defaults.

Changes:

Backend: introduce discovery→enrichment scan pipeline, add scan phase to stats, improve IGDB search efficiency + per-scan dedup cache, and fix IGDB rate limiting concurrency.
Frontend: display “Discovering” vs “Fetching metadata” phase indicator in scan UI and admin task progress.
Config/tests: bump default scan worker concurrency and add regression/unit tests for hashes persistence and IGDB search cache behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

File	Description
frontend/src/views/Scan.vue	Shows scan phase chip in the scan stats footer.
frontend/src/stores/scanning.ts	Extends scan stats type to include `scan_phase` from backend.
frontend/src/locales/en_US/scan.json	Adds English strings for the new scan phase labels.
frontend/src/locales/en_GB/scan.json	Adds British English strings for the new scan phase labels.
frontend/src/components/Settings/Administration/tasks/ScanTaskProgress.vue	Adds scan phase chip to admin task progress UI with i18n labels.
backend/tests/handler/metadata/test_igdb_handler.py	New unit tests for IGDB per-scan search cache semantics.
backend/tests/endpoints/sockets/test_scan.py	New regression test ensuring HASHES scan persists ROM-level hash fields.
backend/handler/metadata/igdb_handler.py	Adds per-scan search dedup cache and reduces IGDB calls by preferring game types locally.
backend/endpoints/sockets/scan.py	Implements two-phase scanning, adds `scan_phase` to stats, and clears IGDB search cache per scan.
backend/config/__init__.py	Increases default `SCAN_WORKERS` from 1 to 10.
backend/adapters/services/igdb.py	Adds a token-bucket rate limiter and supports `offset` in IGDB requests.

Comments suppressed due to low confidence (1)

backend/endpoints/sockets/scan.py:299

In _discover_rom(), calculate_hashes is derived from SKIP_HASH_CALCULATION even for ScanType.HASHES (line 293). If a user sets SKIP_HASH_CALCULATION=true, a HASHES scan will still proceed but fs_rom_handler.get_rom_files(..., calculate_hashes=False) will not compute hashes, so the “recalculate hashes” scan type can’t fulfill its purpose. Consider forcing calculate_hashes=True when scan_type == ScanType.HASHES, or explicitly disabling HASHES scans when hashes are globally skipped.

    if should_update_files:
        calculate_hashes = not cm.get_config().SKIP_HASH_CALCULATION
        if calculate_hashes:
            log.debug(f"Calculating file hashes for {rom.fs_name}...")

        parsed_rom_files = await fs_rom_handler.get_rom_files(
            rom, calculate_hashes=calculate_hashes
        )
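
One way to apply Copilot's first suggestion is to let the scan type override the global setting. `ScanType` below is a reduced, illustrative enum and `should_calculate_hashes` is a hypothetical helper; in `_discover_rom` its result would replace the `not cm.get_config().SKIP_HASH_CALCULATION` expression.

```python
from enum import Enum


class ScanType(Enum):
    # Illustrative subset; RomM defines its own ScanType enum.
    QUICK = "quick"
    HASHES = "hashes"


def should_calculate_hashes(scan_type: ScanType, skip_hash_calculation: bool) -> bool:
    """A HASHES scan always hashes; other scans honour SKIP_HASH_CALCULATION."""
    if scan_type is ScanType.HASHES:
        return True
    return not skip_hash_calculation
```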

Copilot · 2026-03-23T20:37:43Z

frontend/src/locales/en_US/scan.json

+  "phase-discovering": "Discovering",
+  "phase-enriching": "Fetching metadata",


New i18n keys phase-discovering / phase-enriching were added only to en_US (and en_GB). The repo has frontend/src/locales/check_i18n_locales.py, which enforces that every locale’s scan.json includes the same keys as en_US; missing these keys will fail that check (and will show raw key strings at runtime for non-English locales). Please add these keys to all other locale scan.json files (even if the value is temporarily English) to keep locales in sync.
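
A sketch of the kind of key-parity check the comment describes. This is not the repository's `check_i18n_locales.py`; `missing_keys` is hypothetical, assumes the `locales/<locale>/scan.json` layout seen in the file table, and compares top-level keys only.

```python
import json
from pathlib import Path


def missing_keys(
    locales_dir: Path, reference: str = "en_US", filename: str = "scan.json"
) -> dict[str, list[str]]:
    """Per locale, list the top-level keys in the reference file that are absent."""
    ref_path = locales_dir / reference / filename
    ref_keys = set(json.loads(ref_path.read_text(encoding="utf-8")))
    report: dict[str, list[str]] = {}
    for locale_dir in sorted(p for p in locales_dir.iterdir() if p.is_dir()):
        path = locale_dir / filename
        if locale_dir.name == reference or not path.exists():
            continue
        keys = set(json.loads(path.read_text(encoding="utf-8")))
        missing = sorted(ref_keys - keys)
        if missing:
            report[locale_dir.name] = missing
    return report
```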

Copilot · 2026-03-23T20:37:44Z

backend/adapters/services/igdb.py

+class IGDBRateLimiter:
+    """Token bucket rate limiter for IGDB API (4 requests/second)."""
+
+    def __init__(self, rate: float = 4.0) -> None:
+        self._rate = rate
+        self._lock = asyncio.Lock()
+        self._tokens = rate
+        self._last_refill = asyncio.get_event_loop().time()
+


The shared module-level _igdb_rate_limiter holds an asyncio.Lock() created on first use. If the process ever runs IGDB requests across different event loops (common in test suites with multiple loops, or in app reload scenarios), reusing an asyncio primitive across loops can raise “attached to a different loop” errors. Consider making the limiter loop-local (e.g., store one per running loop), or avoid asyncio.Lock in a global singleton by using an anyio limiter or a pure-time-based atomic approach. Also prefer asyncio.get_running_loop().time() over get_event_loop() in async code for 3.13+ compatibility.
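
A sketch of the loop-local option Copilot suggests: keep one bucket (with its own `asyncio.Lock`) per running event loop, keyed weakly so buckets disappear with their loops, and read time from `asyncio.get_running_loop().time()`. `LoopLocalRateLimiter` and `_Bucket` are hypothetical names; a bucket's capacity equals its rate, mirroring `self._tokens = rate` in the diff above.

```python
import asyncio
import weakref


class _Bucket:
    def __init__(self, rate: float) -> None:
        self.rate = rate
        self.tokens = rate  # start full; capacity == rate
        self.last = asyncio.get_running_loop().time()
        self.lock = asyncio.Lock()  # created and used inside one loop only


class LoopLocalRateLimiter:
    """Keeps one token bucket per running event loop (illustrative sketch)."""

    def __init__(self, rate: float = 4.0) -> None:
        self._rate = rate
        # Weak keys let a bucket be dropped once its loop is garbage-collected.
        self._buckets: "weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, _Bucket]" = (
            weakref.WeakKeyDictionary()
        )

    def _bucket(self, loop: asyncio.AbstractEventLoop) -> _Bucket:
        bucket = self._buckets.get(loop)
        if bucket is None:
            bucket = self._buckets[loop] = _Bucket(self._rate)
        return bucket

    async def acquire(self) -> None:
        loop = asyncio.get_running_loop()
        bucket = self._bucket(loop)
        while True:
            async with bucket.lock:
                now = loop.time()
                bucket.tokens = min(bucket.rate, bucket.tokens + (now - bucket.last) * bucket.rate)
                bucket.last = now
                if bucket.tokens >= 1:
                    bucket.tokens -= 1
                    return
                wait = (1 - bucket.tokens) / bucket.rate
            await asyncio.sleep(wait)  # sleep outside the lock
```

Each `asyncio.run()` call gets a fresh bucket, so a lock is never awaited from a loop other than the one it was first used in. The trade-off is that two loops running at the same time would each get the full rate, which suits tests and reloads but not a truly process-wide quota.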

Merged PR rommapp#3165 (two-phase scan pipeline + IGDB optimizations):
- Discovery phase: ROMs appear in library in seconds
- Enrichment phase: metadata fills in progressively
- Fixed IGDB rate limiter lock-during-sleep bug
- Search term dedup cache reduces API calls

Additional improvement:
- Discovery semaphore 3x higher than enrichment (I/O-bound, no API limit)
- SCAN_WORKERS=20 in docker-compose for more parallelism

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
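
A sketch of sizing the discovery semaphore at three times the enrichment one, as this commit describes. `make_semaphores` and `peak_concurrency` are hypothetical helpers; the multiplier of 3 comes from the commit message, and `scan_workers` stands in for the `SCAN_WORKERS` setting.

```python
import asyncio


def make_semaphores(
    scan_workers: int, discovery_multiplier: int = 3
) -> tuple[asyncio.Semaphore, asyncio.Semaphore]:
    """Discovery is local I/O, so it gets more slots than API-bound enrichment."""
    return (
        asyncio.Semaphore(scan_workers * discovery_multiplier),
        asyncio.Semaphore(scan_workers),
    )


async def peak_concurrency(sem: asyncio.Semaphore, jobs: int) -> int:
    """Run dummy jobs under `sem` and report the maximum number in flight."""
    active = peak = 0

    async def job() -> None:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for hashing or an API call
            active -= 1

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak
```

Under this sketch, `SCAN_WORKERS=20` would allow 60 concurrent discoveries and 20 concurrent enrichments.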

Praeses0 added 8 commits March 23, 2026 20:17

Add offset parameter support to IGDB API service

b884074

Praeses0 marked this pull request as ready for review March 23, 2026 20:02

gantoine requested review from Copilot and gantoine March 23, 2026 20:30

Copilot started reviewing on behalf of gantoine March 23, 2026 20:31

Copilot AI reviewed Mar 23, 2026

gantoine added the "on-hold" label (Pending further research or blocked by another issue) Mar 24, 2026

Praeses0 wants to merge 8 commits into rommapp:master from Praeses0:improve-scan-speed
