@dangarmol dangarmol commented Dec 16, 2025

Description

Adds a new observable gauge immich_queues_<queue_name>_<waiting|paused|delayed|active> to track the number of jobs in waiting, paused, delayed or active states.

  • Polls queue statistics every 5 seconds.
  • Fetches queue counts in parallel.
  • Only runs polling loop if job telemetry is enabled.
  • Includes debug logging for failed metric updates.
  • Moves the existing *_active metrics into the new polled gauge for code consistency and reliability (they were previously event-based).
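
The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `QueueSource` interface, `metricName`, `pollOnce`, and `startPolling` are hypothetical names, and the real implementation goes through the Immich repositories and the telemetry library rather than a plain `Map`.

```typescript
// Hypothetical shapes for illustration only; not the actual Immich interfaces.
type JobState = "waiting" | "paused" | "delayed" | "active";
type JobCounts = Record<JobState, number>;

interface QueueSource {
  name: string;
  getJobCounts(): Promise<JobCounts>;
}

const JOB_STATES: JobState[] = ["waiting", "paused", "delayed", "active"];

// Builds a name in the immich_queues_<queue_name>_<state> form described above.
// Assumption: the queue name is used as given, with no case conversion.
function metricName(queueName: string, state: JobState): string {
  return `immich_queues_${queueName}_${state}`;
}

// Fetches the counts of all queues in parallel. A queue whose fetch fails is
// logged through `debug` and skipped, so it does not block the other queues.
async function pollOnce(
  queues: QueueSource[],
  debug: (message: string) => void = () => {},
): Promise<Map<string, number>> {
  const results = await Promise.all(
    queues.map(async (queue) => {
      try {
        return { queue, counts: await queue.getJobCounts() };
      } catch (error) {
        debug(`Failed to update metrics for queue ${queue.name}: ${String(error)}`);
        return { queue, counts: undefined };
      }
    }),
  );

  const metrics = new Map<string, number>();
  for (const { queue, counts } of results) {
    if (counts === undefined) {
      continue;
    }
    for (const state of JOB_STATES) {
      metrics.set(metricName(queue.name, state), counts[state]);
    }
  }
  return metrics;
}

// Starts the polling loop only when job telemetry is enabled.
// Returns a function that stops the loop.
function startPolling(
  enabled: boolean,
  queues: QueueSource[],
  onUpdate: (metrics: Map<string, number>) => void,
  intervalMs = 5000,
): () => void {
  if (!enabled) {
    return () => {};
  }
  const timer = setInterval(() => {
    void pollOnce(queues).then(onUpdate);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Catching errors per queue, instead of letting a single rejection fail the whole `Promise.all`, keeps the remaining gauges fresh when one queue is temporarily unreachable.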

Addresses Feature Request #24615.
Related to Feature Request #11069.
Related to Issue #24520 (does not fix, but helps troubleshoot).

How Has This Been Tested?

Ran watch -n 1 'curl -s http://localhost:8081/metrics | grep -E "^immich_queues.*_(waiting|paused|delayed|active)" | grep -v "^#"' in a separate terminal, then started all available jobs on a sample library with 1000 files. The number of jobs in each state changed over time at the same rate as shown on the /admin/queues page (formerly admin/jobs-status). The metrics refresh every 5 seconds and show the expected numbers.
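
For scripted checks, the same filter can be applied to the text returned by the metrics endpoint. The sketch below is an assumption-laden helper, not part of this PR: `parseQueueMetrics` is a hypothetical name, and it assumes each sample line has the form `<name> <value>` with no trailing timestamp.

```typescript
// Hypothetical helper: keeps only the queue gauges matched by the grep
// pattern above and parses each "<name> <value>" line into a map.
function parseQueueMetrics(text: string): Map<string, number> {
  const pattern = /^immich_queues.*_(waiting|paused|delayed|active)/;
  const metrics = new Map<string, number>();
  for (const line of text.split("\n")) {
    // Comment lines (# HELP, # TYPE) are skipped, like grep -v "^#".
    if (!pattern.test(line) || line.startsWith("#")) {
      continue;
    }
    const separator = line.lastIndexOf(" ");
    if (separator === -1) {
      continue;
    }
    const value = Number(line.slice(separator + 1));
    if (!Number.isNaN(value)) {
      metrics.set(line.slice(0, separator), value);
    }
  }
  return metrics;
}
```

Calling this on two scrapes taken a few seconds apart gives two maps that can be compared against the numbers on the /admin/queues page.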

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

I am not proficient in TypeScript, especially in server code. I've used Claude Sonnet 4.5 to understand the codebase and for code suggestions and Gemini 3 Pro to run a presubmit code review and verify my changes.

I am happy to take any suggestions on best practices for TypeScript or the Immich project. Feel free to edit my PR if you believe it's necessary.

@dangarmol
Author

Fixed the CI errors. Now pnpm format && pnpm check passes locally.

@dangarmol dangarmol force-pushed the server/queued_jobs_telemetry branch from 562be00 to 7985d13 Compare December 17, 2025 16:07
@dangarmol dangarmol requested a review from bo0tzz December 17, 2025 16:08
@dangarmol dangarmol force-pushed the server/queued_jobs_telemetry branch from 7985d13 to e5b2362 Compare December 17, 2025 16:54
@dangarmol
Author

Hey @bo0tzz, thanks for your help and patience. I've added the new changes and updated the PR description to match the new implementation. You can now test it by exposing port 8081 in the Docker Compose file and then running watch -n 1 'curl -s http://localhost:8081/metrics | grep -E "^immich_queues.*_(waiting|paused|delayed|active)" | grep -v "^#"' in a new terminal. A screenshot is attached (I paused the OCR job to show that the paused metric works as well).

Screenshot 2025-12-17 at 17 56 37

Thanks @nicholasbergesen for the suggestion of splitting the metrics 😉

NOTE: I merged the existing implementation for "active" jobs into the one I proposed here for consistency. I also think polling is more accurate, since an event can be missed for whatever reason. This way errors do not accumulate over time and every poll yields clean metrics. If you prefer small commits and feel this should be a separate PR, let me know, but I think the change is related enough to include here.
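
A small simulation of the argument in the note, as a sketch only: `FakeQueue`, `EventCounter`, and `simulate` are hypothetical and do not correspond to code in Immich. It shows how a single lost event leaves an event-based counter permanently off, while a polled value reads the queue state directly.

```typescript
// Hypothetical in-memory queue, used only to illustrate the note above.
class FakeQueue {
  private active = new Set<string>();

  start(id: string): void {
    this.active.add(id);
  }

  finish(id: string): void {
    this.active.delete(id);
  }

  activeCount(): number {
    return this.active.size;
  }
}

// Event-based tracking: correct only if every start and finish event arrives.
class EventCounter {
  value = 0;

  onStart(): void {
    this.value += 1;
  }

  onFinish(): void {
    this.value -= 1;
  }
}

function simulate(): { eventBased: number; polled: number } {
  const queue = new FakeQueue();
  const counter = new EventCounter();

  queue.start("a");
  counter.onStart();
  queue.start("b");
  counter.onStart();

  queue.finish("a");
  counter.onFinish();

  // The finish event for "b" is lost, so counter.onFinish() never runs.
  queue.finish("b");

  // Polling reads the current state instead of replaying events.
  return { eventBased: counter.value, polled: queue.activeCount() };
}
```

Here `simulate()` returns an event-based count of 1 and a polled count of 0: the event counter keeps the error until it is reset, whereas the next poll is correct regardless of what was missed.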

Adds a new observable gauge `immich.queues.<queue_name>.queued` to track the number of jobs in waiting, paused, or delayed states.

- Polls queue statistics every 5 seconds.
- Fetches queue counts in parallel.
- Only runs polling loop if job telemetry is enabled.
- Includes debug logging for failed metric updates.
@dangarmol dangarmol force-pushed the server/queued_jobs_telemetry branch from e5b2362 to 62994c3 Compare December 19, 2025 09:52
@dangarmol
Author

Hi, the failing test seems to me like a flaky test unrelated to my changes. I've pulled main and rebased, in case this has been fixed there.

@nicholasbergesen nicholasbergesen left a comment

LGTM
