feat(scrapers): run scrapers from the UI with live status, logs, and cancellation by hdmGOAT · Pull Request #7 · hdmGOAT/veent-event-scraper

hdmGOAT · 2026-06-17T03:36:55Z

Summary

UI-triggered scraper runs: Run any registered scraper (or all at once) directly from the Scraper Center via POST to /api/scrapers/<key>/run/ or /api/scrapers/run-all/. Run buttons and a "Run All" header action are wired in the SvelteKit frontend.
Live polling: The Scraper Center polls /api/scrapers/runs/active/ every 2.5 s while any run is queued/running. The spinner stops and history auto-refreshes when all active runs complete.
ScraperRun table: New ScraperRun model (migrations 0010, 0011) tracks every run with status (queued / running / success / failed / cancelled), timestamps, item counts, error_message (full traceback on failure), and pid for process-kill cancellation.
Subprocess-based cancellation: The worker is spawned with start_new_session=True (POSIX setsid). cancel_run() sends SIGTERM to the entire process group via os.killpg, which also kills Playwright's chromium. A SELECT FOR UPDATE race guard prevents overwriting a worker that finished in the same instant.
New API endpoints: GET /api/scrapers/runs/, GET /api/scrapers/runs/active/, GET /api/scrapers/runs/<id>/, POST /api/scrapers/runs/<id>/cancel/ — all standard JSON; existing GET /api/scrapers/ enriched with a last_run field per key.
ScraperRunAdmin: registered as a read-only history view (no add/delete).
84 backend tests passing (63 new: model, runner, API endpoints). svelte-check and build clean.

Auth posture / security follow-up

The three mutation endpoints (/run/, /run-all/, /cancel/) are @csrf_exempt and unauthenticated. The SvelteKit frontend has no Django session — there is no CSRF surface from a browser cross-site attack on this internal-only tool during development. The Vite proxy makes all /api/* calls same-origin in dev. These endpoints must be re-gated behind real auth when user authentication is added (planned Phase 2).

Not yet verified live

Real process-kill smoke test (cancel a running Playwright scraper and confirm chromium exits).
Header date / auto-refresh visually in-browser after a full run completes.

Test plan

cd apps/backend && ./venv/bin/python manage.py test events — all 84 tests pass
cd apps/frontend && pnpm svelte-check — no type errors
cd apps/frontend && pnpm build — build succeeds
Start Django (./venv/bin/python manage.py runserver) + SvelteKit dev (pnpm dev), open Scraper Center, trigger a fast scraper, verify status updates live
Trigger a run, then cancel before it finishes, verify status flips to cancelled
Trigger Run All, verify one row per registered scraper appears in the history table

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added scraper run management in the Scraper Center: trigger individual runs and “Run All”
- Added active-run tracking with live polling, run history, and detailed run views (counts, timing, and errors)
- Added cancellation for active runs, including status updates and safe handling of races
UI/UX
- Updated status badges (now supports queued/running/success/failed/cancelled with optional category styling)
Tests
- Expanded coverage for the scraper-run lifecycle, API behavior, and cancellation/trigger edge cases

Introduces a ScraperRun model that serves as both the live job table (while queued/running) and durable run history. Statuses: queued, running, success, failed, cancelled. Stores pid for subprocess-kill cancellation, triggered_by FK to auth.User, duration helpers, and extra_counts for per-scraper summary fields. Migrations 0010 (initial ScraperRun) and 0011 (pid + CANCELLED status for subprocess cancellation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ation runner.py: trigger_scraper_run() guards against duplicate active runs, creates a ScraperRun row, and spawns manage.py run_scraper_job as an independent subprocess (start_new_session=True / POSIX setsid) so the entire process tree — including Playwright's chromium — can be killed wholesale. cancel_run() sends SIGTERM to the process group via os.killpg, and uses SELECT FOR UPDATE inside a transaction to safely race the worker writing SUCCESS/FAILED simultaneously. run_scraper_job management command: executed in the worker subprocess, looks up the ScraperRun row, runs the scraper, and writes status, counts, and error_message back to the DB row. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…erRunAdmin views.py: six new endpoints — POST /api/scrapers/<key>/run/ (trigger), POST /api/scrapers/run-all/, GET /api/scrapers/runs/, GET /api/scrapers/runs/active/, GET /api/scrapers/runs/<id>/, POST /api/scrapers/runs/<id>/cancel/. api_scrapers() enriched with last_run from latest ScraperRun per key (single DISTINCT ON query). All mutation endpoints are @csrf_exempt with a documented security note — re-evaluate when real auth exists (the SvelteKit frontend has no Django session today). urls.py: routes wired with more-specific paths ordered before the api/scrapers/ catch-all. admin.py: ScraperRunAdmin registered as read-only history (no add/delete permissions, all fields readonly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…PI endpoints New test classes: ScraperRunModelTests, ScraperRunCancelledStatusTests, RunnerSubprocessTests (TransactionTestCase — required for select_for_update compatibility with Postgres), ScraperRunAPITests, and ScraperAdminAccessTests. Covers: model defaults and properties, CANCELLED status and pid field, trigger dedup (409 on active run), cancel race (select_for_update), all six run-job API endpoints, run-all endpoint, ScraperRunAdmin read-only permissions, and api_scrapers last_run enrichment. All mutation tests use mock.patch on subprocess.Popen so no real worker processes are spawned. Total test suite grows to 84 passing tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…polling types.ts: ScraperRun, ScraperLastRun, RunAllResult interfaces; Scraper extended with last_run and active_run fields. api.ts: post() helper with CSRF token extraction; runScraper, runAll, scraperRuns, activeRuns, scraperRun, cancelRun API methods. +page.ts: parallel-loads scrapers + recentRuns on page load. +page.svelte (Svelte 5 runes): per-card Run / Cancel buttons with in-flight state; Run All button in header; 2.5 s polling via setInterval that auto-stops when no active runs remain and refreshes history + per-card last_run lines; spin indicator while active; expandable failure tracebacks; recent-run history table with toggle for full list; formatted duration helper. PageHeader.svelte: optional action snippet slot — passes Run All button in; falls back to the existing date/bell display when no action provided. Badge.svelte: running / queued / success / failed / cancelled status colour mappings for ScraperRun badges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…mpleted plan all-context.md (rev 3): updated current-state summary to reflect monorepo layout, Neon Postgres, SvelteKit primary UI, 8 scrapers, ScraperRun model; corrected path prefixes from flat-Django to apps/backend/; added context-freshness rule. tests/all-tests.md: updated test count to 64 (was empty placeholder), corrected venv path to apps/backend/venv/bin/python, added TestCase vs. TransactionTestCase guidance for runner tests. Plans: archived completed_scraper-run-jobs_PLAN_17-06-26.md to general-plans/completed/; new active plan run-cancellation_PLAN_17-06-26.md for subprocess cancellation phase. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-17T03:37:09Z

📝 Walkthrough

Walkthrough

Adds a ScraperRun Django model with a five-state lifecycle (queued/running/success/failed/cancelled), three migrations (model creation, pid/cancelled fields, unique constraint), a subprocess-based runner that spawns manage.py run_scraper_job in its own process group and cancels via SIGTERM, seven new CSRF-exempt JSON API endpoints, and replaces the static SvelteKit Scraper Center with an interactive polling UI supporting per-scraper run/cancel controls and a recent runs table.

Changes

ScraperRun Feature: Backend to Frontend

Layer / File(s)	Summary
ScraperRun model, migrations, and admin registration `apps/backend/events/models.py`, `apps/backend/events/migrations/0010_scraperrun.py`, `apps/backend/events/migrations/0011_run_cancellation.py`, `apps/backend/events/migrations/0012_scraperrun_unique_active_constraint.py`, `apps/backend/events/admin.py`	`ScraperRun` model defines `Status` TextChoices (queued/running/success/failed/cancelled), lifecycle timestamp fields, created/updated counters, `extra_counts` JSON, `error_message`, `pid`, `triggered_by` FK, and `duration_seconds`/`is_active` properties. Migration 0010 creates the table; migration 0011 adds `pid` and the `cancelled` choice; migration 0012 adds a `UniqueConstraint` on `scraper_key` for active runs. Admin registers `ScraperRunAdmin` as read-only (add/delete permissions disabled).
Subprocess runner and `run_scraper_job` management command `apps/backend/events/runner.py`, `apps/backend/events/management/commands/run_scraper_job.py`	`runner.py` adds `_map_result`, `trigger_scraper_run` (concurrency guard → `Popen(start_new_session=True)` → stores pid), and `cancel_run` (`SELECT FOR UPDATE` → `os.killpg` SIGTERM → race-safe `refresh_from_db`). `run_scraper_job` command takes `--run-id`, validates scraper key, transitions QUEUED→RUNNING→SUCCESS/FAILED with targeted `update_fields`.
Scraper run API views and URL routing `apps/backend/events/views.py`, `apps/backend/events/urls.py`	Adds `_serialize_run` helper and seven CSRF-exempt endpoints: trigger by key, run-all, list runs, list active, run detail, and cancel (returning 404/409 by signal). Extends `api_scrapers` with a Postgres `distinct("scraper_key")` query to inject `last_run` per scraper. URLs insert new specific patterns before the existing `api/scrapers/` catch-all.
Backend tests `apps/backend/events/tests.py`	Adds `ScraperRunModelTests`, `ScraperRunCancelledStatusTests`, `RunnerSubprocessTests` (TransactionTestCase with mocked `Popen`/`os.killpg`, including SUCCESS-race guard), `ScraperJobCommandTests`, `RunnerMappingTests`, `CancelEndpointTests` (200/404/409/405), `RunEndpointTests` (trigger/run-all/public access), and `ApiScrapersLastRunTests` (last_run decoration and cross-key isolation).
Frontend types, API client, Badge, and PageHeader `apps/frontend/src/lib/types.ts`, `apps/frontend/src/lib/api.ts`, `apps/frontend/src/lib/components/Badge.svelte`, `apps/frontend/src/lib/components/PageHeader.svelte`	`types.ts` adds `ScraperRunStatus`, `ScraperRun`, `ScraperLastRun`, `RunAllResult`, and extends `Scraper` with `last_run`/`active_run`. `api.ts` adds `getCsrfToken()`, typed `post()` with `X-CSRFToken`, and run-related API methods. `Badge.svelte` gains `category` prop with hash-based color and status entries for running/queued/success/failed/cancelled. `PageHeader.svelte` adds optional `action` Snippet prop.
Scraper Center interactive page `apps/frontend/src/routes/scrapers/+page.ts`, `apps/frontend/src/routes/scrapers/+page.svelte`	`+page.ts` fetches `scrapers` and `recentRuns` in parallel. `+page.svelte` adds per-scraper Run/Cancel buttons with in-flight tracking, `api.activeRuns()` polling every 2.5s that refreshes cards and history when idle, expandable failure traces, Run All button, and a Recent Runs table with a 20-row limit toggle.

Process Documentation Updates

Layer / File(s)	Summary
Context and test docs `process/context/all-context.md`, `process/context/tests/all-tests.md`	`all-context.md` refreshed to rev 3: documents 10 scrapers, `ScraperRun` model with lifecycle fields, subprocess runner details (process-group SIGTERM), API endpoint table, CSRF-exempt public auth posture, frontend API client with CSRF header, and updated migration/test counts. `all-tests.md` updated with 97-test count, `TestCase` vs `TransactionTestCase` guidance for threaded runner tests, revised command matrix with venv/working-dir assumptions, expanded debugging reference with Neon Postgres behavior, and refreshed Known Gaps.
Plan files `process/general-plans/active/run-cancellation_PLAN_17-06-26.md`, `process/general-plans/completed/completed_scraper-run-jobs_PLAN_17-06-26.md`	Adds the active run-cancellation plan (6-phase checklist covering `CANCELLED` status/`pid` field addition, management command lifecycle, subprocess runner rewrite with atomic cancel, cancel endpoint and URL routing, frontend `cancelRun` API and Cancel button, and test rewrites for subprocess behavior). Archives the completed scraper-run-jobs plan with explicit deviations record (Neon Postgres test harness instead of SQLite, public CSRF-exempt auth posture instead of staff-gated, and run-all scope addition).

Sequence Diagram(s)

sequenceDiagram
    participant Page as +page.svelte
    participant APIClient as api.ts
    participant Django as Django JSON API
    participant Runner as runner.py
    participant Subprocess as run_scraper_job subprocess

    Page->>Django: load() — api.scrapers + api.scraperRuns (parallel)
    Django-->>Page: scrapers[], recentRuns[]

    loop poll api.activeRuns() every 2.5s while active runs exist
        Page->>Django: GET /api/scrapers/runs/active/
        Django-->>Page: active ScraperRun[]
        alt no active runs left
            Page->>Django: GET /api/scrapers/runs/ + /api/scrapers/
            Django-->>Page: refreshed recentRuns + scrapers with last_run
        end
    end

    Page->>APIClient: api.runScraper(key)
    APIClient->>Django: POST /api/scrapers/key/run/ (X-CSRFToken)
    Django->>Runner: trigger_scraper_run(key)
    Runner->>Django: create ScraperRun QUEUED + Popen(start_new_session=True)
    Runner->>Django: save pid on row
    Django-->>Page: {id, status: "queued"}

    Subprocess->>Django: set RUNNING + started_at
    Subprocess->>Subprocess: SCRAPERS[key]().run()
    Subprocess->>Django: set SUCCESS/FAILED + finished_at

    Page->>APIClient: api.cancelRun(id)
    APIClient->>Django: POST /api/scrapers/runs/id/cancel/
    Django->>Runner: cancel_run(id) SELECT FOR UPDATE
    Runner->>Subprocess: os.killpg(os.getpgid(pid), SIGTERM)
    Runner->>Django: refresh_from_db → set CANCELLED + finished_at
    Django-->>Page: ScraperRun {status: "cancelled"}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hdmGOAT/veent-event-scraper#3: This PR extends the scraper JSON API surface in apps/backend/events/views.py and apps/backend/events/urls.py—building on the prior API wiring by adding subprocess-backed run/cancel/list/detail endpoints that depend on the ScraperRun model.

Poem

🐇 Hop hop, a scraper runs free,
Spawned in a subprocess, PID stored with glee!
SIGTERM sent when Cancel is pressed,
The DB row refreshed—no race distressed.
Polling the frontend, badges aglow,
Run All, or just one—watch the statuses flow! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main feature addition: enabling scraper execution from the UI with live status tracking and cancellation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/run-scrapers-and-logs

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ers-and-logs # Conflicts: # apps/backend/events/tests.py # apps/backend/events/views.py # apps/frontend/src/routes/scrapers/+page.svelte # process/context/all-context.md # process/context/tests/all-tests.md

coderabbitai

Actionable comments posted: 10

🧹 Nitpick comments (2)

apps/frontend/src/lib/types.ts (1)

119-121: ⚡ Quick win

Tighten RunAllResult status typing to the known status union.

Line 120 currently allows any string, but this payload is constrained to scraper run statuses. Using ScraperRunStatus preserves compile-time contract checks and prevents accidental UI branch drift.

Proposed diff

 export interface RunAllResult {
-	created: { key: string; id: number; status: string }[];
+	created: { key: string; id: number; status: ScraperRunStatus }[];
 	skipped: string[];
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/frontend/src/lib/types.ts` around lines 119 - 121, In the RunAllResult
interface, the status field within the created array is typed as a plain string,
which allows any value and loses compile-time validation. Change the status
field type from string to ScraperRunStatus to enforce that only valid scraper
run statuses are used, ensuring the UI contract is correctly enforced at compile
time and preventing accidental branch drift.

apps/frontend/src/lib/api.ts (1)

78-78: ⚡ Quick win

Use ScraperRunStatus for runScraper response typing.

Line 78 uses a broad string for status; narrowing it to ScraperRunStatus keeps the client API contract strict and consistent with the rest of the scraper-run types.

Proposed diff

 import type {
 	CategoryCount,
 	EventRow,
 	Organizer,
 	OrganizerDetail,
 	Paginated,
 	RunAllResult,
 	Scraper,
+	ScraperRunStatus,
 	ScraperRun,
 	SourceCount,
 	Stats,
 	VenueRow
 } from './types';
@@
-	runScraper: (key: string) => post<{ id: number; status: string }>(`/scrapers/${key}/run/`),
+	runScraper: (key: string) =>
+		post<{ id: number; status: ScraperRunStatus }>(`/scrapers/${key}/run/`),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/frontend/src/lib/api.ts` at line 78, The runScraper function in the
api.ts file uses a generic string type for the status field in its response
type, which reduces type safety. Replace the string type with ScraperRunStatus
in the response type definition of the runScraper function (the post generic
parameter) to enforce a strict API contract and maintain consistency with other
scraper-run related types. Ensure ScraperRunStatus is imported if it is not
already available in the file.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/backend/events/admin.py`:
- Around line 76-80: The `pid` field is missing from the readonly_fields tuple
in the admin configuration. Add `pid` to the readonly_fields tuple to prevent
manual editing of this field in the Django admin interface. This is critical
because the `cancel_run` function relies on the `pid` field value to send
SIGTERM signals to the correct process group, and allowing manual edits could
cause the signal to be sent to the wrong target. Include `pid` in the
readonly_fields tuple alongside the other read-only fields like scraper_key,
status, started_at, and finished_at.

In `@apps/backend/events/management/commands/run_scraper_job.py`:
- Around line 47-49: The status transition to RUNNING in the ScraperRun update
is unconditional and vulnerable to a race condition where a cancelled run can be
resurrected. Instead of directly setting run.status to ScraperRun.Status.RUNNING
and calling save(), use Django's atomic conditional update pattern (such as
filter().update() with a queryset condition) to ensure the status transitions
from QUEUED to RUNNING only if the current status is still QUEUED. This mirrors
the defensive approach already used in cancel_run with select_for_update() and
refresh_from_db(), preventing a worker process from overwriting a terminal
CANCELLED state that was set by an admin between the time the job was queued and
when the worker actually started execution.

In `@apps/backend/events/runner.py`:
- Around line 57-65: The code at lines 57-64 has a race condition where two
concurrent requests can both pass the exists() check and create duplicate active
ScraperRun entries for the same scraper_key. Add a partial unique constraint to
the ScraperRun model database to enforce that only one active run (with status
QUEUED or RUNNING) can exist per scraper_key, then wrap the
ScraperRun.objects.create() call in a try-except block to catch the database
integrity constraint violation and return the same response (None, True) that
would be returned if active_exists was true, ensuring the race condition is
prevented at the database level.

In `@apps/backend/events/tests.py`:
- Around line 757-765: Before releasing to production, verify that the three
public mutation endpoints (api_scraper_trigger, api_scraper_run_all, and
api_scraper_run_cancel) are properly gated at the infrastructure level since
they lack application-level authentication. Work with the DevOps and
Infrastructure team to confirm that production deployments have reverse proxy
rules, firewall restrictions, or network policies in place that restrict access
to these endpoints to only internal networks or trusted sources. Document the
deployment gates used (firewall rules, reverse proxy configuration, load
balancer restrictions) and ensure this security configuration is enforced
consistently across all production environments before the application is
released.

In `@apps/backend/events/views.py`:
- Around line 445-551: Add authentication and authorization checks to all
scraper-run endpoints to enforce staff-only access. The endpoints
api_scraper_trigger, api_scraper_run_all, api_scraper_runs,
api_scraper_runs_active, api_scraper_run_detail, and api_scraper_run_cancel
currently lack any access control despite being documented as staff-only. Add
either a decorator-based approach (such as Django's staff_member_required or
login_required with manual staff check) or implement explicit authentication
logic at the start of each function to verify the user is authenticated and has
staff privileges, returning an appropriate 403 Forbidden response if
authorization fails.
- Around line 513-516: The limit parameter parsing in the try block does not
enforce a minimum bound, allowing zero and negative values to pass through which
causes problematic queryset slicing behavior. Modify the limit assignment to
clamp the value to the range [1, 200] by using max() with a minimum value of 1
in addition to the existing min() call that enforces the maximum of 200,
ensuring the API behaves deterministically regardless of the input value.

In `@apps/frontend/src/routes/scrapers/`+page.svelte:
- Around line 176-177: The run variable at line 176 is only derived from
runningMap which contains only active runs (queued/running status), making the
success and failed status branches at lines 219 and 226 unreachable dead code.
Update the const run assignment to first check runningMap for an active run, and
if no active run exists, fall back to the latest run from an available data
source (such as a latest run property or API response). This will allow the
success and failed card details to be displayed when runs complete.
- Around line 123-125: The duration formatter has a bug where rounding the
remainder seconds can produce 60 as the seconds value, creating invalid time
strings like "1m 60s". Fix this by first normalizing the input seconds to a
whole number using Math.round before splitting into minutes and seconds. This
ensures the remainder operation always produces a value between 0 and 59,
avoiding the invalid 60-second bucket.

In `@process/context/all-context.md`:
- Around line 303-317: The ScraperRun model documentation section is incomplete
and needs two updates. First, in the status field documentation where it
currently lists the TextChoices as queued/running/success/failed, add CANCELLED
to this list to reflect the complete set of status options. Second, add a new
line documenting the pid field (PositiveIntegerField, null=True, blank=True)
which stores the OS PID of the worker subprocess and is used for cancellation
purposes. Insert the pid field documentation after the triggered_by field entry
to maintain logical grouping of model fields.
- Around line 376-394: The Run-Jobs Subsystem section in the documentation
incorrectly describes the system as thread-based using threading.Thread when the
actual implementation uses subprocess.Popen with start_new_session=True. Revise
the section starting at "The system is thread-based" to accurately document that
the system spawns independent process groups via subprocess.Popen, stores the
pid on the ScraperRun row, uses os.killpg for process group termination on
cancellation, and is controlled by the run_scraper_job management command (not
the _run_scraper thread function). Remove references to threading.Thread,
django.db.connection.close() in finally blocks, and the thread-based execution
model, replacing them with subprocess-based process group management details.

---

Nitpick comments:
In `@apps/frontend/src/lib/api.ts`:
- Line 78: The runScraper function in the api.ts file uses a generic string type
for the status field in its response type, which reduces type safety. Replace
the string type with ScraperRunStatus in the response type definition of the
runScraper function (the post generic parameter) to enforce a strict API
contract and maintain consistency with other scraper-run related types. Ensure
ScraperRunStatus is imported if it is not already available in the file.

In `@apps/frontend/src/lib/types.ts`:
- Around line 119-121: In the RunAllResult interface, the status field within
the created array is typed as a plain string, which allows any value and loses
compile-time validation. Change the status field type from string to
ScraperRunStatus to enforce that only valid scraper run statuses are used,
ensuring the UI contract is correctly enforced at compile time and preventing
accidental branch drift.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a78166f2-eeb1-4459-8e10-ed407292d798

📥 Commits

Reviewing files that changed from the base of the PR and between 417bade and 0062a3d.

📒 Files selected for processing (19)

apps/backend/events/admin.py
apps/backend/events/management/commands/run_scraper_job.py
apps/backend/events/migrations/0010_scraperrun.py
apps/backend/events/migrations/0011_run_cancellation.py
apps/backend/events/models.py
apps/backend/events/runner.py
apps/backend/events/tests.py
apps/backend/events/urls.py
apps/backend/events/views.py
apps/frontend/src/lib/api.ts
apps/frontend/src/lib/components/Badge.svelte
apps/frontend/src/lib/components/PageHeader.svelte
apps/frontend/src/lib/types.ts
apps/frontend/src/routes/scrapers/+page.svelte
apps/frontend/src/routes/scrapers/+page.ts
process/context/all-context.md
process/context/tests/all-tests.md
process/general-plans/active/run-cancellation_PLAN_17-06-26.md
process/general-plans/completed/completed_scraper-run-jobs_PLAN_17-06-26.md

coderabbitai · 2026-06-17T05:14:17Z

+    @mock.patch("events.views.cancel_run")
+    def test_cancel_is_public(self, mock_cancel):
+        run = ScraperRun.objects.create(
+            scraper_key="myruntime", status=ScraperRun.Status.CANCELLED
+        )
+        mock_cancel.return_value = (run, "ok")
+        resp = self.client.post(f"/api/scrapers/runs/{run.id}/cancel/")
+        self.assertEqual(resp.status_code, 200)
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Confirm public mutation behavior is intentionally encoded in tests and views. rg -n "test_.*is_public|unauthenticated|/api/scrapers/.*/run/|/api/scrapers/run-all/|/cancel/" apps/backend/events/tests.py rg -n "def api_scraper_trigger|def api_scraper_run_all|def api_scraper_run_cancel|csrf_exempt|unauthenticated intentionally" apps/backend/events/views.py # Look for explicit environment gates controlling scraper mutation exposure. rg -n "SCRAPER|INTERNAL|ALLOW_UNAUTH|FEATURE_FLAG|AUTH" -g "*settings*.py"

Repository: hdmGOAT/veent-event-scraper

Length of output: 2135

🏁 Script executed:

# Get full context of the SECURITY NOTE in views.py sed -n '469,490p' apps/backend/events/views.py sed -n '492,520p' apps/backend/events/views.py sed -n '538,560p' apps/backend/events/views.py

Repository: hdmGOAT/veent-event-scraper

Length of output: 2883

🏁 Script executed:

# Check for any middleware or decorators that gate these endpoints rg -n "middleware|permission|is_authenticated|request.user|INTERNAL|whitelist|allowed_hosts" apps/backend/events/views.py | head -30

Repository: hdmGOAT/veent-event-scraper

Length of output: 229

🏁 Script executed:

# Look for deployment configuration files (docker, env, nginx, etc.) find . -type f \( -name "*.env*" -o -name "docker*" -o -name "Dockerfile" -o -name "nginx*" -o -name ".github" \) 2>/dev/null | head -20

Repository: hdmGOAT/veent-event-scraper

Length of output: 127

🏁 Script executed:

# Look for URL routing and middleware application find apps/backend -type f \( -name "urls.py" -o -name "routing.py" \) | xargs cat -n 2>/dev/null | grep -A 5 -B 5 "api_scraper"

Repository: hdmGOAT/veent-event-scraper

Length of output: 1421

🏁 Script executed:

# Check for any nginx, Docker, or deployment config files find . -type f \( -name "nginx.conf" -o -name "Dockerfile*" -o -name ".dockerignore" -o -name "docker-compose*" -o -name "deployment*" -o -name "k8s*" \) 2>/dev/null

Repository: hdmGOAT/veent-event-scraper

Length of output: 53

🏁 Script executed:

# Look for any middleware that might gate these paths rg -n "class.*Middleware|def.*middleware|MIDDLEWARE" apps/backend --type py | head -20

Repository: hdmGOAT/veent-event-scraper

Length of output: 121

🏁 Script executed:

# Check the MIDDLEWARE setting to see if there's any authentication applied sed -n '73,90p' apps/backend/config/settings.py

Repository: hdmGOAT/veent-event-scraper

Length of output: 694

🏁 Script executed:

# Check if there are any auth checks in the base URL config or middleware application rg -n "login_required|permission_required|IsAuthenticated|BasicAuth" apps/backend/config

Repository: hdmGOAT/veent-event-scraper

Length of output: 53

Confirm production deployment gates these public mutation endpoints before release.

The endpoints are intentionally unauthenticated in Phase 1 design (documented in security notes), but no enforcement mechanism is visible in code or infrastructure. The tests correctly lock in the 200 response, but before release you must verify these endpoints are restricted to internal networks or protected by reverse proxy/firewall rules in production, since no application-level gating exists.

Applies to: api_scraper_trigger (lines 798–808), api_scraper_run_all (lines 896–903), and api_scraper_run_cancel (lines 757–765).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/events/tests.py` around lines 757 - 765, Before releasing to production, verify that the three public mutation endpoints (api_scraper_trigger, api_scraper_run_all, and api_scraper_run_cancel) are properly gated at the infrastructure level since they lack application-level authentication. Work with the DevOps and Infrastructure team to confirm that production deployments have reverse proxy rules, firewall restrictions, or network policies in place that restrict access to these endpoints to only internal networks or trusted sources. Document the deployment gates used (firewall rules, reverse proxy configuration, load balancer restrictions) and ensure this security configuration is enforced consistently across all production environments before the application is released.

coderabbitai · 2026-06-17T05:14:17Z

+# ---------------------------------------------------------------------------
+# Scraper run jobs — trigger runs from the UI and poll/list their status.
+# All endpoints are staff-only, mirroring the /review/ convention.
+# ---------------------------------------------------------------------------
+
+
+def _serialize_run(run):
+    """Serialise a ScraperRun to the standard dict shape used by all run endpoints."""
+    return {
+        "id": run.id,
+        "scraper_key": run.scraper_key,
+        "status": run.status,
+        "started_at": run.started_at.isoformat() if run.started_at else None,
+        "finished_at": run.finished_at.isoformat() if run.finished_at else None,
+        "created_count": run.created_count,
+        "updated_count": run.updated_count,
+        "extra_counts": run.extra_counts,
+        "error_message": run.error_message or None,
+        "triggered_by": run.triggered_by.username if run.triggered_by_id else None,
+        "created_at": run.created_at.isoformat(),
+        "duration_seconds": run.duration_seconds,
+    }
+
+
+@csrf_exempt
+@require_POST
+def api_scraper_trigger(request, key):
+    # SECURITY NOTE: This endpoint is unauthenticated intentionally. The
+    # SvelteKit frontend has no Django session (the "Admin User" in the sidebar
+    # is static UI, not a real session), and the Vite proxy makes all /api/*
+    # calls same-origin in dev, so there is no CSRF surface from a browser
+    # cross-site attack on this internal-only tool. @csrf_exempt is consistent
+    # with all other JSON API endpoints in this file. Re-evaluate when real
+    # auth is added (Phase 2 roadmap).
+    from .scrapers import SCRAPERS
+
+    if key not in SCRAPERS:
+        return JsonResponse({"error": "Unknown scraper key"}, status=404)
+
+    triggered_by = request.user if request.user.is_authenticated else None
+    run, already_active = trigger_scraper_run(key, triggered_by=triggered_by)
+    if already_active:
+        return JsonResponse({"error": "Scraper already running"}, status=409)
+
+    return JsonResponse({"id": run.id, "status": run.status}, status=200)
+
+
+@csrf_exempt
+@require_POST
+def api_scraper_run_all(request):
+    # SECURITY NOTE: Same posture as api_scraper_trigger above — unauthenticated
+    # intentionally for the same reasons. Revisit when real auth is added.
+    from .scrapers import SCRAPERS
+
+    triggered_by = request.user if request.user.is_authenticated else None
+    created = []
+    skipped = []
+    for key in SCRAPERS:
+        run, already_active = trigger_scraper_run(key, triggered_by=triggered_by)
+        if already_active:
+            skipped.append(key)
+        else:
+            created.append({"key": key, "id": run.id, "status": run.status})
+
+    return JsonResponse({"created": created, "skipped": skipped}, status=200)
+
+
+def api_scraper_runs(request):
+    try:
+        limit = min(int(request.GET.get("limit", 50)), 200)
+    except ValueError:
+        limit = 50
+    runs = (
+        ScraperRun.objects.select_related("triggered_by")
+        .order_by("-created_at")[:limit]
+    )
+    return JsonResponse([_serialize_run(r) for r in runs], safe=False)
+
+
+def api_scraper_runs_active(request):
+    runs = ScraperRun.objects.filter(
+        status__in=[ScraperRun.Status.QUEUED, ScraperRun.Status.RUNNING]
+    ).select_related("triggered_by")
+    return JsonResponse([_serialize_run(r) for r in runs], safe=False)
+
+
+def api_scraper_run_detail(request, run_id):
+    run = get_object_or_404(
+        ScraperRun.objects.select_related("triggered_by"), id=run_id
+    )
+    return JsonResponse(_serialize_run(run))
+
+
+@csrf_exempt
+@require_POST
+def api_scraper_run_cancel(request, run_id):
+    # SECURITY NOTE: same posture as api_scraper_trigger — unauthenticated
+    # intentionally (no Django session from SvelteKit). Re-evaluate when real
+    # auth is added.
+    run, signal = cancel_run(run_id)
+    if signal == "not_found":
+        return JsonResponse({"error": "Run not found"}, status=404)
+    if signal == "not_active":
+        return JsonResponse(
+            {"error": "Run is not active", "run": _serialize_run(run)}, status=409
+        )
+    return JsonResponse(_serialize_run(run), status=200)


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Lock down scraper-run endpoints before release.

This block is documented as staff-only, but the trigger/run-all/cancel routes are currently unauthenticated and CSRF-exempt, which lets any reachable caller start/stop scraper jobs. The same unauthenticated run APIs also expose run metadata (including failure details), so this is an immediate control-plane and information-exposure risk. Add a real authorization gate across all scraper-run endpoints (staff session or a dedicated signed API credential) before shipping.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/events/views.py` around lines 445 - 551, Add authentication and authorization checks to all scraper-run endpoints to enforce staff-only access. The endpoints api_scraper_trigger, api_scraper_run_all, api_scraper_runs, api_scraper_runs_active, api_scraper_run_detail, and api_scraper_run_cancel currently lack any access control despite being documented as staff-only. Add either a decorator-based approach (such as Django's staff_member_required or login_required with manual staff check) or implement explicit authentication logic at the start of each function to verify the user is authenticated and has staff privileges, returning an appropriate 403 Forbidden response if authorization fails.

- admin: add pid to ScraperRunAdmin readonly_fields - run_scraper_job: guard QUEUED→RUNNING with conditional filter().update() so a cancelled run is never resurrected by a late-starting worker - models/runner: add unique_active_scraper_run partial DB constraint (scraper_key unique where status IN queued/running) + catch IntegrityError in trigger_scraper_run to close the exists()+create() race window (0012 migration) - views: clamp api_scraper_runs limit to [1, 200] - page.svelte: fix formatDuration 60s bucket bug; fall back to recentRuns for per-card success/failed display when no active run is in runningMap - api.ts/types.ts: type scraper run status fields as ScraperRunStatus - tests: update RunEndpointTests fixtures to respect unique_active_scraper_run - all-context.md: correct thread→subprocess model, add pid field and CANCELLED status to ScraperRun docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@process/context/all-context.md`:
- Around line 642-644: The phrase "not a local Neon PostgreSQL DB" in the test
database documentation is contradictory and unclear. Revise this text to
explicitly state that the test database uses SQLite (not Neon PostgreSQL) as a
contrast point, so readers understand clearly which database technology is being
used for testing. Replace the confusing negative phrasing with a direct,
positive statement about the actual database choice to make the guidance
unambiguous.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 37538897-ef1f-42d0-a1b3-494b5f0e5841

📥 Commits

Reviewing files that changed from the base of the PR and between 0062a3d and 7241faa.

📒 Files selected for processing (11)

apps/backend/events/admin.py
apps/backend/events/management/commands/run_scraper_job.py
apps/backend/events/migrations/0012_scraperrun_unique_active_constraint.py
apps/backend/events/models.py
apps/backend/events/runner.py
apps/backend/events/tests.py
apps/backend/events/views.py
apps/frontend/src/lib/api.ts
apps/frontend/src/lib/types.ts
apps/frontend/src/routes/scrapers/+page.svelte
process/context/all-context.md

✅ Files skipped from review due to trivial changes (1)

apps/frontend/src/lib/types.ts

🚧 Files skipped from review as they are similar to previous changes (4)

apps/backend/events/management/commands/run_scraper_job.py
apps/backend/events/runner.py
apps/backend/events/tests.py
apps/frontend/src/routes/scrapers/+page.svelte

…ers-and-logs

coderabbitai

🧹 Nitpick comments (1)

apps/backend/events/models.py (1)
96-96: 💤 Low value

Replace EN DASH with ASCII hyphen in comment.

The comment contains a Unicode en-dash character (–) instead of an ASCII hyphen-minus (-) in "1–2 labels". While functionally harmless, this can cause issues with some text editors or tools that don't handle Unicode well.
📝 Suggested fix
-    # AI-assigned canonical categories (1–2 labels from CANONICAL_CATEGORIES).
+    # AI-assigned canonical categories (1-2 labels from CANONICAL_CATEGORIES).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/backend/events/models.py` at line 96, In the comment starting with
"AI-assigned canonical categories", replace the Unicode en-dash character (–)
with a standard ASCII hyphen (-) in the text "1–2 labels". Change it from "1–2
labels" to "1-2 labels" to ensure compatibility with text editors and tools that
don't handle Unicode characters well.
Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/backend/events/models.py`:
- Line 96: In the comment starting with "AI-assigned canonical categories",
replace the Unicode en-dash character (–) with a standard ASCII hyphen (-) in
the text "1–2 labels". Change it from "1–2 labels" to "1-2 labels" to ensure
compatibility with text editors and tools that don't handle Unicode characters
well.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 98d48337-102a-483f-9cfa-e8b89bb55e32

📥 Commits

Reviewing files that changed from the base of the PR and between 7241faa and 4f33e1d.

📒 Files selected for processing (4)

apps/backend/events/models.py
apps/backend/events/tests.py
apps/backend/events/views.py
apps/frontend/src/lib/types.ts

✅ Files skipped from review due to trivial changes (1)

apps/frontend/src/lib/types.ts

🚧 Files skipped from review as they are similar to previous changes (1)

apps/backend/events/views.py

…ers-and-logs

hdmGOAT and others added 6 commits June 17, 2026 11:34

hdmGOAT changed the base branch from main to development June 17, 2026 03:39

Merge remote-tracking branch 'origin/development' into feat/run-scrap…

0062a3d

…ers-and-logs # Conflicts: # apps/backend/events/tests.py # apps/backend/events/views.py # apps/frontend/src/routes/scrapers/+page.svelte # process/context/all-context.md # process/context/tests/all-tests.md

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread process/context/all-context.md

Merge remote-tracking branch 'origin/development' into feat/run-scrap…

4f33e1d

…ers-and-logs

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

Merge remote-tracking branch 'origin/development' into feat/run-scrap…

abfddc4

…ers-and-logs

hdmGOAT merged commit 52c4867 into development Jun 17, 2026
1 check passed

coderabbitai Bot mentioned this pull request Jun 17, 2026

feat: AI-powered agent_categories for standardized event classification #8

Merged

4 tasks

Conversation

hdmGOAT commented Jun 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Auth posture / security follow-up

Not yet verified live

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hdmGOAT commented Jun 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading