feat(scrapers): run scrapers from the UI with live status, logs, and cancellation#7
Conversation
Introduces a ScraperRun model that serves as both the live job table (while queued/running) and durable run history. Statuses: queued, running, success, failed, cancelled. Stores pid for subprocess-kill cancellation, triggered_by FK to auth.User, duration helpers, and extra_counts for per-scraper summary fields. Migrations 0010 (initial ScraperRun) and 0011 (pid + CANCELLED status for subprocess cancellation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation runner.py: trigger_scraper_run() guards against duplicate active runs, creates a ScraperRun row, and spawns manage.py run_scraper_job as an independent subprocess (start_new_session=True / POSIX setsid) so the entire process tree — including Playwright's chromium — can be killed wholesale. cancel_run() sends SIGTERM to the process group via os.killpg, and uses SELECT FOR UPDATE inside a transaction to safely race the worker writing SUCCESS/FAILED simultaneously. run_scraper_job management command: executed in the worker subprocess, looks up the ScraperRun row, runs the scraper, and writes status, counts, and error_message back to the DB row. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erRunAdmin views.py: six new endpoints — POST /api/scrapers/<key>/run/ (trigger), POST /api/scrapers/run-all/, GET /api/scrapers/runs/, GET /api/scrapers/runs/active/, GET /api/scrapers/runs/<id>/, POST /api/scrapers/runs/<id>/cancel/. api_scrapers() enriched with last_run from latest ScraperRun per key (single DISTINCT ON query). All mutation endpoints are @csrf_exempt with a documented security note — re-evaluate when real auth exists (the SvelteKit frontend has no Django session today). urls.py: routes wired with more-specific paths ordered before the api/scrapers/ catch-all. admin.py: ScraperRunAdmin registered as read-only history (no add/delete permissions, all fields readonly). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…PI endpoints New test classes: ScraperRunModelTests, ScraperRunCancelledStatusTests, RunnerSubprocessTests (TransactionTestCase — required for select_for_update compatibility with Postgres), ScraperRunAPITests, and ScraperAdminAccessTests. Covers: model defaults and properties, CANCELLED status and pid field, trigger dedup (409 on active run), cancel race (select_for_update), all six run-job API endpoints, run-all endpoint, ScraperRunAdmin read-only permissions, and api_scrapers last_run enrichment. All mutation tests use mock.patch on subprocess.Popen so no real worker processes are spawned. Total test suite grows to 84 passing tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…polling types.ts: ScraperRun, ScraperLastRun, RunAllResult interfaces; Scraper extended with last_run and active_run fields. api.ts: post() helper with CSRF token extraction; runScraper, runAll, scraperRuns, activeRuns, scraperRun, cancelRun API methods. +page.ts: parallel-loads scrapers + recentRuns on page load. +page.svelte (Svelte 5 runes): per-card Run / Cancel buttons with in-flight state; Run All button in header; 2.5 s polling via setInterval that auto-stops when no active runs remain and refreshes history + per-card last_run lines; spin indicator while active; expandable failure tracebacks; recent-run history table with toggle for full list; formatted duration helper. PageHeader.svelte: optional action snippet slot — passes Run All button in; falls back to the existing date/bell display when no action provided. Badge.svelte: running / queued / success / failed / cancelled status colour mappings for ScraperRun badges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mpleted plan all-context.md (rev 3): updated current-state summary to reflect monorepo layout, Neon Postgres, SvelteKit primary UI, 8 scrapers, ScraperRun model; corrected path prefixes from flat-Django to apps/backend/; added context-freshness rule. tests/all-tests.md: updated test count to 64 (was empty placeholder), corrected venv path to apps/backend/venv/bin/python, added TestCase vs. TransactionTestCase guidance for runner tests. Plans: archived completed_scraper-run-jobs_PLAN_17-06-26.md to general-plans/completed/; new active plan run-cancellation_PLAN_17-06-26.md for subprocess cancellation phase. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds a ChangesScraperRun Feature: Backend to Frontend
Process Documentation Updates
Sequence Diagram(s)sequenceDiagram
participant Page as +page.svelte
participant APIClient as api.ts
participant Django as Django JSON API
participant Runner as runner.py
participant Subprocess as run_scraper_job subprocess
Page->>Django: load() — api.scrapers + api.scraperRuns (parallel)
Django-->>Page: scrapers[], recentRuns[]
loop poll api.activeRuns() every 2.5s while active runs exist
Page->>Django: GET /api/scrapers/runs/active/
Django-->>Page: active ScraperRun[]
alt no active runs left
Page->>Django: GET /api/scrapers/runs/ + /api/scrapers/
Django-->>Page: refreshed recentRuns + scrapers with last_run
end
end
Page->>APIClient: api.runScraper(key)
APIClient->>Django: POST /api/scrapers/key/run/ (X-CSRFToken)
Django->>Runner: trigger_scraper_run(key)
Runner->>Django: create ScraperRun QUEUED + Popen(start_new_session=True)
Runner->>Django: save pid on row
Django-->>Page: {id, status: "queued"}
Subprocess->>Django: set RUNNING + started_at
Subprocess->>Subprocess: SCRAPERS[key]().run()
Subprocess->>Django: set SUCCESS/FAILED + finished_at
Page->>APIClient: api.cancelRun(id)
APIClient->>Django: POST /api/scrapers/runs/id/cancel/
Django->>Runner: cancel_run(id) SELECT FOR UPDATE
Runner->>Subprocess: os.killpg(os.getpgid(pid), SIGTERM)
Runner->>Django: refresh_from_db → set CANCELLED + finished_at
Django-->>Page: ScraperRun {status: "cancelled"}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
…ers-and-logs # Conflicts: # apps/backend/events/tests.py # apps/backend/events/views.py # apps/frontend/src/routes/scrapers/+page.svelte # process/context/all-context.md # process/context/tests/all-tests.md
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (2)
apps/frontend/src/lib/types.ts (1)
119-121: ⚡ Quick winTighten
RunAllResultstatus typing to the known status union.Line 120 currently allows any string, but this payload is constrained to scraper run statuses. Using
ScraperRunStatuspreserves compile-time contract checks and prevents accidental UI branch drift.Proposed diff
export interface RunAllResult { - created: { key: string; id: number; status: string }[]; + created: { key: string; id: number; status: ScraperRunStatus }[]; skipped: string[]; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/frontend/src/lib/types.ts` around lines 119 - 121, In the RunAllResult interface, the status field within the created array is typed as a plain string, which allows any value and loses compile-time validation. Change the status field type from string to ScraperRunStatus to enforce that only valid scraper run statuses are used, ensuring the UI contract is correctly enforced at compile time and preventing accidental branch drift.apps/frontend/src/lib/api.ts (1)
78-78: ⚡ Quick winUse
ScraperRunStatusforrunScraperresponse typing.Line 78 uses a broad
stringforstatus; narrowing it toScraperRunStatuskeeps the client API contract strict and consistent with the rest of the scraper-run types.Proposed diff
import type { CategoryCount, EventRow, Organizer, OrganizerDetail, Paginated, RunAllResult, Scraper, + ScraperRunStatus, ScraperRun, SourceCount, Stats, VenueRow } from './types'; @@ - runScraper: (key: string) => post<{ id: number; status: string }>(`/scrapers/${key}/run/`), + runScraper: (key: string) => + post<{ id: number; status: ScraperRunStatus }>(`/scrapers/${key}/run/`),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/frontend/src/lib/api.ts` at line 78, The runScraper function in the api.ts file uses a generic string type for the status field in its response type, which reduces type safety. Replace the string type with ScraperRunStatus in the response type definition of the runScraper function (the post generic parameter) to enforce a strict API contract and maintain consistency with other scraper-run related types. Ensure ScraperRunStatus is imported if it is not already available in the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/backend/events/admin.py`:
- Around line 76-80: The `pid` field is missing from the readonly_fields tuple
in the admin configuration. Add `pid` to the readonly_fields tuple to prevent
manual editing of this field in the Django admin interface. This is critical
because the `cancel_run` function relies on the `pid` field value to send
SIGTERM signals to the correct process group, and allowing manual edits could
cause the signal to be sent to the wrong target. Include `pid` in the
readonly_fields tuple alongside the other read-only fields like scraper_key,
status, started_at, and finished_at.
In `@apps/backend/events/management/commands/run_scraper_job.py`:
- Around line 47-49: The status transition to RUNNING in the ScraperRun update
is unconditional and vulnerable to a race condition where a cancelled run can be
resurrected. Instead of directly setting run.status to ScraperRun.Status.RUNNING
and calling save(), use Django's atomic conditional update pattern (such as
filter().update() with a queryset condition) to ensure the status transitions
from QUEUED to RUNNING only if the current status is still QUEUED. This mirrors
the defensive approach already used in cancel_run with select_for_update() and
refresh_from_db(), preventing a worker process from overwriting a terminal
CANCELLED state that was set by an admin between the time the job was queued and
when the worker actually started execution.
In `@apps/backend/events/runner.py`:
- Around line 57-65: The code at lines 57-64 has a race condition where two
concurrent requests can both pass the exists() check and create duplicate active
ScraperRun entries for the same scraper_key. Add a partial unique constraint to
the ScraperRun model database to enforce that only one active run (with status
QUEUED or RUNNING) can exist per scraper_key, then wrap the
ScraperRun.objects.create() call in a try-except block to catch the database
integrity constraint violation and return the same response (None, True) that
would be returned if active_exists was true, ensuring the race condition is
prevented at the database level.
In `@apps/backend/events/tests.py`:
- Around line 757-765: Before releasing to production, verify that the three
public mutation endpoints (api_scraper_trigger, api_scraper_run_all, and
api_scraper_run_cancel) are properly gated at the infrastructure level since
they lack application-level authentication. Work with the DevOps and
Infrastructure team to confirm that production deployments have reverse proxy
rules, firewall restrictions, or network policies in place that restrict access
to these endpoints to only internal networks or trusted sources. Document the
deployment gates used (firewall rules, reverse proxy configuration, load
balancer restrictions) and ensure this security configuration is enforced
consistently across all production environments before the application is
released.
In `@apps/backend/events/views.py`:
- Around line 445-551: Add authentication and authorization checks to all
scraper-run endpoints to enforce staff-only access. The endpoints
api_scraper_trigger, api_scraper_run_all, api_scraper_runs,
api_scraper_runs_active, api_scraper_run_detail, and api_scraper_run_cancel
currently lack any access control despite being documented as staff-only. Add
either a decorator-based approach (such as Django's staff_member_required or
login_required with manual staff check) or implement explicit authentication
logic at the start of each function to verify the user is authenticated and has
staff privileges, returning an appropriate 403 Forbidden response if
authorization fails.
- Around line 513-516: The limit parameter parsing in the try block does not
enforce a minimum bound, allowing zero and negative values to pass through which
causes problematic queryset slicing behavior. Modify the limit assignment to
clamp the value to the range [1, 200] by using max() with a minimum value of 1
in addition to the existing min() call that enforces the maximum of 200,
ensuring the API behaves deterministically regardless of the input value.
In `@apps/frontend/src/routes/scrapers/`+page.svelte:
- Around line 176-177: The run variable at line 176 is only derived from
runningMap which contains only active runs (queued/running status), making the
success and failed status branches at lines 219 and 226 unreachable dead code.
Update the const run assignment to first check runningMap for an active run, and
if no active run exists, fall back to the latest run from an available data
source (such as a latest run property or API response). This will allow the
success and failed card details to be displayed when runs complete.
- Around line 123-125: The duration formatter has a bug where rounding the
remainder seconds can produce 60 as the seconds value, creating invalid time
strings like "1m 60s". Fix this by first normalizing the input seconds to a
whole number using Math.round before splitting into minutes and seconds. This
ensures the remainder operation always produces a value between 0 and 59,
avoiding the invalid 60-second bucket.
In `@process/context/all-context.md`:
- Around line 303-317: The ScraperRun model documentation section is incomplete
and needs two updates. First, in the status field documentation where it
currently lists the TextChoices as queued/running/success/failed, add CANCELLED
to this list to reflect the complete set of status options. Second, add a new
line documenting the pid field (PositiveIntegerField, null=True, blank=True)
which stores the OS PID of the worker subprocess and is used for cancellation
purposes. Insert the pid field documentation after the triggered_by field entry
to maintain logical grouping of model fields.
- Around line 376-394: The Run-Jobs Subsystem section in the documentation
incorrectly describes the system as thread-based using threading.Thread when the
actual implementation uses subprocess.Popen with start_new_session=True. Revise
the section starting at "The system is thread-based" to accurately document that
the system spawns independent process groups via subprocess.Popen, stores the
pid on the ScraperRun row, uses os.killpg for process group termination on
cancellation, and is controlled by the run_scraper_job management command (not
the _run_scraper thread function). Remove references to threading.Thread,
django.db.connection.close() in finally blocks, and the thread-based execution
model, replacing them with subprocess-based process group management details.
---
Nitpick comments:
In `@apps/frontend/src/lib/api.ts`:
- Line 78: The runScraper function in the api.ts file uses a generic string type
for the status field in its response type, which reduces type safety. Replace
the string type with ScraperRunStatus in the response type definition of the
runScraper function (the post generic parameter) to enforce a strict API
contract and maintain consistency with other scraper-run related types. Ensure
ScraperRunStatus is imported if it is not already available in the file.
In `@apps/frontend/src/lib/types.ts`:
- Around line 119-121: In the RunAllResult interface, the status field within
the created array is typed as a plain string, which allows any value and loses
compile-time validation. Change the status field type from string to
ScraperRunStatus to enforce that only valid scraper run statuses are used,
ensuring the UI contract is correctly enforced at compile time and preventing
accidental branch drift.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a78166f2-eeb1-4459-8e10-ed407292d798
📒 Files selected for processing (19)
apps/backend/events/admin.pyapps/backend/events/management/commands/run_scraper_job.pyapps/backend/events/migrations/0010_scraperrun.pyapps/backend/events/migrations/0011_run_cancellation.pyapps/backend/events/models.pyapps/backend/events/runner.pyapps/backend/events/tests.pyapps/backend/events/urls.pyapps/backend/events/views.pyapps/frontend/src/lib/api.tsapps/frontend/src/lib/components/Badge.svelteapps/frontend/src/lib/components/PageHeader.svelteapps/frontend/src/lib/types.tsapps/frontend/src/routes/scrapers/+page.svelteapps/frontend/src/routes/scrapers/+page.tsprocess/context/all-context.mdprocess/context/tests/all-tests.mdprocess/general-plans/active/run-cancellation_PLAN_17-06-26.mdprocess/general-plans/completed/completed_scraper-run-jobs_PLAN_17-06-26.md
| @mock.patch("events.views.cancel_run") | ||
| def test_cancel_is_public(self, mock_cancel): | ||
| run = ScraperRun.objects.create( | ||
| scraper_key="myruntime", status=ScraperRun.Status.CANCELLED | ||
| ) | ||
| mock_cancel.return_value = (run, "ok") | ||
| resp = self.client.post(f"/api/scrapers/runs/{run.id}/cancel/") | ||
| self.assertEqual(resp.status_code, 200) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm public mutation behavior is intentionally encoded in tests and views.
rg -n "test_.*is_public|unauthenticated|/api/scrapers/.*/run/|/api/scrapers/run-all/|/cancel/" apps/backend/events/tests.py
rg -n "def api_scraper_trigger|def api_scraper_run_all|def api_scraper_run_cancel|csrf_exempt|unauthenticated intentionally" apps/backend/events/views.py
# Look for explicit environment gates controlling scraper mutation exposure.
rg -n "SCRAPER|INTERNAL|ALLOW_UNAUTH|FEATURE_FLAG|AUTH" -g "*settings*.py"Repository: hdmGOAT/veent-event-scraper
Length of output: 2135
🏁 Script executed:
# Get full context of the SECURITY NOTE in views.py
sed -n '469,490p' apps/backend/events/views.py
sed -n '492,520p' apps/backend/events/views.py
sed -n '538,560p' apps/backend/events/views.pyRepository: hdmGOAT/veent-event-scraper
Length of output: 2883
🏁 Script executed:
# Check for any middleware or decorators that gate these endpoints
rg -n "middleware|permission|is_authenticated|request.user|INTERNAL|whitelist|allowed_hosts" apps/backend/events/views.py | head -30Repository: hdmGOAT/veent-event-scraper
Length of output: 229
🏁 Script executed:
# Look for deployment configuration files (docker, env, nginx, etc.)
find . -type f \( -name "*.env*" -o -name "docker*" -o -name "Dockerfile" -o -name "nginx*" -o -name ".github" \) 2>/dev/null | head -20Repository: hdmGOAT/veent-event-scraper
Length of output: 127
🏁 Script executed:
# Look for URL routing and middleware application
find apps/backend -type f \( -name "urls.py" -o -name "routing.py" \) | xargs cat -n 2>/dev/null | grep -A 5 -B 5 "api_scraper"Repository: hdmGOAT/veent-event-scraper
Length of output: 1421
🏁 Script executed:
# Check for any nginx, Docker, or deployment config files
find . -type f \( -name "nginx.conf" -o -name "Dockerfile*" -o -name ".dockerignore" -o -name "docker-compose*" -o -name "deployment*" -o -name "k8s*" \) 2>/dev/nullRepository: hdmGOAT/veent-event-scraper
Length of output: 53
🏁 Script executed:
# Look for any middleware that might gate these paths
rg -n "class.*Middleware|def.*middleware|MIDDLEWARE" apps/backend --type py | head -20Repository: hdmGOAT/veent-event-scraper
Length of output: 121
🏁 Script executed:
# Check the MIDDLEWARE setting to see if there's any authentication applied
sed -n '73,90p' apps/backend/config/settings.pyRepository: hdmGOAT/veent-event-scraper
Length of output: 694
🏁 Script executed:
# Check if there are any auth checks in the base URL config or middleware application
rg -n "login_required|permission_required|IsAuthenticated|BasicAuth" apps/backend/configRepository: hdmGOAT/veent-event-scraper
Length of output: 53
Confirm production deployment gates these public mutation endpoints before release.
The endpoints are intentionally unauthenticated in Phase 1 design (documented in security notes), but no enforcement mechanism is visible in code or infrastructure. The tests correctly lock in the 200 response, but before release you must verify these endpoints are restricted to internal networks or protected by reverse proxy/firewall rules in production, since no application-level gating exists.
Applies to: api_scraper_trigger (lines 798–808), api_scraper_run_all (lines 896–903), and api_scraper_run_cancel (lines 757–765).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/backend/events/tests.py` around lines 757 - 765, Before releasing to
production, verify that the three public mutation endpoints
(api_scraper_trigger, api_scraper_run_all, and api_scraper_run_cancel) are
properly gated at the infrastructure level since they lack application-level
authentication. Work with the DevOps and Infrastructure team to confirm that
production deployments have reverse proxy rules, firewall restrictions, or
network policies in place that restrict access to these endpoints to only
internal networks or trusted sources. Document the deployment gates used
(firewall rules, reverse proxy configuration, load balancer restrictions) and
ensure this security configuration is enforced consistently across all
production environments before the application is released.
| # --------------------------------------------------------------------------- | ||
| # Scraper run jobs — trigger runs from the UI and poll/list their status. | ||
| # All endpoints are staff-only, mirroring the /review/ convention. | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| def _serialize_run(run): | ||
| """Serialise a ScraperRun to the standard dict shape used by all run endpoints.""" | ||
| return { | ||
| "id": run.id, | ||
| "scraper_key": run.scraper_key, | ||
| "status": run.status, | ||
| "started_at": run.started_at.isoformat() if run.started_at else None, | ||
| "finished_at": run.finished_at.isoformat() if run.finished_at else None, | ||
| "created_count": run.created_count, | ||
| "updated_count": run.updated_count, | ||
| "extra_counts": run.extra_counts, | ||
| "error_message": run.error_message or None, | ||
| "triggered_by": run.triggered_by.username if run.triggered_by_id else None, | ||
| "created_at": run.created_at.isoformat(), | ||
| "duration_seconds": run.duration_seconds, | ||
| } | ||
|
|
||
|
|
||
| @csrf_exempt | ||
| @require_POST | ||
| def api_scraper_trigger(request, key): | ||
| # SECURITY NOTE: This endpoint is unauthenticated intentionally. The | ||
| # SvelteKit frontend has no Django session (the "Admin User" in the sidebar | ||
| # is static UI, not a real session), and the Vite proxy makes all /api/* | ||
| # calls same-origin in dev, so there is no CSRF surface from a browser | ||
| # cross-site attack on this internal-only tool. @csrf_exempt is consistent | ||
| # with all other JSON API endpoints in this file. Re-evaluate when real | ||
| # auth is added (Phase 2 roadmap). | ||
| from .scrapers import SCRAPERS | ||
|
|
||
| if key not in SCRAPERS: | ||
| return JsonResponse({"error": "Unknown scraper key"}, status=404) | ||
|
|
||
| triggered_by = request.user if request.user.is_authenticated else None | ||
| run, already_active = trigger_scraper_run(key, triggered_by=triggered_by) | ||
| if already_active: | ||
| return JsonResponse({"error": "Scraper already running"}, status=409) | ||
|
|
||
| return JsonResponse({"id": run.id, "status": run.status}, status=200) | ||
|
|
||
|
|
||
| @csrf_exempt | ||
| @require_POST | ||
| def api_scraper_run_all(request): | ||
| # SECURITY NOTE: Same posture as api_scraper_trigger above — unauthenticated | ||
| # intentionally for the same reasons. Revisit when real auth is added. | ||
| from .scrapers import SCRAPERS | ||
|
|
||
| triggered_by = request.user if request.user.is_authenticated else None | ||
| created = [] | ||
| skipped = [] | ||
| for key in SCRAPERS: | ||
| run, already_active = trigger_scraper_run(key, triggered_by=triggered_by) | ||
| if already_active: | ||
| skipped.append(key) | ||
| else: | ||
| created.append({"key": key, "id": run.id, "status": run.status}) | ||
|
|
||
| return JsonResponse({"created": created, "skipped": skipped}, status=200) | ||
|
|
||
|
|
||
| def api_scraper_runs(request): | ||
| try: | ||
| limit = min(int(request.GET.get("limit", 50)), 200) | ||
| except ValueError: | ||
| limit = 50 | ||
| runs = ( | ||
| ScraperRun.objects.select_related("triggered_by") | ||
| .order_by("-created_at")[:limit] | ||
| ) | ||
| return JsonResponse([_serialize_run(r) for r in runs], safe=False) | ||
|
|
||
|
|
||
| def api_scraper_runs_active(request): | ||
| runs = ScraperRun.objects.filter( | ||
| status__in=[ScraperRun.Status.QUEUED, ScraperRun.Status.RUNNING] | ||
| ).select_related("triggered_by") | ||
| return JsonResponse([_serialize_run(r) for r in runs], safe=False) | ||
|
|
||
|
|
||
| def api_scraper_run_detail(request, run_id): | ||
| run = get_object_or_404( | ||
| ScraperRun.objects.select_related("triggered_by"), id=run_id | ||
| ) | ||
| return JsonResponse(_serialize_run(run)) | ||
|
|
||
|
|
||
| @csrf_exempt | ||
| @require_POST | ||
| def api_scraper_run_cancel(request, run_id): | ||
| # SECURITY NOTE: same posture as api_scraper_trigger — unauthenticated | ||
| # intentionally (no Django session from SvelteKit). Re-evaluate when real | ||
| # auth is added. | ||
| run, signal = cancel_run(run_id) | ||
| if signal == "not_found": | ||
| return JsonResponse({"error": "Run not found"}, status=404) | ||
| if signal == "not_active": | ||
| return JsonResponse( | ||
| {"error": "Run is not active", "run": _serialize_run(run)}, status=409 | ||
| ) | ||
| return JsonResponse(_serialize_run(run), status=200) |
There was a problem hiding this comment.
Lock down scraper-run endpoints before release.
This block is documented as staff-only, but the trigger/run-all/cancel routes are currently unauthenticated and CSRF-exempt, which lets any reachable caller start/stop scraper jobs. The same unauthenticated run APIs also expose run metadata (including failure details), so this is an immediate control-plane and information-exposure risk. Add a real authorization gate across all scraper-run endpoints (staff session or a dedicated signed API credential) before shipping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/backend/events/views.py` around lines 445 - 551, Add authentication and
authorization checks to all scraper-run endpoints to enforce staff-only access.
The endpoints api_scraper_trigger, api_scraper_run_all, api_scraper_runs,
api_scraper_runs_active, api_scraper_run_detail, and api_scraper_run_cancel
currently lack any access control despite being documented as staff-only. Add
either a decorator-based approach (such as Django's staff_member_required or
login_required with manual staff check) or implement explicit authentication
logic at the start of each function to verify the user is authenticated and has
staff privileges, returning an appropriate 403 Forbidden response if
authorization fails.
- admin: add pid to ScraperRunAdmin readonly_fields - run_scraper_job: guard QUEUED→RUNNING with conditional filter().update() so a cancelled run is never resurrected by a late-starting worker - models/runner: add unique_active_scraper_run partial DB constraint (scraper_key unique where status IN queued/running) + catch IntegrityError in trigger_scraper_run to close the exists()+create() race window (0012 migration) - views: clamp api_scraper_runs limit to [1, 200] - page.svelte: fix formatDuration 60s bucket bug; fall back to recentRuns for per-card success/failed display when no active run is in runningMap - api.ts/types.ts: type scraper run status fields as ScraperRunStatus - tests: update RunEndpointTests fixtures to respect unique_active_scraper_run - all-context.md: correct thread→subprocess model, add pid field and CANCELLED status to ScraperRun docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@process/context/all-context.md`:
- Around line 642-644: The phrase "not a local Neon PostgreSQL DB" in the test
database documentation is contradictory and unclear. Revise this text to
explicitly state that the test database uses SQLite (not Neon PostgreSQL) as a
contrast point, so readers understand clearly which database technology is being
used for testing. Replace the confusing negative phrasing with a direct,
positive statement about the actual database choice to make the guidance
unambiguous.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 37538897-ef1f-42d0-a1b3-494b5f0e5841
📒 Files selected for processing (11)
apps/backend/events/admin.pyapps/backend/events/management/commands/run_scraper_job.pyapps/backend/events/migrations/0012_scraperrun_unique_active_constraint.pyapps/backend/events/models.pyapps/backend/events/runner.pyapps/backend/events/tests.pyapps/backend/events/views.pyapps/frontend/src/lib/api.tsapps/frontend/src/lib/types.tsapps/frontend/src/routes/scrapers/+page.svelteprocess/context/all-context.md
✅ Files skipped from review due to trivial changes (1)
- apps/frontend/src/lib/types.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- apps/backend/events/management/commands/run_scraper_job.py
- apps/backend/events/runner.py
- apps/backend/events/tests.py
- apps/frontend/src/routes/scrapers/+page.svelte
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/backend/events/models.py (1)
96-96: 💤 Low valueReplace EN DASH with ASCII hyphen in comment.
The comment contains a Unicode en-dash character (–) instead of an ASCII hyphen-minus (-) in "1–2 labels". While functionally harmless, this can cause issues with some text editors or tools that don't handle Unicode well.
📝 Suggested fix
- # AI-assigned canonical categories (1–2 labels from CANONICAL_CATEGORIES). + # AI-assigned canonical categories (1-2 labels from CANONICAL_CATEGORIES).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/events/models.py` at line 96, In the comment starting with "AI-assigned canonical categories", replace the Unicode en-dash character (–) with a standard ASCII hyphen (-) in the text "1–2 labels". Change it from "1–2 labels" to "1-2 labels" to ensure compatibility with text editors and tools that don't handle Unicode characters well.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/backend/events/models.py`:
- Line 96: In the comment starting with "AI-assigned canonical categories",
replace the Unicode en-dash character (–) with a standard ASCII hyphen (-) in
the text "1–2 labels". Change it from "1–2 labels" to "1-2 labels" to ensure
compatibility with text editors and tools that don't handle Unicode characters
well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 98d48337-102a-483f-9cfa-e8b89bb55e32
📒 Files selected for processing (4)
apps/backend/events/models.pyapps/backend/events/tests.pyapps/backend/events/views.pyapps/frontend/src/lib/types.ts
✅ Files skipped from review due to trivial changes (1)
- apps/frontend/src/lib/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/backend/events/views.py
Summary
/api/scrapers/<key>/run/or/api/scrapers/run-all/. Run buttons and a "Run All" header action are wired in the SvelteKit frontend./api/scrapers/runs/active/every 2.5 s while any run is queued/running. The spinner stops and history auto-refreshes when all active runs complete.ScraperRunmodel (migrations0010,0011) tracks every run with status (queued/running/success/failed/cancelled), timestamps, item counts,error_message(full traceback on failure), andpidfor process-kill cancellation.start_new_session=True(POSIXsetsid).cancel_run()sendsSIGTERMto the entire process group viaos.killpg, which also kills Playwright's chromium. ASELECT FOR UPDATErace guard prevents overwriting a worker that finished in the same instant.GET /api/scrapers/runs/,GET /api/scrapers/runs/active/,GET /api/scrapers/runs/<id>/,POST /api/scrapers/runs/<id>/cancel/— all standard JSON; existingGET /api/scrapers/enriched with alast_runfield per key.Auth posture / security follow-up
The three mutation endpoints (
/run/,/run-all/,/cancel/) are@csrf_exemptand unauthenticated. The SvelteKit frontend has no Django session — there is no CSRF surface from a browser cross-site attack on this internal-only tool during development. The Vite proxy makes all/api/*calls same-origin in dev. These endpoints must be re-gated behind real auth when user authentication is added (planned Phase 2).Not yet verified live
Test plan
cd apps/backend && ./venv/bin/python manage.py test events— all 84 tests passcd apps/frontend && pnpm svelte-check— no type errorscd apps/frontend && pnpm build— build succeeds./venv/bin/python manage.py runserver) + SvelteKit dev (pnpm dev), open Scraper Center, trigger a fast scraper, verify status updates livecancelled🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
UI/UX
Tests