
feat: process approved data source uploads into /explore (#190)#196

Merged
William-Hill merged 21 commits into
mainfrom
feat/190-datasource-pipeline
Apr 19, 2026



@William-Hill William-Hill commented Apr 19, 2026

Owner

Summary

  • Parses staff CSV/XLSX uploads at submit time with contributor-declared column mapping; normalized rows land in uploaded_datasets in the same transaction as the Upload row.
  • New Staff Uploads tab on /explore with a dataset picker; approved uploads render through the existing map/chart/table components.
  • Guide Section 1 now describes the shipped behavior.

Closes #190.

Spec + plan

  • Spec: docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md
  • Plan: docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md

Test plan

  • pytest — 888 passed (excluding optional tests/test_training/test_integration_models.py, which requires Ollama + fine-tuned models and can fail on non-JSON model output)
  • npm run build && npm run lint — clean (one pre-existing hooks warning in app/page.tsx)
  • Manual QA per plan Task 18 (upload → approve → /explore, error paths, localStorage)

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Staff data source upload: upload CSV/XLSX with required column mapping (geo, metric value, metric name; optional race/year); immediate parsing/validation with structured errors and preview.
  • Explore

    • New "Staff Uploads" source and picker on Explore to browse and filter approved staff datasets.
  • Admin Improvements

    • Review UI shows declared mapping, preview table, and parsed row counts.
  • API Endpoints

    • New endpoints to list available staff uploads and to fetch explore-shaped data for a selected approved upload.
  • Documentation & Tests

    • Guide updated and new tests cover parsing, validation, upload, review, and explore flows.

William Hill added 20 commits April 18, 2026 14:19

coderabbitai Bot commented Apr 19, 2026

Contributor
📝 Walkthrough


Adds a staff datasource upload pipeline: backend CSV/XLSX parsing/validation, immediate normalization and bulk-insert of rows at upload, two explore endpoints for approved uploads, schema and admin UI mapping inputs, frontend picker/integration on /explore, tests, and the openpyxl dependency.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Design & Docs | `docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md`, `docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md` | End-to-end spec and plan for datasource upload parsing, validation, DB usage, APIs, and frontend UX. |
| Datasource processing service | `src/d4bl/services/datasource_processing/validation.py`, `parser.py`, `__init__.py` | New package with pure validation helpers (validate_metric_name, derive_state_fips, coerce_numeric, coerce_year), CSV/XLSX readers, MappingConfig, DatasourceParseError, and parse_datasource_file enforcing thresholds and returning normalized rows/preview. |
| Backend schemas & upload endpoint | `src/d4bl/app/schemas.py`, `src/d4bl/app/upload_routes.py` | DataSourceUploadRequest gains mapping fields (geo_column, metric_value_column, metric_name, optional race_column/year_column) and cross-field validation; upload route accepts mapping form fields, parses file (to_thread), returns structured 422 on parse errors, and bulk-inserts normalized rows into uploaded_datasets within the same transaction. |
| Explore API | `src/d4bl/app/api.py` | Added GET /api/explore/staff-uploads/available (list approved uploads) and GET /api/explore/staff-uploads (ExploreResponse-shaped aggregated data for an approved upload), with SQL aggregation and parameter validation. |
| Dependency | `pyproject.toml` | Added openpyxl>=3.1 for XLSX parsing. |
| Backend tests & fixtures | `tests/conftest.py`, `tests/test_datasource_processing.py`, `tests/test_upload_api.py`, `tests/test_explore_api.py`, `tests/test_settings.py` | New unit/integration tests for validation/coercion, parsing, schema validation, upload flow, explore endpoints; added make_xlsx_bytes fixture and updated settings test. |
| Admin UI: upload & review | `ui-nextjs/components/admin/UploadDataSource.tsx`, `ui-nextjs/components/admin/ReviewDetail.tsx` | Upload form adds mapping inputs, conditional year input, structured 422 error formatting; review detail shows mapping and parsed preview rows and hides mapping/preview metadata from generic list. |
| Explore frontend integration | `ui-nextjs/app/explore/page.tsx`, `ui-nextjs/components/explore/StaffDatasetPicker.tsx`, `ui-nextjs/components/explore/MetricFilterPanel.tsx`, `ui-nextjs/lib/explore-config.ts` | Adds staff-uploads data source config, StaffDatasetPicker component, persisted uploadId in filters, conditional data loading and filter behavior for staff uploads, and ExploreFilters extended with uploadId. |
| Guide content | `ui-nextjs/app/guide/page.tsx` | Updated contributor guide to require mapping fields, describe immediate parsing/validation, admin review preview, and staff-uploads availability in /explore. |
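
The same-transaction write in the upload route can be illustrated with the standard-library sqlite3 module. This is a minimal stand-in, not the PR's code: the real route targets PostgreSQL with jsonb rows, and the two-table schema, `CHUNK_SIZE`, and the `create_schema`/`save_upload` names below are hypothetical. What it shows is the property the walkthrough describes: the upload row and every chunk of normalized rows commit together or not at all.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any

CHUNK_SIZE = 500  # hypothetical batch size for the bulk insert


def create_schema(conn: sqlite3.Connection) -> None:
    """Simplified, hypothetical tables; the real PostgreSQL schema is not shown."""
    conn.execute(
        "CREATE TABLE uploads (id INTEGER PRIMARY KEY, source_name TEXT, status TEXT)"
    )
    conn.execute("CREATE TABLE uploaded_datasets (upload_id INTEGER, row_json TEXT)")


def save_upload(
    conn: sqlite3.Connection, source_name: str, rows: list[dict[str, Any]]
) -> int:
    """Insert the upload row and all normalized rows in one transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute(
            "INSERT INTO uploads (source_name, status) VALUES (?, 'pending_review')",
            (source_name,),
        )
        upload_id = cur.lastrowid
        for start in range(0, len(rows), CHUNK_SIZE):
            chunk = rows[start : start + CHUNK_SIZE]
            # allow_nan=False raises ValueError for NaN/Infinity instead of
            # writing the non-standard tokens NaN/Infinity into the JSON text
            payload = [(upload_id, json.dumps(r, allow_nan=False)) for r in chunk]
            conn.executemany(
                "INSERT INTO uploaded_datasets (upload_id, row_json) VALUES (?, ?)",
                payload,
            )
    return upload_id
```

If a later call fails while serializing a chunk, the `uploads` row it inserted just before is rolled back as well, so no upload is left without its rows.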

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as Staff Contributor
    participant Client as Browser
    participant API as Backend API
    participant Parser as Datasource Parser
    participant DB as Database

    User->>Client: Upload CSV/XLSX + mapping
    Client->>API: POST /api/admin/uploads/datasource (file, mapping...)
    API->>Parser: parse_datasource_file(bytes, ext, MappingConfig)
    Parser->>Parser: read_csv_bytes / read_xlsx_bytes
    Parser->>Parser: validate headers, normalize rows (FIPS, numeric, year)
    Parser->>Parser: apply thresholds (FIPS %, numeric %, min rows)
    alt Validation fails
        Parser-->>API: DatasourceParseError with detail
        API-->>Client: 422 with structured error.detail
    else Success
        Parser-->>API: ParseResult (normalized_rows, preview)
        API->>DB: BEGIN
        API->>DB: INSERT uploads (pending_review + metadata)
        API->>DB: BULK INSERT uploaded_datasets (normalized rows as jsonb)
        API->>DB: COMMIT
        API-->>Client: 200 OK (upload recorded)
    end
```
```mermaid
sequenceDiagram
    participant User as End User
    participant Client as Browser
    participant API as Backend API
    participant DB as Database

    User->>Client: Open /explore, select "Staff Uploads"
    Client->>API: GET /api/explore/staff-uploads/available
    API->>DB: SELECT uploads WHERE upload_type='datasource' AND status='approved'
    DB-->>API: [{upload_id, metric_name, has_race, row_count, ...}]
    API-->>Client: list of available staff uploads
    User->>Client: Choose dataset + filters (state/race/year)
    Client->>API: GET /api/explore/staff-uploads?upload_id=...&state_fips=...&race=...&year=...
    API->>DB: SELECT aggregated values FROM uploaded_datasets WHERE upload_id=... AND filters...
    DB-->>API: aggregated rows, national_average, available_* values
    API-->>Client: ExploreResponse-shaped payload
    Client->>Client: Render map/chart based on response
```
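
The explore endpoint is summarized as averaging uploaded values by state_fips/race/year and returning a national average plus the available filter values. A minimal in-memory sketch of that shape, assuming normalized rows keyed `state_fips`, `race`, `year`, and `value` (hypothetical keys) and a national average taken over all filtered rows; the real endpoint does this in SQL, and its exact averaging rule and response fields are not shown here.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import fmean
from typing import Any, Optional


def aggregate_rows(
    rows: list[dict[str, Any]],
    race: Optional[str] = None,
    year: Optional[int] = None,
) -> dict[str, Any]:
    """Filter normalized rows, then average values per (state_fips, race, year)."""
    filtered = [
        r
        for r in rows
        if (race is None or r.get("race") == race)
        and (year is None or r.get("year") == year)
    ]
    groups: dict[tuple[Any, Any, Any], list[float]] = defaultdict(list)
    for r in filtered:
        groups[(r["state_fips"], r.get("race"), r.get("year"))].append(r["value"])

    return {
        "rows": [
            {"state_fips": s, "race": rc, "year": y, "value": fmean(vals)}
            for (s, rc, y), vals in sorted(groups.items(), key=lambda kv: repr(kv[0]))
        ],
        # Assumption: the national figure is the mean of all filtered rows
        "national_average": fmean(r["value"] for r in filtered) if filtered else None,
        "available_races": sorted({r["race"] for r in rows if r.get("race") is not None}),
        "available_years": sorted({r["year"] for r in rows if r.get("year") is not None}),
    }
```

Filtering a race-less dataset with `race="total"` returns no rows at all, which is the empty-view symptom behind the race-default finding later in this thread.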

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: process approved data source uploads into /explore (#190)' accurately describes the main change: implementing processing and integration of staff-uploaded datasources into the explore interface. |
| Linked Issues check | ✅ Passed | The PR implements all major coding objectives from #190: parsing CSV/XLSX uploads at submit time with column mapping validation, normalizing rows into uploaded_datasets, adding staff-uploads tab on /explore with dataset picker, and updating guide copy. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the linked issue objectives: new datasource processing package, API endpoints, upload flow updates, frontend explore integration, and guide updates are all in-scope for #190. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67df103039


Comment thread src/d4bl/services/datasource_processing/validation.py
Comment thread ui-nextjs/app/explore/page.tsx

greptile-apps Bot commented Apr 19, 2026


Greptile Summary

This PR ships an end-to-end staff CSV/XLSX upload pipeline: files are parsed, validated, and normalized at submit time; rows land in uploaded_datasets in the same transaction as the Upload row; and a new "Staff Uploads" tab on /explore surfaces approved datasets through the existing map/chart/table components. The architecture is sound — parse-on-upload with a pure-function validation layer is the right call, and reusing the existing ExploreResponse shape keeps the frontend changes minimal.

Key findings:

  • P1 – Infinity values crash json.dumps with a 500: coerce_numeric blocks NaN but allows float("Infinity"). Those values propagate to json.dumps(row) in upload_routes.py, which raises an unhandled ValueError (not a DatasourceParseError), returning a 500 instead of a 422.
  • P1 – Full file read before size gate: await file.read() buffers the whole upload into memory before the 50 MB limit is checked. Using file.read(MAX_DATASOURCE_SIZE + 1) reads at most one byte past the cap.
  • P1 – Race filter restored to 'total' on page reload for race-less staff datasets: resolveInitialState uses the static source.hasRace (always true for staff-uploads) to set the race default when persisted.race is null. A race-column-less dataset will return zero rows after a page refresh.
  • P2 – Empty sourceUrl for staff-uploads renders a dead "Learn more" link that reloads the page.
  • P2 – No explicit guard for a header-only file (0 data rows); such a file falls through to the MIN_VALID_ROWS check, which produces a confusing error message.
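
One nuance on the Infinity finding: Python's `json.dumps` raises `ValueError` for `inf`/`nan` only when called with `allow_nan=False`. With the default, `json.dumps(float("inf"))` returns the non-standard token `Infinity`, which PostgreSQL's json/jsonb input parser rejects. Either path likely surfaces as a 500, so rejecting non-finite values during coercion is what turns it into a 422. A minimal sketch of such a `coerce_numeric`, assuming it raises `ValueError` on bad input (the PR's exact signature and error type are not shown here); rejecting booleans is also an assumption.

```python
import math
from typing import Any


def coerce_numeric(value: Any) -> float:
    """Convert a cell value to a finite float, raising ValueError otherwise."""
    if isinstance(value, bool):
        # Assumption: True/False are not accepted as 1.0/0.0 metric values
        raise ValueError(f"not numeric: {value!r}")
    if isinstance(value, (int, float)):
        result = float(value)  # native numbers (e.g. from XLSX cells) pass through
    else:
        text = str(value).strip()
        if not text:
            raise ValueError("empty value")
        result = float(text)  # float() also accepts "nan", "inf", "Infinity"
    if not math.isfinite(result):
        raise ValueError(f"non-finite value: {value!r}")
    return result
```

The `math.isfinite` check runs after both branches, so a native `float("inf")` from a spreadsheet cell is caught the same way as the string `"Infinity"`.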

Confidence Score: 3/5

Not safe to merge as-is: two backend bugs (Infinity crash + pre-read size check) and one frontend bug (race filter reset on reload) need fixing before production use.

Three P1 issues were found. The Infinity-in-json.dumps bug turns a legitimate 422 into an unhandled 500, the full-file-read-before-size-check is a DoS-adjacent pattern, and the race filter page-reload regression silently breaks the explore view for race-less staff datasets. The rest of the implementation — parse pipeline, DB transaction, explore endpoints, test coverage — is well-built. Fixing these three items should be straightforward and bring the PR to merge-ready.
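
The bounded-read fix keeps at most `MAX_DATASOURCE_SIZE + 1` bytes in memory, which is enough to tell that a file is over the cap. A synchronous sketch over any binary file-like object; `read_capped`, `FileTooLarge`, and `READ_CHUNK` are hypothetical names, and the 50 MB cap comes from the finding. It loops rather than trusting a single `read()`, since a raw stream may return fewer bytes than requested. In the FastAPI route the same idea applies to the awaited `UploadFile.read(size)`.

```python
from __future__ import annotations

import io
from typing import BinaryIO

MAX_DATASOURCE_SIZE = 50 * 1024 * 1024  # 50 MB, the cap named in the finding
READ_CHUNK = 1024 * 1024  # hypothetical read granularity


class FileTooLarge(Exception):
    """Hypothetical error for uploads over the size cap."""


def read_capped(stream: BinaryIO, limit: int = MAX_DATASOURCE_SIZE) -> bytes:
    """Read at most limit + 1 bytes; raise if the stream is longer than limit."""
    chunks: list[bytes] = []
    total = 0
    # Loop instead of one read() so a short read cannot hide an oversize file
    while total <= limit:
        chunk = stream.read(min(READ_CHUNK, limit + 1 - total))
        if not chunk:
            break  # EOF reached within the limit
        chunks.append(chunk)
        total += len(chunk)
    if total > limit:
        raise FileTooLarge(f"upload exceeds {limit} bytes")
    return b"".join(chunks)


# Usage with an in-memory stream standing in for an uploaded file
small = read_capped(io.BytesIO(b"geo,value\n01,3.5\n"), limit=1024)
```

For an oversize stream, the function stops after consuming exactly `limit + 1` bytes, so memory use is bounded by the cap rather than by the file.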

src/d4bl/services/datasource_processing/validation.py (Infinity guard), src/d4bl/app/upload_routes.py (size check ordering), ui-nextjs/app/explore/page.tsx (race filter initialization)

Important Files Changed

| Filename | Overview |
|---|---|
| `src/d4bl/services/datasource_processing/validation.py` | Pure coercion helpers — solid overall, but coerce_numeric allows Infinity/-Infinity which will crash json.dumps downstream (unhandled 500). |
| `src/d4bl/services/datasource_processing/parser.py` | Well-structured parse pipeline with quality gates; MIN_VALID_ROWS path for header-only files produces a slightly confusing error message but is functionally correct. |
| `src/d4bl/app/upload_routes.py` | Parse-on-upload route is clean and transactional, but await file.read() buffers the full upload into memory before the size check is applied. |
| `src/d4bl/app/api.py` | Two new explore endpoints for staff-uploads aggregate JSONB rows by state/race/year and list approved datasets — both well-scoped and correctly gated by auth + status='approved'. |
| `ui-nextjs/app/explore/page.tsx` | Staff-uploads tab integration is well-structured; contains a race-filter initialization bug where persisted null race gets promoted to 'total' on page reload for datasets without race columns. |
| `ui-nextjs/components/explore/StaffDatasetPicker.tsx` | Clean dataset picker with proper cancellation and error handling. |
| `ui-nextjs/lib/explore-config.ts` | Staff-uploads DataSourceConfig entry is correct except sourceUrl: "" renders a non-functional "Learn more" link that navigates to the current page. |
| `tests/test_datasource_processing.py` | Comprehensive unit tests for validation, parsing, and integration paths; notably missing a test for Infinity input to coerce_numeric. |
| `src/d4bl/app/schemas.py` | New DataSourceUploadRequest schema correctly validates source_name and metric_name with existing validators; clean addition. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Contributor (browser)
    participant API as FastAPI /api/admin/uploads/datasource
    participant Parser as datasource_processing.parser
    participant DB as PostgreSQL

    C->>API: POST multipart (file + MappingConfig form fields)
    API->>API: validate file ext + size
    API->>API: validate DataSourceUploadRequest schema
    API->>Parser: parse_datasource_file(content, ext, mapping) [thread]
    Parser->>Parser: read_csv_bytes / read_xlsx_bytes
    Parser->>Parser: _check_columns_exist
    Parser->>Parser: _normalize_rows (FIPS/numeric/year coercion + drop tracking)
    Parser->>Parser: quality gates (bad_fips ratio, numeric ratio, MIN_VALID_ROWS)
    Parser-->>API: ParseResult (normalized_rows, preview_rows, dropped_counts)
    API->>DB: BEGIN txn INSERT uploads + bulk INSERT uploaded_datasets chunks
    DB-->>API: COMMIT
    API-->>C: 200 UploadResponse

    note over C,DB: Admin approval (status flip only)
    C->>API: PATCH /api/admin/uploads/{id}/review
    API->>DB: UPDATE uploads SET status=approved
    DB-->>API: ok
    API-->>C: status approved

    note over C,DB: Explore
    C->>API: GET /api/explore/staff-uploads?upload_id=...
    API->>DB: SELECT from uploaded_datasets JOIN uploads WHERE status=approved
    DB-->>API: aggregated rows AVG by state_fips/race/year
    API-->>C: ExploreResponse
```
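
The quality gates in this diagram (bad-FIPS ratio, numeric ratio, `MIN_VALID_ROWS`) can be written as a pure function over per-row drop counts. A minimal sketch: the threshold values, `DropCounts`, and the local `DatasourceParseError` stand-in are hypothetical, since the review names the gates but not their limits or the error's fields. The header-only check runs first, mirroring the explicit `no_data_rows` error added in the follow-up commit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Hypothetical thresholds: the review names the gates but not their values.
MAX_BAD_FIPS_RATIO = 0.10
MAX_BAD_NUMERIC_RATIO = 0.10
MIN_VALID_ROWS = 5


class DatasourceParseError(Exception):
    """Stand-in for the PR's error type, carrying a structured detail dict."""

    def __init__(self, message: str, **detail: Any) -> None:
        super().__init__(message)
        self.detail = {"message": message, **detail}


@dataclass
class DropCounts:
    total_rows: int
    bad_fips: int = 0
    bad_numeric: int = 0

    @property
    def valid_rows(self) -> int:
        # Assumption: each dropped row is counted under exactly one reason
        return self.total_rows - self.bad_fips - self.bad_numeric


def check_quality(counts: DropCounts) -> None:
    """Raise DatasourceParseError if the parsed file fails any gate."""
    if counts.total_rows == 0:
        # Header-only file: say so directly instead of failing the row-count gate
        raise DatasourceParseError("file has no data rows", code="no_data_rows")
    if counts.bad_fips / counts.total_rows > MAX_BAD_FIPS_RATIO:
        raise DatasourceParseError(
            "too many rows with unrecognized geo codes",
            code="bad_fips_ratio", bad_rows=counts.bad_fips,
        )
    if counts.bad_numeric / counts.total_rows > MAX_BAD_NUMERIC_RATIO:
        raise DatasourceParseError(
            "too many rows with non-numeric values",
            code="bad_numeric_ratio", bad_rows=counts.bad_numeric,
        )
    if counts.valid_rows < MIN_VALID_ROWS:
        raise DatasourceParseError(
            f"need at least {MIN_VALID_ROWS} valid rows",
            code="min_valid_rows", valid_rows=counts.valid_rows,
        )
```

Putting a human-readable `message` next to a machine-readable `code` in `detail` lets a 422 handler pass the dict through unchanged while the UI still has a friendly fallback string.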

Comments Outside Diff (1)

  1. ui-nextjs/app/explore/page.tsx, lines 82-91

    P1 Null race persisted for staff-uploads gets promoted to 'total' on page reload, causing empty results

    When a user selects a staff-uploads dataset without a race column, StaffDatasetPicker.onChange correctly resets race to null. persistFilters saves race: null. On the next page load resolveInitialState evaluates:

    ```ts
    race: persisted.race ?? (source.hasRace ? 'total' : null)
    ```

    Because staff-uploads has hasRace: true in DATA_SOURCES, the ?? fallback fires and sets race = 'total'. The subsequent data fetch then includes race=total in the query params, but no rows in a race-column-less dataset have race = 'total' (they store null), so the explore view renders empty even though data exists.

    A targeted fix: stop deriving the staff-uploads race default from the static source-level hasRace. Leave race null on load and let the dataset-level has_race flag (available via activeUploadSummary) decide once the dataset loads:

    ```ts
    race: persisted.race ?? (
      source.key === 'staff-uploads'
        ? null                          // pick race after dataset loads
        : (source.hasRace ? 'total' : null)
    ),
    ```

Reviews (1): Last reviewed commit: "test(settings): isolate task model defau..."

Comment thread src/d4bl/services/datasource_processing/validation.py
Comment thread src/d4bl/app/upload_routes.py Outdated
Comment thread ui-nextjs/lib/explore-config.ts
Comment thread src/d4bl/services/datasource_processing/parser.py

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/d4bl/app/schemas.py`:
- Around line 642-647: Add the same whitespace-normalizing and non-empty
validation used for other required string fields to the geo_column and
metric_value_column fields so whitespace-only values are rejected; locate the
model/class in src/d4bl/app/schemas.py (the class that declares geo_column,
metric_value_column, metric_name, etc.) and add validators (matching the pattern
used for source_name) that strip surrounding whitespace and raise a validation
error if the result is empty or only whitespace, ensuring consistent downstream
behavior.

In `@src/d4bl/app/upload_routes.py`:
- Around line 94-99: Make data_year optional when a year_column exists: change
the route parameter signature so data_year is Optional[int] = Form(None) instead
of required, and add validation in the same endpoint (using the parameter names
data_year and year_column) to raise an HTTPException if year_column is
None/empty and data_year is still None. Keep existing behavior that
MappingConfig.data_year remains a fallback for files without a year column, and
ensure any downstream code that uses data_year handles the optional type (e.g.,
fall back to MappingConfig.data_year or abort with the same validation message).

In `@src/d4bl/services/datasource_processing/parser.py`:
- Around line 58-61: The except block that catches StopIteration and raises
DatasourceParseError should preserve or intentionally suppress exception
chaining; update the raise to include an explicit "from None" (i.e., raise
DatasourceParseError("file has no header row") from None) so the StopIteration
context is not leaked; modify the try/except around next(reader) where
raw_header is set in parser.py accordingly.
- Around line 79-101: The workbook opened with load_workbook (variable wb) is
not explicitly closed; ensure wb.close() is always called to release resources
by wrapping workbook usage in a try/finally (create wb, then try: use
ws/rows_iter/raw_header/rows and return; finally: wb.close()) or use
contextlib.closing(wb) as a context manager so that wb.close() runs even on
errors or early returns.

In `@tests/test_datasource_processing.py`:
- Around line 76-86: Add a test to ensure coerce_year rejects boolean inputs:
update the TestCoerceYear test suite to include True and False (e.g., via
pytest.mark.parametrize or a new test method) and assert that calling
coerce_year(True) and coerce_year(False) raises ValueError; target the
coerce_year function so the validation branch that explicitly rejects booleans
(validation.py handling) is covered.
- Around line 54-73: Add unit tests to cover native numeric passthrough for
coerce_numeric by asserting that passing an int (e.g., 42) and a float (e.g.,
14.3) returns the same numeric values (42.0 or 42 and 14.3 respectively,
matching existing behavior); place these new assertions alongside the existing
TestCoerceNumeric tests so they exercise the logic in coerce_numeric that
handles int/float inputs directly.

In `@ui-nextjs/app/explore/page.tsx`:
- Around line 362-371: When handling dataset switches in the StaffDatasetPicker
onChange handler (the callback that calls setActiveUploadSummary and
setFilters), also clear filters.metric and clear the selectedState so leftover
metric or state from the previous upload doesn't mismatch the new dataset;
update the setFilters call that currently resets uploadId, race, and year to
also set metric: null (or a default) and ensure you call the state setter for
selectedState (e.g., setSelectedState(null)) so the UI, legend, detail card, and
chart reflect the newly selected dataset.

In `@ui-nextjs/components/admin/UploadDataSource.tsx`:
- Around line 329-337: The inputs bound to raceColumn/setRaceColumn (and the
similar yearColumn/setYearColumn input) lack accessible labels; add explicit
label elements tied to each input by adding unique id attributes (e.g.,
id="race-column" and id="year-column") on the inputs and corresponding <label
htmlFor="..."> elements that describe the field (or use a visually-hidden class
if you don't want visible text), ensuring the label text conveys purpose (e.g.,
"Race column" / "Year column") and keeping existing required/placeholder
behavior.
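
The StopIteration comment concerns exception chaining: raising inside an `except` block records the caught exception as `__context__`, and `raise ... from None` sets `__suppress_context__` so the traceback omits it. A minimal CSV-reading sketch, assuming UTF-8 input with an optional BOM; this `read_csv_bytes` body and the local `DatasourceParseError` are illustrative stand-ins, not the PR's parser. It also rejects a header-only file explicitly.

```python
from __future__ import annotations

import csv
import io


class DatasourceParseError(Exception):
    """Stand-in for the PR's parse error type."""


def read_csv_bytes(content: bytes) -> tuple[list[str], list[list[str]]]:
    """Return (header, rows) from CSV bytes, raising a clean parse error."""
    text = content.decode("utf-8-sig")  # utf-8-sig drops a leading BOM if present
    reader = csv.reader(io.StringIO(text))
    try:
        raw_header = next(reader)
    except StopIteration:
        # "from None" suppresses the StopIteration context in the traceback
        raise DatasourceParseError("file has no header row") from None
    header = [h.strip() for h in raw_header]
    # Skip blank lines, which csv.reader yields as empty lists
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        raise DatasourceParseError("file has no data rows")
    return header, rows
```

Note that `from None` only hides the chain in the printed traceback; `__context__` is still set on the exception object.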

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93ed8b5d-e7bb-4c15-a434-822e647dc79d

📥 Commits

Reviewing files that changed from the base of the PR and between 1fe44d5 and 67df103.

📒 Files selected for processing (21)
  • docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md
  • docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md
  • pyproject.toml
  • src/d4bl/app/api.py
  • src/d4bl/app/schemas.py
  • src/d4bl/app/upload_routes.py
  • src/d4bl/services/datasource_processing/__init__.py
  • src/d4bl/services/datasource_processing/parser.py
  • src/d4bl/services/datasource_processing/validation.py
  • tests/conftest.py
  • tests/test_datasource_processing.py
  • tests/test_explore_api.py
  • tests/test_settings.py
  • tests/test_upload_api.py
  • ui-nextjs/app/explore/page.tsx
  • ui-nextjs/app/guide/page.tsx
  • ui-nextjs/components/admin/ReviewDetail.tsx
  • ui-nextjs/components/admin/UploadDataSource.tsx
  • ui-nextjs/components/explore/MetricFilterPanel.tsx
  • ui-nextjs/components/explore/StaffDatasetPicker.tsx
  • ui-nextjs/lib/explore-config.ts

Comment thread src/d4bl/app/schemas.py
Comment thread src/d4bl/app/upload_routes.py Outdated
Comment thread src/d4bl/services/datasource_processing/parser.py Outdated
Comment thread src/d4bl/services/datasource_processing/parser.py Outdated
Comment thread tests/test_datasource_processing.py
Comment thread tests/test_datasource_processing.py
Comment thread ui-nextjs/components/admin/UploadDataSource.tsx Outdated
- Reject non-finite numerics before json serialization; pad 4-digit FIPS
- Enforce max upload size with bounded read; optional data_year when year_column set
- Pydantic: non-blank geo/metric columns; year from data_year or year_column
- Parser: empty data rows, StopIteration chains, close XLSX workbooks in finally
- Explore: staff-uploads persistence race default; clear metric/state on dataset change
- Admin upload form: accessible labels; omit data_year when per-row year column
- Staff picker: nullable data_year; staff-uploads learn-more links to /guide

Made-with: Cursor
@William-Hill

Owner Author

Review follow-up (commit d4083ab)

Addressed Greptile / Codex / CodeRabbit items:

  • validation: coerce_numeric rejects non-finite values (inf / -inf / nan); derive_state_fips pads 4-digit all-numeric FIPS (Excel leading-zero loss).
  • upload_routes: first read capped at MAX_DATASOURCE_SIZE + 1 before rejecting oversize files.
  • schemas: geo_column / metric_value_column must be non-blank after strip; optional race_column / year_column stripped; either data_year or year_column required (data_year optional on form when year column is used).
  • parser: explicit error for header-only files (no_data_rows); StopIteration raised without chaining noise; wb.close() in finally for XLSX reads.
  • explore: persisted filters for staff-uploads no longer default race to total when null; staff dataset change clears metric and selectedState; Learn more uses internal /guide for staff-uploads without target=_blank when URL is relative.
  • explore-config: staff-uploads sourceUrl now points to /guide.
  • UploadDataSource: labels for race/year inputs; conditional constant data year + form omits data_year when a year column is mapped.
  • StaffDatasetPicker: data_year typed as nullable; label shows multi-year when absent.
  • tests: coverage for the above.
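
The follow-up's schema rules (non-blank geo/metric columns after stripping, stripped optional race/year columns, and either `data_year` or `year_column`) can be expressed as plain-Python checks. This sketch swaps in a dataclass with `__post_init__` for the Pydantic validators the PR uses; `MappingSketch` is a hypothetical name, treating a blank optional column as unmapped is an assumption, and the real `validate_metric_name` rules are not reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def _strip_required(name: str, value: str) -> str:
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{name} must not be blank")
    return cleaned


def _strip_optional(value: Optional[str]) -> Optional[str]:
    # Assumption: a whitespace-only optional column is treated as "not mapped"
    if value is None:
        return None
    return value.strip() or None


@dataclass
class MappingSketch:
    geo_column: str
    metric_value_column: str
    metric_name: str
    race_column: Optional[str] = None
    year_column: Optional[str] = None
    data_year: Optional[int] = None

    def __post_init__(self) -> None:
        self.geo_column = _strip_required("geo_column", self.geo_column)
        self.metric_value_column = _strip_required(
            "metric_value_column", self.metric_value_column
        )
        self.metric_name = _strip_required("metric_name", self.metric_name)
        self.race_column = _strip_optional(self.race_column)
        self.year_column = _strip_optional(self.year_column)
        # Cross-field rule: the year comes from the form or from a column
        if self.data_year is None and self.year_column is None:
            raise ValueError("provide data_year or year_column")
```

Stripping before the cross-field check matters: a year column of `"   "` must not satisfy the "either `data_year` or `year_column`" rule.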

CI: ruff check, pytest tests/ (excluding live Ollama integration), npm run lint, npx tsc --noEmit, npm run build all pass locally.

Please re-run checks and resolve review threads if this matches your expectations.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ui-nextjs/app/explore/page.tsx (1)

151-153: ⚠️ Potential issue | 🟠 Major

Clear loading on the no-session early return.

If auth flips while a request is in flight, the aborted request skips the loading reset in its finally block, and the next invocation returns here with loading still true. That leaves the explore view stuck behind its skeleton/overlay until a remount.

💡 Proposed fix
```diff
   const fetchData = useCallback(async (signal: AbortSignal) => {
-    if (!session?.access_token) return;
+    if (!session?.access_token) {
+      setLoading(false);
+      setExploreData(null);
+      setBills([]);
+      return;
+    }
```

Based on learnings: PolicyExploreView intentionally resets loading on the missing-auth early return because an abort during auth change can otherwise leave the spinner stuck indefinitely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-nextjs/app/explore/page.tsx` around lines 151 - 153, The fetchData async
callback (fetchData(signal: AbortSignal)) can early-return when
session?.access_token is missing but leaves the loading flag true if a previous
request was aborted; update fetchData to explicitly clear the loading state
before returning on the no-session path (e.g., call the same
setLoading(false)/resetLoading used in PolicyExploreView’s missing-auth
handling) so the explore skeleton/overlay is not left visible after auth flips
mid-request.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/d4bl/services/datasource_processing/parser.py`:
- Around line 203-206: The unsupported-extension error raised in parser.py (the
DatasourceParseError instance in the branch that checks file extension) sets
detail={"allowed": sorted(SUPPORTED_EXTS)} but omits a human-friendly "message"
key; update the raise to include a readable message in the detail payload (e.g.,
include "message": f"unsupported file type {ext!r}") so UI fallbacks get a
friendly string alongside the "allowed" list while keeping the existing error
text and keys.
- Around line 146-150: The persisted geo_fips value is taken directly from raw
text and can lose leading zeros (e.g., Excel-stored 01001 -> "1001"); update the
logic where geo_fips is assigned (the geo_fips variable populated from
mapping.geo_column near derive_state_fips) to canonicalize using the recovered
state_fips: after calling derive_state_fips(geo_raw), if state_fips exists and
geo_fips is digits-only and its length is shorter than the full FIPS length
(state_fips length + county code length), left-pad the numeric portion with
zeros to produce a canonical 5-digit county FIPS (state_fips + county.zfill(3))
and assign that back to geo_fips; apply the same canonicalization in the second
occurrence around the block referenced (lines ~179-185) so stored geo_fips
always preserves leading zeros.

In `@src/d4bl/services/datasource_processing/validation.py`:
- Around line 42-47: The helper pads county FIPS when len(s)==4 but misses tract
FIPS dropped to len(s)==10 and also silently accepts bad lengths; update the
logic around variable s so that you also pad when len(s)==10 (prepend "0") in
addition to the existing len==4 and len==1 cases, then validate the final length
and raise an error (or return a failure) for any s whose length is not one of
the expected canonical lengths (2, 5, or 11) before returning s[:2]; keep the
final return of s[:2] but ensure invalid inputs are rejected instead of
truncated.

In `@ui-nextjs/app/explore/page.tsx`:
- Around line 369-380: When handling StaffDatasetPicker's onChange, also clear
the upload-scoped sentinel and previous results: set
didAutoSelectDefaults.current = false and reset exploreData (via
setExploreData(null) or the relevant setter) immediately when switching uploads,
in addition to setActiveUploadSummary and setFilters so the new upload doesn't
inherit the old auto-select state or render stale exploreData if the new fetch
fails.

---

Outside diff comments:
In `@ui-nextjs/app/explore/page.tsx`:
- Around line 151-153: The fetchData async callback (fetchData(signal:
AbortSignal)) can early-return when session?.access_token is missing but leaves
the loading flag true if a previous request was aborted; update fetchData to
explicitly clear the loading state before returning on the no-session path
(e.g., call the same setLoading(false)/resetLoading used in PolicyExploreView’s
missing-auth handling) so the explore skeleton/overlay is not left visible after
auth flips mid-request.
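
The two FIPS comments (pad tract codes that lost a leading zero, reject unexpected lengths, and persist a canonical `geo_fips`) fit in one helper. A minimal sketch, assuming geo values are all-digit codes whose only damage is a single dropped leading zero; `canonical_fips` is hypothetical, and this `derive_state_fips` reimplements the requested behavior rather than the PR's function.

```python
CANONICAL_LENGTHS = {2, 5, 11}  # state, county, census tract
PADDABLE_LENGTHS = {1, 4, 10}  # one leading zero lost, e.g. 01001 stored as 1001


def canonical_fips(raw: object) -> str:
    """Return a canonical, zero-padded FIPS string or raise ValueError."""
    s = str(raw).strip()
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"FIPS must be ASCII digits: {raw!r}")
    if len(s) in PADDABLE_LENGTHS:
        s = "0" + s  # restore the single dropped leading zero
    if len(s) not in CANONICAL_LENGTHS:
        # Reject instead of silently truncating a malformed code
        raise ValueError(f"unexpected FIPS length {len(s)}: {raw!r}")
    return s


def derive_state_fips(raw: object) -> str:
    """The state FIPS is the first two digits of the canonical code."""
    return canonical_fips(raw)[:2]
```

Persisting `canonical_fips(raw)` as `geo_fips`, rather than the raw cell text, keeps the stored code consistent with the `state_fips` derived from it.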

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9c824797-9c65-49c7-ac57-59a0bdf0b94a

📥 Commits

Reviewing files that changed from the base of the PR and between 67df103 and d4083ab.

📒 Files selected for processing (10)
  • src/d4bl/app/schemas.py
  • src/d4bl/app/upload_routes.py
  • src/d4bl/services/datasource_processing/parser.py
  • src/d4bl/services/datasource_processing/validation.py
  • tests/test_datasource_processing.py
  • tests/test_upload_api.py
  • ui-nextjs/app/explore/page.tsx
  • ui-nextjs/components/admin/UploadDataSource.tsx
  • ui-nextjs/components/explore/StaffDatasetPicker.tsx
  • ui-nextjs/lib/explore-config.ts

Comment thread src/d4bl/services/datasource_processing/parser.py
Comment thread src/d4bl/services/datasource_processing/parser.py
Comment thread src/d4bl/services/datasource_processing/validation.py
Comment thread ui-nextjs/app/explore/page.tsx
@William-Hill William-Hill merged commit cd660a5 into main Apr 19, 2026
4 checks passed
@William-Hill William-Hill deleted the feat/190-datasource-pipeline branch April 19, 2026 03:38


Development

Successfully merging this pull request may close these issues.

feat: process approved data source uploads into /explore

1 participant