feat: process approved data source uploads into /explore (#190) by William-Hill · Pull Request #196 · William-Hill/d4bl_ai_agent

William-Hill · 2026-04-19T00:36:51Z

Summary

Parses staff CSV/XLSX uploads at submit time with contributor-declared column mapping; normalized rows land in uploaded_datasets in the same transaction as the Upload row.
New Staff Uploads tab on /explore with a dataset picker; approved uploads render through the existing map/chart/table components.
Guide Section 1 now describes the shipped behavior.

Closes #190.

Spec + plan

Spec: docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md
Plan: docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md

Test plan

pytest — 888 passed (excluding optional tests/test_training/test_integration_models.py, which requires Ollama + fine-tuned models and can fail on non-JSON model output)
npm run build && npm run lint — clean (one pre-existing hooks warning in app/page.tsx)
Manual QA per plan Task 18 (upload → approve → /explore, error paths, localStorage)

Made with Cursor

Summary by CodeRabbit

New Features
- Staff data source upload: upload CSV/XLSX with required column mapping (geo, metric value, metric name; optional race/year); immediate parsing/validation with structured errors and preview.
Explore
- New "Staff Uploads" source and picker on Explore to browse and filter approved staff datasets.
Admin Improvements
- Review UI shows declared mapping, preview table, and parsed row counts.
API Endpoints
- New endpoints to list available staff uploads and to fetch explore-shaped data for a selected approved upload.
Documentation & Tests
- Guide updated and new tests cover parsing, validation, upload, review, and explore flows.

Made-with: Cursor

coderabbitai · 2026-04-19T00:37:03Z

📝 Walkthrough

Adds a staff datasource upload pipeline: backend CSV/XLSX parsing/validation, immediate normalization and bulk-insert of rows at upload, two explore endpoints for approved uploads, schema and admin UI mapping inputs, frontend picker/integration on /explore, tests, and openpyxl dependency.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Design & Docs `docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md`, `docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md` | End-to-end spec and plan for datasource upload parsing, validation, DB usage, APIs, and frontend UX. |
| Datasource processing service `src/d4bl/services/datasource_processing/...` `src/d4bl/services/datasource_processing/validation.py`, `parser.py`, `__init__.py` | New package with pure validation helpers (`validate_metric_name`, `derive_state_fips`, `coerce_numeric`, `coerce_year`), CSV/XLSX readers, `MappingConfig`, `DatasourceParseError`, and `parse_datasource_file` enforcing thresholds and returning normalized rows/preview. |
| Backend schemas & upload endpoint `src/d4bl/app/schemas.py`, `src/d4bl/app/upload_routes.py` | `DataSourceUploadRequest` gains mapping fields (`geo_column`, `metric_value_column`, `metric_name`, optional `race_column`/`year_column`) and cross-field validation; upload route accepts mapping form fields, parses file (to_thread), returns structured 422 on parse errors, and bulk-inserts normalized rows into `uploaded_datasets` within the same transaction. |
| Explore API `src/d4bl/app/api.py` | Added `GET /api/explore/staff-uploads/available` (list approved uploads) and `GET /api/explore/staff-uploads` (ExploreResponse-shaped aggregated data for an approved upload), with SQL aggregation and parameter validation. |
| Dependency `pyproject.toml` | Added `openpyxl>=3.1` for XLSX parsing. |
| Backend tests & fixtures `tests/conftest.py`, `tests/test_datasource_processing.py`, `tests/test_upload_api.py`, `tests/test_explore_api.py`, `tests/test_settings.py` | New unit/integration tests for validation/coercion, parsing, schema validation, upload flow, explore endpoints; added `make_xlsx_bytes` fixture and updated settings test. |
| Admin UI: upload & review `ui-nextjs/components/admin/UploadDataSource.tsx`, `ui-nextjs/components/admin/ReviewDetail.tsx` | Upload form adds mapping inputs, conditional year input, structured 422 error formatting; review detail shows mapping and parsed preview rows and hides mapping/preview metadata from generic list. |
| Explore frontend integration `ui-nextjs/app/explore/page.tsx`, `ui-nextjs/components/explore/StaffDatasetPicker.tsx`, `ui-nextjs/components/explore/MetricFilterPanel.tsx`, `ui-nextjs/lib/explore-config.ts` | Adds `staff-uploads` data source config, `StaffDatasetPicker` component, persisted `uploadId` in filters, conditional data loading and filter behavior for staff uploads, and `ExploreFilters` extended with `uploadId`. |
| Guide content `ui-nextjs/app/guide/page.tsx` | Updated contributor guide to require mapping fields, describe immediate parsing/validation, admin review preview, and staff-uploads availability in /explore. |

Sequence Diagram(s)

sequenceDiagram
    participant User as Staff Contributor
    participant Client as Browser
    participant API as Backend API
    participant Parser as Datasource Parser
    participant DB as Database

    User->>Client: Upload CSV/XLSX + mapping
    Client->>API: POST /api/admin/uploads/datasource (file, mapping...)
    API->>Parser: parse_datasource_file(bytes, ext, MappingConfig)
    Parser->>Parser: read_csv_bytes / read_xlsx_bytes
    Parser->>Parser: validate headers, normalize rows (FIPS, numeric, year)
    Parser->>Parser: apply thresholds (FIPS %, numeric %, min rows)
    alt Validation fails
        Parser-->>API: DatasourceParseError with detail
        API-->>Client: 422 with structured error.detail
    else Success
        Parser-->>API: ParseResult (normalized_rows, preview)
        API->>DB: BEGIN
        API->>DB: INSERT uploads (pending_review + metadata)
        API->>DB: BULK INSERT uploaded_datasets (normalized rows as jsonb)
        API->>DB: COMMIT
        API-->>Client: 200 OK (upload recorded)
    end

sequenceDiagram
    participant User as End User
    participant Client as Browser
    participant API as Backend API
    participant DB as Database

    User->>Client: Open /explore, select "Staff Uploads"
    Client->>API: GET /api/explore/staff-uploads/available
    API->>DB: SELECT uploads WHERE upload_type='datasource' AND status='approved'
    DB-->>API: [{upload_id, metric_name, has_race, row_count, ...}]
    API-->>Client: list of available staff uploads
    User->>Client: Choose dataset + filters (state/race/year)
    Client->>API: GET /api/explore/staff-uploads?upload_id=...&state_fips=...&race=...&year=...
    API->>DB: SELECT aggregated values FROM uploaded_datasets WHERE upload_id=... AND filters...
    DB-->>API: aggregated rows, national_average, available_* values
    API-->>Client: ExploreResponse-shaped payload
    Client->>Client: Render map/chart based on response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: staff contributor guide — upload UI, review queue, and /guide page #189: Prior work introducing the initial upload/review endpoints and upload workflow that this PR extends with parsing, normalization, and explore integration.
Explore: extend DataSourceConfig with descriptions, hasData, and hide empty tabs #113: Changes to explore-config.ts adding data source configuration patterns similar to the new staff-uploads entry.
Explore: add description banner, metric tooltips, and skeleton loading #116: Explore UI updates affecting app/explore/page.tsx and filter behavior that overlap with this PR's frontend integration.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: process approved data source uploads into /explore (`#190`)' accurately describes the main change: implementing processing and integration of staff-uploaded datasources into the explore interface. |
| Linked Issues check | ✅ Passed | The PR implements all major coding objectives from `#190`: parsing CSV/XLSX uploads at submit time with column mapping validation, normalizing rows into uploaded_datasets, adding staff-uploads tab on /explore with dataset picker, and updating guide copy. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the linked issue objectives: new datasource processing package, API endpoints, upload flow updates, frontend explore integration, and guide updates are all in-scope for `#190`. |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67df103039

greptile-apps · 2026-04-19T00:43:15Z

Greptile Summary

This PR ships an end-to-end staff CSV/XLSX upload pipeline: files are parsed, validated, and normalized at submit time; rows land in uploaded_datasets in the same transaction as the Upload row; and a new "Staff Uploads" tab on /explore surfaces approved datasets through the existing map/chart/table components. The architecture is sound — parse-on-upload with a pure-function validation layer is the right call, and reusing the existing ExploreResponse shape keeps the frontend changes minimal.

Key findings:

P1 – Infinity values crash json.dumps with a 500: coerce_numeric blocks NaN but allows float("Infinity"). Those values propagate to json.dumps(row) in upload_routes.py, which raises an unhandled ValueError (not a DatasourceParseError), returning a 500 instead of a 422.
P1 – Full file read before size gate: await file.read() buffers the whole upload into memory before the 50 MB limit is checked. Using file.read(MAX_DATASOURCE_SIZE + 1) reads at most one byte past the cap.
P1 – Race filter restored to 'total' on page reload for race-less staff datasets: resolveInitialState uses the static source.hasRace (always true for staff-uploads) to set the race default when persisted.race is null. A race-column-less dataset will return zero rows after a page refresh.
P2 – Empty sourceUrl for staff-uploads renders a dead "Learn more" link that reloads the page.
P2 – No explicit guard for a header-only file (0 data rows) before the MIN_VALID_ROWS check produces a confusing error message.
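
The Infinity finding can be illustrated with a minimal Python sketch. It assumes `coerce_numeric` is meant to return a finite float or raise `ValueError`; `coerce_numeric_sketch` is a hypothetical stand-in, not the code in validation.py, and this thread does not show whether upload_routes.py passes `allow_nan=False`. With default settings `json.dumps` emits the bare token `Infinity`, which is not valid JSON; with `allow_nan=False` it raises `ValueError`.

```python
import json
import math


def coerce_numeric_sketch(value):
    """Hypothetical stand-in for coerce_numeric: return a finite float or raise ValueError."""
    try:
        # Strings are stripped first; ints and floats are converted directly.
        number = float(value.strip()) if isinstance(value, str) else float(value)
    except OverflowError:
        # float() raises OverflowError for ints too large to represent.
        raise ValueError(f"value out of float range: {value!r}") from None
    if not math.isfinite(number):
        # Covers inf, -inf, and nan, however they were spelled in the file.
        raise ValueError(f"non-finite value: {value!r}")
    return number


# Strict serialization now only ever sees finite numbers.
row = {"state_fips": "01", "value": coerce_numeric_sketch("14.3")}
encoded = json.dumps(row, allow_nan=False)
```

Rejecting the value inside the coercion helper keeps the failure on the parser's error path, so it can surface as a structured 422 rather than an unhandled exception at serialization time.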

Confidence Score: 3/5

Not safe to merge as-is: two backend bugs (Infinity crash + pre-read size check) and one frontend bug (race filter reset on reload) need fixing before production use.

Three P1 issues were found. The Infinity-in-json.dumps bug turns a legitimate 422 into an unhandled 500, the full-file-read-before-size-check is a DoS-adjacent pattern, and the race filter page-reload regression silently breaks the explore view for race-less staff datasets. The rest of the implementation — parse pipeline, DB transaction, explore endpoints, test coverage — is well-built. Fixing these three items should be straightforward and bring the PR to merge-ready.
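
The bounded-read suggestion can be sketched as follows. This is an illustration on a plain binary stream rather than the route itself: `read_capped` and `UploadTooLarge` are hypothetical names, and the 50 MB value is taken from the finding above. In the FastAPI route the equivalent call would be awaited on the uploaded file object.

```python
import io

MAX_DATASOURCE_SIZE = 50 * 1024 * 1024  # 50 MB cap mentioned in the review


class UploadTooLarge(Exception):
    """Hypothetical error type for uploads over the size cap."""


def read_capped(stream, limit=MAX_DATASOURCE_SIZE):
    """Read at most limit + 1 bytes so an oversize upload is never fully buffered."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        # One extra byte is enough to prove the payload exceeds the cap.
        raise UploadTooLarge(f"upload exceeds {limit} bytes")
    return data


payload = read_capped(io.BytesIO(b"geo,value\n01,3\n"))
```

Reading one byte past the limit distinguishes "exactly at the cap" from "over the cap" without holding more than `limit + 1` bytes in memory.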

src/d4bl/services/datasource_processing/validation.py (Infinity guard), src/d4bl/app/upload_routes.py (size check ordering), ui-nextjs/app/explore/page.tsx (race filter initialization)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/d4bl/services/datasource_processing/validation.py | Pure coercion helpers: solid overall, but `coerce_numeric` allows `Infinity`/`-Infinity` which will crash `json.dumps` downstream (unhandled 500). |
| src/d4bl/services/datasource_processing/parser.py | Well-structured parse pipeline with quality gates; `MIN_VALID_ROWS` path for header-only files produces a slightly confusing error message but is functionally correct. |
| src/d4bl/app/upload_routes.py | Parse-on-upload route is clean and transactional, but `await file.read()` buffers the full upload into memory before the size check is applied. |
| src/d4bl/app/api.py | Two new explore endpoints for staff-uploads aggregate JSONB rows by state/race/year and list approved datasets; both are well-scoped and correctly gated by auth + status='approved'. |
| ui-nextjs/app/explore/page.tsx | Staff-uploads tab integration is well-structured; contains a race-filter initialization bug where persisted `null` race gets promoted to `'total'` on page reload for datasets without race columns. |
| ui-nextjs/components/explore/StaffDatasetPicker.tsx | Clean dataset picker with proper cancellation and error handling. |
| ui-nextjs/lib/explore-config.ts | Staff-uploads `DataSourceConfig` entry is correct except `sourceUrl: ""` renders a non-functional "Learn more" link that navigates to the current page. |
| tests/test_datasource_processing.py | Comprehensive unit tests for validation, parsing, and integration paths; notably missing a test for Infinity input to `coerce_numeric`. |
| src/d4bl/app/schemas.py | New `DataSourceUploadRequest` schema correctly validates source_name and metric_name with existing validators; clean addition. |

Sequence Diagram

sequenceDiagram
    participant C as Contributor (browser)
    participant API as FastAPI /api/admin/uploads/datasource
    participant Parser as datasource_processing.parser
    participant DB as PostgreSQL

    C->>API: POST multipart (file + MappingConfig form fields)
    API->>API: validate file ext + size
    API->>API: validate DataSourceUploadRequest schema
    API->>Parser: parse_datasource_file(content, ext, mapping) [thread]
    Parser->>Parser: read_csv_bytes / read_xlsx_bytes
    Parser->>Parser: _check_columns_exist
    Parser->>Parser: _normalize_rows (FIPS/numeric/year coercion + drop tracking)
    Parser->>Parser: quality gates (bad_fips ratio, numeric ratio, MIN_VALID_ROWS)
    Parser-->>API: ParseResult (normalized_rows, preview_rows, dropped_counts)
    API->>DB: BEGIN txn INSERT uploads + bulk INSERT uploaded_datasets chunks
    DB-->>API: COMMIT
    API-->>C: 200 UploadResponse

    note over C,DB: Admin approval (status flip only)
    C->>API: PATCH /api/admin/uploads/{id}/review
    API->>DB: UPDATE uploads SET status=approved
    DB-->>API: ok
    API-->>C: status approved

    note over C,DB: Explore
    C->>API: GET /api/explore/staff-uploads?upload_id=...
    API->>DB: SELECT from uploaded_datasets JOIN uploads WHERE status=approved
    DB-->>API: aggregated rows AVG by state_fips/race/year
    API-->>C: ExploreResponse
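
The quality-gates step in the diagram above might look roughly like the sketch below. The threshold values and the error codes other than `no_data_rows` are placeholders chosen for illustration; the real constants in parser.py are not shown in this thread. The sketch checks for a header-only file first, which also avoids dividing by zero in the ratio checks.

```python
# Placeholder thresholds: assumptions for illustration, not the values in parser.py.
MAX_BAD_FIPS_RATIO = 0.10
MAX_NON_NUMERIC_RATIO = 0.10
MIN_VALID_ROWS = 1


def first_quality_failure(total_rows, bad_fips_rows, non_numeric_rows, valid_rows):
    """Return a short error code for the first failed gate, or None if all gates pass."""
    if total_rows == 0:
        # Header-only file: report it explicitly instead of a confusing ratio error.
        return "no_data_rows"
    if bad_fips_rows / total_rows > MAX_BAD_FIPS_RATIO:
        return "too_many_bad_fips"  # hypothetical code
    if non_numeric_rows / total_rows > MAX_NON_NUMERIC_RATIO:
        return "too_many_non_numeric"  # hypothetical code
    if valid_rows < MIN_VALID_ROWS:
        return "too_few_valid_rows"  # hypothetical code
    return None
```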

Comments Outside Diff (1)

ui-nextjs/app/explore/page.tsx, line 82-91 (link)

Null race persisted for staff-uploads gets promoted to 'total' on page reload, causing empty results

When a user selects a staff-uploads dataset without a race column, StaffDatasetPicker.onChange correctly resets race to null. persistFilters saves race: null. On the next page load resolveInitialState evaluates:
```
race: persisted.race ?? (source.hasRace ? 'total' : null)
```
Because staff-uploads has hasRace: true in DATA_SOURCES, the ?? fallback fires and sets race = 'total'. The subsequent data fetch then includes race=total in the query params, but no rows in a race-column-less dataset have race = 'total' (they store null), so the explore view renders empty even though data exists.

A targeted fix: for staff-uploads, leave the race default unset in resolveInitialState and decide it from the dataset-level has_race flag (available via activeUploadSummary) once the dataset loads, rather than from the static source-level hasRace:
```
race: persisted.race ?? (
  source.key === 'staff-uploads'
    ? null                          // pick race after dataset loads
    : (source.hasRace ? 'total' : null)
),
```
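
The empty-result failure mode can be reproduced with a small Python sketch of the filtering semantics. It assumes rows from a race-less dataset store `None` for race, as the comment above states; `average_by_state` is a hypothetical in-memory stand-in for the endpoint's SQL aggregation, whose actual query is not shown here.

```python
from collections import defaultdict


def average_by_state(rows, race=None, year=None):
    """Average `value` per state_fips, applying optional exact-match filters."""
    totals = defaultdict(lambda: [0.0, 0])  # state_fips -> [sum, count]
    for row in rows:
        if race is not None and row.get("race") != race:
            continue
        if year is not None and row.get("year") != year:
            continue
        acc = totals[row["state_fips"]]
        acc[0] += row["value"]
        acc[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


# A dataset uploaded without a race column: every row has race=None.
rows = [
    {"state_fips": "01", "value": 10.0, "race": None, "year": 2022},
    {"state_fips": "01", "value": 20.0, "race": None, "year": 2022},
    {"state_fips": "06", "value": 5.0, "race": None, "year": 2022},
]

unfiltered = average_by_state(rows)               # data is present
with_total = average_by_state(rows, race="total")  # filter matches nothing
```

With no race filter the averages come back per state; with `race="total"` no row matches, which is the empty explore view described above.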

Reviews (1): Last reviewed commit: "test(settings): isolate task model defau..."

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/d4bl/app/schemas.py`:
- Around line 642-647: Add the same whitespace-normalizing and non-empty
validation used for other required string fields to the geo_column and
metric_value_column fields so whitespace-only values are rejected; locate the
model/class in src/d4bl/app/schemas.py (the class that declares geo_column,
metric_value_column, metric_name, etc.) and add validators (matching the pattern
used for source_name) that strip surrounding whitespace and raise a validation
error if the result is empty or only whitespace, ensuring consistent downstream
behavior.

In `@src/d4bl/app/upload_routes.py`:
- Around line 94-99: Make data_year optional when a year_column exists: change
the route parameter signature so data_year is Optional[int] = Form(None) instead
of required, and add validation in the same endpoint (using the parameter names
data_year and year_column) to raise an HTTPException if year_column is
None/empty and data_year is still None. Keep existing behavior that
MappingConfig.data_year remains a fallback for files without a year column, and
ensure any downstream code that uses data_year handles the optional type (e.g.,
fall back to MappingConfig.data_year or abort with the same validation message).

In `@src/d4bl/services/datasource_processing/parser.py`:
- Around line 58-61: The except block that catches StopIteration and raises
DatasourceParseError should preserve or intentionally suppress exception
chaining; update the raise to include an explicit "from None" (i.e., raise
DatasourceParseError("file has no header row") from None) so the StopIteration
context is not leaked; modify the try/except around next(reader) where
raw_header is set in parser.py accordingly.
- Around line 79-101: The workbook opened with load_workbook (variable wb) is
not explicitly closed; ensure wb.close() is always called to release resources
by wrapping workbook usage in a try/finally (create wb, then try: use
ws/rows_iter/raw_header/rows and return; finally: wb.close()) or use
contextlib.closing(wb) as a context manager so that wb.close() runs even on
errors or early returns.

In `@tests/test_datasource_processing.py`:
- Around line 76-86: Add a test to ensure coerce_year rejects boolean inputs:
update the TestCoerceYear test suite to include True and False (e.g., via
pytest.mark.parametrize or a new test method) and assert that calling
coerce_year(True) and coerce_year(False) raises ValueError; target the
coerce_year function so the validation branch that explicitly rejects booleans
(validation.py handling) is covered.
- Around line 54-73: Add unit tests to cover native numeric passthrough for
coerce_numeric by asserting that passing an int (e.g., 42) and a float (e.g.,
14.3) returns the same numeric values (42.0 or 42 and 14.3 respectively,
matching existing behavior); place these new assertions alongside the existing
TestCoerceNumeric tests so they exercise the logic in coerce_numeric that
handles int/float inputs directly.

In `@ui-nextjs/app/explore/page.tsx`:
- Around line 362-371: When handling dataset switches in the StaffDatasetPicker
onChange handler (the callback that calls setActiveUploadSummary and
setFilters), also clear filters.metric and clear the selectedState so leftover
metric or state from the previous upload doesn't mismatch the new dataset;
update the setFilters call that currently resets uploadId, race, and year to
also set metric: null (or a default) and ensure you call the state setter for
selectedState (e.g., setSelectedState(null)) so the UI, legend, detail card, and
chart reflect the newly selected dataset.

In `@ui-nextjs/components/admin/UploadDataSource.tsx`:
- Around line 329-337: The inputs bound to raceColumn/setRaceColumn (and the
similar yearColumn/setYearColumn input) lack accessible labels; add explicit
label elements tied to each input by adding unique id attributes (e.g.,
id="race-column" and id="year-column") on the inputs and corresponding <label
htmlFor="..."> elements that describe the field (or use a visually-hidden class
if you don't want visible text), ensuring the label text conveys purpose (e.g.,
"Race column" / "Year column") and keeping existing required/placeholder
behavior.
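
For the parser.py comment about exception chaining, a minimal Python sketch of the `from None` pattern on a CSV header read is shown below. `read_header` is a hypothetical helper and the `DatasourceParseError` class here is a local stand-in with the same name as the project's error type; only the error message is taken from the review comment.

```python
import csv
import io


class DatasourceParseError(ValueError):
    """Local stand-in for the project's parse error type."""


def read_header(text):
    """Return the stripped header cells of a CSV document, or raise if it is empty."""
    reader = csv.reader(io.StringIO(text))
    try:
        raw_header = next(reader)
    except StopIteration:
        # `from None` suppresses the StopIteration context in the traceback.
        raise DatasourceParseError("file has no header row") from None
    return [cell.strip() for cell in raw_header]


header = read_header("geo , value\n01,3\n")
```

Without `from None`, the traceback would include a "During handling of the above exception, another exception occurred" section pointing at the internal `StopIteration`, which adds noise without helping the caller.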

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93ed8b5d-e7bb-4c15-a434-822e647dc79d

📥 Commits

Reviewing files that changed from the base of the PR and between 1fe44d5 and 67df103.

📒 Files selected for processing (21)

docs/superpowers/plans/2026-04-18-datasource-upload-pipeline.md
docs/superpowers/specs/2026-04-18-datasource-upload-pipeline-design.md
pyproject.toml
src/d4bl/app/api.py
src/d4bl/app/schemas.py
src/d4bl/app/upload_routes.py
src/d4bl/services/datasource_processing/__init__.py
src/d4bl/services/datasource_processing/parser.py
src/d4bl/services/datasource_processing/validation.py
tests/conftest.py
tests/test_datasource_processing.py
tests/test_explore_api.py
tests/test_settings.py
tests/test_upload_api.py
ui-nextjs/app/explore/page.tsx
ui-nextjs/app/guide/page.tsx
ui-nextjs/components/admin/ReviewDetail.tsx
ui-nextjs/components/admin/UploadDataSource.tsx
ui-nextjs/components/explore/MetricFilterPanel.tsx
ui-nextjs/components/explore/StaffDatasetPicker.tsx
ui-nextjs/lib/explore-config.ts

- Reject non-finite numerics before json serialization; pad 4-digit FIPS
- Enforce max upload size with bounded read; optional data_year when year_column set
- Pydantic: non-blank geo/metric columns; year from data_year or year_column
- Parser: empty data rows, StopIteration chains, close XLSX workbooks in finally
- Explore: staff-uploads persistence race default; clear metric/state on dataset change
- Admin upload form: accessible labels; omit data_year when per-row year column
- Staff picker: nullable data_year; staff-uploads learn-more links to /guide

Made-with: Cursor

William-Hill · 2026-04-19T01:47:14Z

Review follow-up (commit `d4083ab`)

Addressed Greptile / Codex / CodeRabbit items:

validation: coerce_numeric rejects non-finite values (inf / -inf / nan); derive_state_fips pads 4-digit all-numeric FIPS (Excel leading-zero loss).
upload_routes: first read capped at MAX_DATASOURCE_SIZE + 1 before rejecting oversize files.
schemas: geo_column / metric_value_column must be non-blank after strip; optional race_column / year_column stripped; either data_year or year_column required (data_year optional on form when year column is used).
parser: explicit error for header-only files (no_data_rows); the missing-header DatasourceParseError is raised with from None so the StopIteration context is not chained; wb.close() in finally for XLSX reads.
explore: persisted filters for staff-uploads no longer default race to total when null; staff dataset change clears metric and selectedState; Learn more uses internal /guide for staff-uploads without target=_blank when URL is relative.
explore-config: staff-uploads sourceUrl → /guide.
UploadDataSource: labels for race/year inputs; conditional constant data year + form omits data_year when a year column is mapped.
StaffDatasetPicker: data_year typed as nullable; label shows multi-year when absent.
tests: coverage for the above.
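
The schema rules in the follow-up (non-blank mapping columns after strip, and either data_year or year_column required) can be sketched with a plain dataclass. This is an illustration only: the project uses Pydantic, and `MappingSketch` is a hypothetical class whose field names are borrowed from the review comments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappingSketch:
    """Hypothetical stand-in for the upload mapping fields and their cross-field rules."""

    geo_column: str
    metric_value_column: str
    metric_name: str
    race_column: Optional[str] = None
    year_column: Optional[str] = None
    data_year: Optional[int] = None

    def __post_init__(self):
        # Required columns: strip whitespace and reject blank results.
        for name in ("geo_column", "metric_value_column"):
            cleaned = getattr(self, name).strip()
            if not cleaned:
                raise ValueError(f"{name} must not be blank")
            setattr(self, name, cleaned)
        # Optional columns: strip, and treat whitespace-only as absent.
        for name in ("race_column", "year_column"):
            value = getattr(self, name)
            setattr(self, name, (value.strip() or None) if value is not None else None)
        # Cross-field rule: the year must come from one of the two sources.
        if self.year_column is None and self.data_year is None:
            raise ValueError("either data_year or year_column is required")


mapping = MappingSketch(" fips ", "rate", "eviction_rate", year_column=" year ")
```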

CI: ruff check, pytest tests/ (excluding live Ollama integration), npm run lint, npx tsc --noEmit, npm run build all pass locally.

Please re-run checks and resolve review threads if this matches your expectations.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ui-nextjs/app/explore/page.tsx (1)
151-153: ⚠️ Potential issue | 🟠 Major

Clear loading on the no-session early return.

If auth flips while a request is in flight, the aborted request skips the loading reset in its finally block, and the next invocation returns here with loading still true. That leaves the explore view stuck behind its skeleton/overlay until a remount.
💡 Proposed fix
   const fetchData = useCallback(async (signal: AbortSignal) => {
-    if (!session?.access_token) return;
+    if (!session?.access_token) {
+      setLoading(false);
+      setExploreData(null);
+      setBills([]);
+      return;
+    }
Based on learnings: PolicyExploreView intentionally resets loading on the missing-auth early return because an abort during auth change can otherwise leave the spinner stuck indefinitely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui-nextjs/app/explore/page.tsx` around lines 151 - 153, The fetchData async
callback (fetchData(signal: AbortSignal)) can early-return when
session?.access_token is missing but leaves the loading flag true if a previous
request was aborted; update fetchData to explicitly clear the loading state
before returning on the no-session path (e.g., call the same
setLoading(false)/resetLoading used in PolicyExploreView’s missing-auth
handling) so the explore skeleton/overlay is not left visible after auth flips
mid-request.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/d4bl/services/datasource_processing/parser.py`:
- Around line 203-206: The unsupported-extension error raised in parser.py (the
DatasourceParseError instance in the branch that checks file extension) sets
detail={"allowed": sorted(SUPPORTED_EXTS)} but omits a human-friendly "message"
key; update the raise to include a readable message in the detail payload (e.g.,
include "message": f"unsupported file type {ext!r}") so UI fallbacks get a
friendly string alongside the "allowed" list while keeping the existing error
text and keys.
- Around line 146-150: The persisted geo_fips value is taken directly from raw
text and can lose leading zeros (e.g., Excel-stored 01001 -> "1001"); update the
logic where geo_fips is assigned (the geo_fips variable populated from
mapping.geo_column near derive_state_fips) to canonicalize using the recovered
state_fips: after calling derive_state_fips(geo_raw), if state_fips exists and
geo_fips is digits-only and its length is shorter than the full FIPS length
(state_fips length + county code length), left-pad the numeric portion with
zeros to produce a canonical 5-digit county FIPS (state_fips + county.zfill(3))
and assign that back to geo_fips; apply the same canonicalization in the second
occurrence around the block referenced (lines ~179-185) so stored geo_fips
always preserves leading zeros.

In `@src/d4bl/services/datasource_processing/validation.py`:
- Around line 42-47: The helper pads county FIPS when len(s)==4 but misses tract
FIPS dropped to len(s)==10 and also silently accepts bad lengths; update the
logic around variable s so that you also pad when len(s)==10 (prepend "0") in
addition to the existing len==4 and len==1 cases, then validate the final length
and raise an error (or return a failure) for any s whose length is not one of
the expected canonical lengths (2, 5, or 11) before returning s[:2]; keep the
final return of s[:2] but ensure invalid inputs are rejected instead of
truncated.

In `@ui-nextjs/app/explore/page.tsx`:
- Around line 369-380: When handling StaffDatasetPicker's onChange, also clear
the upload-scoped sentinel and previous results: set
didAutoSelectDefaults.current = false and reset exploreData (via
setExploreData(null) or the relevant setter) immediately when switching uploads,
in addition to setActiveUploadSummary and setFilters so the new upload doesn't
inherit the old auto-select state or render stale exploreData if the new fetch
fails.
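
The two FIPS comments above (pad 4-digit and 10-digit codes, reject other lengths, and store a canonical geo_fips) can be sketched together in Python. This follows the reviewer's suggestion rather than the merged code; `canonical_fips` and `derive_state_fips_sketch` are hypothetical names, and the accepted lengths (2, 5, 11) are the ones named in the comment.

```python
# Lengths that indicate one lost leading zero, mapped to the canonical length.
_PAD_TO = {1: 2, 4: 5, 10: 11}
# State, county, and tract code lengths named in the review comment.
_CANONICAL_LENGTHS = {2, 5, 11}


def canonical_fips(raw):
    """Return a zero-padded FIPS string, or raise ValueError for unusable input."""
    s = str(raw).strip()
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"FIPS must be all digits: {raw!r}")
    # Restore a single leading zero dropped by spreadsheet number formatting.
    s = s.zfill(_PAD_TO.get(len(s), len(s)))
    if len(s) not in _CANONICAL_LENGTHS:
        # Reject instead of silently truncating to the first two characters.
        raise ValueError(f"unexpected FIPS length {len(s)}: {raw!r}")
    return s


def derive_state_fips_sketch(raw):
    """State FIPS is the first two digits of the canonical code."""
    return canonical_fips(raw)[:2]


county = canonical_fips("1001")
state = derive_state_fips_sketch("1001")
```

Storing the output of `canonical_fips` as geo_fips, rather than the raw cell text, keeps the persisted code consistent with the derived state_fips.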

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9c824797-9c65-49c7-ac57-59a0bdf0b94a

📥 Commits

Reviewing files that changed from the base of the PR and between 67df103 and d4083ab.

📒 Files selected for processing (10)

src/d4bl/app/schemas.py
src/d4bl/app/upload_routes.py
src/d4bl/services/datasource_processing/parser.py
src/d4bl/services/datasource_processing/validation.py
tests/test_datasource_processing.py
tests/test_upload_api.py
ui-nextjs/app/explore/page.tsx
ui-nextjs/components/admin/UploadDataSource.tsx
ui-nextjs/components/explore/StaffDatasetPicker.tsx
ui-nextjs/lib/explore-config.ts

William Hill added 20 commits April 18, 2026 14:19

docs: add design spec for data source upload pipeline (#190)

2c87b45

docs: add implementation plan for data source upload pipeline (#190)

5a3dbd6

feat(deps): add openpyxl and scaffold datasource_processing package

87e60f6

Made-with: Cursor

feat(datasource): add validation + coercion helpers

244caf2

Made-with: Cursor

feat(datasource): add MappingConfig + DatasourceParseError

b857abd

Made-with: Cursor

feat(datasource): add CSV reader

c5a4d73

Made-with: Cursor

feat(datasource): add XLSX reader

d75ef3c

Made-with: Cursor

feat(datasource): add parse_datasource_file orchestrator

f29e1cb

Made-with: Cursor

feat(schemas): add mapping fields to DataSourceUploadRequest

306e395

Made-with: Cursor

feat(uploads): parse+validate+persist datasource uploads

bdc2c0b

Made-with: Cursor

test(uploads): pin datasource approval is a pure status flip

0b65a41

Made-with: Cursor

feat(explore): add staff-uploads/available endpoint

f8f0263

Made-with: Cursor

feat(explore): add staff-uploads main endpoint

d955da5

Made-with: Cursor

feat(upload-ui): add column mapping fields to datasource form

a300d08

Made-with: Cursor

feat(review-ui): datasource-aware mapping + preview rendering

367c979

Made-with: Cursor

feat(explore): add staff-uploads DataSourceConfig

e779a46

Made-with: Cursor

feat(explore): add StaffDatasetPicker component

9dfdd33

Made-with: Cursor

feat(explore-ui): wire staff-uploads picker with conditional race filter

b7844a1

Made-with: Cursor

docs(guide): describe shipped datasource upload pipeline

6a941bc

Made-with: Cursor

test(settings): isolate task model defaults from host env

67df103

Made-with: Cursor

chatgpt-codex-connector Bot reviewed Apr 19, 2026

Comment thread src/d4bl/services/datasource_processing/validation.py

Comment thread ui-nextjs/app/explore/page.tsx

greptile-apps Bot reviewed Apr 19, 2026

Comment thread src/d4bl/services/datasource_processing/validation.py

Comment thread src/d4bl/app/upload_routes.py Outdated

Comment thread ui-nextjs/lib/explore-config.ts

Comment thread src/d4bl/services/datasource_processing/parser.py

coderabbitai Bot requested changes Apr 19, 2026

Comment thread src/d4bl/services/datasource_processing/parser.py

Comment thread src/d4bl/services/datasource_processing/parser.py

Comment thread src/d4bl/services/datasource_processing/validation.py

Comment thread ui-nextjs/app/explore/page.tsx

coderabbitai Bot approved these changes Apr 19, 2026

William-Hill merged commit cd660a5 into main Apr 19, 2026
4 checks passed

William-Hill deleted the feat/190-datasource-pipeline branch April 19, 2026 03:38
