ferologics
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 1 deletion
diff --git a/TODO.md b/TODO.md
Lines changed: 27 additions & 89 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 26 additions & 25 deletions
diff --git a/docs/adr/0001-runtime-g2-decision.md b/docs/adr/0001-runtime-g2-decision.md
Lines changed: 3 additions & 3 deletions
@@ -21,6 +21,7 @@ mise run check-ci           # fast CI gate (format-check + lint + compile + fast
 mise run check-full         # full gate (check + live regressions)
 mise run test-regressions         # live app regression suite
 mise run test-regressions-if-needed # run live regressions only when impacted files changed
+mise run test-scope-negative # scope-negative harness suite
 mise run install-git-hooks  # reinstall pre-commit/pre-push hooks
 mise run tauri-dev          # run app
 mise run runtime-build      # build VM runtime pack
 
@@ -38,6 +38,7 @@ mise run test
 mise run test-vite
 mise run test-rust
 mise run test-regressions
+mise run test-scope-negative
 ```
 
 ```bash
@@ -73,7 +74,7 @@ mise run test-stop
 Scope enforcement suite:
 
 ```bash
-./scripts/harness/path-i-lite-negative.sh
+mise run test-scope-negative
 ```
 
 ### Cleanup
 
@@ -1,110 +1,48 @@
 # TODO
 
-> Execution sequencing for cleanup work lives in `docs/cleanup-execution-plan.md` (execution archive + remaining closeout checklist).
+## Now
 
-## Now: Foundation cleanup
-
-- [ ] **Testing sanity gate (P0)** — define and ship a real automated regression suite for the liftoff path (Vitest + Rust), keep daily checks fast (`mise run check`) while enforcing live regressions in `mise run check-full`, and stop relying on ad-hoc/manual shell-harness runs for core correctness.
-  - [x] Add a machine-readable `state_snapshot` contract for deterministic assertions (no log-grep testing).
-  - [x] Add first gated Vitest regression: reopen cwd correctness (`/mnt/workdir...`) on folder-bound task reopen.
-  - [x] Add remaining gated Vitest regressions: folder-bind continuity (no UI reset), working-folder panel refresh on folder change, and runtime-mismatch badge rules.
-  - [x] Add one sequential live journey canary (messages + models + workdir + artifacts + reopen), while keeping focused canaries for isolated invariants.
-  - [x] Add CI enforcement for both gates (`check` on PRs, `check-full` required before merge).
-  - [x] Remove low-signal/noise tests and stale assertions (smoke test, legacy session-file assumptions, noisy debug logs).
-  - [x] Add `mise run test-regressions` task (currently runs live-app regression tests).
-  - [x] Split gates: `mise run check` (fast) and `mise run check-full` (includes regressions).
-  - [x] Add path-aware regression gating for push/CI so live tests only run when integration-impacting files changed.
-  - [ ] Burn in the CI/hook split gate behavior (`check-full` skip/run on path filters + `PIWORK_FORCE_CHECK_FULL=1` local pre-push path), then mark this parent item done.
-  - [x] Add fast protocol guardrail (`mise run audit-protocol`) via Vitest contract tests.
-- [ ] **CI/local gate alignment + cache policy cleanup (P0 ergonomics)** — reduce path-detection magic and keep CI/local gating behavior predictable.
-  - Decide whether docs-only changes should run a no-op gate vs lightweight docs validation.
-  - Keep integration trigger lists explicit and scoped to runtime-impacting paths (CI + local pre-push should stay in sync).
-  - Reconcile Rust cache strategy (`rust-cache` sharing between jobs, optional sccache persistence experiment) and keep observable metrics in logs.
-  - Revisit trigger: after 5–10 mixed pushes (docs-only, frontend-only, rust/integration) with expected run/skip behavior and predictable cache hit patterns.
 - [ ] **Reactive model/bootstrap sequencing (P0 stability)** — move runtime model setup from timeout-driven polling to explicit task readiness states.
   - Add a per-task child-command queue in `taskd` so bootstrap `set_model` and first `get_available_models` are serialized.
   - Expose bootstrap readiness/error in `runtime_get_state` and gate UI model-fetch requests on that signal.
   - Keep timeout values only as fallback safety rails, not primary control flow.
-  - Revisit trigger: after timeout tuning lands and CI regressions are green for several consecutive runs, implement this to reduce remaining flake/time spent in wait loops.
-- [x] **Kill v1 runtime** — remove `PIWORK_RUNTIME_V2_TASKD` flag, v1 code paths in runtimeService (`handleTaskSwitchV1`, `handleFolderChangeV1`, `ensureTaskSessionReady`), v1 `nc -l` loop in init script, `RuntimeMode` type. taskd is the only runtime.
-- [x] **Enforce V2-only host protocol** — removed legacy host request handling in taskd, host parser is strict `{ id, type, payload }`, and RuntimeService only resolves pending RPCs from taskd V2 response envelopes.
-- [x] **Finalize runtime naming cleanup (P0)** — dropped `handleV2*`/`sendV2*` helper names, removed the `__legacy__` sentinel path from runtime/UI mismatch logic, and switched to neutral runtime envelope naming.
-- [x] **Extract init script** — move the heredoc out of `mise-tasks/runtime-build` into `runtime/init.sh`
-- [x] **Fix context pollution** — infrastructure bash commands (grep mount check, mkdir, session writes) go through pi's RPC and pollute the agent's conversation. Add `system_bash` to taskd that bypasses pi sessions, or do checks in taskd before spawning pi.
-- [x] **Simplify auth/settings** — strip Settings modal to: show current auth status + "Import from pi" button. Kill multi-profile UI. For MVP: baked auth or `~/.pi/agent/auth.json` import.
 - [ ] **Proper auth MVP (P0)** — make auth first-class for non-existing pi setups: working OAuth `/login` flow and provider API key entry in Settings; keep "Import from pi" as convenience, not primary path.
-- [x] **Lock working folder after first bind** — `workingFolder` supports one-time bind (`null -> path`), then becomes immutable for that task; use a new task for a different folder.
-- [x] **Define task artifact persistence contract** — documented in `docs/task-artifact-contract.md` (`outputs` writable, `uploads` read-only, Scratchpad aggregates both).
-- [x] **Implement artifact contract in runtime/UI** — enforce one-time folder bind, surface Scratchpad from `outputs` + `uploads`, and apply uploads read-only policy.
 - [ ] **Add file import UX (P0)** — support importing local files into task `uploads`, then show/preview them in Scratchpad immediately while keeping uploads read-only after import.
 - [ ] **Untangle auth state from runtime artifacts** — keep auth storage purpose clear; avoid mixing credentials with unrelated pi/session artifacts.
-- [ ] **Fix sendLogin optimistic log** — logs `[info] Sent /login` even if not connected
-- [x] **Fix opener permission path** — added `opener:allow-open-path` capability so `Open in Finder` is authorized.
-- [x] **Fix right-panel error isolation** — Working-folder action errors are now scoped to the Working folder card.
-- [x] **Fix first `/mnt/workdir` write reliability (race mitigation)** — task-bound folder changes now mark `taskSwitching` before validation, and prompt send is blocked until runtime is ready.
-- [x] **Add harness regression for working-folder writes** — set folder → write file immediately → assert host path has file.
-- [x] **Add harness check for open-folder action** — validate Working-folder header icon opens Finder path successfully.
+- [ ] **Fix sendLogin optimistic log** — logs `[info] Sent /login` even if not connected.
 - [ ] **Inject minimal FS runtime hint into prompts** — include working-folder host path + `/mnt/workdir` alias + scratchpad path, and refresh when folder is bound later (not just at startup).
-- [x] **Fix dev cwd chip staleness on task reopen** — reopen now validates persisted working folder before runtime prep, then refreshes on `task_ready`, so cwd settles to `/mnt/workdir...` instead of sticking at `/mnt/taskstate/.../outputs`.
-- [ ] **Delete remaining slop (P0)** — review docs and code for stale references to v1/v2/legacy naming, old sync protocol language, and obsolete smoke-suite assumptions.
-  - [x] Removed runtime/taskd `handleV2*`/`sendV2*` helper naming + `__legacy__` mismatch sentinel path.
-  - [x] Removed unused UI leftovers: `src/lib/components/ProviderList.svelte`, `src/lib/utils/notice.svelte.ts`.
-- [ ] **Script hygiene pass (P0)** — reduce `scripts/` sprawl by making `mise` the single entrypoint for dev/test ops, moving one-off experiments to `scripts/lab/`, and deleting wrappers not used by `mise` or CI.
-  - [x] Moved MITM spike scripts to `scripts/lab/` and updated docs pointers.
-- [ ] **Dev watch scope (P0)** — avoid restarting `tauri dev` for non-runtime docs/content edits (e.g. Markdown), keep hot reload scoped to relevant source/config files.
-- [ ] **Close out cleanup execution plan (P0)** — finish remaining PR-5 leftovers (naming consistency, slop purge, script/watch hygiene), then mark `docs/cleanup-execution-plan.md` closed.
-- [x] **Roadmap sync hygiene** — synced `docs/ui-roadmap.md` with current `TODO.md` execution state (2026-02-07).
 
-## Next: Make it usable
+## Next
 
-- [x] **Model picker realism (no fake fallback)** — removed hardcoded fallback model lists in runtime/UI; model picker now uses runtime-reported models only.
-- [x] **Model availability empty/error state** — picker now shows explicit loading/empty/error states and disables selection when unavailable.
 - [ ] **Model scope toggle in Settings** — add `Preferred only` (default shortlist we define) vs `All available` filtering for model picker results.
-- [x] **Persist model selection to task metadata** — picker updates now persist `{ provider, model }` on task metadata so switching/reopening tasks restores model intent.
-- [x] **Finish auth profile cull for MVP** — removed profile switching plumbing; runtime/test/auth paths are standardized on the default profile.
 - [ ] **Markdown rendering** — render agent responses (bold, lists, code blocks). Biggest UX gap.
-- [ ] **Tool call display** — collapsible "Created a file ›", "Ran command ›" in message stream
-- [ ] **Interruptible composer (steering + follow-ups)** — deferred spec captured in `docs/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize.
-- [x] **Right panel IA pass** — replace “Downloads” with “Working folder” card semantics (dynamic title = folder basename when set), clear empty states, and open-in-Finder affordance.
-- [x] **Move Working-folder open action to header** — icon-only action is now in card header (left of chevron), body button removed.
-- [x] **Scratchpad continuity** — keep Scratchpad visible for every task and aggregate artifacts from both `outputs` and `uploads`.
-- [x] **Artifact explorer parity** — make file listing/preview behavior consistent across working-folder and no-folder tasks, including uploads read-only behavior.
-- [x] **Auto-refresh artifact panels** — Scratchpad now refreshes on `tool_execution_end` / `turn_end` / `agent_end` events (manual refresh still available).
-- [x] **Working-folder file visibility** — Working-folder card now lists files and updates from runtime events.
+- [ ] **Tool call display** — collapsible "Created a file ›", "Ran command ›" in message stream.
+- [ ] **Interruptible composer (steering + follow-ups)** — deferred spec captured in `docs/research/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize.
 - [ ] **Context panel usefulness (enrichment)** — panel exists; improve it to surface active connectors/tools and task-referenced files instead of mostly static copy.
 
-## Later: Production
+## Later
 
 - [ ] **Auth hardening follow-up** — provider-by-provider `/login` reliability through VM NAT, edge-case diagnostics, and clearer failure UX after proper auth MVP ships.
+- [ ] **macOS distribution pilot (post-auth)** — once proper auth MVP is stable, publish a downloadable macOS build so external users can try Piwork without local dev setup.
+  - Revisit trigger: proper auth MVP done + runtime startup/install path is reliable for fresh machines.
+  - Scope: macOS first; Linux/Windows remain later.
 - [ ] **Multi-task runtime behavior** — define expected behavior for switching between active tasks without losing running session state (foreground/background semantics, status visibility, resume behavior).
-- [ ] **Runtime download** — first-run pack download for non-dev users
-- [ ] **Bundle pi** — include pi in runtime pack instead of copying from global npm
-- [ ] **Onboarding** — first-run experience that doesn't require `mise run runtime-build`
-- [ ] **Settings cleanup** — audit settings surface and remove dead/low-value controls
-
-## Later: Polish
-
-- [ ] **Doc cleanup** — consolidate stale docs, kill anything that doesn't match reality
-- [ ] **Code cleanup** — deep pass, remove slop, consistent patterns
-- [ ] **Task title editing** — editable at top of conversation
-- [ ] **Progress indicators** — checkmarks/status hints in right panel for task progress
-- [ ] **Profile chip** — bottom-left identity/plan/status chip
-- [ ] **Empty state polish** — shuffleable task categories + "See more ideas" + richer task tiles like Cowork
+- [ ] **Runtime download** — first-run pack download for non-dev users.
+- [ ] **Bundle pi** — include pi in runtime pack instead of copying from global npm.
+- [ ] **Onboarding** — first-run experience that doesn't require `mise run runtime-build`.
+- [ ] **Settings cleanup** — audit settings surface and remove dead/low-value controls.
+- [ ] **Doc cleanup** — consolidate stale docs, kill anything that doesn't match reality.
+- [ ] **Code cleanup** — deep pass, remove slop, consistent patterns.
+- [ ] **Task title editing** — editable at top of conversation.
+- [ ] **Progress indicators** — checkmarks/status hints in right panel for task progress.
+- [ ] **Profile chip** — bottom-left identity/plan/status chip.
+- [ ] **Empty state polish** — shuffleable task categories + "See more ideas" + richer task tiles like Cowork.
 - [ ] **Progress model v2 (non-P0)** — experiment with Cowork-style step/milestone summaries inferred from task/tool activity, with clear confidence/limitations.
-
-## Someday
-
-- [ ] Connectors (Calendar, Slack, Google Drive, Notion)
-- [ ] Clipboard + attachments (images/files with MIME-aware previews)
-- [ ] Multi-folder tasks
-- [ ] Cross-platform (Linux/Windows)
-- [ ] MITM network proxy
-- [ ] Canvas/rich artifact viewer
-- [ ] qcow2 rootfs (lower RAM)
-- [ ] Gate G2 — Gondolin vs deeper sandbox hardening (research only)
-
-## Testing
-
-- Harness primitives: `test-start`, `test-prompt`, `test-screenshot`, `test-set-folder`, `test-set-task`, `test-create-task`, `test-delete-tasks`, `test-dump-state`, `test-state-snapshot`, `test-runtime-diag`, `test-stop`, `test-open-preview`, `test-write-working-file`, `test-open-working-folder`, `test-auth-*`, `test-send-login`, `test-check-permissions`
-- Scope enforcement: `scripts/harness/path-i-lite-negative.sh`
-- Rule: primitives only, no monolithic E2E scripts
+- [ ] Connectors (Calendar, Slack, Google Drive, Notion).
+- [ ] Clipboard + attachments (images/files with MIME-aware previews).
+- [ ] Multi-folder tasks.
+- [ ] Cross-platform (Linux/Windows).
+- [ ] MITM network proxy.
+- [ ] Canvas/rich artifact viewer.
+- [ ] qcow2 rootfs (lower RAM).
+- [ ] Gate G2 — Gondolin vs deeper sandbox hardening (research only).
@@ -1,35 +1,36 @@
 # Docs Index
 
-## Runtime (source of truth)
+Status: active
+Category: canonical
+Owner: product/runtime
+Last reviewed: 2026-02-07
 
-- `runtime-taskd-plan.md` — taskd rollout status and phases
-- `runtime-taskd-rpc-spec.md` — taskd RPC contract
-- `runtime-pack.md` — VM runtime pack format and boot model
-- `pi-integration.md` — host↔VM↔pi integration overview
-- `testing-strategy.md` — test approach and harness primitives
-
-## Runtime (research / deferred)
+## Canonical (active source of truth)
 
-- `runtime-g2-architecture-spike.md` — post-MVP hardening research (Gondolin vs deeper sandbox)
-- `adr/0001-runtime-g2-decision.md` — decision record template
+- `runtime-taskd-plan.md` — taskd runtime architecture and rollout status
+- `runtime-taskd-rpc-spec.md` — host↔taskd RPC contract
+- `runtime-pack.md` — VM runtime pack format and boot model
+- `pi-integration.md` — host↔VM↔pi integration quick reference
+- `testing-strategy.md` — test strategy, harness primitives, and scope-negative runbook
+- `auth-flow.md` — authentication behavior
+- `permissions-model.md` — scoped local mode + permission policy
+- `task-artifact-contract.md` — working-folder + outputs/uploads/scratchpad contract
+- `product-direction.md` — durable product principles and strategy lanes
 
-## Product
+## Research (non-normative)
 
-- `auth-flow.md` — authentication behavior
-- `permissions-model.md` — folder access model
-- `task-artifact-contract.md` — working-folder immutability + outputs/uploads/scratchpad contract
-- `folder-artifact-implementation-plan.md` — implementation plan for one-time folder bind + scratchpad aggregation
-- `cleanup-execution-plan.md` — cleanup execution archive + remaining closeout checklist
-- `followup-steering-spec.md` — deferred spec for queued follow-ups, steering, and stop UX in the composer
-- `ui-roadmap.md` — UI direction + Cowork comparison (execution tracking is in `../TODO.md`)
+- `research/runtime-g2-architecture-spike.md` — post-MVP hardening research (Gondolin vs deeper sandbox)
+- `research/network-mitm-spike.md` — future strict-network spike notes
+- `research/followup-steering-spec.md` — deferred composer queue/steering design
+- `research/` — Cowork notes, sketches, and field intel
 
-## Supporting
+## Archive (historical)
 
-- `path-i-lite-negative-suite.md` — scope enforcement test (traversal/symlink/cross-task)
-- `network-mitm-spike.md` — future network interception notes
+- `archive/cleanup-execution-plan.md` — closed cleanup implementation plan (superseded by `../TODO.md`)
+- `archive/folder-artifact-implementation-plan.md` — closed folder/artifact implementation sequencing doc (superseded by `task-artifact-contract.md` + `../TODO.md`)
+- `archive/docs-realignment-plan.md` — completed deep-cut docs reclassification plan
+- `archive/ui-roadmap.md` — superseded directional roadmap (replaced by `product-direction.md` + `../TODO.md`)
 
-## Research
+## ADRs
 
-- `research/` — Cowork notes, sketches, field intel
-- `research/cowork-claude-runtime-intel-2026-02-06.md` — Cowork runtime observations
-- `research/sandbox-strategy.md` — cross-platform sandbox model
+- `adr/0001-runtime-g2-decision.md` — Gate G2 decision record
@@ -5,9 +5,9 @@
 - Owners: runtime/platform
 - Related:
   - `docs/runtime-taskd-plan.md`
-  - `docs/runtime-g2-architecture-spike.md`
+  - `docs/research/runtime-g2-architecture-spike.md`
   - `docs/permissions-model.md`
-  - `docs/path-i-lite-negative-suite.md`
+  - `docs/testing-strategy.md`
 
 ## Context
 
@@ -66,7 +66,7 @@ Short description:
 Evidence links:
 
 - State snapshots/screenshots/logs from runtime and Path I-lite runs in `tmp/dev/`
-- Repeatable negative suite: `docs/path-i-lite-negative-suite.md`
+- Repeatable negative suite: `docs/testing-strategy.md` (`Scope enforcement suite` section)
 
 ## Decision