Commit 463fa27

docs: deep-cut realignment + scope-negative cleanup 📚
1 parent 1a1a768 commit 463fa27

26 files changed, +421 -286 lines changed

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ mise run check-ci # fast CI gate (format-check + lint + compile + fast
 mise run check-full # full gate (check + live regressions)
 mise run test-regressions # live app regression suite
 mise run test-regressions-if-needed # run live regressions only when impacted files changed
+mise run test-scope-negative # scope-negative harness suite
 mise run install-git-hooks # reinstall pre-commit/pre-push hooks
 mise run tauri-dev # run app
 mise run runtime-build # build VM runtime pack
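
Note on `test-regressions-if-needed`: the task is described as running live regressions only when impacted files changed. A minimal TypeScript sketch of that kind of path-aware gate is shown here. The function name `needsLiveRegressions` and the trigger prefixes are hypothetical, since the repository's actual trigger list is not part of this diff. The `PIWORK_FORCE_CHECK_FULL=1` override is named in `TODO.md`; treating it as "always run the full gate" is an assumption.

```typescript
// Hypothetical trigger list: path prefixes assumed to affect the live runtime.
// The real project may use different paths or a different matching rule.
const INTEGRATION_TRIGGERS: readonly string[] = ["runtime/", "src-tauri/"];

// Decide whether the live regression suite should run for a set of changed files.
function needsLiveRegressions(
  changedFiles: readonly string[],
  env: Record<string, string | undefined> = {},
): boolean {
  // Assumed semantics: PIWORK_FORCE_CHECK_FULL=1 forces the full gate.
  if (env.PIWORK_FORCE_CHECK_FULL === "1") {
    return true;
  }
  // Otherwise run only when at least one changed file sits under a trigger prefix.
  return changedFiles.some((file) =>
    INTEGRATION_TRIGGERS.some((prefix) => file.startsWith(prefix)),
  );
}
```

Keeping the trigger list as one explicit constant makes it easier to share between CI and a local pre-push hook, which is the alignment concern the TODO items raise.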

README.md

Lines changed: 2 additions & 1 deletion
@@ -38,6 +38,7 @@ mise run test
 mise run test-vite
 mise run test-rust
 mise run test-regressions
+mise run test-scope-negative
 ```

 ```bash
@@ -73,7 +74,7 @@ mise run test-stop
 Scope enforcement suite:

 ```bash
-./scripts/harness/path-i-lite-negative.sh
+mise run test-scope-negative
 ```

 ### Cleanup

TODO.md

Lines changed: 27 additions & 89 deletions
@@ -1,110 +1,48 @@
 # TODO

-> Execution sequencing for cleanup work lives in `docs/cleanup-execution-plan.md` (execution archive + remaining closeout checklist).
+## Now

-## Now: Foundation cleanup
-
-- [ ] **Testing sanity gate (P0)** - define and ship a real automated regression suite for the liftoff path (Vitest + Rust), keep daily checks fast (`mise run check`) while enforcing live regressions in `mise run check-full`, and stop relying on ad-hoc/manual shell-harness runs for core correctness.
-- [x] Add a machine-readable `state_snapshot` contract for deterministic assertions (no log-grep testing).
-- [x] Add first gated Vitest regression: reopen cwd correctness (`/mnt/workdir...`) on folder-bound task reopen.
-- [x] Add remaining gated Vitest regressions: folder-bind continuity (no UI reset), working-folder panel refresh on folder change, and runtime-mismatch badge rules.
-- [x] Add one sequential live journey canary (messages + models + workdir + artifacts + reopen), while keeping focused canaries for isolated invariants.
-- [x] Add CI enforcement for both gates (`check` on PRs, `check-full` required before merge).
-- [x] Remove low-signal/noise tests and stale assertions (smoke test, legacy session-file assumptions, noisy debug logs).
-- [x] Add `mise run test-regressions` task (currently runs live-app regression tests).
-- [x] Split gates: `mise run check` (fast) and `mise run check-full` (includes regressions).
-- [x] Add path-aware regression gating for push/CI so live tests only run when integration-impacting files changed.
-- [ ] Burn in the CI/hook split gate behavior (`check-full` skip/run on path filters + `PIWORK_FORCE_CHECK_FULL=1` local pre-push path), then mark this parent item done.
-- [x] Add fast protocol guardrail (`mise run audit-protocol`) via Vitest contract tests.
-- [ ] **CI/local gate alignment + cache policy cleanup (P0 ergonomics)** - reduce path-detection magic and keep CI/local gating behavior predictable.
-- Decide whether docs-only changes should run a no-op gate vs lightweight docs validation.
-- Keep integration trigger lists explicit and scoped to runtime-impacting paths (CI + local pre-push should stay in sync).
-- Reconcile Rust cache strategy (`rust-cache` sharing between jobs, optional sccache persistence experiment) and keep observable metrics in logs.
-- Revisit trigger: after 5–10 mixed pushes (docs-only, frontend-only, rust/integration) with expected run/skip behavior and predictable cache hit patterns.
 - [ ] **Reactive model/bootstrap sequencing (P0 stability)** - move runtime model setup from timeout-driven polling to explicit task readiness states.
 - Add a per-task child-command queue in `taskd` so bootstrap `set_model` and first `get_available_models` are serialized.
 - Expose bootstrap readiness/error in `runtime_get_state` and gate UI model-fetch requests on that signal.
 - Keep timeout values only as fallback safety rails, not primary control flow.
-- Revisit trigger: after timeout tuning lands and CI regressions are green for several consecutive runs, implement this to reduce remaining flake/time spent in wait loops.
-- [x] **Kill v1 runtime** - remove `PIWORK_RUNTIME_V2_TASKD` flag, v1 code paths in runtimeService (`handleTaskSwitchV1`, `handleFolderChangeV1`, `ensureTaskSessionReady`), v1 `nc -l` loop in init script, `RuntimeMode` type. taskd is the only runtime.
-- [x] **Enforce V2-only host protocol** - removed legacy host request handling in taskd, host parser is strict `{ id, type, payload }`, and RuntimeService only resolves pending RPCs from taskd V2 response envelopes.
-- [x] **Finalize runtime naming cleanup (P0)** - dropped `handleV2*`/`sendV2*` helper names, removed the `__legacy__` sentinel path from runtime/UI mismatch logic, and switched to neutral runtime envelope naming.
-- [x] **Extract init script** - move the heredoc out of `mise-tasks/runtime-build` into `runtime/init.sh`
-- [x] **Fix context pollution** - infrastructure bash commands (grep mount check, mkdir, session writes) go through pi's RPC and pollute the agent's conversation. Add `system_bash` to taskd that bypasses pi sessions, or do checks in taskd before spawning pi.
-- [x] **Simplify auth/settings** - strip Settings modal to: show current auth status + "Import from pi" button. Kill multi-profile UI. For MVP: baked auth or `~/.pi/agent/auth.json` import.
 - [ ] **Proper auth MVP (P0)** - make auth first-class for non-existing pi setups: working OAuth `/login` flow and provider API key entry in Settings; keep "Import from pi" as convenience, not primary path.
-- [x] **Lock working folder after first bind** - `workingFolder` supports one-time bind (`null -> path`), then becomes immutable for that task; use a new task for a different folder.
-- [x] **Define task artifact persistence contract** - documented in `docs/task-artifact-contract.md` (`outputs` writable, `uploads` read-only, Scratchpad aggregates both).
-- [x] **Implement artifact contract in runtime/UI** - enforce one-time folder bind, surface Scratchpad from `outputs` + `uploads`, and apply uploads read-only policy.
 - [ ] **Add file import UX (P0)** - support importing local files into task `uploads`, then show/preview them in Scratchpad immediately while keeping uploads read-only after import.
 - [ ] **Untangle auth state from runtime artifacts** - keep auth storage purpose clear; avoid mixing credentials with unrelated pi/session artifacts.
-- [ ] **Fix sendLogin optimistic log** - logs `[info] Sent /login` even if not connected
-- [x] **Fix opener permission path** - added `opener:allow-open-path` capability so `Open in Finder` is authorized.
-- [x] **Fix right-panel error isolation** - Working-folder action errors are now scoped to the Working folder card.
-- [x] **Fix first `/mnt/workdir` write reliability (race mitigation)** - task-bound folder changes now mark `taskSwitching` before validation, and prompt send is blocked until runtime is ready.
-- [x] **Add harness regression for working-folder writes** - set folder → write file immediately → assert host path has file.
-- [x] **Add harness check for open-folder action** - validate Working-folder header icon opens Finder path successfully.
+- [ ] **Fix sendLogin optimistic log** - logs `[info] Sent /login` even if not connected.
 - [ ] **Inject minimal FS runtime hint into prompts** - include working-folder host path + `/mnt/workdir` alias + scratchpad path, and refresh when folder is bound later (not just at startup).
-- [x] **Fix dev cwd chip staleness on task reopen** - reopen now validates persisted working folder before runtime prep, then refreshes on `task_ready`, so cwd settles to `/mnt/workdir...` instead of sticking at `/mnt/taskstate/.../outputs`.
-- [ ] **Delete remaining slop (P0)** - review docs and code for stale references to v1/v2/legacy naming, old sync protocol language, and obsolete smoke-suite assumptions.
-- [x] Removed runtime/taskd `handleV2*`/`sendV2*` helper naming + `__legacy__` mismatch sentinel path.
-- [x] Removed unused UI leftovers: `src/lib/components/ProviderList.svelte`, `src/lib/utils/notice.svelte.ts`.
-- [ ] **Script hygiene pass (P0)** - reduce `scripts/` sprawl by making `mise` the single entrypoint for dev/test ops, moving one-off experiments to `scripts/lab/`, and deleting wrappers not used by `mise` or CI.
-- [x] Moved MITM spike scripts to `scripts/lab/` and updated docs pointers.
-- [ ] **Dev watch scope (P0)** - avoid restarting `tauri dev` for non-runtime docs/content edits (e.g. Markdown), keep hot reload scoped to relevant source/config files.
-- [ ] **Close out cleanup execution plan (P0)** - finish remaining PR-5 leftovers (naming consistency, slop purge, script/watch hygiene), then mark `docs/cleanup-execution-plan.md` closed.
-- [x] **Roadmap sync hygiene** - synced `docs/ui-roadmap.md` with current `TODO.md` execution state (2026-02-07).

-## Next: Make it usable
+## Next

-- [x] **Model picker realism (no fake fallback)** - removed hardcoded fallback model lists in runtime/UI; model picker now uses runtime-reported models only.
-- [x] **Model availability empty/error state** - picker now shows explicit loading/empty/error states and disables selection when unavailable.
 - [ ] **Model scope toggle in Settings** - add `Preferred only` (default shortlist we define) vs `All available` filtering for model picker results.
-- [x] **Persist model selection to task metadata** - picker updates now persist `{ provider, model }` on task metadata so switching/reopening tasks restores model intent.
-- [x] **Finish auth profile cull for MVP** - removed profile switching plumbing; runtime/test/auth paths are standardized on the default profile.
 - [ ] **Markdown rendering** - render agent responses (bold, lists, code blocks). Biggest UX gap.
-- [ ] **Tool call display** - collapsible "Created a file ›", "Ran command ›" in message stream
-- [ ] **Interruptible composer (steering + follow-ups)** - deferred spec captured in `docs/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize.
-- [x] **Right panel IA pass** - replace “Downloads” with “Working folder” card semantics (dynamic title = folder basename when set), clear empty states, and open-in-Finder affordance.
-- [x] **Move Working-folder open action to header** - icon-only action is now in card header (left of chevron), body button removed.
-- [x] **Scratchpad continuity** - keep Scratchpad visible for every task and aggregate artifacts from both `outputs` and `uploads`.
-- [x] **Artifact explorer parity** - make file listing/preview behavior consistent across working-folder and no-folder tasks, including uploads read-only behavior.
-- [x] **Auto-refresh artifact panels** - Scratchpad now refreshes on `tool_execution_end` / `turn_end` / `agent_end` events (manual refresh still available).
-- [x] **Working-folder file visibility** - Working-folder card now lists files and updates from runtime events.
+- [ ] **Tool call display** - collapsible "Created a file ›", "Ran command ›" in message stream.
+- [ ] **Interruptible composer (steering + follow-ups)** - deferred spec captured in `docs/research/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize.
 - [ ] **Context panel usefulness (enrichment)** - panel exists; improve it to surface active connectors/tools and task-referenced files instead of mostly static copy.

-## Later: Production
+## Later

 - [ ] **Auth hardening follow-up** - provider-by-provider `/login` reliability through VM NAT, edge-case diagnostics, and clearer failure UX after proper auth MVP ships.
+- [ ] **macOS distribution pilot (post-auth)** - once proper auth MVP is stable, publish a downloadable macOS build so external users can try Piwork without local dev setup.
+- Revisit trigger: proper auth MVP done + runtime startup/install path is reliable for fresh machines.
+- Scope: macOS first; Linux/Windows remain later.
 - [ ] **Multi-task runtime behavior** - define expected behavior for switching between active tasks without losing running session state (foreground/background semantics, status visibility, resume behavior).
-- [ ] **Runtime download** - first-run pack download for non-dev users
-- [ ] **Bundle pi** - include pi in runtime pack instead of copying from global npm
-- [ ] **Onboarding** - first-run experience that doesn't require `mise run runtime-build`
-- [ ] **Settings cleanup** - audit settings surface and remove dead/low-value controls
-
-## Later: Polish
-
-- [ ] **Doc cleanup** - consolidate stale docs, kill anything that doesn't match reality
-- [ ] **Code cleanup** - deep pass, remove slop, consistent patterns
-- [ ] **Task title editing** - editable at top of conversation
-- [ ] **Progress indicators** - checkmarks/status hints in right panel for task progress
-- [ ] **Profile chip** - bottom-left identity/plan/status chip
-- [ ] **Empty state polish** - shuffleable task categories + "See more ideas" + richer task tiles like Cowork
+- [ ] **Runtime download** - first-run pack download for non-dev users.
+- [ ] **Bundle pi** - include pi in runtime pack instead of copying from global npm.
+- [ ] **Onboarding** - first-run experience that doesn't require `mise run runtime-build`.
+- [ ] **Settings cleanup** - audit settings surface and remove dead/low-value controls.
+- [ ] **Doc cleanup** - consolidate stale docs, kill anything that doesn't match reality.
+- [ ] **Code cleanup** - deep pass, remove slop, consistent patterns.
+- [ ] **Task title editing** - editable at top of conversation.
+- [ ] **Progress indicators** - checkmarks/status hints in right panel for task progress.
+- [ ] **Profile chip** - bottom-left identity/plan/status chip.
+- [ ] **Empty state polish** - shuffleable task categories + "See more ideas" + richer task tiles like Cowork.
 - [ ] **Progress model v2 (non-P0)** - experiment with Cowork-style step/milestone summaries inferred from task/tool activity, with clear confidence/limitations.
-
-## Someday
-
-- [ ] Connectors (Calendar, Slack, Google Drive, Notion)
-- [ ] Clipboard + attachments (images/files with MIME-aware previews)
-- [ ] Multi-folder tasks
-- [ ] Cross-platform (Linux/Windows)
-- [ ] MITM network proxy
-- [ ] Canvas/rich artifact viewer
-- [ ] qcow2 rootfs (lower RAM)
-- [ ] Gate G2 - Gondolin vs deeper sandbox hardening (research only)
-
-## Testing
-
-- Harness primitives: `test-start`, `test-prompt`, `test-screenshot`, `test-set-folder`, `test-set-task`, `test-create-task`, `test-delete-tasks`, `test-dump-state`, `test-state-snapshot`, `test-runtime-diag`, `test-stop`, `test-open-preview`, `test-write-working-file`, `test-open-working-folder`, `test-auth-*`, `test-send-login`, `test-check-permissions`
-- Scope enforcement: `scripts/harness/path-i-lite-negative.sh`
-- Rule: primitives only, no monolithic E2E scripts
+- [ ] Connectors (Calendar, Slack, Google Drive, Notion).
+- [ ] Clipboard + attachments (images/files with MIME-aware previews).
+- [ ] Multi-folder tasks.
+- [ ] Cross-platform (Linux/Windows).
+- [ ] MITM network proxy.
+- [ ] Canvas/rich artifact viewer.
+- [ ] qcow2 rootfs (lower RAM).
+- [ ] Gate G2 - Gondolin vs deeper sandbox hardening (research only).
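
Note on the "Reactive model/bootstrap sequencing" item kept in this file: it proposes a per-task child-command queue so that bootstrap `set_model` and the first `get_available_models` are serialized. One possible shape for such a queue, sketched in TypeScript, is shown here. The class name `TaskCommandQueue` and its API are hypothetical; nothing in this diff shows how `taskd` is structured or which language it is written in.

```typescript
// A command is any async operation sent to a task's child process.
type ChildCommand<T> = () => Promise<T>;

// Hypothetical per-task queue: commands for the same task run strictly one
// after another, while commands for different tasks stay independent.
class TaskCommandQueue {
  // Tail of each task's chain; stored so that it never rejects.
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(taskId: string, command: ChildCommand<T>): Promise<T> {
    const previous = this.tails.get(taskId) ?? Promise.resolve();
    // Start this command only after the previous one for the task has settled.
    const result = previous.then(() => command());
    // Swallow errors on the stored tail so one failure does not block later
    // commands; the caller still sees the rejection through `result`.
    this.tails.set(taskId, result.catch(() => undefined));
    return result;
  }
}
```

With this shape, enqueuing `set_model` and then `get_available_models` for the same task id would guarantee the second starts only after the first finishes, without relying on timeouts as the primary control flow.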

docs/README.md

Lines changed: 26 additions & 25 deletions
@@ -1,35 +1,36 @@
 # Docs Index

-## Runtime (source of truth)
+Status: active
+Category: canonical
+Owner: product/runtime
+Last reviewed: 2026-02-07

-- `runtime-taskd-plan.md` - taskd rollout status and phases
-- `runtime-taskd-rpc-spec.md` - taskd RPC contract
-- `runtime-pack.md` - VM runtime pack format and boot model
-- `pi-integration.md` - host↔VM↔pi integration overview
-- `testing-strategy.md` - test approach and harness primitives
-
-## Runtime (research / deferred)
+## Canonical (active source of truth)

-- `runtime-g2-architecture-spike.md` - post-MVP hardening research (Gondolin vs deeper sandbox)
-- `adr/0001-runtime-g2-decision.md` - decision record template
+- `runtime-taskd-plan.md` - taskd runtime architecture and rollout status
+- `runtime-taskd-rpc-spec.md` - host↔taskd RPC contract
+- `runtime-pack.md` - VM runtime pack format and boot model
+- `pi-integration.md` - host↔VM↔pi integration quick reference
+- `testing-strategy.md` - test strategy, harness primitives, and scope-negative runbook
+- `auth-flow.md` - authentication behavior
+- `permissions-model.md` - scoped local mode + permission policy
+- `task-artifact-contract.md` - working-folder + outputs/uploads/scratchpad contract
+- `product-direction.md` - durable product principles and strategy lanes

-## Product
+## Research (non-normative)

-- `auth-flow.md` - authentication behavior
-- `permissions-model.md` - folder access model
-- `task-artifact-contract.md` - working-folder immutability + outputs/uploads/scratchpad contract
-- `folder-artifact-implementation-plan.md` - implementation plan for one-time folder bind + scratchpad aggregation
-- `cleanup-execution-plan.md` - cleanup execution archive + remaining closeout checklist
-- `followup-steering-spec.md` - deferred spec for queued follow-ups, steering, and stop UX in the composer
-- `ui-roadmap.md` - UI direction + Cowork comparison (execution tracking is in `../TODO.md`)
+- `research/runtime-g2-architecture-spike.md` - post-MVP hardening research (Gondolin vs deeper sandbox)
+- `research/network-mitm-spike.md` - future strict-network spike notes
+- `research/followup-steering-spec.md` - deferred composer queue/steering design
+- `research/` - Cowork notes, sketches, and field intel

-## Supporting
+## Archive (historical)

-- `path-i-lite-negative-suite.md` - scope enforcement test (traversal/symlink/cross-task)
-- `network-mitm-spike.md` - future network interception notes
+- `archive/cleanup-execution-plan.md` - closed cleanup implementation plan (superseded by `../TODO.md`)
+- `archive/folder-artifact-implementation-plan.md` - closed folder/artifact implementation sequencing doc (superseded by `task-artifact-contract.md` + `../TODO.md`)
+- `archive/docs-realignment-plan.md` - completed deep-cut docs reclassification plan
+- `archive/ui-roadmap.md` - superseded directional roadmap (replaced by `product-direction.md` + `../TODO.md`)

-## Research
+## ADRs

-- `research/` - Cowork notes, sketches, field intel
-- `research/cowork-claude-runtime-intel-2026-02-06.md` - Cowork runtime observations
-- `research/sandbox-strategy.md` - cross-platform sandbox model
+- `adr/0001-runtime-g2-decision.md` - Gate G2 decision record

docs/adr/0001-runtime-g2-decision.md

Lines changed: 3 additions & 3 deletions
@@ -5,9 +5,9 @@
 - Owners: runtime/platform
 - Related:
 - `docs/runtime-taskd-plan.md`
-- `docs/runtime-g2-architecture-spike.md`
+- `docs/research/runtime-g2-architecture-spike.md`
 - `docs/permissions-model.md`
-- `docs/path-i-lite-negative-suite.md`
+- `docs/testing-strategy.md`

 ## Context

@@ -66,7 +66,7 @@ Short description:
 Evidence links:

 - State snapshots/screenshots/logs from runtime and Path I-lite runs in `tmp/dev/`
-- Repeatable negative suite: `docs/path-i-lite-negative-suite.md`
+- Repeatable negative suite: `docs/testing-strategy.md` (`Scope enforcement suite` section)

 ## Decision
