1 | 1 | # TODO |
2 | 2 |
3 | | -> Execution sequencing for cleanup work lives in `docs/cleanup-execution-plan.md` (execution archive + remaining closeout checklist). |
| 3 | +## Now |
4 | 4 |
5 | | -## Now: Foundation cleanup |
6 | | - |
7 | | -- [ ] **Testing sanity gate (P0)** - define and ship a real automated regression suite for the liftoff path (Vitest + Rust), keep daily checks fast (`mise run check`) while enforcing live regressions in `mise run check-full`, and stop relying on ad-hoc/manual shell-harness runs for core correctness. |
8 | | - - [x] Add a machine-readable `state_snapshot` contract for deterministic assertions (no log-grep testing). |
9 | | - - [x] Add first gated Vitest regression: reopen cwd correctness (`/mnt/workdir...`) on folder-bound task reopen. |
10 | | - - [x] Add remaining gated Vitest regressions: folder-bind continuity (no UI reset), working-folder panel refresh on folder change, and runtime-mismatch badge rules. |
11 | | - - [x] Add one sequential live journey canary (messages + models + workdir + artifacts + reopen), while keeping focused canaries for isolated invariants. |
12 | | - - [x] Add CI enforcement for both gates (`check` on PRs, `check-full` required before merge). |
13 | | - - [x] Remove low-signal/noise tests and stale assertions (smoke test, legacy session-file assumptions, noisy debug logs). |
14 | | - - [x] Add `mise run test-regressions` task (currently runs live-app regression tests). |
15 | | - - [x] Split gates: `mise run check` (fast) and `mise run check-full` (includes regressions). |
16 | | - - [x] Add path-aware regression gating for push/CI so live tests only run when integration-impacting files changed. |
17 | | - - [ ] Burn in the CI/hook split gate behavior (`check-full` skip/run on path filters + `PIWORK_FORCE_CHECK_FULL=1` local pre-push path), then mark this parent item done. |
18 | | - - [x] Add fast protocol guardrail (`mise run audit-protocol`) via Vitest contract tests. |
19 | | -- [ ] **CI/local gate alignment + cache policy cleanup (P0 ergonomics)** - reduce path-detection magic and keep CI/local gating behavior predictable. |
20 | | - - Decide whether docs-only changes should run a no-op gate vs lightweight docs validation. |
21 | | - - Keep integration trigger lists explicit and scoped to runtime-impacting paths (CI + local pre-push should stay in sync). |
22 | | - - Reconcile Rust cache strategy (`rust-cache` sharing between jobs, optional sccache persistence experiment) and keep observable metrics in logs. |
23 | | - - Revisit trigger: after 5-10 mixed pushes (docs-only, frontend-only, rust/integration) with expected run/skip behavior and predictable cache hit patterns. |
24 | 5 | - [ ] **Reactive model/bootstrap sequencing (P0 stability)** - move runtime model setup from timeout-driven polling to explicit task readiness states. |
25 | 6 | - Add a per-task child-command queue in `taskd` so bootstrap `set_model` and first `get_available_models` are serialized. |
26 | 7 | - Expose bootstrap readiness/error in `runtime_get_state` and gate UI model-fetch requests on that signal. |
27 | 8 | - Keep timeout values only as fallback safety rails, not primary control flow. |
28 | | - - Revisit trigger: after timeout tuning lands and CI regressions are green for several consecutive runs, implement this to reduce remaining flake/time spent in wait loops. |
29 | | -- [x] **Kill v1 runtime** - remove `PIWORK_RUNTIME_V2_TASKD` flag, v1 code paths in runtimeService (`handleTaskSwitchV1`, `handleFolderChangeV1`, `ensureTaskSessionReady`), v1 `nc -l` loop in init script, `RuntimeMode` type. taskd is the only runtime. |
30 | | -- [x] **Enforce V2-only host protocol** - removed legacy host request handling in taskd, host parser is strict `{ id, type, payload }`, and RuntimeService only resolves pending RPCs from taskd V2 response envelopes. |
31 | | -- [x] **Finalize runtime naming cleanup (P0)** - dropped `handleV2*`/`sendV2*` helper names, removed the `__legacy__` sentinel path from runtime/UI mismatch logic, and switched to neutral runtime envelope naming. |
32 | | -- [x] **Extract init script** - move the heredoc out of `mise-tasks/runtime-build` into `runtime/init.sh` |
33 | | -- [x] **Fix context pollution** - infrastructure bash commands (grep mount check, mkdir, session writes) go through pi's RPC and pollute the agent's conversation. Add `system_bash` to taskd that bypasses pi sessions, or do checks in taskd before spawning pi. |
34 | | -- [x] **Simplify auth/settings** - strip Settings modal to: show current auth status + "Import from pi" button. Kill multi-profile UI. For MVP: baked auth or `~/.pi/agent/auth.json` import. |
35 | 9 | - [ ] **Proper auth MVP (P0)** - make auth first-class for non-existing pi setups: working OAuth `/login` flow and provider API key entry in Settings; keep "Import from pi" as convenience, not primary path. |
36 | | -- [x] **Lock working folder after first bind** - `workingFolder` supports one-time bind (`null -> path`), then becomes immutable for that task; use a new task for a different folder. |
37 | | -- [x] **Define task artifact persistence contract** - documented in `docs/task-artifact-contract.md` (`outputs` writable, `uploads` read-only, Scratchpad aggregates both). |
38 | | -- [x] **Implement artifact contract in runtime/UI** - enforce one-time folder bind, surface Scratchpad from `outputs` + `uploads`, and apply uploads read-only policy. |
39 | 10 | - [ ] **Add file import UX (P0)** - support importing local files into task `uploads`, then show/preview them in Scratchpad immediately while keeping uploads read-only after import. |
40 | 11 | - [ ] **Untangle auth state from runtime artifacts** - keep auth storage purpose clear; avoid mixing credentials with unrelated pi/session artifacts. |
41 | | -- [ ] **Fix sendLogin optimistic log** - logs `[info] Sent /login` even if not connected |
42 | | -- [x] **Fix opener permission path** - added `opener:allow-open-path` capability so `Open in Finder` is authorized. |
43 | | -- [x] **Fix right-panel error isolation** - Working-folder action errors are now scoped to the Working folder card. |
44 | | -- [x] **Fix first `/mnt/workdir` write reliability (race mitigation)** - task-bound folder changes now mark `taskSwitching` before validation, and prompt send is blocked until runtime is ready. |
45 | | -- [x] **Add harness regression for working-folder writes** - set folder → write file immediately → assert host path has file. |
46 | | -- [x] **Add harness check for open-folder action** - validate Working-folder header icon opens Finder path successfully. |
| 12 | +- [ ] **Fix sendLogin optimistic log** - logs `[info] Sent /login` even if not connected. |
47 | 13 | - [ ] **Inject minimal FS runtime hint into prompts** - include working-folder host path + `/mnt/workdir` alias + scratchpad path, and refresh when folder is bound later (not just at startup). |
48 | | -- [x] **Fix dev cwd chip staleness on task reopen** - reopen now validates persisted working folder before runtime prep, then refreshes on `task_ready`, so cwd settles to `/mnt/workdir...` instead of sticking at `/mnt/taskstate/.../outputs`. |
49 | | -- [ ] **Delete remaining slop (P0)** - review docs and code for stale references to v1/v2/legacy naming, old sync protocol language, and obsolete smoke-suite assumptions. |
50 | | - - [x] Removed runtime/taskd `handleV2*`/`sendV2*` helper naming + `__legacy__` mismatch sentinel path. |
51 | | - - [x] Removed unused UI leftovers: `src/lib/components/ProviderList.svelte`, `src/lib/utils/notice.svelte.ts`. |
52 | | -- [ ] **Script hygiene pass (P0)** - reduce `scripts/` sprawl by making `mise` the single entrypoint for dev/test ops, moving one-off experiments to `scripts/lab/`, and deleting wrappers not used by `mise` or CI. |
53 | | - - [x] Moved MITM spike scripts to `scripts/lab/` and updated docs pointers. |
54 | | -- [ ] **Dev watch scope (P0)** - avoid restarting `tauri dev` for non-runtime docs/content edits (e.g. Markdown), keep hot reload scoped to relevant source/config files. |
55 | | -- [ ] **Close out cleanup execution plan (P0)** - finish remaining PR-5 leftovers (naming consistency, slop purge, script/watch hygiene), then mark `docs/cleanup-execution-plan.md` closed. |
56 | | -- [x] **Roadmap sync hygiene** - synced `docs/ui-roadmap.md` with current `TODO.md` execution state (2026-02-07). |
57 | 14 |
58 | | -## Next: Make it usable |
| 15 | +## Next |
59 | 16 |
60 | | -- [x] **Model picker realism (no fake fallback)** - removed hardcoded fallback model lists in runtime/UI; model picker now uses runtime-reported models only. |
61 | | -- [x] **Model availability empty/error state** - picker now shows explicit loading/empty/error states and disables selection when unavailable. |
62 | 17 | - [ ] **Model scope toggle in Settings** - add `Preferred only` (default shortlist we define) vs `All available` filtering for model picker results. |
63 | | -- [x] **Persist model selection to task metadata** - picker updates now persist `{ provider, model }` on task metadata so switching/reopening tasks restores model intent. |
64 | | -- [x] **Finish auth profile cull for MVP** - removed profile switching plumbing; runtime/test/auth paths are standardized on the default profile. |
65 | 18 | - [ ] **Markdown rendering** - render agent responses (bold, lists, code blocks). Biggest UX gap. |
66 | | -- [ ] **Tool call display** - collapsible "Created a file ›", "Ran command ›" in message stream |
67 | | -- [ ] **Interruptible composer (steering + follow-ups)** - deferred spec captured in `docs/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize. |
68 | | -- [x] **Right panel IA pass** - replace "Downloads" with "Working folder" card semantics (dynamic title = folder basename when set), clear empty states, and open-in-Finder affordance. |
69 | | -- [x] **Move Working-folder open action to header** - icon-only action is now in card header (left of chevron), body button removed. |
70 | | -- [x] **Scratchpad continuity** - keep Scratchpad visible for every task and aggregate artifacts from both `outputs` and `uploads`. |
71 | | -- [x] **Artifact explorer parity** - make file listing/preview behavior consistent across working-folder and no-folder tasks, including uploads read-only behavior. |
72 | | -- [x] **Auto-refresh artifact panels** - Scratchpad now refreshes on `tool_execution_end` / `turn_end` / `agent_end` events (manual refresh still available). |
73 | | -- [x] **Working-folder file visibility** - Working-folder card now lists files and updates from runtime events. |
| 19 | +- [ ] **Tool call display** - collapsible "Created a file ›", "Ran command ›" in message stream. |
| 20 | +- [ ] **Interruptible composer (steering + follow-ups)** - deferred spec captured in `docs/research/followup-steering-spec.md` (`Enter` while running = steering, `Option+Enter` = queue follow-up, `Option+Up` = recall queued draft, `Esc`/Stop button = interrupt). Revisit after Markdown + tool-call display stabilize. |
74 | 21 | - [ ] **Context panel usefulness (enrichment)** - panel exists; improve it to surface active connectors/tools and task-referenced files instead of mostly static copy. |
75 | 22 |
76 | | -## Later: Production |
| 23 | +## Later |
77 | 24 |
78 | 25 | - [ ] **Auth hardening follow-up** - provider-by-provider `/login` reliability through VM NAT, edge-case diagnostics, and clearer failure UX after proper auth MVP ships. |
| 26 | +- [ ] **macOS distribution pilot (post-auth)** - once proper auth MVP is stable, publish a downloadable macOS build so external users can try Piwork without local dev setup. |
| 27 | + - Revisit trigger: proper auth MVP done + runtime startup/install path is reliable for fresh machines. |
| 28 | + - Scope: macOS first; Linux/Windows remain later. |
79 | 29 | - [ ] **Multi-task runtime behavior** - define expected behavior for switching between active tasks without losing running session state (foreground/background semantics, status visibility, resume behavior). |
80 | | -- [ ] **Runtime download** - first-run pack download for non-dev users |
81 | | -- [ ] **Bundle pi** - include pi in runtime pack instead of copying from global npm |
82 | | -- [ ] **Onboarding** - first-run experience that doesn't require `mise run runtime-build` |
83 | | -- [ ] **Settings cleanup** - audit settings surface and remove dead/low-value controls |
84 | | - |
85 | | -## Later: Polish |
86 | | - |
87 | | -- [ ] **Doc cleanup** - consolidate stale docs, kill anything that doesn't match reality |
88 | | -- [ ] **Code cleanup** - deep pass, remove slop, consistent patterns |
89 | | -- [ ] **Task title editing** - editable at top of conversation |
90 | | -- [ ] **Progress indicators** - checkmarks/status hints in right panel for task progress |
91 | | -- [ ] **Profile chip** - bottom-left identity/plan/status chip |
92 | | -- [ ] **Empty state polish** - shuffleable task categories + "See more ideas" + richer task tiles like Cowork |
| 30 | +- [ ] **Runtime download** - first-run pack download for non-dev users. |
| 31 | +- [ ] **Bundle pi** - include pi in runtime pack instead of copying from global npm. |
| 32 | +- [ ] **Onboarding** - first-run experience that doesn't require `mise run runtime-build`. |
| 33 | +- [ ] **Settings cleanup** - audit settings surface and remove dead/low-value controls. |
| 34 | +- [ ] **Doc cleanup** - consolidate stale docs, kill anything that doesn't match reality. |
| 35 | +- [ ] **Code cleanup** - deep pass, remove slop, consistent patterns. |
| 36 | +- [ ] **Task title editing** - editable at top of conversation. |
| 37 | +- [ ] **Progress indicators** - checkmarks/status hints in right panel for task progress. |
| 38 | +- [ ] **Profile chip** - bottom-left identity/plan/status chip. |
| 39 | +- [ ] **Empty state polish** - shuffleable task categories + "See more ideas" + richer task tiles like Cowork. |
93 | 40 | - [ ] **Progress model v2 (non-P0)** - experiment with Cowork-style step/milestone summaries inferred from task/tool activity, with clear confidence/limitations. |
94 | | - |
95 | | -## Someday |
96 | | - |
97 | | -- [ ] Connectors (Calendar, Slack, Google Drive, Notion) |
98 | | -- [ ] Clipboard + attachments (images/files with MIME-aware previews) |
99 | | -- [ ] Multi-folder tasks |
100 | | -- [ ] Cross-platform (Linux/Windows) |
101 | | -- [ ] MITM network proxy |
102 | | -- [ ] Canvas/rich artifact viewer |
103 | | -- [ ] qcow2 rootfs (lower RAM) |
104 | | -- [ ] Gate G2 - Gondolin vs deeper sandbox hardening (research only) |
105 | | - |
106 | | -## Testing |
107 | | - |
108 | | -- Harness primitives: `test-start`, `test-prompt`, `test-screenshot`, `test-set-folder`, `test-set-task`, `test-create-task`, `test-delete-tasks`, `test-dump-state`, `test-state-snapshot`, `test-runtime-diag`, `test-stop`, `test-open-preview`, `test-write-working-file`, `test-open-working-folder`, `test-auth-*`, `test-send-login`, `test-check-permissions` |
109 | | -- Scope enforcement: `scripts/harness/path-i-lite-negative.sh` |
110 | | -- Rule: primitives only, no monolithic E2E scripts |
| 41 | +- [ ] Connectors (Calendar, Slack, Google Drive, Notion). |
| 42 | +- [ ] Clipboard + attachments (images/files with MIME-aware previews). |
| 43 | +- [ ] Multi-folder tasks. |
| 44 | +- [ ] Cross-platform (Linux/Windows). |
| 45 | +- [ ] MITM network proxy. |
| 46 | +- [ ] Canvas/rich artifact viewer. |
| 47 | +- [ ] qcow2 rootfs (lower RAM). |
| 48 | +- [ ] Gate G2 - Gondolin vs deeper sandbox hardening (research only). |