This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Shrink Ray is the best test-case reducer in the world. It's a personal hobby/research project, but it's important that it be high-quality, robust, and user-friendly software so it can be widely used. C-Reduce is the only comparable tool, and Shrink Ray has surpassed it.
Because this is a hobby project worked on haphazardly, there's accumulated technical debt and the codebase quality is uneven. Don't assume existing code patterns are correct just because they exist - much of it reflects partial refactors or rushed work.
The goal is to achieve software quality beyond what a single person working part-time can accomplish. This means taking on maintenance work (fixing type errors, improving test coverage, refactoring) and doing it well.
- Fix problems properly - Don't just suppress errors or use workarounds. If there's a type error, understand why and fix the underlying issue. If a test is hard to write, think about what would make the code more testable.
- Don't give up on hard things - When something seems difficult (like covering certain code paths), get creative. Refactor for testability, use Hypothesis for property-based testing, run subprocesses and gather coverage, write generators to produce edge-case inputs. Persistence matters more than speed.
- Self-review before presenting work - Before committing or presenting code, review it critically: "Is this sloppy? Did I take shortcuts? If I were reviewing someone else's PR with this code, what would I flag?" This catches many issues before they waste the maintainer's time.
Never use these as shortcuts - they're code smells that indicate a problem to fix:
# type: ignore- Fix the type error properly. Use proper type annotations, add type guards, or refactor to make types work. If you're addingtype: ignore, ask yourself why the code is confusing the type checker and fix that.# pragma: no cover- Write a test that covers the code. If coverage is hard to achieve, refactor for testability (extract methods, use dependency injection, etc.).# pragma: no branch- Same as above - find a way to exercise the branch.# noqa- Fix the lint error. If the linter is wrong, it's usually a sign the code could be clearer.
When tempted to add a suppression, instead:
- Understand why the tool is complaining
- Fix the underlying issue (refactor, add types, write tests)
- Only if the tool is genuinely wrong AND there's no cleaner solution, consider suppression - but this should be rare
Allowed exceptions (where suppression is acceptable):
# noqa: B027for intentionally empty methods on abstract classes that serve as optional override hooks (no better alternative exists)
If you encounter a case where suppression seems genuinely necessary and principled, ask the maintainer about adding it to this list.
- Make small, logically self-contained commits
- Each commit should ideally be lint-clean and pass tests
- Commits are a good checkpoint for self-review
- Just commit when ready - don't ask for permission
- Use the
/checkpointskill to ensure consistent quality at each commit - Always use
git addwith specific file paths - Never usegit add -A,git add ., orgit add <directory>. Always list the specific files you intend to stage. This prevents accidentally committing unrelated files (test scripts, debug files, etc.).
Shrink Ray is a standalone application, not a library. Do not add backward compatibility shims, re-exports, or compatibility layers when refactoring. When moving code between modules, update all imports directly rather than re-exporting from the old location.
This file is the source of truth for project conventions. However:
- Update it based on feedback from the maintainer
- Update it based on your own judgment about what would improve the project
- Ask about specific style decisions you notice and record them here
# Install dependencies
just install
# Run tests (skips slow tests by default)
just test
# Run all tests including slow ones
just test -m ""
# Run a single test file
just test tests/test_sat.py
# Run a specific test
just test tests/test_sat.py::test_name -v
# Lint and type-check
just lint
# Run shrinkray CLI
uv run shrinkray <interestingness_test> <file_to_reduce>Note: Some tests require minisat to be installed (apt-get install minisat on Ubuntu).
See the notes/ directory for detailed architecture documentation:
notes/core-abstractions.md- ReductionProblem, Format, passes, pumpsnotes/patching-system.md- How parallel patch application worksnotes/reduction-passes.md- Catalog of all reduction passes with examplesnotes/parallelism-model.md- WorkContext and speculative execution
Shrink Ray is a multiformat test-case reducer built on Trio for async/parallelism. The key architectural concepts:
ReductionProblem[T] (problem.py): The central interface representing a reduction task.
current_test_case: T- The current state being reducedis_interesting(test_case: T) -> bool- Tests if a candidate triggers the bugsort_key(test_case: T)- Ordering for reproducibility (natural ordering for text, shortlex for binary)- Problems can be "viewed" through Formats to reduce structured data
Format[S, T] (problem.py): Bridges bytes ↔ structured data.
parse(input: S) -> Tanddumps(output: T) -> S- Enables format-agnostic passes to work on bytes, JSON, Python AST, etc.
compose(format, pass)inpasses/definitions.pywraps a pass to work through a format layer
ReductionPass (passes/definitions.py): A callable (ReductionProblem[T]) -> Awaitable[None] that makes reduction attempts until no progress.
ReductionPump (passes/definitions.py): Like a pass but can temporarily INCREASE size (e.g., inlining functions). Returns a new test case that the reducer tries to reduce below the original.
Reducer (reducer.py): Orchestrates passes. ShrinkRay is the main reducer for bytes, organizing passes into tiers:
- initial_cuts: Fast high-value passes (comments, hollow, large blocks) with timeout-based cancellation
- great_passes: Core loop (line deletion, token deletion, lift_braces) - loops until no progress
- ok_passes: Run when great_passes stop making progress
- last_ditch_passes: Expensive or low-yield passes (byte lowering, substitutions)
WorkContext (work.py): Manages Trio-based parallelism with methods like map(), filter(), find_first_value(). Uses lazy prefetching with backpressure.
PatchApplier (passes/patching.py): Handles parallel patch application with binary-search merging to find maximal compatible patch sets. Key insight: if patches A and B work independently, trying both together often works.
bytes.py- Generic byte-level cuts (lines, blocks, tokens, whitespace)sequences.py- Operations on sequences (block_deletion, delete_duplicates)genericlanguages.py- Language-agnostic text passes (comments, brackets, identifiers)python.py- libcst-based AST reductions (lift blocks, strip annotations)json.py- JSON-specific passes (delete keys recursively)sat.py- DIMACS CNF format with unit propagationclangdelta.py- C/C++ support via creduce's clang_delta tool (pumps)
CLI → ShrinkRayState → ReductionProblem → Reducer → Passes → Patches → is_interesting → Update state
Passes work by generating patches (typically Cuts for deletions), applying them in parallel, and updating the problem state when reductions succeed.
The interactive TUI uses textual (not urwid) and runs in a subprocess architecture:
Main Process (asyncio/textual) Subprocess (trio)
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ ShrinkRayApp (textual) │ │ ReducerWorker │
│ - StatsDisplay widget │◄───│ - Runs actual reduction │
│ - ContentPreview widget │ │ - Emits ProgressUpdate │
│ - SubprocessClient │───►│ - Handles start/cancel │
└─────────────────────────────┘ └─────────────────────────────┘
stdin/stdout JSON protocol
Why subprocess? Textual requires asyncio, but the reducer uses trio. They're incompatible in the same process.
Protocol (subprocess/protocol.py):
Request: Commands sent to worker (start, cancel, status)Response: Command acknowledgments with resultsProgressUpdate: Periodic stats (size, calls, reductions, parallelism, content preview)
Key files:
tui.py- Textual app with StatsDisplay and ContentPreview widgetssubprocess/worker.py- Entry point for reducer subprocess (shrinkray-worker)subprocess/client.py- SubprocessClient manages communicationsubprocess/protocol.py- Message dataclasses and JSON serialization
- Natural ordering: Ensures reproducibility - same minimal result regardless of reduction path. Uses a multi-tier heuristic for text (length, average squared line length, line count, then character ordering) and shortlex for binary data.
- Cache clearing on reduction: When a smaller test case is found, old cached results are no longer useful (derived from old test case)
- View caching:
problem.view(format)caches parsed views to avoid redundant parsing - Speculative parallelism: Multiple candidates tested concurrently; first success wins, others are "wasted" but harmless
This codebase is complex. The TUI runs in asyncio, the reducer runs in trio, they communicate via subprocess with a JSON protocol, and there are multiple layers of abstraction. This complexity makes bugs easy to introduce and hard to find. The tooling (tests, coverage, lints) exists to catch these bugs early. Use it properly.
A feature is NOT complete until it has 100% test coverage. This is not a suggestion—it is the definition of "done." Attempts to work around this (suppressions, claiming something is "too hard to test") are actively harmful because they leave bugs in the codebase that will cause problems later.
The following keywords are used per RFC 2119:
- MUST / MUST NOT - Absolute requirements
- SHOULD / SHOULD NOT - Strong recommendations with exceptions only when fully understood
- MAY - Truly optional
This codebase has repeatedly demonstrated that:
- Code written without tests first is usually broken
- "It's too hard to test" usually means the code needs refactoring
- Coverage gaps always correspond to bugs discovered later
- The tests catch bugs that seem obvious in hindsight but weren't caught during implementation
TDD is not bureaucracy—it's the most efficient path to working code in a codebase this complex.
Before writing any code, you MUST identify which layers need testing:
Layer 1: Pure Logic (no I/O)
- Data transformations, protocol serialization, algorithms
- Test with: Direct unit tests, Hypothesis property tests
- Example:
PassStatsDataserialization, history file parsing
Layer 2: Worker/Reducer (trio async)
- The
ReducerWorkerclass, reduction passes, problem state - Test with:
pytest-trio,@pytest.mark.trio, mock I/O streams - Example:
test_subprocess_worker.pytests worker commands in isolation
Layer 3: TUI Components (asyncio)
- Individual widgets like
StatsDisplay,ContentPreview, modals - Test with:
asyncio.run()wrapper,FakeReductionClient - Example: Testing that a modal displays correct data without spawning a real worker
Layer 4: Integration (subprocess communication)
- Full TUI ↔ Worker communication
- Test with: Real subprocess spawning (mark as
@pytest.mark.slow) - Example:
test_tui_history_modal_during_reduction
You MUST test each layer independently before testing them together. If a bug exists in the worker layer, you SHOULD be able to reproduce it with a worker-only test, not by running the full TUI.
For each feature, you MUST write tests before implementation:
-
Start with the contract - What should this feature do? Write a test that asserts the expected behavior.
-
Write failing tests - The test MUST fail before you write the implementation. If it passes, either the test is wrong or the feature already exists.
-
Test error cases first - Error handling is where most bugs hide. Write tests for:
- Missing required parameters
- Invalid input
- State not initialized
- Network/I/O failures
-
Test the happy path - Only after error cases are covered.
Example for adding a new worker command:
# Step 1: Test missing parameter
@pytest.mark.trio
async def test_handle_new_command_missing_param():
worker = ReducerWorker()
response = await worker._handle_new_command("id", {})
assert response.error == "param is required"
# Step 2: Test invalid state
@pytest.mark.trio
async def test_handle_new_command_no_state():
worker = ReducerWorker()
response = await worker._handle_new_command("id", {"param": "value"})
assert response.error == "State not available"
# Step 3: Test success case
@pytest.mark.trio
async def test_handle_new_command_success():
worker = ReducerWorker()
worker.state = MagicMock()
# ... setup ...
response = await worker._handle_new_command("id", {"param": "value"})
assert response.result == {"status": "success"}Write the minimum code to make tests pass:
- You MUST NOT write code that isn't exercised by a test
- You SHOULD run tests after each small change
- If you find yourself writing complex logic, stop and write more tests first
After implementation, you MUST run coverage:
uv run coverage run -m pytest tests/ -m "not slow" --ignore=tests/test_tui_snapshots.py -q
uv run coverage report --fail-under=100If coverage is not 100%, the feature is not complete.
When coverage shows missing lines:
- You MUST NOT add
# pragma: no cover - You MUST write a test that exercises the missing code
- If the code is genuinely unreachable, you MUST delete it
- If testing is difficult, you SHOULD refactor the code to be more testable
When code is hard to test, the problem is usually the code's structure, not the testing tools. Here are specific refactoring patterns with examples from this codebase:
Problem: Code creates its own dependencies internally, making them impossible to mock.
# BAD: Hard to test - creates its own subprocess
class Worker:
def start(self):
self.process = subprocess.Popen(["shrinkray-worker"])
# ...# GOOD: Dependency injection - accepts a client interface
class ShrinkRayApp:
def __init__(self, ..., client: SubprocessClient | None = None):
self._client = client or SubprocessClient()This pattern is used throughout the TUI - FakeReductionClient can be injected for testing.
Problem: A method mixes computation with I/O, making the computation untestable.
# BAD: Logic mixed with I/O
def process_history_file(self, path: Path) -> None:
data = path.read_bytes()
parsed = self._parse_format(data) # Complex logic
self._update_state(parsed)
self._notify_listeners()# GOOD: Pure parsing function extracted
def parse_history_entry(data: bytes) -> HistoryEntry:
"""Pure function - easy to test with arbitrary inputs."""
# Complex parsing logic here
return HistoryEntry(...)
def process_history_file(self, path: Path) -> None:
data = path.read_bytes()
entry = parse_history_entry(data) # Tested separately
self._update_state(entry)
self._notify_listeners()Now parse_history_entry can be tested with Hypothesis to generate edge cases.
Problem: Code uses time.time() directly, making time-dependent behavior untestable.
# BAD: Untestable timing logic
def should_refresh(self) -> bool:
return time.time() - self._last_refresh > 0.5# GOOD: Inject time source or use relative time
def __init__(self, ..., time_source: Callable[[], float] = time.time):
self._time_source = time_source
def should_refresh(self) -> bool:
return self._time_source() - self._last_refresh > 0.5
# In tests:
fake_time = 0.0
widget = Widget(time_source=lambda: fake_time)
fake_time = 1.0 # Advance time
assert widget.should_refresh()Problem: A large method has multiple code paths, and testing one path requires setting up unrelated state.
# BAD: One method, many responsibilities
def handle_command(self, cmd: str, params: dict) -> Response:
if cmd == "start":
# 50 lines of start logic
elif cmd == "cancel":
# 30 lines of cancel logic
elif cmd == "restart":
# 40 lines of restart logic# GOOD: Dispatch to focused handlers
def handle_command(self, cmd: str, params: dict) -> Response:
handlers = {
"start": self._handle_start,
"cancel": self._handle_cancel,
"restart": self._handle_restart,
}
handler = handlers.get(cmd)
if handler is None:
return Response(error=f"Unknown command: {cmd}")
return handler(params)
async def _handle_restart(self, params: dict) -> Response:
"""Focused method - test this directly."""
# ...Each handler can be tested in isolation with minimal setup.
Problem: Code depends on a concrete class that's hard to instantiate in tests.
# BAD: Depends on concrete class
def update_display(self, state: ShrinkRayStateSingleFile) -> None:
size = state.problem.current_size
# ...# GOOD: Depend on protocol or use duck typing
from typing import Protocol
class HasCurrentSize(Protocol):
@property
def current_size(self) -> int: ...
def update_display(self, state: HasCurrentSize) -> None:
size = state.current_size
# ...
# In tests: just pass any object with current_size
mock_state = MagicMock()
mock_state.current_size = 100
update_display(mock_state)Problem: Async and sync code are interleaved, making it hard to test the sync parts.
# BAD: Async operation buried in logic
async def compute_result(self, data: bytes) -> Result:
processed = self._transform(data) # Sync
validated = await self._async_validate(processed) # Async
return self._finalize(validated) # Sync# GOOD: Separate sync logic from async coordination
def transform(self, data: bytes) -> ProcessedData:
"""Pure sync function - easy to test."""
return ProcessedData(...)
def finalize(self, validated: ValidatedData) -> Result:
"""Pure sync function - easy to test."""
return Result(...)
async def compute_result(self, data: bytes) -> Result:
"""Thin async coordinator - test with mocked dependencies."""
processed = self.transform(data)
validated = await self._async_validate(processed)
return self.finalize(validated)Problem: Behavior depends on hardcoded values that can't be changed in tests.
# BAD: Hardcoded interval
def on_mount(self) -> None:
self.set_interval(0.5, self._refresh) # Must wait 500ms in tests# GOOD: Parameterize the interval
def __init__(self, ..., refresh_interval: float = 0.5):
self._refresh_interval = refresh_interval
def on_mount(self) -> None:
self.set_interval(self._refresh_interval, self._refresh)
# In tests: use short interval or mock set_intervalProblem: Method mutates object state, requiring inspection of internals to verify behavior.
# BAD: Mutates internal state
def update_stats(self, new_data: Stats) -> None:
self._stats = new_data
self._last_update = time.time()
if self._stats.size < self._threshold:
self._trigger_alert()# GOOD: Return a result that can be asserted
@dataclass
class StatsUpdateResult:
new_stats: Stats
alert_triggered: bool
def compute_stats_update(self, new_data: Stats) -> StatsUpdateResult:
"""Pure function - returns what would happen."""
alert = new_data.size < self._threshold
return StatsUpdateResult(new_stats=new_data, alert_triggered=alert)
def update_stats(self, new_data: Stats) -> None:
result = self.compute_stats_update(new_data)
self._stats = result.new_stats
self._last_update = time.time()
if result.alert_triggered:
self._trigger_alert()Problem: Object creation requires many parameters, making test setup verbose.
# BAD: Tests need to specify everything
def test_something():
state = ShrinkRayStateSingleFile(
input_type=InputType.all,
in_place=False,
test=["./test.sh"],
timeout=10.0,
base="test.c",
parallelism=1,
filename="test.c",
formatter="none",
# ... 10 more parameters ...
)# GOOD: Factory with sensible defaults
def make_test_state(
*,
initial: bytes = b"test",
filename: str = "test.c",
**overrides
) -> ShrinkRayStateSingleFile:
"""Factory for tests - only specify what matters."""
defaults = {
"input_type": InputType.all,
"in_place": False,
"test": ["true"],
"timeout": 10.0,
# ... sensible defaults ...
}
return ShrinkRayStateSingleFile(
initial=initial,
filename=filename,
**(defaults | overrides)
)
# In tests:
state = make_test_state(initial=b"specific content")Problem: Functions depend on or modify global state, causing test interference.
# BAD: Global state
_current_worker = None
def get_worker() -> Worker:
global _current_worker
if _current_worker is None:
_current_worker = Worker()
return _current_worker# GOOD: Pass dependencies explicitly or use a context
class WorkerContext:
def __init__(self):
self.worker = Worker()
# Tests create their own context
def test_something():
ctx = WorkerContext()
result = do_thing(ctx.worker)You SHOULD refactor when you notice:
- A test requires more than 10 lines of setup
- You need to access private attributes (
_foo) to verify behavior - Multiple tests duplicate the same complex setup
- You're tempted to add
# pragma: no coverto "unreachable" code - A single test covers multiple unrelated behaviors
- You need to mock more than 3 things to test a single method
After coverage passes, you MUST run lints:
just lintLint errors indicate code quality issues. You MUST NOT suppress them with # noqa except for the explicitly allowed cases in the "Avoid Suppressions" section.
Before committing, ask yourself:
- Did I write tests first, or did I retrofit them?
- Is every code path tested?
- Would I be embarrassed if someone reviewed this code?
- Did I take any shortcuts?
Most worker features can be tested without the TUI:
@pytest.mark.trio
async def test_worker_feature():
# Create worker with fake I/O
input_stream = BidirectionalInputStream()
output = MemoryOutputStream()
worker = ReducerWorker(input_stream=input_stream, output_stream=output)
# Send commands directly
input_stream.send_command(Request(id="1", command="start", params={...}))
# Run worker in nursery
async with trio.open_nursery() as nursery:
nursery.start_soon(worker.run)
# ... interact with worker ...
nursery.cancel_scope.cancel()This is MUCH faster than spawning a real subprocess and avoids TUI complexity.
Use FakeReductionClient to test TUI behavior:
def test_tui_feature():
async def run():
updates = [ProgressUpdate(status="Running", size=100, ...)]
client = FakeReductionClient(updates=updates)
app = ShrinkRayApp(file_path="test.txt", test=["true"], client=client)
async with app.run_test() as pilot:
await pilot.press("x") # Test keyboard interaction
# ... assert widget state ...
asyncio.run(run())This tests TUI logic without subprocess overhead.
Integration tests (real TUI + real worker) SHOULD be used sparingly:
- Verifying the protocol works end-to-end
- Testing timing-sensitive behavior
- Catching bugs that only appear with real async scheduling
Integration tests MUST be marked @pytest.mark.slow.
When you think something is too hard to test, it usually means one of:
- The code is poorly structured - Refactor it. Extract testable pieces.
- You don't understand the testing tools - Read the existing tests for examples.
- The test would be slow - That's fine, mark it
@pytest.mark.slow. - You need to mock something - Use
unittest.mock.patchor dependency injection.
"Too hard to test" is almost never a valid reason to skip tests. If you genuinely believe something cannot be tested, explain the specific technical barrier and propose a refactoring that would make it testable.
- Use pytest-trio for most async tests - This project uses pytest-trio, so async test functions work directly. Simply define
async def test_something():and pytest-trio will run it. - Exception: SubprocessClient tests use asyncio - The
SubprocessClientclass uses asyncio (for textual TUI compatibility), so tests for it must useasyncio.run()wrappers:This isolates asyncio tests from the trio event loop.def test_subprocess_client_something(): async def run(): client = SubprocessClient() # ... test logic ... asyncio.run(run())
- Parametrize similar tests - If multiple tests differ only in a single value (e.g., testing with values 0, 2, 4, 64, 100), use
@pytest.mark.parametrizeinstead of duplicating the test function. - Use meaningful IDs - Use
pytest.param(value, id="name")with comments explaining why each value was chosen:@pytest.mark.parametrize("target", [ pytest.param(0, id="zero"), # Edge case: boundary condition pytest.param(4, id="boundary"), # Where linear scan ends pytest.param(64, id="power_of_two"), # Tests binary search ])
- Parametrize by parallelism - Tests that involve WorkContext should typically be parametrized by parallelism
[1, 2]to catch bugs that only manifest in parallel execution.
- Use module-level functions, not classes - Write tests as
def test_something():at module level, not insideclass TestSomething:. Group related tests using section comments instead of classes. - Group related tests with section comments (e.g.,
# === View tests ===) - Keep tests fast (< 500ms each, ideally much less)
- Test edge cases explicitly with meaningful test names
- All imports at top of file - Never put imports inside test functions. Put all imports (including
from unittest.mock import patch) at the top of the test file. To check for violations:git grep ' import' tests/
- Mark genuinely slow tests with
@pytest.mark.slow- Tests that inherently require waiting (timeouts, full reductions, integration tests) should be marked as slow - By default,
just testskips slow tests - This keeps the feedback loop fast during development - Run
just test -m ""to run all tests including slow ones - Use this before committing or in CI - Maintain 100% coverage from non-slow tests - If a slow test provides unique coverage, either:
- Add a fast test that covers the same code path differently
- Accept the test as non-slow if the coverage is essential
- Refactor the code to be more testable without slow operations
- Use
--durations=10to monitor test performance - The justfile includes this by default to show slowest tests
- Snapshot tests (
tests/test_tui_snapshots.py) use a vendored copy ofpytest-textual-snapshotintests/pytest_textual_snapshot.py- The vendored copy fixes syrupy 5.0 compatibility (upstream uses
_file_extensionbut syrupy 5.0 changed tofile_extension) - Snapshots are saved as
.svgfiles for easy viewing
- The vendored copy fixes syrupy 5.0 compatibility (upstream uses
- Run
just test tests/test_tui_snapshots.py --snapshot-updateto update snapshots after intentional UI changes - View snapshot gallery:
uv run pytest tests/test_tui_snapshots.py --snapshot-reportthen opensnapshot_report.html - Unit tests (
tests/test_tui.py) useFakeSubprocessClientto test TUI logic without spawning subprocesses - Textual's
refresh(layout=True)is needed when updating reactive properties that affect widget height
The claude-reliability binary for this project is located at:
.claude-reliability/bin/claude-reliability
Always use this path when running commands. Do NOT use bare claude-reliability,
do NOT use paths containing ~/.claude-reliability/, and do NOT use $PLUGIN_ROOT_DIR
or any other variable to construct the path.
Example usage:
.claude-reliability/bin/claude-reliability work list
.claude-reliability/bin/claude-reliability work next
.claude-reliability/bin/claude-reliability work on <id>
.claude-reliability/bin/claude-reliability work update <id> --status complete