CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

MedMiner leverages large language models (LLMs) and LangGraph to extract and analyze data from medical documents. The project uses a workflow-based architecture to process doctor's letters and extract structured medical information (e.g., medications, diagnoses).

Development Environment

This project uses a dev container with all necessary tools pre-configured. Use VS Code's "Reopen in Container" feature to get started.

Package Manager: uv (modern Python package manager)

Common Commands

Testing

# Run all tests with coverage
uv run pytest

# Run a single test file
uv run pytest tests/test_file.py

# Run a specific test function
uv run pytest tests/test_file.py::test_function_name

Code Quality

# Format and lint code
uv run ruff check --fix

# Type checking
uv run ty check

# Run pre-commit hooks manually
pre-commit run --all-files

Documentation

# Build documentation
mkdocs build

# Serve documentation locally
mkdocs serve

Architecture

Workflow System

MedMiner uses a LangGraph-based workflow architecture where each extraction task is implemented as a compiled state graph with sequential node processing:

Base Workflow (src/medminer/workflows/base/workflow.py)
- BaseWorkflow: Abstract base for all workflows
- BaseExtractionWorkflow: Template for extraction tasks, builds a graph with: Extraction → Processing Node(s) → Storage
State Management (src/medminer/workflows/base/schema.py)
- DoctorsLetterState: Base state with patient_id and letter text
- ExtractuionState: Generic extraction state tracking raw/processed data and output path
- States are TypedDict subclasses passed through the graph
Node Types (src/medminer/workflows/base/node.py)
- InformationExtractor: Uses LLM with structured output to extract data from text
- BaseNode subclasses: Custom processing nodes (e.g., RxNavLookup for medication enrichment)
- DataStorage: Final node that writes processed data to CSV

Creating New Workflows

To add a new extraction workflow (e.g., for diagnoses, procedures):

Create a new directory under src/medminer/workflows/ (e.g., diagnoses/)
Define schemas in schema.py:
- Raw extracted data TypedDict
- Processed data TypedDict (if different from raw)
- State class extending ExtractuionState
- Response format extending ResponseFormat
Implement processing nodes in node.py by subclassing BaseNode
Create workflow in workflow.py:
- Subclass BaseExtractionWorkflow
- Set task_name, state_type, prompt, response_format
- Add custom process_nodes if needed (default is NoProcessing)

Example: See src/medminer/workflows/medications/ for a complete implementation that extracts medications and enriches them with RxNorm/ATC codes.

Settings System

src/medminer/settings.py provides a singleton Settings class (cached) for runtime configuration:

register(key, value): Set configuration
get(key, default): Retrieve configuration
Used for base_dir, split_patient flag, etc.

Model Configuration

src/medminer/utils/models.py handles LLM provider configuration via environment variables:

Reads {PROVIDER}_* env vars (e.g., OPENAI_API_KEY, OPENAI_MODEL)
Currently supports OpenAI provider
Returns model parameters as dict

Code Style

Line length: 120 characters
Python version: 3.13+
Formatting: Ruff (similar to Black)
Type hints: Required for all function definitions (mypy strict mode)
String quotes: Double quotes

Testing Notes

Test files go in tests/ directory
Use pytest with coverage enabled
Pytest cache and pytest itself configured in pyproject.toml
Coverage reports generated as XML to coverage.xml

Dependencies

Key dependencies:

LangChain/LangGraph: LLM orchestration and graph-based workflows
Pandas: Data processing and CSV output
Gradio: UI components (in src/medminer/ui/)
httpx: HTTP client for external APIs (e.g., RxNav)

See pyproject.toml for complete dependency list.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Project Overview

Development Environment

Common Commands

Testing

Code Quality

Documentation

Architecture

Workflow System

Creating New Workflows

Settings System

Model Configuration

Code Style

Testing Notes

Dependencies

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Development Environment

Common Commands

Testing

Code Quality

Documentation

Architecture

Workflow System

Creating New Workflows

Settings System

Model Configuration

Code Style

Testing Notes

Dependencies