A modular CLI tool for privacy-preserving text processing through composable redaction and validation pipelines.
blind-eye provides a Unix-style pipeline. Each component reads from stdin and writes to stdout, enabling flexible data processing workflows for PII detection, redaction, and validation.
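To make the stdin/stdout contract concrete, here is a minimal sketch of what a pipeline stage could look like. It is not blind-eye's actual implementation: `run_stage` and `mask_digits` are hypothetical names, and the one-record-per-line framing is an assumption.

```python
import io
from typing import Callable, TextIO


def run_stage(transform: Callable[[str], str], src: TextIO, dst: TextIO) -> int:
    """Apply `transform` to every input line and write one output line per record."""
    count = 0
    for line in src:
        dst.write(transform(line.rstrip("\n")) + "\n")
        count += 1
    return count


def mask_digits(text: str) -> str:
    """Toy stand-in for a real redaction model: replace every digit with '#'."""
    return "".join("#" if ch.isdigit() else ch for ch in text)


# Demo with in-memory streams; a real stage would pass sys.stdin and sys.stdout.
demo_out = io.StringIO()
run_stage(mask_digits, io.StringIO("call me on 555 0100\nno digits here\n"), demo_out)
print(demo_out.getvalue(), end="")
```

Because a stage only depends on two text streams, any stage can be replaced by another program that honors the same framing, which is what makes the pipeline composable.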
make # runs setup, dependencies and tools
# make setup
# Installs pipx and poetry for python3 dependency management
# make dependencies
# Installs dependencies (poetry deps + models for spacy validation)
# make tools
# Installs pre-commit hooks and code quality tools

The CLI consists of three composable stages:
./blind hf-input [options] | ./blind redact [options] | ./blind validate [options]

The hf-input stage fetches sample datasets from Hugging Face.
./blind hf-input --sample-size 128 --name ai4privacy/pii-masking-400k

./blind hf-input --help
--name TEXT Dataset name [default: ai4privacy/pii-masking-400k]
--filter-key TEXT Filter column name [default: language]
--filter-value TEXT Filter column value [default: en]
--source-column TEXT Source text column [default: source_text]
--sample-size TEXT Number of samples [default: 128]
--clear-cache BOOL Clears pre-processed cache

Some datasets are quite large, and pre-processing them takes time. Results are therefore cached by default and reused when the command is run again with the same arguments.
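One common way to cache "given the same arguments" is to derive the cache key from the option values. The sketch below illustrates that idea only; `cache_key` is a hypothetical helper, and the README does not say how blind-eye actually keys or stores its cache.

```python
import hashlib
import json


def cache_key(**options) -> str:
    """Derive a stable key from command options: same options, same key."""
    # sort_keys makes the key independent of the order options were given in.
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Option values taken from the documented hf-input defaults.
defaults = dict(
    name="ai4privacy/pii-masking-400k",
    filter_key="language",
    filter_value="en",
    source_column="source_text",
    sample_size=128,
)

print(cache_key(**defaults) == cache_key(**defaults))  # True: cache hit
print(cache_key(**defaults) == cache_key(**{**defaults, "sample_size": 100}))  # False: cache miss
```

With a scheme like this, changing any option produces a new key, so a stale result is never reused for a different sample size or dataset.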
The redact stage applies redaction models (fetched from Hugging Face) to the input text.
./blind redact --model-id my-NER-model

./blind redact --help
--model-id TEXT [default: iiiorg/piiranha-v1-detect-personal-information]
--batch-size INTEGER [default: 64]
--confidence INTEGER [default: 0.5]

The validate stage validates the redacted output.
./blind validate

./blind validate --help
--level [critical] [default: critical]
--output-format [json|error_rate] [default: error_rate]
--confidence FLOAT [default: 0.5]
--batch-size INTEGER [default: 64]
--language TEXT [default: en]

--level currently supports only critical. As far as I could find, no existing project distinguishes severity levels for PII/PHI leakage, and training my own model for that would be too far out of scope for this project.
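This README does not spell out how the error_rate output is computed. One plausible reading, shown here purely as an assumption, is the fraction of records in which the validator still finds PII. `error_rate` and `toy_detector` are hypothetical names, and the digit check merely stands in for a real detector.

```python
from typing import Callable, Iterable


def error_rate(records: Iterable[str], has_leak: Callable[[str], bool]) -> float:
    """Fraction of records flagged as still containing PII (0.0 for no records)."""
    total = 0
    leaked = 0
    for record in records:
        total += 1
        if has_leak(record):
            leaked += 1
    return leaked / total if total else 0.0


def toy_detector(record: str) -> bool:
    """Demo-only detector: treat any remaining digit as a leak."""
    return any(ch.isdigit() for ch in record)


redacted = ["call [PHONE]", "call 555-0100", "hello", "ok"]
print(error_rate(redacted, toy_detector))  # 0.25
```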
# Full pipeline
./blind hf-input --sample-size 100 | ./blind redact | ./blind validate

# Cache intermediate results
./blind hf-input --sample-size 128 > raw.txt
cat raw.txt | ./blind redact > redacted.txt
cat redacted.txt | ./blind validate

# Use custom data
cat my_data.txt | ./blind redact | ./blind validate

# Or skip validation to see the redacted output
echo "My name is John Doe and I live in Amsterdam" | ./blind redact
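To illustrate what a --confidence threshold does during redaction, the sketch below replaces detected spans whose score meets the threshold. Everything here is assumed for illustration: the `redact` function, the entity dictionary keys, the labels, the scores, and the `[LABEL]` placeholder format are hypothetical and may differ from what blind-eye actually emits.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], confidence: float = 0.5) -> str:
    """Replace each detected span scoring at least `confidence` with [LABEL]."""
    kept = [e for e in entities if e["score"] >= confidence]
    # Replace from right to left so earlier character offsets stay valid.
    # Overlapping spans are not handled in this sketch.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: e["start"]] + f"[{e['label']}]" + text[e["end"]:]
    return text


text = "My name is John Doe and I live in Amsterdam"
# Hypothetical detections with made-up scores.
entities = [
    {"start": 11, "end": 19, "label": "NAME", "score": 0.98},
    {"start": 34, "end": 43, "label": "CITY", "score": 0.41},
]

print(redact(text, entities))                  # My name is [NAME] and I live in Amsterdam
print(redact(text, entities, confidence=0.4))  # My name is [NAME] and I live in [CITY]
```

A lower threshold redacts more aggressively at the cost of more false positives; a higher one leaves low-confidence detections such as the city above untouched.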