blind-eye

A modular CLI tool for privacy-preserving text processing through composable redaction and validation pipelines.

Description

blind-eye provides a Unix-style pipeline. Each component reads from stdin and writes to stdout, enabling flexible data processing workflows for PII detection, redaction, and validation.
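Every stage follows the same contract: consume lines on stdin, emit lines on stdout. A minimal illustrative sketch of such a stage (not the actual blind-eye implementation):

```python
# Minimal sketch of a stdin -> stdout pipeline stage in the same Unix style
# blind-eye uses; illustrative only, not the project's actual code.
import sys

def run_stage(transform, stdin=sys.stdin, stdout=sys.stdout):
    """Read lines from stdin, apply a transform, write results to stdout."""
    for line in stdin:
        stdout.write(transform(line.rstrip("\n")) + "\n")

if __name__ == "__main__":
    # Example transform: uppercase every line.
    run_stage(str.upper)
```

Because each stage only touches stdin/stdout, stages compose with ordinary shell pipes and can be swapped or cached independently.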

Setup

make # runs setup, dependencies and tools

# make setup
# Installs pipx and poetry for python3 dependency management

# make dependencies
# Installs dependencies (poetry deps + models for spacy validation)

# make tools
# Installs pre-commit hooks and code quality tools

Usage

The CLI consists of three composable stages:

./blind hf-input [options] | ./blind redact [options] | ./blind validate [options]

Commands

hf-input

Fetches sample datasets from Hugging Face.

./blind hf-input --sample-size 128 --dataset ai4privacy/pii-masking-400k
./blind hf-input --help

--name           TEXT  Dataset name [default: ai4privacy/pii-masking-400k]
--filter-key     TEXT  Filter column name [default: language]
--filter-value   TEXT  Filter column value [default: en]
--source-column  TEXT  Source text column [default: source_text]
--sample-size    TEXT  Number of samples [default: 128]
--clear-cache    BOOL  Clears pre-processed cache

Some datasets are quite large and pre-processing takes some time, so this command caches its output by default; running it again with the same arguments reuses the cache.
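One way such an argument-keyed cache can work is to hash the canonicalized arguments into a cache filename; a hypothetical sketch (the real cache layout in blind-eye may differ, and `.cache` is an assumed location):

```python
# Hypothetical sketch of a result cache keyed on the CLI arguments:
# identical arguments hash to the same key, so repeated runs hit the cache.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".cache")  # assumed location, not taken from the project

def cache_key(**args) -> str:
    """Canonicalize the arguments so key order does not change the hash."""
    canonical = json.dumps(args, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def cached_fetch(fetch, cache_dir=CACHE_DIR, **args) -> str:
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / cache_key(**args)
    if path.exists():
        return path.read_text()  # cache hit: skip the expensive fetch
    result = fetch(**args)
    path.write_text(result)
    return result
```

A `--clear-cache` flag would then simply delete the cache directory before fetching.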

redact

Applies redaction models (fetched from Hugging Face) to input text.

./blind redact --model-id my-NER-model
./blind redact --help

--model-id          TEXT     [default: iiiorg/piiranha-v1-detect-personal-information]
--batch-size        INTEGER  [default: 64]
--confidence        FLOAT    [default: 0.5]
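The `--confidence` threshold decides which detected entities actually get masked. A sketch of that thresholding step, assuming entities in the Hugging Face token-classification output shape (start/end character offsets plus a score); the model inference itself is not shown:

```python
# Sketch of confidence-thresholded redaction. The entity dicts mimic
# Hugging Face token-classification output (start/end offsets, score);
# how blind-eye actually post-processes model output may differ.

def redact_text(text, entities, confidence=0.5, mask="[REDACTED]"):
    """Replace detected spans whose score meets the threshold."""
    out = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["score"] < confidence:
            continue  # low-confidence detections are left untouched
        out.append(text[cursor:ent["start"]])
        out.append(mask)
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)
```

Raising `--confidence` trades recall for precision: fewer false redactions, but more risk of PII slipping through to the validate stage.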

validate

Validates the redacted output.

./blind validate
./blind validate --help

--level                [critical]         [default: critical]
--output-format        [json|error_rate]  [default: error_rate]
--confidence           FLOAT              [default: 0.5]
--batch-size           INTEGER            [default: 64]
--language             TEXT               [default: en]

--level currently only supports critical: no existing project I could find distinguishes severity levels for PII/PHI leakage, and training my own model for that would be out of scope for this project.
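The error_rate output format can be thought of as the fraction of redacted samples in which the validator still detects PII above the confidence threshold. A hypothetical sketch (the detector callable stands in for the spacy-based validation model):

```python
# Hypothetical sketch of the error_rate metric: share of redacted samples
# that still contain PII detections at or above the confidence threshold.
# The `detector` callable stands in for the actual validation model.

def error_rate(samples, detector, confidence=0.5):
    """samples: redacted texts; detector: text -> list of (label, score)."""
    if not samples:
        return 0.0
    leaked = sum(
        1 for text in samples
        if any(score >= confidence for _, score in detector(text))
    )
    return leaked / len(samples)
```

An error_rate of 0.0 means the validator found no residual PII in any sample; the json format would presumably expose the per-sample detections instead of this single number.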

Examples

# Full pipeline
./blind hf-input --sample-size 100 | ./blind redact | ./blind validate
# Cache intermediate results
./blind hf-input --sample-size 128 > raw.txt
cat raw.txt | ./blind redact > redacted.txt
cat redacted.txt | ./blind validate
# Use custom data
cat my_data.txt | ./blind redact | ./blind validate
# Or skip validation to see the redacted output
echo "My name is John Doe and I live in Amsterdam" | ./blind redact

About

Sample project to vet PHI/PII model performance
