Privacy Filter with GLiNER + Presidio

Production-ready privacy filter with reversible masking using hash maps. It masks PII before text is sent to an LLM, then de-masks the LLM's response to restore the original values.

Why not guardrails? Traditional guardrails either permanently redact PII (losing information) or block the request entirely. This filter instead masks the input and de-masks the response, so no data is lost and no request is blocked.
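The core idea of reversible masking with a hash map fits in a few lines of plain Python. This is a minimal sketch, not the library's implementation: detection here uses two narrow, hypothetical regexes standing in for GLiNER and Presidio, and `mask`/`demask` are illustrative names. The token map is a dict from token to original value, and this sketch reuses one token per distinct value so repeated mentions stay consistent.

```python
import re
from typing import Dict, Tuple

# Hypothetical, deliberately narrow detectors; the real filter uses GLiNER + Presidio.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),  # US format only
}


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace detected values with tokens and return (masked_text, token_map)."""
    token_map: Dict[str, str] = {}       # token -> original value (contains PII)
    value_to_token: Dict[str, str] = {}  # original value -> token, for reuse
    counters: Dict[str, int] = {}        # per-type numbering: _1, _2, ...

    def token_for(entity_type: str, value: str) -> str:
        if value not in value_to_token:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            token = f"{{{{__OWL:{entity_type}_{counters[entity_type]}__}}}}"
            value_to_token[value] = token
            token_map[token] = value
        return value_to_token[value]

    for entity_type, pattern in PATTERNS.items():
        # sub() calls the lambda immediately, so capturing the loop variable is safe.
        text = pattern.sub(lambda m: token_for(entity_type, m.group(0)), text)
    return text, token_map


def demask(text: str, token_map: Dict[str, str]) -> str:
    """Swap every known token in text back to its original value."""
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text
```

Because de-masking is a plain lookup, it works on any text that contains the tokens, including an LLM response that never saw the original values.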

AI Pipeline Use Cases

RAG Pre-Ingestion — Mask PII before documents hit your vector database. Sensitive data never gets indexed while document semantics stay intact for retrieval.

AI Gateway — Deploy at the proxy level (LiteLLM, Kong, Envoy) to protect all LLM traffic org-wide. Mask on ingress, de-mask on egress.

Agentic Tool Calls — When agents call external tools, resolve masked tokens before execution.

Multi-Agent Systems — Mask PII as data flows between agents across trust boundaries.

Fine-Tuning Data Prep — Clean training datasets of PII before fine-tuning. Compliance without losing training signal.

Logging & Observability — Log full conversations for debugging while masking PII.
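For the agentic tool-call case, tool arguments are usually nested JSON rather than a single string. A minimal sketch, assuming you already hold the session's token map (the `resolve_tokens` helper is hypothetical, not part of `privacy_filter`), walks the arguments and swaps tokens back just before the tool runs:

```python
import re
from typing import Any, Dict

# Matches tokens in the {{__OWL:TYPE_N__}} format used by this filter.
TOKEN_RE = re.compile(r"\{\{__OWL:[A-Z_]+_\d+__\}\}")


def resolve_tokens(value: Any, token_map: Dict[str, str]) -> Any:
    """Return a copy of value with known tokens replaced in every string."""
    if isinstance(value, str):
        # Unknown tokens are left as-is rather than raising.
        return TOKEN_RE.sub(lambda m: token_map.get(m.group(0), m.group(0)), value)
    if isinstance(value, dict):
        return {key: resolve_tokens(item, token_map) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_tokens(item, token_map) for item in value]
    return value  # numbers, booleans and None pass through unchanged
```

Resolving at the last moment keeps real values out of the agent's context and transcript; only the tool call itself sees them.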

Installation

Docker (recommended):

make build && make run
# or: cd docker && docker-compose up -d

Local:

pip install -e ".[dev,api]"
python -m spacy download en_core_web_sm  # optional

Quick Start

from privacy_filter import PrivacyFilter

filter = PrivacyFilter(use_gliner=True)

# Mask
text = "Email me at jane@example.com or call (555) 123-4567"
result = filter.mask(text)
print(result.masked_text)
# "Email me at {{__OWL:EMAIL_ADDRESS_1__}} or call {{__OWL:PHONE_NUMBER_1__}}"

# De-mask
llm_response = "I'll send confirmation to {{__OWL:EMAIL_ADDRESS_1__}}"
demasked = filter.demask(llm_response, session_id=result.session_id)
print(demasked.original_text)
# "I'll send confirmation to jane@example.com"

API

Start the server:

make start  # or: uvicorn api.main:app --port 1001

Endpoints:

# Mask
curl -X POST "http://localhost:1001/mask" \
  -H "Content-Type: application/json" \
  -d '{"text": "My email is jane@example.com"}'

# De-mask
curl -X POST "http://localhost:1001/demask" \
  -H "Content-Type: application/json" \
  -d '{"masked_text": "...", "session_id": "..."}'

API docs at http://localhost:1001/docs
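To call the service from Python, a client needs only the standard library. This sketch mirrors the curl calls; `build_request` and `call` are hypothetical helpers, and `call` needs the server running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1001"  # port used by the uvicorn command above


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /mask or /demask endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A round trip would be `call("/mask", {"text": ...})` followed by `call("/demask", {"masked_text": ..., "session_id": ...})`. The sketch assumes the `/mask` response carries `masked_text` and `session_id` fields to feed into the second call. Confirm the actual response schema at `/docs`.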

Supported Entities

Personal — EMAIL_ADDRESS, PHONE_NUMBER (15+ countries), PERSON

Financial — CREDIT_CARD (Visa, Mastercard, Amex, etc.), IBAN_CODE

National IDs — US_SSN, UK_NINO, CANADIAN_SIN, AUSTRALIAN_TFN, INDIAN_AADHAAR, GERMAN_TAX_ID, FRENCH_INSEE

Crypto — BITCOIN_ADDRESS, ETHEREUM_ADDRESS, LITECOIN_ADDRESS, DOGECOIN_ADDRESS, RIPPLE_ADDRESS, MONERO_ADDRESS

Secrets — AWS_ACCESS_KEY_ID, API_KEY, PASSWORD, JWT_TOKEN

Other — LOCATION, IP_ADDRESS, MEDICAL_LICENSE
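Pattern-based types such as CREDIT_CARD are usually validated with a checksum to cut false positives. Presidio's credit card recognizer, for example, applies the Luhn check. The README doesn't say what, if anything, this project adds on top. So treat the following as a standalone sketch: a loose digit regex filtered by a Luhn (mod 10) check, with hypothetical function names.

```python
import re
from typing import List

# Loose candidate pattern: 13-19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn (mod 10) checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_credit_cards(text: str) -> List[str]:
    """Return regex candidates that also pass the Luhn check."""
    return [m.group(0) for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group(0))]
```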

Selective Masking

Mask only specific entity types:

result = filter.mask(text, entities_to_mask=["EMAIL_ADDRESS", "PHONE_NUMBER"])
# SSN, credit cards, etc. remain visible
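Conceptually, `entities_to_mask` acts as an allow-list applied after detection: other types may still be detected, but they are not replaced. A minimal sketch of that behavior, using two hypothetical regex detectors and a hypothetical `mask_selected` function (the token map is omitted to keep the focus on selection):

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical stand-ins for the real recognizers.
DETECTORS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_selected(text: str, entities_to_mask: Optional[Iterable[str]] = None) -> str:
    """Replace only the requested entity types with tokens; None masks every type."""
    wanted = set(DETECTORS) if entities_to_mask is None else set(entities_to_mask)
    for entity_type, pattern in DETECTORS.items():
        if entity_type not in wanted:
            continue  # this type is left visible in the output
        count = 0

        def to_token(match: re.Match) -> str:
            nonlocal count
            count += 1
            return f"{{{{__OWL:{entity_type}_{count}__}}}}"

        text = pattern.sub(to_token, text)
    return text
```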

Token Format

Tokens look like {{__OWL:EMAIL_ADDRESS_1__}}. The format is designed to be unambiguous to LLMs: it won't be confused with markdown, XML, or template syntax.

Add this to your system prompt:

Strings matching {{__OWL:[A-Z_]+_\d+__}} are internal reference tokens.
Preserve them exactly as-is.
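Even with that instruction, a model may drop a token or invent a new one. A cheap guard is to compare the tokens sent with the tokens returned before de-masking. The helper names below are hypothetical, and the regex is the system-prompt pattern with its braces escaped for Python's `re`.

```python
import re
from typing import Set

# Same pattern as in the system prompt, with braces escaped.
TOKEN_RE = re.compile(r"\{\{__OWL:[A-Z_]+_\d+__\}\}")


def tokens_in(text: str) -> Set[str]:
    """Return the set of well-formed reference tokens found in text."""
    return set(TOKEN_RE.findall(text))


def missing_tokens(masked_prompt: str, llm_response: str) -> Set[str]:
    """Tokens sent to the model that are absent from its reply (may be legitimate)."""
    return tokens_in(masked_prompt) - tokens_in(llm_response)


def unknown_tokens(masked_prompt: str, llm_response: str) -> Set[str]:
    """Tokens in the reply that were never sent; these cannot be de-masked."""
    return tokens_in(llm_response) - tokens_in(masked_prompt)
```

A non-empty `unknown_tokens` result is the stronger signal: those tokens would survive de-masking unchanged and leak into the final response.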

Security

Session storage: in-memory by default. Use NATS JetStream in production (see the NATS docs).
Token maps contain PII, so never log them.
Use HTTPS in production.
Set a session TTL; a 5-15 minute expiration is recommended.
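An in-memory session store with a TTL can be sketched as a small class. `TTLTokenStore` is hypothetical, and NATS JetStream's own expiry settings are not shown here. The default of 600 seconds sits inside the recommended 5-15 minute window.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLTokenStore:
    """In-memory session_id -> token map store; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def put(self, session_id: str, token_map: Dict[str, str]) -> None:
        """Store a copy of the token map with an absolute expiry time."""
        self._data[session_id] = (self.clock() + self.ttl, dict(token_map))

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        """Return the token map, or None if the session is unknown or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, token_map = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # expired: drop the PII right away
            return None
        return token_map

    def __repr__(self) -> str:
        # Never include token maps in reprs or logs: they contain PII.
        return f"TTLTokenStore(sessions={len(self._data)}, ttl={self.ttl})"
```

Expiry is checked lazily on read, so sessions that are never read again stay in memory. A real store would also sweep expired entries periodically.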

Performance

GLiNER detection: ~50-100ms
Masking: ~1-2ms
De-masking: <1ms

Testing

pytest tests/ -v                    # all 610+ tests
pytest tests/ -n auto               # parallel
pytest tests/ --cov=src/privacy_filter  # coverage

Production

make prod  # starts with NATS + multiple workers

See DOCKER.md for the full deployment guide.

Why GLiNER over spaCy?

GLiNER does zero-shot entity recognition: entity types are given by name at inference time, so new or custom types can be added without training or retraining. It is also more accurate on PII (94%+ vs 87%+ for spaCy).
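With the `gliner` package, `model.predict_entities(text, labels)` returns a list of dicts that include `start`, `end` and `label` keys. The hypothetical `mask_spans` below, which is not this project's code, shows one way such character spans could become OWL tokens, using a simple greedy overlap rule. This sketch only upper-cases GLiNER's free-text labels. Mapping them onto the entity names listed above (for example "email" to EMAIL_ADDRESS) is left as an assumption.

```python
from typing import Dict, List, Tuple


def mask_spans(text: str, entities: List[dict]) -> Tuple[str, Dict[str, str]]:
    """Replace spans (dicts with start, end, label) with tokens; return (text, token_map).

    Overlaps are resolved greedily: spans are taken by start offset (longest
    first on ties) and any span overlapping an already-kept one is dropped.
    """
    ordered = sorted(entities, key=lambda e: (e["start"], -(e["end"] - e["start"])))
    kept, last_end = [], -1
    for ent in ordered:
        if ent["start"] >= last_end:
            kept.append(ent)
            last_end = ent["end"]

    counters: Dict[str, int] = {}
    token_map: Dict[str, str] = {}
    pieces, cursor = [], 0
    for ent in kept:  # left to right, so numbering follows reading order
        label = ent["label"].upper().replace(" ", "_")
        counters[label] = counters.get(label, 0) + 1
        token = f"{{{{__OWL:{label}_{counters[label]}__}}}}"
        token_map[token] = text[ent["start"]:ent["end"]]
        pieces.append(text[cursor:ent["start"]])  # untouched text before the span
        pieces.append(token)
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), token_map
```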

Roadmap

License

MIT
