PII Detection and Redaction Notebook

This Jupyter notebook demonstrates how to detect and redact Personally Identifiable Information (PII) in text extracted from images. It uses EasyOCR for Optical Character Recognition (OCR) and spaCy for Named Entity Recognition (NER) to identify and redact sensitive information such as names, emails, phone numbers, and zip codes.

Features

OCR with EasyOCR: Extract text from images.
PII Detection with spaCy: Identify sensitive entities like names, emails, phone numbers, and zip codes.
PII Redaction: Redact sensitive information using spaCy and regex patterns.
Visualization: Display the original image alongside the redacted text.
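
The regex side of the redaction feature can be illustrated with a minimal sketch. The patterns, the `PII_PATTERNS` table, and the `redact_with_regex` helper below are assumptions made for illustration (US-style phone numbers and zip codes); they are not taken from the notebook, whose actual patterns may differ.

```python
import re

# Illustrative patterns only (hypothetical, not the notebook's own).
# Real-world PII formats vary widely and need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
    "ZIP": re.compile(r"(?<!\d)\d{5}(?:-\d{4})?(?!\d)"),
}


def redact_with_regex(text):
    """Replace every match of each pattern with a [LABEL] placeholder."""
    # Patterns run in insertion order: emails and phone numbers are
    # replaced before the zip pattern sees the text.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Calling `redact_with_regex("Contact jane.doe@example.com or (555) 123-4567")` returns the sentence with `[EMAIL]` and `[PHONE]` in place of the matches. Labeled placeholders keep the redacted text readable, at the cost of revealing what kind of value was removed.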

How to Use

Install the required libraries:

pip install easyocr spacy torch matplotlib pillow

Download the spaCy English language model:
```
python -m spacy download en_core_web_sm
```
Open the notebook pytorch_training.ipynb in Jupyter Notebook or JupyterLab and train the model.
Once you are satisfied with the model's output, use the pii_detection_redaction.ipynb notebook to add a file upload feature and test PII detection and redaction.
Follow the steps in the notebook to:
- Extract text from an image.
- Detect and redact PII.
- Visualize the results.
Set the image_path variable to the path of your own image file to test with your own data.
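
The extract, detect, and redact steps above can be sketched roughly as follows. This is an outline under stated assumptions, not the notebook's code: the function names (`extract_text`, `find_entity_spans`, `redact_spans`, `redact_image`), the `[REDACTED]` placeholder, and the choice to redact only `PERSON` entities are hypothetical. The library imports are deferred into the functions that need them because loading the OCR and NER models is slow.

```python
def redact_spans(text, spans, placeholder="[REDACTED]"):
    """Replace each (start, end) character span in text with a placeholder."""
    # Work from the end of the string so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


def extract_text(image_path):
    """Run EasyOCR on an image and join the recognized fragments."""
    import easyocr  # deferred import: model loading is slow

    reader = easyocr.Reader(["en"])
    # detail=0 returns only the recognized strings, without boxes or scores.
    return " ".join(reader.readtext(image_path, detail=0))


def find_entity_spans(text, labels=("PERSON",)):
    """Return character spans of spaCy entities whose label is in labels."""
    import spacy  # deferred import, requires en_core_web_sm to be downloaded

    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    return [
        (ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ in labels
    ]


def redact_image(image_path):
    """OCR an image, then redact the named entities found in its text."""
    text = extract_text(image_path)
    return redact_spans(text, find_entity_spans(text))
```

With the libraries installed, `redact_image(image_path)` returns the redacted text for the image at `image_path`. The pure helper can also be tried on its own: `redact_spans("Call John Smith today", [(5, 15)])` gives `"Call [REDACTED] today"`.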

Example Output

Original Image: Displays the uploaded image.
Redacted Text: Shows the text with sensitive information redacted.

Future Plans

This notebook serves as a prototype for a Flask web application that will allow users to upload images via a web interface and receive redacted results. Stay tuned for the Flask app implementation!

Dependencies

EasyOCR
spaCy
PyTorch (torch)
Matplotlib
Pillow

License

This project is licensed under the MIT License. See the LICENSE file for more details.
