📄 Paper Anonymizer

A web-based tool to anonymize research papers by removing author names, affiliations, and other identifying information before peer review.

🚀 Features

📂 Select input and output folders
📄 Supports PDF, DOC, DOCX files
🔄 Automatic DOC/DOCX → PDF conversion using Microsoft Word COM
🖱️ Interactive PDF viewer with drag-to-select redaction
✂️ Remove selected regions precisely (word-level redaction)
👁️ Preview removals before saving
↩️ Undo applied redactions
💾 Save anonymized files to output folder
🔗 Merge acknowledgement document (optional)
🧹 Strips PDF metadata (document info and XML metadata) from the output

🏗️ Project Structure

Anonymizer-Paper/
│
├── backend/
│   ├── app.py              # FastAPI backend
│   ├── converter.py        # DOC/DOCX → PDF conversion (Word COM)
│   ├── pdf_editor.py       # Redaction + metadata removal
│   ├── utils.py            # File utilities
│   ├── requirements.txt    # Dependencies
│   └── temp/               # Temporary converted PDFs
│
├── input/                  # Input papers
├── output/                 # Anonymized papers
│
├── app.js                  # Frontend logic
├── index.html              # UI
├── style.css               # Styling

⚙️ Setup Instructions

1. Clone / Download

git clone <repo-url>
cd Anonymizer-Paper

2. Create Virtual Environment

python -m venv .venv
.venv\Scripts\activate   # Windows

3. Install Dependencies

pip install -r requirements.txt

Dependencies include:

FastAPI
PyMuPDF
pywin32 (for Word conversion)

4. Enable Word COM (IMPORTANT)

python -m win32com.client.makepy

Then select:

Microsoft Word XX.X Object Library

⚠️ Requires:

Windows OS
Microsoft Word installed

5. Run Backend Server

uvicorn app:app --reload

Server runs at:

http://localhost:8000

6. Open Frontend

Open:

index.html

Or serve it from a local development server (for example, an IDE's built-in web server):

http://localhost:63342/.../index.html

🧠 How It Works

1. File Loading

Backend lists files using list_files()
DOC/DOCX files are converted using Word COM
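
The loading step could look roughly like the sketch below. It assumes that `list_files()` walks the chosen folder recursively and keeps only the supported extensions; the real signature in utils.py may differ. `convert_to_pdf` and its parameters are hypothetical names, and the conversion shown is the usual Word COM automation pattern rather than the project's exact code.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".doc", ".docx"}
WD_FORMAT_PDF = 17  # Word's wdFormatPDF save-format constant


def list_files(folder):
    """Return supported papers under `folder`, including subfolders (assumed behaviour)."""
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def convert_to_pdf(src, temp_dir):
    """Convert a DOC/DOCX file to PDF through Word COM (Windows with Word only)."""
    import win32com.client  # lazy import so listing still works without pywin32

    src = Path(src).resolve()  # Word needs absolute paths
    dst = Path(temp_dir).resolve() / (src.stem + ".pdf")
    dst.parent.mkdir(parents=True, exist_ok=True)

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(str(src))
        try:
            doc.SaveAs(str(dst), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close(False)  # close without saving changes to the source
    finally:
        word.Quit()
    return dst
```

Importing pywin32 inside the function keeps file listing usable on machines without Word; only the conversion itself needs Windows.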

2. PDF Rendering

Uses PDF.js to render pages in browser
Text layer enables accurate selection

3. Selection System

User drags to select regions
Coordinates are converted to PDF space
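
The conversion itself happens in the frontend, but the arithmetic can be sketched in Python. This is a minimal illustration that assumes the page is rendered unrotated at a single uniform scale and that PDF space uses a top-left origin, as PyMuPDF does. `to_pdf_rect` and its parameters are hypothetical names, not functions from app.js.

```python
def to_pdf_rect(x0, y0, x1, y1, scale):
    """Map a drag rectangle in canvas pixels to PDF points.

    The drag may go in any direction, so the corners are normalized first.
    Assumes canvas pixels = PDF points * scale, with no rotation or offset.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return (left / scale, top / scale, right / scale, bottom / scale)


# A drag from bottom-right to top-left on a page rendered at 1.5x.
print(to_pdf_rect(300, 150, 150, 75, 1.5))
```

Rotated pages or a crop box with a non-zero origin would need an extra transform on top of this.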

4. Redaction Engine

From pdf_editor.py:

Extracts words using:
```
page.get_text("words")
```
Removes only the words that intersect the selection
Applies an overlap threshold (>20%) so that words the selection barely touches are kept
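
The word filter can be sketched as follows. It assumes the 20% threshold is measured against each word's own bounding-box area, which the description above does not spell out. `overlap_fraction` and `words_to_redact` are hypothetical names. The tuples mirror the shape returned by `page.get_text("words")`: `(x0, y0, x1, y1, word, block_no, line_no, word_no)`.

```python
def overlap_fraction(word_box, selection):
    """Fraction of the word's area covered by the selection (0.0 to 1.0)."""
    wx0, wy0, wx1, wy1 = word_box
    sx0, sy0, sx1, sy1 = selection
    width = min(wx1, sx1) - max(wx0, sx0)
    height = min(wy1, sy1) - max(wy0, sy0)
    area = (wx1 - wx0) * (wy1 - wy0)
    if width <= 0 or height <= 0 or area <= 0:
        return 0.0
    return (width * height) / area


def words_to_redact(words, selection, threshold=0.2):
    """Keep only words whose overlap with the selection exceeds the threshold."""
    return [w for w in words if overlap_fraction(w[:4], selection) > threshold]


words = [
    (0, 0, 10, 10, "Alice", 0, 0, 0),
    (10, 0, 20, 10, "Smith", 0, 0, 1),
    (20, 0, 30, 10, "et", 0, 0, 2),
]
# The selection covers "Alice" fully and 30% of "Smith".
print([w[4] for w in words_to_redact(words, (0, 0, 13, 10))])
```

The boxes that pass the filter could then be handed to PyMuPDF, for instance through `page.add_redact_annot(rect)` followed by `page.apply_redactions()`.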

5. Metadata Removal

doc.set_metadata({})
doc.del_xml_metadata()
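
In context, these two calls might sit in a small helper like the one below. This is a sketch, not the code in pdf_editor.py: `strip_metadata` and `save_anonymized` are hypothetical names, and the `garbage` and `deflate` save options are an assumption about how one might also drop unreferenced objects when writing the file.

```python
def strip_metadata(doc):
    """Clear the document info dictionary and the XML (XMP) metadata."""
    doc.set_metadata({})     # empty dict resets title, author, producer, ...
    doc.del_xml_metadata()   # removes the XMP metadata stream


def save_anonymized(src_path, dst_path):
    """Open a PDF, strip its metadata, and write a cleaned copy."""
    import fitz  # PyMuPDF; imported lazily so strip_metadata stays dependency-free

    doc = fitz.open(src_path)
    try:
        strip_metadata(doc)
        # garbage=4 drops unreferenced objects and merges duplicates;
        # deflate=True compresses the streams that are written.
        doc.save(dst_path, garbage=4, deflate=True)
    finally:
        doc.close()
```

Metadata removal only covers the document-level fields; identifying text inside the pages still has to be redacted through the selection workflow.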

🧪 Workflow

Select input & output folders
Choose a paper from sidebar
Drag to select author/affiliation area
Click REMOVE (preview)
Click SAVE to finalize
(Optional) Upload acknowledgement and click MERGE & SAVE

⚠️ Known Limitations

Word → PDF conversion may alter text positioning
Complex layouts (multi-column, tables) may cause slight inaccuracies
Requires Microsoft Word (not cross-platform)

🔮 Future Improvements

🤖 Auto-detect author sections
📊 Confidence score for anonymization
🧠 NLP-based entity removal (names, emails, institutions)

🛠️ Tech Stack

Frontend:

HTML, CSS, JavaScript
PDF.js

Backend:

FastAPI
PyMuPDF (fitz)
pywin32 (Word COM)

📌 Notes

Output files are saved as:
```
<original_name>_anonymized.pdf
```
Temporary files are stored in the temp/ folder
Supports recursive folder scanning
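
The naming rule above can be expressed in a few lines. `anonymized_path` is a hypothetical helper name, and the sketch assumes every output is a PDF regardless of the input extension.

```python
from pathlib import Path


def anonymized_path(src, output_dir):
    """Build the output path <original_name>_anonymized.pdf inside output_dir."""
    return Path(output_dir) / f"{Path(src).stem}_anonymized.pdf"


print(anonymized_path("input/paper1.docx", "output").name)
```

With recursive scanning, two inputs that share a file name in different subfolders would map to the same output name under this scheme, so a real implementation may need to handle that case.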

⭐ Summary

This tool provides a semi-automated anonymization pipeline combining:

manual precision (user selection)
automated processing (word-level redaction + metadata removal)

Designed for research paper review workflows where bias-free evaluation is required.
