A web-based tool to anonymize research papers by removing author names, affiliations, and other identifying information before peer review.
- Select input and output folders
- Supports PDF, DOC, and DOCX files
- Automatic DOC/DOCX → PDF conversion using Microsoft Word COM
- Interactive PDF viewer with drag-to-select redaction
- Remove selected regions precisely (word-level redaction)
- Preview removals before saving
- Undo applied redactions
- Save anonymized files to the output folder
- Merge an acknowledgement document (optional)
- Removes PDF metadata for full anonymization
Anonymizer-Paper/
│
├── backend/
│   ├── app.py              # FastAPI backend
│   ├── converter.py        # DOC/DOCX → PDF conversion (Word COM)
│   ├── pdf_editor.py       # Redaction + metadata removal
│   ├── utils.py            # File utilities
│   ├── requirements.txt    # Dependencies
│   └── temp/               # Temporary converted PDFs
│
├── input/                  # Input papers
├── output/                 # Anonymized papers
│
├── app.js                  # Frontend logic
├── index.html              # UI
└── style.css               # Styling
.
├── app.py              # FastAPI backend
├── converter.py        # DOC/DOCX → PDF conversion (Word COM)
├── pdf_editor.py       # Redaction + metadata removal
├── utils.py            # File utilities
├── requirements.txt    # Dependencies
│
├── index.html          # Frontend UI
├── app.js              # Frontend logic
├── style.css           # UI styling
│
├── input/              # Input papers
├── output/             # Anonymized papers
└── temp/               # Temporary converted PDFs
git clone <repo-url>
cd paper-anonymizer
python -m venv .venv
.venv\Scripts\activate   # Windows
pip install -r requirements.txt
Dependencies include:
- FastAPI
- PyMuPDF
- pywin32 (for Word conversion)
python -m win32com.client.makepy
Then select:
Microsoft Word XX.X Object Library
- Windows OS
- Microsoft Word installed
uvicorn app:app --reload
Server runs at:
http://localhost:8000
Open:
index.html
Or run via Live Server:
http://localhost:63342/.../index.html
- Backend lists files using list_files()
- DOC/DOCX files are converted using Word COM
- Uses PDF.js to render pages in browser
- Text layer enables accurate selection
- User drags to select regions
- Coordinates are converted to PDF space
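
The coordinate conversion in the last step can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the `Rect` type and the `viewport_to_pdf` helper are hypothetical names, the page is assumed to be unrotated and rendered at a single uniform scale, and the target coordinate system is assumed to have a top-left origin (as PyMuPDF's page coordinates do), so no vertical flip is applied.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def viewport_to_pdf(sel: Rect, scale: float) -> Rect:
    """Map a drag rectangle from canvas pixels to PDF points.

    Hypothetical helper. Assumes an unrotated page rendered at a uniform
    `scale` and a top-left origin on both sides, so dividing by the scale
    is the whole conversion.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Normalize so the result is valid regardless of drag direction.
    x0, x1 = sorted((sel.x0, sel.x1))
    y0, y1 = sorted((sel.y0, sel.y1))
    return Rect(x0 / scale, y0 / scale, x1 / scale, y1 / scale)


# A drag from bottom-right to top-left on a page rendered at 1.5x.
pdf_rect = viewport_to_pdf(Rect(300, 150, 150, 75), scale=1.5)
```

If the viewer supports page rotation or non-uniform zoom, the mapping would need the full viewport transform rather than a single scale factor.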
From pdf_editor.py:
- Extracts words using page.get_text("words")
- Removes only words intersecting the selection
- Uses an overlap threshold (>20%) for accuracy
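
A minimal sketch of the word-selection step described above, operating on tuples shaped like those returned by `page.get_text("words")`, which begin with `(x0, y0, x1, y1, word, ...)`. The function names are hypothetical, and measuring overlap as a fraction of the word's own area is an assumption; the actual code in pdf_editor.py may define the 20% threshold differently.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_fraction(word: Box, selection: Box) -> float:
    """Fraction of the word's area that lies inside the selection."""
    wx0, wy0, wx1, wy1 = word
    sx0, sy0, sx1, sy1 = selection
    width = min(wx1, sx1) - max(wx0, sx0)
    height = min(wy1, sy1) - max(wy0, sy0)
    word_area = (wx1 - wx0) * (wy1 - wy0)
    # No intersection, or a degenerate word box.
    if width <= 0 or height <= 0 or word_area <= 0:
        return 0.0
    return (width * height) / word_area


def words_to_redact(
    words: Sequence[tuple], selection: Box, threshold: float = 0.2
) -> List[tuple]:
    """Keep only words whose overlap with the selection exceeds the threshold."""
    return [w for w in words if overlap_fraction(w[:4], selection) > threshold]


# Sample data in the shape of page.get_text("words") output (values invented).
words = [
    (10, 10, 50, 20, "Alice", 0, 0, 0),
    (60, 10, 100, 20, "Smith", 0, 0, 1),
    (10, 40, 50, 50, "Abstract", 1, 0, 0),
]
hits = words_to_redact(words, selection=(0, 0, 70, 30))
```

Here "Alice" lies fully inside the selection, a quarter of "Smith" overlaps it, and "Abstract" lies entirely below it, so only the first two words are selected for redaction.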
doc.set_metadata({})
doc.del_xml_metadata()
- Select input & output folders
- Choose a paper from sidebar
- Drag to select author/affiliation area
- Click REMOVE (preview)
- Click SAVE to finalize
- (Optional) Upload acknowledgement and click MERGE & SAVE
- Word → PDF conversion may alter text positioning
- Complex layouts (multi-column, tables) may cause slight inaccuracies
- Requires Microsoft Word (not cross-platform)
- Auto-detect author sections
- Confidence score for anonymization
- NLP-based entity removal (names, emails, institutions)
Frontend:
- HTML, CSS, JavaScript
- PDF.js
Backend:
- FastAPI
- PyMuPDF (fitz)
- pywin32 (Word COM)
- Output files are saved as: <original_name>_anonymized.pdf
- Temporary files stored in /temp
- Supports recursive folder scanning
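
The recursive scan and the output naming rule might look roughly like the sketch below. The helper names `scan_papers` and `anonymized_path` are hypothetical and are not taken from utils.py; the set of extensions follows the supported formats listed earlier.

```python
from pathlib import Path
from typing import List

# Extensions the tool is described as supporting.
SUPPORTED = {".pdf", ".doc", ".docx"}


def scan_papers(input_dir: Path) -> List[Path]:
    """Recursively collect supported files under input_dir (case-insensitive)."""
    return sorted(
        p for p in input_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def anonymized_path(source: Path, output_dir: Path) -> Path:
    """Build <original_name>_anonymized.pdf inside output_dir."""
    return output_dir / f"{source.stem}_anonymized.pdf"


target = anonymized_path(Path("input/sub/paper.docx"), Path("output"))
```

With this naming rule, two papers that share a file name in different subfolders would map to the same output path, so a real implementation may need to mirror the subfolder structure or add a disambiguating suffix.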
This tool provides a semi-automated anonymization pipeline combining:
- manual precision (user selection)
- automated processing (word-level redaction + metadata removal)
Designed for research paper review workflows where bias-free evaluation is required.