@yourwanghao

## 🎯 Overview

This PR adds a modern, user-friendly web interface for DeepSeek-OCR with comprehensive bilingual support and real-time progress tracking.

## ✨ Features

### 🎨 Modern Web UI
- Beautiful gradient design with smooth animations
- Drag-and-drop file upload
- Responsive design for desktop and mobile
- Zero external frontend dependencies

### 📊 Real-time Progress Tracking
- WebSocket-based live progress updates
- Streaming log display during processing
- Async task processing for concurrent requests
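
A minimal sketch of how progress from an async task might be fanned out to listeners, using only the standard library. The `ProgressChannel` class, `fake_ocr_job`, and the message shape are hypothetical and are not taken from `server/app.py`; in a FastAPI WebSocket endpoint, each payload could then be forwarded to the browser with something like `await websocket.send_text(payload)`.

```python
import asyncio
import json


class ProgressChannel:
    """Fan out progress messages for one task to any number of listeners."""

    def __init__(self) -> None:
        self._listeners: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        # Each listener (e.g. one WebSocket connection) gets its own queue.
        queue: asyncio.Queue = asyncio.Queue()
        self._listeners.append(queue)
        return queue

    def publish(self, percent: int, message: str) -> None:
        payload = json.dumps({"percent": percent, "message": message})
        for queue in self._listeners:
            queue.put_nowait(payload)


async def fake_ocr_job(channel: ProgressChannel, pages: int) -> None:
    """Stand-in for the real OCR work; reports progress after each page."""
    for page in range(1, pages + 1):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        channel.publish(page * 100 // pages, f"page {page}/{pages} done")


async def demo() -> list[str]:
    channel = ProgressChannel()
    inbox = channel.subscribe()
    await fake_ocr_job(channel, pages=4)
    # A WebSocket handler would forward each message to the browser here.
    return [inbox.get_nowait() for _ in range(inbox.qsize())]


messages = asyncio.run(demo())
```

Giving each listener its own queue keeps a slow client from blocking the job itself, which is one plausible way to support concurrent requests.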

### 📥 Multiple Download Options
- Markdown file (cleaned text)
- Full annotation file (with detection markers)
- Visualization PDF (with bounding boxes)
- Extracted images (ZIP archive)
- Complete package (all files in ZIP)
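
The ZIP-based downloads can be built without touching the disk. The following is a sketch of that idea with the standard `zipfile` module; the helper name `build_zip` and the file names are hypothetical, not the ones used in this PR.

```python
import io
import zipfile


def build_zip(files: dict[str, bytes]) -> bytes:
    """Pack name -> content pairs into a ZIP archive held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    # The archive is finalized when the context manager exits.
    return buffer.getvalue()


# Hypothetical output names, for illustration only.
package = build_zip({
    "result.md": b"# Page 1\n",
    "images/0.jpg": b"\xff\xd8\xff",
})
```

The returned bytes could be sent as a response body with an `application/zip` media type.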

### 🌐 Internationalization
- Auto-detect browser language (Chinese/English)
- One-click language toggle
- Persist user language preference in localStorage
- Full translation of all UI elements
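
The language selection described above happens in the browser (via `navigator.language` and `localStorage`). The sketch below renders only the decision logic in Python, for illustration; the function name and the precedence order (a stored preference wins over the browser language) are assumptions, not confirmed by this PR.

```python
from typing import Optional

SUPPORTED = ("en", "zh")


def resolve_language(
    stored: Optional[str], browser_language: str, default: str = "en"
) -> str:
    """Pick the UI language: saved preference first, then browser, then default."""
    if stored in SUPPORTED:
        return stored
    # Browser tags look like "zh-CN" or "en-US"; keep only the primary subtag.
    primary = browser_language.split("-")[0].lower()
    return primary if primary in SUPPORTED else default
```

For example, `resolve_language(None, "zh-CN")` selects Chinese, while an unsupported browser language such as `"fr-FR"` falls back to English.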

## 🔧 Technical Implementation

- **Backend**: FastAPI with async/await
- **Real-time Communication**: WebSocket for progress updates
- **Task Processing**: Subprocess monitoring with intelligent parsing
- **File Downloads**: In-memory ZIP creation for efficiency
- **I18n**: Data attributes + localStorage for seamless language switching
- **Zero Breaking Changes**: Completely optional, doesn't affect existing CLI
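
As an illustration of subprocess monitoring with progress parsing, the sketch below reads a child process's output line by line and extracts percentages. The regular expression, the helper name `run_and_track`, and the stand-in child command are all hypothetical; the actual command and log format used by `server/app.py` are not shown in this description.

```python
import re
import subprocess
import sys

# Matches tqdm-like fragments such as "50%"; the real log format may differ.
PERCENT = re.compile(r"(\d{1,3})%")

# Stand-in child process that prints progress lines.
CHILD = "for p in (0, 50, 100): print(f'processing: {p}%', flush=True)"


def run_and_track(command: list[str]) -> list[int]:
    """Run a command and collect every percentage found in its output."""
    seen: list[int] = []
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:  # yields lines as the child produces them
            match = PERCENT.search(line)
            if match:
                seen.append(int(match.group(1)))
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return seen


progress = run_and_track([sys.executable, "-c", CHILD])
```

In an async server, the same loop would more likely be built on `asyncio.create_subprocess_exec` so that reading output does not block other requests.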

## 📦 Changes

```
5 files changed, 947 insertions(+)
- .gitignore: Standard Python ignore patterns
- requirements.txt: +4 lines (fastapi, uvicorn, python-multipart, tqdm)
- server/app.py: 752 lines (main web application)
- server/README.md: 60 lines (Chinese documentation)
- server/FEATURES.md: 104 lines (feature details)
```

## 🚀 Usage

```bash
# Install dependencies
pip install -r requirements.txt

# Start server
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Then open http://localhost:8000 in a browser
```

## ✅ Testing

- ✅ Tested on Ubuntu 22.04 + Python 3.10
- ✅ Tested with 20+ page PDFs
- ✅ Verified all download options
- ✅ Tested language switching
- ✅ Tested on Chrome

## 🎯 Benefits

1. **Accessibility**: Use from any device with a browser
2. **Team Collaboration**: Share OCR service across network
3. **User-friendly**: No CLI knowledge required
4. **International**: Supports Chinese and English users
5. **Real-time Feedback**: See progress instead of waiting in the dark

## 🔒 Compatibility

- ✅ No breaking changes
- ✅ Existing CLI remains unchanged
- ✅ Web service is completely optional
- ✅ Minimal new dependencies

## 📚 Documentation

- Added comprehensive README (Chinese + English)
- Added feature documentation
- Added usage examples
- Inline code comments

## 🔮 Future Enhancements

Potential follow-ups (not in this PR):
- Batch file upload queue
- User authentication for public deployment
- Progress persistence across sessions
- OpenAPI/Swagger documentation

---

This PR significantly enhances DeepSeek-OCR's usability while maintaining backward compatibility. Looking forward to your feedback! 🙏

Hawk Wang added 2 commits October 22, 2025 11:11
Add a modern, user-friendly web interface for DeepSeek-OCR with the following features:

- Beautiful gradient UI with drag-and-drop file upload
- Real-time progress tracking via WebSocket
- Live log streaming during OCR processing
- Multiple download options:
  * Markdown file (cleaned text)
  * Full annotation file (with detection markers)
  * Visualization PDF (with bounding boxes)
  * Extracted images (ZIP archive)
  * All files (complete ZIP package)
- Responsive design for desktop and mobile
- Async processing to handle concurrent requests
- In-memory ZIP creation for efficient downloads

Technical implementation:
- FastAPI backend with async/await
- WebSocket for real-time updates
- Subprocess monitoring with progress parsing
- Modern CSS with animations and transitions
- Zero external frontend dependencies

This makes DeepSeek-OCR accessible via web browser from any device
on the local network, perfect for team collaboration and remote access.

Add comprehensive internationalization support:

- Auto-detect browser language (navigator.language)
- Toggle between English and Chinese
- Persist language preference in localStorage
- Translate all UI elements dynamically
- Support for:
  * Page title and subtitle
  * Upload instructions
  * Form labels and placeholders
  * Button text
  * Progress messages
  * Error messages
  * Download links

Technical implementation:
- Data attributes (data-en, data-zh) for all text elements
- Translation dictionary for dynamic messages
- Language toggle button in top-right corner
- Smooth transitions when switching languages
- Fully accessible for international users

This makes DeepSeek-OCR accessible to both Chinese and
international users, improving adoption and usability.