A Next-Generation Plagiarism Detection System Powered by Deep Learning and Vector Search Technology
Architecture • Features • Tech Stack • Installation • Getting Started • Docs
The system employs a three-tier architecture:
- Document Processing Layer: Extracts text from PDFs, segments it into sentences, and generates embeddings
- Storage Layer: Stores document metadata in PostgreSQL and vector embeddings in Milvus
- Search Layer: Performs high-performance similarity searches and generates detailed reports
- Semantic Analysis Engine: Powered by state-of-the-art transformer models
- Multi-lingual Support: Optimized for Vietnamese and English content
- Context-Aware Detection: Understanding beyond simple text matching
- Vector Search Technology: Using Milvus for lightning-fast similarity search
- Parallel Processing: Efficient handling of large document collections
- Scalable Infrastructure: Designed for institutional deployment
- Visual Results: Interactive visualization of matched content
- Detailed Reports: Page-by-page similarity analysis
- Evidence Mapping: Precise location of potential matches
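The flow described above (segment a document into sentences, embed each sentence, then search for similar sentences) can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for the transformer model, the in-memory list stands in for Milvus, and all function names here are hypothetical. Only the `min_similarity` threshold mirrors a parameter that appears in the usage example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences on ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def embed(sentence):
    """Toy embedding: a bag-of-words count vector (stand-in for a transformer model)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matches(query_text, corpus_sentences, min_similarity=0.9):
    """Return (query_sentence, corpus_sentence, score) for pairs at or above the threshold."""
    # In-memory index; a real deployment would query a vector database instead.
    index = [(s, embed(s)) for s in corpus_sentences]
    matches = []
    for q in split_sentences(query_text):
        q_vec = embed(q)
        best_sentence, best_score = None, 0.0
        for s, s_vec in index:
            score = cosine(q_vec, s_vec)
            if score > best_score:
                best_sentence, best_score = s, score
        if best_sentence is not None and best_score >= min_similarity:
            matches.append((q, best_sentence, best_score))
    return matches
```

Keeping only the best-scoring corpus sentence per query sentence is one possible design choice; it makes each reported match point to a single location, at the cost of hiding secondary sources.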
- Python 3.8+: Core programming language
- PostgreSQL 12+: Relational database for metadata
- Milvus 2.x: Vector database for similarity search
- Docker & Docker Compose: Container management
- RAM 8GB+: Recommended for optimal performance
- CPU 4+ cores: For parallel processing
- Storage 10GB+: For document storage and embeddings
- PostgreSQL Setup
    # Start PostgreSQL service
    docker run -d \
      --name postgres \
      -e POSTGRES_USER=username \
      -e POSTGRES_PASSWORD=password \
      -e POSTGRES_DB=database_name \
      -p 5434:5432 \
      postgres:12
- Milvus Setup
    # Download Milvus docker-compose file
    wget https://github.com/milvus-io/milvus/releases/download/v2.3.3/milvus-standalone-docker-compose.yml -O docker-compose.yml
    # Start Milvus
    docker-compose up -d
- Clone Repository
    git clone https://github.com/drkhanusa/DNU_PlagiarismChecker.git
    cd DNU_PlagiarismChecker
- Create Virtual Environment
    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
- Install Dependencies
    pip install -e .
- Environment Configuration
    # Copy example environment file
    cp .env.example .env
    # Edit .env with your settings
    # Example configuration:
    DATABASE_URL=postgresql://username:password@localhost:5434/database_name
    MILVUS_HOST=localhost
    MILVUS_PORT=19530
- Initialize Database
    # Create database tables
    python setup_database.py
    # Initialize Milvus collection
    python create_milvus_db.py
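To sanity-check the values from the environment configuration step before running the setup scripts, a small helper along these lines may be useful. This is only a sketch: `load_settings` is a hypothetical name, not part of the project, and the fallback values for `MILVUS_HOST` and `MILVUS_PORT` are assumptions taken from the example configuration. It reads variables that are already present in the environment and does not parse the `.env` file itself.

```python
import os
from urllib.parse import urlparse


def load_settings(env=None):
    """Read connection settings from environment-style variables."""
    env = os.environ if env is None else env
    # DATABASE_URL has the form postgresql://user:password@host:port/dbname
    db = urlparse(env["DATABASE_URL"])
    return {
        "db_user": db.username,
        "db_password": db.password,
        "db_host": db.hostname,
        "db_port": db.port,
        "db_name": db.path.lstrip("/"),
        # Fallbacks are assumptions based on the example configuration
        "milvus_host": env.get("MILVUS_HOST", "localhost"),
        "milvus_port": int(env.get("MILVUS_PORT", "19530")),
    }


if __name__ == "__main__":
    example = {
        "DATABASE_URL": "postgresql://username:password@localhost:5434/database_name",
        "MILVUS_HOST": "localhost",
        "MILVUS_PORT": "19530",
    }
    settings = load_settings(example)
    print(settings["db_host"], settings["db_port"], settings["db_name"])
```

Note that the port in `DATABASE_URL` (5434) is the host port mapped by the `docker run` command, not PostgreSQL's in-container port 5432.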
from plagiarism_checker import check_plagiarism_details

# Check a document
results = check_plagiarism_details(
    file_path="path/to/document.pdf",
    min_similarity=0.9
)

# View results
print(f"Overall Similarity: {results['data']['total_percent']}%")
for doc in results['data']['similarity_documents']:
    print(f"Match: {doc['name']} - {doc['similarity_value']}%")
from create_corpus import CorpusCreator
creator = CorpusCreator()
creator.process_document("path/to/document.pdf")
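When a check returns many matching documents, it can help to sort them before printing. The sketch below works on a dictionary shaped like the fields accessed in the first example above (`data`, `total_percent`, `similarity_documents`, `name`, `similarity_value`). The helper name `summarize_results`, the `top_n` parameter, and the sample values are hypothetical, and the sketch assumes `similarity_value` is numeric, which the source does not confirm.

```python
def summarize_results(results, top_n=3):
    """Return report lines for the overall score and the top-N matching documents."""
    data = results["data"]
    # Sort matches from most to least similar (assumes numeric similarity_value)
    docs = sorted(
        data["similarity_documents"],
        key=lambda d: d["similarity_value"],
        reverse=True,
    )
    lines = [f"Overall Similarity: {data['total_percent']}%"]
    for doc in docs[:top_n]:
        lines.append(f"Match: {doc['name']} - {doc['similarity_value']}%")
    return lines


# Hypothetical sample data, shaped like the fields used in the usage example
sample = {
    "data": {
        "total_percent": 42.5,
        "similarity_documents": [
            {"name": "a.pdf", "similarity_value": 12.0},
            {"name": "b.pdf", "similarity_value": 30.5},
        ],
    }
}

for line in summarize_results(sample):
    print(line)
```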
For detailed documentation, please visit our Wiki or refer to the following sections:
- Installation Guide
- User Manual
- API Reference
- Contributing Guidelines
© 2024 AIoTLab, Faculty of Information Technology, DaiNam University. All rights reserved.
Website • GitHub • Contact Us