A Vietnamese-optimized plagiarism detection system using Milvus vector database and sentence embeddings. Built for DaiNam University's Faculty of IT to detect similarities in graduation projects. Features high-precision semantic search, batch processing, and detailed reporting. 🔍 🇻🇳

drkhanusa/DNU_PlagiarismChecker

🎓 DaiNam University Plagiarism Detection System

Made by AIoTLab Faculty of IT DaiNam University

🔬 Advanced Academic Integrity Through AI Innovation

A Next-Generation Plagiarism Detection System Powered by Deep Learning and Vector Search Technology

Architecture • Features • Tech Stack • Installation • Getting Started • Docs

🏗️ Architecture

System Architecture

The system employs a three-tier architecture:

  1. 📄 Document Processing Layer: Extracts text from PDFs, segments it into sentences, and generates embeddings
  2. 💾 Storage Layer: Stores document metadata in PostgreSQL and vector embeddings in Milvus
  3. 🔎 Search Layer: Performs high-performance similarity searches and generates detailed reports
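As a rough illustration of how the three layers fit together, the sketch below wires them up in plain Python. It is not the project's code: `split_sentences`, `embed`, `cosine`, and `InMemoryIndex` are hypothetical names, the word-count `embed` is a toy stand-in for the transformer embeddings, and the in-memory list stands in for both PostgreSQL and Milvus.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Document processing: naive split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence):
    """Toy embedding: lowercase word counts (a real system would use a model)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Storage and search: keeps (document name, sentence, vector) rows in a list."""

    def __init__(self):
        self.rows = []

    def add_document(self, name, text):
        for sentence in split_sentences(text):
            self.rows.append((name, sentence, embed(sentence)))

    def search(self, sentence, min_similarity=0.9):
        """Return (score, document name, stored sentence) hits, best first."""
        query = embed(sentence)
        hits = [(cosine(query, vec), name, stored) for name, stored, vec in self.rows]
        return sorted((h for h in hits if h[0] >= min_similarity), reverse=True)


index = InMemoryIndex()
index.add_document("thesis_a", "Vector search is fast. Milvus stores embeddings.")
for score, name, stored in index.search("Vector search is fast."):
    print(f"{name}: {stored!r} (score {score:.2f})")
```

A brute-force scan like this is linear in the number of stored sentences, which is the cost a dedicated vector database is meant to avoid at scale.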

✨ Key Features

🧠 Advanced AI Technology

  • Semantic Analysis Engine: Powered by state-of-the-art transformer models
  • Multi-lingual Support: Optimized for Vietnamese and English content
  • Context-Aware Detection: Understanding beyond simple text matching

⚡ High-Performance Architecture

  • Vector Search Technology: Using Milvus for lightning-fast similarity search
  • Parallel Processing: Efficient handling of large document collections
  • Scalable Infrastructure: Designed for institutional deployment

📊 Comprehensive Analysis

  • Visual Results: Interactive visualization of matched content
  • Detailed Reports: Page-by-page similarity analysis
  • Evidence Mapping: Precise location of potential matches
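To make the page-by-page idea concrete, here is a minimal sketch of one way such a report could be aggregated from sentence-level matches. The function name `page_similarity` and its input shapes are assumptions made for illustration; the README does not describe how the system computes its per-page figures.

```python
from collections import defaultdict


def page_similarity(matches, sentences_per_page):
    """Return {page: percent of that page's sentences flagged as matches}.

    matches: iterable of (page_number, sentence_index) pairs (hypothetical shape).
    sentences_per_page: {page_number: total sentence count on that page}.
    """
    flagged = defaultdict(set)
    for page, sentence_index in matches:
        flagged[page].add(sentence_index)  # a set ignores duplicate hits
    return {
        page: round(100.0 * len(flagged[page]) / total, 1) if total else 0.0
        for page, total in sentences_per_page.items()
    }


report = page_similarity([(1, 0), (1, 2), (1, 2), (2, 5)], {1: 4, 2: 10, 3: 8})
for page, percent in sorted(report.items()):
    print(f"Page {page}: {percent}% matched")
```

Counting each sentence once, even when it matches several corpus documents, keeps a page's figure from exceeding 100%.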

🔧 Tech Stack

Core Technologies

Docker, PyTorch, FastAPI, HuggingFace

Database Systems

PostgreSQL, Milvus

📥 Installation

🛠️ Prerequisites

  • 🐍 Python 3.8+ - Core programming language
  • 🐘 PostgreSQL 12+ - Relational database for metadata
  • 🔍 Milvus 2.x - Vector database for similarity search
  • 🐳 Docker & Docker Compose - Container management
  • 💾 RAM 8GB+ - Recommended for optimal performance
  • 💻 CPU 4+ cores - For parallel processing
  • 🖴 Storage 10GB+ - For document storage and embeddings

🗃️ Database Setup

  1. 🐘 PostgreSQL Setup

    # Start PostgreSQL service
    docker run -d \
      --name postgres \
      -e POSTGRES_USER=username \
      -e POSTGRES_PASSWORD=password \
      -e POSTGRES_DB=database_name \
      -p 5434:5432 \
      postgres:12
  2. 🔍 Milvus Setup

    # Download Milvus docker-compose file
    wget https://github.com/milvus-io/milvus/releases/download/v2.3.3/milvus-standalone-docker-compose.yml -O docker-compose.yml
    
    # Start Milvus
    docker-compose up -d
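Once both containers are up, a quick reachability check can save debugging time later. The sketch below only tests that the TCP ports accept connections; it does not verify credentials or that the services are healthy. The helper names `port_is_open` and `check_services` are hypothetical. Port 5434 comes from the `docker run` mapping above, and 19530 is assumed to be the port the Milvus standalone compose file exposes.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services):
    """Map each service name to whether its (host, port) is reachable."""
    return {name: port_is_open(host, port) for name, (host, port) in services.items()}


# Ports assumed from the setup steps in this README.
EXPECTED_SERVICES = {
    "PostgreSQL": ("localhost", 5434),
    "Milvus": ("localhost", 19530),
}
```

Calling `check_services(EXPECTED_SERVICES)` returns a dictionary such as `{"PostgreSQL": True, "Milvus": False}`, which makes it easy to see which container still needs attention.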

⚙️ Project Setup

  1. 📦 Clone Repository

    git clone https://github.com/drkhanusa/DNU_PlagiarismChecker.git
    cd DNU_PlagiarismChecker
  2. 🌟 Create Virtual Environment

    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
  3. 📚 Install Dependencies

    pip install -e .
  4. ⚡ Environment Configuration

    # Copy example environment file
    cp .env.example .env
    
    # Edit .env with your settings
    # Example configuration:
    DATABASE_URL=postgresql://username:password@localhost:5434/database_name
    MILVUS_HOST=localhost
    MILVUS_PORT=19530
  5. 🔄 Initialize Database

    # Create database tables
    python setup_database.py
    
    # Initialize Milvus collection
    python create_milvus_db.py
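For readers who want to sanity-check the values from the environment configuration step, the following sketch parses the three variables into a plain dictionary. It is an assumption about how such settings might be consumed, not the project's actual loader: `load_settings` is a hypothetical name, and the variables must already be present in the process environment (a `.env` file is not read automatically without a tool such as python-dotenv).

```python
import os
from urllib.parse import urlsplit


def load_settings(env=None):
    """Parse DATABASE_URL, MILVUS_HOST and MILVUS_PORT into a dictionary."""
    env = os.environ if env is None else env
    url = urlsplit(env["DATABASE_URL"])  # raises KeyError if the variable is missing
    return {
        "db_user": url.username,
        "db_password": url.password,
        "db_host": url.hostname,
        "db_port": url.port or 5432,  # fall back to the PostgreSQL default port
        "db_name": url.path.lstrip("/"),
        "milvus_host": env.get("MILVUS_HOST", "localhost"),
        "milvus_port": int(env.get("MILVUS_PORT", "19530")),
    }


example = load_settings({
    "DATABASE_URL": "postgresql://username:password@localhost:5434/database_name",
    "MILVUS_HOST": "localhost",
    "MILVUS_PORT": "19530",
})
print(example["db_host"], example["db_port"], example["db_name"])
```

Passing a dictionary instead of relying on `os.environ` keeps the function easy to test and makes a mismatched port, such as 5432 instead of the mapped 5434, visible before the database scripts run.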

🚀 Getting Started

⚡ Quick Start

from plagiarism_checker import check_plagiarism_details

# Check a document
results = check_plagiarism_details(
    file_path="path/to/document.pdf",
    min_similarity=0.9
)

# View results
print(f"Overall Similarity: {results['data']['total_percent']}%")
for doc in results['data']['similarity_documents']:
    print(f"Match: {doc['name']} - {doc['similarity_value']}%")
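If the result dictionary has the shape implied by the Quick Start snippet, a small helper can turn it into a ranked summary. This is a sketch under that assumption: `summarize` is a hypothetical function, and the `sample` data is made up purely to show the expected keys (`data`, `total_percent`, `similarity_documents`, `name`, `similarity_value`).

```python
def summarize(results, top_n=3):
    """Format the overall score and the top_n most similar documents."""
    data = results["data"]
    docs = sorted(
        data["similarity_documents"],
        key=lambda d: d["similarity_value"],
        reverse=True,  # highest similarity first
    )
    lines = [f"Overall similarity: {data['total_percent']}%"]
    lines += [f"  {d['name']}: {d['similarity_value']}%" for d in docs[:top_n]]
    return "\n".join(lines)


# Made-up data in the shape the Quick Start snippet reads from.
sample = {
    "data": {
        "total_percent": 42.5,
        "similarity_documents": [
            {"name": "thesis_2022.pdf", "similarity_value": 12.0},
            {"name": "thesis_2023.pdf", "similarity_value": 30.5},
        ],
    }
}
print(summarize(sample))
```

In real use, `sample` would be replaced by the return value of `check_plagiarism_details`.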

📥 Adding Documents to Database

from create_corpus import CorpusCreator

creator = CorpusCreator()
creator.process_document("path/to/document.pdf")
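To load a whole folder of documents rather than a single file, a loop along the following lines may help. It is a sketch, not part of the project: `ingest_directory` is a hypothetical helper, and it takes the processing function as an argument so that it makes no assumptions about `CorpusCreator` beyond the `process_document(path)` call shown above.

```python
from pathlib import Path


def ingest_directory(directory, process, pattern="*.pdf"):
    """Call process(path) for every matching file under directory.

    Returns (succeeded, failed), where failed holds (path, exception) pairs,
    so one unreadable file does not abort the whole batch.
    """
    succeeded, failed = [], []
    for path in sorted(Path(directory).rglob(pattern)):
        try:
            process(str(path))
            succeeded.append(path)
        except Exception as exc:  # keep going and report the failure afterwards
            failed.append((path, exc))
    return succeeded, failed
```

With the class from the snippet above, usage would look like `ingest_directory("path/to/corpus", CorpusCreator().process_document)`, where the directory path is a placeholder.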

📚 Documentation

For detailed documentation, please visit our Wiki.

📝 License

© 2024 AIoTLab, Faculty of Information Technology, DaiNam University. All rights reserved.


Made with 💻 by AIoTLab at DaiNam University

Website • GitHub • Contact Us
