AksaRose/Semantic-Search-Engine

Semantic Search Engine

Semantic Search Engine is an intelligent PDF-based search tool that allows you to ask natural language questions and get semantically relevant answers from documents.
It uses Sentence Transformers for embeddings and FAISS for vector similarity search.

I didn't use LangChain. LangChain has a lot of abstractions that would have made this a lot simpler and more efficient, but I didn't want it to be simple, so I went the traditional way.



✨ Features

  • PDF Reader – Upload and parse text from PDF files.
  • Text Chunking – Splits large documents into smaller sentence chunks for better embedding and retrieval.
  • Embeddings – Uses all-MiniLM-L6-v2 from Sentence Transformers to generate semantic embeddings.
  • Vector Search – Leverages FAISS to store and query embeddings efficiently.
  • Question Answering – Ask a question like "What is the relevance of Blockchain?" and retrieve the most relevant chunks.
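To make the Vector Search feature concrete, here is a minimal sketch of what an exact nearest-neighbour lookup computes. It is a plain-Python stand-in, not FAISS and not the code in this repository: the function names (`squared_l2`, `search`) and the toy 2-dimensional vectors are assumptions made for illustration. It ranks by squared L2 distance, which is also what a flat L2 index in FAISS reports.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(vectors, query, k=2):
    """Return (distances, indices) of the k stored vectors closest to query.

    Brute force: every stored vector is compared with the query, which is
    what an exact (flat) index does conceptually.
    """
    scored = ((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = heapq.nsmallest(k, scored)  # smallest distance first
    return [d for d, _ in top], [i for _, i in top]


# Toy "embeddings"; real ones from all-MiniLM-L6-v2 have 384 dimensions.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = search(vectors, [0.9, 0.1], k=2)
print(distances, indices)
```

A smaller distance means a closer match, so the first index returned points at the most relevant stored vector.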

⚡ Example

python main.py

Sample output:

Total chunks: 42
2
[[0.23 0.45]] [[12 33]]
['Blockchain enables secure and transparent transactions ...'] distance: 0.23
['Distributed ledgers provide ...'] distance: 0.45
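The two bracketed arrays in the sample output look like a (distances, indices) pair with one row per query, where each index points back into the list of chunks. That reading is inferred from the sample rather than stated here, so treat the sketch below as an illustration: `format_hits` and the placeholder chunk texts are hypothetical and not part of `main.py`.

```python
def format_hits(chunks, distances, indices):
    """Pair each returned index with its chunk text and distance.

    distances and indices are 2-D (one row per query); only the first
    query's row is read here. Works with nested lists or NumPy arrays.
    """
    hits = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx < 0:
            # FAISS pads with -1 when fewer than k neighbours exist.
            continue
        hits.append((chunks[int(idx)], float(dist)))
    return hits


# Placeholder data shaped like the sample output (42 chunks, top 2 hits).
chunks = [f"chunk {i}" for i in range(42)]
distances = [[0.23, 0.45]]
indices = [[12, 33]]

for text, dist in format_hits(chunks, distances, indices):
    print([text], "distance:", dist)
```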

🛠️ How It Works

  1. Extract text from PDFs using PyPDF2.
  2. Tokenize sentences using NLTK.
  3. Split into chunks (approx. 100 words each).
  4. Encode sentences into embeddings using Sentence Transformers.
  5. Build a FAISS index to store embeddings.
  6. Search queries against the index to find relevant chunks.
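Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: a simple regex splitter stands in for NLTK's sentence tokenizer so the example runs without downloads, and the names `split_sentences`, `chunk_sentences`, and `max_words` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    A stand-in for a real sentence tokenizer; it will mis-split
    abbreviations such as "e.g. this".
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences, max_words=100):
    """Group whole sentences into chunks of at most about max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Blockchain is a ledger. It is distributed! Who maintains it? Every node does."
chunks = chunk_sentences(split_sentences(text), max_words=8)
print("Total chunks:", len(chunks))
```

Keeping sentences intact means each chunk stays readable when it is shown as a search result, at the cost of chunks that vary somewhat in length.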

📦 Prerequisites

  • Python 3.9+ (tested on 3.11 / 3.13)
  • Virtual environment (venv) recommended

Install dependencies:

pip install PyPDF2 nltk sentence-transformers faiss-cpu numpy
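After installing, a quick check like the one below can confirm that each package is importable from the active environment. It is an optional sketch, not part of the project: `REQUIRED` and `missing_packages` are hypothetical names, and the mapping simply records that some pip package names differ from their import names (for example, `faiss-cpu` is imported as `faiss`).

```python
import importlib.util

# pip package name -> name used in "import ..."
REQUIRED = {
    "PyPDF2": "PyPDF2",
    "nltk": "nltk",
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "numpy": "numpy",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing:", " ".join(missing))
else:
    print("All dependencies found.")
```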
