Welcome! This is a hands-on interview task to assess your ability to build and debug a basic Retrieval-Augmented Generation (RAG) system using Python, FAISS, and language model embeddings.
You are given a minimal RAG system that allows users to query a small set of technical documents. The system uses a FastAPI backend with a FAISS index and sentence-transformer-based embeddings.
🔧 However, the current system is returning irrelevant or inaccurate results. Your task is to identify and fix the retrieval logic to improve the quality of answers.
rag_app_debugging/
│
├── app.py # FastAPI app with /ask endpoint
├── rag_utils.py # FAISS + embedding logic (this is your main focus)
├── client.py # Sends test query to the app
├── data/
│ └── docs.txt # Text corpus (technical content)
├── requirements.txt # Python dependencies
└── README.md # This file
- Clone the project and create a virtual environment:
  `python3 -m venv .venv`
  `source .venv/bin/activate` (or `.venv\Scripts\activate` on Windows)
- Install dependencies:
  `pip install -r requirements.txt`
- Start the FastAPI server:
  `uvicorn app:app --reload`
- Run the client to send a query:
  `python client.py`
The current system does a basic embedding + FAISS L2 search. You need to improve the context retrieval quality using any of the following techniques (feel free to implement one or more):
- 🔹 Improve document chunking (e.g., use sentence/paragraph-based or sliding window)
- 🔹 Replace `IndexFlatL2` with `IndexFlatIP` (inner product, which equals cosine similarity once the embeddings are L2-normalized)
- 🔹 Normalize embeddings before indexing/search
- 🔹 Return top-k matches (k > 1) and optionally re-rank them
- 🔹 Use a better embedding model (e.g., `all-MiniLM-L12-v2`, `intfloat/e5-base-v2`)
- 🔹 Add cross-encoder re-ranking if time permits
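As a rough illustration of two of the techniques above, the sketch below shows sentence-based sliding-window chunking and a top-k search by inner product over L2-normalized vectors. It is written in plain Python with toy vectors so it runs without FAISS or an embedding model installed; the function names (`sliding_window_chunks`, `l2_normalize`, `top_k_inner_product`) and the default window sizes are hypothetical and are not taken from `rag_utils.py`.

```python
import math
import re
from typing import List, Sequence, Tuple


def sliding_window_chunks(text: str, window: int = 3, stride: int = 2) -> List[str]:
    """Group sentences into overlapping chunks of `window` sentences,
    moving forward `stride` sentences each step (hypothetical defaults)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    for start in range(0, len(sentences), stride):
        chunks.append(" ".join(sentences[start:start + window]))
        if start + window >= len(sentences):
            break  # this window already reached the last sentence
    return chunks


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_inner_product(
    query: Sequence[float], doc_vectors: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return the k (doc_id, score) pairs with the highest inner product
    after normalizing both sides, i.e. the highest cosine similarity."""
    q = l2_normalize(query)
    scored = []
    for doc_id, vec in enumerate(doc_vectors):
        d = l2_normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# With FAISS, the same steps would roughly map to (on float32 arrays):
#   faiss.normalize_L2(doc_matrix)      # in-place normalization
#   index = faiss.IndexFlatIP(dim)
#   index.add(doc_matrix)
#   faiss.normalize_L2(query_matrix)
#   scores, ids = index.search(query_matrix, k)
```

For a query vector `[1, 0]`, a document vector `[10, 0]` points in exactly the same direction but is far away in raw L2 distance; after normalization its inner product with the query is 1, so it ranks first. That is the kind of mismatch that normalization plus `IndexFlatIP` is meant to remove.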
- The `/ask` endpoint returns relevant and meaningful results for typical technical queries.
- You explain the changes you made and why they improve the results.
- Your code is clean, modular, and well-commented.
Try asking the system:
"What is the name of the war operation by Israel?""What is the reason for the war?""What is President Trump's stance on the war?""What could end the war?"
We’re not expecting a production-grade solution — just a thoughtful and focused approach to improving retrieval quality with clear reasoning. Feel free to leave comments in the code or discuss trade-offs.
If you have any questions during the session, just ask.