Locally RAG from Scratch (#171)
* assests and app name

* update README

* demo gifs

* talk with github codespaces

* talk with github codespaces

* gitignore

* linted

* added version

* link fix

* added local llm tag

* crag

* link fix

* lint

* llm tags

* non-clickable badge

* non-clickable badge

* fix

* tutorial llm tags

* added instructions and fix

* colab fix

* fix

* formatted

* hybrid search and rag colab

* colab format

* python test

* node test

* python test

* blog link update

* rag mlx

* myntra search engine app

* link fix

* CrewAI Example

* lint

* node test

* node test

* node test

* added readme

* support for Gemini Pro

* fix

* chunking techniques

* lint

* Locally RAG from Scratch

* lint

* llama3 added

* link finx
PrashantDixit0 authored Apr 21, 2024
1 parent 6db3887 commit e4a8ec2
Showing 7 changed files with 208 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -94,7 +94,8 @@ Looking to get started with LLMs, vectorDBs, and the world of Generative AI? The
| --------- | -------------------------- | ----------- |
| | | |
| [Build RAG from Scratch](./tutorials/RAG-from-Scratch) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/RAG-from-Scratch/RAG_from_Scratch.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) | |
-| [Langchain LlamaIndex Chunking](./tutorials/RAG-from-Scratch) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/RAG-from-Scratch/RAG_from_Scratch.ipynb)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |
+| [Local RAG from Scratch with Llama3](./tutorials/Local-RAG-from-Scratch) | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./tutorials/Local-RAG-from-Scratch/rag.py) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) | |
+| [Langchain LlamaIndex Chunking](./tutorials/Langchain-LlamaIndex-Chunking) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Langchain-LlamaIndex-Chunking/Langchain_Llamaindex_chunking.ipynb)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/chunking-techniques-with-langchain-and-llamaindex/) |
| [Product Quantization: Compress High Dimensional Vectors](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) | | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |
| [Corrective RAG with Langgraph](./tutorials/Corrective-RAG-with_Langgraph/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Corrective-RAG-with_Langgraph/CRAG_with_Langgraph.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/implementing-corrective-rag-in-the-easiest-way-2/)|
| [LLMs, RAG, & the missing storage layer for AI](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984) | | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984/) |
Binary file added assets/RAG-locally.png
2 changes: 1 addition & 1 deletion tutorials/Langchain-LlamaIndex-Chunking/README.md
@@ -4,4 +4,4 @@

We have comprehensively covered all the chunking techniques available in Langchain and LlamaIndex.

-[Read More in Blog](https://blog.lancedb.com/chunking-techiniques-with-langchain-and-llamaindex/)
+[Read More in Blog](https://blog.lancedb.com/chunking-techniques-with-langchain-and-llamaindex/)
14 changes: 14 additions & 0 deletions tutorials/Local-RAG-from-Scratch/README.md
@@ -0,0 +1,14 @@
## Local RAG from Scratch with Llama3

This example demonstrates RAG built from scratch, without any supporting framework such as Langchain or LlamaIndex.

![alt text](<../../assets/RAG-locally.png>)

You can build this local RAG pipeline in the following steps:

1. Read the document and split it with the recursive text splitter
2. Set up a LanceDB table with a schema and the LanceDB Embedding API
3. Insert the chunks into the LanceDB table
4. Query with your question (this step runs a semantic search and uses the Llama3 LLM to generate the answer)

**NOTE:** You can change both the document and the query in `rag.py`. Try running it with your own document and your own questions.
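
To give a feel for what the semantic search in step 4 does, here is a minimal, standard-library-only sketch. It is not the code in `rag.py`: a bag-of-words count vector stands in for the embedding model, and the names `embed`, `cosine`, and `top_k` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, loosely based on the sample lease document
chunks = [
    "monthly rent of $40,000",
    "security deposit of twenty thousand dollars",
    "the lease shall terminate on May 31, 2020",
]
best = top_k("what is the monthly rent", chunks, k=1)
print(best)
```

In the actual example, LanceDB computes the embeddings and runs the nearest-neighbour search; the ranking idea is the same, with learned vectors instead of word counts.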
76 changes: 76 additions & 0 deletions tutorials/Local-RAG-from-Scratch/lease.txt
@@ -0,0 +1,76 @@
EX-10 2 elmonteleaseforfiling.htm MATERIAL CONTRACT
COMMERCIAL LEASE AGREEMENT



THIS LEASE AGREEMENT is made and entered into on December 1, 2013, by and between Temple CB, LLC, whose address is 4350 Temple City Boulevard, El Monte, California 91731 (hereinafter referred to as "Landlord"), and Okra Energy, Inc., whose address is 4350 Temple City Boulevard, El Monte, California 91731 (hereinafter referred to as "Tenant").



ARTICLE I - GRANT OF LEASE



Landlord, in consideration of the rents to be paid and the covenants and agreements to be performed and observed by the Tenant, does hereby lease to the Tenant and the Tenant does hereby lease and take from the Landlord the property described in Exhibit "A" attached hereto and by reference made a part hereof (the "Leased Premises"), together with, as part of the parcel, all improvements located thereon.



ARTICLE II - LEASE TERM



Section 1. Term of Lease. The term of this Lease shall begin on the Commencement Date, as defined in Section 2 of this Article II, and shall terminate on May 31, 2020 ("the Termination Date"); provided, however, that at the option of Tenant, Tenant may renew this Lease for five additional successive one-year terms at a Monthly Rent of $100,000 per month, provided that notice of such renewal is given in writing no less than 120 days prior to the Termination Date or the expiration of any one-year renewal term. Tenant may at any time cancel this Lease and terminate all of its obligations hereunder by the payment of $300,000, plus all other amounts then due under this Lease.



Section 2. Commencement Date. The "Commencement Date" shall mean December 1, 2013.



ARTICLE III - EXTENSIONS



The parties hereto may elect to extend this Agreement upon such terms and conditions as may be agreed upon in writing and signed by the parties at the time of any such extension.



ARTICLE IV - DETERMINATION OF RENT



Section 1. Monthly Rent: The Tenant agrees to pay the Landlord and the Landlord agrees to accept, during the term hereof, at such place as the Landlord shall from time to time direct by notice to the Tenant, monthly rent of $40,000.


Section 2. Late Fee. A late fee in the amount of 5% of the Monthly Rent shall be assessed if payment is not postmarked or received by Landlord on or before the tenth day of each month.



ARTICLE V - SECURITY DEPOSIT



The Tenant has deposited with the Landlord the sum of Twenty Thousand Dollars ($20,000.00) as security for the full and faithful performance by the Tenant of all the terms of this lease required to be performed by the Tenant. Such sum shall be returned to the Tenant after the expiration of this lease, provided the Tenant has fully and faithfully carried out all of its terms. In the event of a bona fide sale of the property of which the leased premises are a part, the Landlord shall have the right to transfer the security to the purchaser to be held under the terms of this lease, and the Landlord shall be released from all liability for the return of such security to the Tenant.



ARTICLE VI - TAXES



Section 1. Personal Property Taxes. The Tenant shall be liable for all taxes levied against any leasehold interest of the Tenant or personal property and trade fixtures owned or placed by the Tenant in the Leased Premises.



Section 2. Real Estate Taxes. During the continuance of this lease Landlord shall deliver to Tenant a copy of any real estate taxes and assessments against the Leased Property. From and after the Commencement Date, the Tenant shall pay to Landlord not later than twenty-one (21) days after the day on which the same may become initially due, all real estate taxes and assessments applicable to the Leased Premises, together with any interest and penalties lawfully imposed thereon as a result of Tenant's late payment thereof, which shall be levied upon the Leased Premises during the term of this Lease.



Section 3. Contest of Taxes. The Tenant, at its own cost and expense, may, if it shall in good faith so desire, contest by appropriate proceedings the amount of any personal or real property tax. The Tenant may, if it shall so desire, endeavor at any time or times, by appropriate proceedings, to obtain a reduction in the assessed valuation of the Leased Premises for tax purposes. In any such event, if the Landlord agrees, at the request of the Tenant, to join with the Tenant at Tenant's expense in said proceedings and the Landlord agrees to sign and deliver such papers and instruments as may be necessary to prosecute such proceedings, the Tenant shall have the right to contest the amount of any such tax and the Tenant shall have the right to withhold payment of any such tax, if the statute under which the Tenant is contesting such tax so permits.



Section 4. Payment of Ordinary Assessments. The Tenant shall pay all assessments, ordinary and extraordinary, attributable to or against the Leased Premises not later than twenty-one (21) days after the day on which the same became initially due. The Tenant may take the benefit of any law allowing assessments to be paid in installments and in such event the Tenant shall only be liable for such installments of assessments due during the term hereof.



111 changes: 111 additions & 0 deletions tutorials/Local-RAG-from-Scratch/rag.py
@@ -0,0 +1,111 @@
import nltk
import pandas as pd

nltk.download("punkt")
import re
import ollama

# lancedb imports for embedding api
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector


# Recursive Text Splitter
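def recursive_text_splitter(text, max_chunk_length=1000, overlap=100):
    """
    Helper function for chunking text into overlapping windows.

    max_chunk_length and overlap are counted in split pieces (roughly words),
    not in characters.
    """
    # Initialize result
    result = []

    current_chunk_count = 0
    # Split on newlines and spaces; the capture group keeps each separator
    separators = ["\n", " "]
    pattern = "|".join(re.escape(sep) for sep in separators)
    _splits = re.split(f"({pattern})", text)
    # Re-attach each separator to the piece that precedes it (keeps the first piece)
    splits = [
        _splits[i] + (_splits[i + 1] if i + 1 < len(_splits) else "")
        for i in range(0, len(_splits), 2)
    ]

    while current_chunk_count < len(splits):
        # Step back by `overlap` pieces so consecutive chunks share context
        start = max(current_chunk_count - overlap, 0)
        chunk = "".join(splits[start : current_chunk_count + max_chunk_length])

        if len(chunk) > 0:
            result.append(chunk)
        current_chunk_count += max_chunk_length

    return result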


# define schema for table with embedding api

model = get_registry().get("colbert").create(name="colbert-ir/colbertv2.0")


class TextModel(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()


# add in vector db
def lanceDBConnection(df):
    """
    LanceDB insertion
    """
    db = lancedb.connect("/tmp/lancedb")
    table = db.create_table(
        "scratch",
        schema=TextModel,
        mode="overwrite",
    )
    table.add(df)
    return table


# Read Document
with open("lease.txt", "r") as file:
    text_data = file.read()

# Split the text using the recursive character text splitter
chunks = recursive_text_splitter(text_data, max_chunk_length=100, overlap=10)
df = pd.DataFrame({"text": chunks})
table = lanceDBConnection(df)

# Query Question
k = 5
question = "When was this lease document created?"

# Semantic Search: retrieve the k chunks closest to the question
result = table.search(question).limit(k).to_list()
context = [r["text"] for r in result]

# Context Prompt
base_prompt = """You are an AI assistant. Your task is to understand the user question and provide an answer using the provided contexts. Every answer you generate should have citations in this pattern: "Answer [position].", for example: "Earth is round [1][2].", if relevant.
Your answers are correct, high-quality, and written by a domain expert. If the provided context does not contain the answer, simply state, "The provided context does not have the answer."
User question: {}
Contexts:
{}
"""

# llm
prompt = base_prompt.format(question, context)

response = ollama.chat(
    model="llama3",
    messages=[
        {
            "role": "system",
            "content": prompt,
        },
    ],
)

print(response["message"]["content"])
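
The prompt asks the model to cite contexts by position (for example `[1][2]`), but `context` is a plain Python list, so `str.format` passes it to the model as a list repr with no visible numbering. One possible refinement, sketched here as a suggestion, is to number the chunks before formatting the prompt. `format_contexts` is a hypothetical helper that is not part of this commit, and the sample strings are made up for the demonstration.

```python
def format_contexts(contexts):
    """Render retrieved chunks as a numbered list: '[1] ...', '[2] ...'."""
    return "\n".join(
        f"[{position}] {text.strip()}"
        for position, text in enumerate(contexts, start=1)
    )


# Hypothetical retrieved chunks, standing in for the `context` list in rag.py
contexts = [
    "Commencement Date shall mean December 1, 2013.\n",
    "monthly rent of $40,000",
]
numbered = format_contexts(contexts)
print(numbered)
```

With this helper, the prompt would be built as `base_prompt.format(question, format_contexts(context))`, so each `[position]` citation in the answer points at a specific retrieved chunk.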
4 changes: 4 additions & 0 deletions tutorials/Local-RAG-from-Scratch/requirments.txt
@@ -0,0 +1,4 @@
ollama
nltk
pandas
lancedb
