diff --git a/README.md b/README.md
index 9ec5d28d..d102f852 100644
--- a/README.md
+++ b/README.md
@@ -94,7 +94,8 @@ Looking to get started with LLMs, vectorDBs, and the world of Generative AI? The
 | --------- | -------------------------- | ----------- |
 | | | |
 | [Build RAG from Scratch](./tutorials/RAG-from-Scratch) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/RAG-from-Scratch/RAG_from_Scratch.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) | |
-| [Langchain LlamaIndex Chunking](./tutorials/RAG-from-Scratch) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/RAG-from-Scratch/RAG_from_Scratch.ipynb)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |
+| [Local RAG from Scratch with Llama3](./tutorials/Local-RAG-from-Scratch) | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./tutorials/Local-RAG-from-Scratch/rag.py) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) | |
+| [Langchain LlamaIndex Chunking](./tutorials/Langchain-LlamaIndex-Chunking) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Langchain-LlamaIndex-Chunking/Langchain_Llamaindex_chunking.ipynb)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/chunking-techniques-with-langchain-and-llamaindex/) |
 | [Product Quantization: Compress High Dimensional Vectors](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) | | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |
 | [Corrective RAG with Langgraph](./tutorials/Corrective-RAG-with_Langgraph/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Corrective-RAG-with_Langgraph/CRAG_with_Langgraph.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/implementing-corrective-rag-in-the-easiest-way-2/)|
 | [LLMs, RAG, & the missing storage layer for AI](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984) | | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984/) |
diff --git a/assets/RAG-locally.png b/assets/RAG-locally.png
new file mode 100644
index 00000000..468a42e7
Binary files /dev/null and b/assets/RAG-locally.png differ
diff --git a/tutorials/Langchain-LlamaIndex-Chunking/README.md b/tutorials/Langchain-LlamaIndex-Chunking/README.md
index 3de1b089..42549094 100644
--- a/tutorials/Langchain-LlamaIndex-Chunking/README.md
+++ b/tutorials/Langchain-LlamaIndex-Chunking/README.md
@@ -4,4 +4,4 @@
 We have comprehensively covered all the chunking techniques available in Langchain and LlamaIndex.
-[Read More in Blog](https://blog.lancedb.com/chunking-techiniques-with-langchain-and-llamaindex/)
\ No newline at end of file
+[Read More in Blog](https://blog.lancedb.com/chunking-techniques-with-langchain-and-llamaindex/)
\ No newline at end of file
diff --git a/tutorials/Local-RAG-from-Scratch/README.md b/tutorials/Local-RAG-from-Scratch/README.md
new file mode 100644
index 00000000..6ca9d41a
--- /dev/null
+++ b/tutorials/Local-RAG-from-Scratch/README.md
@@ -0,0 +1,14 @@
+## Local RAG from Scratch with Llama3
+
+This example demonstrates RAG built from scratch, without any supporting framework such as Langchain or LlamaIndex.
+
+![alt text](<../../assets/RAG-locally.png>)
+
+This easy-to-build local RAG can be set up in the following steps:
+
+1. Read the document and split the text recursively
+2. Set up a LanceDB table with a schema and the LanceDB Embedding API
+3. Insert the chunks into the LanceDB table
+4. Query your question (this step runs a semantic search and uses the Llama3 LLM to generate the answer)
+
+**NOTE:** You can change both the document and the query in `rag.py`. Try running it with your own document and your own questions.
\ No newline at end of file
diff --git a/tutorials/Local-RAG-from-Scratch/lease.txt b/tutorials/Local-RAG-from-Scratch/lease.txt
new file mode 100644
index 00000000..3c61558e
--- /dev/null
+++ b/tutorials/Local-RAG-from-Scratch/lease.txt
@@ -0,0 +1,76 @@
+EX-10 2 elmonteleaseforfiling.htm MATERIAL CONTRACT
+COMMERCIAL LEASE AGREEMENT
+
+
+
+THIS LEASE AGREEMENT is made and entered into on December 1, 2013, by and between Temple CB, LLC, whose address is 4350 Temple City Boulevard, El Monte, California 91731 (hereinafter referred to as "Landlord"), and Okra Energy, Inc., whose address is 4350 Temple City Boulevard, El Monte, California 91731 (hereinafter referred to as "Tenant").
+
+
+
+ARTICLE I - GRANT OF LEASE
+
+
+
+Landlord, in consideration of the rents to be paid and the covenants and agreements to be performed and observed by the Tenant, does hereby lease to the Tenant and the Tenant does hereby lease and take from the Landlord the property described in Exhibit "A" attached hereto and by reference made a part hereof (the "Leased Premises"), together with, as part of the parcel, all improvements located thereon.
+
+
+
+ARTICLE II - LEASE TERM
+
+
+
+Section l. Term of Lease. The term of this Lease shall begin on the Commencement Date, as defined in Section 2 of this Article II, and shall terminate on May 31, 2020 ("the Termination Date"); provided, however, that at the option of Tenant, Tenant may renew this Lease for five additional successive one- year terms at a Monthly Rent of $100,000 per month, provided that notice of such renewal is given in writing no less than 120 days prior to the Termination Date or the expiration of any one-year renewal term. Tenant may at any time cancel this Lease and terminate all of its obligations hereunder by the payment of $300,000, plus all other amounts then due under this Lease.
+
+
+
+Section 2. Commencement Date. The "Commencement Date" shall mean December 1, 2013.
+
+
+
+ARTICLE III - EXTENSIONS
+
+
+
+The parties hereto may elect to extend this Agreement upon such terms and conditions as may be agreed upon in writing and signed by the parties at the time of any such extension.
+
+
+
+ARTICLE IV - DETERMINATION OF RENT
+
+
+
+Section 1. Monthly Rent: The Tenant agrees to pay the Landlord and the Landlord agrees to accept, during the term hereof, at such place as the Landlord shall from time to time direct by notice to the Tenant, monthly rent of $40,000.
+
+
+Section 2. Late Fee. A late fee in the amount of 5% of the Monthly Rent shall be assessed if payment is not postmarked or received by Landlord on or before the tenth day of each month.
+
+
+
+ARTICLE V - SECURITY DEPOSIT
+
+
+
+The Tenant has deposited with the Landlord the sum of Twenty Thousand Dollars ($20,000.00) as security for the full and faithful performance by the Tenant of all the terms of this lease required to be performed by the Tenant. Such sum shall be returned to the Tenant after the expiration of this lease, provided the Tenant has fully and faithfully carried out all of its terms. In the event of a bona fide sale of the property of which the leased premises are a part, the Landlord shall have the right to transfer the security to the purchaser to be held under the terms of this lease, and the Landlord shall be released from all liability for the return of such security to the Tenant.
+
+
+
+ARTICLE VI - TAXES
+
+
+
+Section l. Personal Property Taxes. The Tenant shall be liable for all taxes levied against any leasehold interest of the Tenant or personal property and trade fixtures owned or placed by the Tenant in the Leased Premises.
+
+
+
+Section 2. Real Estate Taxes. During the continuance of this lease Landlord shall deliver to Tenant a copy of any real estate taxes and assessments against the Leased Property. From and after the Commencement Date, the Tenant shall pay to Landlord not later than twenty-one (21) days after the day on which the same may become initially due, all real estate taxes and assessments applicable to the Leased Premises, together with any interest and penalties lawfully imposed thereon as a result of Tenant's late payment thereof, which shall be levied upon the Leased Premises during the term of this Lease.
+
+
+
+Section 3. Contest of Taxes. The Tenant, at its own cost and expense, may, if it shall in good faith so desire, contest by appropriate proceedings the amount of any personal or real property tax. The Tenant may, if it shall so desire, endeavor at any time or times, by appropriate proceedings, to obtain a reduction in the assessed valuation of the Leased Premises for tax purposes. In any such event, if the Landlord agrees, at the request of the Tenant, to join with the Tenant at Tenant's expense in said proceedings and the Landlord agrees to sign and deliver such papers and instruments as may be necessary to prosecute such proceedings, the Tenant shall have the right to contest the amount of any such tax and the Tenant shall have the right to withhold payment of any such tax, if the statute under which the Tenant is contesting such tax so permits.
+
+
+
+Section 4. Payment of Ordinary Assessments. The Tenant shall pay all assessments, ordinary and extraordinary, attributable to or against the Leased Premises not later than twenty-one (21) days after the day on which the same became initially due. The Tenant may take the benefit of any law allowing assessments to be paid in installments and in such event the Tenant shall only be liable for such installments of assessments due during the term hereof.
+
+
+
diff --git a/tutorials/Local-RAG-from-Scratch/rag.py b/tutorials/Local-RAG-from-Scratch/rag.py
new file mode 100644
index 00000000..c55df8e9
--- /dev/null
+++ b/tutorials/Local-RAG-from-Scratch/rag.py
@@ -0,0 +1,111 @@
+import nltk
+import pandas as pd
+
+nltk.download("punkt")
+import re
+import ollama
+
+# lancedb imports for embedding api
+import lancedb
+from lancedb.embeddings import get_registry
+from lancedb.pydantic import LanceModel, Vector
+
+
+# Recursive Text Splitter
+def recursive_text_splitter(text, max_chunk_length=1000, overlap=100):
+    """
+    Helper function for chunking text recursively
+    """
+    # Initialize result
+    result = []
+
+    current_chunk_count = 0
+    separator = r"[\n ]"  # character class: split on newlines and spaces only
+    _splits = re.split(f"({separator})", text)
+    splits = [_splits[0]] + [_splits[i] + _splits[i + 1] for i in range(1, len(_splits), 2)]
+
+    for i in range(len(splits)):
+        if current_chunk_count != 0:
+            chunk = "".join(
+                splits[
+                    current_chunk_count
+                    - overlap : current_chunk_count
+                    + max_chunk_length
+                ]
+            )
+        else:
+            chunk = "".join(splits[0:max_chunk_length])
+
+        if len(chunk) > 0:
+            result.append("".join(chunk))
+        current_chunk_count += max_chunk_length
+
+    return result
+
+
+# define schema for table with embedding api
+
+model = get_registry().get("colbert").create(name="colbert-ir/colbertv2.0")
+
+
+class TextModel(LanceModel):
+    text: str = model.SourceField()
+    vector: Vector(model.ndims()) = model.VectorField()
+
+
+# add in vector db
+def lanceDBConnection(df):
+    """
+    LanceDB insertion
+    """
+    db = lancedb.connect("/tmp/lancedb")
+    table = db.create_table(
+        "scratch",
+        schema=TextModel,
+        mode="overwrite",
+    )
+    table.add(df)
+    return table
+
+
+# Read Document
+with open("lease.txt", "r") as file:
+    text_data = file.read()
+
+# Split the text using the recursive character text splitter
+chunks = recursive_text_splitter(text_data, max_chunk_length=100, overlap=10)
+df = pd.DataFrame({"text": chunks})
+table = lanceDBConnection(df)
+
+# Query Question
+k = 5
+question = "When was this lease document created?"
+
+# Semantic Search
+result = table.search(question).limit(k).to_list()
+context = [r["text"] for r in result]
+
+# Context Prompt
+base_prompt = """You are an AI assistant. Your task is to understand the user question, and provide an answer using the provided contexts. Every answer you generate should have citations in this pattern "Answer [position].", for example: "Earth is round [1][2].", if it's relevant.
+Your answers are correct, high-quality, and written by a domain expert. If the provided context does not contain the answer, simply state, "The provided context does not have the answer."
+
+User question: {}
+
+Contexts:
+{}
+"""
+
+# llm
+prompt = f"{base_prompt.format(question, context)}"
+
+response = ollama.chat(
+    model="llama3",
+    messages=[
+        {
+            "role": "system",
+            "content": prompt,
+        },
+    ],
+)
+
+print(response["message"]["content"])
diff --git a/tutorials/Local-RAG-from-Scratch/requirments.txt b/tutorials/Local-RAG-from-Scratch/requirments.txt
new file mode 100644
index 00000000..eaf8317a
--- /dev/null
+++ b/tutorials/Local-RAG-from-Scratch/requirments.txt
@@ -0,0 +1,4 @@
+ollama
+nltk
+pandas
+lancedb
\ No newline at end of file
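
Note on the chunking step in `rag.py`: the windowing logic can be tried in isolation, without LanceDB or Ollama installed. The following is a minimal sketch and not the code from this patch. The function name `chunk_with_overlap` and the parameter `max_pieces` are hypothetical, and it assumes the intent is to split on spaces and newlines, advance by `max_pieces` pieces per chunk, and prepend the last `overlap` pieces of the previous window to every chunk after the first.

```python
import re


def chunk_with_overlap(text, max_pieces=100, overlap=10):
    """Split text into chunks of word-like pieces with a small overlap.

    Hypothetical helper for illustration only; it is not part of rag.py.
    """
    # A piece is a run of non-separator characters plus the single
    # separator (space or newline) that follows it, so joining all
    # pieces reproduces the input text exactly.
    pieces = re.findall(r"[^\n ]*[\n ]|[^\n ]+", text)

    chunks = []
    for start in range(0, len(pieces), max_pieces):
        # Reach back `overlap` pieces so neighbouring chunks share context.
        window = pieces[max(0, start - overlap) : start + max_pieces]
        chunks.append("".join(window))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_with_overlap("a b c d e f g h i j", max_pieces=4, overlap=1):
        print(repr(chunk))
```

Because the stride equals `max_pieces`, every chunk after the first holds up to `max_pieces + overlap` pieces. A stride of `max_pieces - overlap` would keep all chunks the same size instead, at the cost of producing more chunks.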