PgVector embedder parameter is not accepting anything other than its default OpenAI Embeddings. Help needed! #1746

Open
Cipher-unhsiV opened this issue Jan 10, 2025 · 1 comment

Comments

@Cipher-unhsiV

@manthanguptaa Following your instructions in issue #1736, I tried several ways of using an open-source embedder, but none of them worked. I tried the following embedders from the phidata docs:

  1. MistralAI
  2. Together
  3. Huggingface
  4. SentenceTransformers

I ran into multiple errors, including a SQLAlchemy dimensionality mismatch, an httpx ReadTimeout, a pydantic_core validation error, and an incompatible NumPy version, to mention just some. This is meant to be a simple agentic RAG setup that reads a PDF from a URL via PDFUrlKnowledgeBase, stores it in PgVector2, and answers a predefined user query using the knowledge_base, but getting it working has become tedious and involved. Please help me with this! Here is the snippet to illustrate the scenario:

import typer
from phi.agent import Agent, RunResponse
from typing import Optional,List
from phi.assistant import Assistant
from phi.model.deepseek import DeepSeekChat
from phi.model.groq import Groq
from phi.storage.assistant.postgres import PgAssistantStorage
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.pgvector import PgVector2
from phi.embedder.mistral import MistralEmbedder
from phi.embedder.huggingface import HuggingfaceCustomEmbedder
from phi.embedder.together import TogetherEmbedder
from phi.embedder.sentence_transformer import SentenceTransformerEmbedder

import os
from dotenv import load_dotenv
load_dotenv()
os.environ["GROQ_API_KEY"]=os.getenv("GROQ_API_KEY")

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base = PDFUrlKnowledgeBase(
    urls=['https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf'],
    vector_db=PgVector2(
        collection="recipies",
        db_url=db_url,
        embedder=SentenceTransformerEmbedder(dimensions=1536),  # issue here
    ),
)

knowledge_base.load(recreate=True, upsert=True)
#knowledge_base.load()

storage=PgAssistantStorage(table_name="pdf-assistant",db_url=db_url)

agent = Agent(
    model=Groq(id="llama-3.3-70b-versatile"),
    #model = SentenceTransformer('all-mpnet-base-v2', truncate_dim=384),
    knowledge=knowledge_base,
    storage=storage,
)

response: RunResponse = agent.run("What is the recipe for chicken curry?")
res = response.content

@manthanguptaa KINDLY DON'T CLOSE THIS ISSUE UNTIL I CONFIRM WHETHER THE FIX WORKS ON MY LOCAL SETUP
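
One way to narrow down the dimensionality error mentioned above is to compare the length of the vector an embedder actually returns with the dimension declared for the vector column. Many SentenceTransformers models return far fewer than 1536 values (all-MiniLM-L6-v2, for example, returns 384). The sketch below is library-independent: `check_embedding_dimensions` and `fake_embedder_384` are hypothetical helpers written for illustration and are not part of phidata. It also assumes, without confirmation from this thread, that the `dimensions` argument is what sets the width of the pgvector column.

```python
from typing import Callable, List, Sequence


def check_embedding_dimensions(
    embed: Callable[[str], Sequence[float]],
    declared_dimensions: int,
    probe_text: str = "dimension probe",
) -> int:
    """Embed one probe string and compare its length to the declared column width."""
    actual = len(embed(probe_text))
    if actual != declared_dimensions:
        raise ValueError(
            f"Embedder returned {actual} values, but the vector column "
            f"is declared with {declared_dimensions} dimensions."
        )
    return actual


def fake_embedder_384(text: str) -> List[float]:
    # Stand-in for a real embedder (illustration only): a fixed-length zero vector.
    return [0.0] * 384


# Matching sizes pass and return the measured length.
print(check_embedding_dimensions(fake_embedder_384, 384))

# A mismatch like the one suspected in the snippet above is reported up front.
try:
    check_embedding_dimensions(fake_embedder_384, 1536)
except ValueError as exc:
    print(exc)
```

To try this against a real embedder, pass whatever callable turns a string into a list of floats in place of `fake_embedder_384`, and run the check before calling `knowledge_base.load(...)`.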

@dirkbrnd
Contributor

Hi @Cipher-unhsiV
I suggest using PgAgentStorage instead of PgAssistantStorage, which is deprecated. Also use PgVector instead of PgVector2, which is also deprecated.

I'll let @manthanguptaa test after that, if that's OK.
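
For reference, here is a rough sketch of how the suggested swap might look when applied to the snippet from the original post. Several details are assumptions to verify against the installed phidata version: the import path `phi.storage.agent.postgres`, the `table_name` keyword on PgVector (in place of PgVector2's `collection`), and `dimensions=384`, which is only appropriate if the SentenceTransformers model in use returns 384-dimensional vectors. `build_agent` is a hypothetical wrapper; it has not been confirmed in this thread as a fix.

```python
def build_agent(db_url: str = "postgresql+psycopg://ai:ai@localhost:5532/ai"):
    """Build the knowledge base and agent using the non-deprecated classes."""
    # Imports live inside the function so the sketch can be loaded without
    # phidata installed; calling it still needs phidata and a running database.
    from phi.agent import Agent
    from phi.embedder.sentence_transformer import SentenceTransformerEmbedder
    from phi.knowledge.pdf import PDFUrlKnowledgeBase
    from phi.model.groq import Groq
    from phi.storage.agent.postgres import PgAgentStorage  # assumed import path
    from phi.vectordb.pgvector import PgVector

    knowledge_base = PDFUrlKnowledgeBase(
        urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
        vector_db=PgVector(
            table_name="recipies",  # assumed replacement for `collection`
            db_url=db_url,
            # Assumption: 384 matches the output size of the embedding model.
            embedder=SentenceTransformerEmbedder(dimensions=384),
        ),
    )

    storage = PgAgentStorage(table_name="pdf-assistant", db_url=db_url)

    agent = Agent(
        model=Groq(id="llama-3.3-70b-versatile"),
        knowledge=knowledge_base,
        storage=storage,
    )
    return knowledge_base, agent
```

With phidata installed and the database running, usage would follow the original snippet: call `build_agent()`, then `knowledge_base.load(recreate=True, upsert=True)`, then `agent.run(...)`.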
