(DOCSP-50370): Create new LangChain self-query retrieval notebook #21

Open
wants to merge 1 commit into
base: main
from

Conversation

davidhou17
Collaborator

@davidhou17 davidhou17 commented May 28, 2025

Collaborator

@dacharyc dacharyc left a comment

Everything works, but I've got a couple of nits about re-declaring things we've already declared, and about some of the filter results. Non-blocking comments below!

"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o\")\n",

Nit: in the context of this notebook, we're re-declaring an llm that we already declared above in ln 233. I'd probably omit this line, and also omit the related import (from langchain_openai import ChatOpenAI) in ln 343 above.

I also don't love re-declaring the retriever with one additional param. It would be great if we could set enable_limit when we initially declare the retriever in ln 234, and then remove the re-initializing here.

It makes sense to have these things on a docs page if we want this to be a stand-alone code example, but here in the context of the notebook, it's not needed.
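
To make the suggestion concrete, here's a rough sketch of declaring the retriever once with the limit enabled. It assumes the notebook builds the retriever with SelfQueryRetriever.from_llm, which isn't visible in this diff excerpt; build_retriever and its parameter names are hypothetical placeholders for whatever the notebook already defines around ln 233-234.

```python
def build_retriever(llm, vector_store, document_content_description, metadata_field_info):
    """Build the self-query retriever once, with limit support turned on.

    Hypothetical helper: the argument names stand in for the objects the
    notebook already declares; they are not taken from the PR itself.
    """
    # Imported lazily so this sketch can be defined without LangChain installed.
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    return SelfQueryRetriever.from_llm(
        llm,
        vector_store,
        document_content_description,
        metadata_field_info,
        enable_limit=True,  # set here so later cells don't need to re-create the retriever
    )
```

With something like this in the initial cell, the later "limit" section could reuse the same retriever and llm rather than re-initializing both.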

"id": "833d90d9",
"metadata": {},
"source": [
"### Queries with filters"

I got some query results that seem unrelated to the filter. For example, for "toys", I got this document:

Document(id='685eaec1edc703d86a4c7201', metadata={'_id': '685eaec1edc703d86a4c7201', 'year': 1979, 'rating': 9.9, 'genre': 'science fiction'}, page_content='Three men walk into the Zone, three men walk out of the Zone')

For thriller and action, I got this document:

Document(id='685eaec1edc703d86a4c7203', metadata={'_id': '685eaec1edc703d86a4c7203', 'year': 1995, 'genre': 'animated', 'rating': 9.3}, page_content='Toys come alive and have a blast doing so')

I'm sure this is related to the limited amount of sample data we're providing, but returning these seemingly unrelated results doesn't show off the feature well. I wonder if we want to add more sample data so that only obviously related results are retrieved?
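
For what it's worth, a quick check like the one below would flag this kind of mismatch after adding sample data. It's only a sketch: it assumes the "thriller and action" query is meant to restrict results by genre, it reduces the document quoted above to a plain dict, and unexpected_genres is a hypothetical helper rather than anything LangChain provides.

```python
# The document returned for the thriller/action query, copied from the result above
# and reduced to a plain dict (the _id field is omitted for brevity).
results = [
    {
        "page_content": "Toys come alive and have a blast doing so",
        "metadata": {"year": 1995, "genre": "animated", "rating": 9.3},
    },
]


def unexpected_genres(docs, allowed_genres):
    """Return the docs whose metadata genre is not one of allowed_genres."""
    allowed = set(allowed_genres)
    return [doc for doc in docs if doc["metadata"].get("genre") not in allowed]


# Assumption: the query should only return thriller or action movies.
offenders = unexpected_genres(results, ["thriller", "action"])
for doc in offenders:
    print(doc["metadata"]["genre"], "-", doc["page_content"])
```

If the list of offenders is non-empty for a filtered query, either the filter isn't being applied or the sample data is too sparse to illustrate it.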

2 participants