Atlas search lookups #325

WaVEV · 2025-06-24T14:29:51Z

This PR adds the initial implementation of the Atlas operator.

Task:

django_mongodb_backend/functions.py

WaVEV · 2025-07-07T03:20:07Z

django_mongodb_backend/compiler.py

@@ -207,9 +243,36 @@ def _build_aggregation_pipeline(self, ids, group):
                pipeline.append({"$unset": "_id"})
        return pipeline

+    def _compound_searches_queries(self, search_replacements):


I want to keep this function for future use: I will probably want to implement hybrid search, and this part of the code could be useful. I know it is odd to check that the replacements have length 1 and then iterate over them. The exception could also be raised before this point. Let me know if you want me to refactor this code.

timgraham · 2025-07-22T01:35:14Z

tests/queries_/test_search.py

+    def _tear_down(self, model):
+        collection = self._get_collection(model)
+        for search_indexes in collection.list_search_indexes():
+            collection.drop_search_index(search_indexes["name"])
+        collection.delete_many({})


Could you add a comment explaining why this is necessary?

Data persists between tests in the same test class. Is this the way to get rid of it, or am I missing something?

I think I need it because of TransactionTestCase: it does not wrap each test in a transaction that gets rolled back. But I'm not 100% sure.

TransactionTestCase, and TestCase when transactions aren't supported, use flush to clear the database between tests. flush uses delete_many(), so yes, it's necessary to clean up the indexes but not the collection. I think create_search_index could add the cleanup collection.drop_search_index(search_indexes["name"]) (or something similar), so that the list_search_indexes() isn't needed.

I will try to fix it. But if I remove this line, some tests fail because the data from the previous test is still in the collection.

The data was not cleaned up because I hadn't defined any available_apps. If I define it, I need to create the data in setUp, which increases the test runtime.
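
A minimal sketch of the suggestion above, where the helper that creates a search index also registers its removal, so no `list_search_indexes()` pass is needed at teardown. The mixin name and the `create_search_index(collection, name, definition)` signature are assumptions for illustration and are not taken from the PR. The only PyMongo calls used are `Collection.create_search_index()` and `Collection.drop_search_index()`.

```python
import unittest


class SearchIndexCleanupMixin:
    """Hypothetical test mixin: pair each index creation with its removal."""

    def create_search_index(self, collection, name, definition):
        # PyMongo accepts a mapping with "definition" and an optional "name".
        collection.create_search_index({"name": name, "definition": definition})
        # addCleanup callbacks run after the test, even when the test fails.
        self.addCleanup(collection.drop_search_index, name)
```

Documents are left to the regular flush between tests; only the indexes need explicit removal.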

timgraham · 2025-07-22T01:41:13Z

tests/queries_/test_search.py

+        self.create_search_index(
+            Article,
+            "equals_headline_index",
+            {
+                "mappings": {
+                    "dynamic": False,
+                    "fields": {"headline": {"type": "token"}, "number": {"type": "number"}},
+                }
+            },
+        )


Could we do the index creation/teardown in setupClass? (I would guess indexes aren't modified by any tests?)
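
One way the class-level setup asked about here could look, as a sketch only. The `search_indexes` attribute and the `get_collection()` hook are hypothetical names, not part of the PR. It relies on `unittest`'s `addClassCleanup()` (Python 3.8+) so that the indexes are created once per class and dropped once after the last test.

```python
import unittest


class SearchIndexClassSetupMixin:
    """Hypothetical mixin: create search indexes once per test class."""

    # Assumed shape: an iterable of (index_name, definition) pairs.
    search_indexes = ()

    @classmethod
    def get_collection(cls):
        # Assumed hook: subclasses return the PyMongo collection to index.
        raise NotImplementedError

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        collection = cls.get_collection()
        for name, definition in cls.search_indexes:
            collection.create_search_index({"name": name, "definition": definition})
            # Class cleanups run once, after every test in the class has run.
            cls.addClassCleanup(collection.drop_search_index, name)
```

This only works if no test modifies the indexes, which is the guess made in the comment above.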

timgraham · 2025-07-22T01:50:51Z

tests/queries_/test_search.py

+    def test_constant_score(self):
+        constant_score = SearchScoreOption({"constant": {"value": 10}})
+        qs = Article.objects.annotate(score=SearchExists(path="body", score=constant_score))
+        self.wait_for_assertion(lambda: self.assertCountEqual(qs.all(), [self.article]))


While I like that wait_for_assertion is a relatively generic API, it really seems like a lot of boilerplate with lambda, all(), ... We may want to think about possibly providing some public test class mixin with assertion helpers for users (which we could also use in this file).

I tried to do something like you mentioned and I didn't find a solution, but I will try again.

Well, I tried a delayed assertion. It is not perfect, but it is usable.
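
For readers following the thread, a delayed assertion helper of this kind could be sketched as below. The `wait_for_assertion` name appears in the test above, but its signature here (`timeout`, `interval`) and the `assertCountEqualEventually` wrapper are assumptions, not the PR's actual implementation.

```python
import time
import unittest


class DelayedAssertionMixin:
    """Hypothetical mixin: retry an assertion until it passes or times out."""

    def wait_for_assertion(self, assertion, timeout=5.0, interval=0.1):
        deadline = time.monotonic() + timeout
        while True:
            try:
                return assertion()
            except AssertionError:
                # Re-raise the last failure once the time budget is spent.
                if time.monotonic() >= deadline:
                    raise
                time.sleep(interval)

    def assertCountEqualEventually(self, get_actual, expected, **kwargs):
        # get_actual is re-evaluated on every attempt, e.g. a queryset's .all.
        self.wait_for_assertion(
            lambda: self.assertCountEqual(get_actual(), expected), **kwargs
        )
```

With such a wrapper, the test body could shrink to something like `self.assertCountEqualEventually(qs.all, [self.article])`, which removes the lambda from each call site.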

More generally, what's the reason the query needs to be fetched this way? Executing the same query a few times in a row doesn't return the correct results until some time has passed?

🤔 . At some point, Atlas will have synchronized the new data. Then, the query will retrieve it, so we need to wait until the new objects are available.

Is there any MongoDB documentation about this? I don't see any mention of having to retry in the example at https://www.mongodb.com/docs/atlas/atlas-search/tutorial/. It seems unbelievable from a usability perspective. How are querysets going to be used outside of tests? Do we need to document a special pattern? Is there no distinction between "no results" and "query hasn't synced yet"?

🤔 I summon @Jibola to avoid saying something that is not true. What I was trying to say is that when a new index is created or data is added, there is a short delay before it gets indexed. If I run a query immediately after creating an index, it retrieves nothing, but if I wait a second, the value is returned correctly. I don't know whether this indexing delay is documented, but I got the idea from langchain.

Maybe only the index creation needs time, but I don't know. 😬
Docarray does the same:
https://github.com/docarray/docarray/blob/main/tests/index/mongo_atlas/__init__.py#L32

Whew, that makes a lot more sense than the previous theory! Depending on how long the waiting could take, we may want to consider having SchemaEditor.add_index() do the waiting: Django migrations assume all operations run synchronously, so a data migration that follows a schema migration assumes that the previous operations have completed. (If not, it would be a caveat to document.) If we do have the schema editor wait, you could use it to create the indexes in tests. If not, I guess waiting after test index creation is the way to go.

Well, I found the docs. They say: "This means that data inserted into a MongoDB collection and indexed by Atlas Search will not be available immediately for $search queries."

Totally get the confusion here! It bamboozled me too the first time I ran into the problem.

I would say that having the SchemaEditor wait is not a bad idea! In practice, I don't see many scenarios (please inform me if otherwise!) where someone makes a migration and within 5 seconds begins iterating -- outside of tests -- but I would want it "flaggable" if at all possible.
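
The "flaggable" wait discussed here could be sketched as a polling helper that a schema editor or a test calls after creating an index. This is an illustration under stated assumptions, not the backend's implementation: the function name, the `wait` flag, and the default timings are hypothetical, and it assumes the documents returned by PyMongo's `Collection.list_search_indexes()` carry a boolean `queryable` field, as Atlas reports for search indexes.

```python
import time


def wait_until_queryable(collection, name, timeout=60.0, interval=0.5, wait=True):
    """Block until the named search index reports queryable, or time out.

    Returns True once the index is queryable and False when waiting is
    disabled through the (hypothetical) ``wait`` flag.
    """
    if not wait:
        return False
    deadline = time.monotonic() + timeout
    while True:
        # list_search_indexes(name) yields at most the index with that name.
        indexes = list(collection.list_search_indexes(name))
        if indexes and indexes[0].get("queryable"):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Search index {name!r} not queryable after {timeout}s")
        time.sleep(interval)
```

Note that this covers only index creation. Documents inserted later are still indexed asynchronously, so queries issued right after a write may need the retry approach used in the tests.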

timgraham reviewed Jun 25, 2025

django_mongodb_backend/functions.py (outdated, resolved)

timgraham reviewed Jun 25, 2025

django_mongodb_backend/functions.py (outdated, resolved)

WaVEV force-pushed the atlas-search-lookups branch from 449b6a3 to ca8a7cf Compare June 26, 2025 02:56

WaVEV commented Jul 7, 2025

WaVEV force-pushed the atlas-search-lookups branch 3 times, most recently from 9935b25 to a467a57 Compare July 12, 2025 23:32

WaVEV changed the title ~~[WIP] Atlas search lookups~~ Atlas search lookups Jul 14, 2025

WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from ea2118b to 206b554 Compare July 21, 2025 19:29

timgraham reviewed Jul 22, 2025

WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from 456028d to 65f22e6 Compare July 22, 2025 05:16

WaVEV marked this pull request as ready for review July 24, 2025 19:39

timgraham and others added 12 commits July 25, 2025 23:40

Create django_mongodb_backend.expressions package

ca691e9

Adapt query and compiler for operator support.

0756e6f

Add Search operators.

b2b98f6

Make operators combinable and add compound expressions.

754f429

Add vector search operator.

4ec87ea

Add search lookup.

2d3c8d3

Add test search

8e2403a

Add combinable test

56034a4

Test clean up.

a9c09df

Add dalayed assertion methods in unit test.

f0ab51e

Support operator as string

38663fc

Update docstring

fb1db85

WaVEV added 3 commits July 25, 2025 23:40

Add docs

aee4c1b

Edit docs.

d65da2a

Update docs.

e7f4d22

WaVEV force-pushed the atlas-search-lookups branch from eb6eb07 to e7f4d22 Compare July 26, 2025 02:40

WaVEV added 3 commits July 27, 2025 14:32

add available_apps to unit test

d32b7de

Simplify clean up call

7cd43d6

Add change log

0fdb066
