Add querying support to EmbeddedModelArrayfield #9

WaVEV · 2025-05-24T19:12:05Z

In order to simplify the diff of mongodb#303, this PR is open but it will be merged after Tim's patch.

timgraham

Some initial comments.

docs/source/releases/5.2.x.rst

tests/model_fields_/test_embedded_model.py

docs/source/ref/models/fields.rst

tests/model_fields_/models.py

timgraham · 2025-05-26T17:39:41Z

docs/source/ref/models/fields.rst

+
+Note that this does **not** require the whole array to match, only that at least one embedded document matches exactly.
+
+Keytransforms


"Key transform" was borrowed from JSONField. I would describe this more like "Embedded field lookup"

timgraham · 2025-05-26T17:41:47Z

docs/source/ref/models/fields.rst

+    ... )
+    >>> Post.objects.create(name="Second post", tags=[Tag(label="tutorial")])
+
+    >>> Post.objects.filter(tags__label__exact="tutorial")


Isn't "exact" and "key transform" the same?

Post.objects.filter(tags__label__exact="tutorial") Post.objects.filter(tags__label="tutorial") [normally an implicit __exact]

It's confusing that there are two sections for that.

Yes, I will rewrite this part.
What I wanted to explain is that the EMFA key transform enables us to perform queries. It does not execute any queries itself, but it is necessary. I think I could clarify that it provides the path from the column to the EMFA, and from the inner EMF to the final field, in two separate results. This way, we can join these two parts according to the requirements of each lookup.
There are some sentences that shouldn't be here, like The transform checks if any element in the array has a field matching the condition

docs/source/ref/models/fields.rst

timgraham · 2025-05-26T17:58:33Z

docs/source/ref/models/fields.rst

+    >>> Post.objects.filter(tags__label__overlap=["thoughts"])
+    <QuerySet [<Post: First post>, <Post: Second post>]>
+
+    >>> Post.objects.filter(tags__label__overlap=["tutorial", "thoughts"])


This looks kind of like "in" to me, when querying related fields normally. Check the error message:

====================================================================== ERROR: test_overlap_simplefield (model_fields_.test_embedded_model_array.QueryingTests.test_overlap_simplefield) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tim/code/django-mongodb/tests/model_fields_/test_embedded_model_array.py", line 212, in test_overlap_simplefield MuseumExhibit.objects.filter(sections__section_number__in=10), [] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/manager.py", line 87, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/query.py", line 1491, in filter return self._filter_or_exclude(False, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/query.py", line 1509, in _filter_or_exclude clone._filter_or_exclude_inplace(negate, args, kwargs) File "/home/tim/code/django/django/db/models/query.py", line 1516, in _filter_or_exclude_inplace self._query.add_q(Q(*args, **kwargs)) File "/home/tim/code/django/django/db/models/sql/query.py", line 1643, in add_q clause, _ = self._add_q(q_object, can_reuse) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/sql/query.py", line 1675, in _add_q child_clause, needed_inner = self.build_filter( ^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/sql/query.py", line 1585, in build_filter condition = self.build_lookup(lookups, col, value) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/sql/query.py", line 1406, in build_lookup lhs = self.try_transform(lhs, lookup_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/sql/query.py", line 1438, in try_transform transform_class = lhs.get_transform(name) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/tim/code/django-mongodb/django_mongodb_backend/fields/embedded_model_array.py", line 172, in get_transform raise FieldDoesNotExist( django.core.exceptions.FieldDoesNotExist: Unsupported lookup 'in' for EmbeddedModelArrayField of 'IntegerField', perhaps you meant in?

First I have to fix the message because is wrong. Maybe I messed up with the types, have to recheck that.
It is an overlap, because when querying inside an EMFA, we have two ways of indexing.

The easy one it by number, in this case transform from arrayfield is borrowed, i.e. somefield_0_other_field. In that case the output_type or output_field is a value.

The other way to index is for a field of the embedded documents. In that case the lhs's type must be array of any (depends on the destination, if the destination is an integer it has to match a array of integer, maybe we could generate an arrayField of integer)
In the example that you provide sections__section_number__in sections is an array, and the test is jumping to the inner objects, in that case it should type as array. It is different from section__3__section_number, in that case it will type as integer.

Under the hood the overlap is calculated as multiple in operator. That is why it sounds similar

Thinking again with this example and re-read the documentation, I have misunderstood it. This operation belongs to in.

WaVEV · 2025-05-26T22:08:33Z

I opened this as a review too early. I figure out that I can support multiples lookups without much effort. But I will appreciate your review 😄

timgraham · 2025-05-28T14:50:22Z

django_mongodb_backend/fields/array.py

-        return {
-            "$and": [{"$ne": [lhs_mql, None]}, {"$size": {"$setIntersection": [value, lhs_mql]}}]
-        }
+        return {"$and": [{"$isArray": lhs_mql}, {"$size": {"$setIntersection": [value, lhs_mql]}}]}


There isn't a test failure with this change reverted. Should it be part of the subquery patch instead?

🤔, I face a failure with the $ne code. It was when some missing type values appeared. Maybe they were in the EMF comparisons and perhaps it will appear in the subquery. I will try to find a test that check that otherwise will revert this change

I couldn't find any test. So I will revert this

For the other isArray the len test is testing it. So the other isArray will be kept.

docs/source/releases/5.2.x.rst

timgraham · 2025-05-28T15:19:58Z

tests/model_fields_/test_embedded_model_array.py

+    def test_all_filter(self):
+        self.assertCountEqual(
+            MuseumExhibit.objects.filter(sections__section_number__all=[1, 2]), [self.wonders]
+        )


I'm thinking about the overall design and whether or not we should try to make the querying API look like a ManyToManyField or whether it's more appropriate to make it look more like arrayfield.

In the relational world, this would be written as MuseumExhibit.objects.filter(Q(sections__section_number=1) & Q(sections__section_number=2)). It combines two $anyElementTrue clauses with $and and gives the same result, but I guess it doesn't perform as well.

🤔 yes, it behaves like an array or a relation, maybe OneToMany it has a kind of duality. Also most of the queries are existence quantifier. Except this one 😬; other unusual behaviour here is: all only support equal lookup meanwhile the existence supports everyone.

Right, I'm doubtful about the design of all and contained_by. In ArrayField, the lookup happens on the ArrayField, e.g. Post.objects.filter(tags__contained_by=["thoughts", "django"]).

Now EmbeddedModelArrayField stores an array of dictionaries and you're trying to hook these lookups onto the embedded field. I think the "duality" as you called it, isn't entirely intuitive.

I think we should save the work for these lookups in case it's demanded later, but I suggest to omit it for now.

Incidentally, can you confirm my suspicion about the performance of the Q() alternative that I proposed?

The Q works equally well. The generated plan isn't better or worse than the current all implementation.

On the other hand, all is equivalent to contains.

I'd think contains would do pattern matching (e.g. WHERE headline LIKE '%Lennon%';). That's another example of why I think trying to overload embeddedarrayfield__embededfield___lookup to do an "array field-like operation" rather than an existence test isn't a clear API design.

timgraham · 2025-05-28T15:23:15Z

django_mongodb_backend/fields/embedded_model_array.py

+@EmbeddedModelArrayField.register_lookup
+class EMFArrayIn(EMFArrayBuildinLookup, lookups.In):
+    pass


The implementation is a bit confusing. The lookup is registered on the field, but what it actually applies to is the embedded model fields.

I don't understand the question. Let me explain why I need to add this to make it work.
The lookups are inherited, so they're tried to be solved as the father does. But the builtin lookup defined in lookups.py don’t know how to handle this kind of transform. So I need to add this to use the EmbeddedModelArrayFieldBuiltinLookup

My comment isn't really actionable. I just wanted to flag it and perhaps I'll add a comment about it.

django_mongodb_backend/fields/embedded_model_array.py

WaVEV · 2025-05-31T23:22:38Z

django_mongodb_backend/fields/embedded_model_array.py

+        lhs_mql = process_lhs(self, compiler, connection)
+        inner_lhs_mql = lhs_mql["$ifNull"][0]["$map"]["in"]
+        values = process_rhs(self, compiler, connection)
+        lhs_mql["$ifNull"][0]["$map"]["in"] = connection.mongo_operators[self.lookup_name](


Now, I understand why Django returns template and parameters in as_sql results. 😬. I will think in another way to handle this.

(Not what you encountered, but primarily it's to prevent SQL injection.)

Match the order in embedded_model_array.py

WaVEV · 2025-06-03T00:10:43Z

the discussion continues in mongodb#303

WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 8404d98 to 703502d Compare May 25, 2025 00:59

This comment was marked as resolved.

Sign in to view

WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 329ab7a to 7c45d6a Compare May 25, 2025 03:48

WaVEV marked this pull request as ready for review May 25, 2025 14:22

WaVEV requested a review from timgraham May 25, 2025 14:22

WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 41f1843 to ca7a7b3 Compare May 26, 2025 03:33

timgraham reviewed May 26, 2025

View reviewed changes

timgraham changed the title ~~Arrayfield+emf querying~~ Add querying support to EmbeddedModelArrayfield May 27, 2025

timgraham mentioned this pull request May 28, 2025

INTPYTHON-483 Add querying support for EmbeddedModelArrayField mongodb/django-mongodb-backend#303

Merged

timgraham reviewed May 28, 2025

View reviewed changes

WaVEV commented May 31, 2025

View reviewed changes

WaVEV force-pushed the arrayfield+emf-querying branch 3 times, most recently from 744a9ef to f6625d9 Compare June 2, 2025 01:30

timgraham added 2 commits June 2, 2025 14:01

Reorder classes in forms/fields/embedded_model.py

dbeb2de

Match the order in embedded_model_array.py

Document that embedded models don't have a primary key value

525d960

timgraham force-pushed the arrayfield+emf branch from d93ee38 to f528213 Compare June 2, 2025 18:54

Polished embedded model example

8c501aa

timgraham force-pushed the arrayfield+emf branch from f528213 to eefcb2f Compare June 2, 2025 19:58

INTPYTHON-483 Add EmbeddedModelArrayField

fe24e7a

WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from b45dc6f to b4e627e Compare June 2, 2025 23:30

WaVEV added 2 commits June 2, 2025 21:03

EmbeddedModelArrayField Querying

9e36afe

Add docs

e40bea8

WaVEV force-pushed the arrayfield+emf-querying branch from b4e627e to e40bea8 Compare June 3, 2025 00:05

WaVEV closed this Jun 3, 2025


		Note that this does not require the whole array to match, only that at least one embedded document matches exactly.

		Keytransforms

Add querying support to EmbeddedModelArrayfield #9

Add querying support to EmbeddedModelArrayfield #9

Uh oh!

Conversation

WaVEV commented May 24, 2025 • edited by timgraham Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

timgraham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WaVEV May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WaVEV May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WaVEV commented May 26, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

WaVEV May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

WaVEV commented Jun 3, 2025

Uh oh!

Uh oh!

WaVEV commented May 24, 2025 •

edited by timgraham

Loading

WaVEV May 27, 2025 •

edited

Loading

WaVEV May 26, 2025 •

edited

Loading

WaVEV May 31, 2025 •

edited

Loading