Skip to content

Add querying support to EmbeddedModelArrayfield #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

WaVEV
Copy link
Collaborator

@WaVEV WaVEV commented May 24, 2025

In order to simplify the diff of mongodb#303, this PR is open but it will be merged after Tim's patch.

@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 8404d98 to 703502d Compare May 25, 2025 00:59
@timgraham

This comment was marked as resolved.

@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 329ab7a to 7c45d6a Compare May 25, 2025 03:48
@WaVEV WaVEV marked this pull request as ready for review May 25, 2025 14:22
@WaVEV WaVEV requested a review from timgraham May 25, 2025 14:22
@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from 41f1843 to ca7a7b3 Compare May 26, 2025 03:33
Copy link
Owner

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.


Note that this does **not** require the whole array to match, only that at least one embedded document matches exactly.

Keytransforms
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Key transform" was borrowed from JSONField. I would describe this more like "Embedded field lookup"

... )
>>> Post.objects.create(name="Second post", tags=[Tag(label="tutorial")])

>>> Post.objects.filter(tags__label__exact="tutorial")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't "exact" and "key transform" the same?

Post.objects.filter(tags__label__exact="tutorial")
Post.objects.filter(tags__label="tutorial")  [normally an implicit __exact]

It's confusing that there are two sections for that.

Copy link
Collaborator Author

@WaVEV WaVEV May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will rewrite this part.
What I wanted to explain is that the EMFA key transform enables us to perform queries. It does not execute any queries itself, but it is necessary. I think I could clarify that it provides the path from the column to the EMFA, and from the inner EMF to the final field, in two separate results. This way, we can join these two parts according to the requirements of each lookup.
There are some sentences that shouldn't be here, like The transform checks if any element in the array has a field matching the condition

>>> Post.objects.filter(tags__label__overlap=["thoughts"])
<QuerySet [<Post: First post>, <Post: Second post>]>

>>> Post.objects.filter(tags__label__overlap=["tutorial", "thoughts"])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks kind of like "in" to me, when querying related fields normally. Check the error message:

======================================================================
ERROR: test_overlap_simplefield (model_fields_.test_embedded_model_array.QueryingTests.test_overlap_simplefield)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tim/code/django-mongodb/tests/model_fields_/test_embedded_model_array.py", line 212, in test_overlap_simplefield
    MuseumExhibit.objects.filter(sections__section_number__in=10), []
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/query.py", line 1491, in filter
    return self._filter_or_exclude(False, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/query.py", line 1509, in _filter_or_exclude
    clone._filter_or_exclude_inplace(negate, args, kwargs)
  File "/home/tim/code/django/django/db/models/query.py", line 1516, in _filter_or_exclude_inplace
    self._query.add_q(Q(*args, **kwargs))
  File "/home/tim/code/django/django/db/models/sql/query.py", line 1643, in add_q
    clause, _ = self._add_q(q_object, can_reuse)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/sql/query.py", line 1675, in _add_q
    child_clause, needed_inner = self.build_filter(
                                 ^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/sql/query.py", line 1585, in build_filter
    condition = self.build_lookup(lookups, col, value)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/sql/query.py", line 1406, in build_lookup
    lhs = self.try_transform(lhs, lookup_name)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django/django/db/models/sql/query.py", line 1438, in try_transform
    transform_class = lhs.get_transform(name)
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/code/django-mongodb/django_mongodb_backend/fields/embedded_model_array.py", line 172, in get_transform
    raise FieldDoesNotExist(
django.core.exceptions.FieldDoesNotExist: Unsupported lookup 'in' for EmbeddedModelArrayField of 'IntegerField', perhaps you meant in?

Copy link
Collaborator Author

@WaVEV WaVEV May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I have to fix the message because is wrong. Maybe I messed up with the types, have to recheck that.
It is an overlap, because when querying inside an EMFA, we have two ways of indexing.

  1. The easy one it by number, in this case transform from arrayfield is borrowed, i.e. somefield_0_other_field. In that case the output_type or output_field is a value.
  2. The other way to index is for a field of the embedded documents. In that case the lhs's type must be array of any (depends on the destination, if the destination is an integer it has to match a array of integer, maybe we could generate an arrayField of integer)
    In the example that you provide sections__section_number__in sections is an array, and the test is jumping to the inner objects, in that case it should type as array. It is different from section__3__section_number, in that case it will type as integer.

Under the hood the overlap is calculated as multiple in operator. That is why it sounds similar

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again with this example and re-read the documentation, I have misunderstood it. This operation belongs to in.

@WaVEV
Copy link
Collaborator Author

WaVEV commented May 26, 2025

I opened this as a review too early. I figure out that I can support multiples lookups without much effort. But I will appreciate your review 😄

@timgraham timgraham changed the title Arrayfield+emf querying Add querying support to EmbeddedModelArrayfield May 27, 2025
return {
"$and": [{"$ne": [lhs_mql, None]}, {"$size": {"$setIntersection": [value, lhs_mql]}}]
}
return {"$and": [{"$isArray": lhs_mql}, {"$size": {"$setIntersection": [value, lhs_mql]}}]}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a test failure with this change reverted. Should it be part of the subquery patch instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔, I face a failure with the $ne code. It was when some missing type values appeared. Maybe they were in the EMF comparisons and perhaps it will appear in the subquery. I will try to find a test that check that otherwise will revert this change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any test. So I will revert this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other isArray the len test is testing it. So the other isArray will be kept.

Comment on lines +181 to +188
def test_all_filter(self):
self.assertCountEqual(
MuseumExhibit.objects.filter(sections__section_number__all=[1, 2]), [self.wonders]
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about the overall design and whether or not we should try to make the querying API look like a ManyToManyField or whether it's more appropriate to make it look more like arrayfield.

In the relational world, this would be written as MuseumExhibit.objects.filter(Q(sections__section_number=1) & Q(sections__section_number=2)). It combines two $anyElementTrue clauses with $and and gives the same result, but I guess it doesn't perform as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yes, it behaves like an array or a relation, maybe OneToMany it has a kind of duality. Also most of the queries are existence quantifier. Except this one 😬; other unusual behaviour here is: all only support equal lookup meanwhile the existence supports everyone.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm doubtful about the design of all and contained_by. In ArrayField, the lookup happens on the ArrayField, e.g. Post.objects.filter(tags__contained_by=["thoughts", "django"]).

Now EmbeddedModelArrayField stores an array of dictionaries and you're trying to hook these lookups onto the embedded field. I think the "duality" as you called it, isn't entirely intuitive.

I think we should save the work for these lookups in case it's demanded later, but I suggest to omit it for now.

Incidentally, can you confirm my suspicion about the performance of the Q() alternative that I proposed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Q works equally well. The generated plan isn't better or worse than the current all implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, all is equivalent to contains.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think contains would do pattern matching (e.g. WHERE headline LIKE '%Lennon%';). That's another example of why I think trying to overload embeddedarrayfield__embededfield___lookup to do an "array field-like operation" rather than an existence test isn't a clear API design.

Comment on lines 122 to 132
@EmbeddedModelArrayField.register_lookup
class EMFArrayIn(EMFArrayBuildinLookup, lookups.In):
pass
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is a bit confusing. The lookup is registered on the field, but what it actually applies to is the embedded model fields.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. Let me explain why I need to add this to make it work.
The lookups are inherited, so they're tried to be solved as the father does. But the builtin lookup defined in lookups.py don’t know how to handle this kind of transform. So I need to add this to use the EmbeddedModelArrayFieldBuiltinLookup

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment isn't really actionable. I just wanted to flag it and perhaps I'll add a comment about it.

lhs_mql = process_lhs(self, compiler, connection)
inner_lhs_mql = lhs_mql["$ifNull"][0]["$map"]["in"]
values = process_rhs(self, compiler, connection)
lhs_mql["$ifNull"][0]["$map"]["in"] = connection.mongo_operators[self.lookup_name](
Copy link
Collaborator Author

@WaVEV WaVEV May 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, I understand why Django returns template and parameters in as_sql results. 😬. I will think in another way to handle this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not what you encountered, but primarily it's to prevent SQL injection.)

@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch 3 times, most recently from 744a9ef to f6625d9 Compare June 2, 2025 01:30
@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch 2 times, most recently from b45dc6f to b4e627e Compare June 2, 2025 23:30
@WaVEV WaVEV force-pushed the arrayfield+emf-querying branch from b4e627e to e40bea8 Compare June 3, 2025 00:05
@WaVEV
Copy link
Collaborator Author

WaVEV commented Jun 3, 2025

the discussion continues in mongodb#303

@WaVEV WaVEV closed this Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants