-
Notifications
You must be signed in to change notification settings - Fork 0
Add querying support to EmbeddedModelArrayfield #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8404d98
to
703502d
Compare
This comment was marked as resolved.
This comment was marked as resolved.
329ab7a
to
7c45d6a
Compare
41f1843
to
ca7a7b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
docs/source/ref/models/fields.rst
Outdated
|
||
Note that this does **not** require the whole array to match, only that at least one embedded document matches exactly. | ||
|
||
Keytransforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Key transform" was borrowed from JSONField. I would describe this more like "Embedded field lookup"
... ) | ||
>>> Post.objects.create(name="Second post", tags=[Tag(label="tutorial")]) | ||
|
||
>>> Post.objects.filter(tags__label__exact="tutorial") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't "exact" and "key transform" the same?
Post.objects.filter(tags__label__exact="tutorial")
Post.objects.filter(tags__label="tutorial") [normally an implicit __exact]
It's confusing that there are two sections for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will rewrite this part.
What I wanted to explain is that the EMFA key transform enables us to perform queries. It does not execute any queries itself, but it is necessary. I think I could clarify that it provides the path from the column to the EMFA, and from the inner EMF to the final field, in two separate results. This way, we can join these two parts according to the requirements of each lookup.
There are some sentences that shouldn't be here, like The transform checks if any element in the array has a field matching the condition
docs/source/ref/models/fields.rst
Outdated
>>> Post.objects.filter(tags__label__overlap=["thoughts"]) | ||
<QuerySet [<Post: First post>, <Post: Second post>]> | ||
|
||
>>> Post.objects.filter(tags__label__overlap=["tutorial", "thoughts"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks kind of like "in" to me, when querying related fields normally. Check the error message:
======================================================================
ERROR: test_overlap_simplefield (model_fields_.test_embedded_model_array.QueryingTests.test_overlap_simplefield)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tim/code/django-mongodb/tests/model_fields_/test_embedded_model_array.py", line 212, in test_overlap_simplefield
MuseumExhibit.objects.filter(sections__section_number__in=10), []
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/manager.py", line 87, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/query.py", line 1491, in filter
return self._filter_or_exclude(False, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/query.py", line 1509, in _filter_or_exclude
clone._filter_or_exclude_inplace(negate, args, kwargs)
File "/home/tim/code/django/django/db/models/query.py", line 1516, in _filter_or_exclude_inplace
self._query.add_q(Q(*args, **kwargs))
File "/home/tim/code/django/django/db/models/sql/query.py", line 1643, in add_q
clause, _ = self._add_q(q_object, can_reuse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/sql/query.py", line 1675, in _add_q
child_clause, needed_inner = self.build_filter(
^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/sql/query.py", line 1585, in build_filter
condition = self.build_lookup(lookups, col, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/sql/query.py", line 1406, in build_lookup
lhs = self.try_transform(lhs, lookup_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django/django/db/models/sql/query.py", line 1438, in try_transform
transform_class = lhs.get_transform(name)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tim/code/django-mongodb/django_mongodb_backend/fields/embedded_model_array.py", line 172, in get_transform
raise FieldDoesNotExist(
django.core.exceptions.FieldDoesNotExist: Unsupported lookup 'in' for EmbeddedModelArrayField of 'IntegerField', perhaps you meant in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I have to fix the message because is wrong. Maybe I messed up with the types, have to recheck that.
It is an overlap, because when querying inside an EMFA, we have two ways of indexing.
- The easy one it by number, in this case transform from arrayfield is borrowed, i.e. somefield_0_other_field. In that case the
output_type
oroutput_field
is a value. - The other way to index is for a field of the embedded documents. In that case the
lhs
's type must bearray of any
(depends on the destination, if the destination is an integer it has to match a array of integer, maybe we could generate an arrayField of integer)
In the example that you providesections__section_number__in
sections is an array, and the test is jumping to the inner objects, in that case it should type as array. It is different from section__3__section_number, in that case it will type as integer.
Under the hood the overlap is calculated as multiple in
operator. That is why it sounds similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking again with this example and re-read the documentation, I have misunderstood it. This operation belongs to in
.
I opened this as a review too early. I figure out that I can support multiples lookups without much effort. But I will appreciate your review 😄 |
return { | ||
"$and": [{"$ne": [lhs_mql, None]}, {"$size": {"$setIntersection": [value, lhs_mql]}}] | ||
} | ||
return {"$and": [{"$isArray": lhs_mql}, {"$size": {"$setIntersection": [value, lhs_mql]}}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a test failure with this change reverted. Should it be part of the subquery patch instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔, I face a failure with the $ne
code. It was when some missing
type values appeared. Maybe they were in the EMF comparisons and perhaps it will appear in the subquery. I will try to find a test that check that otherwise will revert this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any test. So I will revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the other isArray
the len test is testing it. So the other isArray will be kept.
def test_all_filter(self): | ||
self.assertCountEqual( | ||
MuseumExhibit.objects.filter(sections__section_number__all=[1, 2]), [self.wonders] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about the overall design and whether or not we should try to make the querying API look like a ManyToManyField
or whether it's more appropriate to make it look more like arrayfield.
In the relational world, this would be written as MuseumExhibit.objects.filter(Q(sections__section_number=1) & Q(sections__section_number=2))
. It combines two $anyElementTrue
clauses with $and
and gives the same result, but I guess it doesn't perform as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yes, it behaves like an array or a relation, maybe OneToMany
it has a kind of duality. Also most of the queries are existence quantifier. Except this one 😬; other unusual behaviour here is: all
only support equal
lookup meanwhile the existence
supports everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'm doubtful about the design of all
and contained_by
. In ArrayField
, the lookup happens on the ArrayField
, e.g. Post.objects.filter(tags__contained_by=["thoughts", "django"])
.
Now EmbeddedModelArrayField
stores an array of dictionaries and you're trying to hook these lookups onto the embedded field. I think the "duality" as you called it, isn't entirely intuitive.
I think we should save the work for these lookups in case it's demanded later, but I suggest to omit it for now.
Incidentally, can you confirm my suspicion about the performance of the Q()
alternative that I proposed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Q works equally well. The generated plan isn't better or worse than the current all implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, all is equivalent to contains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think contains
would do pattern matching (e.g. WHERE headline LIKE '%Lennon%';
). That's another example of why I think trying to overload embeddedarrayfield__embededfield___lookup
to do an "array field-like operation" rather than an existence test isn't a clear API design.
@EmbeddedModelArrayField.register_lookup | ||
class EMFArrayIn(EMFArrayBuildinLookup, lookups.In): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is a bit confusing. The lookup is registered on the field, but what it actually applies to is the embedded model fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question. Let me explain why I need to add this to make it work.
The lookups are inherited, so they're tried to be solved as the father does. But the builtin lookup defined in lookups.py don’t know how to handle this kind of transform. So I need to add this to use the EmbeddedModelArrayFieldBuiltinLookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment isn't really actionable. I just wanted to flag it and perhaps I'll add a comment about it.
lhs_mql = process_lhs(self, compiler, connection) | ||
inner_lhs_mql = lhs_mql["$ifNull"][0]["$map"]["in"] | ||
values = process_rhs(self, compiler, connection) | ||
lhs_mql["$ifNull"][0]["$map"]["in"] = connection.mongo_operators[self.lookup_name]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, I understand why Django returns template and parameters in as_sql results. 😬. I will think in another way to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not what you encountered, but primarily it's to prevent SQL injection.)
744a9ef
to
f6625d9
Compare
Match the order in embedded_model_array.py
b45dc6f
to
b4e627e
Compare
b4e627e
to
e40bea8
Compare
the discussion continues in mongodb#303 |
In order to simplify the diff of mongodb#303, this PR is open but it will be merged after Tim's patch.