-
Notifications
You must be signed in to change notification settings - Fork 21
Add subquery support for EmbeddedModelArrayField #314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{"$project": {"tmp_name": expr.as_mql(compiler, connection)}}, | ||
{ | ||
"$unwind": "$tmp_name", | ||
}, | ||
{ | ||
"$group": { | ||
"_id": None, | ||
"tmp_name": {"$addToSet": "$tmp_name"}, | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this code correctly:
- We rename a field (The array) to a
tmp_name
a. If there's a subquery, it will produce another array. - We unwind (the array)
- We take the value and add it back to a set to group all items into one
- We only pick the first item in the set? Or is this returning an array of arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the field we're querying on the RHS consists of multiple arrays.
Maybe an example will make this clearer. This example uses ArrayField, but it's analogous to EMAF:
Given the following documents.
[{'_id': ObjectId('684a4ddbc6db6230d7591003'),
'field': [1],
'field_nested': None,
'order': 1},
{'_id': ObjectId('684a4ddbc6db6230d7591004'),
'field': [2],
'field_nested': None,
'order': 2},
{'_id': ObjectId('684a4ddbc6db6230d7591005'),
'field': [2, 3],
'field_nested': None,
'order': 3},
{'_id': ObjectId('684a4ddbc6db6230d7591006'),
'field': [20, 30, 40],
'field_nested': None,
'order': 4},
{'_id': ObjectId('684a4ddbc6db6230d7591007'),
'field': None,
'field_nested': None,
'order': 5}]
Suppose we want to apply an $overlap query using a subquery. For example:
qs = NullableIntegerArrayModel.objects.filter(order__lt=3)
self.assertCountEqual(
NullableIntegerArrayModel.objects.filter(
field__overlap=qs.values_list("field"),
),
self.objs[:3],
)
To do this, we need to concatenate all the values from the RHS subquery. That means using an $unwind followed by a $group.
The $group stage collects values into an array using $addToSet. Since we use _id: null, this results in a single grouped array. However, because we're aggregating arrays from multiple documents, the result is a list of lists.
This pipeline was originally copied or adapted from ArrayField, because the structure of EmbeddedModelArrayField on the RHS behaves similarly to ArrayField.
return [ | ||
{ | ||
"$facet": { | ||
"group": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"group": [ | |
"group": [ |
Let's name it something else. Since group is an existing predicate, it could get confusing. How about array_agg_query_mixin
. Just be as explicit as that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 IMO naming with some with meaning in the context of the pipeline is better. I am thinking in something like gathered_values
, or concatenated_values
.
# This test would be useful to have, but it cannot be implemented | ||
# due to the current annotation handling. | ||
# Supporting slicing or indexing over an annotated | ||
# Embedded Array Field would require a refactor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a marked limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query is really quite obscure, so I don't think it needs to be documented, if that's what you meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 , yes. it is obscure. Maybe the behave isn't expected nor documented. It tries to show that an annotated embeddedmodelArrayField should be handled very similar to an Arrayfield.
BTW, I think the annotation should be reworked in the searchVector. So, we could return to this if we have time and we want to have this kind of functionality.
b4eb3c6
to
b77a335
Compare
b77a335
to
783aa8a
Compare
Subquery and annotation handling.