WIP: New unified syntax #313
base: sync-streams
🦋 Changeset detected. Latest commit: 3dceb77. The changes in this PR will be included in the next version bump. This PR includes changesets to release 20 packages.
I did an initial review on the tests and implementation, and it looks great!
Some requests for more tests:
- Error condition - selecting multiple columns in a subquery: `WHERE ... IN (SELECT a, b)`
- IN on parameter data: `SELECT * FROM comments WHERE issue_id IN (subscription.parameters() -> 'issue_ids')`
- IN on parameter data AND table - is this supported? If not, just test the error condition: `SELECT * FROM comments WHERE issue_id IN (SELECT id FROM issues WHERE owner_id = request.user_id()) AND label IN (subscription.parameters() -> 'labels')`

On the syntax, I think we just need to finalize the names of the request/stream parameters before merging. I'd also recommend removing the `subscription_parameters` virtual table, and also removing the `token_parameters` from streams, in favor of only using the parameter functions.
});

test('negated subquery from outer not operator', () => {
  const [_, errors] = syncStreamFromSql('s', 'select * from comments where not ()', options);
What is the expected result here? (looks like an incomplete test)
This is still WIP, I'll restructure the PRs a bit to make them easier to review.
This adds support for sync streams being defined as a single SQL statement (instead of the split between parameter and data queries we have today). For details, see the internal "2025-05 Subqueries / New Sync Streams Syntax" document.
Overview
This new syntax is implemented with the same basic primitives also used for sync rules, meaning that assigning rows to buckets and assigning buckets to users is a three-step process:

1. […] (`evaluateRow`).
2. […] request parameters => data query results and persist it (`evaluateParameterRow`).
3. […] (`BucketParameterQuerier`, using persisted lookups if subqueries exist).

Despite that being kept the same, there are some pretty fundamental differences between the rules and streams:

- […] `OR` clause in a `WHERE`. This is a difference to sync rules, where multiple independent rules would have to be written.

Examples
Before looking at how this is implemented, it's helpful to consider a few examples:

1. `SELECT * FROM issues WHERE length(description) < 10`: This is a query without parameters; you could define the same query with sync rules.
2. `SELECT * FROM issues WHERE owner_id = request.user_id()`: An implicit parameter is introduced whenever a value derived from row data is compared with a value derived from the request (like token, user, or stream parameters). Previously, this would correspond to a static parameter query (e.g. `SELECT request.user_id() as user_id` and then `select * from issues where owner_id = bucket.user_id` as a data query).
3. `SELECT * FROM issues WHERE owner_id = request.user_id() AND length(description) < 10`: This combination of the two is also possible with sync rules. Note that conditions that only apply to the row to sync don't introduce parameters. They only affect how `evaluateRow` works, by ignoring rows that don't match the condition.
4. `SELECT * FROM issues WHERE owner_id = request.user_id() OR length(description) < 10 OR token_parameters.is_admin`: Here, there are three independent conditions that could cause a row to get synced! The first one depends on an implicit stream parameter, the second one only depends on the row, the third one only depends on the token. This query is impossible to represent with a single sync rule, but with streams it's pretty straightforward.
5. `select * from comments where issue_id in (select id from issues where owner_id = request.user_id())`: This is equivalent to a parameter query selecting from `issues` before.
6. `select * from lists WHERE owner_id = (SELECT id FROM users WHERE id = request.user_id() and users.is_admin)`: This form is also supported by sync rules. However, there's another way to write that now:
7. `select * from lists WHERE owner_id = request.user_id() AND request.user_id() IN (SELECT id FROM users WHERE is_admin)`: This creates a parameter lookup for admin users by their id. However, that parameter lookup doesn't create a bucket parameter! Instead, it's used as an additional request filter in the third step.
8. The same query with `OR` instead of `AND`: Some users (admins) have access to everything, others only to the rows they own. This is a pretty useful query that was impossible to write (in a single stream) before.
9. `SELECT * FROM users WHERE id IN (SELECT user_a FROM friends WHERE user_b = request.user_id()) OR id IN (SELECT user_b FROM friends WHERE user_a = request.user_id())`: These also create two independent buckets to sync, both with a parameter backed by a lookup.
10. `select * from comments where (issue_id in (select id from issues where owner_id = request.user_id()) OR token_parameters.is_admin) AND NOT comments.is_deleted` is a valid stream definition. The compiler applies the distributive law to create two stream variants (one with a parameter and a row condition, and one with only the row condition but only visible to users with admin rights).
11. `select * from comments where not (is_deleted AND issue_id not in (select id from issues where owner_id = request.user_id()))`: Why anyone would want to write queries like this is beyond my understanding, but as a side-effect of how the compiler handles the previous query, this is also allowed.

Implementation
I've tried to document the implementation in detail, but it's still helpful to have a broad overview of how these queries are implemented.
After all intermediate steps, streams are represented by an array of `StreamVariant`s. Variants are formed by `OR` clauses, where the left and right subfilters may depend on different internal parameter sets (see e.g. example query 4). Since the parameters are different, we need to encode the variant in resulting bucket ids. E.g. query 4 would have buckets of the form `stream|0["owner_id"]` for matching owners, `stream|1[]` for rows with `length(description) < 10` and `stream|2[]` for all rows (that bucket would only be visible to admin users).

Each variant consists of:

- `filterRow()`: This maps a row to sync into all possible values that would match that row. E.g. for `FROM comments WHERE issue_id IN (SELECT ...)`, `filterRow(comments)` would return `comments.issue_id`. For the overlap operator (`&&`), this would return all values in the array.
- Either a `StaticLookup`, extracting the other side of the comparison from request parameters (e.g. in `WHERE owner_id = request.user_id()`); or an `EqualsRowInSubqueryLookup` which extracts values from indexed lookups.
- Conditions like `WHERE length(content) < 10` that only depend on the row are included as `additionalRowFilters`. They only affect the first resolve step, by ignoring some rows in `evaluateRow`.
- Conditions like `WHERE token_parameters.is_admin` that only depend on request parameters are included as request filters. They don't rely on parameters, but can exclude some requests. They can either be static, or also rely on subquery results (e.g. `WHERE request.user_id() IN (SELECT id FROM users WHERE is_admin)`).

Within a variant, conditions form a conjunction: Rows are only included if all filters match.
Different variants form a disjunction: Rows are distributed to all buckets for each matching variant, and users see all variants where a request filter grants access.
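To make the conjunction/disjunction rule concrete, here is a minimal sketch of how example query 4 could be modeled as three variants. All names below (`VariantSketch`, `bucketsForRow`, `bucketsForRequest`, `RequestContext`) are hypothetical and heavily simplified; they are not the classes from this PR, and the bucket id format is only an assumption based on the `stream|0[...]` examples above.

```typescript
// Hypothetical, simplified model of stream variants. Not the PR's actual API.
type Row = Record<string, unknown>;

interface RequestContext {
  userId: string;
  tokenParameters: Record<string, unknown>;
}

interface VariantSketch {
  // Parameter tuples a row instantiates (one tuple per bucket; [[]] means "no parameters").
  rowParameters(row: Row): unknown[][];
  // Conditions that only depend on the row; all must match (conjunction).
  additionalRowFilters: Array<(row: Row) => boolean>;
  // Conditions that only depend on the request; all must match (conjunction).
  requestFilters: Array<(req: RequestContext) => boolean>;
  // Parameter tuples resolved from the request.
  requestParameters(req: RequestContext): unknown[][];
}

// Assumed bucket id format: stream name, variant index, JSON-encoded parameters.
function bucketId(stream: string, variantIndex: number, params: unknown[]): string {
  return `${stream}|${variantIndex}${JSON.stringify(params)}`;
}

// A row is distributed to the buckets of every variant whose row filters all match.
function bucketsForRow(stream: string, variants: VariantSketch[], row: Row): string[] {
  return variants.flatMap((v, i) =>
    v.additionalRowFilters.every((f) => f(row))
      ? v.rowParameters(row).map((p) => bucketId(stream, i, p))
      : []
  );
}

// A request sees the buckets of every variant whose request filters all match.
function bucketsForRequest(stream: string, variants: VariantSketch[], req: RequestContext): string[] {
  return variants.flatMap((v, i) =>
    v.requestFilters.every((f) => f(req))
      ? v.requestParameters(req).map((p) => bucketId(stream, i, p))
      : []
  );
}

// owner_id = request.user_id() OR length(description) < 10 OR token_parameters.is_admin
const query4Variants: VariantSketch[] = [
  {
    rowParameters: (row) => [[row.owner_id]],
    additionalRowFilters: [],
    requestFilters: [],
    requestParameters: (req) => [[req.userId]]
  },
  {
    rowParameters: () => [[]],
    additionalRowFilters: [(row) => String(row.description).length < 10],
    requestFilters: [],
    requestParameters: () => [[]]
  },
  {
    rowParameters: () => [[]],
    additionalRowFilters: [],
    requestFilters: [(req) => req.tokenParameters.is_admin === true],
    requestParameters: () => [[]]
  }
];

const issue = { id: 'i1', owner_id: 'u1', description: 'short' };
console.log(bucketsForRow('stream', query4Variants, issue));
// [ 'stream|0["u1"]', 'stream|1[]', 'stream|2[]' ]

const regularUser = { userId: 'u1', tokenParameters: {} };
console.log(bucketsForRequest('stream', query4Variants, regularUser));
// [ 'stream|0["u1"]', 'stream|1[]' ]
```

In this sketch the row lands in the admin-only bucket `stream|2[]` regardless of its contents, while only requests passing the request filter ever subscribe to it, which mirrors the "visible only to admin users" behavior described above.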
To compile a `WHERE` clause into variants, we can't use most of the logic currently in `sql_filters.ts` because our way of dealing with subqueries and bucket parameters is incompatible with sync rules. Instead, we define classes for boolean algebra with the `FilterOperator` class in `streams/filter.ts`. This allows composing filters based on:

- `AND`, `OR` and `NOT` boolean operators.
- `SimpleCondition`s, which depend only on row data (e.g. `WHERE length(content) < 10`) or only on request data (`WHERE token_parameters.is_admin`). These are created and composed using the existing `SqlTools` class.
- The `CompareRowValueWithStreamParameter` class, which creates an implicit parameter based on what is a `ParameterMatchClause` in sync rules.
- `InOperator`, which checks whether an expression derived from row data is included in results of a subquery derived from request data. Similarly to the existing query compiler, `IN` operators working on row/request data on both sides are compiled into a `SimpleCondition` instead.
- `OverlapOperator`, which is very similar to `InOperator` but takes an array on the left-hand side and matches if the subquery intersects with that array.
- `ExistsOperator`, which is also similar to `InOperator` but doesn't compare values; it matches if the subquery returns any row. This is used to compile `WHERE request.user_id() IN (SELECT * FROM users WHERE is_admin)`, by pushing the request parameter into the subquery: `WHERE EXISTS (SELECT _ FROM users WHERE is_admin AND id = request.user_id())`.

Compiling a `WHERE` clause then involves:

1. Bringing the filter into disjunctive normal form (`OR` as the top-level operator and `AND` within).
2. […] `NOT` operator appears before anything that isn't a `SimpleCondition` (for those we can push the operator into the condition via `composeFunction(OPERATOR_NOT, ...)`). So after this step, no `NOT` operators appear in the query.
3. Compiling each branch of the `OR` into a variant:
   a. By introducing parameters for `CompareRowValueWithStreamParameter`, `InOperator` and `OverlapOperator`.
   b. Introducing additional row filters for `SimpleCondition`s that are static or depend on row data.
   c. And request filters for `ExistsOperator` and `SimpleCondition`s that depend on request data.

Remaining todos:

- […] `SqlBucketDescriptor`.
- […] `SqlBucketDescriptor` outside of `sync-rules`.
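As a rough illustration of the `WHERE` compilation steps described above, the following sketch pushes `NOT` operators down with De Morgan's laws and then normalizes the filter into an `OR` of `AND`s. The `Filter` type and the `pushDownNot` / `toDnf` helpers are hypothetical stand-ins for the `FilterOperator` classes, not the PR's code. Rejecting a negated subquery operator is an assumption here, inferred from the "negated subquery" error test quoted in the review.

```typescript
// Hypothetical filter tree; `simple` stands in for SimpleCondition,
// `subquery` for operators such as InOperator that introduce parameters.
interface Simple { kind: 'simple'; expr: string; negated: boolean }
interface Subquery { kind: 'subquery'; expr: string }
interface Combine { kind: 'and' | 'or'; operands: Filter[] }
interface Not { kind: 'not'; operand: Filter }
type Leaf = Simple | Subquery;
type Filter = Leaf | Combine | Not;

// Push NOT down to the leaves using De Morgan's laws.
function pushDownNot(f: Filter, negate = false): Filter {
  switch (f.kind) {
    case 'not':
      return pushDownNot(f.operand, !negate);
    case 'and':
    case 'or': {
      // NOT (a AND b) = (NOT a) OR (NOT b), and vice versa.
      const kind = negate ? (f.kind === 'and' ? 'or' : 'and') : f.kind;
      return { kind, operands: f.operands.map((o) => pushDownNot(o, negate)) };
    }
    case 'simple':
      // Simple conditions can absorb the negation.
      return { ...f, negated: f.negated !== negate };
    case 'subquery':
      // Assumption: negating a subquery-backed operator is reported as an error.
      if (negate) throw new Error(`Cannot negate subquery filter: ${f.expr}`);
      return f;
  }
}

// Disjunctive normal form: each inner array is one conjunction (one variant).
function toDnf(f: Filter): Leaf[][] {
  switch (f.kind) {
    case 'or':
      return f.operands.flatMap((o) => toDnf(o));
    case 'and':
      // Distributive law: cross product of the alternatives of each operand.
      return f.operands.reduce<Leaf[][]>(
        (acc, operand) => acc.flatMap((left) => toDnf(operand).map((right) => [...left, ...right])),
        [[]]
      );
    case 'not':
      throw new Error('Call pushDownNot() before toDnf()');
    default:
      return [[f]];
  }
}

function render(leaf: Leaf): string {
  return leaf.kind === 'simple' && leaf.negated ? `NOT ${leaf.expr}` : leaf.expr;
}

// (issue_id IN (...) OR token_parameters.is_admin) AND NOT comments.is_deleted
const inIssues: Filter = { kind: 'subquery', expr: 'issue_id IN (...)' };
const isAdmin: Filter = { kind: 'simple', expr: 'token_parameters.is_admin', negated: false };
const isDeleted: Filter = { kind: 'simple', expr: 'comments.is_deleted', negated: false };

const example: Filter = {
  kind: 'and',
  operands: [{ kind: 'or', operands: [inIssues, isAdmin] }, { kind: 'not', operand: isDeleted }]
};

const variants = toDnf(pushDownNot(example)).map((conj) => conj.map(render));
console.log(variants);
// [ [ 'issue_id IN (...)', 'NOT comments.is_deleted' ],
//   [ 'token_parameters.is_admin', 'NOT comments.is_deleted' ] ]
```

Each inner array corresponds to one candidate variant. A real compiler would then classify every leaf as a parameter, an additional row filter, or a request filter, as listed in step 3.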