Description
Questions
- What are tables in our context?
  In our case, each query is converted to a table, so joining means combining the results of two queries.
- What is an example of two queries that can't be merged into one?
- What can be the key for query joins?
  - time: what was happening at one service while another was doing x, if the two are not causally related?
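A time-keyed join could look like the sketch below. It assumes each query yields rows of `(timestamp, value)` and that "at the same time" means "within a fixed window"; the function name `time_window_join`, the row shape, and the sample data are all illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

def time_window_join(left, right, window):
    """Pair each left row with every right row whose timestamp is within `window`.

    Both inputs are lists of (timestamp, value) tuples. The right side is
    sorted so that candidate matches can be located with binary search.
    """
    right_sorted = sorted(right, key=lambda row: row[0])
    right_times = [ts for ts, _ in right_sorted]
    joined = []
    for ts, left_value in left:
        lo = bisect_left(right_times, ts - window)
        hi = bisect_right(right_times, ts + window)
        for right_ts, right_value in right_sorted[lo:hi]:
            joined.append((ts, left_value, right_ts, right_value))
    return joined

# Hypothetical results of two queries against two causally unrelated services.
service_a = [(10, "a: slow request"), (50, "a: slow request")]
service_b = [(8, "b: gc pause"), (30, "b: cache flush"), (52, "b: gc pause")]

rows = time_window_join(service_a, service_b, window=5)
```

Here the only thing relating the two tables is the timestamp, so no request identifier or causal link is needed to line them up.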
Join semantics from other systems
- sql join: https://www.w3schools.com/sql/sql_join.asp
Used to combine rows from two or more tables, based on a related column between them.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
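To see what the inner join returns, the query can be run against an in-memory SQLite database from Python. The table schemas and sample rows below are assumptions made up for illustration; only the `SELECT` statement comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: order 101 refers to a customer that does not exist.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Orders VALUES (100, 1, '2020-01-01'), (101, 3, '2020-01-02');
""")

inner_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()
conn.close()
```

Only order 100 appears in `inner_rows`: order 101 has no matching customer, and customer Bob has no order, so neither contributes a row.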
- spark (streaming) join: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#stream-stream-joins
  "The challenge of generating join results between two data streams is that, at any point of time, the view of the dataset is incomplete for both sides of the join making it much harder to find matches between inputs"
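The incomplete-view problem can be illustrated with a symmetric hash join written in plain Python. This is a simplified sketch of the general idea of buffering past input on both sides, not Spark's implementation; the class name `SymmetricHashJoin` and the sample events are hypothetical.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Buffers rows from both streams so that later arrivals can still match."""

    def __init__(self):
        self.buffers = {"left": defaultdict(list), "right": defaultdict(list)}

    def on_row(self, side, key, value):
        """Store the new row, then return matches against the other side's buffer."""
        other = "right" if side == "left" else "left"
        self.buffers[side][key].append(value)
        matches = self.buffers[other].get(key, [])
        if side == "left":
            return [(key, value, m) for m in matches]
        return [(key, m, value) for m in matches]

join = SymmetricHashJoin()
out = []
out += join.on_row("left", "req-1", "impression")  # no match yet: right side unseen
out += join.on_row("right", "req-2", "click")      # no match yet: left side unseen
out += join.on_row("right", "req-1", "click")      # matches the buffered left row
out += join.on_row("left", "req-2", "impression")  # matches the buffered right row
```

The buffers in this sketch grow without bound, because a match for any row might still arrive later. The Spark guide linked above describes watermarks as its way of limiting how much state has to be kept.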
- pivot tracing happened-before join: https://www2.cs.uic.edu/~brents/cs494-cdcs/papers/pivot-tracing.pdf
happened-before relation: https://lamport.azurewebsites.net/pubs/time-clocks.pdf
The relation "->" on the set of events of a system is the smallest relation satisfying the following three conditions:
- If a and b are events in the same process, and a comes before b, then a->b.
- If a is the sending of a message by one process and b is the receipt of the same message by another process, then a->b.
- If a->b and b->c then a->c.
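The three conditions can be turned directly into a small computation: start from per-process order and message pairs, then close the relation under transitivity. The sketch below does this by brute force; the function name `happened_before` and the event names are assumptions for illustration.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of (a, b) pairs such that a -> b.

    process_events: one list of events per process, each in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    relation = set()
    # Condition 1: order within a single process.
    for events in process_events:
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                relation.add((a, b))
    # Condition 2: a send happens before the matching receive.
    relation.update(messages)
    # Condition 3: transitive closure, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relation), repeat=2):
            if b == c and (a, d) not in relation:
                relation.add((a, d))
                changed = True
    return relation

# Hypothetical run: process p sends a message at p2, process q receives it at q2.
hb = happened_before([["p1", "p2", "p3"], ["q1", "q2", "q3"]], [("p2", "q2")])
```

In this run `("p1", "q3")` is in `hb` because it follows from p1->p2, p2->q2, and q2->q3. Neither `("q1", "p3")` nor `("p3", "q1")` is in `hb`, so those two events are not ordered by the relation.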
Example query from the pivot tracing paper
From incr In DataNodeMetrics.incrBytesRead
Join cl In First(ClientProtocols) On cl -> incr
GroupBy cl.procName
Select cl.procName, SUM(incr.delta)
The happened-before join is done by propagating baggage along the request.
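Baggage propagation for the query above could be sketched roughly as follows. This is a heavily simplified, single-process illustration, not the Pivot Tracing implementation: the `Request` class, the two tracepoint functions, the baggage key, and the process names are all hypothetical.

```python
from collections import defaultdict

class Request:
    """A request whose baggage travels with it from component to component."""

    def __init__(self):
        self.baggage = {}

totals = defaultdict(int)  # cl.procName -> SUM(incr.delta)

def client_protocols_tracepoint(request, proc_name):
    # First(ClientProtocols): keep only the first tuple seen on this request.
    request.baggage.setdefault("cl.procName", proc_name)

def incr_bytes_read_tracepoint(request, delta):
    # Join cl -> incr: only tuples packed earlier on the same request are visible.
    proc_name = request.baggage.get("cl.procName")
    if proc_name is not None:
        totals[proc_name] += delta  # GroupBy cl.procName, SUM(incr.delta)

# Hypothetical requests issued by two client processes.
r1, r2 = Request(), Request()
client_protocols_tracepoint(r1, "procA")
client_protocols_tracepoint(r2, "procB")
incr_bytes_read_tracepoint(r1, 100)
incr_bytes_read_tracepoint(r2, 40)
incr_bytes_read_tracepoint(r1, 60)
incr_bytes_read_tracepoint(Request(), 999)  # no client tuple happened before: dropped
```

Because the client tuple rides along with the request, the join is evaluated at the point where `incr` is observed, without collecting both tables somewhere else first.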