-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-7499 : Update stream metadata when data table regions merge #2057
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we closing resources?
for (RegionInfo ri : regionsToMerge) { | ||
List<String> ancestorIDs = getAncestorIdsForMerge(conn, tableName, streamName, ri); | ||
updateParentPartitionEndTime(conn, tableName, streamName, ancestorIDs, | ||
mergedRegion.getRegionId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit optimization: In case of merge, the parent will appear multiple times, but needs to be updated only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haridsv During merges, we decided to insert one row for the daughter per parent. (parent partition id is part of PK)
Parents in a merge can also be results of merges in the past and hence they can have multiple rows in the table. We would need to update all those rows with the end time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking up the parent of the parent, we are actually updating the grand parents right? Is that really the intention? See my other comment visualizing the operations and let me know what I am missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking up the grandparent to form the complete PK for updating the parent end time.
phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java
Show resolved
Hide resolved
// if parent was a result of a merge, there will be multiple grandparents. | ||
while (rs.next()) { | ||
ancestorIDs.add(rs.getString(2)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the situation in which a split will have multiple ancestors? In fact, isn't it an error if we actually find 2 regions with exactly the same start and end keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@haridsv By ancestors here I mean both parent and grandparent. The parent in a split can be a result of a merge. For example: A and B merged to form C. We would have inserted 2 rows for the daughter C with 2 parents, (C-->A) and (C-->B), with the same start/end key. If C splits into D and E, we will insert (D-->C) and (E-->C). When we lookup the parent of D and E, we will find those 2 rows with the same start/end key - both of which will need to be marked with the end time. (Parent's partition id is now part of the PK for CDC_STREAM table, so we will have to find all grandparents to mark the end time for a parent which has split).
Let me know if I am missing something here or over-complicating things. FYI @virajjasani
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the part about grand parents, but then again I must be missing something, please see my other comment with visualization and let me know what I am missing.
@haridsv I have created connections using try-with-resource. Let me know if you see I missed it somewhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand this better with a more concrete illustration of events happend at different timestamps tn
, with regions are depicted as ID
(startRow
, endRow
):
Say at t0
, this is the state with seed partitions (I am using event timestamp as the start time for simplicity):
Partition | Parent Partition | Start Time | End Time | Start Key | End Key |
---|---|---|---|---|---|
A | null | t0 | null | r1 | r2 |
B | null | t0 | null | r2 | r3 |
C | null | t0 | null | r3 | r4 |
D | null | t0 | null | r4 | r5 |
Merge at t1
: A
(r1
,r2
), B
(r2
, r3
), C
(r3
, r4
), D
(r4
, r5
) -> AB
(r1
, r3
), CD
(r3
, r5
). This will result in 4 new rows to be inserted and their parent 4 rows to be updated for end time:
Partition | Parent Partition | Start Time | End Time | Start Key | End Key |
---|---|---|---|---|---|
A | null | t0 | t1 | r1 | r2 |
B | null | t0 | t1 | r2 | r3 |
C | null | t0 | t1 | r3 | r4 |
D | null | t0 | t1 | r4 | r5 |
AB | A | t1 | null | r1 | r3 |
AB | B | t1 | null | r1 | r3 |
CD | C | t1 | null | r3 | r5 |
CD | D | t1 | null | r3 | r5 |
Merge at t2
: AB
(r1
, r3
), CD
(r3
, r5
) -> ABCD
(r1
, r5
). This will result in 2 new rows to be inserted and the existing 4 parent records to be updated for end time.
Partition | Parent Partition | Start Time | End Time | Start Key | End Key |
---|---|---|---|---|---|
AB | A | t1 | t2 | r1 | r3 |
AB | B | t1 | t2 | r1 | r3 |
CD | C | t1 | t2 | r3 | r5 |
CD | D | t1 | t2 | r3 | r5 |
ABCD | AB | t2 | null | r1 | r5 |
ABCD | CD | t2 | null | r1 | r5 |
Split at t3
: ABCD
(r1
, r4
) -> E(r1
,r3
), F(r3
, r5
). This will result in 2 new rows to be inserted and the end time for its parent to be updated.
Partition | Parent Partition | Start Time | End Time | Start Key | End Key |
---|---|---|---|---|---|
ABCD | AB | t2 | t3 | r1 | r5 |
ABCD | CD | t2 | t3 | r1 | r5 |
E | ABCD | t3 | null | r1 | r3 |
F | ABCD | t3 | null | r3 | r5 |
NOTE: I have excluded the grand parent rows at each stage in the above tables because I didn't see any pending operations on them to depict.
Questions:
- At any point, we need to look up the parent record(s) for the current region only to update the end time, right? I see you are looking up the parent of the parent (grand parents) during merge instead of the immediate parents Is this intentional?
- You are looking up both the parents and grand parents during split and updating their timestamp. Why grand parent too?
- For parents during split, why to lookup by start/end key, why not the same way as in merge? We are anyway not using the start/end key to constrain the rows to upsert for end time.
@@ -60,9 +62,13 @@ public class PhoenixMasterObserver implements MasterObserver, MasterCoprocessor | |||
private static final String PARTITION_UPSERT_SQL | |||
= "UPSERT INTO " + SYSTEM_CDC_STREAM_NAME + " VALUES (?,?,?,?,?,?,?,?)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to list the column names here to make it easier to read the code instead of having to jump to QueryConstants for the name and order of the columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the list of columns in a comment just above this line.
for (RegionInfo ri : regionsToMerge) { | ||
List<String> ancestorIDs = getAncestorIdsForMerge(conn, tableName, streamName, ri); | ||
updateParentPartitionEndTime(conn, tableName, streamName, ancestorIDs, | ||
mergedRegion.getRegionId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking up the parent of the parent, we are actually updating the grand parents right? Is that really the intention? See my other comment visualizing the operations and let me know what I am missing.
// if parent was a result of a merge, there will be multiple grandparents. | ||
while (rs.next()) { | ||
ancestorIDs.add(rs.getString(2)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the part about grand parents, but then again I must be missing something, please see my other comment with visualization and let me know what I am missing.
@haridsv Thanks for the illustration, it looks good and captures what I was trying to say.
Yes. That is correct.
I am looking up the grandparent only because the grandparent id will be a part of the PK for the rows for the parent. In your illustration, when ABCD splits, I want to know all its parents i.e. AB and CD. Then, we need to update the end time for all rows of ABCD i.e. (ABCD, AB) and (ABCD, CD). Hence when querying for the parent, I also include the grandparent in the query so that later on I can provide the complete PK to update the end time.
The merge coproc hook provides all parents and the daughter. The split coproc hook only provides the daughters, not the parents. So we will have to lookup by the start/end key. |
@palashc Thanks for the clarifications! I guess the point I was missing is that you are trying to do a point lookup to precisely target the rows to update by using the full PK. However, the lookup query for the grand parent is not a point lookup, so I feel there is not much advantage of including the grand parent for upsert. The same argument applies to split too, so while you are trying to be precise for upsert, the lookup for parent is impricise anyway. |
@haridsv Since parent partition id is part of the PK and can be null for the bootstrapped partitions, I will have to give the whole PK when updating the end time of a parent (tableName, streamName, parentId, grandparentId), right? That means I have to lookup the grandparent - 1) for splits, I anyway have to run a query to get the parent partition id based on start/end key so I just get the grandparent partition id with the same query. 2) for merges, I have the parent id already but need to run the query to lookup the grandparent. |
Thanks @virajjasani for clarifying offline that in Phoenix you need the full PK to upsert, I somehow assumed Phoenix supported that update variation. @palashc You may ignore my above comments. |
PHOENIX-7499