PHOENIX-7499 : Update stream metadata when data table regions merge #2057

palashc · 2025-01-15T19:43:17Z

haridsv

Where are we closing resources?

haridsv · 2025-01-23T05:40:04Z

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

+                for (RegionInfo ri : regionsToMerge) {
+                    List<String> ancestorIDs = getAncestorIdsForMerge(conn, tableName, streamName, ri);
+                    updateParentPartitionEndTime(conn, tableName, streamName, ancestorIDs,
+                            mergedRegion.getRegionId());


Nit optimization: In case of merge, the parent will appear multiple times, but needs to be updated only once.

@haridsv During merges, we decided to insert one row for the daughter per parent. (parent partition id is part of PK)

Parents in a merge can also be results of merges in the past and hence they can have multiple rows in the table. We would need to update all those rows with the end time.

By looking up the parent of the parent, we are actually updating the grand parents right? Is that really the intention? See my other comment visualizing the operations and let me know what I am missing.

Just looking up the grandparent to form the complete PK for updating the parent end time.

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

haridsv · 2025-01-23T06:15:36Z

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

+        // if parent was a result of a merge, there will be multiple grandparents.
+        while (rs.next()) {
+            ancestorIDs.add(rs.getString(2));
+        }


What is the situation in which a split will have multiple ancestors? In fact, isn't it an error if we actually find 2 regions with exactly the same start and end keys?

@haridsv By ancestors here I mean both parent and grandparent. The parent in a split can be a result of a merge. For example: A and B merged to form C. We would have inserted 2 rows for the daughter C with 2 parents, (C-->A) and (C-->B), with the same start/end key. If C splits into D and E, we will insert (D-->C) and (E-->C). When we lookup the parent of D and E, we will find those 2 rows with the same start/end key - both of which will need to be marked with the end time. (Parent's partition id is now part of the PK for CDC_STREAM table, so we will have to find all grandparents to mark the end time for a parent which has split).

Let me know if I am missing something here or over-complicating things. FYI @virajjasani

I didn't get the part about grand parents, but then again I must be missing something, please see my other comment with visualization and let me know what I am missing.

palashc · 2025-01-23T07:43:01Z

Where are we closing resources?

@haridsv I have created connections using try-with-resource. Let me know if you see I missed it somewhere.

haridsv

I am trying to understand this better with a more concrete illustration of events happend at different timestamps tn, with regions are depicted as ID(startRow, endRow):

Say at t0, this is the state with seed partitions (I am using event timestamp as the start time for simplicity):

Partition	Parent Partition	Start Time	End Time	Start Key	End Key
A	null	t0	null	r1	r2
B	null	t0	null	r2	r3
C	null	t0	null	r3	r4
D	null	t0	null	r4	r5

Merge at t1: A(r1,r2), B(r2, r3), C(r3, r4), D(r4, r5) -> AB(r1, r3), CD(r3, r5). This will result in 4 new rows to be inserted and their parent 4 rows to be updated for end time:

Partition	Parent Partition	Start Time	End Time	Start Key	End Key
A	null	t0	t1	r1	r2
B	null	t0	t1	r2	r3
C	null	t0	t1	r3	r4
D	null	t0	t1	r4	r5
AB	A	t1	null	r1	r3
AB	B	t1	null	r1	r3
CD	C	t1	null	r3	r5
CD	D	t1	null	r3	r5

Merge at t2: AB(r1, r3), CD(r3, r5) -> ABCD(r1, r5). This will result in 2 new rows to be inserted and the existing 4 parent records to be updated for end time.

Partition	Parent Partition	Start Time	End Time	Start Key	End Key
AB	A	t1	t2	r1	r3
AB	B	t1	t2	r1	r3
CD	C	t1	t2	r3	r5
CD	D	t1	t2	r3	r5
ABCD	AB	t2	null	r1	r5
ABCD	CD	t2	null	r1	r5

Split at t3: ABCD(r1, r4) -> E(r1,r3), F(r3, r5). This will result in 2 new rows to be inserted and the end time for its parent to be updated.

Partition	Parent Partition	Start Time	End Time	Start Key	End Key
ABCD	AB	t2	t3	r1	r5
ABCD	CD	t2	t3	r1	r5
E	ABCD	t3	null	r1	r3
F	ABCD	t3	null	r3	r5

NOTE: I have excluded the grand parent rows at each stage in the above tables because I didn't see any pending operations on them to depict.

Questions:

At any point, we need to look up the parent record(s) for the current region only to update the end time, right? I see you are looking up the parent of the parent (grand parents) during merge instead of the immediate parents Is this intentional?
You are looking up both the parents and grand parents during split and updating their timestamp. Why grand parent too?
For parents during split, why to lookup by start/end key, why not the same way as in merge? We are anyway not using the start/end key to constrain the rows to upsert for end time.

haridsv · 2025-01-24T10:42:08Z

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

@@ -60,9 +62,13 @@ public class PhoenixMasterObserver implements MasterObserver, MasterCoprocessor
    private static final String PARTITION_UPSERT_SQL
            = "UPSERT INTO " + SYSTEM_CDC_STREAM_NAME + " VALUES (?,?,?,?,?,?,?,?)";


It would be nice to list the column names here to make it easier to read the code instead of having to jump to QueryConstants for the name and order of the columns.

I put the list of columns in a comment just above this line.

haridsv · 2025-01-24T10:44:41Z

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

+                for (RegionInfo ri : regionsToMerge) {
+                    List<String> ancestorIDs = getAncestorIdsForMerge(conn, tableName, streamName, ri);
+                    updateParentPartitionEndTime(conn, tableName, streamName, ancestorIDs,
+                            mergedRegion.getRegionId());


By looking up the parent of the parent, we are actually updating the grand parents right? Is that really the intention? See my other comment visualizing the operations and let me know what I am missing.

haridsv · 2025-01-24T10:46:52Z

phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/PhoenixMasterObserver.java

+        // if parent was a result of a merge, there will be multiple grandparents.
+        while (rs.next()) {
+            ancestorIDs.add(rs.getString(2));
+        }


I didn't get the part about grand parents, but then again I must be missing something, please see my other comment with visualization and let me know what I am missing.

palashc · 2025-01-27T19:17:21Z

@haridsv Thanks for the illustration, it looks good and captures what I was trying to say.

At any point, we need to look up the parent record(s) for the current region only to update the end time, right?

Yes. That is correct.

I see you are looking up the parent of the parent (grand parents) during merge instead of the immediate parents Is this intentional?

You are looking up both the parents and grand parents during split and updating their timestamp. Why grand parent too?

I am looking up the grandparent only because the grandparent id will be a part of the PK for the rows for the parent. In your illustration, when ABCD splits, I want to know all its parents i.e. AB and CD. Then, we need to update the end time for all rows of ABCD i.e. (ABCD, AB) and (ABCD, CD). Hence when querying for the parent, I also include the grandparent in the query so that later on I can provide the complete PK to update the end time.

For parents during split, why to lookup by start/end key, why not the same way as in merge? We are anyway not using the start/end key to constrain the rows to upsert for end time.

The merge coproc hook provides all parents and the daughter. The split coproc hook only provides the daughters, not the parents. So we will have to lookup by the start/end key.

haridsv · 2025-01-29T10:13:11Z

@palashc Thanks for the clarifications! I guess the point I was missing is that you are trying to do a point lookup to precisely target the rows to update by using the full PK. However, the lookup query for the grand parent is not a point lookup, so I feel there is not much advantage of including the grand parent for upsert. The same argument applies to split too, so while you are trying to be precise for upsert, the lookup for parent is impricise anyway.

palashc · 2025-01-29T21:10:31Z

@haridsv Since parent partition id is part of the PK and can be null for the bootstrapped partitions, I will have to give the whole PK when updating the end time of a parent (tableName, streamName, parentId, grandparentId), right? That means I have to lookup the grandparent - 1) for splits, I anyway have to run a query to get the parent partition id based on start/end key so I just get the grandparent partition id with the same query. 2) for merges, I have the parent id already but need to run the query to lookup the grandparent.

haridsv · 2025-01-30T05:17:49Z

Thanks @virajjasani for clarifying offline that in Phoenix you need the full PK to upsert, I somehow assumed Phoenix supported that update variation. @palashc You may ignore my above comments.

initial commit

cf49786

palashc marked this pull request as draft January 15, 2025 19:43

Merge branch 'master' into PHOENIX-7499

5382aa1

virajjasani self-requested a review January 17, 2025 21:35

add tests

7d1e17d

palashc marked this pull request as ready for review January 18, 2025 00:44

haridsv reviewed Jan 23, 2025

View reviewed changes

haridsv reviewed Jan 24, 2025

View reviewed changes

Palash Chauhan added 2 commits January 27, 2025 15:43

Merge branch 'master' into PHOENIX-7499

16bc375

set auto commit to false

d089cba

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PHOENIX-7499 : Update stream metadata when data table regions merge #2057

PHOENIX-7499 : Update stream metadata when data table regions merge #2057

palashc commented Jan 15, 2025 •

edited

Loading

haridsv left a comment

haridsv Jan 23, 2025

palashc Jan 23, 2025

haridsv Jan 24, 2025

palashc Jan 27, 2025

haridsv Jan 23, 2025

palashc Jan 23, 2025

haridsv Jan 24, 2025

palashc commented Jan 23, 2025

haridsv left a comment

haridsv Jan 24, 2025

palashc Jan 27, 2025

haridsv Jan 24, 2025

haridsv Jan 24, 2025

palashc commented Jan 27, 2025

haridsv commented Jan 29, 2025 •

edited

Loading

palashc commented Jan 29, 2025

haridsv commented Jan 30, 2025

		@@ -60,9 +62,13 @@ public class PhoenixMasterObserver implements MasterObserver, MasterCoprocessor
		private static final String PARTITION_UPSERT_SQL
		= "UPSERT INTO " + SYSTEM_CDC_STREAM_NAME + " VALUES (?,?,?,?,?,?,?,?)";

PHOENIX-7499 : Update stream metadata when data table regions merge #2057

Are you sure you want to change the base?

PHOENIX-7499 : Update stream metadata when data table regions merge #2057

Conversation

palashc commented Jan 15, 2025 • edited Loading

haridsv left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

palashc commented Jan 23, 2025

haridsv left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

palashc commented Jan 27, 2025

haridsv commented Jan 29, 2025 • edited Loading

palashc commented Jan 29, 2025

haridsv commented Jan 30, 2025

palashc commented Jan 15, 2025 •

edited

Loading

haridsv commented Jan 29, 2025 •

edited

Loading