#305 getAncestors Database functionality. #311

ABLL526 · 2025-01-21T20:32:05Z

Added the getAncestors Functionality for the Server Portion
This is for issue GET /partitionings/{partId}/parents -> returns all ancestors, not just direct ones #305

Release notes:

Created endpoint /api/v2/partitionings/{partitioning_id}/ancestors to get the partitioning ancestors.

1. Made the necessary changes as mentioned by the team. 3. Made the necessary changes to the getAncestors Database functionality.

github-actions · 2025-01-21T20:34:51Z

JaCoCo model module code coverage report - scala 2.13.11

Overall Project	56.51%	🍏

There is no coverage information present for the Files changed

github-actions · 2025-01-21T20:34:53Z

JaCoCo agent module code coverage report - scala 2.13.11

Overall Project	78.44%	🍏

There is no coverage information present for the Files changed

github-actions · 2025-01-21T20:34:53Z

JaCoCo reader module code coverage report - scala 2.13.11

Overall Project	95.16%	🍏

There is no coverage information present for the Files changed

github-actions · 2025-01-21T20:34:54Z

JaCoCo server module code coverage report - scala 2.13.11

Overall Project	68.39%	🍏

There is no coverage information present for the Files changed

salamonpavel · 2025-01-28T10:37:09Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+--      has_more            - Flag indicating if there are more partitionings available
+
+-- Status codes:
+--      11 - OK


Not super important I suppose but still, according to https://github.com/AbsaOSS/fa-db/blob/master/core/src/main/scala/za/co/absa/db/fadb/status/README.md we would maybe want to use status 10 instead of 11.

Changed as mentioned

I still see 11. 😉

Changed as needed

salamonpavel · 2025-01-28T10:46:45Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+--
+-------------------------------------------------------------------------------
+DECLARE
+    partitionCreateAt TIMESTAMP;


partitioning

Changed as mentioned

Also the local variables start with _ by convention - avoids confusion with OUT parameters and column names.

Thank you the convention is useful to know. I have made the change.

salamonpavel · 2025-01-28T10:57:21Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+        LIMIT i_limit
+        OFFSET i_offset;
+
+    IF FOUND THEN


You return status already from the query. And there is no reason to return 42. There are no records returned if ancestors don't exist. Have a look at runs.get_partitioning_checkpoints.

We then simply process the data as

if (results.nonEmpty && results.head.hasMore) ...

This makes sense, although runs.get_paritioning_checkpoint_v2 has similar logic to this.
What I will do is comment it out for now and determine whether it is necessary.
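A minimal sketch of the processing described above, assuming a hypothetical row type whose fields mirror the function's OUT parameters (`AncestorRow`, `Page` and `toPage` are illustrative names, not taken from the project). It shows why no dedicated "no ancestors" status is needed: an empty result set already maps to an empty page.

```scala
// Hypothetical row shape; field names mirror the OUT parameters discussed in this thread.
final case class AncestorRow(status: Int, statusText: String, ancestorId: Long, hasMore: Boolean)

// Hypothetical result wrapper used only for this illustration.
final case class Page[A](items: Seq[A], hasMore: Boolean)

object AncestorPaging {
  // No rows means "no ancestors": an empty page with hasMore = false.
  // Otherwise the has_more flag is read from the first row, as in
  // `results.nonEmpty && results.head.hasMore`.
  def toPage(rows: Seq[AncestorRow]): Page[Long] =
    Page(rows.map(_.ancestorId), rows.headOption.exists(_.hasMore))
}
```

With this shape the caller never has to special-case a status such as 42; it only checks whether `items` is empty.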

benedeki · 2025-01-29T15:27:28Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+ * limitations under the License.
+ */
+
+CREATE OR REPLACE FUNCTION runs.get_ancestors(


I would name it runs.get_partitioning_ancestors; otherwise the name is a little ambiguous.

Makes sense, changed.

benedeki · 2025-01-29T16:00:25Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+--
+-- Parameters:
+--      i_id_partitioning   - id that we asking the Ancestors for
+--      i_limit             - (optional) maximum number of partitionings to return, default is 5


Not important:
Don't we use 10 as the default limit in our functions?

I will change it to 10. I don't think it is important either.

benedeki · 2025-01-29T16:01:28Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+--      has_more            - Flag indicating if there are more partitionings available
+
+-- Status codes:
+--      11 - OK


I still see 11. 😉

benedeki · 2025-01-29T16:27:12Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+--
+-------------------------------------------------------------------------------
+DECLARE
+    partitionCreateAt TIMESTAMP;


Also the local variables start with _ by convention - avoids confusion with OUT parameters and column names.

benedeki · 2025-01-29T16:30:00Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+-- Status codes:
+--      11 - OK
+--      41 - Partitioning not found
+--      42 - Ancestor Partitioning not found


I think there is no need for this status (and an error one at that). If no ancestors are found, that's OK: it's simply an empty list (particularly with paging).

Removed as needed

benedeki · 2025-01-30T07:28:04Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+        WHERE
+            PF2.fk_partitioning = i_id_partitioning
+            AND
+            P.created_at < partitionCreateAt


Why this condition?
Actually I think the whole query is incorrect, unfortunately.
It should be

FROM flows.partitioning_to_flow PF
    INNER JOIN flows.flows F ON F.id_flow = PF.id_flow
    INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
WHERE PF.fk_partitioning = i_id_partitioning
    AND P.id_partitioning IS DISTINCT FROM i_id_partitioning

I understand, but I think it should stay as it currently is, since your query would return the child partitionings as well, not just the ancestors. Please let me know if I am mistaken in my analysis.

It seems that you understood the problem, technologies, and I checked your tests and it looks logically correct. Good job!

However, the biggest problem I see is this condition: P.created_at < partitionCreateAt - we cannot count on creation time - parent can be created before or after child creation, so you need to adjust the query

I have made this change as discussed, I did not realise there was an added field in the Flows table. That made this very easy. Thank you for the feedback, it has been changed and everything is working as intended.
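The join-based lookup agreed on here can be modelled in memory with a short sketch. The case classes are hypothetical stand-ins for flows.partitioning_to_flow and flows.flows, and the sample data assumes that every partitioning is linked to its own flow and to the flows of its ancestors; that layout is an assumption for illustration, not something this thread confirms. Creation time plays no role in the result.

```scala
object AncestorLookup {
  // Hypothetical in-memory stand-ins for flows.partitioning_to_flow and flows.flows.
  final case class PartitioningToFlow(fkPartitioning: Long, fkFlow: Long)
  final case class Flow(idFlow: Long, fkPrimaryPartitioning: Long)

  // Ancestors = primary partitionings of every flow the partitioning belongs to,
  // excluding the partitioning itself (the IS DISTINCT FROM condition).
  def ancestors(id: Long, links: Seq[PartitioningToFlow], flows: Seq[Flow]): Seq[Long] = {
    val primaryByFlow = flows.map(f => f.idFlow -> f.fkPrimaryPartitioning).toMap
    links
      .filter(_.fkPartitioning == id)            // WHERE PF.fk_partitioning = i_id_partitioning
      .flatMap(l => primaryByFlow.get(l.fkFlow)) // the two INNER JOINs
      .filter(_ != id)
      .distinct                                  // GROUP BY P.id_partitioning
      .sorted                                    // ORDER BY P.id_partitioning
  }

  // Assumed sample hierarchy 1 -> 3 -> 5, each partitioning primary in its own flow.
  val sampleFlows: Seq[Flow] = Seq(Flow(1L, 1L), Flow(3L, 3L), Flow(5L, 5L))
  val sampleLinks: Seq[PartitioningToFlow] = Seq(
    PartitioningToFlow(1L, 1L),
    PartitioningToFlow(3L, 3L), PartitioningToFlow(3L, 1L),
    PartitioningToFlow(5L, 5L), PartitioningToFlow(5L, 3L), PartitioningToFlow(5L, 1L)
  )
}
```

Under that assumed layout, asking for the ancestors of partitioning 3 yields only 1, so descendants are not picked up by the join.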

lsulak · 2025-02-06T14:01:10Z

database/src/main/postgres/runs/V0.3.0.2__get_ancestors.sql

+    IN i_offset             BIGINT DEFAULT 0,
+    OUT status              INTEGER,
+    OUT status_text         TEXT,
+    OUT ancestorid         BIGINT,


Suggested change

OUT ancestorid BIGINT,

OUT ancestor_id BIGINT,

lsulak · 2025-02-06T14:22:53Z

database/src/test/scala/za/co/absa/atum/database/runs/GetAncestorsIntegrationTests.scala

+        assert(row.getLong("ancestorid").contains(partId1))
+        assert(returnedPartitioningParsed == expectedPartitioning1)
+        assert(row.getString("author").contains("Grandpa"))
+        assert(row.getString("author").contains("Grandpa"))


also I'd recommend to test whether the result is final, by adding this at the end of this local scope:

assert(!queryResult.next())

Added in to all tests, thank you

lsulak · 2025-02-06T14:25:59Z

database/src/test/scala/za/co/absa/atum/database/runs/GetAncestorsIntegrationTests.scala

+        assert(row.getString("status_text").contains("OK"))
+        assert(row.getLong("ancestorid").contains(partId4))
+        assert(returnedPartitioningParsed == expectedPartitioning4)
+        assert(row.getString("author").contains("Grandson"))


again, add the check on finality

Btw, I drew a simple diagram of the partitioning hierarchy for this test and I see that this test checks ancestors correctly. For other reviewers, this is the hierarchy - ancestors are requested for Partitioning 5, and the ancestors are: 1, 2, 3, 4:

1   2
|   |
3   4   6
 \  |   |
  5     7
  |    /
   8

consider adding this 'diagram' into the code comment somewhere for easier understanding

also, you can use .map to iterate over a list of expected results - you are checking 4 rows and the code is almost the same, so can be massively simplified :)

Thank you will modify once I have the solution correct. I have also added the diagram in the comments.

Awesome, looking forward to it, I'll review once adjusted :)

Used the map implementation as mentioned.
Take a look, I had to add a few other items to make it work.
I could be missing something or if my method is not appropriate, please let me know.
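A minimal sketch of the two suggestions in this thread, assuming the query result behaves like a Scala `Iterator` (`Row` and `mismatches` are hypothetical names, and the real tests read values from a query result row instead). It walks a full list of expected rows in order and then applies the finality check, so a partial or oversized result is reported rather than silently accepted.

```scala
object ExpectedRowsCheck {
  // Hypothetical, simplified row used only for this illustration.
  final case class Row(ancestorId: Long, author: String)

  // Returns a description of every mismatch; an empty result means an exact match.
  def mismatches(expected: Seq[Row], actual: Iterator[Row]): Seq[String] = {
    // One pass over the expected rows instead of one copied assertion block per row.
    val perRow = expected.flatMap { exp =>
      if (!actual.hasNext) Some(s"missing row for ${exp.author}")
      else {
        val got = actual.next()
        if (got == exp) None else Some(s"expected $exp but got $got")
      }
    }
    // Finality check: nothing may be left once every expected row is consumed.
    val trailing = if (actual.hasNext) Seq("unexpected extra rows") else Seq.empty
    perRow ++ trailing
  }
}
```

In a test, asserting that `mismatches(expected, queryResult)` is empty would cover both the per-row comparison and the final "no more rows" check.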

lsulak · 2025-02-06T14:35:12Z

database/src/test/scala/za/co/absa/atum/database/runs/GetAncestorsIntegrationTests.scala

+        assert(row.getString("status_text").contains("OK"))
+        assert(row.getLong("ancestorid").contains(partId7))
+        assert(returnedPartitioningParsed == expectedPartitioning7)
+        assert(row.getString("author").contains("Daughter"))


same here - briefly checked, what you are checking seems to be correct, but can be massively simplified & I'd like you to add the last assert on data finality

lsulak · 2025-02-06T14:42:47Z

database/src/test/scala/za/co/absa/atum/database/runs/GetAncestorsIntegrationTests.scala

+    val Time5 = OffsetDateTime.parse("1992-08-07T10:00:00Z")
+    val Time6 = OffsetDateTime.parse("1992-08-08T10:00:00Z")
+    val Time7 = OffsetDateTime.parse("1992-08-09T10:00:00Z")
+    val Time8 = OffsetDateTime.parse("1992-08-09T11:00:00Z")


I would propose changing these times so that they are not incremental. Make them more 'random', without this nice chronological sequence of partitioning creation times. For example, in real life, a parent can be created at a later moment than a child.

Removing the entire premise of time creation from the query

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Made changes as requested

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended.

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended. 4. Removed unnecessary files

benedeki · 2025-02-24T06:43:23Z

database/src/main/postgres/runs/V0.3.0.2__get_partitioning_ancestors.sql

+    SELECT count(*) > i_limit
+    FROM flows.partitioning_to_flow PTF
+    WHERE PTF.fk_flow IN (
+        SELECT fk_flow
+        FROM flows.partitioning_to_flow
+        WHERE fk_partitioning = i_id_partitioning
+    )
+    LIMIT i_limit + 1 OFFSET i_offset
+    INTO _has_more;
+
+    -- Return the ancestors
+    RETURN QUERY
+        SELECT
+            10 AS status,
+            'OK' AS status_text,
+            P.id_partitioning AS ancestor_id,
+            P.partitioning AS partitioning,
+            P.created_by AS author,
+            _has_more AS has_more
+        FROM
+            flows.partitioning_to_flow PF
+                INNER JOIN flows.flows F ON F.id_flow = PF.fk_flow
+                INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
+        WHERE
+            PF.fk_partitioning = i_id_partitioning AND
+            P.id_partitioning IS DISTINCT FROM i_id_partitioning
+        GROUP BY P.id_partitioning
+        ORDER BY P.id_partitioning
+        LIMIT i_limit
+        OFFSET i_offset;


Let's talk about these queries 😉

benedeki · 2025-02-24T06:44:22Z

database/src/main/postgres/runs/V0.3.0.2__get_partitioning_ancestors.sql

+    IF NOT FOUND THEN
+        status := 10;
+        status_text := 'OK';
+        RETURN NEXT;
+    END IF;


While totally valid if agreed upon in the DB-app contract, I think this would cause more trouble than good. Again, will happily explain.

I will check the documentation to decide whether to return a dedicated status code or an empty result.

benedeki · 2025-02-24T08:44:57Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+  // Testing for return of the Ancestors for a given Partition ID
+  //
+  //  1(Grandma)  2(Grandpa)
+  //      |           |
+  //  3(Mother)   4(Father)    6(Daughter)
+  //     \        |                |
+  //       5(Son)           7(Granddaughter)
+  //          |            /
+  //           8(Grandson)


Love the diagram 😎

benedeki · 2025-02-24T08:45:35Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+  test("Returns Ancestors for a given Partition ID"){
+    val partitioningID1 = function(createPartitioningFn)
+      .setParam("i_partitioning", partitioning1)
+      .setParam("i_by_user", "Grandma")


Putting the "family tree position" here is smart too. 👍

benedeki · 2025-02-24T08:51:25Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+        var returnedPartitioningParsed = parse(returnedPartitioning.value)
+          .getOrElse(fail("Failed to parse returned partitioning"))
+        //Used breakable to be able to break the loop
+        breakable {


Not sure I understand the need for this.

Removed the breakable test

ABLL526 · 2025-02-24T11:20:25Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+        val row = queryResult.next()
+        assert(row.getInt("status").contains(41))
+        assert(row.getString("status_text").contains("Partitioning not found"))
+        assert(row.getJsonB("ancestor_id").isEmpty)


Testing null values is fine for test purposes, in reality it can have data. Therefore just testing the error codes is good enough.

ABLL526 · 2025-02-24T11:22:53Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+  //Second Failure Test: Ancestor Partitioning not found
+  test("Ancestor Partitioning not found") {
+
+    val partitioningID1 = function(createPartitioningFn)


Just call the function. Make the assumption it works. It helps with shortening code and maintenance.

Having the data preparation in one function.

Adjustments to the SQL and added in status code 14. Adjustments to the tests, made it more readable and shorter.

lsulak · 2025-02-27T10:08:34Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+            assert(row.getInt("status").contains(10))
+            assert(row.getString("status_text").contains("OK"))
+            assert(row.getLong("ancestor_id").contains(v._1))
+            assert(returnedPartitioningParsed == v._2)


you can probably expand v as pattern match instead of ._1 and ._2, just an idea
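A small sketch of the pattern-match idea, assuming the expected data is keyed by author with an (ancestor id, partitioning) pair as the value, as `k`, `v._1` and `v._2` in the excerpt suggest. The fixture values and the `render` helper are made up for illustration.

```scala
object PatternMatchExample {
  // Hypothetical fixture: author paired with (ancestor id, partitioning placeholder).
  val expected: Seq[(String, (Long, String))] = Seq(
    ("Grandma", (1L, "p1")),
    ("Grandpa", (2L, "p2"))
  )

  // Destructuring in the case clause names each part, replacing k, v._1 and v._2.
  def render(entries: Seq[(String, (Long, String))]): Seq[String] =
    entries.map { case (author, (ancestorId, partitioning)) =>
      s"$author -> $ancestorId / $partitioning"
    }
}
```

The same `case (author, (ancestorId, partitioning)) =>` clause can wrap the assertions in the test body, which makes each assertion read without positional accessors.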

lsulak · 2025-02-27T10:30:46Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+            assert(row.getLong("ancestor_id").contains(v._1))
+            assert(returnedPartitioningParsed == v._2)
+            assert(row.getString("author").contains(k))
+            if (!queryResult.hasNext) break()


this is a very non-functional, procedural-like code, which is, no matter the style, very non-deterministic. I would prepare exact expected parents and loop over them, not like this

lsulak · 2025-02-27T10:32:23Z

.../src/test/scala/za/co/absa/atum/database/runs/GetPartitioningAncestorsIntegrationTests.scala

+            assert(queryResult.hasNext)
+            row = queryResult.next()
+            returnedPartitioning = row.getJsonB("partitioning").get
+            returnedPartitioningParsed = parse(returnedPartitioning.value)


also here you are not really checking the expected & returned ancestors. This test basically doesn't test whether all expected ancestors are being returned, it really just checks P1, then if the query didn't return anything it breaks from the check loop.

So this test doesn't prove that your DB function really works and therefore it really needs improvement

having partial-only expected results (your case) is definitely not a good practice

the last 2 tests are OK, but this one seriously needs improvement

Thank you. I have removed all the breakable loops and added a fixed set of expected results instead of partial-only ones. Please check whether this is better practice.

Made amendments to test code mentioned by PR. - Removed Breakable tests - Added a set case and removed the partial test case.

benedeki

By returning 14, you IMHO chose the more complicated path, but it's a valid one 😉

- Made some changes to the naming convention on the files.

Added the getAncestors Database functionality.

bbba4ae

1. Made the necessary changes as mentioned by the team. 3. Made the necessary changes to the getAncestors Database functionality.

ABLL526 requested review from benedeki, lsulak, Zejnilovic, dk1844 and salamonpavel as code owners January 21, 2025 20:32

ABLL526 added the good first issue, DB, and Server labels Jan 21, 2025

ABLL526 linked an issue Jan 21, 2025 that may be closed by this pull request

GET /partitionings/{partId}/parents -> returns all ancestors, not just direct ones #305

Open

ABLL526 self-assigned this Jan 21, 2025

ABLL526 changed the title ~~Added the getAncestors Database functionality.~~ #305 getAncestors Database functionality. Jan 21, 2025

ABLL526 added the enhancement label Jan 21, 2025

salamonpavel reviewed Jan 28, 2025

salamonpavel reviewed Jan 28, 2025

salamonpavel mentioned this pull request Jan 28, 2025

#305 Get-Ancestors-Server Functionality #312

Open

benedeki requested changes Jan 30, 2025

lsulak reviewed Feb 6, 2025

ABLL526 and others added 2 commits February 11, 2025 04:27

Added the getAncestors Database functionality.

e479ea3

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Made changes as requested

Added the getAncestors Database functionality.

8694a4f

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended.

ABLL526 added 2 commits February 12, 2025 17:38

Added the getAncestors Database functionality.

d968718

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended.

Added the getAncestors Database functionality.

a5c556c

1. Made the necessary changes as mentioned by the team. 2. Made the necessary changes to the getAncestors Database functionality. 3. Now working completely as intended. 4. Removed unnecessary files

benedeki reviewed Feb 24, 2025

ABLL526 commented Feb 24, 2025

Changes made:

8961762

Adjustments to the SQL and added in status code 14. Adjustments to the tests, made it more readable and shorter.

lsulak reviewed Feb 27, 2025

Changes made:

f83460a

Made amendments to test code mentioned by PR. - Removed Breakable tests - Added a set case and removed the partial test case.

benedeki previously approved these changes Mar 3, 2025

ABLL526 and others added 3 commits March 3, 2025 15:53

Merge branch 'master' into 305-Get-Ancestors-Database

c99a34a

Merge branch 'master' into 305-Get-Ancestors-Database

9c3256f

Changes Made:

4dcaa86

- Made some changes to the naming convention on the files.

ABLL526 dismissed benedeki’s stale review via 4dcaa86 March 7, 2025 14:29
