Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#305 getAncestors Database functionality. #311

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

ABLL526
Copy link
Contributor

@ABLL526 ABLL526 commented Jan 21, 2025

Release notes:

  • Created endpoint /api/v2/partitionings/{partitioning_id}/ancestors to get the partitioning ancestors.

1. Made the necessary changes as mentioned by the team.
3. Made the necessary changes to the getAncestors Database functionality.
@ABLL526 ABLL526 added good first issue Good for newcomers DB Issues touching the Database part of the project Server Issues touching the server part of the project labels Jan 21, 2025
Copy link

github-actions bot commented Jan 21, 2025

JaCoCo model module code coverage report - scala 2.13.11

Overall Project 56.51% 🍏

There is no coverage information present for the Files changed

Copy link

github-actions bot commented Jan 21, 2025

JaCoCo agent module code coverage report - scala 2.13.11

Overall Project 78.44% 🍏

There is no coverage information present for the Files changed

Copy link

github-actions bot commented Jan 21, 2025

JaCoCo reader module code coverage report - scala 2.13.11

Overall Project 95.16% 🍏

There is no coverage information present for the Files changed

Copy link

JaCoCo server module code coverage report - scala 2.13.11

Overall Project 68.39% 🍏

There is no coverage information present for the Files changed

@ABLL526 ABLL526 self-assigned this Jan 21, 2025
@ABLL526 ABLL526 changed the title Added the getAncestors Database functionality. #305 getAncestors Database functionality. Jan 21, 2025
@ABLL526 ABLL526 added the enhancement New feature or request label Jan 21, 2025
-- has_more - Flag indicating if there are more partitionings available

-- Status codes:
-- 11 - OK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super important I suppose but still, according to https://github.com/AbsaOSS/fa-db/blob/master/core/src/main/scala/za/co/absa/db/fadb/status/README.md we would maybe want to use status 10 instead of 11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as mentioned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see 11. 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as needed

--
-------------------------------------------------------------------------------
DECLARE
partitionCreateAt TIMESTAMP;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitioning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as mentioned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the local variables start with _ by convention - avoids confusion with OUT parameters and column names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you the convention is useful to know. I have made the change.

LIMIT i_limit
OFFSET i_offset;

IF FOUND THEN
Copy link
Collaborator

@salamonpavel salamonpavel Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You return status already from the query. And there is no reason to return 42. There are no records returned if ancestors don't exist. Have a look at runs.get_partitioning_checkpoints.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We then simply process the data as

if (results.nonEmpty && results.head.hasMore) ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense although. From runs.get_paritioning_checkpoint_v2 it has a similar logic to this.
What I will do is comment it out for now and determine if it is necessary.

* limitations under the License.
*/

CREATE OR REPLACE FUNCTION runs.get_ancestors(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would name it to runs.get_partitioning_ancestors, otherwise the name is little ambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, changed

--
-- Parameters:
-- i_id_partitioning - id that we asking the Ancestors for
-- i_limit - (optional) maximum number of partitionings to return, default is 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important:
Don't we used 10 as the default limit in our functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change it to 10. I don't think it is important either.

-- has_more - Flag indicating if there are more partitionings available

-- Status codes:
-- 11 - OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see 11. 😉

--
-------------------------------------------------------------------------------
DECLARE
partitionCreateAt TIMESTAMP;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the local variables start with _ by convention - avoids confusion with OUT parameters and column names.

-- Status codes:
-- 11 - OK
-- 41 - Partitioning not found
-- 42 - Ancestor Partitioning not found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need for this status (and error one furthermore). If no ancestors found, it's OK, simple an empty list (particularly with paging).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed as needed

WHERE
PF2.fk_partitioning = i_id_partitioning
AND
P.created_at < partitionCreateAt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this condition?
Actually I think the whole query is incorrect, unfortunately.
It should be

FROM
            flows.partitioning_to_flow PF
                INNER JOIN flows.flows F ON F.id_flow = PF.id_flow
                INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
        WHERE
            PF.fk_partitioning = i_id_partitioning AND
            P.id_partitioning IS DISTINCT FROM i_id_partitioning 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but I think that it should be as the current since your query would get the children partitionings as well. Not just the Ancestors. Please let me know if I am mistaken with my analysis on this.

Copy link
Collaborator

@lsulak lsulak Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that you understood the problem, technologies, and I checked your tests and it looks logically correct. Good job!

However, the biggest problem I see is this condition: P.created_at < partitionCreateAt - we cannot count on creation time - parent can be created before or after child creation, so you need to adjust the query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made this change as discussed, I did not realise there was an added field in the Flows table. That made this very easy. Thank you for the feedback, it has been changed and everything is working as intended.

IN i_offset BIGINT DEFAULT 0,
OUT status INTEGER,
OUT status_text TEXT,
OUT ancestorid BIGINT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OUT ancestorid BIGINT,
OUT ancestor_id BIGINT,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

assert(row.getLong("ancestorid").contains(partId1))
assert(returnedPartitioningParsed == expectedPartitioning1)
assert(row.getString("author").contains("Grandpa"))
assert(row.getString("author").contains("Grandpa"))
Copy link
Collaborator

@lsulak lsulak Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'd recommend to test whether the result is final, by adding this on the end of this local scope:

assert(!queryResult.next())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in to all tests, thank you

assert(row.getString("status_text").contains("OK"))
assert(row.getLong("ancestorid").contains(partId4))
assert(returnedPartitioningParsed == expectedPartitioning4)
assert(row.getString("author").contains("Grandson"))
Copy link
Collaborator

@lsulak lsulak Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, add the check on finality

Btw, I draw a simple diagram of partitioning hierarchy for this test and I see that this test checks ancestors correctly. For other reviewers, this is the hierarchy - ancestors are requested for Partitioning 5, and ancestors are: 1, 2, 3, 4:

 1 2
 | | 
 3 4  6
 \ |  |
   5  7
   | /
   8

consider adding this 'diagram' into the code comment somewhere for easier understanding

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, you can use .map to iterate over a list of expected results - you are checking 4 rows and the code is almost the same, so can be massively simplified :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you will modify once I have the solution correct. I have also added the diagram in the comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looking forward to it, I'll review once adjusted :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the map implementation as mentioned.
Take a look, I had to add a few other items to make it work.
I could be missing something or if my method is not appropriate, please let me know.

assert(row.getString("status_text").contains("OK"))
assert(row.getLong("ancestorid").contains(partId7))
assert(returnedPartitioningParsed == expectedPartitioning7)
assert(row.getString("author").contains("Daughter"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here - briefly checked, what you are checking seems to be correct, but can be massively simplified & I'd like you to add the last assert on data finality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

val Time5 = OffsetDateTime.parse("1992-08-07T10:00:00Z")
val Time6 = OffsetDateTime.parse("1992-08-08T10:00:00Z")
val Time7 = OffsetDateTime.parse("1992-08-09T10:00:00Z")
val Time8 = OffsetDateTime.parse("1992-08-09T11:00:00Z")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to change these times so that they are not incremental - let's say that they are more 'random', let's say that there isn't this nice chronological sequence of partitioning time creation. For example, in real life, parent can be created in later moment than a child.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the entire premise of time creation from the query

ABLL526 and others added 2 commits February 11, 2025 04:27
1. Made the necessary changes as mentioned by the team.
2. Made the necessary changes to the getAncestors Database functionality.
3. Made changes as requested
1. Made the necessary changes as mentioned by the team.
2. Made the necessary changes to the getAncestors Database functionality.
3. Now working completely as intended.
1. Made the necessary changes as mentioned by the team.
2. Made the necessary changes to the getAncestors Database functionality.
3. Now working completely as intended.
1. Made the necessary changes as mentioned by the team.
2. Made the necessary changes to the getAncestors Database functionality.
3. Now working completely as intended.
4. Removed unnecessary files
Comment on lines 67 to 96
SELECT count(*) > i_limit
FROM flows.partitioning_to_flow PTF
WHERE PTF.fk_flow IN (
SELECT fk_flow
FROM flows.partitioning_to_flow
WHERE fk_partitioning = i_id_partitioning
)
LIMIT i_limit + 1 OFFSET i_offset
INTO _has_more;

-- Return the ancestors
RETURN QUERY
SELECT
10 AS status,
'OK' AS status_text,
P.id_partitioning AS ancestor_id,
P.partitioning AS partitioning,
P.created_by AS author,
_has_more AS has_more
FROM
flows.partitioning_to_flow PF
INNER JOIN flows.flows F ON F.id_flow = PF.fk_flow
INNER JOIN runs.partitionings P ON P.id_partitioning = F.fk_primary_partitioning
WHERE
PF.fk_partitioning = i_id_partitioning AND
P.id_partitioning IS DISTINCT FROM i_id_partitioning
GROUP BY P.id_partitioning
ORDER BY P.id_partitioning
LIMIT i_limit
OFFSET i_offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about these queries 😉

Comment on lines 99 to 103
IF NOT FOUND THEN
status := 10;
status_text := 'OK';
RETURN NEXT;
END IF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While totally valid if agreed upon in the DB-app contract, I think this would cause more trouble then good. Again, will happily explain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check the documentation to return an unexpected code. Or an empty answer.

Comment on lines +205 to +213
// Testing for return of the Ancestors for a given Partition ID
//
// 1(Grandma) 2(Grandpa)
// | |
// 3(Mother) 4(Father) 6(Daughter)
// \ | |
// 5(Son) 7(Granddaughter)
// | /
// 8(Grandson)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the diagram 😎

test("Returns Ancestors for a given Partition ID"){
val partitioningID1 = function(createPartitioningFn)
.setParam("i_partitioning", partitioning1)
.setParam("i_by_user", "Grandma")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the "family tree position" here is smart too. 👍

var returnedPartitioningParsed = parse(returnedPartitioning.value)
.getOrElse(fail("Failed to parse returned partitioning"))
//Used breakable to be able to break the loop
breakable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the need for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the breakable test

val row = queryResult.next()
assert(row.getInt("status").contains(41))
assert(row.getString("status_text").contains("Partitioning not found"))
assert(row.getJsonB("ancestor_id").isEmpty)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing null values is fine for test purposes, in reality it can have data. Therefore just testing the error codes is good enough.

//Second Failure Test: Ancestor Partitioning not found
test("Ancestor Partitioning not found") {

val partitioningID1 = function(createPartitioningFn)
Copy link
Contributor Author

@ABLL526 ABLL526 Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call the function. Make the assumption it works. It helps with shortening code and maintenance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the data preparation in one function.

Adjustments to the SQL and added in status code 14.
Adjustments to the tests, made it more readable and shorter.
assert(row.getInt("status").contains(10))
assert(row.getString("status_text").contains("OK"))
assert(row.getLong("ancestor_id").contains(v._1))
assert(returnedPartitioningParsed == v._2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably expand v as pattern match instead of ._1 and ._2, just an idea

assert(row.getLong("ancestor_id").contains(v._1))
assert(returnedPartitioningParsed == v._2)
assert(row.getString("author").contains(k))
if (!queryResult.hasNext) break()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very non-functional, procedural-like code, which is, no matter the style, very non-deterministic. I would prepare exact expected parents and loop over them, not like this

assert(queryResult.hasNext)
row = queryResult.next()
returnedPartitioning = row.getJsonB("partitioning").get
returnedPartitioningParsed = parse(returnedPartitioning.value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here you are not really checking the expected & returned ancestors. This test basically doesn't test whether all expected ancestors are being returned, it really just checks P1, then if the query didn't return anything it breaks from the check loop.

So this test doesn't prove that your DB function really works and therefore it really needs improvement

Copy link
Collaborator

@lsulak lsulak Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having partial-only expected results (your case) is definitely not a good practice

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the last 2 tests are OK, but this one seriously needs improvement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have removed all the breakable loops. I have then added a set case not partial-only expected results. Please check if this is a better practice.

Made amendments to test code mentioned by PR.
- Removed Breakable tests
- Added a set case and removed the partial test case.
benedeki
benedeki previously approved these changes Mar 3, 2025
Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By returning 14, you IMHO chose the more complicated path, but it's a valid one 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB Issues touching the Database part of the project enhancement New feature or request good first issue Good for newcomers Server Issues touching the server part of the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GET /partitionings/{partId}/parents -> returns all ancestors, not just direct ones
4 participants