Skip to content

Conversation

@AndreiKingsley
Copy link
Collaborator

Closes #1387.

@AndreiKingsley AndreiKingsley marked this pull request as ready for review November 18, 2025 18:38
Copy link
Collaborator

@Jolanrensen Jolanrensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I ever made any KoDEx proficiency awards, you'd have earned one by now :D.

Great docs! Just some small notes: Check @ExcludeFromSources to not grow the .jar size. Anything that's not reachable by the user and just used to @include somewhere else can be excluded.

Adding a String example inside the DSLs would be nice

* such columns are treated as distinct. Such a column from the right [DataFrame] will be automatically
* renamed in the resulting [DataFrame].
*/
private interface JoinBehavior
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExcludeFromSources, check other places too :)

private interface JoinBehavior

/**
* Joins this [DataFrame] with [other][\other] [DataFrame] using selected key columns.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*the other, the selected key columns

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check other places too

* // Join rows where the `date` value in the right DataFrame
* // falls within the interval defined by the `startDate` and `endDate`
* // values in the left DataFrame.
* dfLeft.{@get [JoinWithMethod] joinWith}(dfRight) { right.date in startDate..endDate }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the other example, can you add another with Strings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it looks really awful in this DSL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still, it's important for people to see it's possible

joinWith(right, JoinType.Full, joinExpression)

/**
* Performs [filter join][JoinType.Filter] of this [DataFrame] with the [right][\right] [DataFrame]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering... Is "Performs a filter join of this DataFrame with the right DataFrame" better? Depends whether we think of "join" as a special concept or as a generic action. You can say "I performed a cleanup", but at the same time "I performed gradlew clean".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a good question!
I believe you’re right — “performs a join” is better since join is used here as a technical operation rather than just an action.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check correct usage of "a" vs "an" :)

filterJoin(other) { columns.toColumnSet() }

/**
* Performs a [exclude join][JoinType.Exclude] of this [DataFrame] with [other][\other] [DataFrame]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*an

): DataFrame<A> = join(other, type) { columns.toColumnSet() }

/**
* Performs a [inner join][JoinType.Inner] of this [DataFrame] with the [other][\other] [DataFrame]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*an

private interface JoinDocs

// `join` method used in the example
@Suppress("ktlint:standard:class-naming")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you @Suppress("ClassName") you suppress both intelliJ and KtLint at the same time :)

*
* In such cases, columns with duplicate names are automatically renamed
* using the pattern `"\$name\$n"`, where `name` is the original column name
* and `n` is a unique index (1, 2, 3, and so on); the first name column goes without a number.
Copy link
Collaborator

@Jolanrensen Jolanrensen Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first time the name of the column is encountered, no number is appended.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean "not added"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"appending" is adding something at the end

* dfLeft.{@get [JoinMethod] join}(dfRight, "name", "city")
* ```
*/
private interface JoinStringApiExample
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exclude from sources

leftJoin(other) { columns.toColumnSet() }

/**
* Performs a [right join][JoinType.Right] of this [DataFrame] with [other][\other] [DataFrame]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*the other


/**
* Performs a [right join][JoinType.Right] of this [DataFrame] with [other][\other] [DataFrame]
* using selected key columns.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the selected key columns

* Each join type has a corresponding shortcut function:
* [innerJoinWith], [leftJoinWith], [rightJoinWith], [fullJoinWith], [filterJoinWith], and [excludeJoinWith].
*
* See also [join], which performs a join by exact values equality in the selected columns.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*exact value equality?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like "I sort my directory by file size" not "files size". It's an attribute of a single file, so it should be in singular form. (In Dutch/German it would be 1 word)

It's a bit strange with equality, I agree, as equality can only be expressed between two values. So maybe it could be better written like "which performs a join based on the exact equality between values in the selected columns"

): DataFrame<A> = joinWithImpl(right, type, addNewColumns = type.addNewColumns, joinExpression)

/**
* Performs a [inner join][JoinType.Inner] of this [DataFrame] with the [right][\right] [DataFrame]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*an

*
* // String API; join rows where `score` value in the left `DataFrame` is higher than 3.4
* // and the `passed` value in the right `DataFrame` is `true`.
* dfLeft.{@get [JoinWithMethod] joinWith}(dfRight) { "score"<Int>() > 3.4 && right["passed"] as Boolean }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

You can also avoid the cast like this by writing:
right[{ "passed"<Boolean>() }]
or
right.getValue<Boolean>("passed")
whatever you like best :)

please also add this example to the "normal" join

@Jolanrensen Jolanrensen self-requested a review November 21, 2025 11:17
@AndreiKingsley AndreiKingsley merged commit a98b926 into master Nov 21, 2025
5 checks passed
@AndreiKingsley AndreiKingsley deleted the join_kdocs branch November 21, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add KDocs for Join and JoinWith

3 participants