Reusage schemas fix #1252

Jolanrensen · 2025-06-13T13:16:11Z

Fixes #1222 by replacing the strictlyEqualNestedSchemas parameter with a more explicit ComparisonMode.

Comparing two schemas in LENIENT mode can return IsSuper, IsDerived, IsEqual, or None.
Comparing in STRICT mode can only return IsEqual or None, because the schemas need to match exactly.
STRICT_FOR_NESTED_SCHEMAS uses LENIENT mode for the top level, but STRICT mode for nested schemas. This is often used in Jupyter notebooks to prevent nested types from extending each other and thus avoid a potential comparison explosion (there can be a lot of nested types).
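
For reviewers who want to see the three modes side by side, here is a minimal sketch on a toy model. It is not the code in this PR: Col, Value, Group, combine and the Map-based compare are hypothetical stand-ins, the result names follow the CompareResult.Equals / CompareResult.None spelling visible in the ColumnSchema.kt diff, and the toy assumes that a schema with fewer columns counts as the "super" one.

```kotlin
// Toy stand-ins, NOT the real DataFrame types.
enum class ComparisonMode { LENIENT, STRICT_FOR_NESTED_SCHEMAS, STRICT }

enum class CompareResult { Equals, IsSuper, IsDerived, None }

sealed interface Col
data class Value(val type: String) : Col
data class Group(val schema: Map<String, Col>) : Col

// Merges two partial results; conflicting directions (IsSuper vs IsDerived) give None.
fun combine(a: CompareResult, b: CompareResult): CompareResult = when {
    a == b -> a
    a == CompareResult.Equals -> b
    b == CompareResult.Equals -> a
    else -> CompareResult.None
}

fun compare(
    a: Map<String, Col>,
    b: Map<String, Col>,
    mode: ComparisonMode = ComparisonMode.LENIENT,
): CompareResult {
    // Assumption of this toy: fewer columns means "more general", i.e. IsSuper.
    var result = when {
        a.keys == b.keys -> CompareResult.Equals
        b.keys.containsAll(a.keys) -> CompareResult.IsSuper
        a.keys.containsAll(b.keys) -> CompareResult.IsDerived
        else -> return CompareResult.None
    }
    // The mode can change on recursion: anything other than LENIENT becomes STRICT for nested schemas.
    val nestedMode =
        if (mode == ComparisonMode.LENIENT) ComparisonMode.LENIENT else ComparisonMode.STRICT
    for (key in a.keys intersect b.keys) {
        val l = a.getValue(key)
        val r = b.getValue(key)
        val columnResult = when {
            l is Value && r is Value -> if (l == r) CompareResult.Equals else CompareResult.None
            l is Group && r is Group -> compare(l.schema, r.schema, nestedMode)
            else -> CompareResult.None
        }
        result = combine(result, columnResult)
        if (result == CompareResult.None) return CompareResult.None
    }
    // STRICT only ever reports Equals or None.
    return if (mode == ComparisonMode.STRICT && result != CompareResult.Equals) {
        CompareResult.None
    } else {
        result
    }
}

fun main() {
    val small = mapOf<String, Col>("name" to Value("String"))
    val big = mapOf<String, Col>("name" to Value("String"), "age" to Value("Int"))
    val outerSmall = mapOf<String, Col>("person" to Group(small))
    val outerBig = mapOf<String, Col>("person" to Group(big), "id" to Value("Int"))

    println(compare(small, big, ComparisonMode.LENIENT)) // prints IsSuper
    println(compare(small, big, ComparisonMode.STRICT)) // prints None
    println(compare(outerSmall, outerBig, ComparisonMode.LENIENT)) // prints IsSuper
    println(compare(outerSmall, outerBig, ComparisonMode.STRICT_FOR_NESTED_SCHEMAS)) // prints None
}
```

The last call is the interesting one: the top level alone would be IsSuper, but the nested "person" schemas differ, so the STRICT nested comparison returns None and the whole result collapses to None.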

Also added documentation everywhere.

Requires a tiny patch in the compiler plugin.

koperagen

Seems good, but it's quite hard to review with many stylistic / non-functional changes :(

koperagen · 2025-06-16T11:29:43Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/schema/DataFrameSchema.kt

-    public fun compare(other: DataFrameSchema, strictlyEqualNestedSchemas: Boolean = false): CompareResult
+    public fun compare(
+        other: DataFrameSchema,
+        comparisonMode: ComparisonMode = STRICT_FOR_NESTED_SCHEMAS,


I think LENIENT should be the default for more visibility: it makes it easier to see where our "special" codegen mode, STRICT_FOR_NESTED_SCHEMAS, is used.

koperagen · 2025-06-16T11:35:20Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/schema/ColumnSchema.kt

-    internal fun compareStrictlyEqualNestedSchemas(other: ColumnSchema): CompareResult = compare(other, true)
-
-    private fun compare(other: ColumnSchema, strictlyEqualNestedSchemas: Boolean): CompareResult {
+    public fun compare(other: ColumnSchema, comparisonMode: ComparisonMode = STRICT_FOR_NESTED_SCHEMAS): CompareResult {
        if (kind != other.kind) return CompareResult.None
        if (this === other) return CompareResult.Equals
        return when (this) {
            is Value -> compare(other as Value)


It seems odd that the comparison mode is not used here. How do you tell the difference between a nullable and a non-nullable column?

That's true, I had it at one point, but it wasn't in the implementation before... Let's see what breaks if I add it back

Ah, the same behavior was achieved by if (comparison != Equals && comparisonMode == STRICT) None else comparison in the other file, but I've improved it now: ColumnSchema.Value now also takes a comparisonMode argument, and the "strictness increase" is better explained.
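
To make the nullable vs non-nullable question concrete, here is a rough sketch of how a value-column comparison could take the mode into account. It reuses the if (comparison != Equals && comparisonMode == STRICT) None else comparison idea quoted above, but everything else is an assumption for illustration: ValueType and compareValues are hypothetical names, and types are modeled as a plain name plus a nullability flag instead of real Kotlin types.

```kotlin
// Toy stand-ins, NOT the real ColumnSchema.Value implementation.
enum class ComparisonMode { LENIENT, STRICT_FOR_NESTED_SCHEMAS, STRICT }

enum class CompareResult { Equals, IsSuper, IsDerived, None }

// Hypothetical model of a value column's type: a name plus nullability.
data class ValueType(val name: String, val nullable: Boolean)

fun compareValues(
    a: ValueType,
    b: ValueType,
    mode: ComparisonMode = ComparisonMode.LENIENT,
): CompareResult {
    // Lenient result first: String? is treated as more general than String.
    val comparison = when {
        a.name != b.name -> CompareResult.None
        a.nullable == b.nullable -> CompareResult.Equals
        a.nullable -> CompareResult.IsSuper
        else -> CompareResult.IsDerived
    }
    // Only STRICT narrows the result; a top-level value column is not a nested
    // schema, so STRICT_FOR_NESTED_SCHEMAS behaves like LENIENT here.
    return if (comparison != CompareResult.Equals && mode == ComparisonMode.STRICT) {
        CompareResult.None
    } else {
        comparison
    }
}

fun main() {
    val string = ValueType("String", nullable = false)
    val nullableString = ValueType("String", nullable = true)

    println(compareValues(nullableString, string, ComparisonMode.LENIENT)) // prints IsSuper
    println(compareValues(string, nullableString, ComparisonMode.LENIENT)) // prints IsDerived
    println(compareValues(nullableString, string, ComparisonMode.STRICT)) // prints None
    println(compareValues(string, string, ComparisonMode.STRICT)) // prints Equals
}
```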

Jolanrensen · 2025-06-16T12:11:40Z

Seems good, but it's quite hard to review with many stylistic / non-functional changes :(

Sorry, I just had no clue what was going on before refactoring, let alone debug how it should behave. Hopefully the new approach expresses the intention behind the code better :)

Jolanrensen added 4 commits June 13, 2025 15:06

added test for Issue #1222

2b9d951

#1222: refactor, to understand how DataFrameSchemaImpl.compare is wor…

30540af

…king. Changed behavior for nested schema comparison when strictlyEqualNestedSchemas == true

replaced boolean strictlyEqualNestedSchemas by comparisonMode that ca…

18ba918

…n change state upon recursive calls

added data schema comparison test

cad6b07

Jolanrensen marked this pull request as ready for review June 16, 2025 11:11

Jolanrensen requested a review from koperagen June 16, 2025 11:12

changing testUtil to new ComparisonMode setting

a0c3f2e

Jolanrensen force-pushed the reusage-schemas-fix branch from b751010 to a0c3f2e June 16, 2025 11:21

koperagen requested changes Jun 16, 2025

Jolanrensen added 3 commits June 16, 2025 14:24

setting ComparisonMode to LENIENT by default

03fb5e0

added comparisonMode to ColumnSchema.Value

f37bb2b

linting and api dump

ef4b8b2

Jolanrensen requested a review from koperagen June 16, 2025 14:04

Jolanrensen merged commit 1d6756e into master Jun 17, 2025
6 checks passed
