-
Notifications
You must be signed in to change notification settings - Fork 74
Reusage schemas fix #1252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reusage schemas fix #1252
Conversation
…king. Changed behavior for nested schema comparison when strictlyEqualNestedSchemas == true
…n change state upon recursive calls
b751010
to
a0c3f2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good, but it's quite hard to review with many stylistic / non-functional changes :(
public fun compare(other: DataFrameSchema, strictlyEqualNestedSchemas: Boolean = false): CompareResult | ||
public fun compare( | ||
other: DataFrameSchema, | ||
comparisonMode: ComparisonMode = STRICT_FOR_NESTED_SCHEMAS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think LENIENT should be default for more visibility - easier to see where our "special" codegen mode STRICT_FOR_NESTED_SCHEMAS handling is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree :)
internal fun compareStrictlyEqualNestedSchemas(other: ColumnSchema): CompareResult = compare(other, true) | ||
|
||
private fun compare(other: ColumnSchema, strictlyEqualNestedSchemas: Boolean): CompareResult { | ||
public fun compare(other: ColumnSchema, comparisonMode: ComparisonMode = STRICT_FOR_NESTED_SCHEMAS): CompareResult { | ||
if (kind != other.kind) return CompareResult.None | ||
if (this === other) return CompareResult.Equals | ||
return when (this) { | ||
is Value -> compare(other as Value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd that comparison mode is not used here. How you tell the difference between nullable and non-nullable column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, I had it at one point, but it wasn't in the implementation before... Let's see what breaks if I add it back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the same behavior was achieved by if (comparison != Equals && comparisonMode == STRICT) None else comparison
in the other file, but I improved it now so ColumnSchema.Value
now also has a comparisonMode
argument and the "strictness increase" is better explained.
Sorry, I just had no clue what was going on before refactoring, let alone debug how it should behave. Hopefully the new approach expresses the intention behind the code better :) |
Fixes #1222 by replacing the
strictlyEqualNestedSchemas
parameter by a more explicitComparisonMode
.Comparing two schemas in
LENIENT
mode can returnIsSuper
,IsDerived
,IsEqual
, orNone
.Compating in
STRICT
mode can only returnIsEqual
orNone
, because the schema's need to exactly match.STRICT_FOR_NESTED_SCHEMAS
works inLENIENT
mode for the top-level, butSTRICT
for nested schemas. This is often used in Jupyter notebooks, to prevent nested types from extending each other and thus avoid a potential comparison explosion. (There could be a lot of nested types)Also, added documentation everywhere
Requires tiny patch in the compiler plugin