-
Notifications
You must be signed in to change notification settings - Fork 73
add nulls parse option in readExcel #1247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -255,6 +271,8 @@ public fun DataFrame.Companion.readExcel( | |||
* when set to false, it operates as [NameRepairStrategy.MAKE_UNIQUE], | |||
* ensuring unique column names will make the columns be named according to excel columns, like "A", "B", "C" etc. | |||
* for unstructured data. | |||
* @param parseEmptyAsNull when set to true, empty strings in cells are parsed as null (default true). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be more representations in strings that can refer to null
. maybe it might be better to follow readCsv and other parsing options for this. It will be overkill to add the full ParserOptions as argument (since datetime and locales are already handled by excel), but csv, for instance uses nullStrings: Set<String>= setOf("", "NA", "N/A", "null", "NULL", "None", "none", "NIL", "nil")
, so it might make sense to add a nullStrings
argument here as well, probably with fewer defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue:
#1248
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think is a good idea to introduce a new non-trivial logic just before release, need more time for implementation and testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay sure, makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speaking off, this is a binary-breaking change. Since we released a beta already, we cannot break the API like that without a fallback mechanism. Could you add the old function signatures back with @Deprecated()
at level HIDDEN? That way, we won't break libraries calling the old function :)
@@ -59,6 +59,9 @@ public class Excel : SupportedDataFrameFormat { | |||
DefaultReadExcelMethod(pathRepresentation) | |||
} | |||
|
|||
private const val MESSAGE_1_1 = "Will be ERROR in 1.1." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIDDEN >= ERROR already. You can remove this message or say that they are removed in 1.1
rowsCount, | ||
nameRepairStrategy, | ||
firstRowIsHeader, | ||
parseEmptyAsNull, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 5 cents: how about enum parameter "emptyStringValue" with NULL, KEEP_AS_STRING options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, if we're looking to extend this with other null strings #1248, it could be some sort of sealed interface (String) -> String? with companion object { val NULL = ...; val KEEP_AS_STRING ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like something like this too, but it would deviate even more for how we handle parsing String -> null
with parse()
, ParserOptions
, and readCsv()
. They now take a simple set of null-representing strings.
Maybe we could change it all to the same system at some point, but until then I'd keep this one as simple as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about enum then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an enum could work too indeed, as long as we can make a simple @Deprecated
ReplaceWith
eventually, I'm fine with it :)
Fixes #1166.