Skip to content

Fix schema/type inference issue #261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from

Conversation

HyukjinKwon
Copy link
Member

#244
This is re-opened due to the build failure in travis.

@tanwanirahul
Copy link
Contributor

@HyukjinKwon @falaki Do you want me to raise new PR with a single commit? That will make sure we have a clean commit history?

@codecov-io
Copy link

Current coverage is 85.74%

Merging #261 into master will decrease coverage by -0.33% as of d0a91c5

@@            master    #261   diff @@
======================================
  Files           12      12       
  Stmts          517     519     +2
  Branches       149     149       
  Methods          0       0       
======================================
  Hit            445     445       
  Partial          0       0       
- Missed          72      74     +2

Review entire Coverage Diff as of d0a91c5

Powered by Codecov. Updated on successful CI builds.

@HyukjinKwon
Copy link
Member Author

@tanwanirahul dev/merge_pr.py will make this as a single commit log as PRs are not merged by simply pushing the merge button and these commit logs will only stay in comments in a single commit log at the end with a single author.

This is why I said here #244 (comment), the author might have to be you.

@@ -42,7 +42,11 @@ private[csv] object InferSchema {
mergeRowTypes)

val structFields = header.zip(rootTypes).map { case (thisHeader, rootType) =>
StructField(thisHeader, rootType, nullable = true)
val dType = rootType match {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, does this not produce Nulltype after merging Nulltypes?
It looks the test Merging Nulltypes should yeild Nulltype is not covering this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Please elaborate. Referring to doc comments in InferSchema.scala

/**

  • Similar to the JSON schema inference.
  • [[org.apache.spark.sql.execution.datasources.json.InferSchema]]
  • 1. Infer type of each row
    
  • 2. Merge row types to find common type
    
  • 3. Replace any null types with string type
    
    */

All null types should be replaced with String types at the end. This is what is happening right now?
If I understand you correctly, you feel there is a need for some more tests to be added. Could you please mention what that test is suppose to test?

As far as testing merging two nulltypes returning null type, is already covered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. Please ignore my comment.

@HyukjinKwon
Copy link
Member Author

@tanwanirahul Well, but If you need to change some codes here, then I think you better create another PR maybe. (And please close the previous PR)

@tanwanirahul
Copy link
Contributor

I will create a JIRA on Spark and submit a patch once this is merged.

@falaki
Copy link
Member

falaki commented Feb 13, 2016

Thanks. This looks good and passes all tests. Merging it now.

@falaki falaki closed this in 8704b26 Feb 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants