[nbhatt] Fix JSON parsing to handle multi-line AddFile objects in delta-sharing response #592

Open: wants to merge 11 commits into main

@nbhatt-atlassian commented Oct 9, 2024

This issue was found when one of our customers was unable to run a simple table load:

```python
import delta_sharing

share_file_path = "<share_path>"
table_url = f"{share_file_path}#<table>"

# Attempt to load the table into a pandas DataFrame
df = delta_sharing.load_as_pandas(table_url)
```

The delta-sharing client raises JSONDecodeError exceptions when it parses the JSON data in the server's response. The client assumes that each line of the response body is one complete JSON object. In this case, however, string values inside the JSON contain embedded newline characters, so a single object (here, an AddFile action) is split across several lines. Parsing each line on its own then fails with JSONDecodeError.
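The failure mode can be reproduced in isolation with only the standard library. This is a minimal sketch, not the code in this PR's diff: the response body is made up (the URL and the `note` field are hypothetical, not real server output), and `parse_json_stream` is a hypothetical helper. It compares line-by-line parsing with an approach that uses `json.JSONDecoder.raw_decode` to read consecutive JSON objects regardless of where the line breaks fall. RFC 8259 requires control characters such as newline to be escaped inside JSON strings, so `strict=False` is what lets a literal newline inside a string value be accepted.

```python
import json
from typing import Any, Dict, Iterator, List

# Hypothetical response body: a protocol action, a metaData action and one
# file action whose (hypothetical) "note" string contains a literal newline.
RESPONSE_BODY = (
    '{"protocol": {"minReaderVersion": 1}}\n'
    '{"metaData": {"id": "abc"}}\n'
    '{"file": {"url": "https://example.com/part-0.parquet", '
    '"note": "line one\nline two"}}\n'
)


def parse_line_by_line(text: str) -> List[Dict[str, Any]]:
    """Naive approach: assume every non-empty line is a complete JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_json_stream(text: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield each top-level JSON object, even across lines."""
    # strict=False accepts raw control characters (e.g. a literal newline)
    # inside string values instead of raising JSONDecodeError.
    decoder = json.JSONDecoder(strict=False)
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip the
        # newlines and spaces between objects here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        # Decode one object starting at pos; raw_decode returns the
        # object and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Calling `parse_line_by_line(RESPONSE_BODY)` raises JSONDecodeError on the third line (an unterminated string), while `list(parse_json_stream(RESPONSE_BODY))` returns all three actions. One trade-off of this version: it works on the whole response body, so it gives up the one-line-at-a-time streaming of a per-line parser; a streaming variant could instead buffer lines until a decode succeeds.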


I wasn't able to figure out how to run the integration tests locally. If someone could point me in the right direction, I'd be happy to write tests for this :)

@nbhatt-atlassian marked this pull request as ready for review October 9, 2024 22:56