[nbhatt] Fix JSON parsing to handle multi-line AddFile objects in delta-sharing response #592

Open. Wants to merge 11 commits into base: main
28 changes: 25 additions & 3 deletions python/delta_sharing/rest_client.py
@@ -387,14 +387,36 @@ def list_files_in_table(
                     lines=[line for line in lines],
                 )
             else:
-                protocol_json = json.loads(next(lines))
-                metadata_json = json.loads(next(lines))
+                def parse_json_stream(lines):
+                    import json
+
+                    buffer = ''
+                    decoder = json.JSONDecoder()
+                    for line in lines:
+                        buffer += line.strip()
+
+                        while buffer:
+                            try:
+                                # Attempt to decode a JSON object from the buffer
+                                obj, idx = decoder.raw_decode(buffer)
+                                json_str = buffer[:idx]
+                                yield json_str
+                                buffer = buffer[idx:].lstrip()
+                            except json.JSONDecodeError:
+                                # Incomplete JSON data; read more lines
+                                break
+                parsed_lines = parse_json_stream(lines)
+                protocol_json = json.loads(next(parsed_lines))
+                metadata_json = json.loads(next(parsed_lines))

                 return ListFilesInTableResponse(
                     delta_table_version=int(headers.get("delta-table-version")),
                     protocol=Protocol.from_json(protocol_json["protocol"]),
                     metadata=Metadata.from_json(metadata_json["metaData"]),
-                    add_files=[AddFile.from_json(json.loads(file)["file"]) for file in lines],
+                    add_files=[
+                        AddFile.from_json(json.loads(file)["file"])
+                        for file in parsed_lines
+                    ],
                     lines=[]
                 )
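
The buffering approach in this change can be tried outside the client. The sketch below is a standalone illustration, not the code in this PR: `iter_json_objects` is a hypothetical helper name, and the sample payload is invented for the example. It differs from the diff in two ways, both of which are assumptions about what a caller might want: it yields the decoded objects rather than the JSON substrings (so callers do not need a second `json.loads`), and it raises if the stream ends with an incomplete object instead of silently dropping the leftover buffer.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_objects(lines: Iterable[str]) -> Iterator[Any]:
    """Yield each top-level JSON value from a stream whose values may span lines."""
    decoder = json.JSONDecoder()
    buffer = ""
    for line in lines:
        # Keep a newline between chunks so tokens on adjacent lines stay separated.
        buffer += line if line.endswith("\n") else line + "\n"
        while True:
            # raw_decode does not skip leading whitespace, so strip it first.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                # The buffered text is not a complete value yet; read more lines.
                break
            yield obj
            buffer = buffer[end:]
    if buffer.strip():
        # Assumption: a truncated trailing object should be reported, not ignored.
        raise ValueError(f"incomplete JSON at end of stream: {buffer[:40]!r}")


if __name__ == "__main__":
    # Invented sample: one single-line object, then one object split over three lines.
    sample = [
        '{"protocol": {"minReaderVersion": 1}}\n',
        '{"file": {\n',
        '  "url": "https://example.com/part-0",\n',
        '  "size": 10}}\n',
    ]
    for obj in iter_json_objects(sample):
        print(obj)
```

With a helper of this shape, the first two yielded objects would play the role of `protocol_json` and `metadata_json`, and the remaining ones would feed `AddFile.from_json(obj["file"])`.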
