fit: Specify encoding when writing output file (#214)
Specify the encoding when writing output files to avoid errors when the default target encoding cannot represent all characters. UTF-8 seems like the most universal and widely supported encoding. Without it, the CLI fails with encoding errors when the input file contains Unicode text (which is most files nowadays) and the target system's default encoding is a single-byte charset such as cp1252.

Signed-off-by: Johnny Salazar <[email protected]>
cepera-ang authored Nov 4, 2024
1 parent 8fb445f commit af323c0
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docling/cli/main.py
@@ -90,28 +90,28 @@ def export_documents(
 # Export Deep Search document JSON format:
 if export_json:
     fname = output_dir / f"{doc_filename}.json"
-    with fname.open("w") as fp:
+    with fname.open("w", encoding="utf8") as fp:
         _log.info(f"writing JSON output to {fname}")
         fp.write(json.dumps(conv_res.document.export_to_dict()))

 # Export Text format:
 if export_txt:
     fname = output_dir / f"{doc_filename}.txt"
-    with fname.open("w") as fp:
+    with fname.open("w", encoding="utf8") as fp:
         _log.info(f"writing Text output to {fname}")
         fp.write(conv_res.document.export_to_markdown(strict_text=True))

 # Export Markdown format:
 if export_md:
     fname = output_dir / f"{doc_filename}.md"
-    with fname.open("w") as fp:
+    with fname.open("w", encoding="utf8") as fp:
         _log.info(f"writing Markdown output to {fname}")
         fp.write(conv_res.document.export_to_markdown())

 # Export Document Tags format:
 if export_doctags:
     fname = output_dir / f"{doc_filename}.doctags"
-    with fname.open("w") as fp:
+    with fname.open("w", encoding="utf8") as fp:
         _log.info(f"writing Doc Tags output to {fname}")
         fp.write(conv_res.document.export_to_document_tokens())
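
The failure mode described in the commit message can be reproduced without docling. The sketch below is only an illustration: the helper `try_write`, the sample string, and the file names are hypothetical and not part of the repository. It passes `encoding="cp1252"` explicitly to imitate a system whose default encoding is a single-byte charset, then writes the same text with `encoding="utf8"` as the patched code does.

```python
import tempfile
from pathlib import Path

# Sample text with characters that cp1252 cannot represent (the arrow and the CJK text).
SAMPLE = "naïve café → 日本語"


def try_write(path: Path, content: str, encoding: str) -> bool:
    """Write content to path with the given encoding; return False on an encoding error."""
    try:
        with path.open("w", encoding=encoding) as fp:
            fp.write(content)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)

    # Imitates a platform default of cp1252: the write fails.
    cp1252_ok = try_write(out_dir / "sample_cp1252.txt", SAMPLE, "cp1252")

    # Explicit UTF-8, as in the patched code: the write succeeds.
    utf8_ok = try_write(out_dir / "sample_utf8.txt", SAMPLE, "utf8")

    # Reading back with the same encoding returns the original text.
    round_trip = (out_dir / "sample_utf8.txt").read_text(encoding="utf8")

print(f"cp1252 write succeeded: {cp1252_ok}")
print(f"utf8 write succeeded: {utf8_ok}")
```

When `encoding` is omitted, `Path.open` falls back to a locale-dependent default, which is why the unpatched code behaved differently from one system to another.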
