TransportError for large files #511

Open
robvandijk opened this issue Jan 23, 2025 · 1 comment
Labels: bug (High priority issue for (blocking) problems)

Comments

@robvandijk
Contributor

A large PDF was processed by pdftotext, producing a ~170MB text file to be uploaded to Elasticsearch.

The upload failed with a TransportError 413 ("Request Entity Too Large"). By default, Elasticsearch rejects HTTP requests larger than 100MB (the `http.max_content_length` setting).

The same file was processed offline using pymupdf4llm (the new PDF parser that will be used for the re-indexing), producing just a ~100kB file.

@joepio I propose we leave the default 100MB upload limit alone and accept that uploads like this will occasionally fail in the current production version. The problem does not occur in the re-indexing branch, so it will disappear once we swap the machines after re-indexing.
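
If the occasional failure in production turns out to be a nuisance, the oversized documents could be detected before the upload instead of surfacing as a 413. The following is only a minimal sketch, not code from this repository: the function names and the `{"text": ...}` document shape are hypothetical, and it assumes the Elasticsearch default of `http.max_content_length: 100mb` is in effect. The size is estimated with the standard library's `json` module, so it approximates rather than reproduces the body the client would actually send.

```python
import json

# Assumption: the cluster uses the Elasticsearch default limit of 100mb
# (Elasticsearch byte-size units are binary, so 100 * 1024 * 1024 bytes).
MAX_REQUEST_BYTES = 100 * 1024 * 1024


def request_size_bytes(document: dict) -> int:
    """Estimate the size in bytes of the JSON body for this document."""
    # ensure_ascii=False keeps non-ASCII characters as UTF-8 instead of \uXXXX escapes.
    return len(json.dumps(document, ensure_ascii=False).encode("utf-8"))


def fits_in_one_request(document: dict, limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True if the serialized document stays within the size limit."""
    return request_size_bytes(document) <= limit


if __name__ == "__main__":
    # Hypothetical document shape; the real index mapping may differ.
    doc = {"text": "extracted text from pdftotext"}
    if fits_in_one_request(doc):
        print("ok to upload")
    else:
        print("skipping: document exceeds the request size limit")
```

A check like this would let the indexer log and skip the document rather than fail on the upload itself. It does not fix the underlying issue of pdftotext producing very large output.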

robvandijk added the bug (High priority issue for (blocking) problems) label on Jan 23, 2025
@joepio
Contributor

joepio commented Jan 23, 2025

Sounds good! I think we can close this in that case.
