fix: checks for s3 pre-signed buckets #367

elena-kalinina · 2025-10-15T09:53:16Z

The PDF document loader had a check in place to determine whether a URL is a presigned URL. However, this check was not working, first and foremost because the regex did not match the S3 URL structure correctly. The presigned URL failed the check and was processed as a normal URL, which resulted in OSError: filename too long. However, fixing the regex alone would not make it possible to distinguish between public and presigned S3 buckets. I rewrote the method so that it correctly determines whether a URL is specifically a presigned bucket URL (to be processed accordingly).

In my previous commit, I fixed the regex, which did not match the S3 bucket URL structure and therefore failed to detect presigned URLs. However, I realized that fixing the regex alone is not enough, because the check then does not distinguish between public and presigned S3 buckets. So I introduced an improved check that matches only presigned bucket URLs.
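For illustration, the distinction described above could be sketched roughly as follows. This is not the code from this PR: the function names `is_s3_host` and `is_presigned_s3_url` are hypothetical, and the sketch assumes that a presigned URL is recognized by an S3 hostname combined with the standard AWS signature query parameters (SigV4 `X-Amz-*` or the older SigV2 `AWSAccessKeyId`/`Signature`/`Expires`).

```python
from urllib.parse import urlparse, parse_qs

# Query parameters that AWS adds when it presigns a URL.
# Compared case-insensitively, so they are stored in lowercase here.
SIGV4_PARAMS = {"x-amz-algorithm", "x-amz-credential", "x-amz-signature"}
SIGV2_PARAMS = {"awsaccesskeyid", "signature", "expires"}


def is_s3_host(host: str) -> bool:
    """Hypothetical helper: True for hosts such as
    bucket.s3.amazonaws.com, bucket.s3.eu-west-1.amazonaws.com,
    s3.amazonaws.com or s3-eu-west-1.amazonaws.com."""
    host = host.lower()
    if not host.endswith(".amazonaws.com"):
        return False
    labels = host.split(".")
    return any(label == "s3" or label.startswith("s3-") for label in labels)


def is_presigned_s3_url(url: str) -> bool:
    """Hypothetical check: True only for S3 URLs that carry a signature,
    so public S3 URLs and non-S3 URLs both return False."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if not is_s3_host(parsed.hostname):
        return False
    keys = {k.lower() for k in parse_qs(parsed.query, keep_blank_values=True)}
    # set <= set tests "is a subset of": all required parameters are present.
    return SIGV4_PARAMS <= keys or SIGV2_PARAMS <= keys
```

Parsing the query string instead of matching the whole URL with one regex keeps the check independent of parameter order, and a public object URL without a signature falls through to normal URL handling.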

fix: check for presigned url

elena-kalinina added 3 commits October 15, 2025 11:47

Merge pull request #1 from elena-kalinina/elena-kalinina-patch-1

56d139e

fix: check for presigned url

Merge branch 'main' into main

68c31f9
