resolve_source_as_path always copies to temp #185

Open
@J08nY

Description

As implemented in #79, resolve_source_as_path now always copies the source file into a (potentially temporary) workdir, even if the source is a local file that already has a path. As a result, running docling *.pdf in a directory with many PDFs spends a lot of time copying them into a temporary directory. In one of my use cases, /var/tmp has per-user quotas; processing a large number of PDFs hit that quota and docling errored out. I believe the default should be not to copy, or at least there should be a way to opt out of the copy (via both the CLI and the library).
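
A minimal sketch of the requested behavior, for illustration only. This is not docling's code: the function name resolve_local_source and the copy_local parameter are hypothetical, and remote sources (URLs) are deliberately left out. The idea is that a local file is used in place by default, and a copy into a workdir happens only when explicitly requested.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_local_source(
    source: Union[str, Path],
    workdir: Optional[Path] = None,
    copy_local: bool = False,
) -> Path:
    """Hypothetical helper: return a usable path for a local source file."""
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"not a local file: {source}")

    if not copy_local:
        # Default: use the file where it already is, so nothing is
        # written to the temporary directory and no quota is consumed.
        return path.resolve()

    # Opt-in copy, for callers that need an isolated working copy.
    target_dir = workdir if workdir is not None else Path(tempfile.mkdtemp())
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / path.name
    shutil.copyfile(path, target)
    return target
```

With a default like this, a batch run over many local PDFs would touch the temporary directory only when the caller asks for copies.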