extract_data() from PDF doesn't work #302
Comments
@andrie I've never seen that error in my fork. For example:
library(ellmer)
pdf <- content_pdf_url("https://cran.r-project.org/web/packages/ellmer/ellmer.pdf")
chat <- chat_claude()
schema <- type_object(
package_name = type_string("The name of the R package"),
authors = type_array("The authors of the R package", items = type_string())
)
chat$extract_data(pdf, type = schema)
Is that specific to Bedrock? I know getting extract_data() to work for Gemini from PDFs was a little tricky; it requires an additional text prompt:
library(ellmer)
pdf <- content_pdf_url("https://cran.r-project.org/web/packages/ellmer/ellmer.pdf")
chat <- chat_gemini()
schema <- type_object(
package_name = type_string("The name of the R package"),
authors = type_array("The authors of the R package", items = type_string())
)
chat$extract_data("Extract data from this PDF", pdf, type = schema)
From some testing, I think ellmer may also need to generate unique document names during JSON serialization for Bedrock. As for the second issue, I think Bedrock may require a text prompt as well. Maybe there is some way to signal a more informative error in that case.
I can confirm that passing the prompt to Bedrock made a difference. Thank you for the hint. And as @atheriel mentioned, the unique document name issue occurs when you upload the same PDF twice in a chat session. But further experimentation revealed that I can send an
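The Bedrock workaround described above can be sketched roughly as follows. This is only an illustration: it assumes chat_bedrock() is the ellmer constructor available at the time of this thread (later releases may name it differently), that AWS credentials are already configured, and it reuses the schema from the earlier examples. The helper extract_pdf_with_prompt() is a hypothetical name and not part of ellmer.

```r
library(ellmer)

# Hypothetical helper (not part of ellmer): always sends a text prompt
# alongside the PDF, since Gemini and Bedrock appeared to need one.
extract_pdf_with_prompt <- function(chat, pdf, schema,
                                    prompt = "Extract data from this PDF") {
  chat$extract_data(prompt, pdf, type = schema)
}

# Same schema as in the examples above.
schema <- type_object(
  package_name = type_string("The name of the R package"),
  authors = type_array("The authors of the R package", items = type_string())
)

# Usage (needs AWS credentials and network access, so it is left commented out):
# pdf <- content_pdf_url("https://cran.r-project.org/web/packages/ellmer/ellmer.pdf")
# chat <- chat_bedrock()  # assumed constructor name as of this thread
# extract_pdf_with_prompt(chat, pdf, schema)
```

Wrapping the call this way only guards against the missing-prompt case; it does nothing about the duplicate document name problem when the same PDF is uploaded twice in one session.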
Should be fixed now in #265
PR #265 adds support for PDFs in Claude, and #301 fixes a missing as_json() method for using PDFs in AWS Bedrock. This means I can successfully extract information from a PDF when using a custom prompt. For example, this pseudocode works:
However, extract_data() throws an error:
results in:
cc @atheriel