Perceptor is a multimodal large language model (LLM) focused on extracting information from document images and text.
Get your API token by signing in and opening the API tab.
import perceptor_client_lib.perceptor as perceptor
import asyncio
perceptor_client = perceptor.Client(api_key="API_TOKEN_HERE", request_url="https://perceptor-api.tamed.ai/1/model/")
context = """
Greetings traveler, my name is Perceptor. I was born in 2022 and I generate text from text or image context. Therefore, I am able to extract information from documents. Just ask.
"""
instructions = [
"When was Perceptor born?",
"What can Perceptor do?",
]
result = asyncio.run(perceptor_client.ask_text(context, instructions=instructions, request_parameters=perceptor.PerceptorRequest(flavor="original")))
for instruction_result in result:
if instruction_result.is_success:
print(f"Q: '{instruction_result.instruction}'\nA: '{instruction_result.response}'\n------")
else:
print(f"For question '{instruction_result.instruction}' the following error occurred: {instruction_result.error_text}")
Set up a virtual environment and run the following command to install the latest version:
pip install perceptor-client-lib@git+https://github.com/TamedAI/perceptor-client-lib-py
- (optional) Poppler: if you want to use the PDF processing functionality, follow these instructions to install poppler on your machine. On Windows, if the poppler "bin" path is not added to PATH, you have to set the environment variable POPPLER_PATH to point to it.
perceptor_client = perceptor.Client(api_key="your_key",request_url="request_url")
It is also possible to create a Client without specifying any parameters. In that case, the following environment variables are used automatically:
- TAI_PERCEPTOR_BASE_URL for api url
- TAI_PERCEPTOR_API_KEY for api key
If no configuration parameters are specified and the above-mentioned environment variables are missing, a ValueError is raised.
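The fallback behavior can be pictured with this standalone sketch (the function name `resolve_client_config` is illustrative, not part of the library):

```python
import os

def resolve_client_config(api_key=None, request_url=None):
    """Illustrative sketch of the documented fallback: explicit arguments win,
    otherwise the TAI_PERCEPTOR_* environment variables are used, otherwise
    a ValueError is raised (mirroring the documented behavior)."""
    api_key = api_key or os.environ.get("TAI_PERCEPTOR_API_KEY")
    request_url = request_url or os.environ.get("TAI_PERCEPTOR_BASE_URL")
    if not api_key or not request_url:
        raise ValueError("api_key and request_url must be passed or set via environment variables")
    return api_key, request_url
```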
Request parameters are specified via the PerceptorRequest class, which has the following structure:
- flavor specifies the request "flavor", for example "original". It is a mandatory value and has to be specified. You can find more information about flavors here.
- params is a dictionary of additional generation parameters, for example:
{
"temperature": 0.01,
"topK": 10,
"topP": 0.9,
"repetitionPenalty": 1,
"lengthPenalty": 1,
"penaltyAlpha": 1,
"maxLength": 512
}
- return_scores controls access to confidence scores:
request = PerceptorRequest(flavor="original", return_scores=True)
The perceptor client supports async access.
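The snippets below use await, so they must run inside a coroutine. A minimal wrapper (with a stand-in coroutine in place of the real client call, since the real one needs an API key) looks like:

```python
import asyncio

async def ask_text_stub(text, instructions):
    # stand-in for perceptor_client.ask_text, which is awaited the same way
    return [f"answer to {q}" for q in instructions]

async def main():
    # the real call would be: await perceptor_client.ask_text(...)
    return await ask_text_stub("text_to_process", ["Question 1?"])

answers = asyncio.run(main())
```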
result = await perceptor_client.ask_text("text_to_process", instructions=["Question 1?"], request_parameters=request)
result = await perceptor_client.ask_text("text_to_process",
instructions=[
"Question 1?",
"Question 2",
],
request_parameters=request)
for instruction_result in result:
if instruction_result.is_success:
print(f"question '{instruction_result.instruction}', answer: '{instruction_result.response['text']}'")
else:
print(f"for question '{instruction_result.instruction}' the following error occurred: {instruction_result.error_text}")
The following image formats are supported: "jpg" and "png".
From image path:
result = await perceptor_client.ask_image("path_to_image_file",
instructions=[
"Question 1?",
"Question 2",
],
request_parameters=request)
for instruction_result in result:
if instruction_result.is_success:
print(f"question '{instruction_result.instruction}', answer: '{instruction_result.response['text']}'")
else:
print(f"for question '{instruction_result.instruction}' the following error occurred: {instruction_result.error_text}")
...or from an open image file:
reader = open("image_path", 'rb')
with reader:
result = await perceptor_client.ask_image(reader,
instructions=[
"Question 1?",
"Question 2",
], file_type='jpg',
request_parameters=request)
for instruction_result in result:
if instruction_result.is_success:
print(f"question '{instruction_result.instruction}', answer: '{instruction_result.response['text']}'")
else:
print(f"for question '{instruction_result.instruction}' the following error occurred: {instruction_result.error_text}")
...or from bytes:
reader = open("image_path", 'rb')
with reader:
content_bytes = reader.read()
result = await perceptor_client.ask_image(content_bytes,
instructions=[
"Question 1?",
"Question 2",
], file_type='jpg',
request_parameters=request)
for instruction_result in result:
if instruction_result.is_success:
print(f"question '{instruction_result.instruction}', answer: '{instruction_result.response['text']}'")
else:
print(f"for question '{instruction_result.instruction}' the following error occurred: {instruction_result.error_text}")
Table queries can be performed as follows:
result = await perceptor_client.ask_table_from_image("path_to_image_file",
instruction="GENERATE TABLE Column1, Column2, Column3 GUIDED BY Column3",
request_parameters=request
)
From document file:
result = await perceptor_client.ask_document("path_to_document_file",
instructions=[
"Question 1?",
"Question 2",
],
request_parameters=request)
if result.is_success:
print(f"question '{result.instruction}', answer: '{result.response['text']}'")
else:
print(f"for question '{result.instruction}' the following error occurred: {result.error_text}")
result = await perceptor_client.classify_text("text_to_process",
instruction="What kind of document is it?",
classes=["invoice", "application"],
request_parameters=request)
if result.is_success:
print(f"question '{result.instruction}', answer: '{result.response['scores']}'")
else:
print(f"for question '{result.instruction}' the following error occurred: {result.error_text}")
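Assuming `response['scores']` maps each class name to a confidence value (an assumption about the exact shape, not confirmed by the library docs), the most likely class can be picked like this:

```python
def best_class(scores):
    """Return the class with the highest score. `scores` is assumed to be a
    dict mapping class names to floats, e.g. {"invoice": 0.92, "application": 0.08}."""
    return max(scores, key=scores.get)

print(best_class({"invoice": 0.92, "application": 0.08}))  # invoice
```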
The basic class containing a processing result is InstructionWithResult (see here).
It contains the following properties:
- instruction: the original instruction text
- is_success: set to True if the query was successful
- response: a dictionary containing at least a "text" element (the actual response text); it may contain additional values (for example scores).
  NOTE: the "classify..." methods return an empty "text" and "scores" corresponding to the specified classes.
- error_text: the error text (if an error occurred)
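For illustration only, the shape described above corresponds to a dataclass along these lines (a hypothetical mirror, not the library's actual definition):

```python
from dataclasses import dataclass, field

@dataclass
class InstructionWithResult:
    # illustrative mirror of the documented properties
    instruction: str
    is_success: bool
    response: dict = field(default_factory=dict)  # contains at least {"text": ...}
    error_text: str = ""

ok = InstructionWithResult("When was Perceptor born?", True, {"text": "2022"})
```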
The following methods return a list of InstructionWithResult instances:
- ask_text
- ask_image
The following methods return a single InstructionWithResult instance:
- ask_table_from_image
- classify_text
- classify_image
- classify_document
- classify_document_images
The following methods query multiple images (document images) and therefore return a list of DocumentImageResult instances, which contain, besides the InstructionWithResult list, the original page info:
- ask_document
- ask_document_images
- ask_table_from_document
- ask_table_from_document_images
If you use the methods returning a list of DocumentImageResult and need the responses grouped by instruction rather than by page, you can use the provided utility function to map the response:
mapped_result = group_by_instruction(result)
for instruction_result in mapped_result:
print(f"""instruction: {instruction_result.instruction}""")
for page_result in instruction_result.page_results:
if page_result.is_success:
answer = page_result.response
else:
answer = f"error: {page_result.error_text}"
print(f"""
page: {page_result.page_number},
answer: {answer}
""")
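Conceptually, group_by_instruction pivots page-major results into instruction-major ones. A standalone sketch using plain dicts (the real function operates on the client's result classes, so names here are illustrative):

```python
def group_by_instruction_sketch(pages):
    """pages: list of dicts like
       {"page_number": 0, "results": [{"instruction": ..., "response": ...}, ...]}
    Returns one entry per instruction, each listing its per-page answers."""
    grouped = {}
    for page in pages:
        for res in page["results"]:
            grouped.setdefault(res["instruction"], []).append(
                {"page_number": page["page_number"], "response": res["response"]}
            )
    return [{"instruction": k, "page_results": v} for k, v in grouped.items()]

pages = [
    {"page_number": 0, "results": [{"instruction": "Q1", "response": "a"}]},
    {"page_number": 1, "results": [{"instruction": "Q1", "response": "b"}]},
]
mapped = group_by_instruction_sketch(pages)
```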
A flavor is a specialization of the Perceptor model for a specific instruction set. Usually, a flavor is created for a specific document type (e.g. invoices). With a custom flavor, model performance can increase dramatically, and confidence scores are better aligned with correct answers.
request = perceptor.PerceptorRequest(flavor="MY_FLAVOR")
result = await perceptor_client.ask_text("context text...", instructions=["Question 1", "Question 2"], request_parameters=request)
To create your own flavor you will need a small dataset (>50 documents) together with your instruction set and the correct answers. Upload your dataset here and reach out to us to access your custom flavor.