Commit 82baf42

Merge pull request #16 from Zipstack/url-corrections
feat: Base url changes and README.md changes
2 parents 0075b70 + abaf736 commit 82baf42

3 files changed, +30 -133 lines changed
README.md

+6 -123
@@ -7,135 +7,18 @@

 LLMs are powerful, but their output is as good as the input you provide. LLMWhisperer is a technology that presents data from complex documents (different designs and formats) to LLMs in a way that they can best understand. LLMWhisperer features include Layout Preserving Mode, Auto-switching between native text and OCR modes, proper representation of radio buttons and checkboxes in PDF forms as raw text, among other features. You can now extract raw text from complex PDF documents or images without having to worry about whether the document is a native text document, a scanned image or just a picture clicked on a smartphone. Extraction of raw text from invoices, purchase orders, bank statements, etc works easily for structured data extraction with LLMs powered by LLMWhisperer's Layout Preserving mode.

-Refer to the client documentation for more information: [LLMWhisperer Client Documentation](https://docs.unstract.com/llmwhisperer/index.html)
+Refer to the client documentation for more information: [LLMWhisperer Client Documentation](https://docs.unstract.com/llmwhisperer/llm_whisperer/python_client/llm_whisperer_python_client_intro/)

-## Features
+## A note on versions

-- Easy to use Pythonic interface.
-- Handles all the HTTP requests and responses for you.
-- Raises Python exceptions for API errors.
+There are two versions of the client library available in this package:

-## Installation
+**LLMWhispererClient**: This is the legacy version of the client library and is recommended for supporting older apps only. This version will be deprecated in the future.

-You can install the LLMWhisperer Python Client using pip:
+**LLMWhispererClientV2**: This is the latest version of the client library and is recommended for all new users. It is mandatory for all users who are using LLMWhisperer API version 2.0.0 and above (All customers who have signed up after 5th November 2024).

-```bash
-pip install llmwhisperer-client
-```
+Documentation for both versions are available [here](https://docs.unstract.com/llmwhisperer/)

-## Usage
-
-First, import the `LLMWhispererClient` from the `client` module:
-
-```python
-from unstract.llmwhisperer.client import LLMWhispererClient
-```
-
-Then, create an instance of the `LLMWhispererClient`:
-
-```python
-client = LLMWhispererClient(base_url="https://llmwhisperer-api.unstract.com/v1", api_key="your_api_key")
-```
-
-Now, you can use the client to interact with the LLMWhisperer API:
-
-```python
-# Get usage info
-usage_info = client.get_usage_info()
-
-# Process a document
-# Extracted text is available in the 'extracted_text' field of the result
-whisper = client.whisper(file_path="path_to_your_file")
-
-# Get the status of a whisper operation
-# whisper_hash is available in the 'whisper_hash' field of the result of the whisper operation
-status = client.whisper_status(whisper_hash)
-
-# Retrieve the result of a whisper operation
-# whisper_hash is available in the 'whisper_hash' field of the result of the whisper operation
-whisper = client.whisper_retrieve(whisper_hash)
-```
-
-### Error Handling
-
-The client raises `LLMWhispererClientException` for API errors:
-
-```python
-try:
-    result = client.whisper_retrieve("invalid_hash")
-except LLMWhispererClientException as e:
-    print(f"Error: {e.message}, Status Code: {e.status_code}")
-```
-
-### Simple use case with defaults
-
-```python
-client = LLMWhispererClient()
-try:
-    result = client.whisper(file_path="sample_files/restaurant_invoice_photo.pdf")
-    extracted_text = result["extracted_text"]
-    print(extracted_text)
-except LLMWhispererClientException as e:
-    print(e)
-```
-
-### Simple use case with more options set
-We are forcing text processing and extracting text from the first two pages only.
-
-```python
-client = LLMWhispererClient()
-try:
-    result = client.whisper(
-        file_path="sample_files/credit_card.pdf",
-        processing_mode="text",
-        force_text_processing=True,
-        pages_to_extract="1,2",
-    )
-    extracted_text = result["extracted_text"]
-    print(extracted_text)
-except LLMWhispererClientException as e:
-    print(e)
-```
-
-### Extraction with timeout set
-
-The platform has a hard timeout of 200 seconds. If the document takes more than 200 seconds to convert (large documents), the platform will switch to async extraction and return a hash. The client can be used to check the status of the extraction and retrieve the result. Also note that the timeout is in seconds and can be set by the caller too.
-
-
-```python
-client = LLMWhispererClient()
-try:
-    result = client.whisper(
-        file_path="sample_files/credit_card.pdf",
-        pages_to_extract="1,2",
-        timeout=2,
-    )
-    if result["status_code"] == 202:
-        print("Timeout occured. Whisper request accepted.")
-        print(f"Whisper hash: {result['whisper-hash']}")
-        while True:
-            print("Polling for whisper status...")
-            status = client.whisper_status(whisper_hash=result["whisper-hash"])
-            if status["status"] == "processing":
-                print("STATUS: processing...")
-            elif status["status"] == "delivered":
-                print("STATUS: Already delivered!")
-                break
-            elif status["status"] == "unknown":
-                print("STATUS: unknown...")
-                break
-            elif status["status"] == "processed":
-                print("STATUS: processed!")
-                print("Let's retrieve the result of the extraction...")
-                resultx = client.whisper_retrieve(
-                    whisper_hash=result["whisper-hash"]
-                )
-                print(resultx["extracted_text"])
-                break
-            time.sleep(2)
-except LLMWhispererClientException as e:
-    print(e)
-```

 ## Questions and Feedback

src/unstract/llmwhisperer/__init__.py

+1 -1
@@ -1,4 +1,4 @@
-__version__ = "0.23.0"
+__version__ = "2.0.0"

 from .client import LLMWhispererClient  # noqa: F401
 from .client_v2 import LLMWhispererClientV2  # noqa: F401

src/unstract/llmwhisperer/client_v2.py

+23 -9
@@ -27,7 +27,7 @@

 import requests

-BASE_URL = "https://llmwhisperer-api.unstract.com/api/v2"
+BASE_URL_V2 = "https://llmwhisperer-api.us-central.unstract.com/api/v2"


 class LLMWhispererClientException(Exception):
@@ -62,7 +62,9 @@ class LLMWhispererClientV2:
             client's activities and errors.
     """

-    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
+    formatter = logging.Formatter(
+        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+    )
     logger = logging.getLogger(__name__)
     log_stream_handler = logging.StreamHandler()
     log_stream_handler.setFormatter(formatter)
@@ -108,7 +110,7 @@ def __init__(
         self.logger.debug("logging_level set to %s", logging_level)

         if base_url == "":
-            self.base_url = os.getenv("LLMWHISPERER_BASE_URL_V2", BASE_URL)
+            self.base_url = os.getenv("LLMWHISPERER_BASE_URL_V2", BASE_URL_V2)
         else:
             self.base_url = base_url
         self.logger.debug("base_url set to %s", self.base_url)
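
Review note: the `__init__` hunk above resolves the base URL in a fixed order. An explicit `base_url` argument wins, then the `LLMWHISPERER_BASE_URL_V2` environment variable, then the renamed `BASE_URL_V2` constant. The following is a minimal standalone sketch of that precedence, assuming nothing beyond what the diff shows. `resolve_base_url` is a hypothetical helper name used only for illustration; it is not part of the client.

```python
import os

# Default copied from the new side of the diff.
BASE_URL_V2 = "https://llmwhisperer-api.us-central.unstract.com/api/v2"


def resolve_base_url(base_url: str = "") -> str:
    """Hypothetical helper mirroring the precedence shown in the hunk."""
    if base_url == "":
        # No explicit argument: prefer the environment variable,
        # and fall back to the module-level constant.
        return os.getenv("LLMWHISPERER_BASE_URL_V2", BASE_URL_V2)
    # An explicit argument always takes priority.
    return base_url
```

One consequence worth noting: callers that neither pass `base_url` nor set the environment variable will now be pointed at the `us-central` host instead of the previous default.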
@@ -281,7 +283,9 @@ def generate():
                         )
             else:
                 params["url_in_post"] = True
-                req = requests.Request("POST", api_url, params=params, headers=self.headers, data=url)
+                req = requests.Request(
+                    "POST", api_url, params=params, headers=self.headers, data=url
+                )
             prepared = req.prepare()
             s = requests.Session()
             response = s.send(prepared, timeout=wait_timeout, stream=should_stream)
@@ -307,31 +311,41 @@ def generate():
                     message["extraction"] = {}
                     return message
                 if status["status"] == "processing":
-                    self.logger.debug(f"Whisper-hash:{whisper_hash} | STATUS: processing...")
+                    self.logger.debug(
+                        f"Whisper-hash:{whisper_hash} | STATUS: processing..."
+                    )
                 elif status["status"] == "delivered":
-                    self.logger.debug(f"Whisper-hash:{whisper_hash} | STATUS: Already delivered!")
+                    self.logger.debug(
+                        f"Whisper-hash:{whisper_hash} | STATUS: Already delivered!"
+                    )
                     raise LLMWhispererClientException(
                         {
                             "status_code": -1,
                             "message": "Whisper operation already delivered",
                         }
                     )
                 elif status["status"] == "unknown":
-                    self.logger.debug(f"Whisper-hash:{whisper_hash} | STATUS: unknown...")
+                    self.logger.debug(
+                        f"Whisper-hash:{whisper_hash} | STATUS: unknown..."
+                    )
                     raise LLMWhispererClientException(
                         {
                             "status_code": -1,
                             "message": "Whisper operation status unknown",
                         }
                     )
                 elif status["status"] == "failed":
-                    self.logger.debug(f"Whisper-hash:{whisper_hash} | STATUS: failed...")
+                    self.logger.debug(
+                        f"Whisper-hash:{whisper_hash} | STATUS: failed..."
+                    )
                     message["status_code"] = -1
                     message["message"] = "Whisper operation failed"
                     message["extraction"] = {}
                     return message
                 elif status["status"] == "processed":
-                    self.logger.debug(f"Whisper-hash:{whisper_hash} | STATUS: processed!")
+                    self.logger.debug(
+                        f"Whisper-hash:{whisper_hash} | STATUS: processed!"
+                    )
                     resultx = self.whisper_retrieve(whisper_hash=whisper_hash)
                     if resultx["status_code"] == 200:
                         message["status_code"] = 200
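
Review note: the last hunk only rewraps the logging calls, but the branch structure it touches is easier to check in isolation. Below is a hedged sketch of how each status value is treated, based solely on the lines visible in this diff. `handle_status` and `WhisperError` are hypothetical names introduced for illustration (the real code raises `LLMWhispererClientException` inline), and the fall-through for unrecognized statuses is an assumption, since the diff does not show an `else` branch.

```python
class WhisperError(Exception):
    """Stand-in for LLMWhispererClientException, for this sketch only."""


def handle_status(status: str, message: dict) -> str:
    """Map a status string to the next step: 'continue', 'return' or 'retrieve'."""
    if status == "processing":
        # Still running: keep polling.
        return "continue"
    if status == "delivered":
        # The result was already fetched once: treated as an error.
        raise WhisperError(
            {"status_code": -1, "message": "Whisper operation already delivered"}
        )
    if status == "unknown":
        raise WhisperError(
            {"status_code": -1, "message": "Whisper operation status unknown"}
        )
    if status == "failed":
        # Failure is reported through the returned message, not an exception.
        message["status_code"] = -1
        message["message"] = "Whisper operation failed"
        message["extraction"] = {}
        return "return"
    if status == "processed":
        # Ready: the caller should fetch the extraction result.
        return "retrieve"
    # Assumption: any other status falls through and polling continues.
    return "continue"
```

The asymmetry is deliberate in the original: `delivered` and `unknown` raise, while `failed` returns a message with `status_code` set to -1, so callers need to handle both paths.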
