Commit 11da705

Merge pull request #6757 from EnterpriseDB/aidb-lost-commit
Aidb lost commits added back for 4.0 release
2 parents fe14e86 + 238061f commit 11da705

9 files changed, +192 -6 lines changed

advocacy_docs/edb-postgres-ai/ai-accelerator/gettingstarted/index.mdx

+1 -1
@@ -158,7 +158,7 @@ Now you have the actual data from the `products` table that matches the query. A
 
 As it stands, vectors were calculated for the data, but if you add data to the table, it isn't automatically embedded. The knowledge base would go out of sync.
 
-To keep the embeddings up to date, enable live See [auto-processing](../capabilities/auto-processing):
+To keep the embeddings up to date, enable live [auto-processing](../capabilities/auto-processing):
 
 ```sql
 select aidb.set_auto_knowledge_base('products_knowledge_base', 'Live');

advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/index.mdx

+1 -1
@@ -19,4 +19,4 @@ This section provides details of the supported models in EDB Postgres AI - AI Ac
 * [CLIP](clip).
 * [NIM_CLIP](nim_clip).
 * [NIM_RERANKING](nim_reranking).
-
+* [NIM_PADDLE_OCR](nim_paddle_ocr).
@@ -0,0 +1,47 @@
+---
+title: "NIM OCR"
+navTitle: "NIM OCR"
+description: "NIM OCR (Optical Character Recognition) is a model that extracts text from images on NVIDIA NIM."
+---
+
+Model name: `nim_paddle_ocr`
+
+## About NVIDIA NIM Image OCR
+
+The [NVIDIA NIM for Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/overview.html) is an (Optical Character Recognition) microservice and interface that extracts text from images on NVIDIA NIM.
+
+
+## Supported aidb operations
+
+* perform_ocr
+
+## Supported models
+
+### NVIDIA NIM for Image OCR
+
+* [baidu/paddleocr](https://build.nvidia.com/baidu/paddleocr/modelcard) (default)
+
+
+## Creating the default model
+
+```sql
+SELECT aidb.create_model(
+'my_paddle_ocr_model',
+'nim_paddle_ocr',
+credentials => '{"api_key": "__NVIDIA_NIM_API_KEY__" }'::JSONB
+);
+```
+
+There is currently only one model supported by [NVIDIA NIM for Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/support-matrix.html), the default `baidu/paddleocr`, so you don't need to specify the model in the configuration.
+
+## Model configuration settings
+
+The following configuration settings are available for PADDLE_OCR models:
+
+* `url` — The default is `https://ai.api.nvidia.com/v1/cv/baidu/paddleocr`.
+
+## Model credentials
+
+The following credentials are required if executing inside NVIDIA NGC:
+
+* `api_key` — The NVIDIA Cloud API key to use for authentication.
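
Reviewer note: the new page passes `credentials` as a JSON literal with the `__NVIDIA_NIM_API_KEY__` placeholder. As a minimal sketch of one way a client script might produce that JSON text without hard-coding the key, the Python below reads it from an environment variable. The variable name `NVIDIA_NIM_API_KEY` and the helper `credentials_json` are hypothetical and are not part of aidb or of this commit.

```python
import json
import os
from typing import Mapping, Optional

# Hypothetical variable name; the docs only show the __NVIDIA_NIM_API_KEY__ placeholder.
ENV_VAR = "NVIDIA_NIM_API_KEY"


def credentials_json(env: Optional[Mapping[str, str]] = None) -> str:
    """Return JSON text shaped like the credentials literal in the diff."""
    env = os.environ if env is None else env
    api_key = env.get(ENV_VAR)
    if not api_key:
        # Fail early rather than send an empty key to the model provider.
        raise KeyError(f"{ENV_VAR} is not set")
    return json.dumps({"api_key": api_key})


if __name__ == "__main__":
    # Demo with an explicit mapping so the sketch runs without real credentials.
    print(credentials_json({ENV_VAR: "example-key"}))
```

The resulting string would be bound as the `credentials` argument of `aidb.create_model` through a database driver's parameter mechanism rather than pasted into the SQL text.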

advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx

+101
Large diffs are not rendered by default.

advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx

+6 -3
@@ -67,12 +67,15 @@ The AI accelerator data preparation pipeline validates the options at creation t
 
 ```sql
 -- This model does not support the language adapter
-SELECT aidb.create_model('bert_model__1952', 'bert_local');
+SELECT aidb.create_model('bert_model', 'bert_local');
 ```
 
 ```sql
 -- Single execution fails
-SELECT * FROM aidb.summarize_text('bert_model__1952', 'Hello world');
+SELECT * FROM aidb.summarize_text(
+input => 'Hello world',
+options => '{"model": "bert_model"}'
+);
 __OUTPUT__
 ERROR: The requested adapter is not supported by the model provider: bert_local
 ```
@@ -86,7 +89,7 @@ SELECT aidb.create_table_preparer(
 source_data_column => 'content',
 destination_table => 'summarized_data__1952',
 destination_data_column => 'summaries',
-options => '{"model": "bert_model__1952"}'::JSONB -- Incompatible model
+options => '{"model": "bert_model"}'::JSONB -- Incompatible model
 );
 __OUTPUT__
 ERROR: Failed to create preparer: The requested adapter is not supported by the model provider: bert_local

advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx

+25
@@ -91,3 +91,28 @@ SELECT * FROM aidb.summarize_text(
 ```
 
 - The `model` is the name of the created model to use for summarization. The model must support the `decode_text()` and `decode_text_batch()` [model primitives](../models/primitives).
+
+## Perform OCR
+
+Call `aidb.perform_ocr()` to parse text from image bytes:
+
+```sql
+-- Create a model for use in OCR
+SELECT aidb.create_model(
+'my_paddle_ocr_model',
+'nim_paddle_ocr',
+credentials => '{"api_key": "__NVIDIA_NIM_API_KEY__" }'::JSONB
+);
+
+SELECT * FROM aidb.perform_ocr(
+decode('', 'base64'),
+options => '{"model": "my_paddle_ocr_model"}'
+);
+```
+
+- The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation.
+
+!!! Tip
+Limitations of the model still apply. For example, the [NVIDIA NIM Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/api-reference.html) model provider only supports `png` and `jpeg` image inputs.
+!!!
+
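
Reviewer note: the added `perform_ocr` example calls `decode('', 'base64')` with an empty string where the base64 text of an image belongs. As a rough sketch of how a client might prepare both arguments, the Python below base64-encodes raw image bytes and serializes the `options` JSON. The helper names are hypothetical and not part of aidb. The demo input is only the eight-byte PNG file signature, used so the sketch runs without a file on disk; it is not a decodable image.

```python
import base64
import json
from typing import Tuple

# The fixed eight-byte signature that starts every PNG file (demo input only).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as the text that decode(..., 'base64') turns back into bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


def perform_ocr_arguments(image_bytes: bytes, model_name: str) -> Tuple[str, str]:
    """Return (base64 image text, options JSON) for the two perform_ocr arguments."""
    return image_to_base64(image_bytes), json.dumps({"model": model_name})


if __name__ == "__main__":
    encoded, options = perform_ocr_arguments(PNG_SIGNATURE, "my_paddle_ocr_model")
    print(encoded)   # iVBORw0KGgo=
    print(options)   # {"model": "my_paddle_ocr_model"}
```

In practice the bytes would come from a real `png` or `jpeg` file, for example `open(path, "rb").read()`, since the Tip in the diff says those are the only formats the provider accepts.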

advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx

+1
@@ -40,6 +40,7 @@ The `aidb.DataPreparationOperation` type is an enum that represents the differen
 * SummarizeText
 * ParseHtml
 * ParsePdf
+* PerformOcr
 
 ## Functions
 

advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.0.0_rel_notes.mdx

+1
@@ -51,6 +51,7 @@ The HTTP status code, error response content, and request content are logged to
 This supports disabling certificate verification using <code>insecure_skip_verify</code> for development, or specifying a custom certificate authority using the <code>ca_path</code> field.
 These settings allow secure integration with services using self-signed or private CA certificates.</p>
 </details></td><td></td></tr>
+<tr><td><details><summary>Added PerformOcr operation to Preparer.</summary><hr/><p>Preparer pipelines can now utilize a <code>PerformOcr</code> operation that leverages the <a href="https://build.nvidia.com/baidu/paddleocr">PaddleOCR model</a> with the <a href="https://docs.nvidia.com/nim/ingestion/table-extraction/latest/overview.html">NVIDIA NIM Image OCR API</a>.</p></details></td><td></td></tr>
 </tbody></table>
 
 

advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.0.0.yml

+9 -1
@@ -82,4 +82,12 @@ relnotes:
 These settings allow secure integration with services using self-signed or private CA certificates.
 jira: "AID-321"
 type: Enhancement
-impact: Medium
+impact: Medium
+
+- relnote: Added PerformOcr operation to Preparer.
+details: Preparer pipelines can now utilize a `PerformOcr` operation that leverages the [PaddleOCR model](https://build.nvidia.com/baidu/paddleocr) with the [NVIDIA NIM Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/overview.html).
+jira: "AID-109"
+addresses: ""
+type: Enhancement
+impact: Medium
+
