
Commit d85facf

Merge pull request #6821 from EnterpriseDB/release-2025-05-19
Release 2025-05-19
2 parents 37d4a09 + 9539567 commit d85facf


44 files changed: +921 -218 lines

advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx

Lines changed: 5 additions & 1 deletion
@@ -123,7 +123,11 @@ As well as for existing pipelines:
- With [`aidb.set_auto_knowledge_base`](../reference/knowledge_bases#aidbset_auto_knowledge_base)

## Batch processing
In Background and Disabled modes, (auto) processing happens in batches of configurable size. The pipeline processes all source records in batches.

All records within each batch are processed in parallel wherever possible. This means that pipeline steps such as data retrieval, embeddings computation, and storing embeddings run as parallel operations.

For example, when using a table as a data source, a batch of input records is retrieved with a single query. With a volume source, concurrent requests are used to retrieve a batch of records.

Our [knowledge base pipeline performance tuning guide](../knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput.
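The batching model described above (batches run sequentially; records within a batch run in parallel) can be sketched roughly as follows. This is a simplified illustration, not AIDB's implementation; `embed_record` is a hypothetical stand-in for the per-record pipeline steps:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_record(record):
    # stand-in for the real per-record work: retrieval, embedding computation, storage
    return f"embedding({record})"

def process_in_batches(records, batch_size):
    results = []
    # batches are processed sequentially, one after the other
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        # records within a batch are processed in parallel
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(pool.map(embed_record, batch))
    return results

print(len(process_in_batches(list(range(250)), batch_size=100)))  # 250 records, in 3 batches
```

A larger `batch_size` means fewer sequential iterations but more work in flight at once, which is exactly the trade-off the tuning guide measures.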

## Change detection
AIDB auto-processing is designed around change detection mechanisms for table and volume data sources. This allows it to only
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
---
title: "Pipelines knowledge base performance tuning"
navTitle: "Performance tuning"
deepToC: true
description: "How to tune the performance of knowledge base pipelines."
---


## Background
The performance (i.e., throughput of embeddings per second) can be optimized by changing pipeline and model settings.
This guide explains the relevant settings and shows how to tune them.

Knowledge base pipelines process collections of individual records (rows in a table or objects in a volume). Rather than processing each record individually and sequentially, or processing all of them concurrently,
AIDB offers batch processing. All the batches are processed sequentially, one after the other. Within each batch, records are processed concurrently wherever possible.

- [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch should have.
- Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at their default values and using the pipeline batch size to control execution.

!!! Note
Vector indexing also has an impact on pipeline performance. You can disable the vector index by using `index_type => 'disabled'` to exclude it from your measurements.
!!!

## Testing and tuning performance
We will first set up test data and a knowledge base pipeline, then measure and tune the batch size.

### 1) Create a table and insert test data
The actual data content length has some impact on model performance. You can use longer text to test that.
```sql
CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL);

INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hello world';
```

### 2) Create a knowledge base pipeline
The optimal batch size may be very different for different models. Measure and tune the batch size for each model you want to use.
```sql
SELECT aidb.create_table_knowledge_base(
    name => 'perf_test',
    model_name => 'dummy', -- use the model you want to optimize for
    source_table => 'test_data_10k',
    source_data_column => 'msg',
    source_data_format => 'Text',
    index_type => 'disabled', -- optionally disable vector indexing to include/exclude it from the measurement
    auto_processing => 'Disabled', -- we want to manually run the pipeline to measure the runtime
    batch_size => 100 -- this is the parameter we will tune during this test
);
__OUTPUT__
INFO: using vector table: public.perf_test_vector
NOTICE: index "vdx_perf_test_vector" does not exist, skipping
NOTICE: auto-processing is set to "Disabled". Manually run "SELECT aidb.bulk_embedding('perf_test');" to compute embeddings.
 create_table_knowledge_base
-----------------------------
 perf_test
(1 row)
```

### 3) Run the pipeline and measure the performance
We use `psql` in this test; the `\timing on` command is a psql feature. If you use a different interface, check how it can display timing information.

```sql
\timing on
__OUTPUT__
Timing is on.
```

Now run the pipeline:
```sql
SELECT aidb.bulk_embedding('perf_test');
__OUTPUT__
INFO: perf_test: (re)setting state table to process all data...
INFO: perf_test: Starting... Batch size 100, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 0
INFO: perf_test: Batch iteration finished, unprocessed rows: 9900, count(source records): 10000, count(embeddings): 100
INFO: perf_test: Batch iteration finished, unprocessed rows: 9800, count(source records): 10000, count(embeddings): 200
...
INFO: perf_test: Batch iteration finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
 bulk_embedding
----------------

(1 row)

Time: 207161,174 ms (03:27,161)
```

### 4) Tune the batch size
You can use this call to adjust the batch size of the pipeline. We increase it by 10x, to 1000 records:
```sql
SELECT aidb.set_auto_knowledge_base('perf_test', 'Disabled', batch_size => 1000);
```

Run the pipeline again.

!!! Note
When using a Postgres table as the source, with auto-processing disabled, AIDB has no means to detect changes in the source data. So each `bulk_embedding` call has to re-process everything.

This is convenient for performance testing.

If you want to measure performance with a volume source, you should delete and re-create the knowledge base between each test. AIDB is able to detect changes on volumes even with auto-processing disabled.
!!!

```sql
SELECT aidb.bulk_embedding('perf_test');
__OUTPUT__
INFO: perf_test: (re)setting state table to process all data...
INFO: perf_test: Starting... Batch size 1000, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 10000
...
INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
 bulk_embedding
----------------

(1 row)

Time: 154276,486 ms (02:34,276)
```

## Conclusion
In this test, the pipeline took 02:34 min with batch size 1000 and 03:27 min with batch size 100. You can continue testing larger sizes until performance no longer improves, or even declines.
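Converting those timings into throughput makes the comparison concrete. This is simple arithmetic on the two example runs above; your absolute numbers will vary with the model and hardware:

```python
# Throughput for the two example runs (10,000 embeddings each).
records = 10_000

t_batch_100 = 3 * 60 + 27.161   # 03:27,161 -> 207.161 s (batch_size=100)
t_batch_1000 = 2 * 60 + 34.276  # 02:34,276 -> 154.276 s (batch_size=1000)

print(f"batch_size=100:  {records / t_batch_100:.1f} embeddings/s")
print(f"batch_size=1000: {records / t_batch_1000:.1f} embeddings/s")
print(f"speedup: {t_batch_100 / t_batch_1000:.2f}x")
```

In this example, going from batch size 100 to 1000 raises throughput from roughly 48 to roughly 65 embeddings per second, about a 1.34x speedup.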

advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx

Lines changed: 29 additions & 6 deletions
@@ -42,7 +42,7 @@ Based on the name of the model, the model provider sets defaults accordingly:
## Creating the default with OpenAI model

```sql
SELECT aidb.create_model('my_openai_embeddings',
    'openai_embeddings',
    credentials => '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB);
```
@@ -58,7 +58,7 @@ SELECT aidb.create_model(
    'my_openai_model',
    'openai_embeddings',
    '{"model": "text-embedding-3-small"}'::JSONB,
    '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
);
```
@@ -69,12 +69,35 @@ Because this example is passing the configuration options and the credentials, u
The following configuration settings are available for OpenAI models:

* `model` — The OpenAI model to use.
* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL.
    * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`.
    * If `nim_completions` is the `model`, `url` defaults to `https://integrate.api.nvidia.com/v1/chat/completions`.
* `max_concurrent_requests` — The maximum number of concurrent requests to make to the OpenAI model. The default is `25`.
* `max_batch_size` — The maximum number of records to send to the model in a single request. The default is `50,000`.

### Batch and parallel processing
The model providers for `embeddings`, `openai_embeddings`, and `nim_embeddings` support sending batch requests as well as concurrent requests.
The two settings `max_concurrent_requests` and `max_batch_size` control this behavior. When a model provider receives a set of records (e.g., from a knowledge base pipeline),
the following happens:
* Assume the knowledge base pipeline is configured with batch size 10,000,
* and the model provider is configured with `max_batch_size=1000` and `max_concurrent_requests=5`.
* The provider will then collect up to 1000 records and send them in a single request to the model.
* It will send 5 such large requests concurrently, until no more input records are left.
* So in this example, the provider needs to send/receive 10 batches in total.
* After sending the first 5, it waits for the responses to return.
* Once a response is received, another request can be sent.
* This means the provider won't wait for all 5 to return before sending off the next 5. Instead, it always keeps up to 5 requests in flight.
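The arithmetic in the walkthrough above can be sketched as follows. This is a simplified model for reasoning about the numbers, not the provider's actual implementation, and it assumes roughly equal response times per request:

```python
import math

def request_plan(total_records, max_batch_size, max_concurrent_requests):
    # number of batched requests needed to cover all records
    requests = math.ceil(total_records / max_batch_size)
    # with up to `max_concurrent_requests` requests kept in flight, at least
    # this many sequential round-trips are needed
    min_round_trips = math.ceil(requests / max_concurrent_requests)
    return requests, min_round_trips

# the example above: a pipeline batch of 10,000 records,
# max_batch_size=1000 and max_concurrent_requests=5
print(request_plan(10_000, 1000, 5))  # (10, 2)
```

Because completed requests are immediately replaced by new ones, the round-trip count is a lower bound rather than a strict schedule of fixed "waves".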

!!! Note
The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance. But they depend highly on
the hardware and infrastructure.

We recommend leaving the defaults in place and [tuning the performance via the knowledge base pipeline batch size](../../knowledge_base/performance_tuning).
The default `max_batch_size` of 50,000 is intentionally high to allow the pipeline to control the actual size of the batches.
!!!


### Model credentials
The following credentials may be required by the service providing these models. Note: `api_key` and `basic_auth` are exclusive. Only one of these two options can be used.

* `api_key` &mdash; The API key to use for Bearer Token authentication. The `api_key` will be sent in a header field as `Authorization: Bearer <api_key>`.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
title: "Pipelines PGFS with Google Cloud Storage"
navTitle: "Google Cloud Storage"
description: "PGFS options and credentials with Google Cloud Storage."
---


## Overview: Google Cloud Storage
PGFS uses the `gs:` prefix to indicate a Google Cloud Storage bucket.

The general syntax for using GCS is this:
```sql
SELECT pgfs.create_storage_location(
    'storage_location_name',
    'gs://bucket_name',
    credentials => '{}'::JSONB
);
```

### The `credentials` argument in JSON format offers the following settings:
| Option                            | Description                              |
|-----------------------------------|------------------------------------------|
| `google_application_credentials`  | Path to the application credentials file |
| `google_service_account_key_file` | Path to the service account key file     |

See the [Google Cloud documentation](https://cloud.google.com/iam/docs/keys-create-delete#creating) for more information on how to manage service account keys.

These options can also be set via the equivalent environment variables to facilitate authentication in managed environments such as Google Kubernetes Engine.

## Example: private GCS bucket

```sql
SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images',
    credentials => '{"google_service_account_key_file": "/var/run/gcs.json"}'
);
```

## Example: authentication in GKE

Ensure that the `GOOGLE_APPLICATION_CREDENTIALS` or the `GOOGLE_SERVICE_ACCOUNT_KEY_FILE` environment variable
is set on your PostgreSQL pod. Then, PGFS will automatically pick it up:

```sql
SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images');
```

advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx

Lines changed: 14 additions & 1 deletion
@@ -25,6 +25,7 @@ select pgfs.create_storage_location(
| `skip_signature` | Disable HMAC authentication (set this to "true" when you're not providing access_key_id/secret_access_key in the credentials). |
| `region` | The region of the S3-compatible storage system. If the region is not specified, the client will attempt auto-discovery. |
| `endpoint` | The endpoint of the S3-compatible storage system. |
| `allow_http` | Whether the endpoint uses plain HTTP (rather than HTTPS/TLS). Set this to `true` if your endpoint starts with `http://`. |

### The `credentials` argument in JSON format offers the following settings:
| Option | Description |
@@ -53,7 +54,7 @@ SELECT pgfs.create_storage_location('internal_ai_project', 's3://my-company-ai-i
);
```

## Example: non-AWS S3 / S3-compatible with HTTPS
This is an example of using an S3-compatible system like MinIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.

```sql
@@ -63,4 +64,16 @@ SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images'
);
```

## Example: non-AWS S3 / S3-compatible with HTTP
This is an example of using an S3-compatible system like MinIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.

In this case, the server does not use TLS encryption, so we configure a plain HTTP connection.

```sql
SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images',
    options => '{"endpoint": "http://minio-api.apps.local", "allow_http": "true"}',
    credentials => '{"access_key_id": "my_username", "secret_access_key": "my_password"}'
);
```

advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx

Lines changed: 6 additions & 0 deletions
@@ -34,6 +34,12 @@ Bulk data preparation performs a preparer's associated operation for all of the
Bulk data preparation does not delete existing destination data unless it conflicts with newly generated data. It is recommended to configure separate destination tables for each preparer.
!!!

## Unnesting

Some Preparer [Primitives](./primitives) transform the shape of the data they are given. For example, `ChunkText` receives one text block and produces one or more text blocks. Rather than return nested collections of results, these Primitives automatically unnest (or "explode") their output, using a new `part_id` column to track the additional dimension.

You can see this in action in [Primitives](./primitives) and in the applicable [examples](./examples).
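The unnesting behavior can be sketched as follows. This is an illustrative model only; `chunk_text` and the column layout are hypothetical stand-ins, not AIDB's actual API. Each source row fans out into one output row per chunk, and `part_id` records the extra dimension:

```python
def chunk_text(msg, size):
    # illustrative stand-in for the ChunkText primitive:
    # split text into fixed-size chunks
    return [msg[i:i + size] for i in range(0, len(msg), size)]

def prepare(rows, size):
    """Apply the chunking operation and unnest ("explode") the results.

    One input row can produce several output rows; part_id tracks the
    position of each chunk within its source row."""
    out = []
    for row_id, msg in rows:
        for part_id, chunk in enumerate(chunk_text(msg, size)):
            out.append({"id": row_id, "part_id": part_id, "chunk": chunk})
    return out

rows = [(1, "hello world"), (2, "hi")]
for record in prepare(rows, size=5):
    print(record)
# row 1 fans out into three chunks (part_id 0..2); row 2 stays a single chunk
```

The flat output keeps the destination table two-dimensional even though the operation's result is nested, which is why the `part_id` column is needed to address individual parts.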

## Consistency with source data

To ensure correct and consistent data, the prepared destination data must be in sync with the source data. In the case of the table data source, you can enable preparer auto processing to inform the preparer pipeline about changes to the source data.
