
Commit 3f91fd4

Merge pull request #6339 from EnterpriseDB/release-2024-12-10a
Production release - 2024-12-10a
2 parents 5ea3a32 + 58b79df commit 3f91fd4

68 files changed: +5202 −712 lines changed
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
---
title: "Capabilities"
navTitle: "Capabilities"
description: "Capabilities of the EDB Postgres AI - AI Accelerator Pipelines."
---

## Pipeline Lifecycle

This is a high-level overview of the lifecycle of a pipeline in the Pipelines system.

### A storage location is created (optional)

This step is optional and is only needed when accessing data in external storage.

Data for processing can be stored in a database table or in an external storage location. If you want to use an external storage location, you must create a storage location to access the data. This storage location can be an S3 bucket or a local file system.

A storage location can be used to create a volume, which a retriever can then use to access the data it contains.

### A model is registered

A [model](models) is registered with the Pipelines system. This model can be a machine learning model, a deep learning model, or any other type of model that can be used for AI tasks.

### A retriever is registered

A retriever is registered with the Pipelines system. A retriever is a function that retrieves data from a table or volume and returns it in a format that the model can use.

By default, a retriever only needs:

* a name
* the name of a registered model to use

If the retriever is for a table, it also needs:

* the name of the source table
* the name of the column in the source table that contains the data
* the data type of the column

If, on the other hand, the retriever is for a volume, it needs:

* the name of the volume
* the name of the column in the volume that contains the data

When a retriever is registered, by default it creates a vector table to store the embeddings of the data that's retrieved. This table has a column to store the embeddings and a column to store the key of the data.

The names of the vector table, the vector column, and the key column can all be specified when the retriever is registered. This is useful if you're migrating to aidb and want to use an existing vector table.

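For example, a retriever for a table can be registered with the `aidb.register_retriever_for_table` function, passing the parameters listed above in order. The retriever, model, and table names here are illustrative:

```sql
-- Register a retriever named products_retriever that uses the registered
-- t5 model to embed the text in the description column of the products table.
SELECT aidb.register_retriever_for_table(
    'products_retriever',  -- retriever name
    't5',                  -- name of a registered model to use
    'products',            -- source table
    'description',         -- column containing the data
    'Text'                 -- data type of the column
);
```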
### Embeddings are created

During embedding, data is retrieved from the source table or volume, encoded into a vector data type, and stored in the vector table.

If the source table already has data/rows at the time the retriever is created, a manual "bulk embedding" call must be made. This generates the embeddings for all the existing data in the source table.

Auto embedding can then be activated to keep the embeddings in sync going forward. Auto embedding uses Postgres triggers to detect insertions and updates to the source table and automatically generates embeddings for the new data.

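These two steps map onto two aidb calls; the retriever name here is illustrative:

```sql
-- Generate embeddings for rows that already exist in the source table
SELECT aidb.bulk_embedding('products_retriever');

-- Use Postgres triggers to keep embeddings in sync as rows are
-- inserted or updated from now on
SELECT aidb.enable_auto_embedding_for_table('products_retriever');
```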
### Data is queried

The embedded data can be queried using the retriever. The retriever can return the key to the data or the data itself, depending on the query. The data can be queried using a text query or an image query, depending on the type of data being retrieved.

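For example, with the retrieval functions used throughout this documentation (retriever name and query text illustrative):

```sql
-- Return the keys of the 5 closest matches to the query text
SELECT * FROM aidb.retrieve_key('products_retriever', 'I like it', 5);

-- Return the matching text itself, along with the distance
SELECT * FROM aidb.retrieve_text('products_retriever', 'I like it', 5);
```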
### Next steps

While auto embedding is enabled, the embeddings are always up to date, and applications can use the retriever to query the data as needed.

### Cleanup

If the embeddings are no longer required, the retriever can be unregistered, the vector table can be dropped, and the model can be unregistered.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
---
title: Compatibility
navTitle: Compatibility
description: Compatibility information for the EDB Postgres AI - AI Accelerator Pipelines.
---

## Supported platforms

### aidb

* Ubuntu 22.04 LTS on x86-64
* Debian 12 (Bookworm) on x86-64

### pgfs

* Ubuntu 22.04 LTS on x86-64
* Debian 12 (Bookworm) on x86-64

## Not currently supported

* Red Hat/RHEL 8 and 9 on x86-64
* Arm architectures
* SLES
* Debian versions before the current version 12
* Ubuntu 24.04 LTS
* Non-Linux platforms

## Supported PostgreSQL versions

* EDB Postgres Advanced Server versions 14, 15, 16, and 17
* EDB Postgres Extended versions 14, 15, 16, and 17
* PostgreSQL 14, 15, 16, and 17
Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
---
title: "Getting Started with Pipelines"
navTitle: "Getting Started"
description: "How to get started with AI Accelerator Pipelines."
redirects:
- /purl/aidb/gettingstarted
---

## Where to Start

The best place to start is with the [Pipelines Overview](/edb-postgres-ai/ai-accelerator/overview) to get an understanding of what Pipelines is and how it works.

## Installation

Pipelines is included with the EDB Postgres AI - AI Accelerator suite of tools. To install Pipelines, follow the instructions in the [AI Accelerator Installation Guide](/edb-postgres-ai/ai-accelerator/installing).

## Using Pipelines

Once you have Pipelines installed, you can start using it to work with your data.

Log in to your Postgres server and ensure the Pipelines extension is installed:

```sql
CREATE EXTENSION aidb CASCADE;
```

We'll be working solely with Postgres table data in this example, so we don't need to install the pgfs extension.

Let's also create an example table to work with:

```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    description TEXT,
    last_updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
__OUTPUT__
CREATE TABLE
```

And let's insert some data:

```sql
INSERT INTO products (product_name, description) VALUES
('Hamburger', 'A delicious combination of bread and meat'),
('Cheesburger', 'Improving on a classic, the cheese brings favorite flavors'),
('Fish n Chips', 'The fish is a little greasy and the chips do not help'),
('Fries', 'Never sure about these on their own, needs seasoning'),
('Burrito', 'Always ready for this parcel of edible wonder'),
('Pizza', 'It is very much a staple, but the rolled dough with toppings does not inspire'),
('Sandwich', 'The blandest of offerings, the sandwich is predominantly boring bread'),
('Veggie Burger', 'The ultra-processed vegetable product in this is neither healthy nor delicious'),
('Kebab', 'Maybe one of the great edible treats, sliced lamb, salad and crisp pitta');
__OUTPUT__
INSERT 0 9
```

So now we have a table with some data in it: food products and some very personal opinions about them.

## Registering a Retriever

The first step to using Pipelines with this data is to register a retriever. A retriever is a way to access the data in the table and use it in AI workflows.

```sql
select aidb.register_retriever_for_table('products_retriever', 't5', 'products', 'description', 'Text');
__OUTPUT__
 register_retriever_for_table
------------------------------
 products_retriever
(1 row)
```

## Querying the retriever

Now that we have a retriever registered, we can query it to get similar results based on the data in the table.

```sql
select * from aidb.retrieve_key('products_retriever','I like it',5);
__OUTPUT__
ERROR: Query returned no data. Hint: The "products_retriever_vector" table is likely empty. Make sure the embeddings have been computed.
```

That's because we haven't computed embeddings for our retriever yet. The `products_retriever_vector` table is where aidb keeps the computed embeddings for the retriever. Let's compute those embeddings now using `aidb.bulk_embedding`:

```sql
select aidb.bulk_embedding('products_retriever');
__OUTPUT__
INFO: bulk_embedding_text found 9 rows in retriever products_retriever
 bulk_embedding
----------------

(1 row)
```

Now we can query the retriever again:

```sql
select * from aidb.retrieve_key('products_retriever','I like it',4);
__OUTPUT__
 key |      distance
-----+--------------------
   4 | 1.0369428080621286
   3 | 1.03737124138149
   2 | 1.0839594107837638
   5 | 1.0869412071766262
(4 rows)
```

Now we have some results. The `key` column is the primary key of the row in the `products` table, and the `distance` column is the distance between the query and the result. The lower the distance, the more similar the result is to the query.

What we really want is the actual matching text, not just the key. We can use `aidb.retrieve_text` for that:

```sql
select * from aidb.retrieve_text('products_retriever','I like it',4);
__OUTPUT__
 key |                           value                            |      distance
-----+------------------------------------------------------------+--------------------
   4 | Never sure about these on their own, needs seasoning       | 1.0369428080621286
   3 | The fish is a little greasy and the chips do not help      | 1.03737124138149
   2 | Improving on a classic, the cheese brings favorite flavors | 1.0839594107837638
   5 | Always ready for this parcel of edible wonder              | 1.0869412071766262
(4 rows)
```

Now we have the actual data from the table that matches the query.

You may want the row data from the `products` table instead of the `products_retriever_vector` table. You can get that by joining the two tables:

```sql
select * from aidb.retrieve_key('products_retriever','I like it',4) as a
left join products as b
on a.key=b.id;
__OUTPUT__
 key |      distance      | id | product_name |                        description                         |         last_updated_at
-----+--------------------+----+--------------+------------------------------------------------------------+----------------------------------
   2 | 1.0839594107837638 |  2 | Cheesburger  | Improving on a classic, the cheese brings favorite flavors | 04-DEC-24 16:48:52.599806 +00:00
   3 | 1.03737124138149   |  3 | Fish n Chips | The fish is a little greasy and the chips do not help      | 04-DEC-24 16:48:52.599806 +00:00
   4 | 1.0369428080621286 |  4 | Fries        | Never sure about these on their own, needs seasoning       | 04-DEC-24 16:48:52.599806 +00:00
   5 | 1.0869412071766262 |  5 | Burrito      | Always ready for this parcel of edible wonder              | 04-DEC-24 16:48:52.599806 +00:00
(4 rows)
```

Now you have the actual data from the `products` table that matches the query. As you can see, the full power of Postgres is available to you for your AI workflows.

## One more thing: auto-embedding

As it stands, embeddings have been calculated for our data, but if we added rows to the table, they wouldn't be automatically embedded, and the retriever would go out of sync.

To keep the embeddings up to date, we can enable auto-embedding:

```sql
select aidb.enable_auto_embedding_for_table('products_retriever');
__OUTPUT__
 enable_auto_embedding_for_table
---------------------------------

(1 row)
```

Now, if we add data to the table, the embeddings are automatically calculated. We can quickly test this:

```sql
INSERT INTO products (product_name, description) VALUES
('Pasta', 'A carb-heavy delight that is always welcome, especially with a good sauce'),
('Salad', 'Meh, it is what it is and it is not much. Occasionally saved by a good dressing');
__OUTPUT__
NOTICE: Running auto embedding for retriever products. key: "10" content: "A carb-heavy delight that is always welcome, especially with a good sauce"
NOTICE: Running auto embedding for retriever products. key: "11" content: "Meh, it is what it is and it is not much. Occasionally saved by a good dressing"
INSERT 0 2
```

```sql
select * from aidb.retrieve_key('products_retriever','I like it',4) as a
left join products as b
on a.key=b.id;
__OUTPUT__
 key |      distance      | id | product_name |                                   description                                   |         last_updated_at
-----+--------------------+----+--------------+---------------------------------------------------------------------------------+----------------------------------
  10 | 1.0351907976251493 | 10 | Pasta        | A carb-heavy delight that is always welcome, especially with a good sauce       | 04-DEC-24 17:09:44.97484 +00:00
  11 | 0.979874632270706  | 11 | Salad        | Meh, it is what it is and it is not much. Occasionally saved by a good dressing | 04-DEC-24 17:09:44.97484 +00:00
   3 | 1.03737124138149   |  3 | Fish n Chips | The fish is a little greasy and the chips do not help                           | 04-DEC-24 16:48:52.599806 +00:00
   4 | 1.0369428080621286 |  4 | Fries        | Never sure about these on their own, needs seasoning                            | 04-DEC-24 16:48:52.599806 +00:00
(4 rows)
```

## Further reading

In the [Models](../models) section, you can learn how to register more models with Pipelines, including external models from OpenAI-API-compatible services.

In the [Retrievers](../retrievers) section, you can learn more about how to use retrievers with external data sources, local files, or S3 storage, and how to use the retriever functions to get the data you need.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
---
title: "EDB Postgres AI - AI Accelerator"
navTitle: "AI Accelerator"
directoryDefaults:
  product: "EDB Postgres AI"
  iconName: BrainCircuit
  indexCards: simple
description: "All about the EDB Postgres AI - AI Accelerator suite of tools, including Pipelines and PGvector."
navigation:
- overview
- gettingstarted
- "#Introducing Pipelines"
- pipelines-overview
- capabilities
- limitations
- compatibility
- installing
- "#Pipelines components"
- models
- retrievers
- pgfs
- "#Pipelines resources"
- reference
- rel_notes
- licenses
- "#Other components"
- pgvector
redirects:
- /edb-postgres-ai/ai-ml/
---

As part of the EDB Postgres AI platform, Pipelines abstracts away the complexity of working with AI data. It transforms Postgres into a powerful platform for AI data management, combining vector search from PGvector with automation for complex AI workflows.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
title: "Completing and verifying the extension installation"
navTitle: "Completing the installation"
description: "Completing and verifying the installation of the AI Database and File System extensions."
---

### Installing the AI Database extension

The AI Database (aidb) extension provides a set of functions to run AI/ML models in the database. Install it using the `CREATE EXTENSION` command:

```sql
edb=# CREATE EXTENSION aidb CASCADE;
NOTICE: installing required extension "vector"
CREATE EXTENSION
edb=#
```

### Installing the File System extension

The File System (pgfs) extension provides a set of functions to interact with the file system from within the database. Install it using the `CREATE EXTENSION` command:

```sql
edb=# CREATE EXTENSION pgfs;
CREATE EXTENSION
```

### Validating the installation

You can check that the extensions have been installed by running the `\dx` command in `psql`:

```sql
edb=# \dx
__OUTPUT__
                                List of installed extensions
  Name  | Version | Schema |                        Description
--------+---------+--------+------------------------------------------------------------
 aidb   | 1.0.7   | aidb   | aidb: makes it easy to build AI applications with postgres
 pgfs   | 1.0.4   | pgfs   | pgfs: enables access to filesystem-like storage locations
 vector | 0.8.0   | public | vector data type and ivfflat and hnsw access methods
```

Typically, other extensions will also be listed in this view, but the `aidb`, `pgfs`, and `vector` extensions should be among them.
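If you prefer a plain SQL check over psql's `\dx` meta-command (for example, from a client application), you can query the standard `pg_extension` catalog instead. This is a general Postgres technique, not an aidb-specific function:

```sql
-- List the Pipelines-related extensions and their installed versions
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aidb', 'pgfs', 'vector')
ORDER BY extname;
```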
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
---
title: "Installing AI Accelerator Pipelines"
navTitle: "Installing"
description: "How to install AI Accelerator Pipelines."
navigation:
- packages
- complete
---

Pipelines is delivered as a set of extensions. Depending on how you are deploying Pipelines, these extensions may be installed by your deployment platform (such as EDB Cloud Service). If you deploy your own Postgres server, you need to install them manually.

- [Manually installing Pipelines packages](packages)

Once the packages are installed, you can [complete the installation](complete) by activating the extensions within Postgres.