Skip to content

Commit bf6e01f

Browse files
authored
Merge pull request #156 from cipherstash/jsonb_contains_clean
feat(ste_vec): add GIN-indexable containment using jsonb arrays
2 parents 9277114 + c6b1927 commit bf6e01f

File tree

9 files changed

+1209
-34
lines changed

9 files changed

+1209
-34
lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,5 +215,9 @@ eql--*.sql
215215
# Generated SQLx migration (built from src/, never commit)
216216
tests/sqlx/migrations/001_install_eql.sql
217217

218+
# Large generated test data files
219+
tests/ste_vec_vast.sql
220+
tests/ste_vec_*M.sql*
221+
218222
# Rust build artifacts (using sccache)
219223
tests/sqlx/target/

docs/reference/database-indexes.md

Lines changed: 85 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ EQL supports PostgreSQL B-tree indexes on `eql_v2_encrypted` columns to improve
1010
- [Query Patterns That Don't Use Indexes](#query-patterns-that-dont-use-indexes)
1111
- [Index Limitations](#index-limitations)
1212
- [Best Practices](#best-practices)
13+
- [GIN Indexes for JSONB Containment](#gin-indexes-for-jsonb-containment)
1314

1415
---
1516

@@ -217,7 +218,7 @@ B-tree indexes **only work** with:
217218

218219
They **do not work** with:
219220
- `bf` (bloom_filter) - pattern matching
220-
- `sv` (ste_vec) - JSONB containment
221+
- Data with `sv` field (ste_vec) - JSONB containment uses GIN indexes instead (see [GIN Indexes](#gin-indexes-for-jsonb-containment))
221222
- Data without any index terms
222223

223224
### 2. Index Creation Timing
@@ -326,6 +327,89 @@ DROP INDEX IF EXISTS idx_users_encrypted_email;
326327

327328
---
328329

330+
## GIN Indexes for JSONB Containment
331+
332+
While B-tree indexes don't support `ste_vec` (JSONB containment), you can use PostgreSQL GIN indexes for efficient containment queries on encrypted JSONB columns.
333+
334+
### When to Use GIN Indexes
335+
336+
Use GIN indexes when:
337+
- You need to perform JSONB containment queries (`@>`, `<@`)
338+
- The table has a significant number of rows (500+ recommended)
339+
- Query performance on containment operations is important
340+
341+
### Creating a GIN Index
342+
343+
Create a GIN index using the `jsonb_array()` function, which extracts the encrypted JSONB as a native `jsonb[]` array:
344+
345+
```sql
346+
CREATE INDEX idx_encrypted_jsonb_gin
347+
ON table_name USING GIN (eql_v2.jsonb_array(encrypted_column));
348+
349+
ANALYZE table_name;
350+
```
351+
352+
**Important:** Always run `ANALYZE` after creating the index so PostgreSQL's query planner has accurate statistics.
353+
354+
### Query Patterns for GIN Indexes
355+
356+
There are two approaches to write containment queries that use GIN indexes:
357+
358+
#### Approach 1: Using jsonb_array() Function
359+
360+
Convert both sides to `jsonb[]` and use the native containment operator:
361+
362+
```sql
363+
SELECT * FROM table_name
364+
WHERE eql_v2.jsonb_array(encrypted_column) @>
365+
eql_v2.jsonb_array($1::eql_v2_encrypted);
366+
```
367+
368+
#### Approach 2: Using Helper Function
369+
370+
Use the convenience function which handles the conversion internally:
371+
372+
```sql
373+
SELECT * FROM table_name
374+
WHERE eql_v2.jsonb_contains(encrypted_column, $1::eql_v2_encrypted);
375+
```
376+
377+
Both approaches produce the same result and use the GIN index.
378+
379+
### Verifying Index Usage
380+
381+
Use `EXPLAIN` to verify the GIN index is being used:
382+
383+
```sql
384+
EXPLAIN SELECT * FROM table_name
385+
WHERE eql_v2.jsonb_array(encrypted_column) @>
386+
eql_v2.jsonb_array($1::eql_v2_encrypted);
387+
```
388+
389+
**Expected output:**
390+
```
391+
Bitmap Heap Scan on table_name
392+
Recheck Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
393+
-> Bitmap Index Scan on idx_encrypted_jsonb_gin
394+
Index Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
395+
```
396+
397+
If you see `Seq Scan`, ensure:
398+
1. The index exists
399+
2. `ANALYZE` has been run
400+
3. The table has enough rows (PostgreSQL may choose sequential scan for very small tables)
401+
402+
### GIN vs B-tree Index Comparison
403+
404+
| Feature | B-tree Index | GIN Index |
405+
|---------|-------------|-----------|
406+
| **Use case** | Equality, range queries | JSONB containment |
407+
| **Index terms** | `hm`, `b3`, `ob` | `sv` (via jsonb_array) |
408+
| **Operators** | `=`, `<`, `>`, `<=`, `>=` | `@>`, `<@` |
409+
| **Function** | Direct column reference | `eql_v2.jsonb_array()` |
410+
411+
---
412+
329413
## Troubleshooting
330414

331415
### Index Not Being Used

docs/reference/index-config.md

Lines changed: 60 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,14 +5,14 @@
55
> If you are using Protect.js, see the [Protect.js schema](https://github.com/cipherstash/protectjs/blob/main/docs/reference/schema.md).
66
77
The following functions allow you to configure indexes for encrypted columns.
8-
All these functions modify the `eql_v2_configuration` table in your database, and are added during the EQL installation.
8+
All these functions modify the `public.eql_v2_configuration` table in your database, and are added during the EQL installation.
99

1010
> **IMPORTANT:** When you modify or add search configuration index, you must re-encrypt data that's already been stored in the database.
1111
> The CipherStash encryption solution will encrypt the data based on the current state of the configuration.
1212
1313
### Configuring search (`eql_v2.add_search_config`)
1414

15-
Add an index to an encrypted column.
15+
Add an index to an encrypted column. Returns the updated configuration as JSONB.
1616

1717
```sql
1818
SELECT eql_v2.add_search_config(
@@ -31,6 +31,7 @@ SELECT eql_v2.add_search_config(
3131
| `index_name` | The index kind | Required |
3232
| `cast_as` | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text` |
3333
| `opts` | Index options | Optional for `match` indexes, required for `ste_vec` indexes (see below) |
34+
| `migrating` | Skip auto-migration if true | Optional. Defaults to `false`. Set to `true` for batch operations |
3435

3536
#### Option (`cast_as`)
3637

@@ -60,33 +61,33 @@ The default match index options are:
6061
"tokenizer": {
6162
"kind": "ngram",
6263
"token_length": 3
63-
}
64-
"token_filters": {
65-
"kind": "downcase"
66-
}
64+
},
65+
"token_filters": [
66+
{"kind": "downcase"}
67+
]
6768
}
6869
```
6970

70-
- `tokenFilters`: a list of filters to apply to normalize tokens before indexing.
71+
- `token_filters`: a list of filters to apply to normalize tokens before indexing.
7172
- `tokenizer`: determines how input text is split into tokens.
72-
- `m`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
73+
- `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
7374
- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
7475

7576
**Token filters**
7677

77-
There are currently only two token filters available: `downcase` and `upcase`. These are used to normalise the text before indexing and are also applied to query terms. An empty array can also be passed to `tokenFilters` if no normalisation of terms is required.
78+
The `downcase` token filter is available to normalise text before indexing and is also applied to query terms. An empty array can also be passed to `token_filters` if no normalisation of terms is required.
7879

7980
**Tokenizer**
8081

8182
There are two `tokenizer`s provided: `standard` and `ngram`.
8283
`standard` simply splits text into tokens using this regular expression: `/[ ,;:!]/`.
8384
`ngram` splits the text into n-grams and accepts a configuration object that allows you to specify the `tokenLength`.
8485

85-
**m** and **k**
86+
**bf** and **k**
8687

87-
`k` and `m` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.
88+
`k` and `bf` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.
8889

89-
`m` is the size of the bloom filter in bits. `filterSize` must be a power of 2 between `32` and `65536` and defaults to `2048`.
90+
`bf` is the size of the bloom filter in bits. It must be a power of 2 between `32` and `65536` and defaults to `2048`.
9091

9192
`k` is the number of hash functions to use per term.
9293
This determines the maximum number of bits that will be set in the bloom filter per term.
@@ -103,7 +104,9 @@ Try to ensure that the string you search for is at least as long as the `tokenLe
103104

104105
#### Options for ste_vec indexes (`opts`)
105106

106-
An ste_vec index on a encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).
107+
An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).
108+
109+
> **Note:** The `@>` and `<@` operators work directly on `eql_v2_encrypted` types, allowing simple query syntax like `encrypted_col @> search_term`.
107110
108111
An ste_vec index requires one piece of configuration: the `prefix` (a string) which is passed as an info string to a MAC (Message Authenticated Code).
109112
This ensures that all of the encrypted values are unique to that prefix.
@@ -204,7 +207,7 @@ A query prior to encrypting and indexing looks like a structurally similar subse
204207
}
205208
```
206209

207-
The expression `cs_ste_vec_v2(encrypted_account) @> cs_ste_vec_v2($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]".
210+
The expression `encrypted_account @> $query` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]".
208211

209212
When reduced to a prefix list, it would look like this:
210213

@@ -224,9 +227,26 @@ When reduced to a prefix list, it would look like this:
224227

225228
Which is then turned into an ste_vec of hashes which can be directly queries against the index.
226229

230+
#### GIN indexing for ste_vec
231+
232+
For efficient containment queries on large tables, you can create a GIN index using the `eql_v2.jsonb_array()` function:
233+
234+
```sql
235+
-- Create GIN index for containment queries
236+
CREATE INDEX idx_encrypted_jsonb ON mytable USING GIN (eql_v2.jsonb_array(encrypted_col));
237+
238+
-- Query using containment (will use the GIN index)
239+
SELECT * FROM mytable WHERE encrypted_col @> $1::eql_v2_encrypted;
240+
```
241+
242+
The following helper functions are available for GIN-indexed containment queries:
243+
- `eql_v2.jsonb_array(val)` - Extracts encrypted JSONB as an array for GIN indexing
244+
- `eql_v2.jsonb_contains(a, b)` - GIN-indexable containment check (`a @> b`)
245+
- `eql_v2.jsonb_contained_by(a, b)` - GIN-indexable "is contained by" check (`a <@ b`)
246+
227247
### Modifying an index (`eql_v2.modify_search_config`)
228248

229-
Modifies an existing index configuration.
249+
Modifies an existing index configuration. Returns the updated configuration as JSONB.
230250
Accepts the same parameters as `eql_v2.add_search_config`
231251

232252
```sql
@@ -240,9 +260,22 @@ SELECT eql_v2.modify_search_config(
240260
);
241261
```
242262

263+
**Example:**
264+
265+
```sql
266+
-- Update match index options to increase bloom filter size
267+
SELECT eql_v2.modify_search_config(
268+
'users',
269+
'email',
270+
'match',
271+
'text',
272+
'{"bf": 4096, "k": 8}'::jsonb
273+
);
274+
```
275+
243276
### Removing an index (`eql_v2.remove_search_config`)
244277

245-
Removes an index configuration from the column.
278+
Removes an index configuration from the column. Returns the updated configuration as JSONB.
246279

247280
```sql
248281
SELECT eql_v2.remove_search_config(
@@ -253,6 +286,17 @@ SELECT eql_v2.remove_search_config(
253286
);
254287
```
255288

289+
**Example:**
290+
291+
```sql
292+
-- Remove the match index from the email column
293+
SELECT eql_v2.remove_search_config(
294+
'users',
295+
'email',
296+
'match'
297+
);
298+
```
299+
256300
---
257301

258302
### Didn't find what you wanted?

docs/reference/json-support.md

Lines changed: 43 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,24 @@ WHERE jsonb_column @> '{"top":{"nested":["a"]}}';
102102

103103
**Note:** The `@>` operator checks if the left value contains the right value. The `<@` operator checks the reverse (if left is contained in right).
104104

105+
#### Indexed Containment Queries
106+
107+
For better performance on large tables, create a GIN index and use the `jsonb_array()` function:
108+
109+
```sql
110+
-- Create GIN index
111+
CREATE INDEX idx_encrypted_jsonb_gin
112+
ON examples USING GIN (eql_v2.jsonb_array(encrypted_json));
113+
ANALYZE examples;
114+
115+
-- Query using the GIN index
116+
SELECT * FROM examples
117+
WHERE eql_v2.jsonb_array(encrypted_json) @>
118+
eql_v2.jsonb_array($1::eql_v2_encrypted);
119+
```
120+
121+
See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for complete setup instructions.
122+
105123
### Field extraction (`jsonb_path_query`)
106124

107125
Extract fields from encrypted JSONB using selector hashes. Selectors are generated during encryption and identify specific JSON paths.
@@ -138,10 +156,15 @@ Use standard PostgreSQL JSON operators on encrypted columns:
138156
-- Extract field by selector (returns eql_v2_encrypted)
139157
SELECT encrypted_json->'selector_hash' FROM examples;
140158

141-
-- Extract field as text (returns ciphertext)
159+
-- Extract field as text (returns encrypted value as text)
142160
SELECT encrypted_json->>'selector_hash' FROM examples;
161+
162+
-- Extract array element by index (0-based, returns eql_v2_encrypted)
163+
SELECT encrypted_array->0 FROM examples;
143164
```
144165

166+
**Note:** The `->` operator supports integer array indexing (e.g., `encrypted_array->0`), but the `->>` operator does not. Use `->` to access array elements by index.
167+
145168
### Array operations
146169

147170
EQL supports array operations on encrypted JSONB arrays:
@@ -200,6 +223,9 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');
200223

201224
### Core Functions
202225

226+
- **`eql_v2.ste_vec(val jsonb) RETURNS eql_v2_encrypted[]`**
227+
- Extracts the ste_vec index array from a JSONB payload
228+
203229
- **`eql_v2.ste_vec(val eql_v2_encrypted) RETURNS eql_v2_encrypted[]`**
204230
- Extracts the ste_vec index array from an encrypted value
205231

@@ -243,6 +269,22 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');
243269
- **`eql_v2.selector(val eql_v2_encrypted) RETURNS text`**
244270
- Extracts the selector hash from an encrypted value
245271

272+
### GIN-Indexable Functions
273+
274+
These functions enable efficient GIN-indexed containment queries. See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for index setup.
275+
276+
- **`eql_v2.jsonb_array(val eql_v2_encrypted) RETURNS jsonb[]`**
277+
- Extracts encrypted JSONB as native PostgreSQL jsonb array for GIN indexing
278+
- Create GIN indexes on this function for indexed containment queries
279+
280+
- **`eql_v2.jsonb_contains(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
281+
- GIN-indexed containment check: returns true if a contains b
282+
- Alternative to `jsonb_array(a) @> jsonb_array(b)`
283+
284+
- **`eql_v2.jsonb_contained_by(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
285+
- GIN-indexed reverse containment: returns true if a is contained by b
286+
- Alternative to `jsonb_array(a) <@ jsonb_array(b)`
287+
246288
### Aggregate Functions
247289

248290
- **`eql_v2.grouped_value(jsonb) RETURNS jsonb`**

0 commit comments

Comments
 (0)