cipherstash
diff --git a/‎.gitignore‎
Lines changed: 4 additions & 0 deletions b/‎.gitignore‎
Lines changed: 4 additions & 0 deletions
diff --git a/‎docs/reference/database-indexes.md‎
Lines changed: 85 additions & 1 deletion b/‎docs/reference/database-indexes.md‎
Lines changed: 85 additions & 1 deletion
diff --git a/‎docs/reference/index-config.md‎
Lines changed: 60 additions & 16 deletions b/‎docs/reference/index-config.md‎
Lines changed: 60 additions & 16 deletions
diff --git a/‎docs/reference/json-support.md‎
Lines changed: 43 additions & 1 deletion b/‎docs/reference/json-support.md‎
Lines changed: 43 additions & 1 deletion
@@ -215,5 +215,9 @@ eql--*.sql
 # Generated SQLx migration (built from src/, never commit)
 tests/sqlx/migrations/001_install_eql.sql
 
+# Large generated test data files
+tests/ste_vec_vast.sql
+tests/ste_vec_*M.sql*
+
 # Rust build artifacts (using sccache)
 tests/sqlx/target/
@@ -10,6 +10,7 @@ EQL supports PostgreSQL B-tree indexes on `eql_v2_encrypted` columns to improve
 - [Query Patterns That Don't Use Indexes](#query-patterns-that-dont-use-indexes)
 - [Index Limitations](#index-limitations)
 - [Best Practices](#best-practices)
+- [GIN Indexes for JSONB Containment](#gin-indexes-for-jsonb-containment)
 
 ---
 
@@ -217,7 +218,7 @@ B-tree indexes **only work** with:
 
 They **do not work** with:
 - `bf` (bloom_filter) - pattern matching
-- `sv` (ste_vec) - JSONB containment
+- Data with `sv` field (ste_vec) - JSONB containment uses GIN indexes instead (see [GIN Indexes](#gin-indexes-for-jsonb-containment))
 - Data without any index terms
 
 ### 2. Index Creation Timing
@@ -326,6 +327,89 @@ DROP INDEX IF EXISTS idx_users_encrypted_email;
 
 ---
 
+## GIN Indexes for JSONB Containment
+
+While B-tree indexes don't support `ste_vec` (JSONB containment), you can use PostgreSQL GIN indexes for efficient containment queries on encrypted JSONB columns.
+
+### When to Use GIN Indexes
+
+Use GIN indexes when:
+- You need to perform JSONB containment queries (`@>`, `<@`)
+- The table has a significant number of rows (500+ recommended)
+- Query performance on containment operations is important
+
+### Creating a GIN Index
+
+Create a GIN index using the `jsonb_array()` function, which extracts the encrypted JSONB as a native `jsonb[]` array:
+
+```sql
+CREATE INDEX idx_encrypted_jsonb_gin
+ON table_name USING GIN (eql_v2.jsonb_array(encrypted_column));
+
+ANALYZE table_name;
+```
+
+**Important:** Always run `ANALYZE` after creating the index so PostgreSQL's query planner has accurate statistics.
+
+### Query Patterns for GIN Indexes
+
+There are two approaches to write containment queries that use GIN indexes:
+
+#### Approach 1: Using jsonb_array() Function
+
+Convert both sides to `jsonb[]` and use the native containment operator:
+
+```sql
+SELECT * FROM table_name
+WHERE eql_v2.jsonb_array(encrypted_column) @>
+      eql_v2.jsonb_array($1::eql_v2_encrypted);
+```
+
+#### Approach 2: Using Helper Function
+
+Use the convenience function which handles the conversion internally:
+
+```sql
+SELECT * FROM table_name
+WHERE eql_v2.jsonb_contains(encrypted_column, $1::eql_v2_encrypted);
+```
+
+Both approaches produce the same result and use the GIN index.
+
+### Verifying Index Usage
+
+Use `EXPLAIN` to verify the GIN index is being used:
+
+```sql
+EXPLAIN SELECT * FROM table_name
+WHERE eql_v2.jsonb_array(encrypted_column) @>
+      eql_v2.jsonb_array($1::eql_v2_encrypted);
+```
+
+**Expected output:**
+```
+Bitmap Heap Scan on table_name
+  Recheck Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
+  ->  Bitmap Index Scan on idx_encrypted_jsonb_gin
+        Index Cond: (jsonb_array(encrypted_column) @> jsonb_array(...))
+```
+
+If you see `Seq Scan`, ensure:
+1. The index exists
+2. `ANALYZE` has been run
+3. The table has enough rows (PostgreSQL may choose sequential scan for very small tables)
+
+### GIN vs B-tree Index Comparison
+
+| Feature | B-tree Index | GIN Index |
+|---------|-------------|-----------|
+| **Use case** | Equality, range queries | JSONB containment |
+| **Index terms** | `hm`, `b3`, `ob` | `sv` (via jsonb_array) |
+| **Operators** | `=`, `<`, `>`, `<=`, `>=` | `@>`, `<@` |
+| **Function** | Direct column reference | `eql_v2.jsonb_array()` |
+
+---
+
 ## Troubleshooting
 
 ### Index Not Being Used
 
@@ -5,14 +5,14 @@
 > If you are using Protect.js, see the [Protect.js schema](https://github.com/cipherstash/protectjs/blob/main/docs/reference/schema.md).
 
 The following functions allow you to configure indexes for encrypted columns.
-All these functions modify the `eql_v2_configuration` table in your database, and are added during the EQL installation.
+All these functions modify the `public.eql_v2_configuration` table in your database, and are added during the EQL installation.
 
 > **IMPORTANT:** When you modify or add search configuration  index, you must re-encrypt data that's already been stored in the database.
 > The CipherStash encryption solution will encrypt the data based on the current state of the configuration.
 
 ### Configuring search (`eql_v2.add_search_config`)
 
-Add an index to an encrypted column.
+Add an index to an encrypted column. Returns the updated configuration as JSONB.
 
 ```sql
 SELECT eql_v2.add_search_config(
@@ -31,6 +31,7 @@ SELECT eql_v2.add_search_config(
 | `index_name`  | The index kind                                     | Required                                                                 |
 | `cast_as`     | The PostgreSQL type decrypted data will be cast to | Optional. Defaults to `text`                                             |
 | `opts`        | Index options                                      | Optional for `match` indexes, required for `ste_vec` indexes (see below) |
+| `migrating`   | Skip auto-migration if true                        | Optional. Defaults to `false`. Set to `true` for batch operations        |
 
 #### Option (`cast_as`)
 
@@ -60,33 +61,33 @@ The default match index options are:
     "tokenizer": {
       "kind": "ngram",
       "token_length": 3
-    }
-    "token_filters": {
-      "kind": "downcase"
-    }
+    },
+    "token_filters": [
+      {"kind": "downcase"}
+    ]
   }
 ```
 
-- `tokenFilters`: a list of filters to apply to normalize tokens before indexing.
+- `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
-- `m`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
+- `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
 - `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
 
 **Token filters**
 
-There are currently only two token filters available: `downcase` and `upcase`. These are used to normalise the text before indexing and are also applied to query terms. An empty array can also be passed to `tokenFilters` if no normalisation of terms is required.
+The `downcase` token filter is available to normalise text before indexing and is also applied to query terms. An empty array can also be passed to `token_filters` if no normalisation of terms is required.
 
 **Tokenizer**
 
 There are two `tokenizer`s provided: `standard` and `ngram`.
 `standard` simply splits text into tokens using this regular expression: `/[ ,;:!]/`.
 `ngram` splits the text into n-grams and accepts a configuration object that allows you to specify the `tokenLength`.
 
-**m** and **k**
+**bf** and **k**
 
-`k` and `m` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.
+`k` and `bf` are optional fields for configuring [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) that back full text search.
 
-`m` is the size of the bloom filter in bits. `filterSize` must be a power of 2 between `32` and `65536` and defaults to `2048`.
+`bf` is the size of the bloom filter in bits. It must be a power of 2 between `32` and `65536` and defaults to `2048`.
 
 `k` is the number of hash functions to use per term.
 This determines the maximum number of bits that will be set in the bloom filter per term.
@@ -103,7 +104,9 @@ Try to ensure that the string you search for is at least as long as the `tokenLe
 
 #### Options for ste_vec indexes (`opts`)
 
-An ste_vec index on a encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).
+An ste_vec index on an encrypted JSONB column enables the use of PostgreSQL's `@>` and `<@` [containment operators](https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-JSONB-OP-TABLE).
+
+> **Note:** The `@>` and `<@` operators work directly on `eql_v2_encrypted` types, allowing simple query syntax like `encrypted_col @> search_term`.
 
 An ste_vec index requires one piece of configuration: the `prefix` (a string) which is passed as an info string to a MAC (Message Authenticated Code).
 This ensures that all of the encrypted values are unique to that prefix.
@@ -204,7 +207,7 @@ A query prior to encrypting and indexing looks like a structurally similar subse
 }
 ```
 
-The expression `cs_ste_vec_v2(encrypted_account) @> cs_ste_vec_v2($query)` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]".
+The expression `encrypted_account @> $query` would match all records where the `encrypted_account` column contains a JSONB object with an "account" key containing an object with an "email" key where the value is the string "[email protected]".
 
 When reduced to a prefix list, it would look like this:
 
@@ -224,9 +227,26 @@ When reduced to a prefix list, it would look like this:
 
 Which is then turned into an ste_vec of hashes which can be directly queries against the index.
 
+#### GIN indexing for ste_vec
+
+For efficient containment queries on large tables, you can create a GIN index using the `eql_v2.jsonb_array()` function:
+
+```sql
+-- Create GIN index for containment queries
+CREATE INDEX idx_encrypted_jsonb ON mytable USING GIN (eql_v2.jsonb_array(encrypted_col));
+
+-- Query using containment (will use the GIN index)
+SELECT * FROM mytable WHERE encrypted_col @> $1::eql_v2_encrypted;
+```
+
+The following helper functions are available for GIN-indexed containment queries:
+- `eql_v2.jsonb_array(val)` - Extracts encrypted JSONB as an array for GIN indexing
+- `eql_v2.jsonb_contains(a, b)` - GIN-indexable containment check (`a @> b`)
+- `eql_v2.jsonb_contained_by(a, b)` - GIN-indexable "is contained by" check (`a <@ b`)
+
 ### Modifying an index (`eql_v2.modify_search_config`)
 
-Modifies an existing index configuration.
+Modifies an existing index configuration. Returns the updated configuration as JSONB.
 Accepts the same parameters as `eql_v2.add_search_config`
 
 ```sql
@@ -240,9 +260,22 @@ SELECT eql_v2.modify_search_config(
 );
 ```
 
+**Example:**
+
+```sql
+-- Update match index options to increase bloom filter size
+SELECT eql_v2.modify_search_config(
+  'users',
+  'email',
+  'match',
+  'text',
+  '{"bf": 4096, "k": 8}'::jsonb
+);
+```
+
 ### Removing an index (`eql_v2.remove_search_config`)
 
-Removes an index configuration from the column.
+Removes an index configuration from the column. Returns the updated configuration as JSONB.
 
 ```sql
 SELECT eql_v2.remove_search_config(
@@ -253,6 +286,17 @@ SELECT eql_v2.remove_search_config(
 );
 ```
 
+**Example:**
+
+```sql
+-- Remove the match index from the email column
+SELECT eql_v2.remove_search_config(
+  'users',
+  'email',
+  'match'
+);
+```
+
 ---
 
 ### Didn't find what you wanted?
 
@@ -102,6 +102,24 @@ WHERE jsonb_column @> '{"top":{"nested":["a"]}}';
 
 **Note:** The `@>` operator checks if the left value contains the right value. The `<@` operator checks the reverse (if left is contained in right).
 
+#### Indexed Containment Queries
+
+For better performance on large tables, create a GIN index and use the `jsonb_array()` function:
+
+```sql
+-- Create GIN index
+CREATE INDEX idx_encrypted_jsonb_gin
+ON examples USING GIN (eql_v2.jsonb_array(encrypted_json));
+ANALYZE examples;
+
+-- Query using the GIN index
+SELECT * FROM examples
+WHERE eql_v2.jsonb_array(encrypted_json) @>
+      eql_v2.jsonb_array($1::eql_v2_encrypted);
+```
+
+See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for complete setup instructions.
+
 ### Field extraction (`jsonb_path_query`)
 
 Extract fields from encrypted JSONB using selector hashes. Selectors are generated during encryption and identify specific JSON paths.
@@ -138,10 +156,15 @@ Use standard PostgreSQL JSON operators on encrypted columns:
 -- Extract field by selector (returns eql_v2_encrypted)
 SELECT encrypted_json->'selector_hash' FROM examples;
 
--- Extract field as text (returns ciphertext)
+-- Extract field as text (returns encrypted value as text)
 SELECT encrypted_json->>'selector_hash' FROM examples;
+
+-- Extract array element by index (0-based, returns eql_v2_encrypted)
+SELECT encrypted_array->0 FROM examples;
 ```
 
+**Note:** The `->` operator supports integer array indexing (e.g., `encrypted_array->0`), but the `->>` operator does not. Use `->` to access array elements by index.
+
 ### Array operations
 
 EQL supports array operations on encrypted JSONB arrays:
@@ -200,6 +223,9 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');
 
 ### Core Functions
 
+- **`eql_v2.ste_vec(val jsonb) RETURNS eql_v2_encrypted[]`**
+  - Extracts the ste_vec index array from a JSONB payload
+
 - **`eql_v2.ste_vec(val eql_v2_encrypted) RETURNS eql_v2_encrypted[]`**
   - Extracts the ste_vec index array from an encrypted value
 
@@ -243,6 +269,22 @@ GROUP BY eql_v2.jsonb_path_query_first(encrypted_json, 'color_selector');
 - **`eql_v2.selector(val eql_v2_encrypted) RETURNS text`**
   - Extracts the selector hash from an encrypted value
 
+### GIN-Indexable Functions
+
+These functions enable efficient GIN-indexed containment queries. See [GIN Indexes for JSONB Containment](./database-indexes.md#gin-indexes-for-jsonb-containment) for index setup.
+
+- **`eql_v2.jsonb_array(val eql_v2_encrypted) RETURNS jsonb[]`**
+  - Extracts encrypted JSONB as native PostgreSQL jsonb array for GIN indexing
+  - Create GIN indexes on this function for indexed containment queries
+
+- **`eql_v2.jsonb_contains(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
+  - GIN-indexed containment check: returns true if a contains b
+  - Alternative to `jsonb_array(a) @> jsonb_array(b)`
+
+- **`eql_v2.jsonb_contained_by(a eql_v2_encrypted, b eql_v2_encrypted) RETURNS boolean`**
+  - GIN-indexed reverse containment: returns true if a is contained by b
+  - Alternative to `jsonb_array(a) <@ jsonb_array(b)`
+
 ### Aggregate Functions
 
 - **`eql_v2.grouped_value(jsonb) RETURNS jsonb`**