feat: Lucene query parser with sql and dynamodb support#124

Open
Lutherwaves wants to merge 11 commits into main from dev

@Lutherwaves
Contributor

@Lutherwaves Lutherwaves commented Dec 27, 2025

Implements Apache Lucene query syntax parser by extending https://github.com/grindlemire/go-lucene with custom drivers for SQL (JSONB) and DynamoDB PartiQL.

Supports field:value queries, wildcards, ranges, boolean operators, quoted phrases, fuzzy search, implicit search (text fields only), and JSONB columns. Includes field validation, security limits, and tests.


PostgreSQL

pg_trgm - Required for fuzzy search functionality (similarity() function)

CREATE EXTENSION IF NOT EXISTS pg_trgm;

JSONB support - Built-in for PostgreSQL 9.4+ (no extension needed)

MySQL

JSON functions - Built-in for MySQL 5.7+ (no extension needed)
SOUNDEX() - Built-in function (no extension needed)

SQLite

JSON1 extension - Required for JSON field queries (JSON_EXTRACT())
    Usually compiled by default in SQLite 3.38.0+
    Verify with: SELECT json_valid('{}');
    If not available, recompile SQLite with -DSQLITE_ENABLE_JSON1
Note: Fuzzy search is not supported (no extension available)

DynamoDB

No extensions required (fully managed service)

Example Queries and SQL Translation

Basic Field Search

Query: name:john

PostgreSQL: "name" = $1 (params: ["john"])
MySQL: "name" = ? (params: ["john"])
SQLite: "name" = ? (params: ["john"])
DynamoDB: "name" = ? (params: ["john"])

Wildcard Search (Case-Insensitive)

Query: name:john*

PostgreSQL: "name"::text ILIKE $1 (params: ["john%"])
MySQL: LOWER("name") LIKE LOWER(?) (params: ["john%"])
SQLite: "name" LIKE ? (params: ["john%"])
DynamoDB: begins_with("name", 'john')

Wildcard Contains

Query: name:*john*

PostgreSQL: "name"::text ILIKE $1 (params: ["%john%"])
MySQL: LOWER("name") LIKE LOWER(?) (params: ["%john%"])
SQLite: "name" LIKE ? (params: ["%john%"])
DynamoDB: contains("name", 'john')
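The two wildcard translations above come down to rewriting the Lucene term into a LIKE pattern. A minimal sketch of that step, assuming a hypothetical helper named `likePattern` (not the PR's actual code) and assuming backslash as the LIKE escape character, which is the default in PostgreSQL and MySQL but needs an explicit ESCAPE clause in SQLite:

```go
package main

import (
	"fmt"
	"strings"
)

// likePattern is a hypothetical helper (not taken from the PR). It maps the
// Lucene wildcards * and ? to the SQL LIKE wildcards % and _, and escapes
// literal %, _ and \ so they are not treated as wildcards.
func likePattern(term string) string {
	var b strings.Builder
	for _, r := range term {
		switch r {
		case '*':
			b.WriteByte('%')
		case '?':
			b.WriteByte('_')
		case '%', '_', '\\':
			b.WriteByte('\\')
			b.WriteRune(r)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(likePattern("john*"))         // john%
	fmt.Println(likePattern("*john*"))        // %john%
	fmt.Println(likePattern("*@example.com")) // %@example.com
}
```

The resulting pattern is still passed as a bound parameter, as in the PostgreSQL, MySQL and SQLite rows above.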

Fuzzy Search

Query: name:roam~2

PostgreSQL: similarity("name"::text, $1) > 0.3 (params: ["roam"])
MySQL: SOUNDEX("name") = SOUNDEX(?) (params: ["roam"])
SQLite: ❌ Error: "fuzzy search not supported with SQLite"
DynamoDB: ❌ Not supported
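The fuzzy search rows above differ per provider, so the driver needs a switch somewhere. A rough sketch of what that switch could look like; the function name `fuzzyClause` and the provider strings "mysql" and "sqlite" are assumptions for illustration (only "postgresql" appears in this PR), and the 0.3 threshold is simply copied from the PostgreSQL row:

```go
package main

import (
	"errors"
	"fmt"
)

// fuzzyClause is a hypothetical helper (not the PR's code). column is
// expected to be an already quoted and validated identifier; argIndex is the
// 1-based position of the bound parameter (only used for PostgreSQL).
func fuzzyClause(provider, column string, argIndex int) (string, error) {
	switch provider {
	case "postgresql":
		// Requires the pg_trgm extension for similarity().
		return fmt.Sprintf("similarity(%s::text, $%d) > 0.3", column, argIndex), nil
	case "mysql":
		return fmt.Sprintf("SOUNDEX(%s) = SOUNDEX(?)", column), nil
	case "sqlite":
		return "", errors.New("fuzzy search not supported with SQLite")
	default:
		return "", fmt.Errorf("unknown provider %q", provider)
	}
}

func main() {
	clause, _ := fuzzyClause("postgresql", `"name"`, 1)
	fmt.Println(clause) // similarity("name"::text, $1) > 0.3

	_, err := fuzzyClause("sqlite", `"name"`, 1)
	fmt.Println(err) // fuzzy search not supported with SQLite
}
```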

JSON Field Access

Query: labels.env:prod

PostgreSQL: labels->>'env' = $1 (params: ["prod"])
MySQL: JSON_UNQUOTE(JSON_EXTRACT(labels, '$.env')) = ? (params: ["prod"])
SQLite: JSON_EXTRACT(labels, '$.env') = ? (params: ["prod"])
DynamoDB: labels.env = ? (params: ["prod"])

JSON Field Wildcard

Query: labels.category:backend*

PostgreSQL: labels->>'category' ILIKE $1 (params: ["backend%"])
MySQL: LOWER(JSON_UNQUOTE(JSON_EXTRACT(labels, '$.category'))) LIKE LOWER(?) (params: ["backend%"])
SQLite: JSON_EXTRACT(labels, '$.category') LIKE ? (params: ["backend%"])
DynamoDB: begins_with(labels.category, 'backend')
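The JSON field rows above all start from the same dotted field name and only differ in the extraction expression. A minimal sketch of that mapping, assuming a hypothetical `jsonFieldExpr` helper, hypothetical provider strings other than "postgresql", and a single level of nesting (deeper paths are rejected here for simplicity, which may not match the PR):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// identRe restricts column and key names to plain identifiers, because they
// are interpolated into the SQL text rather than bound as parameters.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// jsonFieldExpr is a hypothetical helper (not the PR's code) that turns
// "column.key" into the provider-specific JSON extraction expression.
func jsonFieldExpr(provider, field string) (string, error) {
	column, key, ok := strings.Cut(field, ".")
	if !ok || !identRe.MatchString(column) || !identRe.MatchString(key) {
		return "", fmt.Errorf("invalid JSON field %q", field)
	}
	switch provider {
	case "postgresql":
		return fmt.Sprintf("%s->>'%s'", column, key), nil
	case "mysql":
		return fmt.Sprintf("JSON_UNQUOTE(JSON_EXTRACT(%s, '$.%s'))", column, key), nil
	case "sqlite":
		return fmt.Sprintf("JSON_EXTRACT(%s, '$.%s')", column, key), nil
	case "dynamodb":
		return column + "." + key, nil
	default:
		return "", fmt.Errorf("unknown provider %q", provider)
	}
}

func main() {
	for _, p := range []string{"postgresql", "mysql", "sqlite", "dynamodb"} {
		expr, _ := jsonFieldExpr(p, "labels.env")
		fmt.Println(p+":", expr)
	}
}
```

Validating the identifiers before interpolation matters because, unlike the value, the column and key cannot be sent as bound parameters.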

Boolean AND

Query: name:john AND status:active

PostgreSQL: ("name" = $1) AND ("status" = $2) (params: ["john", "active"])
MySQL: ("name" = ?) AND ("status" = ?) (params: ["john", "active"])
SQLite: ("name" = ?) AND ("status" = ?) (params: ["john", "active"])
DynamoDB: ("name" = ?) AND ("status" = ?) (params: ["john", "active"])

Boolean OR

Query: status:active OR status:pending

PostgreSQL: ("status" = $1) OR ("status" = $2) (params: ["active", "pending"])
MySQL: ("status" = ?) OR ("status" = ?) (params: ["active", "pending"])
SQLite: ("status" = ?) OR ("status" = ?) (params: ["active", "pending"])
DynamoDB: ("status" = ?) OR ("status" = ?) (params: ["active", "pending"])
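In the AND and OR rows above, the only difference between providers is the placeholder style ($N for PostgreSQL, ? elsewhere). A small sketch of how that could be generated; `placeholder`, `quoteIdent`, `joinEquals` and the `cond` type are hypothetical names for illustration, not the PR's API:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cond is one field = value comparison (hypothetical type).
type cond struct{ Column, Value string }

// placeholder returns $N for PostgreSQL and ? for every other provider.
func placeholder(provider string, n int) string {
	if provider == "postgresql" {
		return "$" + strconv.Itoa(n)
	}
	return "?"
}

// quoteIdent wraps an identifier in double quotes, doubling embedded quotes.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// joinEquals builds (a = p1) OP (b = p2) ... plus the parameter list.
// op is expected to be "AND" or "OR" and must come from the parser, never
// from user input.
func joinEquals(provider, op string, conds []cond) (string, []any) {
	parts := make([]string, 0, len(conds))
	params := make([]any, 0, len(conds))
	for i, c := range conds {
		parts = append(parts, fmt.Sprintf("(%s = %s)", quoteIdent(c.Column), placeholder(provider, i+1)))
		params = append(params, c.Value)
	}
	return strings.Join(parts, " "+op+" "), params
}

func main() {
	conds := []cond{{"name", "john"}, {"status", "active"}}
	sql, params := joinEquals("postgresql", "AND", conds)
	fmt.Println(sql, params) // ("name" = $1) AND ("status" = $2) [john active]
	sql, params = joinEquals("mysql", "AND", conds)
	fmt.Println(sql, params) // ("name" = ?) AND ("status" = ?) [john active]
}
```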

Range Query (Inclusive)

Query: age:[18 TO 65]

PostgreSQL: "age" BETWEEN $1 AND $2 (params: ["18", "65"])
MySQL: "age" BETWEEN ? AND ? (params: ["18", "65"])
SQLite: "age" BETWEEN ? AND ? (params: ["18", "65"])
DynamoDB: "age" BETWEEN ? AND ? (params: ["18", "65"])

Range Query (Open-Ended)

Query: age:[18 TO *]

PostgreSQL: "age" >= $1 (params: ["18"])
MySQL: "age" >= ? (params: ["18"])
SQLite: "age" >= ? (params: ["18"])
DynamoDB: "age" >= ? (params: ["18"])
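The two range forms above can be handled by one small function that inspects the bounds. This is a sketch only: `rangeClause` is a hypothetical name, it emits ? placeholders for brevity, and two behaviours are assumptions not shown in the description, namely that `[* TO 65]` maps to `<=` and that a range with no bounds at all is rejected:

```go
package main

import (
	"errors"
	"fmt"
)

// rangeClause is a hypothetical helper (not the PR's code) for inclusive
// Lucene ranges. column must already be quoted and validated; "*" marks an
// open bound.
func rangeClause(column, lo, hi string) (string, []string, error) {
	switch {
	case lo == "*" && hi == "*":
		// Assumption: a fully open range is treated as invalid.
		return "", nil, errors.New("range needs at least one bound")
	case hi == "*":
		return column + " >= ?", []string{lo}, nil
	case lo == "*":
		// Assumption: mirrors the open-ended upper bound case.
		return column + " <= ?", []string{hi}, nil
	default:
		return column + " BETWEEN ? AND ?", []string{lo, hi}, nil
	}
}

func main() {
	clause, params, _ := rangeClause(`"age"`, "18", "65")
	fmt.Println(clause, params) // "age" BETWEEN ? AND ? [18 65]
	clause, params, _ = rangeClause(`"age"`, "18", "*")
	fmt.Println(clause, params) // "age" >= ? [18]
}
```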

Null Check

Query: parent_id:null

PostgreSQL: "parent_id" IS NULL
MySQL: "parent_id" IS NULL
SQLite: "parent_id" IS NULL
DynamoDB: ❌ Not supported

NOT Operator

Query: NOT status:deleted

PostgreSQL: NOT ("status" = $1) (params: ["deleted"])
MySQL: NOT ("status" = ?) (params: ["deleted"])
SQLite: NOT ("status" = ?) (params: ["deleted"])
DynamoDB: NOT ("status" = ?) (params: ["deleted"])

Complex Query

Query: (name:john* OR email:*@example.com) AND status:active

PostgreSQL: (("name"::text ILIKE $1) OR ("email"::text ILIKE $2)) AND ("status" = $3) (params: ["john%", "%@example.com", "active"])
MySQL: ((LOWER("name") LIKE LOWER(?)) OR (LOWER("email") LIKE LOWER(?))) AND ("status" = ?) (params: ["john%", "%@example.com", "active"])
SQLite: (("name" LIKE ?) OR ("email" LIKE ?)) AND ("status" = ?) (params: ["john%", "%@example.com", "active"])
DynamoDB: ((begins_with("name", 'john')) OR (contains("email", '@example.com'))) AND ("status" = ?) (params: ["active"])
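Note that the DynamoDB rows above inline the literal for begins_with and contains instead of binding it, so the value has to be escaped. A minimal sketch, assuming SQL-style escaping where a single quote inside a string literal is doubled; `partiQLString` and `beginsWith` are hypothetical names and may not match what the DynamoDB driver in this PR does:

```go
package main

import (
	"fmt"
	"strings"
)

// partiQLString renders v as a single-quoted string literal, doubling any
// embedded single quote so the value cannot terminate the literal early.
func partiQLString(v string) string {
	return "'" + strings.ReplaceAll(v, "'", "''") + "'"
}

// beginsWith builds a begins_with condition. attr must already be a quoted,
// validated attribute name; only the prefix value is escaped here.
func beginsWith(attr, prefix string) string {
	return fmt.Sprintf("begins_with(%s, %s)", attr, partiQLString(prefix))
}

func main() {
	fmt.Println(beginsWith(`"name"`, "john"))    // begins_with("name", 'john')
	fmt.Println(beginsWith(`"name"`, "o'brien")) // begins_with("name", 'o''brien')
}
```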

@Lutherwaves Lutherwaves self-assigned this Dec 27, 2025
@Lutherwaves Lutherwaves marked this pull request as ready for review December 27, 2025 13:24
@Lutherwaves Lutherwaves changed the title from "feat: implement Lucene query parser with PostgreSQL and DynamoDB support" to "feat: Lucene query parser with JSONB and PartiQL support" Dec 28, 2025
@Lutherwaves Lutherwaves marked this pull request as draft December 28, 2025 12:04
@Lutherwaves
Contributor Author

Moving to draft to refactor a bit for better readability and consistency

@Lutherwaves Lutherwaves marked this pull request as ready for review December 31, 2025 14:56
Lutherwaves pushed a commit to Lutherwaves/magic that referenced this pull request Dec 31, 2025
…itions

Added 34 search query examples to types/types.go covering all Lucene search capabilities:

- Basic field search (exact match, wildcards)
- Boolean operators (AND, OR, NOT)
- Required/Prohibited operators (+, -)
- Range queries (inclusive, exclusive, open-ended, date ranges)
- Quoted phrases and special characters
- Complex nested queries
- Implicit search across all string fields
- JSONB/nested field access with dot notation
- Null value queries
- Fuzzy search with edit distance

These examples can be referenced in Swagger annotations using:
  $ref: "#/components/examples/SearchQueryBasic"
  $ref: "#/components/examples/SearchQueryWildcard"
etc.

This allows API consumers to easily understand and use the Lucene query syntax
for filtering and searching resources, similar to how PatchBody provides examples
for PATCH operations.

Related to PR tink3rlabs#124 which added Lucene search support.
Lutherwaves pushed a commit to Lutherwaves/magic that referenced this pull request Jan 1, 2026
…efinitions

Added SearchQuery schema to types/types.go with 34 comprehensive examples
covering all Lucene search capabilities:

Schema Structure:
- Type: string
- Description: Full Lucene query syntax reference
- Default example: "name:john AND status:active"
- 34 example queries covering:
  * Basic field searches and wildcards
  * Boolean operators (AND, OR, NOT, +, -)
  * Range queries (inclusive, exclusive, open-ended, dates)
  * Quoted phrases and escaped characters
  * Complex nested queries
  * Implicit search across string fields
  * JSONB/nested field access (field.subfield:value)
  * Null value queries (field:null)
  * Fuzzy search (term~, term~2)

Usage in Swagger/OpenAPI annotations:
  schema:
    $ref: "#/components/schemas/SearchQuery"

Or in Go swaggo annotations:
  // @param query query string false "Search query" SchemaExample(SearchQuery)

This provides a reusable schema definition similar to PatchBody, making it
easy for API consumers to understand and use Lucene query syntax for filtering
and searching resources.

Related to PR tink3rlabs#124 which added Lucene search support.
@Lutherwaves Lutherwaves changed the title from "feat: Lucene query parser with JSONB and PartiQL support" to "feat: Lucene query parser with sql and dynamodb support" Jan 1, 2026
@Lutherwaves
Contributor Author

@deanefrati @ayashjorden if either of you has some time to review: I also tested this on a service using PostgreSQL and everything works as expected. See the description for full examples.

Contributor

@ayashjorden ayashjorden left a comment

LGTM, Please add unit-tests

@Lutherwaves
Contributor Author

One additional thing I will be adding is the ability to instantiate the parser with some fields excluded from implicit search, for multi-tenancy.
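To illustrate the idea, here is a rough sketch of how implicit search candidates could be collected: string fields only, with an exclusion list. Everything here is hypothetical (the `Item` model, the `implicitFields` function, and the use of JSON tags as field names); it is not the parser's actual API:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Item is a hypothetical model used only for this example.
type Item struct {
	ID       int    `json:"id"`
	Name     string `json:"name"`
	Email    string `json:"email"`
	TenantID string `json:"tenant_id"`
}

// implicitFields returns the JSON names of the exported string fields of
// model, skipping any name listed in excluded.
func implicitFields(model any, excluded ...string) []string {
	skip := make(map[string]bool, len(excluded))
	for _, e := range excluded {
		skip[e] = true
	}
	t := reflect.TypeOf(model)
	if t != nil && t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return nil
	}
	var out []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() || f.Type.Kind() != reflect.String {
			continue
		}
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		if !skip[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	fmt.Println(implicitFields(Item{}, "tenant_id")) // [name email]
}
```

With an exclusion like this, a bare term such as `john` would never be matched against the tenant column, so a user could not probe other tenants' identifiers through implicit search.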

@Lutherwaves
Contributor Author

> LGTM, Please add unit-tests

@ayashjorden Ready

},
},
{
name: "duplicate field names (last wins)",
Contributor

Haven't read the upstream docs, just make sure that we document this, both in the func comments and in the Magic README.md (or the appropriate sub-README).

Contributor Author

Good catch. I think it would be better to error out if there are conflicting types, making the query invalid.

Contributor

@ayashjorden ayashjorden left a comment

@Lutherwaves, this is a mammoth of a PR. I've added stylistic comments suggesting simplifications or brevity.
I'm not sure how to go about approving it. @deanefrati, as a production user, do you have an idea?

@ayashjorden ayashjorden dismissed their stale review January 19, 2026 01:04

Updates were made, tests were added, I'm ok with the PR, but Dean needs to also take a look as he's using it in several places.

@Lutherwaves
Contributor Author

@ayashjorden thanks for the review, I will look into the suggestions. WRT the size - splitting this PR just to make it smaller, while I agree with the intention behind it, seems overkill in this case. This is a new, backwards compatible extension of the previous feature and the majority of lines come from the unit tests.

@ayashjorden
Contributor

ayashjorden commented Jan 19, 2026 via email

@Lutherwaves
Contributor Author

It's literally an eye strain to review, I really appreciate you taking the time.

@Lutherwaves
Contributor Author

@ayashjorden to the best of my prompt & review skills, I addressed the review.

@Lutherwaves Lutherwaves force-pushed the dev branch 2 times, most recently from dfcf68b to 66595d3 Compare February 10, 2026 19:27
@deanefrati
Contributor

That's indeed a behemoth of a PR :) would you be able to add some usage instructions for developers for how to use this new functionality? I would like to test it in one of my services to make sure it works in real life...

@Lutherwaves
Contributor Author

> That's indeed a behemoth of a PR :) would you be able to add some usage instructions for developers for how to use this new functionality? I would like to test it in one of my services to make sure it works in real life...

Basically, the StorageAdapter.Search method now accepts proper Lucene queries. I have tested it extensively on PostgreSQL and somewhat on SQLite, but not on DynamoDB. Check the PR description for all examples. Happy to prompt up some docs before we merge if you think this would add value.

Implements Apache Lucene query syntax parser using go-lucene library
with custom drivers for PostgreSQL (JSONB) and DynamoDB PartiQL.

Supports field:value queries, wildcards, ranges, boolean operators,
quoted phrases, fuzzy search, implicit search expansion, and JSONB
field notation. Includes field validation, security limits, and
comprehensive test coverage.
- Remove driver storage redundancy
- Simpler and more configurable parser initi
- Make security limits and tag names configurable
- Clean up unused code
   - NewParser(model) replaces NewParserFromType(model)
   - Implicit search restricted to string fields only
   - Removed complex tag configuration
   - Split driver.go into postgres_driver.go and dynamodb_driver.go
   - NewPostgresDriver() and NewDynamoDBDriver() constructors
   - Checks for JSONB/JSON in type name, maps, and structs
   - Parse-time validation (HTTP 400) not runtime (HTTP 500)
…ySQL, and SQLite

- Renamed postgres_driver.go to sql_driver.go for generic SQL support
- Refactored PostgresJSONBDriver to SQLDriver with provider field
- Added provider-specific switch statements for:
  * Case-insensitive LIKE (ILIKE vs LOWER() vs LIKE)
  * Fuzzy search (similarity() vs SOUNDEX() vs error)
  * JSON field extraction (JSONB ->> vs JSON_EXTRACT)
  * Parameter placeholders ($N vs ?)
- Updated ParseToSQL() to accept provider string parameter
- Updated SQLAdapter.Search() to pass provider to parser
- Updated all tests to include "postgresql" provider parameter
- PostgreSQL: ILIKE, ::text casting, similarity(), JSONB ->> operators, $N placeholders
- MySQL: LOWER() + LIKE, JSON_UNQUOTE(JSON_EXTRACT()), SOUNDEX(), ? placeholders
- SQLite: LIKE (case-insensitive), JSON_EXTRACT(), no fuzzy search, ? placeholders
…efinitions

Schema Structure:
- Type: string
- Description: Full Lucene query syntax reference
- Default example: "name:john AND status:active"
- 34 example queries covering:
  * Basic field searches and wildcards
  * Boolean operators (AND, OR, NOT, +, -)
  * Range queries (inclusive, exclusive, open-ended, dates)
  * Quoted phrases and escaped characters
  * Complex nested queries
  * Implicit search across string fields
  * JSONB/nested field access (field.subfield:value)
  * Null value queries (field:null)
  * Fuzzy search (term~, term~2)
- Escape PartiQL values in DynamoDB driver
- Add lucene query openapi
- Simplify tests
…holder handling

* Use JSON struct tags instead of Go field names for cursor value extraction
* Remove PostgreSQL  pre-conversion that conflicted with GORM's placeholders