
@airborne12
Member

@airborne12 airborne12 commented Oct 24, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

  • Extend search() to accept 1–3 arguments:

    1. query: the original search DSL string (unchanged).
    2. default_field (optional): field name applied to unqualified terms in query.
    3. default_operator (optional): boolean operator for multi-term queries; accepts "and" or "or" (case-insensitive).
       • If omitted or empty, the operator defaults to "or".
  • Parser updates:

    • When default_field is provided, unqualified terms in query are automatically rewritten to default_field:term.
    • default_operator is validated and normalized; invalid values produce a clear error.

Example

Before (must qualify every term or rely on engine defaulting rules):

SELECT *
FROM t
WHERE search('title:payment AND title:timeout');

After (use default_field = "title" and default_operator = "and"):

SELECT *
FROM t
WHERE search('payment timeout', 'title', 'and');

This evaluates as if the query were title:payment AND title:timeout.
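To make the rewrite concrete, here is a minimal standalone Java sketch of the expansion idea. It is not the SearchDslParser code from this PR. The class and method names (DefaultFieldExpansionSketch, expand) are hypothetical, and the sketch assumes the query is a sequence of whitespace-separated terms, each either bare or already field-qualified. It ignores explicit AND/OR/NOT, quoted phrases and parentheses, all of which the real parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DefaultFieldExpansionSketch {

    /**
     * Hypothetical helper: rewrites bare terms to field:term and joins all terms
     * with the default operator. Explicit operators, quotes and parentheses are
     * NOT handled in this simplified sketch.
     */
    static String expand(String query, String defaultField, String defaultOperator) {
        String op = (defaultOperator == null || defaultOperator.trim().isEmpty())
                ? "OR" // omitted or empty operator falls back to OR
                : defaultOperator.trim().toUpperCase(Locale.ROOT);
        if (!op.equals("AND") && !op.equals("OR")) {
            throw new IllegalArgumentException("Invalid default operator: " + defaultOperator
                    + ". Must be 'and' or 'or'");
        }
        if (defaultField == null || defaultField.trim().isEmpty() || query.trim().isEmpty()) {
            return query; // nothing to expand; the query is passed through unchanged
        }
        String field = defaultField.trim();
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            // A term that already names a field (e.g. content:x) is kept as-is.
            clauses.add(term.contains(":") ? term : field + ":" + term);
        }
        return String.join(" " + op + " ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("payment timeout", "title", "and")); // title:payment AND title:timeout
        System.out.println(expand("payment timeout", "title", null));  // title:payment OR title:timeout
    }
}
```

With the operator omitted, the same call yields title:payment OR title:timeout, which matches the documented default of "or".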

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Oct 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

@airborne12 airborne12 requested a review from Copilot October 24, 2025 12:33

Copilot AI left a comment


Pull Request Overview

This PR adds support for default field and default operator parameters to the search DSL functionality, enabling simplified query syntax where field names and boolean operators can be implied rather than explicitly specified.

Key Changes:

  • Extended the search() function to accept 1-3 parameters (previously only 1)
  • Added DSL expansion logic to automatically prefix field names when a default field is provided
  • Implemented default operator support ("and"/"or") for multi-term queries

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
Search.java	Updated function signatures to support optional default_field and default_operator parameters; added getter methods for new parameters
SearchDslParser.java	Implemented DSL expansion logic including field prefix injection, operator normalization, and term tokenization
SearchDslParserTest.java	Added comprehensive test coverage for default field/operator scenarios including edge cases and validation


Comment on lines +137 to +147
private static String normalizeDefaultOperator(String operator) {
    if (operator == null || operator.trim().isEmpty()) {
        return "or"; // Default to OR
    }
    String normalized = operator.trim().toLowerCase();
    if ("and".equals(normalized) || "or".equals(normalized)) {
        return normalized;
    }
    throw new IllegalArgumentException("Invalid default operator: " + operator
            + ". Must be 'and' or 'or'");
}

Copilot AI Oct 24, 2025


[nitpick] The null/empty check and trimming logic is duplicated. Consider extracting the trimming and null handling: String trimmed = (operator == null) ? "" : operator.trim(); if (trimmed.isEmpty()) return "or"; This reduces redundancy and makes the intent clearer.
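Applied to the normalizeDefaultOperator method quoted above, the suggestion might look like the following sketch. Only the null/empty handling changes. The wrapper class DefaultOperatorSketch and the use of Locale.ROOT (for locale-independent lower-casing) are editorial assumptions, not part of the PR.

```java
import java.util.Locale;

public class DefaultOperatorSketch {

    /** Returns "and" or "or"; null, empty, or whitespace-only input falls back to "or". */
    static String normalizeDefaultOperator(String operator) {
        // Trim once, treating null the same as an empty string.
        String trimmed = (operator == null) ? "" : operator.trim();
        if (trimmed.isEmpty()) {
            return "or";
        }
        // Locale.ROOT keeps lower-casing independent of the JVM default locale.
        String normalized = trimmed.toLowerCase(Locale.ROOT);
        if ("and".equals(normalized) || "or".equals(normalized)) {
            return normalized;
        }
        throw new IllegalArgumentException("Invalid default operator: " + operator
                + ". Must be 'and' or 'or'");
    }

    public static void main(String[] args) {
        System.out.println(normalizeDefaultOperator(" AND ")); // and
        System.out.println(normalizeDefaultOperator(null));    // or
    }
}
```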

Comment on lines +269 to +270
if (upperRemaining.startsWith("AND ") || upperRemaining.startsWith("AND\t")
        || (upperRemaining.equals("AND") && i + 3 >= dsl.length())) {

Copilot AI Oct 24, 2025


[nitpick] The operator matching logic for AND, OR, and NOT (lines 269-317) contains duplicated patterns. Consider extracting a helper method matchOperator(String remaining, String operator, int length) to reduce code duplication and improve maintainability.
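A minimal sketch of the suggested helper follows. It assumes upperRemaining is the upper-cased suffix of the DSL starting at the current position i, as the quoted condition suggests. The name matchOperator comes from the review comment, but this two-argument version is hypothetical: it drops the length parameter because that can be derived from operator.length(). Under the same assumption, checking that the remaining text equals the keyword already covers the quoted i + 3 >= dsl.length() end-of-input case.

```java
public class OperatorMatchSketch {

    /**
     * Hypothetical helper: true if upperRemaining begins with the keyword operator
     * followed by a space, a tab, or the end of input.
     */
    static boolean matchOperator(String upperRemaining, String operator) {
        if (!upperRemaining.startsWith(operator)) {
            return false;
        }
        if (upperRemaining.length() == operator.length()) {
            return true; // keyword is the last token of the DSL
        }
        char next = upperRemaining.charAt(operator.length());
        return next == ' ' || next == '\t';
    }

    public static void main(String[] args) {
        // The same helper serves AND, OR and NOT, replacing three copies of the pattern.
        for (String op : new String[] {"AND", "OR", "NOT"}) {
            System.out.println(op + " -> " + matchOperator(op + " FOO", op));
        }
        System.out.println(matchOperator("ANDROID", "AND")); // false: keyword prefix of a longer word
    }
}
```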

Comment on lines +242 to +244
return upper.matches(".*\\s+(AND|OR)\\s+.*")
        || upper.matches("^NOT\\s+.*")
        || upper.matches(".*\\s+NOT\\s+.*");

Copilot AI Oct 24, 2025


Multiple regex matches on the same string can be inefficient. Consider using a single compiled Pattern or combining these patterns into one regex: Pattern.matches("^NOT\\s+.*|.*\\s+(AND|OR|NOT)\\s+.*", upper) to improve performance for frequently called code.
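One way to act on this comment is to hoist the combined regex into a static final Pattern that is compiled once, as sketched below. Pattern.matches(regex, input), like String.matches, still compiles the regex on every call, so the precompiled constant is what actually removes the repeated compilation cost. The class and method names are hypothetical. The pattern is the combined one from the comment, and it accepts the same inputs as the three separate String.matches calls quoted above.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class ExplicitOperatorCheck {

    // Compiled once per class load instead of once per call.
    private static final Pattern EXPLICIT_OPERATOR =
            Pattern.compile("^NOT\\s+.*|.*\\s+(AND|OR|NOT)\\s+.*");

    /** Hypothetical helper: true if the DSL uses AND/OR/NOT as a standalone keyword. */
    static boolean containsExplicitOperators(String dsl) {
        // Locale.ROOT keeps upper-casing independent of the JVM default locale.
        String upper = dsl.toUpperCase(Locale.ROOT);
        return EXPLICIT_OPERATOR.matcher(upper).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsExplicitOperators("payment AND timeout")); // true
        System.out.println(containsExplicitOperators("payment timeout"));     // false
    }
}
```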

// for incomplete DSL like "foo" (no field specified)
String dsl = "foo";

// This should throw an exception because "foo" alone is not valid DSL

Copilot AI Oct 24, 2025


The comment is misleading. The test expects an exception not because "foo" is invalid DSL in general, but because with an empty default field, the expansion fails. Clarify that the exception occurs due to the empty default field preventing proper DSL expansion.

Suggested change
// This should throw an exception because "foo" alone is not valid DSL
// This should throw an exception because with an empty default field, the parser cannot expand "foo" into a valid fielded query

@doris-robot

TPC-DS: Total hot run time: 189640 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b1bb6da43bc3aebbb99ed2b2d8a54b4e517acfc9, data reload: false

query	cold_run_ms	hot_run_1_ms	hot_run_2_ms	best_hot_run_ms
query1	1060	438	420	420
query2	6577	1698	1677	1677
query3	6758	226	220	220
query4	26225	24034	23395	23395
query5	5781	639	471	471
query6	359	248	241	241
query7	4662	499	298	298
query8	324	286	280	280
query9	8736	2569	2603	2569
query10	550	362	297	297
query11	15235	14990	14948	14948
query12	185	125	124	124
query13	1759	590	457	457
query14	11906	9329	9290	9290
query15	254	186	182	182
query16	7826	668	488	488
query17	1598	760	628	628
query18	2041	502	355	355
query19	284	239	182	182
query20	151	133	130	130
query21	219	139	123	123
query22	4579	4555	4501	4501
query23	34904	33874	33823	33823
query24	8554	2507	2555	2507
query25	635	551	445	445
query26	1224	278	162	162
query27	3049	526	390	390
query28	4390	2199	2189	2189
query29	806	670	521	521
query30	349	240	207	207
query31	967	1036	760	760
query32	84	73	66	66
query33	588	389	337	337
query34	819	870	538	538
query35	813	929	797	797
query36	1007	1045	922	922
query37	130	105	86	86
query38	3543	3592	3519	3519
query39	1460	1402	1413	1402
query40	222	126	116	116
query41	59	56	57	56
query42	125	108	109	108
query43	492	510	453	453
query44	1206	730	729	729
query45	183	180	171	171
query46	872	985	643	643
query47	1786	1802	1732	1732
query48	399	427	316	316
query49	781	509	415	415
query50	648	693	415	415
query51	3853	3944	3861	3861
query52	109	101	97	97
query53	229	264	190	190
query54	609	592	521	521
query55	91	87	82	82
query56	338	314	332	314
query57	1164	1197	1112	1112
query58	286	291	274	274
query59	2536	2636	2505	2505
query60	350	351	334	334
query61	163	163	155	155
query62	781	741	667	667
query63	230	192	194	192
query64	4550	1171	861	861
query65	4049	3950	3949	3949
query66	1039	431	328	328
query67	15364	15260	14852	14852
query68	8309	924	590	590
query69	545	324	301	301
query70	1394	1262	1266	1262
query71	517	343	338	338
query72	5744	5015	4849	4849
query73	676	658	360	360
query74	8862	9081	8677	8677
query75	4153	3409	2894	2894
query76	3772	1153	736	736
query77	819	395	316	316
query78	9472	9572	8861	8861
query79	2015	837	591	591
query80	648	559	498	498
query81	491	268	227	227
query82	460	163	136	136
query83	272	281	244	244
query84	258	113	98	98
query85	876	476	429	429
query86	335	292	305	292
query87	3692	3723	3681	3681
query88	3657	2230	2220	2220
query89	395	327	290	290
query90	2046	215	218	215
query91	182	173	135	135
query92	88	73	65	65
query93	1480	984	641	641
query94	680	457	325	325
query95	399	325	317	317
query96	504	577	277	277
query97	2916	2995	2855	2855
query98	257	210	211	210
query99	1380	1418	1315	1315
Total cold run time: 280199 ms
Total hot run time: 189640 ms

@doris-robot

ClickBench: Total hot run time: 27.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b1bb6da43bc3aebbb99ed2b2d8a54b4e517acfc9, data reload: false

query	cold_run_s	hot_run_1_s	hot_run_2_s
query1	0.05	0.06	0.05
query2	0.09	0.06	0.05
query3	0.26	0.09	0.09
query4	1.61	0.12	0.13
query5	0.28	0.26	0.25
query6	1.21	0.65	0.64
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.64	0.52	0.52
query10	0.60	0.58	0.57
query11	0.16	0.11	0.11
query12	0.14	0.12	0.12
query13	0.63	0.61	0.60
query14	1.01	1.03	1.01
query15	0.85	0.83	0.87
query16	0.41	0.39	0.41
query17	1.03	1.02	1.02
query18	0.22	0.20	0.21
query19	1.90	1.83	1.79
query20	0.02	0.01	0.01
query21	15.50	0.20	0.13
query22	5.05	0.08	0.05
query23	15.66	0.27	0.10
query24	2.87	0.58	1.02
query25	0.08	0.06	0.06
query26	0.14	0.13	0.14
query27	0.06	0.04	0.05
query28	5.15	1.13	0.95
query29	12.63	3.94	3.27
query30	0.28	0.13	0.11
query31	2.82	0.59	0.37
query32	3.23	0.55	0.47
query33	3.01	3.09	3.01
query34	15.79	5.15	4.58
query35	4.58	4.58	4.55
query36	0.66	0.51	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.14	0.14
query41	0.10	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 99.29 s
Total hot run time: 27.7 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 5.81% (9/155) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 25, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@zzzxl1993 zzzxl1993 left a comment


LGTM

@airborne12 airborne12 merged commit ef415d2 into apache:master Oct 25, 2025
31 of 33 checks passed
@airborne12 airborne12 deleted the feature-variant branch October 25, 2025 11:47
github-actions bot pushed a commit that referenced this pull request Oct 25, 2025
yiguolei pushed a commit that referenced this pull request Oct 26, 2025

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.1-merged reviewed

6 participants