feat: support random sample #39532

Open
wants to merge 12 commits into
base: master
from


SpadeA-Tang
Contributor

@SpadeA-Tang SpadeA-Tang commented Jan 23, 2025

issue: #39541

This PR implements random sampling in filter expressions. The syntax is:

filter="random_sample(factor)"
or 
filter="random_sample(factor, boolean_expression)"

where factor is a float in the open interval (0, 1) and boolean_expression is an ordinary filter
expression such as "1 <= number < 10" or 'color in ["red", "blue"]'.
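To make the two forms concrete, here is a minimal Python sketch that composes such a filter string. The helper random_sample_filter is hypothetical (it is not part of pymilvus or of this PR); it only checks that factor lies strictly between 0 and 1 and formats the string in the syntax described above.

```python
import math
from typing import Optional


def random_sample_filter(factor: float, expr: Optional[str] = None) -> str:
    """Build a filter string in the random_sample syntax.

    Hypothetical helper for illustration only; not a pymilvus API.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(factor, bool) or not isinstance(factor, (int, float)):
        raise ValueError(f"factor must be a float in (0, 1), got {factor!r}")
    # factor must lie in the open interval (0, 1).
    if not (math.isfinite(factor) and 0 < factor < 1):
        raise ValueError(f"factor must be a float in (0, 1), got {factor!r}")
    # Form 1: sample from all rows.
    if expr is None or not expr.strip():
        return f"random_sample({factor})"
    # Form 2: sample from the rows matching the boolean expression.
    return f"random_sample({factor}, {expr})"


print(random_sample_filter(0.1))
print(random_sample_filter(0.5, "1 <= number < 10"))
```

The resulting string would then be passed as the filter argument of a search or query call (for example MilvusClient.query(collection_name=..., filter=...)), assuming a server build that includes this PR.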

Some simple benchmarks, compared with the same queries without sampling.
Test environment:
Ubuntu 22, standalone cluster, Intel Core i9-9900K CPU, 10,000,000 rows with the following schema:

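from pymilvus import DataType

# milvus_client (a MilvusClient connected to the standalone cluster), analyzer_params
# and EMBEDDING_DIM are defined elsewhere in the benchmark script.
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
# max_length only applies to VARCHAR fields, so the INT64 fields omit it.
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("number", DataType.INT64)
schema.add_field("content", DataType.VARCHAR, max_length=10000, enable_analyzer=True,
                 analyzer_params=analyzer_params, enable_match=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM)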

[Image: benchmark results]

The number field is uniformly distributed over [0, 10).
The content field matches the text query with a 1/3 hit rate, i.e. about one row in three matches.
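To illustrate what a query such as random_sample(0.01, 1 <= number < 10) selects on data shaped like the number field above, the sketch below simulates one plausible reading of the semantics: apply the boolean expression first, then keep a uniformly random fraction factor of the matching rows. This is an assumption for illustration only. The PR's actual C++ RandomSampleNode may use a different strategy (for example, per-row Bernoulli sampling instead of an exact count), and filter_then_sample is a hypothetical name.

```python
import random
from typing import Callable, List, Sequence


def filter_then_sample(values: Sequence[int], predicate: Callable[[int], bool],
                       factor: float, seed: int = 0) -> List[int]:
    """Return the row indices kept by a filter followed by random sampling.

    Assumed semantics (hypothetical): keep exactly int(factor * matched)
    rows, chosen uniformly without replacement from the matching rows.
    """
    # Step 1: evaluate the boolean expression on every row.
    matched = [i for i, v in enumerate(values) if predicate(v)]
    # Step 2: decide how many matching rows to keep.
    k = int(factor * len(matched))
    # Step 3: draw k distinct matching rows uniformly at random.
    rng = random.Random(seed)
    return sorted(rng.sample(matched, k))


# Synthetic data resembling the benchmark: integer `number` uniform over 0..9.
data_rng = random.Random(42)
numbers = [data_rng.randrange(10) for _ in range(100_000)]

picked = filter_then_sample(numbers, lambda n: 1 <= n < 10, factor=0.01)
print(len(picked))  # expected to be close to 0.9 * 100_000 * 0.01 = 900
```

Under these assumptions, the selectivity of random_sample(factor, expr) is roughly the selectivity of expr multiplied by factor, so on the 10,000,000-row dataset this query would keep on the order of 90,000 rows.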

…for expr in proxy module, 2. polish code and make sure of its correctness, 3. add tests

Signed-off-by: SpadeA-Tang <[email protected]>
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SpadeA-Tang
To complete the pull request process, please assign wxyucs after the PR has been reviewed.
You can assign the PR to them by writing /assign @wxyucs in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@SpadeA-Tang SpadeA-Tang marked this pull request as draft January 23, 2025 02:11
@sre-ci-robot sre-ci-robot added the do-not-merge/work-in-progress (don't merge even if CI passes), size/XXL (denotes a PR that changes 1000+ lines), and area/compilation labels Jan 23, 2025
@mergify mergify bot added the dco-passed (DCO check passed) and kind/feature (issues related to feature request from users) labels Jan 23, 2025
Contributor

mergify bot commented Jan 23, 2025

@SpadeA-Tang Please associate the related issue to the body of your Pull Request. (eg. “issue: #”)

Signed-off-by: SpadeA-Tang <[email protected]>

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 77.94118% with 30 lines in your changes missing coverage. Please review.

Project coverage is 69.61%. Comparing base (1a1ed07) to head (23de207).
Report is 36 commits behind head on master.

Files with missing lines Patch % Lines
internal/core/src/query/PlanProto.cpp 50.00% 10 Missing ⚠️
...ternal/core/src/exec/operator/RandomSampleNode.cpp 86.79% 7 Missing ⚠️
internal/core/src/exec/operator/RandomSampleNode.h 40.00% 6 Missing ⚠️
internal/core/src/plan/PlanNode.h 60.00% 4 Missing ⚠️
internal/core/src/query/SearchBruteForce.cpp 0.00% 1 Missing ⚠️
internal/core/src/segcore/Utils.cpp 0.00% 1 Missing ⚠️
internal/core/src/storage/Util.cpp 50.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (77.94%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (69.61%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (1a1ed07) and HEAD (23de207).

HEAD has 2 fewer uploads than BASE: BASE (1a1ed07) has 3 uploads and HEAD (23de207) has 1.
Additional details and impacted files


@@             Coverage Diff             @@
##           master   #39532       +/-   ##
===========================================
- Coverage   81.00%   69.61%   -11.40%     
===========================================
  Files        1407      302     -1105     
  Lines      198705    27103   -171602     
===========================================
- Hits       160970    18868   -142102     
+ Misses      32074     8235    -23839     
+ Partials     5661        0     -5661     
Components Coverage Δ
Client ∅ <ø> (∅)
Core 69.61% <77.94%> (+0.03%) ⬆️
Go ∅ <ø> (∅)
Files with missing lines Coverage Δ
internal/core/src/exec/Driver.cpp 81.81% <100.00%> (+0.42%) ⬆️
internal/core/src/exec/operator/FilterBitsNode.cpp 98.82% <100.00%> (+5.96%) ⬆️
internal/core/src/exec/operator/FilterBitsNode.h 66.66% <ø> (+16.66%) ⬆️
internal/core/src/segcore/ConcurrentVector.h 95.97% <ø> (ø)
internal/core/src/query/SearchBruteForce.cpp 79.60% <0.00%> (ø)
internal/core/src/segcore/Utils.cpp 71.85% <0.00%> (ø)
internal/core/src/storage/Util.cpp 75.14% <50.00%> (ø)
internal/core/src/plan/PlanNode.h 52.89% <60.00%> (+0.64%) ⬆️
internal/core/src/exec/operator/RandomSampleNode.h 40.00% <40.00%> (ø)
...ternal/core/src/exec/operator/RandomSampleNode.cpp 86.79% <86.79%> (ø)
... and 1 more

... and 1109 files with indirect coverage changes

Signed-off-by: SpadeA-Tang <[email protected]>
@SpadeA-Tang SpadeA-Tang marked this pull request as ready for review January 23, 2025 06:21
@sre-ci-robot sre-ci-robot removed the do-not-merge/work-in-progress label Jan 23, 2025
Contributor

mergify bot commented Jan 23, 2025

@SpadeA-Tang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Jan 23, 2025

@SpadeA-Tang cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Jan 23, 2025

@SpadeA-Tang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Signed-off-by: SpadeA-Tang <[email protected]>
Contributor

mergify bot commented Jan 24, 2025

@SpadeA-Tang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Jan 24, 2025

@SpadeA-Tang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@SpadeA-Tang
Contributor Author

rerun go-sdk

@SpadeA-Tang
Contributor Author

/run-cpu-e2e

Contributor

mergify bot commented Jan 24, 2025

@SpadeA-Tang go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Jan 24, 2025

@SpadeA-Tang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@SpadeA-Tang
Contributor Author

/run-cpu-e2e

@SpadeA-Tang
Contributor Author

rerun go-sdk

Contributor

mergify bot commented Jan 24, 2025

@SpadeA-Tang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@SpadeA-Tang
Contributor Author

/run-cpu-e2e

Copy link
Contributor

mergify bot commented Feb 5, 2025

@SpadeA-Tang E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@SpadeA-Tang
Contributor Author

/run-cpu-e2e

Labels
area/compilation, dco-passed, kind/feature, size/XXL
Projects
None yet

2 participants