
[Bug]: Normal compaction doesn't start after import #39633

Open
gland1 opened this issue Feb 4, 2025 · 8 comments · May be fixed by #39650
Assignees
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments


gland1 commented Feb 4, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.3 (or latest)
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus v2.4
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory: 128G
- GPU: none
- Others:

Current Behavior

I'm importing a 50M-vector dataset with vector length 768.
The stats task is disabled and no index is created.
I expected compaction to start right after the import, but it doesn't.
With debug logging enabled, compaction_trigger prints:
datacoord/compaction_trigger.go:316] ["the length of SegmentsChanPart is 0, skip to handle compaction\

I suspect it has to do with the new IsInvisible flag

Expected Behavior

I expected compaction to start and reduce the number of segments, since I configured a segment size of 38400.

Steps To Reproduce

1. Use the environment described above (although I believe it should happen with any 2.5 setup).
2. Set maxSegmentSize and maxDiskSegmentSize to 38400.
3. Upload a dataset and then import it.
4. 35 segments are flushed and compaction doesn't occur.
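For reference, a sketch of the segment-size override from step 2. The exact key paths (`dataCoord.segment.maxSize` / `dataCoord.segment.diskSegmentMaxSize`) are my assumption about where `maxSegmentSize` / `maxDiskSegmentSize` live in milvus.yaml:

```yaml
# milvus.yaml override -- key paths are an assumption, values in MB
dataCoord:
  segment:
    maxSize: 38400            # assumed key for maxSegmentSize
    diskSegmentMaxSize: 38400 # assumed key for maxDiskSegmentSize (disk-index segments)
```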

Milvus Log

No response

Anything else?

No response

@gland1 gland1 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 4, 2025
@xiaofan-luan (Collaborator)

  1. Why disable stats? The stats task shouldn't be disabled.
  2. Why set the segment size that large? We recommend a segment size of no more than 8 GB.
  3. No compaction will be triggered if the segment index is not done.
  4. We need more logs for details.

Also, 128 GB of memory may not be enough for hosting 50M 768-dim data.
If you need more help, please contact us at [email protected]
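As a quick sanity check on the memory point (my arithmetic, not from the thread): 50M float32 vectors of dimension 768 already exceed 128 GB before any index or runtime overhead.

```python
# Back-of-the-envelope estimate of raw vector memory (float32, no index overhead)
num_vectors = 50_000_000
dim = 768
bytes_per_float32 = 4

raw_bytes = num_vectors * dim * bytes_per_float32
raw_gib = raw_bytes / 1024**3
print(f"{raw_gib:.1f} GiB")  # ~143.1 GiB, above the 128G host memory
```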

@yanliang567 (Contributor)

/assign @gland1

@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 5, 2025
@gland1 (Author)

gland1 commented Feb 5, 2025

> 1. Why disable stats? The stats task shouldn't be disabled.
> 2. Why set the segment size that large? We recommend a segment size of no more than 8 GB.
> 3. No compaction will be triggered if the segment index is not done.
> 4. We need more logs for details.
>
> Also, 128 GB of memory may not be enough for hosting 50M 768-dim data. If you need more help, please contact us at [email protected]

  1. We disable stats because importing a large dataset takes much longer with it enabled. We do all searches by vectors, so we don't see why we need stats.
  2. We work with large datasets (the largest is 1B). With 8 GB segments using DiskANN we get under 100 QPS, so we need large segments.
  3. I thought compaction was about segments, not indices. Why wait for indexing on small segments, only to destroy them and create final indices on the larger segments?
  4. I'll attach logs.

We want to import a dataset, compact it to the desired number of segments, and only then create an index.
This used to work in 2.4.

@gland1 (Author)

gland1 commented Feb 5, 2025

0.log.gz
Attaching datacoord logs

@yanliang567 (Contributor)

/assign @czs007
Please help explain why Milvus does not compact a segment whose index has not been built.

@xiaofan-luan (Collaborator)

> (quoting @gland1's reply above)

  1. A large DiskANN index build is fairly slow; it can take even more than an hour.
  2. Compaction needs to wait until the index build is done; this is the current rule. The reason is to avoid a cascade of compactions leaving no indexed segment that can be loaded, which would cause slow search.

@xiaofan-luan (Collaborator)

You can set dataCoord.compaction.indexBasedCompaction to false if you only do a one-time ingestion.
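A sketch of where that flag would go in milvus.yaml; the nesting is inferred from the dotted key path, and the default value is my assumption:

```yaml
# milvus.yaml override -- structure assumed from the dotted key path
dataCoord:
  compaction:
    # when true (assumed default), compaction waits for segment index
    # builds to finish; set false for one-time bulk ingestion
    indexBasedCompaction: false
```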

@xiaofan-luan xiaofan-luan linked a pull request Feb 5, 2025 that will close this issue
@gland1 (Author)

gland1 commented Feb 6, 2025

> indexBasedCompaction

Thanks, I can confirm this solves the issue.
