Using ObjectReader in the SqlSegmentsMetadataManager.doPollSegments #17732
Conversation
Thanks for creating the PR, @umisan! Have you been able to compare the performance of this code while polling real segments before and after the change? In my experience, most of the time in the poll is actually spent in the IO itself rather than the Jackson deserialization.
@kfaraz Sorry, I haven't tested this change on our Druid cluster yet. I completely agree with your point that most of the time spent during polling is due to I/O, and the improvement from deserialization optimization might be negligible. Our Druid cluster has about 1 million segments and takes several minutes to load newly added segments. Unfortunately, we don't have a staging Druid cluster, so I haven't been able to test this change in an environment with a large number of segments. I am considering setting up a test Druid cluster to evaluate this change. However, it's possible that the results will show that this PR doesn't provide meaningful improvements.
@umisan, yes, that's what I fear as well. FYI, we have recently merged a segment caching feature in #17653.
I understand the current situation. Thank you for reviewing my PR and sharing your insights!
Thanks a lot, @umisan! I am really glad to hear that you have enjoyed using Druid.
@umisan I didn't find any information about ObjectReader's performance in the link you gave. If ObjectReader does perform better, I think we could also apply it to the ingestion module, which uses ObjectMapper heavily.
@FrankChen021
Description
This PR aims to speed up reading segment metadata during metadata polling.
This patch changes SqlSegmentsMetadataManager.doPollSegments to use a reusable ObjectReader instead of calling ObjectMapper directly when reading multiple JSON objects of the same type. Since ObjectReader is slightly faster in this scenario, this change should improve the performance of metadata polling.
jackson document
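For illustration, the difference between the two approaches can be sketched roughly as follows. This is a minimal sketch and not the code from this PR: `SegmentStub` and both helper methods are hypothetical stand-ins (the real code deserializes Druid's segment payloads), and it assumes Jackson 2.6 or later for `ObjectMapper.readerFor`. The idea is that the `ObjectReader` is created once for the target type and then reused for every row, rather than passing the target class to `ObjectMapper.readValue` on each call.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ObjectReaderSketch {
    // Hypothetical stand-in for a segment payload; not a Druid class.
    public static class SegmentStub {
        public String dataSource;
        public int partitionNum;
    }

    // Baseline: pass the target class to the mapper on every call.
    static List<SegmentStub> readWithMapper(ObjectMapper mapper, List<byte[]> payloads)
            throws IOException {
        List<SegmentStub> result = new ArrayList<>();
        for (byte[] payload : payloads) {
            result.add(mapper.readValue(payload, SegmentStub.class));
        }
        return result;
    }

    // Alternative: build one ObjectReader for the type, then reuse it per row.
    // ObjectReader is immutable and thread-safe, so it can be shared.
    static List<SegmentStub> readWithReader(ObjectMapper mapper, List<byte[]> payloads)
            throws IOException {
        ObjectReader reader = mapper.readerFor(SegmentStub.class);
        List<SegmentStub> result = new ArrayList<>();
        for (byte[] payload : payloads) {
            SegmentStub stub = reader.readValue(payload);
            result.add(stub);
        }
        return result;
    }
}
```

Both methods produce the same objects; any speedup from the second form would come only from the deserialization step, so it would need to be measured against the I/O cost of the poll itself.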
Release note
Improved: You can now load newly added segments more quickly.
Key changed/added classes in this PR
SqlSegmentsMetadataManager
This PR has: