Using ObjectReader in the SqlSegmentsMetadataManager.doPollSegments #17732

umisan · 2025-02-16T03:40:43Z

Description

This PR aims to speed up metadata reading, improving performance during metadata polling.

This patch changes the code to use ObjectReader instead of ObjectMapper when reading multiple JSON objects. Since ObjectReader is slightly faster in this scenario, this change should improve the performance of metadata polling.
Reference: Jackson documentation

Release note

Improved: You can now load newly added segments more quickly.

Key changed/added classes in this PR

SqlSegmentsMetadataManager

This PR has:

been self-reviewed.

kfaraz · 2025-02-16T07:05:39Z

Thanks for creating the PR, @umisan!

Have you been able to compare the performance of this code while polling real segments before and after the change?

In my experience, most of the time in the poll is actually spent in the IO itself rather than the Jackson deserialization.

umisan · 2025-02-16T08:14:08Z

@kfaraz
Thanks for reviewing my PR.

Sorry, I haven't tested this change on our Druid cluster yet.

I completely agree with your point that most of the time spent during polling is due to I/O, and the improvement from deserialization optimization might be negligible.

Our Druid cluster has about 1 million segments and takes several minutes to load newly added segments. Unfortunately, we don't have a staging Druid cluster, so I haven't been able to test this change in an environment with a large number of segments.

I am considering setting up a test Druid cluster to evaluate this change. However, it's possible that the results will show that this PR doesn't provide meaningful improvements.

kfaraz · 2025-02-16T09:13:34Z

> I am considering setting up a test Druid cluster to evaluate this change. However, it's possible that the results will show that this PR doesn't provide meaningful improvements.

@umisan, yes, that's what I fear as well.
I had actually tried out a change to reduce the amount of deserialization done but it did not affect the total polling time at all.

FYI, we have recently merged a segment caching feature in #17653.
The code in this patch is able to do a delta poll of ~600k segments in just a couple of seconds.
Essentially, it does not fetch segments from the metadata store which have not been updated.
While this does not currently impact the polling time in SqlSegmentsMetadataManager,
I intend to plug in the same logic there once I have had time to test out and benchmark the cache changes thoroughly.
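
For readers unfamiliar with the idea, a delta poll of this kind could be sketched roughly as below. This is only an illustration and not the code from #17653: the class names (`DeltaPollSketch`, `SegmentRow`, `MetadataStore`, `SegmentCache`), the `lastUpdatedMillis` field, and the in-memory store are all assumptions made for the example. The cache remembers the latest update timestamp it has seen and asks the store only for rows updated after it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: every name here is hypothetical, not Druid's API.
class DeltaPollSketch {

    // Minimal stand-in for one row of the segments metadata table.
    static final class SegmentRow {
        final String id;
        final long lastUpdatedMillis;
        final String payload;

        SegmentRow(String id, long lastUpdatedMillis, String payload) {
            this.id = id;
            this.lastUpdatedMillis = lastUpdatedMillis;
            this.payload = payload;
        }
    }

    // In-memory stand-in for the metadata store. A real store would push the
    // timestamp filter down into its SQL query instead of scanning a map.
    static final class MetadataStore {
        private final Map<String, SegmentRow> rows = new HashMap<>();

        void upsert(String id, long lastUpdatedMillis, String payload) {
            rows.put(id, new SegmentRow(id, lastUpdatedMillis, payload));
        }

        List<SegmentRow> fetchUpdatedAfter(long watermarkMillis) {
            List<SegmentRow> result = new ArrayList<>();
            for (SegmentRow row : rows.values()) {
                if (row.lastUpdatedMillis > watermarkMillis) {
                    result.add(row);
                }
            }
            return result;
        }
    }

    // Cache that fetches only the rows changed since the previous poll.
    static final class SegmentCache {
        private final Map<String, SegmentRow> segments = new HashMap<>();
        private long watermarkMillis = Long.MIN_VALUE;

        // Returns the number of rows fetched from the store in this poll.
        int poll(MetadataStore store) {
            List<SegmentRow> changed = store.fetchUpdatedAfter(watermarkMillis);
            for (SegmentRow row : changed) {
                segments.put(row.id, row);
                // Advance the watermark to the newest update seen so far.
                watermarkMillis = Math.max(watermarkMillis, row.lastUpdatedMillis);
            }
            return changed.size();
        }

        int size() {
            return segments.size();
        }

        String payloadOf(String id) {
            SegmentRow row = segments.get(id);
            return row == null ? null : row.payload;
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore();
        store.upsert("seg-1", 100L, "v1");
        store.upsert("seg-2", 200L, "v1");

        SegmentCache cache = new SegmentCache();
        System.out.println("first poll fetched: " + cache.poll(store));

        // Only seg-2 changes, so the next poll should fetch a single row.
        store.upsert("seg-2", 300L, "v2");
        System.out.println("second poll fetched: " + cache.poll(store));
    }
}
```

The sketch uses a strict "updated after" comparison, so a row written with the same timestamp as the current watermark would be missed. It also never removes deleted segments. A real implementation would need to handle both cases.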

umisan · 2025-02-16T11:51:23Z

I understand the current situation.
For now, I’m planning to set up a test Druid cluster to test this PR and also to prepare for future contributions. However, I don’t have experience building and running Druid locally, and I can only work on this during weekends, so it will take some time.
Given this, I’m thinking of closing this PR for now. If testing shows that this change provides meaningful improvements, I will reopen it.

Thank you for reviewing my PR and sharing your insights!
Thanks to this amazing open-source project, our team has been able to improve system performance while reducing costs.
Moving forward, I hope to contribute, even if only in a small way, to help improve Druid.

kfaraz · 2025-02-17T02:25:10Z

Thanks a lot, @umisan!

I am really glad to hear that you have enjoyed using Druid.
I look forward to your contributions!

FrankChen021 · 2025-02-17T10:12:12Z

@umisan I didn't find any information about ObjectReader at the link you gave. If ObjectReader does perform better, I think we could also apply it to the ingestion module, which uses ObjectMapper heavily.

umisan · 2025-02-17T10:37:43Z

@FrankChen021
Sorry, my previous link was incorrect.
Here is the correct link:
https://github.com/fasterxml/jackson-docs/wiki/presentation:-jackson-performance
This document explains that ObjectReader performs better than ObjectMapper.

mod to using ObjectReader

703e32e
