@morningman
Contributor

@morningman morningman commented Oct 24, 2025

What problem does this PR solve?

Summary

Introduced adaptive logic to dynamically control merge window sizing in
MergeRangeFileReader, preventing severe read amplification in sparse data
scenarios while maintaining efficient merging for dense data patterns.

Problem

The original implementation used a fixed merge window (e.g., 8MB) which
worked well for dense columnar data but caused severe read amplification
with sparse ranges:

Example: Large Gap Scenario (3 ranges × 100KB with 600KB gaps)

  • User needs: 300KB total (3 × 100KB)
  • If the original logic merged all ranges: 1500KB read (300KB of data + 2 × 600KB of gaps)
  • Overall amplification: 5.0x (1500KB / 300KB)
  • Problem: reads 5x more data than needed
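
The arithmetic in this example can be checked with a small sketch. The helper names (merged_bytes, amplification) are hypothetical and not part of the PR; the sketch assumes equal-sized ranges separated by equal gaps, all merged into a single IO.

```cpp
#include <cstddef>

constexpr std::size_t KB = 1024;

// Hypothetical helper (not from the PR): bytes physically read when `n`
// equal-sized ranges separated by equal gaps are merged into one IO.
constexpr std::size_t merged_bytes(std::size_t n, std::size_t range_size, std::size_t gap_size) {
    return n == 0 ? 0 : n * range_size + (n - 1) * gap_size;
}

// Read amplification = physical bytes read / bytes the user asked for.
constexpr double amplification(std::size_t physical, std::size_t requested) {
    return static_cast<double>(physical) / static_cast<double>(requested);
}

// 3 ranges x 100KB with 600KB gaps: 300KB of data + 2 x 600KB of gaps.
static_assert(merged_bytes(3, 100 * KB, 600 * KB) == 1500 * KB,
              "large-gap example reads 1500KB when fully merged");
```

Here amplification(1500 * KB, 300 * KB) evaluates to 5.0, the figure quoted in the example.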

Solution

Implemented a three-layer adaptive defense mechanism in read_at_impl():

  1. Hard Gap Limit (max_single_gap = 512KB)
    Immediately rejects merging if a single gap exceeds 512KB, preventing
    catastrophic amplification from huge gaps.

  2. Original Threshold (SMALL_IO = 2MB)
    Stops merging when accumulated data > 2MB and next gap ≥ 2MB, maintaining
    backward compatibility for typical use cases.

  3. Predictive Gap Ratio Check (adaptive_shrink_threshold = 0.4)
    Key Innovation: Proactively checks if including the next gap would push
    the gap/content ratio above 40%. Stops merging BEFORE including problematic
    gaps, not after.

    • Only activates after accumulating ≥512KB content (min_content_for_adaptive)
    • Prevents over-conservative behavior with small initial ranges
    • Formula: if (hollow_size + next_gap) / content_size > 0.4 → STOP
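
The three checks above can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the code from the PR: the function name should_stop_merging and the k-prefixed constant names are hypothetical, and "accumulated data" in layer 2 is assumed to mean content_size + hollow_size.

```cpp
#include <cstddef>

constexpr std::size_t KB = 1024;
constexpr std::size_t MB = 1024 * KB;

// Values quoted in the PR description; the constant names are illustrative.
constexpr std::size_t kMaxSingleGap = 512 * KB;           // layer 1
constexpr std::size_t kSmallIo = 2 * MB;                  // layer 2
constexpr double kAdaptiveShrinkThreshold = 0.4;          // layer 3
constexpr std::size_t kMinContentForAdaptive = 512 * KB;  // layer 3 activation

// Hypothetical helper: returns true if merging should stop BEFORE crossing
// `next_gap`. content_size is the requested data merged so far, hollow_size
// is the gap bytes merged so far.
inline bool should_stop_merging(std::size_t content_size, std::size_t hollow_size,
                                std::size_t next_gap) {
    // Layer 1: a single gap at or above the hard limit is never merged.
    if (next_gap >= kMaxSingleGap) {
        return true;
    }
    // Layer 2: original threshold. Assumption: "accumulated data" means
    // content + hollow. With the constants above, any gap >= 2MB has already
    // been rejected by layer 1.
    if (content_size + hollow_size > kSmallIo && next_gap >= kSmallIo) {
        return true;
    }
    // Layer 3: predictive ratio check, active only once enough content exists.
    if (content_size >= kMinContentForAdaptive) {
        const double ratio = static_cast<double>(hollow_size + next_gap) /
                             static_cast<double>(content_size);
        if (ratio > kAdaptiveShrinkThreshold) {
            return true;
        }
    }
    return false;
}
```

For instance, with 560KB of content and 300KB of gaps already merged, a further 50KB gap gives (300 + 50) / 560 = 0.625 > 0.4, so the predicate returns true; with only 480KB of content the ratio check is not yet active and the same gap is accepted.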

Performance Results

  1. Scenario 1: Sparse Gaps (15 ranges × 80KB with 50KB gaps)

    Before (without adaptive logic):

    • Physical IO: 1950KB (merges all ranges in 1 IO)
    • User requests: 1200KB
    • Overall amplification: 1.625x
    • IO count: 1

    After (with adaptive logic):

    • Physical IO: 1800KB (partial merges in 3 IOs)
    • User requests: 1200KB
    • Overall amplification: 1.5x
    • IO count: 3

  2. Scenario 2: Dense Gaps (10 ranges × 100KB with 5KB gaps)

    Before & After (identical behavior):

    • Physical IO: 1045KB
    • User requests: 1000KB
    • Overall amplification: 1.045x
    • IO count: 1

  3. Scenario 3: Large Gaps (3 ranges × 100KB with 600KB gaps)

    Before (if the original logic merged all ranges):

    • Physical IO: 1500KB
    • User requests: 300KB
    • Overall amplification: 5.0x (catastrophic!)
    • IO count: 1

    After (with max_single_gap=512KB):

    • Physical IO: 300KB (each range separate)
    • User requests: 300KB
    • Overall amplification: 1.0x
    • IO count: 3
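
The "After" figures can be reproduced with a simplified simulation. This is a sketch under assumptions rather than the PR's code: it greedily merges a sorted list of ranges using the three checks from the Solution section, ignores the fixed merge window and buffer limits of the real reader, and uses hypothetical names (Range, IoStats, should_stop, make_ranges, simulate).

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t KB = 1024;

struct Range { std::size_t start; std::size_t end; };  // [start, end) byte offsets
struct IoStats {
    std::size_t io_count = 0;
    std::size_t physical_bytes = 0;   // data + gaps actually read
    std::size_t requested_bytes = 0;  // data the caller asked for
};

// Compact form of the three checks (thresholds as quoted in the description).
inline bool should_stop(std::size_t content, std::size_t hollow, std::size_t gap) {
    if (gap >= 512 * KB) return true;                                   // layer 1
    if (content + hollow > 2048 * KB && gap >= 2048 * KB) return true;  // layer 2
    return content >= 512 * KB &&                                       // layer 3
           static_cast<double>(hollow + gap) / static_cast<double>(content) > 0.4;
}

// Build n ranges of range_size bytes separated by gap_size bytes.
inline std::vector<Range> make_ranges(std::size_t n, std::size_t range_size,
                                      std::size_t gap_size) {
    std::vector<Range> out;
    std::size_t off = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back({off, off + range_size});
        off += range_size + gap_size;
    }
    return out;
}

// Greedily merge consecutive ranges into IOs until should_stop() fires.
inline IoStats simulate(const std::vector<Range>& ranges) {
    IoStats stats;
    std::size_t i = 0;
    while (i < ranges.size()) {
        std::size_t content = ranges[i].end - ranges[i].start;
        std::size_t hollow = 0;
        std::size_t j = i;
        while (j + 1 < ranges.size()) {
            const std::size_t gap = ranges[j + 1].start - ranges[j].end;
            if (should_stop(content, hollow, gap)) break;
            hollow += gap;
            content += ranges[j + 1].end - ranges[j + 1].start;
            ++j;
        }
        stats.io_count += 1;
        stats.physical_bytes += content + hollow;
        stats.requested_bytes += content;
        i = j + 1;
    }
    return stats;
}
```

Under these assumptions, simulate(make_ranges(15, 80 * KB, 50 * KB)) yields 3 IOs and 1800KB (two merges of 7 ranges plus one single range), the dense case yields 1 IO and 1045KB, and the large-gap case yields 3 IOs and 300KB.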

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Oct 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman changed the title from "[opt](merge-io) adaptive merge io" to "[opt](merge-io) Implement adaptive merge window sizing for MergeRangeFileReader to prevent read amplification" Oct 24, 2025
@morningman
Contributor Author

run buildall

@doris-robot

ClickBench: Total hot run time: 27.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 198a459c9f2605de577197aa6fc37035efe7f756, data reload: false

query1	0.06	0.04	0.05
query2	0.10	0.06	0.06
query3	0.26	0.08	0.08
query4	1.61	0.12	0.12
query5	0.29	0.26	0.25
query6	1.20	0.66	0.66
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.64	0.53	0.52
query10	0.59	0.59	0.59
query11	0.16	0.11	0.12
query12	0.16	0.13	0.13
query13	0.62	0.60	0.60
query14	1.01	1.03	1.02
query15	0.86	0.84	0.86
query16	0.40	0.41	0.39
query17	1.06	1.05	1.03
query18	0.23	0.20	0.21
query19	1.93	1.80	1.81
query20	0.01	0.01	0.01
query21	15.47	0.18	0.12
query22	5.18	0.07	0.04
query23	15.65	0.27	0.10
query24	3.13	0.54	0.59
query25	0.07	0.07	0.06
query26	0.14	0.12	0.13
query27	0.07	0.06	0.06
query28	4.46	1.13	0.94
query29	12.57	4.03	3.34
query30	0.28	0.14	0.11
query31	2.83	0.59	0.39
query32	3.25	0.56	0.46
query33	3.03	3.03	3.07
query34	15.73	5.23	4.53
query35	4.63	4.57	4.58
query36	0.68	0.51	0.49
query37	0.10	0.07	0.07
query38	0.07	0.04	0.04
query39	0.03	0.03	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 98.99 s
Total hot run time: 27.81 s

if (gap >= max_single_gap) {
    break;
}

Contributor

gap >= max_single_gap means gap >= SMALL_IO is always true
