enhancement(sinks): Add support for `max_bytes` for memory buffers #23330

graphcareful · 2025-07-02T17:56:21Z

Summary

This PR adds support for bounding memory buffers by the number of bytes allocated. The feature is opt-in and disabled by default: unless a value for max_bytes is explicitly supplied, the current implementation and its defaults are still selected.

At the core of this change is a new interface that allows a different lock-free queue to be selected. This queue is at the center of the memory buffer implementation. Today that queue is crossbeam_queue::ArrayQueue, a fixed-size lock-free data structure. Its fixed size is the reason #8679 could not easily be implemented. The new interface makes it possible to drop in a queue that is not fixed in size. crossbeam_queue::SegQueue was chosen because it performed well in initial testing and did not require any new dependencies.
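As a rough illustration of the interface idea, the sketch below puts a fixed-capacity queue and a growable queue behind one trait and picks between them at runtime. It is not the PR's code: `FixedQueue`, `GrowableQueue`, and `make_queue` are hypothetical names, the trait methods are assumed, and `Mutex<VecDeque<T>>` stands in for the lock-free crossbeam queues so the example needs only the standard library.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical queue interface; the PR's real trait may differ.
trait QueueImpl<T> {
    /// Enqueues `item`, handing it back if the queue has no room.
    fn push(&self, item: T) -> Result<(), T>;
    /// Dequeues the oldest item, if any.
    fn pop(&self) -> Option<T>;
}

/// Fixed-capacity stand-in for `crossbeam_queue::ArrayQueue`.
struct FixedQueue<T> {
    items: Mutex<VecDeque<T>>,
    capacity: usize,
}

/// Unbounded stand-in for `crossbeam_queue::SegQueue`.
struct GrowableQueue<T> {
    items: Mutex<VecDeque<T>>,
}

impl<T> FixedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { items: Mutex::new(VecDeque::with_capacity(capacity)), capacity }
    }
}

impl<T> GrowableQueue<T> {
    fn new() -> Self {
        Self { items: Mutex::new(VecDeque::new()) }
    }
}

impl<T> QueueImpl<T> for FixedQueue<T> {
    fn push(&self, item: T) -> Result<(), T> {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            // Full: reject the item instead of growing.
            return Err(item);
        }
        items.push_back(item);
        Ok(())
    }

    fn pop(&self) -> Option<T> {
        self.items.lock().unwrap().pop_front()
    }
}

impl<T> QueueImpl<T> for GrowableQueue<T> {
    fn push(&self, item: T) -> Result<(), T> {
        // Never full: any external limit has to be enforced by the caller.
        self.items.lock().unwrap().push_back(item);
        Ok(())
    }

    fn pop(&self) -> Option<T> {
        self.items.lock().unwrap().pop_front()
    }
}

/// Picks an implementation at runtime behind one trait object.
fn make_queue<T: 'static>(fixed_capacity: Option<usize>) -> Box<dyn QueueImpl<T>> {
    match fixed_capacity {
        Some(capacity) => Box::new(FixedQueue::new(capacity)),
        None => Box::new(GrowableQueue::new()),
    }
}
```

Because the growable variant never refuses a push, some other mechanism has to enforce the byte limit, which is the role the description assigns to the semaphore.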

The main resource (the queue) is already guarded by a semaphore. This semaphore currently bounds the queue by number of elements, but there is no reason it could not bound bytes allocated instead. Much of the existing code therefore remains the same, which is a plus: it is already battle-tested and appears relatively stable.
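The point that one permit counter can bound either elements or bytes can be sketched as follows. This is an assumption-laden stand-in, not Vector's semaphore: `PermitBudget` is a hypothetical, non-blocking counter built on `AtomicUsize`, whereas the real code would wait for permits rather than fail immediately.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical non-blocking permit counter standing in for the semaphore.
struct PermitBudget {
    available: AtomicUsize,
}

impl PermitBudget {
    fn new(total: usize) -> Self {
        Self { available: AtomicUsize::new(total) }
    }

    /// Tries to take `n` permits; returns false if fewer than `n` remain.
    fn try_acquire(&self, n: usize) -> bool {
        self.available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| cur.checked_sub(n))
            .is_ok()
    }

    /// Returns `n` permits once the corresponding item leaves the queue.
    fn release(&self, n: usize) {
        self.available.fetch_add(n, Ordering::Release);
    }

    fn available(&self) -> usize {
        self.available.load(Ordering::Acquire)
    }
}
```

Bounding by elements means acquiring one permit per item; bounding by bytes means acquiring as many permits as the item's allocated size. The counter itself does not change, which matches the observation that most of the existing code could stay as is.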

Finally, a new unit test was added, and new benchmarks were included in vector-buffers/benches.

Vector configuration

Add the following to any sink configuration:

buffer:
  type: memory
  max_size: 123456

How did you test this PR?

Via the existing unit test and the developed benchmarks

Change Type

Bug fix
New feature
Non-functional (chore, refactoring, docs)
Performance

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Closes: Support max_bytes for memory buffers #8679

Notes

Please read our Vector contributor resources.
Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
Some CI checks run only after we manually approve them.
- We recommend adding a pre-push hook, please see this template.
- Alternatively, we recommend running the following locally before pushing to the remote branch:
  - cargo fmt --all
  - cargo clippy --workspace --all-targets -- -D warnings
  - cargo nextest run --workspace (alternatively, you can run cargo test --all)
After a review is requested, please avoid force pushes to help us review incrementally.
- Feel free to push as many commits as you want. They will be squashed into one before merging.
- For example, you can run git merge origin master and git push.
If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

Copilot

Pull Request Overview

Adds support for byte-based limits for in-memory buffers by introducing a unified MemoryBufferSize enum and dynamically selecting between element-count and byte-size queues.

- Replace standalone max_events with a size object backed by MemoryBufferSize across configs and APIs
- Implement QueueImpl to choose between ArrayQueue (by events) and SegQueue (by bytes) using a semaphore guard
- Update all tests, examples, benchmarks, and documentation to use the new byte/event buffer sizing model
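The sizing model summarized above might look roughly like the sketch below. The variant names, the `QueueKind` enum, and all three helper methods are assumptions made for illustration; only the `MemoryBufferSize` name, the `MaxSize` variant, and the pairing of ArrayQueue with event limits and SegQueue with byte limits come from this page.

```rust
use std::num::NonZeroUsize;

/// Assumed shape of the sizing enum; the merged definition may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryBufferSize {
    /// Limit by number of events.
    MaxEvents(NonZeroUsize),
    /// Limit by bytes allocated.
    MaxSize(NonZeroUsize),
}

/// Hypothetical tag for which queue backs the buffer.
#[derive(Debug, PartialEq, Eq)]
enum QueueKind {
    Array,
    Seg,
}

impl MemoryBufferSize {
    /// Event limits fit a fixed-size queue; byte limits need a growable one.
    fn queue_kind(self) -> QueueKind {
        match self {
            Self::MaxEvents(_) => QueueKind::Array,
            Self::MaxSize(_) => QueueKind::Seg,
        }
    }

    /// Total permits the guarding semaphore starts with.
    fn total_permits(self) -> usize {
        match self {
            Self::MaxEvents(n) | Self::MaxSize(n) => n.get(),
        }
    }

    /// Permits a single item of `item_bytes` bytes must acquire.
    fn permits_for(self, item_bytes: usize) -> usize {
        match self {
            Self::MaxEvents(_) => 1,
            Self::MaxSize(_) => item_bytes,
        }
    }
}
```

Keeping the limit in one enum means the rest of the pipeline only asks two questions, how many permits exist and how many an item costs, without caring which unit is in use.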

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| website/cue/reference/components/base/sinks.cue | Update CUE schema to use new `size` object with `max_bytes` & `max_size` |
| src/topology/test/backpressure.rs | Adapt backpressure tests to `MemoryBufferSize` |
| src/test_util/mock/sources/basic.rs | Use `MemoryBufferSize` in mock sources |
| src/source_sender/mod.rs | Initialize `limited` channel with `MemoryBufferSize` |
| lib/vector-buffers/src/variants/in_memory.rs | Refactor `MemoryBuffer` to accept `MemoryBufferSize` |
| lib/vector-buffers/src/topology/test_util.rs | Enhance `Sample` to account for heap-allocated bytes |
| lib/vector-buffers/src/topology/channel/limited_queue.rs | Add `QueueImpl`, `SizeTerms`, & dynamic queue selection |
| lib/vector-buffers/src/topology/builder.rs | Update topology builder to use `MemoryBufferSize` |
| lib/vector-buffers/src/test/variant.rs | Update variant tests to use `MemoryBufferSize` |
| lib/vector-buffers/src/lib.rs | Export `MemoryBufferSize` |
| lib/vector-buffers/src/config.rs | Implement serde de/serialization for `MemoryBufferSize` |
| lib/vector-buffers/examples/buffer_perf.rs | Adjust example to use new buffer size API |
| lib/vector-buffers/benches/sized_records.rs | Add benchmarks for byte-based buffers via a `BoundBy` helper |
| lib/vector-buffers/benches/common.rs | Extend `Message` to simulate heap allocation for size-based tests |
| changelog.d/8679_add_support_max_bytes_memory_buffers.feature.md | Add changelog entry for the new `max_bytes` feature |

Comments suppressed due to low confidence (3)

lib/vector-buffers/src/config.rs:102

[nitpick] The error message for invalid max_bytes is unclear and grammatically awkward; consider rephrasing to something like "max_bytes must fit within the platform's usize range" and include the actual bounds.

                            &"For memory buffers max_bytes expects an integer within the range of 268435488 and your architecture dependent usize",
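One way to act on this nitpick is to build the message from the actual bounds, as in the sketch below. `parse_max_bytes` is a hypothetical helper, not the PR's deserializer, and the lower bound is taken as a parameter because this page does not explain where 268435488 comes from.

```rust
use std::convert::TryFrom;

/// Hypothetical validator: checks `raw` against an assumed lower bound `min`
/// and the platform's `usize` range, reporting both bounds on failure.
fn parse_max_bytes(raw: u64, min: u64) -> Result<usize, String> {
    let describe = || {
        format!(
            "max_bytes must be an integer between {} and {} (usize::MAX on this platform), got {}",
            min,
            usize::MAX,
            raw
        )
    };
    if raw < min {
        return Err(describe());
    }
    // Fails only when `raw` does not fit in this platform's usize.
    usize::try_from(raw).map_err(|_| describe())
}
```

A message in this form tells the user both what was rejected and what would have been accepted, without requiring them to know the width of usize on their architecture.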

lib/vector-buffers/src/config.rs:430

Add a unit test in the config module to verify that a YAML or JSON config using max_bytes correctly deserializes into MemoryBufferSize::MaxSize under BufferType::Memory.

        for stage in self.stages() {

lib/vector-buffers/src/config.rs:207

[nitpick] The variant name MaxSize is ambiguous; consider renaming it to MaxBytes to clearly indicate it represents a byte-based limit.

    MaxSize {

bruceg

A few comments around the memory buffer size enum variants, otherwise LGTM

bruceg

One issue with the sinks doc file, otherwise LGTM. I am approving it in advance of your resolution of that one, but please see if it can be fixed.

pront

A few comments, I still need to review the latter parts.

pront

Neat!

…ectordotdev#23330)

* New interface around buffers to select impl at runtime
* Expose a non-fixed sized queue in vector-buffers
* Modifications to make max_bytes configurable
* Modify Sample to test limiting behavior on bytes allocated
* Unit test for semaphore guarding SeqQueue
* Include new SegQueue in buffering benchmarks
* Generated documentation updates
* Add changelog fragment
* Implement QueueImpl trait directly on crossbeam queue types
* Add helper method to reduce terseness
* Convert config to flat layout
* Modify MemoryBufferSize to be a tuple variant
  - Also removing its configurable_component tag as it is no longer officially part of the configuration
* Update error message
* Prefer size_of over magic numbers
* Remove stray comment
* Revert test behavior to use arbitrary u16s
* Update documentation
* Addressing some comments
  - Replace if let chain with match expression
  - Replace map/sum with just calls to +
  - Replace function pointer in limited_queue.rs with enum + variant check
* Revert "Modify MemoryBufferSize to be a tuple variant"
  - Previous commit hash: f87aeab0
* Revert "Config config to flat layout"
  - Previous commit hash: 4354874
* Fix config bug where flatten option isn't respected for variants
* Rename MemoryBufferSize variant options
  - This makes the options more consistent with the existing ones like max_size
* Add unit test for parsing memory buffer config w/ byte_size
* Update doc comments
* Update generated documentation
* Fix bug in config generator with flattening enum values
  - When the flatten attribute was applied to a variant it would not be respected in the generated output
* Modify MemoryBufferSize to be a tuple variant
* Updating generated documentation
* Remove unnecessary serde attribute
* Move MaxSizes configurable attribute within the tuple
  - This correctly identifies the attributes as having the bytes unit type
* Modify _heap_allocated to be a heap allocated array
  - Since the size is fixed and known at compile time this type is a better fit here then Vec

---------

Co-authored-by: Pavlos Rontidis <[email protected]>

graphcareful added 7 commits July 2, 2025 13:55

New interface around buffers to select impl at runtime

a2dbb5a

Expose a non-fixed sized queue in vector-buffers

4721f7a

Modifications to make max_bytes configurable

ffc93af

Modify Sample to test limiting behavior on bytes allocated

863f51b

Unit test for semaphore guarding SeqQueue

96258ca

Include new SegQueue in buffering benchmarks

817dfe6

Generated documentation updates

02dfce0

graphcareful requested review from a team as code owners July 2, 2025 17:56

graphcareful requested review from bruceg and removed request for a team July 2, 2025 17:56

github-actions bot added the domain: topology and domain: external docs labels Jul 2, 2025

graphcareful requested review from pront, jszwedko and Copilot July 2, 2025 17:57

Add changelog fragment

bdc9f1d

graphcareful force-pushed the rob/buffer-size-bytes branch from e3ef011 to bdc9f1d July 2, 2025 17:59

pront added the domain: buffers label Jul 2, 2025

tobz suggested changes Jul 2, 2025

graphcareful commented Jul 2, 2025

graphcareful added 2 commits July 2, 2025 15:31

Implement QueueImpl trait directly on crossbeam queue types

ae34cdf

Add helper method to reduce terseness

71fe048

bruceg reviewed Jul 5, 2025

pront requested a review from Copilot July 7, 2025 18:34

Copilot AI reviewed Jul 7, 2025

graphcareful added 2 commits July 8, 2025 15:25

Convert config to flat layout

4354874

Modify MemoryBufferSize to be a tuple variant

f87a3ab

- Also removing its configurable_component tag as it is no longer officially part of the configuration

Revert "Config config to flat layout"

4f570b5

- Previous commit hash: 4354874

github-actions bot removed the meta: awaiting author label Jul 11, 2025

Fix config bug where flatten option isn't respected for variants

1f3dc94

graphcareful force-pushed the rob/buffer-size-bytes branch from 28f8661 to 2a7ddc9 July 11, 2025 21:43

Rename MemoryBufferSize variant options

df3e488

- This makes the options more consistent with the existing ones like max_size

graphcareful force-pushed the rob/buffer-size-bytes branch from 2a7ddc9 to cde6698 July 11, 2025 21:58

graphcareful added 3 commits July 11, 2025 18:05

Add unit test for parsing memory buffer config w/ byte_size

c14bd01

Update doc comments

f9e6cf7

Update generated documentation

ee8e854

graphcareful force-pushed the rob/buffer-size-bytes branch from cde6698 to ee8e854 July 11, 2025 22:05

graphcareful added 2 commits July 11, 2025 18:07

Fix bug in config generator with flattening enum values

614cab5

- When the flatten attribute was applied to a variant it would not be respected in the generated output

Merge branch 'master' into rob/buffer-size-bytes

6336015

bruceg reviewed Jul 12, 2025

graphcareful added 3 commits July 14, 2025 13:11

Modify MemoryBufferSize to be a tuple variant

b1e8bbb

Updating generated documentation

159ef56

Remove unnecessary serde attribute

eef1fbc

graphcareful requested a review from bruceg July 14, 2025 17:22

graphcareful enabled auto-merge July 14, 2025 18:13

bruceg approved these changes Jul 14, 2025

Move MaxSizes configurable attribute within the tuple

beca214

- This correctly identifies the attributes as having the bytes unit type

pront reviewed Jul 14, 2025

Modify _heap_allocated to be a heap allocated array

21c06cd

- Since the size is fixed and known at compile time this type is a better fit here then Vec
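The reasoning in this commit note can be illustrated with a small sketch. The `Message` and `_heap_allocated` names come from this page; the `id` field, the `PAYLOAD_BYTES` constant, and `allocated_bytes` are assumptions added for the example and may not match benches/common.rs.

```rust
use std::mem::size_of;

/// Assumed payload size; the benchmark's real value is not shown on this page.
const PAYLOAD_BYTES: usize = 64;

/// Benchmark-style message whose payload lives on the heap.
struct Message {
    id: u64,
    // A boxed array has a length fixed at compile time, so unlike `Vec<u8>`
    // it carries no separate length or capacity fields.
    _heap_allocated: Box<[u8; PAYLOAD_BYTES]>,
}

impl Message {
    fn new(id: u64) -> Self {
        Self { id, _heap_allocated: Box::new([0u8; PAYLOAD_BYTES]) }
    }

    /// Bytes this message keeps on the heap, known at compile time.
    fn allocated_bytes(&self) -> usize {
        size_of::<[u8; PAYLOAD_BYTES]>()
    }
}
```

Since the heap footprint is a compile-time constant here, a byte-based limit can charge each message an exact, predictable number of permits.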

pront approved these changes Jul 15, 2025

pront disabled auto-merge July 15, 2025 16:05

Merge branch 'master' into rob/buffer-size-bytes

b8122dd

pront enabled auto-merge July 15, 2025 16:05

pront added this pull request to the merge queue Jul 15, 2025

Merged via the queue into vectordotdev:master with commit 121b9da Jul 15, 2025
42 checks passed
