Commit 14f72c1

Reduce allocations in TraceContext (#7874)
## Summary of changes

Replace `ArraySegment<Span>` with `SpanCollection`.

## Reason for change

`ArraySegment<Span>` is a wrapper around an array and, by design, always requires allocating an array. However, if we look at the [distribution of traces](https://app.datadoghq.com/notebook/13270656/partial-flush-worth-it-or-not?computational_onboarding=true&fullscreen_end_ts=1760111663000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_start_ts=1759829808000&fullscreen_widget=3c8idtpl) received for processing, we see that 50% of traces only have a single span.

Consequently, `SpanCollection` takes a similar approach [to `StringValues`](https://andrewlock.net/a-brief-look-at-stringvalues/), in which it is an abstraction around _either_ a single `Span` _or_ a `Span[]`. This means we can avoid allocating the array entirely until we need to. For small traces (with a single span), this saves 56 bytes per scope/trace, or ~8% of the basic trace size. For larger traces, `SpanCollection` and `ArraySegment<Span>` are essentially identical, so the allocation is the same.

Given that `SpanCollection` (and `ArraySegment`, for that matter) is a `readonly struct`, also added `in` to the signatures (both structs are the same size, and larger than a pointer).

> The only practical way I could see to actually make `SpanCollection` pointer-sized is to remove the `Count` parameter. But that means we _either_ need to allocate an "ArraySegment" wrapper around the array to hold the count, _or_ we always allocate an array of the "correct" length. I explored the latter in a separate PR, using an array pool during the "builder" step and then allocating an array of the correct size subsequently, but the allocation gains were marginal, and it didn't seem worth the extra complexity.

## Implementation details

The changes are essentially:

- `SpanCollection` holds _either_ a `Span` _or_ a `Span[]` (_or_ `null`)
- We store this in the same field (much like `StringValues` does), as it reduces the size of the struct, which brings small perf benefits (sketched below)
- Pass the span collection around via `in` to reduce the chance of defensive copies
- Fix/replace uses of Moq, which requires different usage for `in`/`ref` parameters and can't provide the same functionality as a stub

## Test coverage

Added some unit tests for the implementation, but the important thing is the benchmarks. We see an 8-8.5% reduction in the allocations for create span/create scope:

| Benchmark | Base Allocated | Diff Allocated | Change | Change % |
|:----------|---------------:|---------------:|-------:|---------:|
| Benchmarks.Trace.SpanBenchmark.StartFinishScope-net6.0 | 696 B | 640 B | -56 B | -8.05% |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope-netcoreapp3.1 | 696 B | 640 B | -56 B | -8.05% |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope-net472 | 658 B | 602 B | -56 B | -8.51% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan-net472 | 578 B | 522 B | -56 B | -9.69% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan-net6.0 | 576 B | 520 B | -56 B | -9.72% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan-netcoreapp3.1 | 576 B | 520 B | -56 B | -9.72% |

Other benchmarks which create a single span see similar improvements. Note that the _two-scope_ benchmark (added in #7869) shows essentially no change (as expected).
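For illustration, a minimal sketch of the shape `SpanCollection` appears to take, inferred from the description above and from the call sites in the diff below (`Count`, `FirstSpan`, the indexer, `TryGetArray()`, the `(Span[], int)` constructor, and `foreach` support). The single-span constructor and the member bodies here are assumptions for illustration, not the implementation shipped in this commit; the enumerator used by `foreach` is omitted for brevity.

```csharp
using System;
using Datadog.Trace; // Span here is the tracer's Datadog.Trace.Span class, not System.Span<T>

// Illustrative sketch only: member bodies are assumptions, not the code in this commit.
internal readonly struct SpanCollection
{
    // Like StringValues: one field holds either a single Span, a Span[], or null,
    // so single-span traces never pay for an array allocation.
    private readonly object? _spanOrArray;

    // Hypothetical single-span constructor (not shown in this diff).
    public SpanCollection(Span span)
    {
        _spanOrArray = span;
        Count = 1;
    }

    // Constructor shape used in AgentWriter.SerializeTrace below.
    public SpanCollection(Span[] array, int count)
    {
        _spanOrArray = array;
        Count = count;
    }

    public int Count { get; }

    // Used by AgentWriter to reach the TraceContext of the first span without touching an array.
    public Span? FirstSpan => Count == 0 ? null : this[0];

    public Span this[int index] => _spanOrArray switch
    {
        Span span when index == 0 => span,
        Span[] array => array[index],
        _ => throw new ArgumentOutOfRangeException(nameof(index)),
    };

    // Exposes the backing array when there is one; single-span (and empty) collections return null,
    // which is why TraceChunkModel.IndexOf handles the 0- and 1-span cases before calling this.
    public ArraySegment<Span>? TryGetArray()
        => _spanOrArray is Span[] array ? new ArraySegment<Span>(array, 0, Count) : (ArraySegment<Span>?)null;
}
```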
## Other details

I tried a variety of variations on this approach:

- Keep separate fields for `Span` and `Span[]`
  - Nicer as it's type-safe again, but increases allocations (by a pointer), so probably not worth it as this is a hot path
- Don't pass via `in`
  - Slows things down slightly (see the illustration below)
- Remove the `Count` field (and make `SpanCollection` pointer-sized)
  - Requires allocating exact-sized arrays, so not practical to handle array growth
- Same as above, but use an array pool for the build stage, and then only allocate the fixed size on close
  - Does show improvements in allocation, but more complexity. May be worth considering (I'll create a separate PR for it)
- Don't have a builder, just use an array pool + a `Count` field, and rely on reliably cleaning up the pooling (https://datadoghq.atlassian.net/browse/LANGPLAT-841)
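As a generic illustration of the `in` point above (not code from this PR): for a `readonly struct` larger than a pointer, an `in` parameter hands the callee a read-only reference instead of copying the struct on every call, and because the struct is declared `readonly`, the compiler does not need defensive copies when its members are accessed.

```csharp
using System;

// A readonly struct that, like SpanCollection, is bigger than a pointer
// (a reference field plus an int => 16 bytes on x64).
internal readonly struct Payload
{
    public readonly object? Value;
    public readonly int Count;

    public Payload(object? value, int count)
    {
        Value = value;
        Count = count;
    }
}

internal static class InParameterDemo
{
    // Copies the whole struct on every call.
    private static int ByValue(Payload payload) => payload.Count;

    // Passes a read-only reference; no copy, and no defensive copy on member access
    // because Payload is declared readonly.
    private static int ByIn(in Payload payload) => payload.Count;

    public static void Main()
    {
        var payload = new Payload("example", 1);
        Console.WriteLine(ByValue(payload)); // 1
        Console.WriteLine(ByIn(in payload)); // 1
    }
}
```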
1 parent 9edde96 commit 14f72c1

47 files changed (+856, −341 lines)

tracer/src/Datadog.Trace/Agent/AgentWriter.cs

Lines changed: 28 additions & 28 deletions

@@ -1,4 +1,4 @@
-// <copyright file="AgentWriter.cs" company="Datadog">
+// <copyright file="AgentWriter.cs" company="Datadog">
 // Unless explicitly stated otherwise all files in this repository are licensed under the Apache 2 License.
 // This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
 // </copyright>
@@ -128,7 +128,7 @@ internal AgentWriter(IApi api, IStatsAggregator? statsAggregator, IStatsdManager
 
     public Task<bool> Ping() => _api.SendTracesAsync(EmptyPayload, 0, false, 0, 0);
 
-    public void WriteTrace(ArraySegment<Span> trace)
+    public void WriteTrace(in SpanCollection trace)
     {
         if (trace.Count == 0)
         {
@@ -143,7 +143,7 @@ public void WriteTrace(ArraySegment<Span> trace)
         }
         else
         {
-            _pendingTraces.Enqueue(new WorkItem(trace));
+            _pendingTraces.Enqueue(new WorkItem(in trace));
 
             if (!_serializationMutex.IsSet)
             {
@@ -395,7 +395,7 @@ async Task InternalBufferFlush()
         }
     }
 
-    private void SerializeTrace(ArraySegment<Span> spans)
+    private void SerializeTrace(in SpanCollection spans)
     {
         // Declaring as inline method because only safe to invoke in the context of SerializeTrace
         SpanBuffer? SwapBuffers()
@@ -427,26 +427,26 @@ private void SerializeTrace(ArraySegment<Span> spans)
         }
 
         int? chunkSamplingPriority = null;
+        var chunk = spans;
         if (CanComputeStats)
         {
-            spans = _statsAggregator?.ProcessTrace(spans) ?? spans;
-            bool shouldSendTrace = _statsAggregator?.ShouldKeepTrace(spans) ?? true;
-            _statsAggregator?.AddRange(spans);
+            chunk = _statsAggregator?.ProcessTrace(in chunk) ?? chunk;
+            bool shouldSendTrace = _statsAggregator?.ShouldKeepTrace(in chunk) ?? true;
+            _statsAggregator?.AddRange(in chunk);
             var singleSpanSamplingSpans = new List<Span>(); // TODO maybe we can store this from above?
 
-            for (var i = 0; i < spans.Count; i++)
+            foreach (var span in chunk)
            {
-                var index = i + spans.Offset;
-                if (spans.Array?[index].GetMetric(Metrics.SingleSpanSampling.SamplingMechanism) is not null)
+                if (span.GetMetric(Metrics.SingleSpanSampling.SamplingMechanism) is not null)
                {
-                    singleSpanSamplingSpans.Add(spans.Array[index]);
+                    singleSpanSamplingSpans.Add(span);
                }
            }
 
            if (shouldSendTrace)
            {
                TelemetryFactory.Metrics.RecordCountTraceChunkEnqueued(MetricTags.TraceChunkEnqueueReason.P0Keep);
-                TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.P0Keep, spans.Count);
+                TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.P0Keep, chunk.Count);
            }
            else
            {
@@ -456,8 +456,8 @@ private void SerializeTrace(ArraySegment<Span> spans)
                if (singleSpanSamplingSpans.Count == 0)
                {
                    Interlocked.Increment(ref _droppedP0Traces);
-                    Interlocked.Add(ref _droppedP0Spans, spans.Count);
-                    TelemetryFactory.Metrics.RecordCountSpanDropped(MetricTags.DropReason.P0Drop, spans.Count);
+                    Interlocked.Add(ref _droppedP0Spans, chunk.Count);
+                    TelemetryFactory.Metrics.RecordCountSpanDropped(MetricTags.DropReason.P0Drop, chunk.Count);
                    return;
                }
                else
@@ -466,11 +466,11 @@ private void SerializeTrace(ArraySegment<Span> spans)
                    // this will override the TraceContext sampling priority when we do a SpanBuffer.TryWrite
                    chunkSamplingPriority = SamplingPriorityValues.UserKeep;
                    Interlocked.Increment(ref _droppedP0Traces); // increment since we are sampling out the entire trace
-                    var spansDropped = spans.Count - singleSpanSamplingSpans.Count;
+                    var spansDropped = chunk.Count - singleSpanSamplingSpans.Count;
                    Interlocked.Add(ref _droppedP0Spans, spansDropped);
-                    spans = new ArraySegment<Span>(singleSpanSamplingSpans.ToArray());
+                    chunk = new SpanCollection(singleSpanSamplingSpans.ToArray(), singleSpanSamplingSpans.Count);
                    TelemetryFactory.Metrics.RecordCountSpanDropped(MetricTags.DropReason.P0Drop, spansDropped);
-                    TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.SingleSpanSampling, spans.Count);
+                    TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.SingleSpanSampling, chunk.Count);
                    TelemetryFactory.Metrics.RecordCountTracePartialFlush(MetricTags.PartialFlushReason.SingleSpanIngestion);
                }
            }
@@ -479,11 +479,11 @@ private void SerializeTrace(ArraySegment<Span> spans)
        {
            // not using stats, so trace always kept
            TelemetryFactory.Metrics.RecordCountTraceChunkEnqueued(MetricTags.TraceChunkEnqueueReason.Default);
-            TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.Default, spans.Count);
+            TelemetryFactory.Metrics.RecordCountSpanEnqueuedForSerialization(MetricTags.SpanEnqueueReason.Default, chunk.Count);
        }
 
        // Add the current keep rate to trace
-        if (spans.Array?[spans.Offset].Context.TraceContext is { } trace)
+        if (chunk.FirstSpan?.Context.TraceContext is { } trace)
        {
            trace.TracesKeepRate = _traceKeepRateCalculator.GetKeepRate();
        }
@@ -492,7 +492,7 @@ private void SerializeTrace(ArraySegment<Span> spans)
        // This allows the serialization thread to keep doing its job while a buffer is being flushed
        var buffer = _activeBuffer;
 
-        var writeStatus = buffer.TryWrite(spans, ref _temporaryBuffer, chunkSamplingPriority);
+        var writeStatus = buffer.TryWrite(in chunk, ref _temporaryBuffer, chunkSamplingPriority);
 
        if (writeStatus == SpanBuffer.WriteStatus.Success)
        {
@@ -503,7 +503,7 @@ private void SerializeTrace(ArraySegment<Span> spans)
        if (writeStatus == SpanBuffer.WriteStatus.Overflow)
        {
            // The trace is too big for the buffer, no point in trying again
-            DropTrace(spans);
+            DropTrace(chunk.Count);
            return;
        }
 
@@ -515,22 +515,22 @@ private void SerializeTrace(ArraySegment<Span> spans)
            // One buffer is full, request an eager flush
            RequestFlush();
 
-            if (buffer.TryWrite(spans, ref _temporaryBuffer, chunkSamplingPriority) == SpanBuffer.WriteStatus.Success)
+            if (buffer.TryWrite(in chunk, ref _temporaryBuffer, chunkSamplingPriority) == SpanBuffer.WriteStatus.Success)
            {
                // Serialization to the secondary buffer succeeded
                return;
            }
        }
 
        // All the buffers are full :( drop the trace
-        DropTrace(spans);
+        DropTrace(chunk.Count);
    }
 
-    private void DropTrace(ArraySegment<Span> spans)
+    private void DropTrace(int count)
    {
        Interlocked.Increment(ref _droppedTraces);
        _traceKeepRateCalculator.IncrementDrops(1);
-        TelemetryFactory.Metrics.RecordCountSpanDropped(MetricTags.DropReason.OverfullBuffer, spans.Count);
+        TelemetryFactory.Metrics.RecordCountSpanDropped(MetricTags.DropReason.OverfullBuffer, count);
        TelemetryFactory.Metrics.RecordCountTraceChunkDropped(MetricTags.DropReason.OverfullBuffer);
 
        if (Volatile.Read(ref _traceMetricsEnabled))
@@ -539,7 +539,7 @@ private void DropTrace(ArraySegment<Span> spans)
            if (lease.Client is { } statsd)
            {
                statsd.Increment(TracerMetricNames.Queue.DroppedTraces);
-                statsd.Increment(TracerMetricNames.Queue.DroppedSpans, spans.Count);
+                statsd.Increment(TracerMetricNames.Queue.DroppedSpans, count);
            }
        }
    }
@@ -602,10 +602,10 @@ private void SerializeTracesLoop()
 
    private readonly struct WorkItem
    {
-        public readonly ArraySegment<Span> Trace;
+        public readonly SpanCollection Trace;
        public readonly Action? Callback;
 
-        public WorkItem(ArraySegment<Span> trace)
+        public WorkItem(in SpanCollection trace)
        {
            Trace = trace;
            Callback = null;

tracer/src/Datadog.Trace/Agent/IAgentWriter.cs

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ namespace Datadog.Trace.Agent
 {
     internal interface IAgentWriter
     {
-        void WriteTrace(ArraySegment<Span> trace);
+        void WriteTrace(in SpanCollection trace);
 
         Task<bool> Ping();
 

tracer/src/Datadog.Trace/Agent/IStatsAggregator.cs

Lines changed: 5 additions & 3 deletions

@@ -5,6 +5,7 @@
 
 using System;
 using System.Threading.Tasks;
+using Datadog.Trace.SourceGenerators;
 
 namespace Datadog.Trace.Agent
 {
@@ -30,22 +31,23 @@ internal interface IStatsAggregator
    /// Receives an array of spans and computes stats points for them.
    /// </summary>
    /// <param name="spans">The array of spans to process.</param>
+    [TestingOnly]
    void Add(params Span[] spans);
 
    /// <summary>
    /// Receives an array of spans and computes stats points for them.
    /// </summary>
    /// <param name="spans">The ArraySegment of spans to process.</param>
-    void AddRange(ArraySegment<Span> spans);
+    void AddRange(in SpanCollection spans);
 
    /// <summary>
    /// Runs a series of samplers over the entire trace chunk
    /// </summary>
    /// <param name="spans">The trace chunk to sample</param>
    /// <returns>True if the trace chunk should be sampled, false otherwise.</returns>
-    bool ShouldKeepTrace(ArraySegment<Span> spans);
+    bool ShouldKeepTrace(in SpanCollection spans);
 
-    ArraySegment<Span> ProcessTrace(ArraySegment<Span> trace);
+    SpanCollection ProcessTrace(in SpanCollection trace);
 
    Task DisposeAsync();
 }

tracer/src/Datadog.Trace/Agent/MessagePack/TraceChunkModel.cs

Lines changed: 33 additions & 15 deletions

@@ -9,6 +9,7 @@
 using System.Collections.Generic;
 using System.Threading;
 using Datadog.Trace.Configuration;
+using Datadog.Trace.SourceGenerators;
 using Datadog.Trace.Tagging;
 
 namespace Datadog.Trace.Agent.MessagePack;
@@ -20,7 +21,7 @@ namespace Datadog.Trace.Agent.MessagePack;
 internal readonly struct TraceChunkModel
 {
     // for small trace chunks, use the ArraySegment<Span> copy directly, no heap allocations
-    private readonly ArraySegment<Span> _spans;
+    private readonly SpanCollection _spans;
 
     // for large trace chunks, use a HashSet<ulong> instead of iterating the array.
     // there are 3 possible states:
@@ -75,8 +76,8 @@ internal readonly struct TraceChunkModel
    /// <param name="spans">The spans that will be within this <see cref="TraceChunkModel"/>.</param>
    /// <param name="samplingPriority">Optional sampling priority to override the <see cref="TraceContext"/> sampling priority.</param>
    /// <param name="isFirstChunkInPayload">Indicates if this is the first trace chunk being written to the output buffer.</param>
-    public TraceChunkModel(in ArraySegment<Span> spans, int? samplingPriority = null, bool isFirstChunkInPayload = false)
-        : this(spans, TraceContext.GetTraceContext(spans), samplingPriority, isFirstChunkInPayload)
+    public TraceChunkModel(in SpanCollection spans, int? samplingPriority = null, bool isFirstChunkInPayload = false)
+        : this(in spans, TraceContext.GetTraceContext(in spans), samplingPriority, isFirstChunkInPayload)
    {
        // since all we have is an array of spans, use the trace context from the first span
        // to get the other values we need (sampling priority, origin, trace tags, etc) for now.
@@ -85,8 +86,8 @@ public TraceChunkModel(in ArraySegment<Span> spans, int? samplingPriority = null
    }
 
    // used only to chain constructors
-    private TraceChunkModel(in ArraySegment<Span> spans, TraceContext? traceContext, int? samplingPriority, bool isFirstChunkInPayload)
-        : this(spans, traceContext?.RootSpan)
+    private TraceChunkModel(in SpanCollection spans, TraceContext? traceContext, int? samplingPriority, bool isFirstChunkInPayload)
+        : this(in spans, traceContext?.RootSpan)
    {
        // sampling decision override takes precedence over TraceContext.SamplingPriority
        SamplingPriority = samplingPriority;
@@ -128,8 +129,8 @@ private TraceChunkModel(in ArraySegment<Span> spans, TraceContext? traceContext,
        }
    }
 
-    // used in tests
-    internal TraceChunkModel(in ArraySegment<Span> spans, Span? localRootSpan)
+    [TestingAndPrivateOnly]
+    internal TraceChunkModel(in SpanCollection spans, Span? localRootSpan)
    {
        _spans = spans;
 
@@ -161,7 +162,7 @@ public SpanModel GetSpanModel(int spanIndex)
            ThrowHelper.ThrowArgumentOutOfRangeException(nameof(spanIndex));
        }
 
-        var span = _spans.Array![_spans.Offset + spanIndex];
+        var span = _spans[spanIndex];
        var parentId = span.Context.ParentId ?? 0;
        bool isLocalRoot = parentId is 0 || span.SpanId == LocalRootSpanId;
        bool isFirstSpan = spanIndex == 0;
@@ -225,9 +226,9 @@ private bool Contains(ulong spanId, int startIndex)
 
        if (hashSet.Count == 0)
        {
-            for (var i = 0; i < _spans.Count; i++)
+            foreach (var span in _spans)
            {
-                hashSet.Add(_spans.Array![_spans.Offset + i].SpanId);
+                hashSet.Add(span.SpanId);
            }
        }
 
@@ -238,20 +239,37 @@ private bool Contains(ulong spanId, int startIndex)
    }
 
    /// <summary>
-    /// Searches for the specified spanId by iteration the array of spans.
+    /// Searches for the specified spanId by iterating the array of spans.
    /// </summary>
    private int IndexOf(ulong spanId, int startIndex)
    {
        // wrap around the end of the array
-        if (startIndex >= _spans.Count)
+        var count = _spans.Count;
+        if (count == 0)
+        {
+            return -1;
+        }
+
+        if (startIndex >= count)
        {
            startIndex = 0;
        }
 
+        if (count == 1)
+        {
+            return _spans[0].SpanId == spanId ? 0 : -1;
+        }
+
+        if (_spans.TryGetArray() is not { } array)
+        {
+            // Shouldn't be possible, because we handle 0 and 1 spans above
+            return -1;
+        }
+
        // iterate over the span array starting at the specified index + 1
-        for (var i = startIndex; i < _spans.Count; i++)
+        for (var i = startIndex; i < count; i++)
        {
-            if (spanId == _spans.Array![_spans.Offset + i].SpanId)
+            if (spanId == array.Array![array.Offset + i].SpanId)
            {
                return i;
            }
@@ -260,7 +278,7 @@ private int IndexOf(ulong spanId, int startIndex)
        // if not found above, wrap around to the beginning to search the rest of the array
        for (var i = 0; i < startIndex; i++)
        {
-            if (spanId == _spans.Array![_spans.Offset + i].SpanId)
+            if (spanId == array.Array![array.Offset + i].SpanId)
            {
                return i;
            }

tracer/src/Datadog.Trace/Agent/NullStatsAggregator.cs

Lines changed: 3 additions & 3 deletions

@@ -16,13 +16,13 @@ public void Add(params Span[] spans)
     {
     }
 
-    public void AddRange(ArraySegment<Span> spans)
+    public void AddRange(in SpanCollection spans)
     {
     }
 
-    public bool ShouldKeepTrace(ArraySegment<Span> spans) => true;
+    public bool ShouldKeepTrace(in SpanCollection spans) => true;
 
-    public ArraySegment<Span> ProcessTrace(ArraySegment<Span> trace) => trace;
+    public SpanCollection ProcessTrace(in SpanCollection trace) => trace;
 
     public Task DisposeAsync()
     {

tracer/src/Datadog.Trace/Agent/SpanBuffer.cs

Lines changed: 2 additions & 2 deletions

@@ -73,7 +73,7 @@ public ArraySegment<byte> Data
     // For tests only
     internal bool IsEmpty => !_locked && !IsFull && TraceCount == 0 && SpanCount == 0 && _offset == HeaderSize;
 
-    public WriteStatus TryWrite(ArraySegment<Span> spans, ref byte[] temporaryBuffer, int? samplingPriority = null)
+    public WriteStatus TryWrite(in SpanCollection spans, ref byte[] temporaryBuffer, int? samplingPriority = null)
     {
         bool lockTaken = false;
 
@@ -91,7 +91,7 @@ public WriteStatus TryWrite(ArraySegment<Span> spans, ref byte[] temporaryBuffer
        // to get the other values we need (sampling priority, origin, trace tags, etc) for now.
        // the idea is that as we refactor further, we can pass more than just the spans,
        // and these values can come directly from the trace context.
-        var traceChunk = new TraceChunkModel(spans, samplingPriority, isFirstChunkInPayload: TraceCount == 0);
+        var traceChunk = new TraceChunkModel(in spans, samplingPriority, isFirstChunkInPayload: TraceCount == 0);
 
        // We don't know what the serialized size of the payload will be,
        // so we need to write to a temporary buffer first
