ExperimentFramework provides comprehensive telemetry capabilities to track experiment execution, measure performance, and integrate with observability platforms.
The framework supports three levels of telemetry:
- Built-in Logging - Benchmark and error logging decorators
- OpenTelemetry Integration - Distributed tracing with Activity API
- Custom Telemetry - Implement your own telemetry providers
The framework includes decorators for common telemetry scenarios that integrate with ILogger.
The benchmark decorator measures execution time for each experiment invocation.
var experiments = ExperimentFrameworkBuilder.Create()
.AddLogger(l => l.AddBenchmarks())
.Trial<IDatabase>(t => t
.UsingFeatureFlag("UseCloudDb")
.AddControl<LocalDatabase>("false")
.AddVariant<CloudDatabase>("true"));
services.AddExperimentFramework(experiments);info: ExperimentFramework.Benchmarks[0]
Experiment call: IDatabase.QueryAsync trial=true elapsedMs=42.3
The log includes:
- Interface name:
IDatabase - Method name:
QueryAsync - Trial key:
true - Elapsed time in milliseconds:
42.3
- Performance comparison between trials
- Identifying slow implementations
- Tracking performance regressions
- SLA monitoring
The error logging decorator captures exceptions before they propagate or trigger fallback.
var experiments = ExperimentFrameworkBuilder.Create()
.AddLogger(l => l.AddErrorLogging())
.Trial<IPaymentProcessor>(t => t
.UsingFeatureFlag("UseNewPaymentProvider")
.AddControl<StripePayment>("false")
.AddVariant<NewPaymentProvider>("true")
.OnErrorRedirectAndReplayDefault());
services.AddExperimentFramework(experiments);error: ExperimentFramework.ErrorLogging[0]
Experiment error: IPaymentProcessor.ChargeAsync trial=true
System.InvalidOperationException: Connection timeout
at NewPaymentProvider.ChargeAsync(Decimal amount)
at ExperimentFramework.Decorators.ErrorLoggingDecorator.InvokeAsync(...)
The log includes:
- Interface and method name
- Trial key that failed
- Full exception with stack trace
- Monitoring condition failure rates
- Debugging experimental implementations
- Alerting on error thresholds
- Understanding fallback behavior
You can enable both decorators simultaneously:
var experiments = ExperimentFrameworkBuilder.Create()
.AddLogger(l => l
.AddBenchmarks()
.AddErrorLogging())
.Trial<IDatabase>(t => t
.UsingFeatureFlag("UseCloudDb")
.AddControl<LocalDatabase>("false")
.AddVariant<CloudDatabase>("true")
.OnErrorRedirectAndReplayDefault());Decorators execute in registration order, so benchmarks will include the time spent in error logging.
The framework integrates with OpenTelemetry to provide distributed tracing for experiments.
OpenTelemetry support uses the System.Diagnostics.Activity API, which is part of .NET. No additional packages are required for basic functionality.
To export traces to an observability platform, install the OpenTelemetry SDK:
dotnet add package OpenTelemetry.Exporter.Console
dotnet add package OpenTelemetry.Exporter.Jaeger
dotnet add package OpenTelemetry.Exporter.ZipkinEnable OpenTelemetry tracking:
var experiments = ExperimentFrameworkBuilder.Create()
.Trial<IDatabase>(t => t
.UsingFeatureFlag("UseCloudDb")
.AddControl<LocalDatabase>("false")
.AddVariant<CloudDatabase>("true"));
services.AddExperimentFramework(experiments);
services.AddOpenTelemetryExperimentTracking();Configure the OpenTelemetry SDK to listen to the ExperimentFramework activity source:
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
services.AddOpenTelemetry()
.WithTracing(tracing => tracing
.SetResourceBuilder(ResourceBuilder.CreateDefault()
.AddService("MyApplication"))
.AddSource("ExperimentFramework") // Listen to experiment activities
.AddConsoleExporter()
.AddJaegerExporter());Each experiment invocation creates an activity with these tags:
| Tag | Description | Example |
|---|---|---|
experiment.service |
Interface type name | IDatabase |
experiment.method |
Method being invoked | QueryAsync |
experiment.selector |
Selector name (feature flag/config key) | UseCloudDb |
experiment.trial.selected |
Initially selected condition key | true |
experiment.trial.candidates |
All available condition keys | false,true |
experiment.outcome |
Success or failure | success |
experiment.fallback |
Fallback condition key (if applicable) | false |
experiment.variant |
Variant name (for variant mode) | variant-a |
experiment.variant.source |
How variant was determined | variantManager |
Activities are named using this pattern:
Experiment IDatabase.QueryAsync
This makes it easy to filter and group experiment traces in observability platforms.
Span: Experiment IDatabase.QueryAsync
experiment.service: IDatabase
experiment.method: QueryAsync
experiment.selector: UseCloudDb
experiment.trial.selected: true
experiment.trial.candidates: false,true
experiment.outcome: success
Duration: 42.3ms
When exported to Jaeger, traces appear hierarchically:
HTTP Request
└─ DataService.GetCustomers
└─ Experiment IDatabase.QueryAsync
└─ CloudDatabase.QueryAsync
└─ SQL Query
If no ActivityListener is attached to the ExperimentFramework source, activity creation is a no-op with negligible overhead (typically < 50ns).
Implement the IExperimentTelemetry interface to create custom telemetry providers.
public interface IExperimentTelemetry
{
IExperimentTelemetryScope StartInvocation(
Type serviceType,
string methodName,
string selectorName,
string trialKey,
IReadOnlyList<string> candidateKeys);
}
public interface IExperimentTelemetryScope : IDisposable
{
void RecordSuccess();
void RecordFailure(Exception exception);
void RecordFallback(string fallbackKey);
void RecordVariant(string variantName, string variantSource);
}Create a custom telemetry provider that records metrics:
using System.Diagnostics.Metrics;
public class MetricsExperimentTelemetry : IExperimentTelemetry
{
private static readonly Meter Meter = new("ExperimentFramework", "1.0.0");
private static readonly Counter<long> InvocationCounter = Meter.CreateCounter<long>("experiment.invocations");
private static readonly Histogram<double> DurationHistogram = Meter.CreateHistogram<double>("experiment.duration");
public IExperimentTelemetryScope StartInvocation(
Type serviceType,
string methodName,
string selectorName,
string trialKey,
IReadOnlyList<string> candidateKeys)
{
return new MetricsScope(serviceType, methodName, trialKey);
}
private class MetricsScope : IExperimentTelemetryScope
{
private readonly Type _serviceType;
private readonly string _methodName;
private readonly string _trialKey;
private readonly long _startTimestamp;
private string _outcome = "success";
public MetricsScope(Type serviceType, string methodName, string trialKey)
{
_serviceType = serviceType;
_methodName = methodName;
_trialKey = trialKey;
_startTimestamp = Stopwatch.GetTimestamp();
}
public void RecordSuccess()
{
_outcome = "success";
}
public void RecordFailure(Exception exception)
{
_outcome = "failure";
}
public void RecordFallback(string fallbackKey)
{
InvocationCounter.Add(1, new KeyValuePair<string, object?>("event", "fallback"));
}
public void RecordVariant(string variantName, string variantSource)
{
// Can record variant metrics if needed
}
public void Dispose()
{
var elapsed = Stopwatch.GetElapsedTime(_startTimestamp).TotalMilliseconds;
var tags = new TagList
{
{ "service", _serviceType.Name },
{ "method", _methodName },
{ "trial", _trialKey },
{ "outcome", _outcome }
};
InvocationCounter.Add(1, tags);
DurationHistogram.Record(elapsed, tags);
}
}
}Register the custom telemetry provider:
services.AddSingleton<IExperimentTelemetry, MetricsExperimentTelemetry>();Integrate with Application Insights:
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
public class ApplicationInsightsTelemetry : IExperimentTelemetry
{
private readonly TelemetryClient _telemetryClient;
public ApplicationInsightsTelemetry(TelemetryClient telemetryClient)
{
_telemetryClient = telemetryClient;
}
public IExperimentTelemetryScope StartInvocation(
Type serviceType,
string methodName,
string selectorName,
string trialKey,
IReadOnlyList<string> candidateKeys)
{
return new AppInsightsScope(_telemetryClient, serviceType, methodName, trialKey);
}
private class AppInsightsScope : IExperimentTelemetryScope
{
private readonly TelemetryClient _client;
private readonly DependencyTelemetry _telemetry;
public AppInsightsScope(TelemetryClient client, Type serviceType, string methodName, string trialKey)
{
_client = client;
_telemetry = new DependencyTelemetry
{
Type = "Experiment",
Name = $"{serviceType.Name}.{methodName}",
Data = trialKey
};
_telemetry.Properties["service"] = serviceType.Name;
_telemetry.Properties["method"] = methodName;
_telemetry.Properties["trial"] = trialKey;
}
public void RecordSuccess()
{
_telemetry.Success = true;
}
public void RecordFailure(Exception exception)
{
_telemetry.Success = false;
_client.TrackException(exception);
}
public void RecordFallback(string fallbackKey)
{
_telemetry.Properties["fallback"] = fallbackKey;
}
public void RecordVariant(string variantName, string variantSource)
{
_telemetry.Properties["variant"] = variantName;
}
public void Dispose()
{
_client.TrackDependency(_telemetry);
}
}
}Be mindful of tag cardinality in metrics:
// Good: Low cardinality
tags.Add("trial", trialKey); // Limited number of values
// Bad: High cardinality
tags.Add("user_id", userId); // Millions of unique valuesFor high-traffic experiments, consider sampling:
public class SamplingTelemetry : IExperimentTelemetry
{
private readonly IExperimentTelemetry _inner;
private readonly double _sampleRate;
private readonly Random _random = new();
public SamplingTelemetry(IExperimentTelemetry inner, double sampleRate)
{
_inner = inner;
_sampleRate = sampleRate;
}
public IExperimentTelemetryScope StartInvocation(...)
{
if (_random.NextDouble() < _sampleRate)
{
return _inner.StartInvocation(...);
}
return NoopScope.Instance;
}
}Use multiple telemetry mechanisms together:
// Logs for debugging
.AddLogger(l => l.AddBenchmarks().AddErrorLogging())
// Traces for distributed tracing
services.AddOpenTelemetryExperimentTracking();
// Custom metrics for dashboards
services.AddSingleton<IExperimentTelemetry, MetricsExperimentTelemetry>();Track how often fallback occurs:
-- Query for fallback rate in your metrics system
SELECT
service,
COUNT(CASE WHEN event = 'fallback' THEN 1 END) / COUNT(*) AS fallback_rate
FROM experiment_invocations
GROUP BY serviceAlert when fallback rate exceeds threshold:
IF fallback_rate > 0.05 THEN ALERT "Experiment fallback rate too high"
Link experiment telemetry to business outcomes:
public async Task ProcessOrderAsync(Order order)
{
// Experiment executes and logs telemetry
var payment = await _paymentProcessor.ChargeAsync(order.Total);
// Log business outcome
_metrics.Record("order.value", order.Total, new TagList
{
{ "payment_trial", payment.ProviderName }
});
}This allows correlation:
- Did the new payment provider increase successful transactions?
- What was the revenue impact of the experimental recommendation engine?
Use benchmark logs to identify performance issues:
# Filter logs for slow invocations
cat application.log | grep "Experiment call" | grep "elapsedMs" | awk '$NF > 1000'Analyze error logs to find common failure causes:
# Count failure types
cat application.log | grep "Experiment error" | awk '{print $NF}' | sort | uniq -cUse OpenTelemetry traces to understand request flow:
- Filter traces by experiment service
- Compare durations between conditions
- Identify downstream impacts
- Spot error propagation patterns
- Naming Conventions - Customize selector names for telemetry
- Advanced Topics - Build custom decorators and telemetry providers
- Samples - See complete telemetry implementations