-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
- Loading branch information
1 parent
6e2b6dc
commit df936e7
Showing
4 changed files
with
190 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# 🍜 Recipe: Scalable tail sampling | ||
|
||
This recipe shows how to prepare a scalable tail sampling pipeline. Tail sampling is a strategy that allows the decision to be made after a trace has had enough time to be completed, and has the ability to use trace-based information to determine whether the trace interesting or not. Because traces are kept in memory, we use a trace ID aware load balancer to consistently route spans belonging to the same trace to the same backing collectors. We have therefore two layers of collectors: one doing the load-balancing, and one doing the sampling. | ||
|
||
We are discarding the telemetry data that we are generating, as we are only interested in assessing this behavior by observing the Collector's metrics. | ||
|
||
**Note:** at this moment, not all metrics are being exported from the Collector using the new OpenTelemetry Metrics exporter. Until that is done, you might want to remove the `telemetry` section of the configuration files and scrape those metrics using a Prometheus-compatible scraper (like another OTel Collector instance with the Prometheus receiver). | ||
|
||
## 🧄 Ingredients | ||
|
||
- OpenTelemetry Operator, see the main [`README.md`](../README.md) for instructions | ||
- The `telemetrygen` tool, or any other application that is able to send OTLP data to our collector | ||
- The `otelcol-load-balancer.yaml` file from this directory | ||
- The `otelcol-sampling.yaml` file from this directory | ||
- A `GRAFANA_CLOUD_USER` environment variable, also known as Grafana Cloud instance ID, found under the instructions for "OpenTelemetry" on your Grafana Cloud stack. | ||
- A `GRAFANA_CLOUD_TOKEN` environment variable, which can be generated under the instructions for "OpenTelemetry" on your Grafana Cloud stack. | ||
- The endpoint for your stack | ||
|
||
## 🥣 Preparation | ||
|
||
1. Create and switch to a namespace for our recipe | ||
```terminal | ||
kubectl create ns scalable-tail-sampling | ||
kubens scalable-tail-sampling | ||
``` | ||
|
||
1. Generate a basic auth HTTP header | ||
```terminal | ||
echo -n "$GRAFANA_CLOUD_USER:$GRAFANA_CLOUD_TOKEN" | base64 -w0 | ||
``` | ||
|
||
2. Use the output from the command above as the value for the basic auth header on both `otelcol-sampling.yaml` and `otelcol-loadbalancer.yaml` | ||
|
||
3. On the same files, change the `endpoint` parameter for the `otlp` exporter within the `telemetry` node to point to your stack's endpoint, plus the path `/v1/traces` | ||
|
||
4. Install the OTel Collector custom resource | ||
```terminal | ||
kubectl apply -f otelcol-sampling.yaml | ||
kubectl apply -f otelcol-loadbalancer.yaml | ||
``` | ||
|
||
5. Open a port-forward to the Collector: | ||
```terminal | ||
kubectl port-forward svc/otelcol-loadbalancer-collector 4317 | ||
``` | ||
|
||
6. Send some telemetry to your Collector: those should all be sampled, given that they belong to VIP customers | ||
```terminal | ||
telemetrygen traces --traces 1_000 --otlp-insecure --otlp-attributes='recipe="scalable-tail-sampling"' --otlp-attributes='vip="true"' | ||
``` | ||
|
||
7. Open your Grafana instance, go to Explore, select the metrics datasource, and verify how many spans have been received by the load balancer instances by running the following query: `receiver_accepted_spans_total{job="otelcol-loadbalancer"}` | ||
|
||
8. Send some more telemetry to your Collector: only 10% of those should be sampled, as they are not from VIP customers | ||
```terminal | ||
telemetrygen traces --traces 1_000 --otlp-insecure --otlp-attributes='recipe="scalable-tail-sampling"' --otlp-attributes='vip="false"' | ||
``` | ||
|
||
9. Verify how many spans have been received by the sampling instances by running the following query: `receiver_accepted_spans_total{job="otelcol-sampling"}` | ||
|
||
|
||
## 😋 Executed last time with these versions | ||
|
||
The most recent execution of this recipe was done with these versions: | ||
|
||
- OpenTelemetry Operator v0.100.1 | ||
- OpenTelemetry Collector Contrib v0.101.0 | ||
- `telemetrygen` v0.101.0 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
apiVersion: opentelemetry.io/v1beta1 | ||
kind: OpenTelemetryCollector | ||
metadata: | ||
name: otelcol-loadbalancer | ||
spec: | ||
image: ghcr.io/jpkrohling/otelcol-distributions/otelcol-loadbalancer:0.101.7 | ||
config: | ||
receivers: | ||
otlp: | ||
protocols: | ||
grpc: {} | ||
|
||
exporters: | ||
loadbalancing: | ||
protocol: | ||
otlp: | ||
tls: | ||
insecure: true | ||
resolver: | ||
dns: | ||
hostname: otelcol-sampling-collector-headless | ||
|
||
service: | ||
extensions: [ ] | ||
pipelines: | ||
traces: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ loadbalancing ] | ||
logs: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ loadbalancing ] | ||
metrics: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ loadbalancing ] | ||
telemetry: | ||
metrics: | ||
level: detailed | ||
readers: | ||
- periodic: | ||
exporter: | ||
otlp: | ||
endpoint: https://otlp-gateway-prod-eu-west-2.grafana.net/otlp/v1/metrics | ||
protocol: http/protobuf | ||
headers: | ||
Authorization: "Basic ..." | ||
resource: | ||
"service.name": "otelcol-loadbalancer" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
apiVersion: opentelemetry.io/v1beta1 | ||
kind: OpenTelemetryCollector | ||
metadata: | ||
name: otelcol-sampling | ||
spec: | ||
image: ghcr.io/jpkrohling/otelcol-distributions/otelcol-sampling:0.101.7 | ||
replicas: 10 | ||
config: | ||
receivers: | ||
otlp: | ||
protocols: | ||
grpc: {} | ||
|
||
processors: | ||
tail_sampling: | ||
decision_wait: 1s | ||
num_traces: 50_000 | ||
expected_new_traces_per_sec: 500 | ||
policies: | ||
[ | ||
{ | ||
type: and, | ||
and: | ||
{ | ||
and_sub_policy: | ||
[ | ||
{ | ||
name: only-10-percent, | ||
type: probabilistic, | ||
probabilistic: { sampling_percentage: 10 }, | ||
}, | ||
{ | ||
name: vip, | ||
type: string_attribute, | ||
string_attribute: { key: vip, values: ["true"] }, | ||
}, | ||
], | ||
}, | ||
}, | ||
] | ||
|
||
exporters: | ||
nop: {} | ||
|
||
service: | ||
extensions: [ ] | ||
pipelines: | ||
traces: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ nop ] | ||
logs: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ nop ] | ||
metrics: | ||
receivers: [ otlp ] | ||
processors: [ ] | ||
exporters: [ nop ] | ||
telemetry: | ||
metrics: | ||
level: detailed | ||
readers: | ||
- periodic: | ||
exporter: | ||
otlp: | ||
endpoint: https://otlp-gateway-prod-eu-west-2.grafana.net/otlp/v1/metrics | ||
protocol: http/protobuf | ||
headers: | ||
Authorization: "Basic ..." | ||
resource: | ||
"service.name": "otelcol-sampling" |