Commit 32aa893

Move EDOT Troubleshooting docs (#2035)

This moves EDOT Troubleshooting docs to `docs-content`. After checking with @elastic/docs-engineering, it was decided that this was the only feasible way to have the docs within the Troubleshooting nav without significant effort.

1 parent abbec06 commit 32aa893

12 files changed, +900 -0 lines changed

docset.yml

Lines changed: 7 additions & 0 deletions
```diff
@@ -37,6 +37,11 @@ cross_links:
   - ecs-logging-ruby
   - eland
   - elastic-serverless-forwarder
+  - elastic-otel-dotnet
+  - elastic-otel-java
+  - elastic-otel-node
+  - elastic-otel-php
+  - elastic-otel-python
   - elasticsearch
   - elasticsearch-hadoop
   - elasticsearch-java
@@ -78,6 +83,8 @@ subs:
   ece: "Elastic Cloud Enterprise"
   eck: "Elastic Cloud on Kubernetes"
   edot: "Elastic Distribution of OpenTelemetry"
+  motlp: "Elastic Cloud Managed OTLP Endpoint"
+  edot-cf: "EDOT Cloud Forwarder"
   serverless-full: "Elastic Cloud Serverless"
   serverless-short: "Serverless"
   es-serverless: "Elasticsearch Serverless"
```
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
navigation_title: Collector out of memory
description: Diagnose and resolve out-of-memory issues in the EDOT Collector using Go's Performance Profiler.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_collector: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-collector
---

# Troubleshoot an out-of-memory EDOT Collector

If your EDOT Collector pods terminate with an `OOMKilled` status, this usually indicates sustained memory pressure or a memory leak introduced by a regression or a bug. You can use the Performance Profiler (`pprof`) extension to collect and analyze memory profiles, helping you identify the root cause of the issue.

## Symptoms

These symptoms typically indicate that the EDOT Collector is experiencing a memory-related failure:

- EDOT Collector pod restarts with an `OOMKilled` status in Kubernetes.
- Memory usage steadily increases before the crash.
- The Collector's logs don't show clear errors before termination.
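
A quick way to confirm the OOM condition is to check the pod's restart count and last state. This is a minimal sketch; the namespace is illustrative:

```bash
# List Collector pods and check the RESTARTS column (namespace is illustrative)
kubectl get pods -n opentelemetry

# Inspect a restarting pod: look for "Last State: Terminated" with "Reason: OOMKilled"
kubectl describe pod <collector-pod-name> -n opentelemetry
```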

## Resolution

Turn on runtime profiling using the `pprof` extension and then gather memory heap profiles from the affected pod:

::::::{stepper}

:::::{step} Enable `pprof` in the Collector

Edit the EDOT Collector DaemonSet configuration and include the `pprof` extension:

```yaml
exporters:
  ...
processors:
  ...
receivers:
  ...
extensions:
  pprof:

service:
  extensions:
    - pprof
    - ...
  pipelines:
    metrics:
      receivers: [ ... ]
      processors: [ ... ]
      exporters: [ ... ]
```

Restart the Collector after applying these changes. When the DaemonSet is deployed again, identify the pod that is being restarted.
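
For example, one way to watch the rollout and find the affected pod; the DaemonSet name and namespace are illustrative:

```bash
# Wait for the updated DaemonSet to roll out (name and namespace are illustrative)
kubectl rollout status daemonset/<edot-collector> -n opentelemetry

# Watch the pods; a growing RESTARTS count points to the affected pod
kubectl get pods -n opentelemetry -w
```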
:::::

:::::{step} Access the affected pod and collect a heap dump

When a pod starts exhibiting high memory usage or restarts due to OOM, run the following to enter a debug shell:

```console
kubectl debug -it <collector-pod-name> --image=ubuntu:latest
```

In the debug container:

```console
apt update
apt install -y curl
curl http://localhost:1777/debug/pprof/heap > heap.out
```
:::::

:::::{step} Copy the heap file from the pod

From your local machine, copy the heap file using:

```bash
kubectl cp <collector-pod-name>:heap.out ./heap.out -c <debug-container-name>
```

::::{note}
Replace `<debug-container-name>` with the name assigned to the debug container. Without the `-c` flag, Kubernetes shows the list of available containers.
::::
:::::

:::::{step} Convert the heap profile for analysis

You can now generate a visual representation, for example a PNG:

```bash
go tool pprof -png heap.out > heap.png
```
:::::
::::::

## Best practices

To improve the effectiveness of memory diagnostics and reduce investigation time, consider the following:

- Collect multiple heap profiles over time (for example, every few minutes) to observe memory trends before the crash.
- Automate heap profile collection at intervals to observe trends over time (see the sketch after this list).
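
One way to automate this from a workstation is a small loop over `kubectl port-forward`. This is a minimal sketch, assuming the `pprof` extension listens on its default port `1777` as in the resolution steps above; the pod name, namespace, and interval are illustrative:

```bash
#!/usr/bin/env bash
# Collect a heap profile every 5 minutes through a local port-forward (illustrative values).
POD="<collector-pod-name>"
NAMESPACE="opentelemetry"
INTERVAL=300

# Forward the pprof port from the pod to the local machine in the background.
kubectl port-forward -n "$NAMESPACE" "pod/$POD" 1777:1777 &
PF_PID=$!
trap 'kill "$PF_PID"' EXIT
sleep 2  # give the port-forward a moment to establish

while true; do
  ts="$(date +%Y%m%d-%H%M%S)"
  # Store each profile locally with a timestamp so trends can be compared later.
  curl -s http://localhost:1777/debug/pprof/heap > "heap-${ts}.out"
  sleep "$INTERVAL"
done
```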

## Resources

- [Go's pprof documentation](https://pkg.go.dev/net/http/pprof)
- [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/#performance-profiler-pprof)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
---
navigation_title: EDOT Collector
description: Troubleshooting common issues with the EDOT Collector.
applies_to:
  stack:
  serverless:
    observability:
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-collector
---

# Troubleshoot the EDOT Collector

Perform these checks when troubleshooting common Collector issues (a few are illustrated in the sketch after this list):

* Check logs: Review the Collector's logs for error messages.
* Validate configuration: Use the `--dry-run` option to test configurations.
* Enable debug logging: Run the Collector with `--log-level=debug` for detailed logs.
* Check service status: Ensure the Collector is running with `systemctl status <collector-service>` (Linux) or `tasklist` (Windows).
* Test connectivity: Use `telnet <endpoint> <port>` or `curl` to verify backend availability.
* Check open ports: Run `netstat -tulnp` or `lsof -i` to confirm the Collector is listening.
* Monitor resource usage: Use `top`/`htop` (Linux) or Task Manager (Windows) to check CPU and memory.
* Validate exporters: Ensure exporters are properly configured and reachable.
* Verify pipelines: Use `otelctl diagnose` (if available) to check pipeline health.
* Check permissions: Ensure the Collector has the right file and network permissions.
* Review recent changes: Roll back recent config updates if the issue started after changes.
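
For example, the service, port, and connectivity checks might look like this on Linux; the service name, port, and endpoint are illustrative placeholders:

```bash
# Check that the Collector service is running (service name is illustrative)
systemctl status <collector-service>

# Confirm the Collector is listening on its receiver port (4317 is the default OTLP/gRPC port)
lsof -i :4317

# Verify that the backend endpoint is reachable (endpoint and port are placeholders)
curl -v https://<endpoint>:<port>
```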

For in-depth details on troubleshooting, refer to the [OpenTelemetry Collector troubleshooting documentation](https://opentelemetry.io/docs/collector/troubleshooting/).
Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
---
navigation_title: EDOT .NET
description: Use the information in this section to troubleshoot common problems affecting the {{edot}} .NET.
applies_to:
  stack:
  serverless:
    observability:
  product:
    edot_dotnet: ga
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT .NET SDK

Use the information in this section to troubleshoot common problems. As a first step, make sure your stack is compatible with the [supported technologies](opentelemetry://reference/edot-sdks/dotnet/supported-technologies.md) for EDOT .NET and the OpenTelemetry SDK.

If you have an Elastic support contract, create a ticket in the [Elastic Support portal](https://support.elastic.co/customers/s/login/). If you don't, post in the [APM discuss forum](https://discuss.elastic.co/c/apm) or [open a GitHub issue](https://github.com/elastic/elastic-otel-dotnet/issues).

## Obtain EDOT .NET diagnostic logs

For most problems, such as when you don't see data in your Elastic Observability backend, first check the EDOT .NET logs. These logs show initialization details and OpenTelemetry SDK events. If you don't see any warnings or errors in the EDOT .NET logs, switch the log level to `Trace` to investigate further.

The {{edot}} .NET includes built-in diagnostic logging. You can direct logs to a file, STDOUT, or, in common scenarios, an `ILogger` instance. EDOT .NET also observes the built-in diagnostics events from the upstream OpenTelemetry SDK and includes those in its logging output. You can collect the log output and use it to diagnose issues locally during development or when working with Elastic support channels.

## ASP.NET Core (generic host) logging integration

When you build applications based on the generic host, such as those created by the [ASP.NET Core](https://learn.microsoft.com/aspnet/core/introduction-to-aspnet-core) and [worker service](https://learn.microsoft.com/dotnet/core/extensions/workers) templates, the {{edot}} .NET tries to automatically register with the built-in logging components when you use the `IHostApplicationBuilder.AddElasticOpenTelemetry` extension method to register EDOT .NET.

```csharp
var builder = WebApplication.CreateBuilder(args);
builder.AddElasticOpenTelemetry();
```

In this scenario, EDOT .NET tries to access an available `ILoggerFactory` and create an `ILogger`, logging to the event category `Elastic.OpenTelemetry`. EDOT .NET registers this as the additional logger for its diagnostics unless you have already configured a user-provided `ILogger`. This ensures that EDOT .NET and OpenTelemetry SDK logs are written to your application's configured logging providers. In ASP.NET Core, this includes the console logging provider and results in logs such as the following:

```
info: Elastic.OpenTelemetry[0]
      Elastic Distribution of OpenTelemetry (EDOT) .NET: 1.0.0
info: Elastic.OpenTelemetry[0]
      EDOT log file: <disabled>
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: https://localhost:7295
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://localhost:5247
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Development
```

In the preceding log output, informational level logging is enabled as the default for this application. You can control the output by configuring the log levels.

### Configuring the log level

You can [configure](https://learn.microsoft.com/en-us/dotnet/core/extensions/logging?tabs=command-line#configure-logging) logs sent to the integrated `Microsoft.Extensions.Logging` library in several ways. A common choice is to use the `appsettings.json` file to configure log-level filters for specific categories.

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning",
      "Elastic.OpenTelemetry": "Warning"
    }
  },
  "AllowedHosts": "*"
}
```

In the preceding code, you have filtered `Elastic.OpenTelemetry` to only emit log entries when they have the `Warning` log level or a higher severity. This overrides the `Default` configuration of `Information`.

## Enable global file logging

Integrated logging is helpful because it requires little to no setup. However, the logging infrastructure is not present by default in some application types, such as console applications. EDOT .NET also offers a global file logging feature, which is the easiest way for you to get diagnostics and debug information. You must enable file logging when you work with Elastic support, as trace logs will be requested.

Specify at least one of the following environment variables to make sure that EDOT .NET logs to a file.

`OTEL_LOG_LEVEL` _(optional)_:
Set the log level at which the profiler should log. Valid values are:

* trace
* debug
* information
* warning
* error
* none

The default value is `information`. More verbose log levels like `trace` and `debug` can affect the runtime performance of profiler auto instrumentation, so use them _only_ for diagnostic purposes.

:::{note}
If you don't explicitly set `ELASTIC_OTEL_LOG_TARGETS` to include `file`, global file logging will only be enabled when you set `OTEL_LOG_LEVEL` to `trace` or `debug`.
:::

`OTEL_DOTNET_AUTO_LOG_DIRECTORY` _(optional)_:
Set the directory in which to write log files. If you don't set this, the default is:

* `%USERPROFILE%\AppData\Roaming\elastic\elastic-otel-dotnet` on Windows
* `/var/log/elastic/elastic-otel-dotnet` on Linux
* `~/Library/Application Support/elastic/elastic-otel-dotnet` on macOS

::::{important}
Make sure the user account under which the profiler process runs has permission to write to the destination log directory. Specifically, when you run on IIS, ensure that the [AppPool identity](https://learn.microsoft.com/en-us/iis/manage/configuring-security/application-pool-identities) has write permissions in the target directory.
::::

`ELASTIC_OTEL_LOG_TARGETS` _(optional)_:
A semicolon-separated list of targets for profiler logs. Valid values are:

* file
* stdout
* none

The default value is `file` if you set `OTEL_DOTNET_AUTO_LOG_DIRECTORY` or set `OTEL_LOG_LEVEL` to `trace` or `debug`.
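
For example, to capture trace-level file logs from an application on Linux or macOS, you might set the variables before starting it. This is a minimal sketch; the log directory is illustrative:

```bash
# Enable verbose EDOT .NET diagnostics and write them to a file (directory is illustrative)
export OTEL_LOG_LEVEL=trace
export OTEL_DOTNET_AUTO_LOG_DIRECTORY=/tmp/edot-logs
export ELASTIC_OTEL_LOG_TARGETS=file

# Start the application as usual; the log file is written to the directory above
dotnet run
```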

## Advanced troubleshooting

### Diagnosing initialization or bootstrap issues

If EDOT .NET fails before fully bootstrapping its internal components, it won't generate a log file. In such circumstances, you can provide an additional logger for diagnostic purposes. Alternatively, you can enable the `stdout` log target.

#### Providing an additional application logger

You can provide an additional `ILogger` that EDOT .NET will use to log pre-bootstrap events by creating an instance of `ElasticOpenTelemetryOptions`.

```csharp
using Elastic.OpenTelemetry;
using Microsoft.Extensions.Logging;
using OpenTelemetry;

using ILoggerFactory loggerFactory = LoggerFactory.Create(static builder =>
{
    builder
        .AddFilter("Elastic.OpenTelemetry", LogLevel.Trace)
        .AddConsole();
});

ILogger logger = loggerFactory.CreateLogger("EDOT");

var options = new ElasticOpenTelemetryOptions
{
    AdditionalLogger = logger
};

using var sdk = OpenTelemetrySdk.Create(builder => builder
    .WithElasticDefaults(options));
```

This example adds the console logging provider, but you can include any provider here. To use this sample code, add a dependency on the `Microsoft.Extensions.Logging.Console` [NuGet package](https://www.nuget.org/packages/microsoft.extensions.logging.console).

You create and configure an `ILoggerFactory`. In this example, you configure the `Elastic.OpenTelemetry` category to capture trace logs, which is the most verbose option. This is the best choice when you diagnose initialization issues.

You use the `ILoggerFactory` to create an `ILogger`, which you then assign to the `ElasticOpenTelemetryOptions.AdditionalLogger` property. Once you pass the `ElasticOpenTelemetryOptions` into the `WithElasticDefaults` method, the provided logger can capture bootstrap logs.

To simplify the preceding code, you can also configure the `ElasticOpenTelemetryOptions` with an `ILoggerFactory` instance that EDOT .NET can use to create its own logger.

```csharp
using var loggerFactory = LoggerFactory.Create(static builder =>
{
    builder
        .AddFilter("Elastic.OpenTelemetry", LogLevel.Debug)
        .AddConsole();
});

var options = new ElasticOpenTelemetryOptions
{
    AdditionalLoggerFactory = loggerFactory
};

using var sdk = OpenTelemetrySdk.Create(builder => builder
    .WithElasticDefaults(options));
```

## Known issues

The following known issues affect EDOT .NET.

### Missing log records

The upstream SDK currently does not [comply with the spec](https://github.com/open-telemetry/opentelemetry-dotnet/issues/4324) regarding the deduplication of attributes when exporting log records. When you create a log within multiple scopes, each scope may store information using the same logical key. In this situation, the exported data will have duplicated attributes.

You are most likely to see this when you log in the scope of a request and enable the `OpenTelemetryLoggerOptions.IncludeScopes` option. ASP.NET Core adds the `RequestId` to multiple scopes. We recommend that you don't enable `IncludeScopes` until the SDK fixes this. When you use the EDOT Collector or the [{{motlp}}](opentelemetry://reference/motlp.md) in serverless, non-compliant log records will fail to be ingested.

EDOT .NET currently emits a warning if it detects that you use `IncludeScopes` in ASP.NET Core scenarios.

This can also happen even when you set `IncludeScopes` to `false`. The following code will also result in duplicate attributes and the potential for lost log records.

```csharp
Logger.LogInformation("Eat your {fruit} {fruit} {fruit}!", "apple", "banana", "mango");
```

To avoid this scenario, make sure each placeholder uses a unique name. For example:

```csharp
Logger.LogInformation("Eat your {fruit1} {fruit2} {fruit3}!", "apple", "banana", "mango");
```
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
---
navigation_title: EDOT SDKs
description: Troubleshoot issues with the EDOT SDKs using these guides.
applies_to:
  stack:
  serverless:
    observability:
products:
  - id: cloud-serverless
  - id: observability
  - id: edot-sdk
---

# Troubleshooting the EDOT SDKs

Find solutions to common issues with EDOT SDKs.

- [.NET](/troubleshoot/ingest/opentelemetry/edot-sdks/dotnet/index.md)
- [Java](/troubleshoot/ingest/opentelemetry/edot-sdks/java/index.md)
- [Node.js](/troubleshoot/ingest/opentelemetry/edot-sdks/nodejs/index.md)
- [PHP](/troubleshoot/ingest/opentelemetry/edot-sdks/php/index.md)
- [Python](/troubleshoot/ingest/opentelemetry/edot-sdks/python/index.md)
