[BUG] Datadog agent -7.57.0 tasks failing to start in AWS ECS. #29285

Open
ShivaSharm opened this issue Sep 12, 2024 · 5 comments

Comments

@ShivaSharm

ShivaSharm commented Sep 12, 2024

The Datadog Agent service (daemon) task fails to start with version 7.57.0 in AWS ECS.
It works fine with version 7.56.2.

--
Datadog Agent Logs:

2024-09-11 13:55:29 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:444 in CheckConnectivity) | HTTP connectivity successful
2024-09-11 13:55:29 UTC | CORE | WARN | (pkg/logs/launchers/integration/launcher.go:49 in NewLauncher) | Unable to make integrations logs directory: mkdir /opt/datadog-agent/run/integrations: read-only file system
2024-09-11 13:55:29 UTC | CORE | INFO | (pkg/logs/auditor/auditor.go:203 in recoverRegistry) | Could not find state file at "/opt/datadog-agent/run/registry.json", will start with default offsets
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7642b8f]
goroutine 585 [running]:
github.com/DataDog/datadog-agent/pkg/logs/launchers/integration.(*Launcher).run(0x0)
/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/logs/launchers/integration/launcher.go:78 +0x2f
created by github.com/DataDog/datadog-agent/pkg/logs/launchers/integration.(*Launcher).Start in goroutine 399
/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/logs/launchers/integration/launcher.go:66 +0x4f
AGENT EXITED WITH CODE 2, SIGNAL 0, KILLING CONTAINER
2024-09-11 13:55:29 UTC | SYS-PROBE | INFO | (cmd/system-probe/subcommands/run/command.go:162 in func2) | Received signal 'terminated', shutting down...
2024-09-11 13:55:29 UTC | PROCESS | INFO | (pkg/process/util/signal_nowindows.go:26 in HandleSignals) | Caught signal 'terminated'; terminating.
trace-agent exited with code 256, signal 15, restarting in 2 seconds
security-agent exited with code 256, signal 15, restarting in 2 seconds
2024-09-11 13:55:30 UTC | SYS-PROBE | INFO | (pkg/config/remote/client/client.go:423 in pollLoop) | retrying the first update of remote-config state (rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:5001: connect: connection refused")
2024-09-11 13:55:31 UTC | SYS-PROBE | INFO | (pkg/config/remote/client/client.go:423 in pollLoop) | retrying the first update of remote-config state (rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:5001: connect: connection refused")
system-probe exited with code 0, disabling
2024-09-11 13:56:27 UTC | PROCESS | ERROR | (pkg/process/checks/host_info.go:105 in getHostname) | failed to get hostname from grpc: cannot connect to datadog agent via grpc: context deadline exceeded
2024-09-11 13:56:27 UTC | PROCESS | INFO | (pkg/process/metadata/workloadmeta/extractor.go:84 in NewWorkloadMetaExtractor) | Instantiating a new WorkloadMetaExtractor
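
Reading the trace above: the `(0x0)` argument in `(*Launcher).run(0x0)` is the method receiver, so `run` appears to have been called on a nil `*Launcher`, right after the warning that the run directory could not be created on a read-only file system. The following is a minimal sketch of that failure mode, not the agent's actual code. It assumes a constructor that returns nil when `mkdir` fails; `launcher`, `newLauncher`, `readOnlyMkdir`, `safeRun`, and `panics` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// launcher is a hypothetical stand-in for the agent's integration Launcher.
type launcher struct {
	runPath string
}

// readOnlyMkdir simulates mkdir failing on a read-only file system.
func readOnlyMkdir(string) error {
	return errors.New("read-only file system")
}

// newLauncher returns nil when the run directory cannot be created.
// This behavior is an assumption inferred from the log, not confirmed by the source.
func newLauncher(runPath string, mkdir func(string) error) *launcher {
	if err := mkdir(runPath); err != nil {
		fmt.Printf("WARN: unable to make integrations logs directory: %v\n", err)
		return nil
	}
	return &launcher{runPath: runPath}
}

// run reads a field of its receiver, so calling it on a nil *launcher panics
// with "invalid memory address or nil pointer dereference".
func (l *launcher) run() string {
	return "watching " + l.runPath
}

// safeRun refuses to start a launcher that was never initialized.
func safeRun(l *launcher) (string, error) {
	if l == nil {
		return "", errors.New("launcher not initialized")
	}
	return l.run(), nil
}

// panics reports whether f panics, recovering so the program can continue.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	l := newLauncher("/opt/datadog-agent/run/integrations", readOnlyMkdir)
	fmt.Println("unguarded call panics:", panics(func() { l.run() }))
	_, err := safeRun(l)
	fmt.Println("guarded call returns:", err)
}
```

In the sketch, a nil check before starting the launcher turns the crash into an ordinary error. Whether the real fix takes that shape is not stated in this thread.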

@FlorentClarret
Member

Hi @ShivaSharm and thanks for opening this issue.

We're working on a fix for this and we will release it with Agent 7.57.2.

@mfarrokhnia

mfarrokhnia commented Nov 10, 2024

I am facing the same issue; however, version 7.57.2 did not solve it. @ShivaSharm, were you able to make it work? If so, which version did you use?

@rlacoduf

rlacoduf commented Jan 8, 2025

Hello, I had a similar, though slightly different, issue and managed to resolve it, so I’m sharing my experience in case it helps.

I run a Node.js + Express server packaged with Beanstalk. About 2-3 months ago, metrics and logs were visible, but traces were not.

After a lot of trial and error, I solved the problem by downgrading dd-trace in package.json from 5.30.0 to 5.22.0 and the Datadog Agent from 7.57.2 to 7.56.2.

I hope this helps!

@jpiazza35

jpiazza35 commented Jan 30, 2025

Hello, we are facing a similar issue in ECS Fargate.

The error we are currently seeing is:

`
2025-01-30 11:13:50 UTC | SYS-PROBE | INFO | (comp/core/workloadmeta/collectors/internal/remote/generic.go:171 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:50 UTC | SYS-PROBE | INFO | (pkg/config/remote/client/client.go:434 in pollLoop) | retrying the first update of remote-config state (rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:5001: connect: connection refused") | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:49 UTC | SYS-PROBE | INFO | (comp/core/tagger/impl-remote/remote.go:493 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:49 UTC | SYS-PROBE | INFO | (comp/core/workloadmeta/collectors/internal/remote/generic.go:171 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:49 UTC | SYS-PROBE | INFO | (pkg/config/remote/client/client.go:434 in pollLoop) | retrying the first update of remote-config state (rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:5001: connect: connection refused") | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:48 UTC | SYS-PROBE | INFO | (comp/core/tagger/impl-remote/remote.go:493 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:48 UTC | SYS-PROBE | INFO | (pkg/config/remote/client/client.go:434 in pollLoop) | retrying the first update of remote-config state (rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:5001: connect: connection refused") | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:47 UTC | SYS-PROBE | INFO | (comp/core/workloadmeta/collectors/internal/remote/generic.go:171 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)
2025-01-30 11:13:47 UTC | SYS-PROBE | INFO | (comp/core/tagger/impl-remote/remote.go:493 in func1) | unable to establish stream, will possibly retry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused" | ea80bc3aaef042a78efa6db37bb2b4b1 | datadog-agent | January 30, 2025 at 08:13 (UTC-3:00)`
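
A note on reading these lines: "connection refused" on `127.0.0.1:5001` means that nothing was listening on that port when the system-probe dialed it. In the first report in this thread, the same message appears immediately after the core agent process exited, so it is probably a symptom of the core agent not running rather than a root cause. As a rough check, a probe like the one below could be run from inside the task's network namespace. It is a sketch only: `probe` is a hypothetical helper, and the address is simply the one copied from the logs.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probe returns nil if a TCP listener accepts a connection at addr
// within the timeout, and the dial error otherwise.
func probe(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	// Address taken from the log lines above; adjust if your setup differs.
	addr := "127.0.0.1:5001"
	if err := probe(addr, 2*time.Second); err != nil {
		fmt.Printf("%s is not accepting connections: %v\n", addr, err)
		return
	}
	fmt.Printf("%s is accepting connections\n", addr)
}
```

If the probe fails while the agent container is reported as running, the core agent's own log output (lines tagged `CORE`) is the next place to look, since that is where the original crash in this issue showed up.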

@adilnaimi

I'm facing the same issue with ECS. I can't deploy a new version of my app because the datadog-agent keeps failing to start, which crashes the entire task:

desc = "transport: Error while dialing: dial tcp :5001: connect: connection refused"
