Agent does not start with read-only file system #15127

Open
kayman-mk opened this issue Jan 18, 2023 · 30 comments

Comments

@kayman-mk

Our security team asked me to make the root file system of all containers read-only. However, I found that the Datadog agent dies on startup and cannot run on a read-only file system.

Log output

2023-01-18T09:21:53.820+01:00 | [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
2023-01-18T09:21:53.906+01:00 | [s6-init] ensuring user provided files have correct perms...exited 0.
2023-01-18T09:21:53.945+01:00 | [fix-attrs.d] applying ownership & permissions fixes...
2023-01-18T09:21:53.959+01:00 | [fix-attrs.d] done.
2023-01-18T09:21:53.959+01:00 | [cont-init.d] executing container initialization scripts...
2023-01-18T09:21:53.959+01:00 | [cont-init.d] 01-check-apikey.sh: executing...
2023-01-18T09:21:53.960+01:00 | [cont-init.d] 01-check-apikey.sh: exited 0.
2023-01-18T09:21:53.962+01:00 | [cont-init.d] 50-ci.sh: executing...
2023-01-18T09:21:53.972+01:00 | ln: failed to create symbolic link '/etc/datadog-agent/datadog.yaml': Read-only file system
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ci.sh: exited 0.
2023-01-18T09:21:53.972+01:00 | [cont-init.d] 50-ecs.sh: executing...
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/network.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/io.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/disk.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/load.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.990+01:00 | rm: cannot remove '/etc/datadog-agent/conf.d/memory.d/conf.yaml.default': Read-only file system
2023-01-18T09:21:53.993+01:00 | [cont-init.d] 50-ecs.sh: exited 123.
2023-01-18T09:21:54.020+01:00 | [cont-finish.d] executing container finish scripts...
2023-01-18T09:21:54.022+01:00 | [cont-finish.d] done.
2023-01-18T09:21:54.023+01:00 | [s6-finish] waiting for services.
2023-01-18T09:21:54.227+01:00 | [s6-finish] sending all processes the TERM signal.
2023-01-18T09:21:57.262+01:00 | [s6-finish] sending all processes the KILL signal and exiting.

Agent Environment

I am pulling the agent from public.ecr.aws/datadog/agent:latest. I do not see a version number in the log. I included it as a sidecar in my AWS ECS task definition.

Describe what happened:
After setting "readonlyRootFilesystem": true in the task definition, the Datadog agent isn't able to start.

Describe what you expected:
Datadog agent should run as normal.

Steps to reproduce the issue:
Run the agent as a sidecar in AWS ECS. Set "readonlyRootFilesystem": true in the agent's container definition.

Additional environment details (Operating System, Cloud provider, etc):
AWS ECS

@tomwire

tomwire commented Jan 20, 2023

Funny, I'm just now checking this off on my InfoSec checklist... Perfect timing?

@vyrtus15

vyrtus15 commented Feb 1, 2023

+1 waiting for Datadog agent to work with read-only FS.

@clamoriniere
Contributor

Hi @kayman-mk, @tomwire and @vyrtus15

Thanks for reporting this issue.

In order to prioritise this feature request, please contact Datadog support and link this issue.

Thanks for your understanding. 🙇

@kayman-mk
Author

Support contacted: https://help.datadoghq.com/hc/en-us/requests/1101939

@maaz-nafees

Hi @kayman-mk,
I ran into the same issue. Were you able to resolve this problem?

@kayman-mk
Author

@clamoriniere Any news here?

Support answered on Feb 20 with:

Thanks for getting back to me. I understand this is an important feature for your organisation. I've gone ahead and created a
Feature Request for this with a note of its impact on your business. In the meantime I'm going to mark this ticket as closed as
your request has been processed.

@tomwire

tomwire commented Jun 27, 2023

@kayman-mk

Our workaround was to run docker diff against the running container to list every path that is written inside it. Then, in the task definition that uses the Datadog image, we added a Docker volume covering the paths that docker diff reported. It doesn't necessarily need to be a Docker volume; any volume type would work. We only needed to link /etc/datadog-agent and /opt/datadog-agent to that volume before locking down the root volume. I suspect other people may need different paths to be writable, but that's what worked for us.
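
Editor's note: the docker diff step described above could be sketched roughly as follows. This is a minimal sketch, not the commenter's actual script. The function name writable_dirs and its depth argument are hypothetical; the only thing assumed about docker diff is its documented output format, one change per line with an A, C, or D marker followed by the path.

```shell
# Reduce `docker diff` output (read from stdin) to the directories that would
# need a writable mount. Input lines look like "C /etc/datadog-agent" or
# "A /etc/datadog-agent/datadog.yaml".
# $1 = number of leading path components to keep (hypothetical knob, default 2).
writable_dirs() {
  awk -v depth="${1:-2}" '
    /^[ACD] \// {
      path = substr($0, 3)            # drop the change marker and the space
      n = split(path, parts, "/")     # parts[1] is empty: the path starts with "/"
      if (n - 1 < depth) next         # skip parent entries shallower than depth
      dir = ""
      for (i = 2; i <= depth + 1; i++) dir = dir "/" parts[i]
      print dir
    }
  ' | LC_ALL=C sort -u
}
```

It would be used as `docker diff <container> | writable_dirs 2`, where the container name is whatever your agent container is called. The resulting list is only a starting point: it reflects what the agent wrote during that particular run, so a different configuration or agent version may touch other paths.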

Our agent is currently running and reporting correctly with the root volume locked.

@kayman-mk
Author

kayman-mk commented Jul 6, 2023

Good solution, @tomwire, but I am a little afraid that I'll run into problems if I update the agent and the new version needs a different set of files than the one before.

@tomwire

tomwire commented Jul 6, 2023

@kayman-mk 100% agree, this is definitely the concern we have. I suspect the solution might end up being the configuration I recommended plus a promise from DD that the filesystem layout will not change without proper notice. And a word of caution: our stacks are nothing alike, so results may vary.

FWIW, our pipelines for our agents always grab the latest DD image, then build and deploy on a routine schedule. We haven't had any issues since, and there have been updates.

I suppose a script that monitors syslog messages for permission errors on writes to files outside of the mounted volumes would save some headaches, but I'm going to cross that bridge when DD breaks. I have a feeling the agents are well engineered and won't be throwing many surprises.
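
Editor's note: a monitoring script of the kind mentioned above might start from something like this. It is a rough sketch under stated assumptions: the function name readonly_failures is hypothetical, and it only recognises the quoted-path "Read-only file system" message format seen in the log output at the top of this issue, not every possible permission error.

```shell
# Read log lines on stdin and print each distinct path that failed with
# "Read-only file system". Assumes the path is single-quoted, as in the
# ln/rm messages shown in the original report.
readonly_failures() {
  grep -F 'Read-only file system' \
    | grep -oE "'[^']+'" \
    | tr -d "'" \
    | LC_ALL=C sort -u
}
```

Piping the container's log stream through readonly_failures after each agent upgrade would give a quick list of paths that are not yet covered by a writable mount.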

@thiago-youper

+1 waiting for Datadog agent to work with read-only FS.

@aayushchhabra1999

+1

@jornskjerven

+1

@cgspohn

cgspohn commented Sep 27, 2023

+1 Other vendors already support this, so we are waiting for an official solution from Datadog. A formal support case has also been opened.

@Siivers

Siivers commented Oct 3, 2023

+1

@danlaramay

+1

@naomichi-y

+1

@SlevinWasAlreadyTaken

+1

@h-nago

h-nago commented Oct 23, 2023

+1

@jdliauw

jdliauw commented Oct 24, 2023

+1

@yokobot

yokobot commented Nov 8, 2023

+1

@rod-murphy

+1

@marklynch

Given this article, https://docs.datadoghq.com/security/default_rules/cis-docker-1.2.0-5.12/, it would be good to see progress on this.

@eli-gc

eli-gc commented Jan 24, 2024

I just got my agent deployed in AKS with a read-only root filesystem. I am using the Helm chart v3.52.0.
I have readOnlyRootFilesystem enabled for the initContainers, agent, process agent, and cluster agent. Not sure if this is a new feature, but it might be worth trying again for those of you who haven't checked in a while.

@henare

henare commented Mar 5, 2024

I also successfully have the agent running with a read-only root filesystem. This is on ECS Fargate.

When the agent boots, it tries to write configuration to /etc/datadog-agent, so you have to mount a read/write filesystem at this location. This can be done in your task definition by creating a volume and mounting it at that location in the agent container definition.
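
Editor's note: the task-definition change described above could look roughly like the fragment printed by this sketch. It is not the commenter's actual configuration. The container name datadog-agent, the volume name datadog-agent-config, and the helper function taskdef_fragment are assumptions made for illustration; the image URI is the one from the original report, and the field names (readonlyRootFilesystem, mountPoints, volumes) are standard ECS task-definition fields.

```shell
# Print a partial ECS task definition: read-only root filesystem for the agent
# container plus one writable volume mounted at /etc/datadog-agent.
# Names "datadog-agent" and "datadog-agent-config" are placeholders.
taskdef_fragment() {
  cat <<'EOF'
{
  "containerDefinitions": [
    {
      "name": "datadog-agent",
      "image": "public.ecr.aws/datadog/agent:latest",
      "readonlyRootFilesystem": true,
      "mountPoints": [
        {
          "sourceVolume": "datadog-agent-config",
          "containerPath": "/etc/datadog-agent",
          "readOnly": false
        }
      ]
    }
  ],
  "volumes": [
    { "name": "datadog-agent-config" }
  ]
}
EOF
}
```

The fragment would need to be merged into a complete task definition (family, CPU and memory settings, the application container, and so on) before it can be registered. Other comments in this thread report needing additional writable paths, so treat the single mount as a starting point.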

@jjshinobi

+1 Can we please prioritise this? We'd like this to be solved in the Datadog agent rather than applying the workaround mentioned above. Thank you!

@nihauc12

+1

@jjshinobi

This docker-compose.yml helps to test the issue locally. Working version:

services:
  datadog:
    image: public.ecr.aws/datadog/agent:7
    environment:
      - DD_API_KEY=<your_api_key>
      - DD_LOGS_ENABLED=true
      - DD_LOG_LEVEL=DEBUG
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
      - datadog:/etc/datadog-agent
      - datadog:/opt/datadog-agent/run
    read_only: true
volumes:
  datadog:

If /opt/datadog-agent itself is mounted, the container dies. The codebase references an /opt/datadog-agent/run mount point for the case where the agent runs in a Kubernetes cluster.

@kashitaka

kashitaka commented Sep 20, 2024

Hi, I had the same issue and came across this thread.
As @henare and @jjshinobi mentioned, the datadog/agent container worked when I attached a host volume to the container's /etc/datadog-agent path.

Although it seems to be working, I am concerned that the image's /etc/datadog-agent directory contains many configuration files and directories, such as conf.d and datadog-docker.yaml, and mounting a host volume at this path hides them behind an empty directory. I feel this workaround might cause other problems, so I think the image should support the readonlyRootFilesystem: true option.

Edit:

With this workaround, the ln command in 50-*.sh and 51-docker.sh creates a link to a file that does not exist:

if [[ ! -e /etc/datadog-agent/datadog.yaml ]]; then
    ln -s /etc/datadog-agent/datadog-docker.yaml \
        /etc/datadog-agent/datadog.yaml
fi

This case is handled in 59-defaults.sh, which creates /etc/datadog-agent/datadog.yaml when the file does not exist, so it looks OK.
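
Editor's note: the dangling-link behaviour described above can be reproduced in a scratch directory with the sketch below. It illustrates general shell semantics only: the function name demo_dangling_link is hypothetical, and the touch call is a stand-in for whatever 59-defaults.sh actually does, which is not reproduced here.

```shell
# Show that a symlink to a missing target fails the -e test, so a later
# "create the file if it does not exist" step still runs.
demo_dangling_link() {
  demo_dir="$(mktemp -d)"
  # The target is missing, as datadog-docker.yaml would be under an empty mount.
  ln -s "$demo_dir/datadog-docker.yaml" "$demo_dir/datadog.yaml"
  if [ -L "$demo_dir/datadog.yaml" ]; then echo "link exists"; fi
  if [ ! -e "$demo_dir/datadog.yaml" ]; then echo "-e says missing"; fi
  # Hypothetical stand-in for the fallback: create the file when -e fails.
  # Writing through the dangling link creates the link's target.
  if [ ! -e "$demo_dir/datadog.yaml" ]; then
    touch "$demo_dir/datadog.yaml"
  fi
  if [ -e "$demo_dir/datadog.yaml" ]; then echo "-e says present after fallback"; fi
  rm -rf "$demo_dir"
}
```

The -e test follows symlinks, which is why the dangling link counts as "missing" and a fallback of this shape still gets a chance to create the file.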

@y-kimura-q

+1

@davidjoliver86

+1
