|
| 1 | +# ECS CLI Logging |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This proposal lays out a design and implementation plan for retrieving container logs from CloudWatch through the ECS CLI. |
| 6 | + |
| 7 | +### Use Cases |
| 8 | + |
| 9 | +1. User has a known error and wants to find more info on it. |
| 10 | +2. User is doing a deployment and wants to tail the logs and grep for errors. |
| 11 | +3. User wants to quickly set up their CloudWatch Logs based on the configuration specified in their docker compose file, and have the CLI create any necessary log groups for them. |
| 12 | +4. User wants to monitor their task/service, so they continually stream the logs. |
| 13 | + |
| 14 | +## Phase 1 Solution |
| 15 | +A top-level `ecs-cli logs` command that does not use the docker compose file. This makes it usable by a wide range of ECS customers, not just compose users. The command lets customers find logs for a given task. |
| 16 | + |
| 17 | +``` |
| 18 | +ecs-cli logs --help |
| 19 | +--follow Stream logs (continuously poll for updates) |
| 20 | +--task-id [Required] View logs for a given Task ID |
| 21 | +--task-def Required with Task ID if the task has been stopped already. Format: family:revision |
| 22 | +--filter-pattern Substring to search for within the logs. |
| 23 | +--container-name, -c Filter logs for a given container definition |
| 24 | +--since Filter logs in the last X minutes (cannot be used with --start-time and --end-time) |
| 25 | +--start-time Filter logs within a time frame, use with --end-time |
| 26 | +--end-time Filter logs within a time frame, use with --start-time |
| 27 | +--timestamps, -t View time-stamps with the logs |
| 28 | +``` |
| 29 | + |
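The time flags above could be resolved into a query window along these lines. This is a minimal sketch, not the CLI's implementation: the function names are hypothetical, and it assumes the window is passed to CloudWatch Logs as milliseconds since the Unix epoch, which is what FilterLogEvents expects for its start and end times.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def resolve_time_window(
    since_minutes: Optional[int] = None,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    now: Optional[datetime] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Return (start_ms, end_ms); None means that side of the window is open."""
    if since_minutes is not None and (start_time is not None or end_time is not None):
        raise ValueError("--since cannot be combined with --start-time or --end-time")
    if since_minutes is not None:
        # `now` is injectable so the function can be tested deterministically.
        current = now if now is not None else datetime.now(timezone.utc)
        return to_epoch_ms(current - timedelta(minutes=since_minutes)), None
    start_ms = to_epoch_ms(start_time) if start_time is not None else None
    end_ms = to_epoch_ms(end_time) if end_time is not None else None
    return start_ms, end_ms
```

Rejecting the conflicting combination up front keeps the error close to the flag parsing instead of surfacing as a confusing empty result.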
| 30 | +``` |
| 31 | +ecs-cli logs --task-id d86079d1-6858-45e9-8ce2-1ba881c55c12 --timestamps |
| 32 | +Time-stamp Message |
| 33 | +2017-09-28 22:32:11 WordPress not found in /var/www/html - copying now... |
| 34 | +2017-09-28 22:32:11 Complete! WordPress has been successfully copied to /var/www/html |
| 35 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 36 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 37 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300422 2017] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.31 configured -- resuming normal operations |
| 38 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300456 2017] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND' |
| 39 | +``` |
| 40 | + |
| 41 | +### Implementation |
| 42 | + |
| 43 | +- The logs implementation will not include any pagination; the command will return all logs matching the specified search. We expect most users to pipe the output of the command to a file, so this should not be a problem. |
| 44 | +- If the user has not specified a log stream prefix in their task definition, the command will fail and print an error message, because without the log stream prefix we have no way of getting the logs for an individual task. |
| 45 | +- For performance reasons, the command will only pull from *a single log group*. If the customer has not configured all of their container definitions to use the same log group, the command will fail with an error telling the customer to re-run it with the `--container-name` argument. This way, only one log group needs to be queried. |
| 46 | + |
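The prefix and single-log-group rules above can be sketched as one validation step. This is a hedged illustration: `resolve_log_target` is a hypothetical name, the error messages are placeholders, and the input is assumed to be shaped like the `containerDefinitions` list returned by DescribeTaskDefinition, with every container using the awslogs driver.

```python
from typing import Dict, List, Optional, Tuple


def resolve_log_target(
    container_definitions: List[dict],
    container_name: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (log_group, {container name: stream prefix}) or raise ValueError."""
    selected = [
        c for c in container_definitions
        if container_name is None or c["name"] == container_name
    ]
    if not selected:
        raise ValueError("no matching container definition")

    groups = set()
    prefixes: Dict[str, str] = {}
    for container in selected:
        options = (container.get("logConfiguration") or {}).get("options") or {}
        group = options.get("awslogs-group")
        prefix = options.get("awslogs-stream-prefix")
        if not group:
            raise ValueError(f"container {container['name']!r} has no awslogs-group")
        if not prefix:
            # Without a prefix the stream name does not contain the task ID.
            raise ValueError(f"container {container['name']!r} has no awslogs-stream-prefix")
        groups.add(group)
        prefixes[container["name"]] = prefix

    if len(groups) > 1:
        raise ValueError("containers use different log groups; re-run with --container-name")
    return groups.pop(), prefixes
```

Passing `container_name` narrows the selection to one container definition, so the multiple-log-group error can no longer occur.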
| 47 | +Workflow: |
| 48 | +1. User gives Task ID |
| 49 | +2. Call DescribeTasks to get the task definition ARN (skip this step if the user provides `--task-def`). |
| 50 | +3. Call DescribeTaskDefinition to get the Container Definitions. |
| 51 | +4. From Container Definitions, get the log configuration. |
| 52 | +5. Build the list of log streams in the log group that correspond to the task. |
| 53 | +6. Call FilterLogEvents on the log group to get the log events. |
| 54 | +7. Print log events. |
| 55 | + |
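The workflow steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the CLI's code: `iter_task_log_events` and its parameters are hypothetical, the `ecs` and `logs` arguments are assumed to behave like boto3 ECS and CloudWatch Logs clients, and every container definition is assumed to use the awslogs driver with a stream prefix and the same log group.

```python
from typing import Iterator, Optional


def iter_task_log_events(ecs, logs, cluster: str, task_id: str,
                         task_def: Optional[str] = None) -> Iterator[dict]:
    """Yield every log event for one task, following the workflow steps."""
    # Step 2: look up the task definition ARN unless the user supplied one.
    if task_def is None:
        tasks = ecs.describe_tasks(cluster=cluster, tasks=[task_id])["tasks"]
        if not tasks:
            raise ValueError("task not found; pass the task definition for stopped tasks")
        task_def = tasks[0]["taskDefinitionArn"]

    # Steps 3-4: read each container's log configuration.
    definition = ecs.describe_task_definition(taskDefinition=task_def)
    containers = definition["taskDefinition"]["containerDefinitions"]

    # Step 5: stream names follow prefix-name/container-name/ecs-task-id.
    group = None
    streams = []
    for container in containers:
        options = container["logConfiguration"]["options"]
        group = options["awslogs-group"]
        streams.append(f"{options['awslogs-stream-prefix']}/{container['name']}/{task_id}")

    # Steps 6-7: the API pages its results, so keep following nextToken
    # until every matching event has been yielded.
    request = {"logGroupName": group, "logStreamNames": streams}
    while True:
        page = logs.filter_log_events(**request)
        yield from page["events"]
        token = page.get("nextToken")
        if not token:
            return
        request["nextToken"] = token
```

Taking the clients as arguments keeps the sketch testable with stubs; with boto3 they would be created by `boto3.client("ecs")` and `boto3.client("logs")`.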
| 56 | + |
| 57 | +## Compose Logs (Phase 2) |
| 58 | +Phase 2 is lower priority than Phase 1 and may not be implemented for some time. We welcome contributions from any customer who wishes to help start the implementation of Phase 2 sooner. |
| 59 | + |
| 60 | +### Configure Logs |
| 61 | +- Log configuration using the docker-compose file is already supported |
| 62 | +- Problem: Customers are not required to specify a log stream prefix. However, we effectively need one because of how the ECS Agent sets the log stream name. If a prefix is specified, the agent adds the container name and task ID to the log stream name, so we can use it to get the logs for each task. If a prefix is not specified, the log stream name is, for all intents and purposes, a random string (it's an ID picked by the docker daemon on the instance, which is meaningless from our point of view). |
| 63 | + - The log stream will be named like this (by the ECS Agent): `prefix-name/container-name/ecs-task-id` |
| 64 | + |
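To illustrate why the prefix matters, here is a small sketch of recovering the container name and task ID from a stream name in the `prefix-name/container-name/ecs-task-id` format. The `StreamName` type and `parse_stream_name` function are hypothetical; splitting from the right is an assumption that guards against a prefix that itself contains `/`.

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    prefix: str
    container_name: str
    task_id: str


def parse_stream_name(name: str) -> StreamName:
    """Split prefix-name/container-name/ecs-task-id into its three parts."""
    # Split from the right so only the last two separators are consumed.
    parts = name.rsplit("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a prefixed ECS log stream name: {name!r}")
    return StreamName(*parts)
```

A stream created without a prefix has no separators at all, so it is rejected here instead of being attributed to the wrong task.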
| 65 | +*Solution:* The existing ability to configure logs remains undisturbed, but we add a new flag, `--create-log-groups`, that creates the necessary log group(s) in CloudWatch. |
| 66 | +- The log configuration from the docker compose file will be read |
| 67 | +- If the user has not specified a log stream prefix, warn them that we are auto-setting it to a default value in their task definition. |
| 68 | + - *Additionally*, even if `--create-log-groups` is not specified, if we detect that there is no prefix configured in their docker compose file (but a log group and the awslogs driver are specified), the prefix will still be auto-set, and the user will be warned about this. This technically breaks backwards compatibility; however, the risk is acceptable. It is very unlikely that ECS CLI users actually want their log streams named without a prefix. If no prefix is given, the ECS Agent sets the log stream name to the container ID, which was randomly generated by docker. Making sense of this random ID requires logging into the underlying instances and retrieving info from the Docker daemon. For all intents and purposes, the container ID is meaningless from a customer standpoint. |
| 69 | + - The ECS CLI is designed to simplify workflows and make ECS easier to understand. Therefore, we should be opinionated and protect users from accidentally configuring their logs poorly. We can help protect users from the less useful, complicated, legacy behavior of ECS. |
| 70 | + - Additionally, the user will be warned that the default retention policy is to keep all log events forever, which means they will be charged for storing them indefinitely. They can change the policy in the CloudWatch Console or with the AWS CLI. |
| 71 | + |
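The prefix auto-setting described above might look like the following sketch. The function name is hypothetical, the input is assumed to be the `logging` mapping of one compose service, and the default value `ecs-compose-` is taken from the sample warning in this proposal.

```python
import copy
from typing import List, Tuple

# Assumed default; it matches the sample warning output in this proposal.
DEFAULT_STREAM_PREFIX = "ecs-compose-"


def apply_default_prefix(logging_config: dict) -> Tuple[dict, List[str]]:
    """Return a copy of the compose logging config plus any warnings to print."""
    updated = copy.deepcopy(logging_config)  # never mutate the user's parsed file
    warnings: List[str] = []
    if updated.get("driver") != "awslogs":
        return updated, warnings
    options = updated.get("options") or {}
    updated["options"] = options
    # Only auto-set when a log group is configured but the prefix is missing.
    if options.get("awslogs-group") and not options.get("awslogs-stream-prefix"):
        options["awslogs-stream-prefix"] = DEFAULT_STREAM_PREFIX
        warnings.append(
            "You have not specified a log stream prefix, "
            f"auto-setting it to '{DEFAULT_STREAM_PREFIX}'"
        )
    return updated, warnings
```

Returning the warnings instead of printing them leaves the caller free to emit them through the CLI's own logger.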
| 72 | +``` |
| 73 | +ecs-cli compose up --help |
| 74 | +--create-log-groups Creates any necessary log groups in CloudWatch. |
| 75 | +``` |
| 76 | + |
| 77 | +``` |
| 78 | +ecs-cli compose up --create-log-groups |
| 79 | +INFO[0000] Creating Resources in CloudWatch for your logs. |
| 80 | +WARN[0001] You have not specified a log stream prefix, auto-setting it to 'ecs-compose-' |
| 81 | +WARN[0002] By default, CloudWatch will store your logs forever, it is recommended that you set a retention policy. |
| 82 | +``` |
| 83 | + |
| 84 | +*Suggested Configuration:* |
| 85 | +- If the user has not specified a log configuration in their compose file, then using the `--create-log-groups` flag will fail and print a help message with the suggested configuration. Here is one possible idea: |
| 86 | +For Services: |
| 87 | +``` |
| 88 | +awslogs-group: ${cluster name}/${service name} |
| 89 | +``` |
| 90 | +For Tasks: |
| 91 | +``` |
| 92 | +awslogs-group: ${cluster name}/${task def family} |
| 93 | +``` |
| 94 | + |
| 95 | +### View Logs |
| 96 | +- New Commands: `ecs-cli compose logs`, and `ecs-cli compose service logs` |
| 97 | +- *Log command reads the configuration in user's docker compose file* |
| 98 | + |
| 99 | +*Solution:* In both docker-compose and ECS task definitions, logs are configured per container definition. In docker-compose, these are called services, and they must have names. Therefore, a user can view the logs per container definition. Since the agent adds the task ID to the log stream name, we can also list the logs for each task. |
| 100 | + |
| 101 | +``` |
| 102 | +ecs-cli compose logs --help |
| 103 | +--follow Stream logs (continuously poll for updates) |
| 104 | +--task-id View logs for a given Task ID |
| 105 | +--container-name, -c View logs for a given container definition |
| 106 | +--since View logs in the last X minutes (cannot be used with --start-time and --end-time) |
| 107 | +--start-time View logs within a time frame, use with --end-time |
| 108 | +--end-time View logs within a time frame, use with --start-time |
| 109 | +--time-stamps, -t View time-stamps with the logs |
| 110 | +--output, -o Output to a file |
| 111 | +``` |
| 112 | + |
| 113 | +User's docker-compose file: |
| 114 | +``` |
| 115 | +version: '2' |
| 116 | +services: |
| 117 | + mysql: |
| 118 | + image: mysql |
| 119 | + cpu_shares: 100 |
| 120 | + mem_limit: 524288000 |
| 121 | + cap_add: |
| 122 | + - ALL |
| 123 | + logging: |
| 124 | + driver: awslogs |
| 125 | + options: |
| 126 | + awslogs-group: ecs-log-streaming |
| 127 | + awslogs-region: us-west-2 |
| 128 | + awslogs-stream-prefix: mysql-logs |
| 129 | + wordpress: |
| 130 | + image: wordpress |
| 131 | + cpu_shares: 132 |
| 132 | + mem_limit: 524288001 |
| 133 | + ports: |
| 134 | + - "80:80" |
| 135 | + links: |
| 136 | + - mysql |
| 137 | + logging: |
| 138 | + driver: awslogs |
| 139 | + options: |
| 140 | + awslogs-group: ecs-log-streaming |
| 141 | + awslogs-region: us-west-2 |
| 142 | + awslogs-stream-prefix: wordpress-logs |
| 143 | +``` |
| 144 | + |
| 145 | +#### Examples |
| 146 | + |
| 147 | +*User views logs for all MySQL Containers:* |
| 148 | +- Outputs the logs for all containers running the given container definition. For example, if the user has 10 tasks running from this compose file, the logs for all 10 of the mysql containers will be output. The output can be organized by task ID. |
| 149 | + |
| 150 | +*Implementation Details:* |
| 151 | +- From the compose file, we know the log group for this container definition. We can call DescribeLogStreams to get a list of the log streams. FilterLogEvents can then be called with the list of log streams to get the log events. Each returned log event will have its log stream name associated with it, and that name contains the task ID. |
| 152 | + |
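Organizing the output by task ID could be sketched like this. The function name is hypothetical, and the events are assumed to be dictionaries shaped like those returned by FilterLogEvents, with `logStreamName`, `timestamp`, and `message` keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_events_by_task(events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group log events by the task ID at the end of each log stream name."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        # Stream names look like prefix-name/container-name/ecs-task-id.
        task_id = event["logStreamName"].rsplit("/", 1)[-1]
        grouped[task_id].append(event)
    for task_events in grouped.values():
        # Keep each task's section in chronological order.
        task_events.sort(key=lambda e: e["timestamp"])
    return dict(grouped)
```

Each key of the result then becomes one `Task:` section in the printed output.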
| 153 | +``` |
| 154 | +ecs-cli compose logs --container-name mysql --time-stamps |
| 155 | +INFO[0000] Showing logs for all mysql containers |
| 156 | +_______________________________________ |
| 157 | +Task: d86079d1-6858-45e9-8ce2-1ba881c55c12 |
| 158 | +_______________________________________ |
| 159 | +Time-stamp Message |
| 160 | +2017-09-28 22:32:11 WordPress not found in /var/www/html - copying now... |
| 161 | +2017-09-28 22:32:11 Complete! WordPress has been successfully copied to /var/www/html |
| 162 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 163 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 164 | +_______________________________________ |
| 165 | +Task: d86079d1-6858-45e9-8ce2-1ba881c55c12 |
| 166 | +_______________________________________ |
| 167 | +Time-stamp Message |
| 168 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300422 2017] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.31 configured -- resuming normal operations |
| 169 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300456 2017] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND' |
| 170 | +``` |
| 171 | + |
| 172 | + |
| 173 | +*User views logs for a given task:* |
| 174 | +- Outputs the logs for a given task ID |
| 175 | +- The logs can be organized by the container name |
| 176 | + |
| 177 | +*Implementation Details:* |
| 178 | +- From the compose file, we know the log group for each container definition. We can call DescribeLogStreams to get a list of the log streams for each container definition. Each log stream contains the task ID in its name, so we can then call FilterLogEvents with only the log streams for the given task ID as arguments. Each returned log event will have its log stream name associated with it, and that name contains the container name. |
| 179 | + |
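Selecting only the streams that belong to one task could be sketched as below. The function name is hypothetical, and `stream_names` is assumed to be the list of names collected from DescribeLogStreams.

```python
from typing import Dict, Iterable


def streams_for_task(stream_names: Iterable[str], task_id: str) -> Dict[str, str]:
    """Map container name to log stream name for the streams of one task."""
    selected: Dict[str, str] = {}
    for name in stream_names:
        # Expected format: prefix-name/container-name/ecs-task-id.
        parts = name.rsplit("/", 2)
        if len(parts) == 3 and parts[2] == task_id:
            selected[parts[1]] = name
    return selected
```

The values are what would be passed to FilterLogEvents, and the keys give the `Container:` headings for the grouped output.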
| 180 | +``` |
| 181 | +ecs-cli compose logs --task-id <task-id> -t |
| 182 | +Container: MySql |
| 183 | +_______________________ |
| 184 | +Time-stamp Message |
| 185 | +2017-09-28 22:32:11 WordPress not found in /var/www/html - copying now... |
| 186 | +2017-09-28 22:32:11 Complete! WordPress has been successfully copied to /var/www/html |
| 187 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 188 | +2017-09-28 22:32:12 AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message |
| 189 | +_______________________ |
| 190 | +Container: Wordpress |
| 191 | +_______________________ |
| 192 | +Time-stamp Message |
| 193 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300422 2017] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.31 configured -- resuming normal operations |
| 194 | +2017-09-28 22:32:12 [Wed Sep 27 22:32:12.300456 2017] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND' |
| 195 | +``` |