
Please unify cloud provider tags #7396

Open · tonglil opened this issue Feb 8, 2021 · 8 comments

Comments
@tonglil (Contributor) commented Feb 8, 2021

Describe what happened:

Currently, the tags Datadog collects and applies to cloud resources are 1) inconsistent and 2) varied.

1. For example, there is no convention or standardization between - and _

Please unify them on one or the other.

For GCP, these are the tags collected:

zone
instance-type
instance-id
automatic-restart
on-host-maintenance
numeric_project_id

For AWS:

# EC2
autoscaling_group
availability-zone
instance-id
instance-type
security_group_name

In this case, there are two differences:

  1. a difference between zone (GCP) and availability-zone (AWS), causing duplication & multiplication of tags (and therefore custom metrics) if you want to unify them across clouds (i.e. one app deployed in both)
  2. a difference between _ and - within the same category

2. Duplicated tags from integrations

Along with the above, different integration levels collect the same tags with different keys, resulting in examples like this (GCP & K8s):

cluster_name
cluster-name

I can't tell which integration sets which tag because:

  1. the GCP docs don't say which tags are collected for which integration (unlike the AWS docs)
  2. the K8s docs don't say anywhere which tags are added automatically (I assume it's cluster_name, based on DD_CLUSTER_NAME)

3. Another naming oddity

# EBS
volumeid, volume-name, volume-type
# EC2
instance-id, name, instance-type
# ECS
instance_id, clustername, servicename

4. cloud_provider tag is automatically applied for some providers but not others

For agents running in GCP, the cloud_provider:gcp tag is automatically added to everything.
However, according to a chat with a support agent, the cloud_provider:aws tag is not automatically added for AWS.

This is inconsistent behavior.

5. aws_account tag is inconsistently applied to AWS metrics

Some AWS metrics collected by the AWS integration are automatically tagged with the account number, while others lack this tag.
For example, ELB metrics have this tag, but EC2 metrics do not.

This makes filtering on this tag value difficult when building a multi-account dashboard.

This is not a problem with GCP as all metrics are tagged with project_id.

Describe what you expected:

I expect:

  • unify all name separators on _ or - (probably _, since you already convert camelCase to underscores automatically)
  • use separators consistently (volume_id, instance_id)
  • use names consistently (volume_name, instance_name, cluster_name, service_name)
  • use "categories" consistently (zone)
  • unify tags between cloud providers (I know much of this is currently an afterthought)

This would be easier if I could configure how tags are formatted or named.

It doesn't seem like the code allows for that right now, since the tag keys are hardcoded:

tags = append(tags, fmt.Sprintf("zone:%s", ts[len(ts)-1]))
}
if metadata.Instance.MachineType != "" {
ts := strings.Split(metadata.Instance.MachineType, "/")
tags = append(tags, fmt.Sprintf("instance-type:%s", ts[len(ts)-1]))
}
if metadata.Instance.Hostname != "" {
tags = append(tags, fmt.Sprintf("internal-hostname:%s", metadata.Instance.Hostname))
}
if metadata.Instance.ID != 0 {
tags = append(tags, fmt.Sprintf("instance-id:%d", metadata.Instance.ID))
}
if metadata.Project.ProjectID != "" {
tags = append(tags, fmt.Sprintf("project:%s", metadata.Project.ProjectID))
if config.Datadog.GetBool("gce_send_project_id_tag") {
tags = append(tags, fmt.Sprintf("project_id:%s", metadata.Project.ProjectID))
}
}
if metadata.Project.NumericProjectID != 0 {
tags = append(tags, fmt.Sprintf("numeric_project_id:%d", metadata.Project.NumericProjectID))

It would also be helpful if I could pick and choose which tags are collected, but that's not possible either.

I ask for this because building multi-/cross-cloud queries & dashboards is not trivial.

This seems like a sensible thing to do (review tag names and make them conform to some "datadog-internal" standard) so users don't have a poor experience when trying to correlate data from multiple sources.
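
For illustration only, here is a minimal sketch of the kind of unification I mean. The canonical names, the choice of _ as separator, and the mapping itself are assumptions for the sketch, not anything the agent provides today:

package main

import "fmt"

// canonicalKey maps provider-specific tag keys to one unified spelling.
// The canonical names and the "_" separator are assumptions for this sketch.
var canonicalKey = map[string]string{
	"zone":              "availability_zone", // GCP
	"availability-zone": "availability_zone", // AWS
	"availability_zone": "availability_zone", // Azure
	"instance-id":       "instance_id",
	"instance-type":     "instance_type",
	"cluster-name":      "cluster_name",
	"clustername":       "cluster_name",
}

// tag builds "key:value", rewriting the key to its canonical form when one
// is defined and keeping it unchanged otherwise.
func tag(key, value string) string {
	if k, ok := canonicalKey[key]; ok {
		key = k
	}
	return fmt.Sprintf("%s:%s", key, value)
}

func main() {
	// The same conceptual tag from GCP and AWS ends up under one key.
	fmt.Println(tag("zone", "us-east1-b"))              // availability_zone:us-east1-b
	fmt.Println(tag("availability-zone", "us-east-1a")) // availability_zone:us-east-1a
}

The point is simply that every provider-specific key would be funneled through one table, so dashboards can group on a single key regardless of cloud.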

@tonglil (Contributor, Author) commented Jun 17, 2021

Here are more anomalies:

AZs

The concept of zones uses a different tag name for each cloud provider:

cloud   tag name
azure   availability_zone
aws     availability-zone
gcp     zone

Regions

The region tag is collected from Azure and AWS, but not from all GCP integrations. Some have it and some don't (GAE, Cloud NAT, etc.).

For select integrations it's named location instead (Cloud Run, Spanner, Memcache, Cloud Tasks, DNS, etc.).

@edwardaux commented

Tap, tap, tap... is this thing on? Just wondering if there are any plans to address this at all?

It makes it /really/ hard to build dashboards if the tagging isn't consistent.

@ian28223 (Contributor) commented

@edwardaux Thanks for the feedback. Have you raised this issue / opened a ticket through the support channels? If not, I would advise that you do, because most of the tagging issues you mentioned are not handled by the agent (except maybe the clustername in a K8s env); there are different teams involved with Crawler/Web/Cloud-based integrations that have no dependencies on the Datadog Agent (this repository). That said, a support ticket might be a better way for this issue to get more traction and have it routed to the teams responsible.

@tonglil (Contributor, Author) commented May 13, 2022

@ian28223 I'm sorry, but Datadog is a complete product, and as customers of this product it is surprising to me that this kind of interwoven, cross-cutting issue is not of interest to address.

Furthermore, asking customers to open tickets when employees can do so much more easily is just shocking to me. Tickets that get opened often end up closed as a "thanks, we'll file this as a feature request" with no accountability. Sharing here gives other customers visibility that they're not the only ones having issues with that process and the product itself.

Lastly, it's unfortunate to see the responsibility and involvement of the Datadog agent being redirected elsewhere. As I clearly linked in the original comment, there are multiple places where tags are set or used by the agent.

The "different teams involved with Crawler/Web/Cloud-based integrations that have no dependencies on the Datadog Agent" together make up the user experience of this product, and the issues outlined here add up to a poor one.

I recommend that Datadog (or the Datadog agent team) re-evaluate the "somebody else's problem" approach it takes to this kind of tech debt.

@btkostner commented

I'd like to add that this is very annoying because there are interfaces in Datadog that assume one type of tag. For instance, loading up the infrastructure map with GCP and GKE looks like this by default:

(screenshot)

Some of the group drop-down options also don't work because they are named differently in GCP:

(screenshot)

I think an acceptable stopgap would be to allow aliasing tags, so if you were running in GCP and had a tag like cluster-location:us-east1 you could alias it to create region:us-east1.
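
To make the idea concrete, here is a rough, hypothetical sketch of that kind of aliasing (not an existing agent or backend feature); the alias map and function names are made up for the example:

package main

import (
	"fmt"
	"strings"
)

// aliases is a user-defined map from an existing tag key to the extra key it
// should also be exposed as. The cluster-location -> region pair is the
// example from this comment.
var aliases = map[string]string{
	"cluster-location": "region",
}

// applyAliases returns the original tags plus an aliased copy of any tag
// whose key has an alias configured.
func applyAliases(tags []string) []string {
	out := append([]string{}, tags...)
	for _, t := range tags {
		parts := strings.SplitN(t, ":", 2)
		if len(parts) != 2 {
			continue
		}
		if alias, ok := aliases[parts[0]]; ok {
			out = append(out, alias+":"+parts[1])
		}
	}
	return out
}

func main() {
	fmt.Println(applyAliases([]string{"cluster-location:us-east1"}))
	// [cluster-location:us-east1 region:us-east1]
}

Applied at collection time (or at query time), this would let GCP's cluster-location feed the region grouping that the UI already expects.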

@ian28223 (Contributor) commented

I understand and I agree. I have raised this with the relevant team's PM. Proper tracking would still be via official support channels.

@kallangerard commented

> Here are more anomalies:
>
> AZs
>
> The concept of zones uses a different tag name for each cloud provider:
>
> cloud   tag name
> azure   availability_zone
> aws     availability-zone
> gcp     zone
>
> Regions
>
> The region tag is collected from Azure and AWS, but not from all GCP integrations. Some have it and some don't (GAE, Cloud NAT, etc.).
>
> For select integrations it's named location instead (Cloud Run, Spanner, Memcache, Cloud Tasks, DNS, etc.).

Note that for GCP, location is not the same as region. They're two separate attributes that often have the same value, but they're not the same.

See Cloud Storage locations for an example: https://cloud.google.com/storage/docs/locations

@tonglil (Contributor, Author) commented May 29, 2024

For compute, it's still zones and regions.

https://cloud.google.com/compute/docs/regions-zones

https://cloud.google.com/docs/geography-and-regions

> location is not the same as region

Correct, they are supplementary to each other. While GCS doesn't allow you to choose a specific zone, buckets still have regions (a concept shared with other products like Compute) in addition to multi-region codes (aka locations).

For Cloud Run, you can only specify a region (no zone or multi-region).

gcloud storage buckets create gs://BUCKET_NAME --location=US --placement=US-CENTRAL1,US-EAST1

DD should collect tags for GCP's zone, region, and location as applicable - and unify them (or allow us to rename them) to deal with multicloud setups.
