From cf9c674f0f8d880e309f97104d8ebc619028e31b Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 10:41:19 -0500
Subject: [PATCH 1/7] Analysis gotchas: comapare Legacy and Glean in GLAM

---
 src/concepts/analysis_gotchas.md | 41 ++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index 95a4e66c6..795272a07 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -300,3 +300,44 @@ A build id might be formatted in any way and contain the time or version control
 Do not assume build id's are consistent across the products we ship.
 A build id format may vary between products, between channels of the same product, or over time within the same channel of the same product.
 The build id format for Firefox Desktop has been very stable over time thus far, but even it can be different for different platforms in some respin circumstances (if e.g. only one platform's builder failed).
+
+## Comparing Legacy Telemetry and Glean Data in GLAM
+
+### Official Recommendation
+
+> **Do Not Compare Legacy Telemetry and Glean Data Directly in GLAM.**
+
+- If you need to track long-term trends for a particular metric, treat the Legacy Telemetry timeframe and the Glean timeframe as **separate eras**.
+- For in-depth analysis, rely on the Glean instrumentation once you have fully migrated, and use Legacy Telemetry only for historical reference.
+- Recognize that both Legacy Telemetry and Glean “tell the same story” but from different angles and with different measurement methodologies.
+- Both data sources remain valid and useful, but **side-by-side comparisons is not recommended and if done should be approached with caution**. Instead, analysts are encouraged to use Legacy Telemetry data for historical context and Glean data for current and future trends.
+
+#### If you still need to do side-by-side comparisons, be aware that significant discrepancies will occur due to a variety of factors:
+
+1. **Bucket Discrepancies (Histograms)**
+
+   - **Legacy Telemetry**: Less buckets; Uses a fixed number of buckets depending on histogram type.
+   - **Glean**: More buckets; Uses an algorithmically-generated number of buckets depending on the metric's distribution type.
+   - **Result**: The distributions and percentiles can look different in GLAM even when measuring the same underlying data because the histogram bounds and number of buckets do not match.
+
+2. **Cross-Process vs. Per-Process Collection**
+
+   - **Legacy Telemetry**: Often collects data per process (e.g., main, content, etc.) and can send data differently depending on the process.
+   - **Glean**: Consolidates measurements across multiple processes.
+   - **Result**: Aggregated Glean data may appear larger or differently distributed compared to Legacy data, because it merges what Legacy would treat as separate process-specific measurements.
+
+3. **Ping Differences (Baseline & Metrics Pings in Glean)**
+
+   - **Legacy Telemetry**: Typically sends one primary ping type (e.g., the “main” ping) for most data.
+   - **Glean**: Splits data into multiple ping types (e.g., a “baseline” ping, a “metrics” ping, etc.).
+   - **Result**: The same metric can appear to have more frequent updates or different submission times in Glean if it is reported in multiple pings.
+
+4. **Different Reporting Frequencies (Especially for Scalars)**
+   - **Legacy Telemetry**: Sends telemetry data at distinct intervals or under certain conditions. Usually per browsing session.
+   - **Glean**: Generally sends data less often. Usually once a day for the `metrics` ping.
+   - **Result**: Scalar comparisons (like sums or counts) often diverge because each system “batches” or “chunks” the data differently over time.
+
+#### Impact on Analyses
+
+- **Histogram Metrics**: Expect to see different bucket distributions, total counts, and percentile shapes.
+- **Scalars**: Differences in sums, counts, and other simple accumulations are common. The magnitude of these discrepancies may vary depending on how often the ping is sent, how usage patterns differ, and whether data is merged across processes.

From 13963923e6226027539dbd33d74fddbb3e398b37 Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 10:46:01 -0500
Subject: [PATCH 2/7] adds timeframe to spelling

---
 .spelling | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.spelling b/.spelling
index a02401229..86de4f5f6 100644
--- a/.spelling
+++ b/.spelling
@@ -370,6 +370,7 @@ Taskcluster
 TBD
 TCP
 templated
+timeframe
 timeline
 timelines
 timestamp

From 5fa464d5089389470f5f0cb24693a7a0ee9adfce Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 12:35:18 -0500
Subject: [PATCH 3/7] Update src/concepts/analysis_gotchas.md

Co-authored-by: Chris H-C
---
 src/concepts/analysis_gotchas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index 795272a07..fea751ea8 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -310,7 +310,7 @@ The build id format for Firefox Desktop has been very stable over time thus far,
 - If you need to track long-term trends for a particular metric, treat the Legacy Telemetry timeframe and the Glean timeframe as **separate eras**.
 - For in-depth analysis, rely on the Glean instrumentation once you have fully migrated, and use Legacy Telemetry only for historical reference.
 - Recognize that both Legacy Telemetry and Glean “tell the same story” but from different angles and with different measurement methodologies.
-- Both data sources remain valid and useful, but **side-by-side comparisons is not recommended and if done should be approached with caution**. Instead, analysts are encouraged to use Legacy Telemetry data for historical context and Glean data for current and future trends.
+- Both data sources remain valid and useful, but **side-by-side comparison is not recommended and if done should be approached with caution**. Instead, analysts are encouraged to use Legacy Telemetry data for historical context and Glean data for current and future trends.
 
 #### If you still need to do side-by-side comparisons, be aware that significant discrepancies will occur due to a variety of factors:
 

From 750fec8c88b1f23082b221db5bf7c9f6d66950c6 Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 12:35:24 -0500
Subject: [PATCH 4/7] Update src/concepts/analysis_gotchas.md

Co-authored-by: Chris H-C
---
 src/concepts/analysis_gotchas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index fea751ea8..5a4a063ac 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -316,7 +316,7 @@ The build id format for Firefox Desktop has been very stable over time thus far,
 
 1. **Bucket Discrepancies (Histograms)**
 
-   - **Legacy Telemetry**: Less buckets; Uses a fixed number of buckets depending on histogram type.
+   - **Legacy Telemetry**: Fewer buckets; Uses a fixed number of buckets depending on histogram type.
    - **Glean**: More buckets; Uses an algorithmically-generated number of buckets depending on the metric's distribution type.
    - **Result**: The distributions and percentiles can look different in GLAM even when measuring the same underlying data because the histogram bounds and number of buckets do not match.
 
From a1321dd914a65c396fc99ef094404423ab8bca92 Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 12:35:35 -0500
Subject: [PATCH 5/7] Update src/concepts/analysis_gotchas.md

Co-authored-by: Chris H-C
---
 src/concepts/analysis_gotchas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index 5a4a063ac..4dfe99d16 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -326,7 +326,7 @@ The build id format for Firefox Desktop has been very stable over time thus far,
    - **Glean**: Consolidates measurements across multiple processes.
    - **Result**: Aggregated Glean data may appear larger or differently distributed compared to Legacy data, because it merges what Legacy would treat as separate process-specific measurements.
 
-3. **Ping Differences (Baseline & Metrics Pings in Glean)**
+3. **Ping Differences ("baseline" & "metrics" Pings in Glean, "main" pings in Legacy Telemetry)**
 
    - **Legacy Telemetry**: Typically sends one primary ping type (e.g., the “main” ping) for most data.
    - **Glean**: Splits data into multiple ping types (e.g., a “baseline” ping, a “metrics” ping, etc.).

From 9a8c8ae23d06a23eb4718100bd0a6fba34364c97 Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 12:35:54 -0500
Subject: [PATCH 6/7] Update src/concepts/analysis_gotchas.md

Co-authored-by: Chris H-C
---
 src/concepts/analysis_gotchas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index 4dfe99d16..b27fe8a4e 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -333,7 +333,7 @@ The build id format for Firefox Desktop has been very stable over time thus far,
    - **Result**: The same metric can appear to have more frequent updates or different submission times in Glean if it is reported in multiple pings.
 
 4. **Different Reporting Frequencies (Especially for Scalars)**
-   - **Legacy Telemetry**: Sends telemetry data at distinct intervals or under certain conditions. Usually per browsing session.
+   - **Legacy Telemetry**: Sends telemetry data [at distinct intervals or under certain conditions](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/main-ping.html). Usually per browsing session.
    - **Glean**: Generally sends data less often. Usually once a day for the `metrics` ping.
    - **Result**: Scalar comparisons (like sums or counts) often diverge because each system “batches” or “chunks” the data differently over time.
 

From d24eacf1e1a029255d1bdfc8200da8a665c8ea9b Mon Sep 17 00:00:00 2001
From: Eduardo Filho
Date: Mon, 24 Feb 2025 12:36:08 -0500
Subject: [PATCH 7/7] Update src/concepts/analysis_gotchas.md

Co-authored-by: Chris H-C
---
 src/concepts/analysis_gotchas.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/concepts/analysis_gotchas.md b/src/concepts/analysis_gotchas.md
index b27fe8a4e..7bfbbbb64 100644
--- a/src/concepts/analysis_gotchas.md
+++ b/src/concepts/analysis_gotchas.md
@@ -334,7 +334,7 @@ The build id format for Firefox Desktop has been very stable over time thus far,
 
 4. **Different Reporting Frequencies (Especially for Scalars)**
    - **Legacy Telemetry**: Sends telemetry data [at distinct intervals or under certain conditions](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/data/main-ping.html). Usually per browsing session.
-   - **Glean**: Generally sends data less often. Usually once a day for the `metrics` ping.
+   - **Glean**: Generally sends data [less often](https://mozilla.github.io/glean/book/user/pings/metrics.html#scheduling). Usually once a day for the `metrics` ping.
    - **Result**: Scalar comparisons (like sums or counts) often diverge because each system “batches” or “chunks” the data differently over time.
 
 #### Impact on Analyses
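
The "Bucket Discrepancies (Histograms)" point in this series can be illustrated with a minimal Python sketch. It buckets the same samples under a coarse layout and a finer layout, then reads percentiles back from the bucket counts. The sample values, both sets of bucket bounds, and the helper names `bucketize` and `percentile_from_buckets` are hypothetical; this does not reproduce the actual Legacy Telemetry or Glean bucketing algorithms, nor how GLAM computes percentiles.

```python
from bisect import bisect_right


def bucketize(samples, bounds):
    """Count samples per bucket; bounds are sorted lower edges, last bucket is open-ended."""
    counts = [0] * len(bounds)
    for value in samples:
        # Index of the last lower bound that is <= value.
        index = max(bisect_right(bounds, value) - 1, 0)
        counts[index] += 1
    return counts


def percentile_from_buckets(bounds, counts, pct):
    """Return the lower bound of the bucket where the cumulative count reaches pct percent."""
    target = pct / 100 * sum(counts)
    running = 0
    for lower, count in zip(bounds, counts):
        running += count
        if running >= target:
            return lower
    return bounds[-1]


# Hypothetical measurements, identical for both bucket layouts.
samples = [3, 7, 12, 18, 25, 40, 55, 70, 90, 120]

# Hypothetical layouts: few wide buckets vs. many narrower buckets.
coarse_bounds = [0, 10, 100]
fine_bounds = [0, 5, 10, 20, 40, 80, 160]

coarse_counts = bucketize(samples, coarse_bounds)
fine_counts = bucketize(samples, fine_bounds)

coarse_median = percentile_from_buckets(coarse_bounds, coarse_counts, 50)
fine_median = percentile_from_buckets(fine_bounds, fine_counts, 50)

print("coarse median bucket:", coarse_median)  # 10
print("fine median bucket:", fine_median)      # 20
```

Under these assumptions the same ten samples report a median bucket of 10 with the coarse layout and 20 with the fine layout, which is the kind of divergence the documentation warns about when two systems bucket the same underlying data differently.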