Merged
12 changes: 7 additions & 5 deletions .linkcheck.json
Original file line number Diff line number Diff line change
@@ -13,12 +13,14 @@
{ "pattern": "^https://monitor.firefox.com$" },
{ "pattern": "^https://developer.android.com/studio/build/application-id$" },
{ "pattern": "^https://github.com" },
-{ "pattern": "^https://help.looker.com"},
{ "pattern": "^https://mozilla-hub.atlassian.net" },
{ "pattern": "^https:///mozilla.slack.com" },
-{ "pattern": "^https://mozilla.udemy.com"},
-{ "pattern": "^#bigquery-materialized-views$"},
-{ "pattern": "^#looker-pdts--aggregate-awareness$"},
-{ "pattern": "^#experiment-unpacking$"}
+{ "pattern": "^https://mozilla.udemy.com" },
+{ "pattern": "^https://sso.mozilla.com" },
+{ "pattern": "^https://experimenter.services.mozilla.com" },
+{ "pattern": "^#bigquery-materialized-views$" },
+{ "pattern": "^#looker-pdts--aggregate-awareness$" },
+{ "pattern": "^#experiment-unpacking$" },
+{ "pattern": "/v2-system-addon/data_events.html" }
]
}
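The entries above are ignore patterns for `markdown-link-check`: each `pattern` is a regular expression, and any link matching one is skipped during checking. A rough sketch of that matching behavior (illustrative only, not the tool's actual implementation):

```python
import re

# Hypothetical subset of the ignore patterns from .linkcheck.json above.
ignore_patterns = [
    "^https://github.com",
    "^#bigquery-materialized-views$",
]

def is_ignored(link: str) -> bool:
    # A link is skipped if any pattern matches it; re.search honors
    # the ^ and $ anchors written into the patterns themselves.
    return any(re.search(p, link) for p in ignore_patterns)

is_ignored("https://github.com/mozilla/bigquery-etl")  # True: unanchored suffix, prefix match
is_ignored("#bigquery-materialized-views")             # True: fully anchored, exact match
is_ignored("#bigquery-materialized-views-extra")       # False: fails the $ anchor
```

This is why the unanchored `github.com` pattern exempts every GitHub URL, while the fragment patterns only exempt one exact anchor each.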
1,284 changes: 991 additions & 293 deletions package-lock.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -15,6 +15,6 @@
"dependencies": {
"markdown-link-check": "^3.11.2",
"markdown-spellcheck": "^1.3.1",
-"prettier": "^2.1.1"
+"prettier": "^3.6.2"
}
}
2 changes: 1 addition & 1 deletion scripts/prettier_fix.sh
@@ -3,4 +3,4 @@ export TERM=xterm-color

SEARCH_PATH=${@:-'src/**/*.md'}

-npx prettier --write --loglevel warn $SEARCH_PATH
+npx prettier --write --log-level warn $SEARCH_PATH
2 changes: 1 addition & 1 deletion src/concepts/airflow_gotchas.md
@@ -43,7 +43,7 @@ While upstream dependencies are automatically determined between generated DAGs

## Downstream dependencies are managed via `ExternalTaskMarker`s

-[`ExternalTaskMarker`s](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/sensors/external_task/index.html#airflow.sensors.external_task.ExternalTaskMarker) are used to indicate all downstream dependencies to a task. Whenever the task is cleared with _Downstream Recursive_ selected, then all downstream tasks will get cleared automatically. This is extremely useful when running backfill of Airflow tasks. When clearing the tasks, a pop-up will show all the downstream tasks that will get cleared. In case a task should be cleared without its downstream dependencies running as well, deselect the _Downstream Recursive_ option.
+[`ExternalTaskMarker`s](https://airflow.apache.org/docs/apache-airflow-providers-standard/stable/sensors/external_task_sensor.html#externaltaskmarker) are used to indicate all downstream dependencies to a task. Whenever the task is cleared with _Downstream Recursive_ selected, then all downstream tasks will get cleared automatically. This is extremely useful when running backfill of Airflow tasks. When clearing the tasks, a pop-up will show all the downstream tasks that will get cleared. In case a task should be cleared without its downstream dependencies running as well, deselect the _Downstream Recursive_ option.

`ExternalTaskMarker`s are generally wrapped into a `TaskGroup` and defined like:

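The definition itself is collapsed in this diff view. As a stand-in, here is a minimal configuration-style sketch of the pattern with hypothetical DAG and task IDs (the real definitions live in the generated telemetry-airflow DAGs):

```python
from airflow.sensors.external_task import ExternalTaskMarker
from airflow.utils.task_group import TaskGroup

# Hypothetical IDs; the actual values come from the generated DAGs.
with TaskGroup("wait_for_downstream") as wait_for_downstream:
    ExternalTaskMarker(
        task_id="child_dag_trigger",
        external_dag_id="downstream_dag",
        external_task_id="wait_for_upstream_task",
        # Point the marker at the downstream run with the same logical date,
        # so "Clear" with Downstream Recursive selected clears it too.
        execution_date="{{ execution_date.isoformat() }}",
    )
```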
3 changes: 0 additions & 3 deletions src/concepts/analysis_gotchas.md
@@ -318,19 +318,16 @@ The build id format for Firefox Desktop has been very stable over time thus far,
#### If you still need to do side-by-side comparisons, be aware that significant discrepancies will occur due to a variety of factors:

1. **Bucket Discrepancies (Histograms)**

- **Legacy Telemetry**: Fewer buckets; Uses a fixed number of buckets depending on histogram type.
- **Glean**: More buckets; Uses an algorithmically-generated number of buckets depending on the metric's distribution type.
- **Result**: The distributions and percentiles can look different in GLAM even when measuring the same underlying data because the histogram bounds and number of buckets do not match.

2. **Cross-Process vs. Per-Process Collection**

- **Legacy Telemetry**: Often collects data per process (e.g., main, content, etc.) and can send data differently depending on the process.
- **Glean**: Consolidates measurements across multiple processes.
- **Result**: Aggregated Glean data may appear larger or differently distributed compared to Legacy data, because it merges what Legacy would treat as separate process-specific measurements.

3. **Ping Differences ("baseline" & "metrics" Pings in Glean, "main" pings in Legacy Telemetry)**

- **Legacy Telemetry**: Typically sends one primary ping type (e.g., the “main” ping) for most data.
- **Glean**: Splits data into multiple ping types (e.g., a “baseline” ping, a “metrics” ping, etc.).
- **Result**: The same metric can appear to have more frequent updates or different submission times in Glean if it is reported in multiple pings.
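The bucket mismatch described in point 1 can be sketched numerically. The functions below are illustrative approximations, not the exact browser or SDK algorithms: `functional_bucket_index` mimics Glean's log-base-2 functional bucketing (assuming 8 buckets per power of two, as its timing distributions use), while `fixed_exponential_bounds` mimics a legacy-style histogram whose bucket bounds are fixed up front:

```python
import math

def functional_bucket_index(sample: int, buckets_per_magnitude: float = 8.0) -> int:
    """Glean-style (approximate): the bucket index is computed from the
    sample itself, so the effective number of buckets grows with the data."""
    if sample <= 1:
        return 0
    return int(math.log2(sample) * buckets_per_magnitude)

def fixed_exponential_bounds(low: int, high: int, n_buckets: int) -> list[float]:
    """Legacy-style (approximate): a fixed list of exponentially spaced
    lower bounds chosen up front, regardless of the data observed."""
    ratio = (high / low) ** (1.0 / (n_buckets - 1))
    return [low * ratio**i for i in range(n_buckets)]

# The same sample lands in differently shaped buckets under each scheme,
# which is why GLAM distributions for the "same" probe need not line up.
sample_ms = 1000
glean_bucket = functional_bucket_index(sample_ms)
legacy_bounds = fixed_exponential_bounds(1, 10_000, 50)
```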
3 changes: 1 addition & 2 deletions src/concepts/glean/glean.md
@@ -37,7 +37,7 @@ Because Glean knows more about the individual data, such as its type and the ran

**Provide a consistent base of telemetry**

-A baseline of analysis is important for all our products, from counting active users to retention and session times. This is supported out-of-the-box by the SDK, and funnels directly into visualization tools like the [Growth and Usage Dashboard (GUD)](https://gud.telemetry.mozilla.org/).
+A baseline of analysis is important for all our products, from counting active users to retention and session times. This is supported out-of-the-box by the SDK, and funnels directly into visualization tools like the [Growth and Usage Dashboard (GUD)](../../cookbooks/looker/growth_usage_dashboards.md).

Metrics that are common to all products, such as the operating system and architecture, are provided automatically in a consistent way.

@@ -78,7 +78,6 @@ This includes previously manual and error-prone steps such as updating the ping
- [Integrate the Glean SDK](https://mozilla.github.io/glean/book/user/adding-glean-to-your-project/index.html) into your product.
- [Use Looker](https://mozilla.cloud.looker.com/) to build Explores and Dashboards using your product's datasets.
- If Looker does not provide the necessary Explores you can resort to [using Redash](https://sql.telemetry.mozilla.org/) to write SQL queries & build dashboards using your products datasets, e.g.:

- `org_mozilla_fenix.baseline`
- `org_mozilla_fenix.events`
- `org_mozilla_fenix.metrics`
11 changes: 3 additions & 8 deletions src/concepts/pipeline/schemas.md
@@ -6,9 +6,7 @@

Schemas describe the structure of ingested data. They are used in the pipeline to validate the types
and values of data, and to define a table schema in a data store. We use a repository of JSON
-Schemas to sort incoming data into [`decoded` and `error` datasets][bq-datasets]. We also generate
-BigQuery table schemas on business days from the JSON Schemas: you can see the current status of
-this job on the [`mozilla-pipeline-schemas` deploy dashboard][mps-deploys].
+Schemas to sort incoming data into [`decoded` and `error` datasets][bq-datasets].

```mermaid
graph TD
@@ -74,9 +72,7 @@ probe-scraper will automatically pick up changes from `metrics.yaml`.

Schema deploys happen on business days around UTC+04 when new changes are found in the
[`generated-schemas` branch of `mozilla-pipeline-schemas`][generated-schemas]. This means that any
-changes merged after UTC+04 on Friday will not propagate until Monday UTC+04. See the
-[`mozilla-pipeline-schemas` deploy][mps-deploys] dashboard for up-to-date information on the most
-recent deploys.
+changes merged after UTC+04 on Friday will not propagate until Monday UTC+04.

### What does it mean when a schema deploy is blocked?

@@ -94,9 +90,8 @@ is not registered before collection begins, then it will be sorted into the erro
may be affected by blocked schema deploys.

[bq-datasets]: ../../cookbooks/bigquery/querying.md#projects-with-bigquery-datasets
-[mps-deploys]: https://protosaur.dev/mps-deploys/
[mps]: https://github.com/mozilla-services/mozilla-pipeline-schemas
-[generated-schemas]: https://github.com/mozilla-services/mozilla-pipeline-schemas/tree/generated-schema
+[generated-schemas]: https://github.com/mozilla-services/mozilla-pipeline-schemas/tree/generated-schemas
[msg]: https://github.com/mozilla/mozilla-schema-generator
[probe-scraper]: https://github.com/mozilla/probe-scraper

6 changes: 4 additions & 2 deletions src/cookbooks/clients_last_seen_bits.md
@@ -111,7 +111,7 @@ when we report a retention value for 2020-01-01, we're talking about what
portion of clients active on 2020-01-01 are still active some number of days
later.

-In particular, let's consider the "1-Week Retention" measure shown in [GUD](https://gud.telemetry.mozilla.org/)
+In particular, let's consider the "1-Week Retention" measure shown in [GUD]
which considers a window of 14 days.
For each client active in "week 0" (days 0 through 6), we determine retention by
checking if they were also active in "week 1" (days 7 through 13).
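In `clients_last_seen` terms, that check is a pair of bit masks over `days_seen_bits`, assuming the usual convention that bit k set means the client was active k days before `submission_date`. A small sketch of the logic (not the `bits28.retention` UDF itself):

```python
WEEK0_MASK = 0b11111110000000  # days 7..13 before submission_date ("week 0")
WEEK1_MASK = 0b00000001111111  # days 0..6 before submission_date ("week 1")

def one_week_retained(days_seen_bits: int):
    """Return None if the client is not in the week-0 cohort,
    otherwise whether they were also active in week 1."""
    if days_seen_bits & WEEK0_MASK == 0:
        return None  # not active in week 0; excluded from the denominator
    return (days_seen_bits & WEEK1_MASK) != 0

# Active only on day 10 -> in the cohort, but not retained.
assert one_week_retained(1 << 10) is False
# Active on day 10 and again on day 3 -> retained.
assert one_week_retained((1 << 10) | (1 << 3)) is True
```

The retention rate for a day is then the fraction of `True` results among clients where the function returns a non-`None` value.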
@@ -434,7 +434,7 @@ When we define forward-looking windows, however, we always choose a metric date
some time in the past. How we number the individual bits depends on what
metric date we choose.

-For example, in [GUD](https://gud.telemetry.mozilla.org/), we show a "1-Week Retention" which considers a window of 14 days.
+For example, in [GUD] we show a "1-Week Retention" which considers a window of 14 days.
For each client active in "week 0" (days 0 through 6), we determine retention by
checking if they were also active in "week 1" (days 7 through 13).

@@ -899,3 +899,5 @@ the standard 1-Week, 2-Week, and 3-Week retention definitions.
```sql
bits28.retention(bits INT64, submission_date DATE)
```

+[gud]: looker/growth_usage_dashboards.md
5 changes: 2 additions & 3 deletions src/cookbooks/looker/intro.md
@@ -61,9 +61,8 @@ The following video demonstrates this workflow in detail:
If you want to explore Looker more deeply, you can check out:

- ["BI and Analytics with Looker" training hub](https://www.cloudskillsboost.google/journeys/28): A collection of self-paced video training courses for new users. Full courses are free but require registration; course descriptions contain material that is useful on its own.
-- [Looker Documentation](https://docs.looker.com/): Extensive text and video documentation, a “textbook” reference on how the product works.
-- [Looker Help Center](https://help.looker.com/): Contains articles on common problems, specific use cases, error messages, and best practices.
-- [Looker Community](https://community.looker.com/) has customer-written material, broadcasts from Looker employees (primarily release notes), and topics written by Looker employees that are not officially supported by Looker.
+- [Looker Documentation](https://cloud.google.com/looker/docs/intro): Extensive text and video documentation, a “textbook” reference on how the product works.
+- [Looker Community](https://discuss.google.dev/c/looker/19) has customer-written material, broadcasts from Looker employees (primarily release notes), and topics written by Looker employees that are not officially supported by Looker.

You can find additional Looker training resources on the [Looker Training Resources] mana page (LDAP access required).

5 changes: 1 addition & 4 deletions src/metrics/index.md
@@ -5,8 +5,6 @@
This section provides an overview of standard metrics used at Mozilla.
Here you'll find the definitions and descriptions for each.

-For a deep dive into these metrics, see [the GUD documentation](https://mozilla.github.io/gud/).
-
The [Telemetry Behavior Reference](../concepts/index.md) section also provides
information related to the definitions below.

@@ -49,7 +47,7 @@ specified day, what proportion (out of 1) are active during the following week.
## Frequently Asked Questions

- Why isn't "New Users" a metric?
-- "New Users" is considered a [usage criterion], which means it may be used
+- "New Users" is considered a usage criterion, which means it may be used
to filter other metrics, rather than itself being a metric. For example,
you can compute "New User DAU", which would be the subset of DAU that match
the "New User" criterion. The exception here is 1-Week New Profile
@@ -62,6 +60,5 @@ specified day, what proportion (out of 1) are active during the following week.
- For Firefox Desktop, we use the `main` ping to determine activity.
- For products instrumented using Glean, we use the `baseline` ping.

-[usage criterion]: https://mozilla.github.io/gud#data-model
[submission dates]: https://bugzilla.mozilla.org/show_bug.cgi?id=1422892
[pings]: ../datasets/pings.md
1 change: 0 additions & 1 deletion src/tools/projects.md
@@ -139,4 +139,3 @@ starting a new project using anything in this section.

| Name and repo | Description |
| ------------- | ----------- |

1 change: 0 additions & 1 deletion src/tools/stmo.md
@@ -73,7 +73,6 @@ You should see a `clients_last_seen` entry (appearing as `telemetry.clients_last
- Introspect the available columns

Click `telemetry.clients_last_seen` in the schema browser to display the columns that are available in the table. The following columns are of interest for this query:

- `country`
- `days_since_seen`
- `submission_date`.