As part of our ongoing focus on analytics (and other things), we should look into leveraging our CLI and code-gen pipelines to set up custom (and proprietary) analytics events from our Runtime SDKs and user code. The goals are to:
- Allow people to define schemas for custom analytics events.
- Define schemas for our existing analytics events (the ones we own and emit).
- Define a way to give AI tools the semantics of each field in these schemas, including user-defined ones.
- Define a data-driven schema validation system with configurable behaviors for handling validation failures (crash, log and emit an error, replace with fallback values, etc.).
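To make these goals concrete, here is a minimal sketch of what a schema with per-field semantics and configurable failure handling could look like. Every name in it (`OnInvalid`, `Field`, `EventSchema`, `validate`, the `purchase_completed` event and its fields) is hypothetical and chosen for illustration only. It is not an existing SDK or CLI API, and the real design might well express schemas as data files consumed by code-gen rather than as Python objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class OnInvalid(Enum):
    """Hypothetical handling behaviors, mirroring the list above."""
    CRASH = "crash"
    LOG_AND_EMIT_ERROR = "log_and_emit_error"
    USE_FALLBACK = "use_fallback"


@dataclass(frozen=True)
class Field:
    name: str
    py_type: type
    description: str  # semantic meaning of the field, intended as context for AI tools
    fallback: Any = None
    on_invalid: OnInvalid = OnInvalid.LOG_AND_EMIT_ERROR


@dataclass(frozen=True)
class EventSchema:
    name: str
    fields: tuple


class EventValidationError(Exception):
    """Raised when a field configured with OnInvalid.CRASH fails validation."""


def validate(schema, payload):
    """Return (cleaned payload, error messages); raise if a CRASH field is invalid."""
    cleaned = {}
    errors = []
    for field in schema.fields:
        value = payload.get(field.name)
        if isinstance(value, field.py_type):
            cleaned[field.name] = value
            continue
        message = (
            f"{schema.name}.{field.name}: expected {field.py_type.__name__}, "
            f"got {type(value).__name__}"
        )
        if field.on_invalid is OnInvalid.CRASH:
            raise EventValidationError(message)
        errors.append(message)  # stands in for "log + emit error"
        if field.on_invalid is OnInvalid.USE_FALLBACK:
            cleaned[field.name] = field.fallback
    return cleaned, errors


# Illustrative user-defined event; the names and descriptions are made up.
purchase = EventSchema(
    name="purchase_completed",
    fields=(
        Field("item_id", str, "Catalog identifier of the purchased item.",
              on_invalid=OnInvalid.CRASH),
        Field("price_cents", int, "Price paid, in cents of the store currency.",
              fallback=0, on_invalid=OnInvalid.USE_FALLBACK),
    ),
)

cleaned, errors = validate(purchase, {"item_id": "sword_01", "price_cents": "499"})
```

In this sketch the invalid `price_cents` is replaced by its fallback and reported in `errors`, while a missing `item_id` would raise. Keeping the handling behavior on the field, next to its description, is one possible design; it could equally be configured per event or per project.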
In short, the goal here is to design a pipeline that:
- Enables teams to reduce the cost of QA'ing analytics events (that is, of ensuring the data going in isn't garbage).
- Enables teams to make better use of AI tools (since AI can be given context about what each schema field semantically means within the customer project).
- Virtually removes the "cleanup" step of data analytics pipelines: bad data is guaranteed NOT to enter the pipeline, and you are warned so that you can fix any problems ASAP (bad data is still collected in some sort of "errors bin").
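The "errors bin" idea in the last point can be sketched as a router that sits in front of the pipeline. This is an illustration under assumptions, not a proposed implementation: `EventRouter`, `check_level_started`, and the `level_started` event are hypothetical names, and in-memory lists stand in for whatever storage the real pipeline and errors bin would use.

```python
class EventRouter:
    """Sends valid events to the pipeline and invalid ones to an errors bin."""

    def __init__(self, validator, warn=print):
        self._validator = validator  # callable: event dict -> list of problem strings
        self._warn = warn            # callable used to surface problems immediately
        self.accepted = []           # stand-in for the analytics pipeline
        self.errors_bin = []         # stand-in for the "errors bin"

    def submit(self, event):
        problems = self._validator(event)
        if problems:
            # Bad data never enters the pipeline, but it is kept for inspection.
            self.errors_bin.append({"event": event, "problems": problems})
            self._warn(f"rejected event: {'; '.join(problems)}")
            return False
        self.accepted.append(event)
        return True


def check_level_started(event):
    """Hypothetical validator for a made-up 'level_started' event."""
    problems = []
    if event.get("name") != "level_started":
        problems.append("name must be 'level_started'")
    if not isinstance(event.get("level"), int):
        problems.append("level must be an int")
    return problems


warnings = []
router = EventRouter(check_level_started, warn=warnings.append)
router.submit({"name": "level_started", "level": 3})        # goes to the pipeline
router.submit({"name": "level_started", "level": "three"})  # goes to the errors bin
```

After these two calls, `router.accepted` holds the first event, and `router.errors_bin` holds the second one together with the reason it was rejected, which has also been passed to `warn`.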