
DOCS-3198: Add offline data pipelines, SDK docs, hot data store fixes #4440


Merged

Conversation

nathan-contino (Member)

  • adds information about offline data pipelines (docs/data-ai/data/data-pipelines.md)
    • previously had no page in the docs besides some minimal CLI doc discussing pipelines; this introduces that missing page
    • includes examples in all supported languages (Python, Go, TypeScript) for basic data pipeline tasks
      • note that Go snippets link directly to Go API reference -- see SDK docs notes for more info
  • updates generated SDK documentation to include Python and TypeScript data pipelines APIs (no Flutter yet)
    • no Go, because our data page doesn't seem to have or support any Go snippets (I'm guessing there's an out-of-scope story here)
  • yanked hot data store out into its own page (docs/data-ai/data/hot-data-store.md), since:
    • it has a lot in common with data pipelines, which could easily lead to user confusion
    • it was very buried (increasing the likelihood of user confusion)
    • the recent API improvements for data pipelines broke our existing hot data store examples
  • slight reorder of 'Advanced data capture and sync configurations' since some short-but-useful sections were buried all the way at the end of a very long page of complex, niche examples
  • note that the alias for hot data store doesn't work -- leaving it for now in the hopes that someone can suggest a better alternative for relocating a single section of a still-existing page to another page


netlify bot commented Jul 2, 2025

Deploy Preview for viam-docs ready!

Name Link
🔨 Latest commit 599fe80
🔍 Latest deploy log https://app.netlify.com/projects/viam-docs/deploys/6874fc66ba7dd400087ba20e
😎 Deploy Preview https://deploy-preview-4440--viam-docs.netlify.app
Lighthouse
1 paths audited
Performance: 54 (no change from production)
Accessibility: 100 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: 70 (no change from production)

@viambot viambot added the safe to build This pull request is marked safe to build from a trusted zone label Jul 2, 2025
@JessamyT (Collaborator) left a comment

Started reviewing but this is a philosophical one about the extent to which we want to redundantly document the APIs and CLI commands. Feels like a new precedent, since for each item for example "Delete a pipeline," there's a 1:1 matching section in the API docs as well as an example in the CLI page, so will leave to Naomi to review.

@@ -2587,7 +2622,7 @@ User-defined metadata is billed as data.

**Parameters:**

- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from your machine's page.
- `robot_id` ([str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)) (required): The ID of the robot with which to associate the user-defined metadata. You can obtain your robot ID from the machine page.
Collaborator

This seems less helpful

Member Author

This is generated documentation from the SDK doc. I'll file a PR to improve this wording but I think it's out of scope and would hold this PR up for a while while we fix it upstream.

@JessamyT (Collaborator) commented Jul 2, 2025

When you address the merge conflicts with /generated/app.md, note that we changed a couple things manually in #4431 but I created this upstream PR to get them to stick.

@vijayvuyyuru (Member) left a comment

Thanks Nathan!

{{% /tab %}}
{{< /tabs >}}

### Update a pipeline
Member

I'm tempted to not include this in the documentation. We don't want people changing pipeline schedules or queries after a pipeline has started inserting query results. Might end up just being the name that we allow them to update. Is it ok to leave this out?

Member Author

FYI, it'll still be in the SDK docs unless we manually hide it from each SDK. If we think it's dangerous for users to edit existing pipeline queries or schedules, we should either:

  • not provide that functionality
  • warn users about the dangers

So I would strongly prefer to keep documentation in place, with a warning explaining the danger, unless we decide to remove that functionality. Otherwise users will still wind up using the API and Ask AI will scrape the SDK doc and recommend using the method anyway, just without any warnings.

Member Author

I'm adding a warning for now; let me know how you feel about the warning once I've pushed it.

Member

Maybe a question for @jdamon96

Contributor

if we don’t “want” the users to update schedule/pipeline once documents have been written, i think we should just disallow that to be done at all. that said, don't want to block this docs update for now, so we can revisit and edit the docs when/if we actually make that change

@npentrel npentrel self-requested a review July 4, 2025 14:42
@npentrel (Collaborator) commented Jul 4, 2025

(I'll hold off on review until comments are resolved)

@vijayvuyyuru (Member) left a comment

Hot data page looks good to me!

@dmhilly dmhilly requested a review from jdamon96 July 7, 2025 16:36
added link to hot data store mention
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

{{% /tab %}}
{{< /tabs >}}

To create a pipeline that reads data from the [hot data store](/data-ai/data/hot-data-store/), specify a `dataSourceType` in your pipeline configuration.

{{< alert title="Caution" color="caution" >}}

Avoid specifying an `_id` value in your pipeline's final group stage unless you can guarantee its uniqueness across all pipeline runs.
Member

I probably commented incorrectly in my last review. I believe $group needs an _id. So if users want to include a $group stage, there should be a stage after (such as $project) to remove the _id field.

$group: {
  _id: "$location_id",
  count: { $sum: 1 }
},
$project: {
  location: "$_id",
  count: 1,
  _id: 0,
}
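The pitfall discussed here can be shown with a small pure-Python simulation. This is an illustrative toy model, not Viam's implementation: it mimics a $group stage whose output documents carry the group key as `_id`, an `_id`-keyed upsert into the results collection, and the $project stage that strips `_id` so each pipeline run's documents get fresh keys.

```python
# Toy model of why a pipeline should not end with $group: each run emits
# documents whose _id is the group key, so a later run over new data
# upserts onto the same _id values and overwrites earlier results.

def group_by_location(docs):
    """Simulate {$group: {_id: "$location_id", count: {$sum: 1}}}."""
    counts = {}
    for doc in docs:
        counts[doc["location_id"]] = counts.get(doc["location_id"], 0) + 1
    return [{"_id": key, "count": n} for key, n in counts.items()]

def project_away_id(docs):
    """Simulate {$project: {location: "$_id", count: 1, _id: 0}}."""
    return [{"location": d["_id"], "count": d["count"]} for d in docs]

def upsert(store, docs):
    """Writes keyed on _id collide across runs; docs without one get fresh keys."""
    for doc in docs:
        store[doc.get("_id", object())] = doc  # unique sentinel if _id absent

run1 = [{"location_id": "kitchen"}, {"location_id": "kitchen"}]
run2 = [{"location_id": "kitchen"}]

# Ending with $group: run 2 overwrites run 1's "kitchen" document.
bad_store = {}
upsert(bad_store, group_by_location(run1))
upsert(bad_store, group_by_location(run2))

# Adding $project removes _id, so both runs' results are preserved.
good_store = {}
upsert(good_store, project_away_id(group_by_location(run1)))
upsert(good_store, project_away_id(group_by_location(run2)))
```

With the trailing $project, `good_store` keeps one document per run; without it, `bad_store` silently retains only the latest run's counts.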

Member Author

I see. So the bigger takeaway is that data pipelines should never end with the $group stage.

I'll update the docs accordingly, but I wonder if this issue is dangerous enough that the UI (and maybe the backend?) should either warn users, log a message, or even add a sneaky silent $project stage to rename _id fields to something else. I imagine a lot of data pipelines will wind up using $group, so users are bound to run into this eventually!

@jdamon96 thoughts? If you like one of these product solutions, happy to file a ticket for it (and update the docs once it lands).

@jdamon96 (Contributor) commented Jul 7, 2025

agree it could make sense to address this on a product level; for now i'd prefer to release as-is w/ this docs level warning and wait a bit to see how users interact with the product and use that to inform our solution approach

Member Author

Sounds good. I updated the create example with the $project stage and added more information about this in the subsequent admonition. I'll keep an eye on Ask AI to see if this comes up at all.

Contributor

Great, thank you

@@ -2,7 +2,7 @@
linkTitle: "Cache recent data"
title: "Cache recent data"
weight: 25
description: "Make processed automatically available for faster, simpler queries."
description: "Store the last 24 hours of data in a shared recent-data database, while continuing to write all data to blob storage."
Collaborator

Can you explain please why this says 24h?

Member Author

Sorry, I tried using the line you liked from the old hot data store section (https://github.com/viamrobotics/docs/pull/4440/files#r2192540953) but I should have removed the explicit 24h condition.

Fixing.

Contributor

what does shared mean in this context?

@@ -12,11 +12,11 @@ platformarea: ["data", "cli"]
date: "2024-12-03"
---

The hot data store enables faster queries on recent sensor data.
The hot data store caches the last 24 hours of data in a shared recent-data database, while continuing to write all data to blob storage.
Collaborator

see above

Queries typically execute on blob storage.
To query data from hot data store instead of blob storage, specify hot storage as your data source in your query.
Queries execute on blob storage by default which is slower than queries to a hot data store.
If you have configured a hot data store, you must specify it in any queries as the data source to be used for the query.
Collaborator

Again: Can you explain what happens if some of the data is no longer in the hot storage and I specify that as the data_source? Is there fallback? Either way we should explain that.

Member Author

Reaching out to @dmhilly for a concrete answer.

@dmhilly (Member) commented Jul 9, 2025

Hi! There is no fallback; if you specify hot_data_store and there is no data in the store, you get an empty result set, even if data that matches your query exists in blob storage.
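The no-fallback behavior described here can be sketched as a toy model. All names below are illustrative, not Viam SDK calls: blob storage keeps everything, the hot data store holds only readings inside its retention window, and a query reads from exactly one source.

```python
# Illustrative sketch of "no fallback": a hot-data-store query only sees
# what the hot store currently holds, even when matching data exists in
# blob storage. Names and structures here are assumptions for the sketch.

RETENTION_HOURS = 24

blob_store = [{"hours_ago": 1}, {"hours_ago": 30}]  # blob keeps everything
hot_store = [d for d in blob_store if d["hours_ago"] <= RETENTION_HOURS]

def run_query(predicate, data_source):
    """A query reads from its chosen source only; there is no fallback."""
    source = hot_store if data_source == "hot_data_store" else blob_store
    return [doc for doc in source if predicate(doc)]

# Query for data older than the retention window:
old = lambda d: d["hours_ago"] > RETENTION_HOURS
hot_result = run_query(old, "hot_data_store")   # empty: data has aged out
blob_result = run_query(old, "blob_storage")    # finds the 30-hour-old reading
```

The 30-hour-old reading exists, but the hot-store query returns an empty result set because that document is outside the retention window, which matches the behavior dmhilly describes.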

Contributor

to chime in here: i'm not sure there is anything to explain; it's expected behavior for a query to return exactly what is being asked for (i.e. query = specific question, result = answer), and in this case if the user is querying for data that isn't there, the empty set is that answer

Collaborator

I think the word cache being in the title is what made me question this and is throwing me off. Maybe that needs to change? When I think cache I think if there's a cache miss I'll still get the data, it'll just take longer. Absolutely tell me I'm wrong but if I am stumbling on this, others likely will too. I'd prefer we avoid any doubt for users than leave a possible stumbling block.

Member Author

@npentrel I just removed all usage of the word cache from this page, including the title. What do you think?

Contributor

I personally am a fan of not using cache to describe the hot data store.


## Prerequisites

Before creating a data pipeline, you must enable data capture from at least one component and begin syncing data with Viam.
Collaborator

@nathan-contino why did you resolve this? You didn't change anything. Generally unless it's optional feedback, I'd expect a comment or a change. Resolving comments should in most cases be left to the commenter

linkTitle: "Cache recent data"
title: "Cache recent data"
weight: 25
description: "Cache recent data while continuing to write all data to blob storage."
Collaborator

might provide more context

Suggested change
description: "Cache recent data while continuing to write all data to blob storage."
description: "Cache recent data to allow faster access, while continuing to write all data to blob storage."

Member Author

Reworded a bit differently based on your statements elsewhere regarding the term 'cache'. Hopefully my changes are mostly in the same spirit.

"capture_frequency_hz": 0.5,
"additional_params": {},
"recent_data_store": {
"stored_hours": 24
@npentrel (Collaborator) commented Jul 10, 2025

I asked for the example to stay in the docs in some format because the description added a nice explanation of the feature. That really didn't mean please use this as THE thing to tell people to add - because that doesn't help with explanation. A tab with an example configuration or an example configuration just below the configure section might work.

I really don't think you should tell people to add this, then show an example, without explaning it's an example and that the stored_hours are configurable.

Member Author

Thanks for the explanation. I didn't understand that you were referring to the sentence and the example snippet before. Sorry for misunderstanding.

Reworded this to explicitly state that the configuration is an example, lifted the sentence about stored_hours configuration above the snippet, and added an introductory sentence to help users connect the 24 in the example to 24 stored hours of data.
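For reference, the fragment quoted above sits inside a single capture-method entry. A fuller illustrative shape might look like the following; the `"method": "Readings"` name and the surrounding braces are assumptions for context, since only the inner fields appear in this diff:

```json
{
  "method": "Readings",
  "capture_frequency_hz": 0.5,
  "additional_params": {},
  "recent_data_store": {
    "stored_hours": 24
  }
}
```

Here `stored_hours` sets the hot data store's retention window; as noted in this thread, it is configurable, with 24 shown only as an example value.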

@npentrel npentrel changed the title DOCS-3198: offline data pipelines, SDK docs, hot data store fixes DOCS-3198: Add offline data pipelines, SDK docs, hot data store fixes Jul 10, 2025
@nathan-contino (Member Author)

@npentrel updated based on your feedback, let me know what other suggestions you have so i can keep this moving

@nathan-contino nathan-contino requested a review from npentrel July 11, 2025 12:51
@@ -0,0 +1,164 @@
---
linkTitle: "Speed up queries to recent data"
Collaborator

this is a bit long for the side bar.

Collaborator

Maybe?

Suggested change
linkTitle: "Speed up queries to recent data"
linkTitle: "Optimize recent data queries"

Anything that stays in imperative mood but doesn't spread across two lines

@npentrel (Collaborator) left a comment

LGTM % changing the hot data store link title

@npentrel (Collaborator)

Is this ready to be merged?

@nathan-contino nathan-contino merged commit 8560cd8 into viamrobotics:main Jul 14, 2025
11 of 12 checks passed
@nathan-contino nathan-contino deleted the DOCS-3198-data-pipelines-prose branch July 14, 2025 12:50

🔎💬 Inkeep AI search and chat service is syncing content for source 'Viam Docs'

9 participants