Commit 8560cd8

nathan-contino, jdamon96, and npentrel authored
DOCS-3198: Add offline data pipelines, SDK docs, hot data store fixes (#4440)
Co-authored-by: Jack Damon <[email protected]> Co-authored-by: Naomi Pentrel <[email protected]>
1 parent ed4efc9 commit 8560cd8

37 files changed: 1860 additions and 624 deletions

.github/workflows/sdk_protos_map.csv

Lines changed: 5 additions & 0 deletions
@@ -425,6 +425,11 @@ data,GetDatabaseConnection,,get_database_connection,,getDatabaseConnection,getDa
 data,ConfigureDatabaseUser,,configure_database_user,,configureDatabaseUser,configureDatabaseUser
 data,AddBinaryDataToDatasetByIDs,,add_binary_data_to_dataset_by_ids,,addBinaryDataToDatasetByIds,addBinaryDataToDatasetByIds
 data,RemoveBinaryDataFromDatasetByIDs,,remove_binary_data_from_dataset_by_ids,,removeBinaryDataFromDatasetByIds,removeBinaryDataFromDatasetByIds
+data,GetDataPipeline,,get_data_pipeline,,,getDataPipeline
+data,ListDataPipelines,,list_data_pipelines,,,listDataPipelines
+data,CreateDataPipeline,,create_data_pipeline,,,createDataPipeline
+data,DeleteDataPipeline,,delete_data_pipeline,,,deleteDataPipeline
+data,ListDataPipelineRuns,,list_data_pipeline_runs,,,listDataPipelineRuns
 
 ## Dataset
 dataset,CreateDataset,,create_dataset,,createDataset,createDataset

docs/data-ai/capture-data/advanced/advanced-data-capture-sync.md

Lines changed: 21 additions & 69 deletions
@@ -15,7 +15,24 @@ Some data use cases require advanced configuration beyond the attributes accessi
 You can use raw JSON to configure additional attributes for both data management and data capture.
 You can also configure data capture for remote parts.
 
-### Advanced data management service configurations
+## Cloud data retention
+
+Configure how long your synced data remains stored in the cloud:
+
+- **Retain data up to a certain size (for example, 100GB) or for a specific length of time (for example, 14 days):** Set `retention_policies` at the resource level.
+See the `retention_policy` field in [data capture configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-capture-attributes).
+- **Delete data captured by a machine when you delete the machine:** Control whether your cloud data is deleted when a machine or machine part is removed.
+See the `delete_data_on_part_deletion` field in the [data management service configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-management-attributes).
+
+## Sync optimization
+
+**Configurable sync threads:** You can control how many concurrent sync operations occur by adjusting the `maximum_num_sync_threads` setting.
+Higher values may improve throughput on more powerful hardware, but raising it too high may introduce instability on resource-constrained devices.
+
+**Wait time before syncing arbitrary files:** If you choose to sync arbitrary files (beyond those captured by the data management service), the `file_last_modified_millis` configuration attribute specifies how long a file must remain unmodified before the data manager considers it for syncing.
+The default is 10 seconds.
+
+## Advanced data management service configuration
 
 To configure the data manager in JSON, see the following example configurations:
 
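Note on the hunk above: the service-level attributes it names (`delete_data_on_part_deletion`, `maximum_num_sync_threads`, `file_last_modified_millis`) could be set together in the data management service's `attributes` object. The fragment below is a minimal sketch and is not part of the commit. The thread count of `4` is an arbitrary illustrative value. `10000` assumes the attribute is expressed in milliseconds, as its name suggests, which matches the stated 10-second default. `false` for `delete_data_on_part_deletion` is likewise only an example.

```json
{
  "attributes": {
    "maximum_num_sync_threads": 4,
    "file_last_modified_millis": 10000,
    "delete_data_on_part_deletion": false
  }
}
```

On a resource-constrained device, a low thread count like the one sketched here follows the hunk's own caution that raising the value too far may introduce instability.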
@@ -96,7 +113,7 @@ The following attributes are available for the data management service:
 
 You can edit the JSON directly by switching to **JSON** mode in the UI.
 
-### Advanced data capture configurations
+## Advanced data capture configuration
 
 {{< alert title="Caution" color="caution" >}}
 
@@ -524,62 +541,14 @@ The following attributes are available for data capture configuration:
 | `capture_frequency_hz` | float | **Required** | Frequency in hertz at which to capture data. For example, to capture a reading every 2 seconds, enter `0.5`. |
 | `method` | string | **Required** | Depends on the type of component or service. See [Supported components and services](/data-ai/capture-data/capture-sync/#click-to-see-resources-that-support-data-capture-and-cloud-sync). **Note:** For tabular data, Viam enforces a maximum size of 4MB for any single reading. |
 | `retention_policy` | object | Optional | Option to configure how long data collected by this component or service should remain stored in the Viam Cloud. You must set this in JSON mode. See the JSON example for a camera component. <br> **Options:** `"days": <int>`, `"binary_limit_gb": <int>`, `"tabular_limit_gb": <int>`. <br> Days are in UTC time. Setting a retention policy of 1 day means that data stored now will be deleted the following day **in UTC time**. You can set either or both of the size limit options and size is in gigabytes. The `retention_policy` does not affect logs. For information about logs, see [Logging](/operate/reference/viam-server/#logging). |
-| `recent_data_store` | object | Optional | Configure a rolling time frame of recent data to store in a [hot data store](#capture-to-the-hot-data-store) for faster access. Example: `{ "stored_hours": 24 }` |
+| `recent_data_store` | object | Optional | Configure a rolling time frame of recent data to store in a [hot data store](/data-ai/data/hot-data-store/) for faster access. Example: `{ "stored_hours": 24 }` |
 | `additional_params` | depends | depends | Varies based on the method. For example, `ReadImage` requires a MIME type. |
 
 {{< /expand >}}
 
 You can edit the JSON directly by switching to **JSON** mode in the UI.
 
-### Capture to the hot data store
-
-If you want faster access to your most recent sensor readings, you can configure hot data storage.
-The hot data store keeps a rolling window of hot data for faster queries.
-All historical data remains in your default storage.
-
-To configure the hot data store:
-
-1. Use the `recent_data_store` attribute on each capture method in your data manager service.
-2. Configure your queries' data source to the hot data store by passing the `use_recent_data` boolean argument to [tabularDataByMQL](/dev/reference/apis/data-client/#tabulardatabymql).
-
-{{% expand "Click to view a sample configuration" %}}
-
-The following sample configuration captures data from a sensor at 0.5 Hz.
-`viam-server` stores the last 24 hours of data in a shared recent-data database, while continuing to write all data to blob storage:
-
-```json {class="line-numbers linkable-line-numbers" data-line="17-19"}
-{
-  "components": [
-    {
-      "name": "sensor-1",
-      "api": "rdk:component:sensor",
-      "model": "rdk:builtin:fake",
-      "attributes": {},
-      "service_configs": [
-        {
-          "type": "data_manager",
-          "attributes": {
-            "capture_methods": [
-              {
-                "method": "Readings",
-                "capture_frequency_hz": 0.5,
-                "additional_params": {},
-                "recent_data_store": {
-                  "stored_hours": 24
-                }
-              }
-            ]
-          }
-        }
-      ]
-    }
-  ]
-}
-```
-
-{{% /expand%}}
-
-### Capture directly to your own MongoDB cluster
+## Capture directly to your own MongoDB cluster
 
 You can configure direct capture of tabular data to a MongoDB instance alongside disk storage on your edge device.
 This can be useful for powering real-time dashboards before data is synced from the edge to the cloud.
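Note on the hunk above: the `retention_policy` and `recent_data_store` rows of the attribute table describe per-capture-method settings. The fragment below is a minimal sketch of how they might sit together in one `service_configs` entry, and it is not part of the commit. It reuses the shape of the sample configuration removed in this hunk. Placing `retention_policy` next to `recent_data_store` inside the capture method entry is an assumption based on both appearing in the same attribute table. The values `14` and `100` are the illustrative figures from the "Cloud data retention" text.

```json
{
  "type": "data_manager",
  "attributes": {
    "capture_methods": [
      {
        "method": "Readings",
        "capture_frequency_hz": 0.5,
        "additional_params": {},
        "retention_policy": {
          "days": 14,
          "tabular_limit_gb": 100
        },
        "recent_data_store": {
          "stored_hours": 24
        }
      }
    ]
  }
}
```

Per the table, `capture_frequency_hz` of `0.5` captures one reading every 2 seconds, and the retention window is evaluated in UTC.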
@@ -699,20 +668,3 @@ Failing to write to MongoDB doesn't affect capturing and syncing data to cloud s
 If your use case needs to support very high capture rates, this feature may not be appropriate.
 
 {{< /alert >}}
-
-### Cloud data retention
-
-Configure how long your synced data remains stored in the cloud:
-
-- **Retain data up to a certain size (for example, 100GB) or for a specific length of time (for example, 14 days):** Set `retention_policies` at the resource level.
-See the `retention_policy` field in [data capture configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-capture-attributes).
-- **Delete data captured by a machine when you delete the machine:** Control whether your cloud data is deleted when a machine or machine part is removed.
-See the `delete_data_on_part_deletion` field in the [data management service configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-management-attributes).
-
-### Sync optimization
-
-**Configurable sync threads:** You can control how many concurrent sync operations occur by adjusting the `maximum_num_sync_threads` setting.
-Higher values may improve throughput on more powerful hardware, but raising it too high may introduce instability on resource-constrained devices.
-
-**Wait time before syncing arbitrary files:** If you choose to sync arbitrary files (beyond those captured by the data management service), the `file_last_modified_millis` configuration attribute specifies how long a file must remain unmodified before the data manager considers it for syncing.
-The default is 10 seconds.
