feat: add tfds dataset format and bridge v2 dataset #400

Open
kwangneuraco wants to merge 1 commit into develop from feat/add_tfds_bridge_dataset

@kwangneuraco
Contributor

  • Add TFDS support. TFDS is an actively maintained data format and slightly better than RLDS, which is no longer maintained.
  • Add the Bridge V2 dataset import config.
  • Add the relevant tests and a LeRobot test.

@github-actions
Contributor

github-actions bot commented Feb 13, 2026

PR source branch is valid

  • Source: feat/add_tfds_bridge_dataset
  • Target: develop

@kwangneuraco kwangneuraco added the version:minor non-breaking feature updates, new functions or endpoints label Feb 13, 2026
@github-actions
Contributor

Consider updating changelogs/pending-changelog.md with a summary of this change for the release notes. This is optional and non-blocking.

@kwangneuraco kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 90100d4 to 9270d2a Compare February 13, 2026 12:15
@sdas-neuraco
Contributor

Split this into two PRs, with the LeRobot tests in a separate one. We should back-test the existing datasets to check that nothing is broken with RLDS. Comments from @ypang-neuraco are included.

@kwangneuraco kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 9270d2a to 22bbf62 Compare February 13, 2026 13:37
Contributor

@ypang-neuraco ypang-neuraco left a comment

Please check whether the _record_step implementations for TFDS and RLDS can be merged. They look the same to me.

output_dataset:
  name: bridge_v2
  tags: [widowx, bridge_v2, manipulation]
  description: "This dataset contains images and sensor data collected from the WidowX 250 robot in an indoor environment. Dataset source: https://rail-berkeley.github.io/bridgedata/"

Please give more details about the types of environments and the tasks performed.

Comment on lines +422 to +427
elif (
    self.allow_mapping_name_fallback
    and isinstance(source, dict)
    and item.name in source
):
    source_data = source[item.name]

This changes the existing behaviour of the importer. Currently, only the source name is used to access the data and users are expected to correct the source name if it is incorrect. What is the reason for this change?

Comment on lines 429 to 451
elif item.index is not None:
    if isinstance(source, dict):
        self.logger.error(
            "Cannot index dict with integer index for %s.%s. "
            "Source path '%s' resolved to a dict, not a tensor. "
            "Check your dataset config.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            import_source_path,
        )
        return _SKIP_ITEM
    source_data = source[item.index]
elif item.index_range is not None:
    if isinstance(source, dict):
        self.logger.error(
            "Cannot slice dict with index_range for %s.%s. "
            "Source path '%s' resolved to a dict, not a tensor.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            import_source_path,
        )
        return _SKIP_ITEM
    source_data = source[item.index_range.start : item.index_range.end]

Please check the RLDS example. The source name is applied first, then the index or index range is applied on top.
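The ordering described above can be sketched as follows. This is a minimal illustration, not the importer's actual code: `Item`, `IndexRange`, and `resolve` are hypothetical stand-ins, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class IndexRange:
    """Hypothetical range with an exclusive end, mirroring index_range.start/.end."""
    start: int
    end: int


@dataclass
class Item:
    """Hypothetical stand-in for one mapping entry in the dataset config."""
    name: Optional[str] = None
    index: Optional[int] = None
    index_range: Optional[IndexRange] = None


def resolve(source: Any, item: Item) -> Any:
    """Apply the source name first, then index or index_range on the result."""
    data = source
    if item.name is not None:
        data = data[item.name]  # a wrong name raises KeyError here
    if item.index is not None:
        data = data[item.index]
    elif item.index_range is not None:
        data = data[item.index_range.start : item.index_range.end]
    return data


step = {"state": [0.1, 0.2, 0.3, 0.4]}
print(resolve(step, Item(name="state", index=2)))
print(resolve(step, Item(name="state", index_range=IndexRange(1, 3))))
```

With this ordering, the name and index branches are not mutually exclusive, so an index never gets applied to the enclosing dict.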

Comment on lines 455 to 470
if (
    allow_dict_search
    and self.allow_nested_tensor_search
    and isinstance(source_data, dict)
):
    tensor = self._find_tensor_in_nested(source_data, tf_module)
    if tensor is None:
        self.logger.warning(
            "Could not find tensor in nested dict for %s.%s. "
            "Dict keys: %s. Skipping.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            list(source_data.keys()),
        )
        return _SKIP_ITEM
    source_data = tensor

I think this should be removed. If the path is incorrect, the user should update the path instead of relying on hidden logic to find the data. This can lead to incorrect data being imported.
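One possible shape for the stricter alternative, as a rough sketch only: `get_by_path` and the `/` separator are assumptions for illustration and are not part of the importer. It walks an explicit path and fails with the available keys instead of searching nested dicts.

```python
from collections.abc import Mapping
from typing import Any


def get_by_path(source: Mapping, path: str, sep: str = "/") -> Any:
    """Walk an explicit path; raise instead of guessing when a key is missing."""
    node: Any = source
    walked = []
    for key in path.split(sep):
        if not isinstance(node, Mapping) or key not in node:
            available = sorted(node) if isinstance(node, Mapping) else []
            location = sep.join(walked) or "<root>"
            raise KeyError(
                f"'{key}' not found under '{location}'. Available keys: {available}"
            )
        node = node[key]
        walked.append(key)
    return node


step = {"observation": {"image_0": "img", "state": [0.0, 1.0]}}
print(get_by_path(step, "observation/state"))

try:
    get_by_path(step, "observation/images")
except KeyError as exc:
    print(exc)  # names the missing key and lists what is available
```

Listing the available keys in the error gives the user what they need to fix the config, without the importer picking a tensor on their behalf.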

Comment on lines 430 to 439
if isinstance(source, dict):
    self.logger.error(
        "Cannot index dict with integer index for %s.%s. "
        "Source path '%s' resolved to a dict, not a tensor. "
        "Check your dataset config.",
        data_type.value,
        item.name if hasattr(item, "name") else "unknown",
        import_source_path,
    )
    return _SKIP_ITEM

Is this check needed? If we try to index a dict, an error will be raised anyway.
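For reference, a small sketch of what happens without the guard, using a hypothetical `index_source` helper and a plain dict with string keys as the resolved source:

```python
def index_source(source, index):
    """Index the resolved source directly, with no isinstance guard."""
    return source[index]


source = {"state": [0.1, 0.2, 0.3]}  # the path resolved to a dict, not a tensor

try:
    index_source(source, 0)
except KeyError as exc:
    print(f"KeyError: {exc}")  # the message carries only the missing key
```

The built-in error does stop the bad access, but it does not mention the source path, and it raises rather than returning _SKIP_ITEM as the guarded version in the diff does. Whether that difference matters depends on whether the import is meant to continue past a misconfigured item.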

@kwangneuraco kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 22bbf62 to 83868eb Compare February 13, 2026 17:29
@github-actions
Contributor

PR title and commit messages must follow conventional commit format: <type>: <description>. Valid prefixes: feat, fix, chore, docs, ci, test, refactor, style, perf.
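A quick local check along these lines could look like the sketch below. The prefix list comes from the bot message above. The regular expression itself, including the optional scope and `!` marker allowed by the Conventional Commits specification, is an assumption and may not match the rule the CI check actually applies.

```python
import re

# Prefixes as listed in the bot message.
VALID_PREFIXES = ("feat", "fix", "chore", "docs", "ci", "test", "refactor", "style", "perf")

# Assumed pattern: prefix, optional "(scope)", optional "!", then ": description".
PATTERN = re.compile(
    r"^(" + "|".join(VALID_PREFIXES) + r")(\([^)]+\))?!?: \S.*$"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of the message matches the assumed pattern."""
    lines = message.splitlines()
    return bool(lines) and PATTERN.match(lines[0]) is not None


print(is_conventional("feat: add tfds dataset format and bridge v2 dataset"))
print(is_conventional("add tfds dataset format"))
```

Running this over the title and each commit subject before pushing would catch a missing or misspelled prefix early.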

@kwangneuraco kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from dc92cbf to 41b0031 Compare February 13, 2026 17:53