feat: add tfds dataset format and bridge v2 dataset #400

Open

kwangneuraco wants to merge 1 commit into develop from

feat/add_tfds_bridge_dataset

Contributor

kwangneuraco commented Feb 13, 2026

Add TFDS support. TFDS is an actively maintained data format and slightly better than RLDS, which is no longer maintained.
Add the Bridge V2 dataset import config.
Add the relevant tests and a LeRobot test.

kwangneuraco requested review from sdas-neuraco and ypang-neuraco

February 13, 2026 12:08

Contributor

github-actions bot commented Feb 13, 2026 (edited)

✅ PR source branch is valid

Source: feat/add_tfds_bridge_dataset
Target: develop

kwangneuraco added the version:minor label

Contributor

github-actions bot commented Feb 13, 2026

Consider updating changelogs/pending-changelog.md with a summary of this change for the release notes. This is optional and non-blocking.

kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 90100d4 to 9270d2a

February 13, 2026 12:15

Contributor

sdas-neuraco commented Feb 13, 2026

Please split this into two PRs, with the LeRobot tests in a separate one. We should back-test the existing datasets to check that nothing is broken with RLDS. Comments from @ypang-neuraco are included.

kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 9270d2a to 22bbf62

February 13, 2026 13:37

ypang-neuraco requested changes

Contributor

ypang-neuraco left a comment

Please check whether the _record_step implementations for TFDS and RLDS can be merged. They look the same to me.

neuracore/importer/config/bridge_v2.yaml Outdated

+              output_dataset:
+                name: bridge_v2
+                tags: [widowx, bridge_v2, manipulation]
+                description: "This dataset contains images and sensor data collected from the WidowX 250 robot in an indoor environment. Dataset source: https://rail-berkeley.github.io/bridgedata/"

Contributor

ypang-neuraco Feb 13, 2026

Please give more details about the types of environments and the tasks performed.

neuracore/importer/rlds_tfds_importer.py Outdated (resolved conversation)

neuracore/importer/rlds_tfds_importer.py

Comment on lines +422 to +427

+                      elif (
+                          self.allow_mapping_name_fallback
+                          and isinstance(source, dict)
+                          and item.name in source
+                      ):
+                          source_data = source[item.name]

Contributor

ypang-neuraco Feb 13, 2026

This changes the existing behaviour of the importer. Currently, only the source name is used to access the data and users are expected to correct the source name if it is incorrect. What is the reason for this change?

neuracore/importer/rlds_tfds_importer.py Outdated

Comment on lines 429 to 451

+                      elif item.index is not None:
+                          if isinstance(source, dict):
+                              self.logger.error(
+                                  "Cannot index dict with integer index for %s.%s. "
+                                  "Source path '%s' resolved to a dict, not a tensor. "
+                                  "Check your dataset config.",
+                                  data_type.value,
+                                  item.name if hasattr(item, "name") else "unknown",
+                                  import_source_path,
+                              )
+                              return _SKIP_ITEM
+                          source_data = source[item.index]
+                      elif item.index_range is not None:
+                          if isinstance(source, dict):
+                              self.logger.error(
+                                  "Cannot slice dict with index_range for %s.%s. "
+                                  "Source path '%s' resolved to a dict, not a tensor.",
+                                  data_type.value,
+                                  item.name if hasattr(item, "name") else "unknown",
+                                  import_source_path,
+                              )
+                              return _SKIP_ITEM
+                          source_data = source[item.index_range.start : item.index_range.end]

Contributor

ypang-neuraco Feb 13, 2026

Please check the RLDS example. The source name is applied first, then the index or index range is applied on top.
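
The ordering described above can be sketched as follows. This is a minimal illustration, not the importer's actual code: `MappingItem`, `IndexRange`, the `source_name` field, and `resolve_item` are hypothetical stand-ins, and plain Python lists are used in place of tensors. Only `index` and `index_range.start` / `index_range.end` are taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class IndexRange:
    # Hypothetical stand-in for the config's index range (start inclusive, end exclusive).
    start: int
    end: int


@dataclass
class MappingItem:
    # Hypothetical stand-in for one mapping entry in the dataset config.
    name: str
    source_name: Optional[str] = None
    index: Optional[int] = None
    index_range: Optional[IndexRange] = None


def resolve_item(source: Any, item: MappingItem) -> Any:
    """Apply the source name first, then the index or index range on top."""
    data = source
    if item.source_name is not None:
        # Step 1: select the named entry from the dict-like source.
        data = data[item.source_name]
    if item.index is not None:
        # Step 2a: pick a single element of the selected entry.
        data = data[item.index]
    elif item.index_range is not None:
        # Step 2b: slice the selected entry.
        data = data[item.index_range.start : item.index_range.end]
    return data


step = {"state": [0.1, 0.2, 0.3, 0.4], "gripper": [1.0]}
joint = resolve_item(step, MappingItem(name="joint_2", source_name="state", index=1))
arm = resolve_item(
    step, MappingItem(name="arm", source_name="state", index_range=IndexRange(0, 3))
)
```

With this ordering, the name lookup and the indexing are sequential steps rather than alternative `elif` branches, so an item can carry both a source name and an index.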

neuracore/importer/rlds_tfds_importer.py

Comment on lines 455 to 470

+                      if (
+                          allow_dict_search
+                          and self.allow_nested_tensor_search
+                          and isinstance(source_data, dict)
+                      ):
+                          tensor = self._find_tensor_in_nested(source_data, tf_module)
+                          if tensor is None:
+                              self.logger.warning(
+                                  "Could not find tensor in nested dict for %s.%s. "
+                                  "Dict keys: %s. Skipping.",
+                                  data_type.value,
+                                  item.name if hasattr(item, "name") else "unknown",
+                                  list(source_data.keys()),
+                              )
+                              return _SKIP_ITEM
+                          source_data = tensor

Contributor

ypang-neuraco Feb 13, 2026

I think this should be removed. If the path is incorrect, the user should update the path instead of relying on hidden logic to find the data. This can lead to incorrect data being imported.
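
One way to picture the stricter alternative: resolve the configured path exactly and fail with a message that lists the keys that were available, so the user can correct the config. This is only a sketch under assumptions. The function name `resolve_path` and the dot-separated path syntax are hypothetical and not taken from the importer.

```python
from collections.abc import Mapping
from typing import Any


def resolve_path(step: Mapping, path: str) -> Any:
    """Follow a dot-separated path exactly; never search for a fallback."""
    node: Any = step
    walked = []
    for key in path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            # Report what was available at the point of failure.
            available = sorted(node) if isinstance(node, Mapping) else []
            location = ".".join(walked) or "<root>"
            raise KeyError(
                f"Path '{path}' failed at '{key}' (under {location}). "
                f"Available keys: {available}"
            )
        node = node[key]
        walked.append(key)
    return node


step = {"observation": {"state": [0.0, 1.0], "image_0": "..."}}
state = resolve_path(step, "observation.state")
```

A misspelled path such as `observation.states` raises a `KeyError` naming the available keys, instead of silently returning whichever tensor a nested search happens to find first.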

neuracore/importer/rlds_tfds_importer.py Outdated

Comment on lines 430 to 439

+                          if isinstance(source, dict):
+                              self.logger.error(
+                                  "Cannot index dict with integer index for %s.%s. "
+                                  "Source path '%s' resolved to a dict, not a tensor. "
+                                  "Check your dataset config.",
+                                  data_type.value,
+                                  item.name if hasattr(item, "name") else "unknown",
+                                  import_source_path,
+                              )
+                              return _SKIP_ITEM

Contributor

ypang-neuraco Feb 13, 2026

Is this check needed? If we try to index a dict, an error will be raised anyway.
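
A small sketch of the point being made, using a plain Python dict (the real source may hold tensors, and the helper `try_index` is hypothetical): indexing a string-keyed dict with an integer or a slice already raises.

```python
def try_index(container, key):
    """Return the indexed value, or the name of the exception that was raised."""
    try:
        return container[key]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


source = {"state": [0.1, 0.2, 0.3]}

int_result = try_index(source, 0)  # integer key is absent, so KeyError
# A slice raises TypeError (unhashable) on older Python versions and
# KeyError on versions where slice objects are hashable.
slice_result = try_index(source, slice(0, 2))
```

The difference in the diff under review is behavioural rather than about detection: the explicit check logs an error and returns `_SKIP_ITEM`, whereas relying on the built-in exception stops the import at that point unless a caller catches it.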

kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from 22bbf62 to 83868eb

February 13, 2026 17:29

Contributor

github-actions bot commented Feb 13, 2026

PR title and commit messages must follow conventional commit format: <prefix>: <description>. Valid prefixes: feat, fix, chore, docs, ci, test, refactor, style, perf.


feat: add tfds dataformat import and combine with rlds, add bridge v2 dataset upload and all relevant tests

41b0031

kwangneuraco force-pushed the feat/add_tfds_bridge_dataset branch from dc92cbf to 41b0031

February 13, 2026 17:53
