feat: add tfds dataset format and bridge v2 dataset #400
kwangneuraco wants to merge 1 commit into develop from
Conversation
kwangneuraco
commented
Feb 13, 2026
- Add TFDS support, an actively maintained data format that is slightly better than RLDS (which is no longer maintained); see the loading sketch below
- Add Bridge V2 dataset import config
- Add the relevant tests and a LeRobot test
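For context, a minimal loading sketch assuming the standard tensorflow_datasets API; the directory path and feature keys below are placeholders and will differ per dataset:

    import tensorflow_datasets as tfds

    # builder_from_directory reads a dataset already prepared in TFDS format on disk.
    builder = tfds.builder_from_directory("/data/bridge_v2/1.0.0")  # hypothetical path
    ds = builder.as_dataset(split="train")

    for episode in ds.take(1):
        # Episodic TFDS/RLDS datasets expose a nested "steps" dataset per episode.
        for step in episode["steps"]:
            observation = step["observation"]  # dict of tensors; keys depend on the dataset
            action = step["action"]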
✅ PR source branch is valid
Consider updating changelogs/pending-changelog.md with a summary of this change for the release notes. This is optional and non-blocking.
Force-pushed 90100d4 to 9270d2a (Compare)
Split into two PRs, with the LeRobot tests in a separate one. We should back-test the existing datasets to check nothing is broken with RLDS. Comments included from @ypang-neuraco.
Force-pushed 9270d2a to 22bbf62 (Compare)
ypang-neuraco
left a comment
Please check if you can merge the _record_step of TFDS and RLDS. They look the same to me.
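Not prescribing an implementation, just a rough sketch of the kind of merge meant here, assuming both importers can share a base class; every name below except _record_step and _SKIP_ITEM is hypothetical:

    _SKIP_ITEM = object()  # sentinel mirroring the one used in the importer

    class _TensorStepImporterBase:
        """Hypothetical shared base for the RLDS and TFDS importers."""

        def _record_step(self, step: dict) -> None:
            # Shared per-step logic: resolve each configured item from the step
            # dict and hand it to the writer. Subclasses would only differ in
            # how episodes/steps are enumerated, not in how a step is recorded.
            for item in self._configured_items():
                source_data = self._resolve_item(step, item)
                if source_data is _SKIP_ITEM:
                    continue
                self._write_item(item, source_data)

        def _configured_items(self):
            raise NotImplementedError

        def _resolve_item(self, step: dict, item):
            raise NotImplementedError

        def _write_item(self, item, source_data) -> None:
            raise NotImplementedError

    class RldsImporter(_TensorStepImporterBase):
        """Enumerates RLDS episodes; reuses _record_step unchanged."""

    class TfdsImporter(_TensorStepImporterBase):
        """Enumerates TFDS episodes; reuses _record_step unchanged."""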
output_dataset:
  name: bridge_v2
  tags: [widowx, bridge_v2, manipulation]
  description: "This dataset contains images and sensor data collected from the WidowX 250 robot in an indoor environment. Dataset source: https://rail-berkeley.github.io/bridgedata/"
Please give more details about the type of environments and the tasks performed.
elif (
    self.allow_mapping_name_fallback
    and isinstance(source, dict)
    and item.name in source
):
    source_data = source[item.name]
This changes the existing behaviour of the importer. Currently, only the source name is used to access the data and users are expected to correct the source name if it is incorrect. What is the reason for this change?
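To illustrate the concern, a toy resolver (not the importer's actual code) showing how a name fallback can silently return a different field than the configured path; all field names are made up:

    def resolve(source: dict, path: str, name: str, allow_fallback: bool):
        """Toy resolver: try the configured path first, then fall back to the item name."""
        node = source
        for key in path.split("/"):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = None
                break
        if node is not None:
            return node
        if allow_fallback and name in source:
            return source[name]  # silently picks whatever happens to share the item name
        raise KeyError(path)

    step = {
        "observation": {"image_0": "front camera tensor"},
        "wrist_image": "unrelated top-level field",  # happens to match item.name
    }

    # Strict lookup forces the user to fix the wrong path (raises KeyError);
    # with the fallback, the unrelated top-level field is imported with no error.
    print(resolve(step, "observation/wrist_image", "wrist_image", allow_fallback=True))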
elif item.index is not None:
    if isinstance(source, dict):
        self.logger.error(
            "Cannot index dict with integer index for %s.%s. "
            "Source path '%s' resolved to a dict, not a tensor. "
            "Check your dataset config.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            import_source_path,
        )
        return _SKIP_ITEM
    source_data = source[item.index]
elif item.index_range is not None:
    if isinstance(source, dict):
        self.logger.error(
            "Cannot slice dict with index_range for %s.%s. "
            "Source path '%s' resolved to a dict, not a tensor.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            import_source_path,
        )
        return _SKIP_ITEM
    source_data = source[item.index_range.start : item.index_range.end]
Please check the RLDS example. The source name is applied first, then the index or index range is applied on top.
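To make the expected ordering concrete, a toy example with made-up values: the source path selects a tensor first, and index / index_range is then applied to that tensor, never to the enclosing dict:

    step = {
        "observation": {
            "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],  # e.g. six joints + gripper
        }
    }

    # 1. Resolve the configured source path to a tensor.
    source_data = step["observation"]["state"]

    # 2. Only then apply item.index or item.index_range on the resolved tensor.
    gripper = source_data[6]       # item.index
    arm_joints = source_data[0:6]  # item.index_range.start : item.index_range.end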
if (
    allow_dict_search
    and self.allow_nested_tensor_search
    and isinstance(source_data, dict)
):
    tensor = self._find_tensor_in_nested(source_data, tf_module)
    if tensor is None:
        self.logger.warning(
            "Could not find tensor in nested dict for %s.%s. "
            "Dict keys: %s. Skipping.",
            data_type.value,
            item.name if hasattr(item, "name") else "unknown",
            list(source_data.keys()),
        )
        return _SKIP_ITEM
    source_data = tensor
I think this should be removed. If the path is incorrect, the user should update the path instead of relying on hidden logic to find the data. This can lead to incorrect data being imported.
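A small made-up example of the ambiguity: when the resolved dict holds more than one tensor, a nested search has to guess, and whichever leaf it visits first wins; this is a toy stand-in, not the actual _find_tensor_in_nested logic:

    observation = {
        "images": {
            "wrist": "wrist-camera tensor",
            "front": "front-camera tensor",
        }
    }

    def find_first_leaf(node):
        """Toy stand-in for a nested search: return the first non-dict value found."""
        for value in node.values():
            if isinstance(value, dict):
                found = find_first_leaf(value)
                if found is not None:
                    return found
            else:
                return value
        return None

    # The wrist camera is returned even if the config actually meant the front camera.
    print(find_first_leaf(observation))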
if isinstance(source, dict):
    self.logger.error(
        "Cannot index dict with integer index for %s.%s. "
        "Source path '%s' resolved to a dict, not a tensor. "
        "Check your dataset config.",
        data_type.value,
        item.name if hasattr(item, "name") else "unknown",
        import_source_path,
    )
    return _SKIP_ITEM
Is this check needed? If we try to index a dict, an error will be raised anyway.
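For reference, assuming the usual string-keyed dicts, integer indexing already fails on its own, so the explicit check mainly buys a clearer error message:

    source = {"observation": "...", "action": "..."}

    try:
        source[0]  # item.index applied to a dict resolved from the source path
    except KeyError as exc:
        print(f"KeyError: {exc}")  # raised anyway, just without the config context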
Force-pushed 22bbf62 to 83868eb (Compare)
PR title and commit messages must follow the conventional commit format: <prefix>: <description>. Valid prefixes: feat, fix, chore, docs, ci, test, refactor, style, perf.
… dataset upload and all relevant tests
Force-pushed dc92cbf to 41b0031 (Compare)