Commit e17ceb4
Author: Kye
[CLENAUP]
1 parent 9b8b07f commit e17ceb4

1 file changed (+0, -8 lines)


README.md (-8)
@@ -80,14 +80,6 @@ RT-2 integrates a high-capacity Vision-Language model (VLM), initially pre-train
 
 RT-2 is fine-tuned using both web and robotics data. The resultant model interprets robot camera images and predicts direct actions for the robot to execute. In essence, it converts visual and language patterns into action-oriented instructions, a remarkable feat in the field of robotic control.
 
-# Datasets
-| Dataset | Description | Source | Percentage in Training Mixture (RT-2-PaLI-X) | Percentage in Training Mixture (RT-2-PaLM-E) |
-|---------|-------------|--------|----------------------------------------------|----------------------------------------------|
-| WebLI | Around 10B image-text pairs across 109 languages, filtered to the top 10% scoring cross-modal similarity examples to give 1B training examples. | Chen et al. (2023b), Driess et al. (2023) | N/A | N/A |
-| Episodic WebLI | Not used in co-fine-tuning RT-2-PaLI-X. | Chen et al. (2023a) | N/A | N/A |
-| Robotics Dataset | Demonstration episodes collected with a mobile manipulation robot. Each demonstration is annotated with a natural language instruction from one of seven skills. | Brohan et al. (2022) | 50% | 66% |
-| Language-Table | Used for training on several prediction tasks. | Lynch et al. (2022) | N/A | N/A |
-
 
 ## Datasets
 Datasets used in the paper
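
The context paragraph retained above says RT-2 "predicts direct actions" from camera images and language. In the RT-2 paper, those actions are emitted as discrete tokens: each dimension of the continuous action vector is quantized into one of 256 uniform bins, so the VLM can output actions the same way it outputs text. Below is a minimal sketch of that tokenization step; the 256-bin scheme is from the paper, but the 7-DoF layout, bounds, and function names are illustrative assumptions, not this repository's actual code.

```python
import numpy as np

# Sketch of RT-2-style action discretization (assumption: 256 uniform bins
# per action dimension, per the RT-2 paper; bounds/layout are illustrative).
N_BINS = 256

def encode_action(action, low, high):
    """Map a continuous action vector to integer tokens in [0, N_BINS - 1]."""
    action = np.clip(action, low, high)
    normalized = (action - low) / (high - low)  # scale each dim to [0, 1]
    return np.minimum((normalized * N_BINS).astype(int), N_BINS - 1)

def decode_action(tokens, low, high):
    """Map integer tokens back to continuous values (bin centers)."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

# Hypothetical 7-DoF arm action: xyz translation, xyz rotation, gripper.
low = np.array([-0.1, -0.1, -0.1, -np.pi, -np.pi, -np.pi, 0.0])
high = np.array([0.1, 0.1, 0.1, np.pi, np.pi, np.pi, 1.0])

action = np.array([0.02, -0.05, 0.0, 0.1, 0.0, -0.2, 1.0])
tokens = encode_action(action, low, high)
print(tokens)                            # [153  64 128 132 128 119 255]
print(decode_action(tokens, low, high))  # approximately the original action
```

Round-tripping through encode/decode loses at most half a bin width per dimension, which is why a few hundred bins suffice for closed-loop robot control.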
