Commit e17ceb4
Author: Kye
[CLENAUP]
1 parent 9b8b07f commit e17ceb4

1 file changed (+0, -8 lines)


README.md (-8)
@@ -80,14 +80,6 @@ RT-2 integrates a high-capacity Vision-Language model (VLM), initially pre-train
 
 RT-2 is fine-tuned using both web and robotics data. The resultant model interprets robot camera images and predicts direct actions for the robot to execute. In essence, it converts visual and language patterns into action-oriented instructions, a remarkable feat in the field of robotic control.
 
-# Datasets
-| Dataset | Description | Source | Percentage in Training Mixture (RT-2-PaLI-X) | Percentage in Training Mixture (RT-2-PaLM-E) |
-|---------|-------------|--------|----------------------------------------------|----------------------------------------------|
-| WebLI | Around 10B image-text pairs across 109 languages, filtered to the top 10% scoring cross-modal similarity examples to give 1B training examples. | Chen et al. (2023b), Driess et al. (2023) | N/A | N/A |
-| Episodic WebLI | Not used in co-fine-tuning RT-2-PaLI-X. | Chen et al. (2023a) | N/A | N/A |
-| Robotics Dataset | Demonstration episodes collected with a mobile manipulation robot. Each demonstration is annotated with a natural language instruction from one of seven skills. | Brohan et al. (2022) | 50% | 66% |
-| Language-Table | Used for training on several prediction tasks. | Lynch et al. (2022) | N/A | N/A |
-
 
 ## Datasets
 Datasets used in the paper
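
The context paragraph retained above says RT-2 "predicts direct actions" from camera images and language. In the RT-2 paper, those actions are emitted as discrete tokens: each dimension of the continuous action vector is quantized into one of 256 uniform bins, so the VLM can output actions the same way it outputs text. Below is a minimal sketch of that tokenization step; the 256-bin scheme is from the paper, but the 7-DoF layout, bounds, and function names are illustrative assumptions, not this repository's actual code.

```python
import numpy as np

# Sketch of RT-2-style action discretization (assumption: 256 uniform bins
# per action dimension, per the RT-2 paper; bounds/layout are illustrative).
N_BINS = 256

def encode_action(action, low, high):
    """Map a continuous action vector to integer tokens in [0, N_BINS - 1]."""
    action = np.clip(action, low, high)
    normalized = (action - low) / (high - low)  # scale each dim to [0, 1]
    return np.minimum((normalized * N_BINS).astype(int), N_BINS - 1)

def decode_action(tokens, low, high):
    """Map integer tokens back to continuous values (bin centers)."""
    return low + (tokens + 0.5) / N_BINS * (high - low)

# Hypothetical 7-DoF arm action: xyz translation, xyz rotation, gripper.
low = np.array([-0.1, -0.1, -0.1, -np.pi, -np.pi, -np.pi, 0.0])
high = np.array([0.1, 0.1, 0.1, np.pi, np.pi, np.pi, 1.0])

action = np.array([0.02, -0.05, 0.0, 0.1, 0.0, -0.2, 1.0])
tokens = encode_action(action, low, high)
print(tokens)                            # [153  64 128 132 128 119 255]
print(decode_action(tokens, low, high))  # approximately the original action
```

Round-tripping through encode/decode loses at most half a bin width per dimension, which is why a few hundred bins suffice for closed-loop robot control.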
