---
Robotic Transformer 2 (RT-2) leverages both web and robotics data to generate actionable instructions for robotic control.
This is my implementation of the model behind RT-2. RT-2 uses PaLM-E as its backbone: a vision encoder embeds images into the same space as the language embeddings, and the image and text tokens are concatenated before being processed by the language backbone. This architecture is straightforward to assemble, but it lacks a deep understanding of both the unified multimodal representation and the individual modality representations.
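As a rough sketch of that concatenation idea — with hypothetical dimensions and a toy convolutional stand-in for the vision encoder, not the actual PaLM-E components:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; the real widths come from PaLM-E.
dim = 512          # shared embedding width
vocab_size = 20000

# Toy "vision encoder": turns a 256x256 image into a sequence of patch tokens.
vision_encoder = nn.Sequential(
    nn.Conv2d(3, dim, kernel_size=32, stride=32),  # (B, dim, 8, 8)
    nn.Flatten(2),                                  # (B, dim, 64)
)
token_embedding = nn.Embedding(vocab_size, dim)

img = torch.randn(1, 3, 256, 256)
caption = torch.randint(0, vocab_size, (1, 1024))

img_tokens = vision_encoder(img).transpose(1, 2)   # (1, 64, dim)
txt_tokens = token_embedding(caption)              # (1, 1024, dim)

# Images live in the same embedding space as text and are concatenated
# along the sequence axis before entering the language backbone.
sequence = torch.cat([img_tokens, txt_tokens], dim=1)
print(sequence.shape)  # torch.Size([1, 1088, 512])
```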
[CLICK HERE FOR THE PAPER](https://robotics-transformer2.github.io/assets/rt2.pdf)
RT-2 can be easily installed using pip:
```bash
pip install rt2
```
Additionally, you can manually install the dependencies:

```bash
pip install -r requirements.txt
```
# Usage
First, you need to initialize the `RT2` class:
```python
import torch
from rt2.model import RT2

# img: (batch_size, 3, 256, 256)
# caption: (batch_size, 1024)
img = torch.randn(1, 3, 256, 256)
caption = torch.randint(0, 20000, (1, 1024))

# model: RT2
model = RT2()

# Run the model on the image and caption
output = model(img, caption)
print(output)  # (1, 1024, 20000)
```
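To illustrate one thing you might do with per-token logits of that shape, here is a minimal greedy-decoding sketch; the `logits` tensor below is random stand-in data (not real model output), and the mapping from token ids back to discrete action bins is only described in a comment:

```python
import torch

# Stand-in logits with the same shape the model returns: (batch, seq_len, vocab).
logits = torch.randn(1, 1024, 20000)

# Greedy decoding: pick the highest-scoring token id at every position.
token_ids = logits.argmax(dim=-1)   # (1, 1024)

# In RT-2, a slice of these token ids would be de-tokenized into discrete
# action values (e.g. end-effector pose deltas); here we just check shapes.
print(token_ids.shape)  # torch.Size([1, 1024])
```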
| Dataset | Description | Source | Co-Fine-Tuning Mixture (RT-2-PaLI-X) | Co-Fine-Tuning Mixture (RT-2-PaLM-E) |
| --- | --- | --- | --- | --- |
| WebLI | Around 10B image-text pairs across 109 languages, filtered to the top 10% scoring cross-modal similarity examples to give 1B training examples. | Chen et al. (2023b), Driess et al. (2023) | N/A | N/A |
| Episodic WebLI | Not used in co-fine-tuning RT-2-PaLI-X. | Chen et al. (2023a) | N/A | N/A |
| Robotics Dataset | Demonstration episodes collected with a mobile manipulation robot. Each demonstration is annotated with a natural language instruction from one of seven skills. | Brohan et al. (2022) | 50% | 66% |
| Language-Table | Used for training on several prediction tasks. | Lynch et al. (2022) | N/A | N/A |
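As an illustration of how mixture percentages like those in the table could drive per-example sampling during co-fine-tuning — a toy sketch with made-up labels and a simplified 50/50 split, not the actual training pipeline:

```python
import random

# Hypothetical two-way mixture: half robotics demonstrations, half web data.
mixture = {"robotics": 0.5, "web": 0.5}

def sample_batch_sources(n, seed=0):
    """Draw n per-example dataset labels according to the mixture weights."""
    rng = random.Random(seed)
    names, weights = zip(*mixture.items())
    return rng.choices(names, weights=weights, k=n)

batch = sample_batch_sources(8)
print(batch)  # e.g. a mix of "robotics" and "web" labels
```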
## Acknowledgements

* The authors of the RT-2 paper, for writing this amazing paper and advancing Humanity
* LucidRains for providing the base repositories for [PALM](https://github.com/lucidrains/PaLM-rlhf-pytorch) and [RT-1](https://github.com/kyegomez/RT-2)
* And you, yes, the Human looking at this right now, I appreciate you and love you.
## Commercial Use Cases
The unique capabilities of RT-2 open up numerous commercial applications:
- **Healthcare**: In robotic surgeries or patient care, RT-2 can assist in understanding and performing tasks based on both visual and verbal instructions.
- **Smart Homes**: Integration of RT-2 in smart home systems can lead to improved automation, understanding homeowner instructions in a much more nuanced manner.
## Examples and Documentation
Detailed examples and comprehensive documentation for using RT-2 can be found in the [examples](https://github.com/kyegomez/RT-2/tree/master/examples) directory and the [documentation](https://github.com/kyegomez/RT-2/tree/master/docs) directory, respectively.
## Contributing
Contributions to RT-2 are always welcome! Feel free to open an issue or pull request on the GitHub repository.
## Contact
For any queries or issues, kindly open a GitHub issue or get in touch with [kyegomez](https://github.com/kyegomez).
## Citation
```bibtex
@inproceedings{RT-2-2023,
title={RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control},
author={Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski,
Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu,
and Brianna Zitkovich},
year={2023}
}
```
## License
RT-2 is provided under the MIT License. See the LICENSE file for details.