Intro/Motivation

Context (Furkan)

Robot Utility Models (RUMs) is a framework for collecting diverse data with portable data collection tools that make it quick and efficient to gather demonstrations across different environments, using these data to train imitation-learning policies that generalize to new environments, and deploying those policies on the Hello Robot Stretch robot. [1]

Project Objectives (Jaron)

Robot Utility Models can perform individual short-horizon tasks learned from expert demonstrations. We seek to investigate and expand their capabilities with regard to multimodality and task complexity.

RUMs uses BeT (Behavior Transformers), which has been shown to learn multimodal behavior. We aim to test the extent of this capability with a distinctly multimodal task: sorting items by their visual appearance into location-specific receptacles. RUMs has not yet been tested on such an explicitly forked task conditioned on visual cues.

It is plausible that RUMs could handle longer, more complex tasks given correspondingly longer and more complex expert demonstrations, but each such task would require a new dataset, and collecting these demonstrations would take exponentially longer because of the greater total variety required for a robust policy. Longer tasks can often be decomposed into shorter, simpler subtasks, and the same subtask may appear in many different complex tasks. Therefore, our second objective is to implement task composition.

Finally, the current deployment procedure requires the user to first carefully position the robot and align the camera with the task scene. We explore methods of reducing the dependence on human intervention, such as automated navigation.

Expected Contributions

The expected contributions of the project include:

A robust visually-conditioned sorting policy.
An implementation of task composition.
The integration of navigation capabilities with RUMs.

Experiments / Procedures

Initial Ideation Stage (Alex)

The project started with an ideation phase. At that point, RUMs could pick up bags and tissues, open doors and drawers, and reorient fallen objects on a table. Since our proposal is to extend RUMs' capabilities and explore more complex household tasks, our plan was to first select a task adjacent to the established ones and then compose it with a natural extension.

In terms of task selection, our first idea was a bottle/can/cup pickup policy, with placing cups in a cabinet as the composed task. Here we would train: a) a cup pickup policy, b) a cabinet-opening policy, and c) a cup-placing policy. The main obstacles are navigation and object occlusion: while pickup by itself should be straightforward, interacting with the cabinet requires acting at different heights, and placing cups requires overcoming the occlusion caused by their form factor. Other natural tasks, such as taking cans out of the fridge, pose similar navigation challenges. We therefore concluded that a more suitable task would involve common objects, preferably small ones to reduce occlusion, and a compositional task requiring minimal navigation. Picking fruit came up naturally, and we decided on a pickup policy for lemons and limes, with sorting them as the compositional task. More detailed descriptions and variations are presented below.

Besides task selection, we also needed to overcome the technical challenge of task composition, which had not previously been implemented. Our goal is a general, policy-independent way of composing arbitrary policies, since this would allow task complexity to be extended arbitrarily as long as the tasks are compositional. We discussed several techniques to achieve this; below we present two alignment methods, one that compares image-encoding similarity and one that uses DynaMem.

Lemon Pickup Policy (Alex)

This was the first policy we trained. It served to familiarize us with the data collection procedure, and it can be seen as replicating existing results, since it is a simple pickup task like those already trained. From conversations with Mahi and Haritheja, we understood that we needed roughly 500 demos to learn a lemon pickup policy, so we collected lemon pickups in various environments; some of the demos can be seen in the "overview of policy training procedure" section. Below we attach a successful policy rollout (2x speed).

Lemon Lime Sorting (Left/Right) (Alex)

This task is the first variant of lemon/lime sorting, and the idea was heavily inspired by Mahi. It has the advantage of being conceptually simple and easy to collect data for, although compared with the labeled-bowl sorting it is less data efficient; see the next section for more detail.

The task is as follows: the Stretch robot starts with a lemon or lime in its gripper and two containers in front of it, and it must put a lemon in the left container and a lime in the right container. Alternatively, this task can be seen as a placing policy conditioned on fruit variety with a fixed placing location; the next variant relaxes this second constraint. It can also be seen as a combination of two separate tasks that sort lemons and limes respectively, in which case one could imagine using a higher-level policy to decide which sub-policy to trigger. Environmental diversity and out-of-distribution generalization come in the form of a) different containers, b) different tabletops, and c) different approach directions. Lemons and limes also vary, but much less than these other factors. To ensure robustness during data collection, we varied and recombined the containers (for example, swapping the left and right containers) and recorded demos on different tabletops. We also used different approach directions, though it is not yet clear whether we under-collected demos given our dataset's diversity; see the results/conclusion section for more discussion.
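The higher-level option mentioned above could be as simple as classifying the held fruit and triggering the matching sub-policy. The sketch below assumes a crop of gripper pixels is available and separates lemon from lime by mean hue; the 90 degree boundary and the names `classify_fruit`, `dispatch`, `sort_left`, and `sort_right` are all hypothetical and are not part of RUMs. A learned classifier would likely be more robust than a single hue threshold.

```python
import colorsys
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, float, float]  # RGB components in [0, 1]


def mean_hue_degrees(pixels: List[Pixel]) -> float:
    """Average hue of a crop in degrees (yellow is near 60, green near 120)."""
    # Naive averaging ignores the 0/360 wrap-around; acceptable for yellow vs. green only.
    hues = [colorsys.rgb_to_hsv(r, g, b)[0] * 360.0 for r, g, b in pixels]
    return sum(hues) / len(hues)


def classify_fruit(pixels: List[Pixel], boundary_deg: float = 90.0) -> str:
    """Hypothetical rule: hues below the boundary read as lemon, above as lime."""
    return "lemon" if mean_hue_degrees(pixels) < boundary_deg else "lime"


def dispatch(pixels: List[Pixel], policies: Dict[str, Callable[[], str]]) -> str:
    """Trigger the sub-policy registered for the detected fruit."""
    return policies[classify_fruit(pixels)]()


if __name__ == "__main__":
    policies = {"lemon": lambda: "sort_left", "lime": lambda: "sort_right"}
    print(dispatch([(0.95, 0.85, 0.2)] * 4, policies))  # prints "sort_left"
```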

Our first training run used around 1.5k demos, but that did not seem to be enough, so we collected a further 1k demos, and training is currently underway. We also had some hiccups during training because of data preprocessing and GPU availability; see the results/conclusion section for more discussion. Below we show some test-environment rollouts and some samples of the collected demos.

For the second round of training we added ~500 demos; we collected more but could not use all of them because of data-consistency concerns. The larger dataset now mostly contains plates, and the demos were collected with more attention to object occlusion. The policy is much better: below are tests performed on two different tables with plates not seen before (though the demos do contain white plates). The success rate for sorting alone is around 50% or higher, and the main failure mode is failing to release the lemon/lime over the plate. We also notice a curious "slow start" behavior, where the Stretch robot moves minutely during the first few steps and then, seemingly having found its target, moves quickly and confidently. This slow start does not seem to correlate with policy success rate, although the test video below shows a failure case.

The first rollout is a success, the second shows the failure-to-release mode, the third is a success with some slow-start behavior, and the fourth shows both slow-start behavior and a failure to release. All videos are sped up 2x.

Lemon/Lime Sorting with Labeled Bowls (Jaron)

Idea

The first sorting policy is visually conditioned to direct an object to a fixed relative location (left/right). We consider a version of this task where the sorting destination is not fixed. Picture 'lemon' and 'lime' signs affixed to the bowls.

Implementing this task poses two problems: 1) it appears to require double the training data of the previous version (we now need to demonstrate sorting each fruit into both bowls), and 2) significant inflexibilities remain. What if we need to modify the labels we select for lemons and limes, or would also like to sort oranges?

Tailored Image Transformations

We suggest that the need for additional data acquisition may be avoided by applying targeted image transformations to a smaller dataset. Training demonstrations can be performed such that we are able to modify the appearance of the labels and the sorted item in post-processing.

From these "generic" demonstrations, we can produce lemon-sorting demonstrations by changing the sign on the destination bowl to appear as the lemon label and color-shifting the held fruit to resemble a lemon. We can likewise produce lime-sorting demonstrations by changing the sign to the lime label and color-shifting the held fruit to resemble a lime.

Training demonstrations are performed with ARUCO fiducials that facilitate label substitution. A lemon is placed into either the left or right bowl. The same ARUCO marker is always associated with the destination bowl. With this procedure, lemon-sorting demonstrations can be produced by drawing the lemon label atop the ARUCO marker, and lime-sorting demonstrations can be produced by inserting the lime label and color-shifting the lemon.

The raw sample (left), the derived lemon sorting sample (center), and the derived lime sorting sample (right).
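The label-substitution step can be sketched as a perspective warp: given the four tracked marker corners in a frame, estimate a homography with the direct linear transform and inverse-warp a label image onto the marker quad with nearest-neighbour sampling. This is a minimal sketch, assuming the corners are already known and ordered top-left, top-right, bottom-right, bottom-left; `homography_from_points` and `paste_label` are hypothetical names and not the code in the linked repository. In practice OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` do the same job with proper interpolation.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of the 8x9 system: the last right-singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def paste_label(frame, label, corners):
    """Return a copy of `frame` with `label` drawn over the marker quad.

    corners: four (x, y) frame coordinates in TL, TR, BR, BL order.
    """
    lab_h, lab_w = label.shape[:2]
    label_corners = [(0, 0), (lab_w, 0), (lab_w, lab_h), (0, lab_h)]
    # Inverse warping: map every frame pixel into label coordinates so the result has no holes.
    H = homography_from_points(corners, label_corners)
    ys, xs = np.mgrid[0:frame.shape[0], 0:frame.shape[1]]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = H @ pts
    with np.errstate(divide="ignore", invalid="ignore"):
        u = mapped[0] / mapped[2]
        v = mapped[1] / mapped[2]
    inside = (u >= 0) & (u < lab_w) & (v >= 0) & (v < lab_h)
    out = frame.copy()
    # Nearest-neighbour sampling: truncate label coordinates to integer indices.
    out[ys.ravel()[inside], xs.ravel()[inside]] = label[v[inside].astype(int), u[inside].astype(int)]
    return out
```

For lime-sorting samples, the same call would be made with the lime label, followed by the fruit color shift.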

Implementation

OpenCV lets us extract the positions of ARUCO markers from video frame by frame, but the detections are intermittent. We tested three methods of turning these sporadic signals into persistent ones:

Kalman filters
Forward optical flow tracking
Interpolated forward and reverse optical flow tracking
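The first method can be illustrated with a constant-velocity Kalman filter per marker corner: every frame runs a prediction, and a correction is applied only when the marker was actually detected, so missed frames are filled by the motion model. This is a minimal sketch; the class name and noise settings are assumptions, and OpenCV also ships `cv2.KalmanFilter` for the same purpose.

```python
import numpy as np


class ConstantVelocityKalman:
    """Tracks one 2D point (e.g. a marker corner) with state [x, y, vx, vy]."""

    def __init__(self, process_var=1.0, meas_var=4.0):
        # Transition for dt = 1 frame: position += velocity.
        self.F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.Hm = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # we observe position only
        self.Q = process_var * np.eye(4)
        self.R = meas_var * np.eye(2)
        self.x = None
        self.P = np.eye(4) * 1e3  # large initial uncertainty

    def step(self, z=None):
        """Advance one frame; z is the detected (x, y), or None if the marker was missed."""
        if self.x is None:
            if z is None:
                return None  # nothing to track until the first detection
            self.x = np.array([z[0], z[1], 0.0, 0.0])
            return self.x[:2].copy()
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct only when a detection exists.
        if z is not None:
            innovation = np.asarray(z, dtype=float) - self.Hm @ self.x
            S = self.Hm @ self.P @ self.Hm.T + self.R
            K = self.P @ self.Hm.T @ np.linalg.inv(S)
            self.x = self.x + K @ innovation
            self.P = (np.eye(4) - K @ self.Hm) @ self.P
        return self.x[:2].copy()
```

A drawback consistent with the comparison above: during long gaps the filter extrapolates in a straight line, so it cannot follow the marker through camera turns the way image-based tracking can.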

Below, we provide annotated signal tracking visualizations. First, the raw frame-by-frame detections. Second, with Kalman filtering. Third, using forward optical flow. Lastly, and most robustly, using interpolated dual optical flow.
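One plausible way to interpolate dual optical flow across a gap between two detections at frames t0 and t1: track the corners forward from t0 and backward from t1 (for instance with OpenCV's `cv2.calcOpticalFlowPyrLK`), then blend the two tracks with a weight that moves linearly from the forward track to the backward one. The linear weighting and the name `blend_dual_tracks` are our assumptions for illustration; the repository may weight the tracks differently.

```python
import numpy as np


def blend_dual_tracks(forward, backward):
    """Blend forward and backward optical-flow tracks across a detection gap.

    forward[i]:  corner positions at frame t0 + i, tracked forward from the detection at t0
    backward[i]: corner positions at frame t0 + i, tracked backward from the detection at t1
    Both have shape (n, 4, 2) and cover frames t0..t1 inclusive.
    """
    forward = np.asarray(forward, dtype=float)
    backward = np.asarray(backward, dtype=float)
    n = len(forward)
    # Trust each tracker most near its own anchor detection, where its accumulated drift is smallest.
    w = np.linspace(0.0, 1.0, n).reshape(n, 1, 1)
    return (1.0 - w) * forward + w * backward
```

The blended track agrees exactly with both detections at its endpoints, which a forward-only tracker cannot guarantee.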

We map colored labels with blue borders and random variation (in hue, border width, and noise level) onto the signal stabilized by dual optical flow interpolation.
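Label randomization of this kind can be sketched as follows: jitter the hue of a base fill color, draw a blue border of random width, and add Gaussian pixel noise. The function `make_label`, its default ranges, and the label size are hypothetical choices for illustration; the resulting image would then be warped onto the tracked marker quad.

```python
import colorsys

import numpy as np


def make_label(base_rgb, size=64, rng=None, hue_jitter=0.03, border_range=(2, 8), noise_std=0.05):
    """Return a size x size float RGB label in [0, 1]: a jittered fill inside a blue border."""
    rng = np.random.default_rng() if rng is None else rng
    # Random hue variation (hue is in [0, 1) for colorsys).
    h, s, v = colorsys.rgb_to_hsv(*base_rgb)
    h = (h + rng.uniform(-hue_jitter, hue_jitter)) % 1.0
    fill = np.array(colorsys.hsv_to_rgb(h, s, v))
    # Random border width; everything starts blue and the interior is overwritten with the fill.
    label = np.empty((size, size, 3))
    label[:] = (0.0, 0.0, 1.0)
    b = int(rng.integers(border_range[0], border_range[1] + 1))
    label[b:size - b, b:size - b] = fill
    # Random noise level, clipped back into the valid color range.
    label += rng.normal(0.0, noise_std, label.shape)
    return np.clip(label, 0.0, 1.0)
```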

Note that all 728 of our "generic" samples use lemons. The plan was to mask the lemons by location and hue, and then color-shift them to transmute lemons into limes. Alas, yellow is too varied a color category to filter easily. The example given above is mottled, with only mild leakage of green into the bowl in which it is placed, but for many other samples taken on light wooden tables we could not reliably constrain the green to the fruit. We ought to have implemented and tested the image transforms on a few samples before collecting the full dataset. In that case, we would have used a more easily masked color, perhaps by painting a lemon neon pink.
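The planned mask-and-shift step can be sketched as a hue/saturation window applied only inside a bounding box around the fruit, with the hue of masked pixels rotated toward green. The window bounds, saturation floor, 60 degree shift, and the name `shift_fruit_hue` are assumptions for illustration. Because light wood and shadowed lemon skin likely overlap in hue and saturation, no single window of this kind separates them cleanly, which is consistent with the leakage described above.

```python
import colorsys

import numpy as np


def shift_fruit_hue(img, box, hue_lo=40 / 360, hue_hi=70 / 360, min_sat=0.35, shift=60 / 360):
    """Recolor 'yellow' pixels inside box = (x0, y0, x1, y1) by rotating their hue.

    img is a float RGB array in [0, 1]. Pixels outside the box, or whose hue or saturation
    fall outside the window, are left untouched. Returns (new_img, mask).
    """
    out = img.copy()
    mask = np.zeros(img.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    for yy in range(y0, y1):
        for xx in range(x0, x1):
            h, s, v = colorsys.rgb_to_hsv(*img[yy, xx])
            if hue_lo <= h <= hue_hi and s >= min_sat:
                mask[yy, xx] = True
                # Keep saturation and brightness so shading on the fruit is preserved.
                out[yy, xx] = colorsys.hsv_to_rgb((h + shift) % 1.0, s, v)
    return out, mask
```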

The full implementation of ARUCO tracking and custom image transformations resides at https://github.com/jaron-cui/aruco-label-tracking.

Outcome

Due to time constraints and problems with the vanilla left/right lemon/lime sorting policy, we were unable to train and test a policy on the transformed data. However, we learned useful data-processing techniques and the importance of small-scale pre-testing.

Image Encoding-based alignment (Jaron)

Motivation

Given RUM policies for various simple tasks, composing compound tasks requires a robust chaining method. The behavior of each individual task policy is sensitive to the deployment environment: if the model receives out-of-distribution observations, we can expect it to perform poorly. Thus, it is important that the robot realigns itself after completing a subtask so that the next policy is presented with an appropriate initial observation.

For example, suppose the robot is instructed to pick up a lemon and place it in a bowl. First, the robot aligns its camera so that a lemon is in view. Second, it deploys the lemon-pickup policy. Third, it aligns the camera with a bowl. Fourth, it deploys the place-lemon-in-bowl policy.
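The four-step example reads naturally as a loop over (alignment target, policy) pairs that stops at the first failure. This is a minimal sketch; `run_composed_task`, the `align` callback, and the boolean success signal from each policy are hypothetical interfaces, not part of RUMs.

```python
def run_composed_task(steps, align):
    """Run (target, policy) steps in order and stop at the first failure.

    align(target) -> bool should reposition the robot so that the next policy sees an
    in-distribution initial observation; policy() -> bool reports success.
    Returns the list of targets whose subtasks completed.
    """
    done = []
    for target, policy in steps:
        if not align(target):
            break  # could not find a suitable starting view
        if not policy():
            break  # subtask failed; later subtasks would start from the wrong state
        done.append(target)
    return done
```

For the lemon example, `steps` would be `[("lemon", pickup_lemon), ("bowl", place_lemon_in_bowl)]`, where both policy names are placeholders.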

We propose an alignment function that does not require additional data collection and that can be run entirely locally.
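One way such an alignment function could work without new data: encode the current camera view at several candidate poses (for example, pan angles) and pick the pose whose encoding is most similar to reference encodings of typical initial observations for the next policy. Taking the references from the first frames of that policy's existing training demos is our assumption, as are the names `best_alignment` and `encode`; `encode` stands for any image encoder that runs locally.

```python
import numpy as np


def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def best_alignment(candidate_views, reference_embeddings, encode):
    """Pick the candidate view whose encoding best matches any reference embedding.

    candidate_views:      dict mapping a pose label (e.g. a pan angle) to an observation
    reference_embeddings: embeddings of typical initial observations for the next policy
    encode:               image -> 1D embedding (hypothetical local encoder)
    Returns (best_pose, best_score).
    """
    best_pose, best_score = None, -np.inf
    for pose, view in candidate_views.items():
        z = encode(view)
        # Score a view by its closest reference, so any familiar starting view counts.
        score = max(cosine_similarity(z, r) for r in reference_embeddings)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```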