Q&A with Russ Tedrake on LBM 1.0: How Large Behavior Models Improve Robot Manipulation

We sat down with Russ Tedrake, SVP of the Large Behavior Models (LBMs) division at Toyota Research Institute (TRI), to talk about a major milestone for the team: LBM 1.0!

Here’s what he had to say:

Q: What’s the big news with LBM 1.0?

Russ: We’ve officially published our latest results from studying Large Behavior Models in a paper! This is the culmination of months of work and represents our first full milestone evaluation of our LBMs. We periodically “lock down” our training and evaluation stack to rigorously test our progress — LBM 1.0 is the first major result from that process. One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the technology and to share a lot of details on how we're achieving it.

The short version is: LBMs work! We see consistent and statistically significant improvements as we increase the amount of pretraining data. But doing the science is still hard; as a field we have more work to do to improve the statistical power of our experiments.

[Embedded media: LBM vs Single Task Side-by-Side Comparison]

Q: What are the most important takeaways from this milestone?

Russ: Three things stood out:

  1. Evaluation is a real bottleneck for the field. Even with strong investments in a robust eval pipeline, distinguishing small (but meaningful) changes in policy performance is a real challenge. We still lack the statistical power to distinguish changes when the effect on success rate or task completion is only a few percentage points.
  2. Pretraining has a strong and measurable effect on policy performance. By pretraining on the 500+ long-duration tasks we’ve collected over the last few years, we can develop more robust policies faster.
  3. The future looks very bright! Our early scaling laws show that performance keeps improving as we add more diverse data. This is a strong signal to keep going.
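To make the statistical-power point in item 1 concrete (a back-of-the-envelope sketch, not a calculation from the paper): the uncertainty in a measured success rate shrinks only with the square root of the number of evaluation rollouts, so resolving a difference of a few percentage points requires thousands of trials per policy.

```python
import math

def ci_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% confidence
    interval for a success rate p estimated from n rollouts."""
    return z * math.sqrt(p * (1 - p) / n)

# At a ~50% success rate, 50 rollouts give roughly a +/-14-point
# interval -- far too wide to resolve a 3-point improvement.
for n in (50, 200, 1000, 5000):
    hw = ci_halfwidth(0.5, n)
    print(f"n={n:5d}  95% CI half-width ~ +/-{hw * 100:.1f} pts")
```

Under this rough model, getting the interval down to about one percentage point takes on the order of 10,000 rollouts, which is why careful evaluation pipelines are such a large investment.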

Q: Do you have a favorite result from the paper?

Russ: My favorite plot from the paper, the one that sums it all up, is probably this one.

[Figure: task completion across pretraining data mixtures]

The plot compares performance using different amounts of pretraining data before training a new task: 0% (i.e., single-task training), 25%, 50%, or 100% of TRI’s data, and then 100% of TRI’s data plus all of the open-source robot data we’ve curated (the red line). It’s just awesome that the distributions over task completion are so tight and that the trends as we increase data are so consistent. The results show clearly that with pretraining, we can train a novel skill with substantially less data, or use the same amount of data and get much better task performance. And the benefits appear to continue as we add more data.

Q: What’s next for the LBM team?

Russ: More data, more experiments, and more insights. LBM 1.0 is just the beginning, and the team is already iterating on what’s next! 

Q: Where can people learn more about these findings?

Russ: Check out the LBM Project Website — it has the full paper, links to the seminar, and more technical details. 


🙌 A huge shoutout to the LBM team — this was a massive effort by the entire team, with a number of individuals really pouring their hearts into this paper. The paper is packed full of (too many?) details. Your comments and feedback would be very welcome.
