DUET: Dual-Robot Understanding via Efficient Teaching

Jan 1, 2026
Yiqi Zhao, Ruohai Ge, Celina Shiyu Wang, Junjie Ye, Muchen Xu, Minhao Li, Sergey Zakharov, Basile Van Hoorick, Vitor Campagnolo Guizilini, Leonidas Guibas, Gaurav S. Sukhatme, Jyotirmoy V. Deshmukh, Yue Wang
Abstract
Dual-robot collaboration enables tasks that exceed the reach and payload of a single robot, such as collaboratively transporting objects across environments and executing coordinated handovers. Data acquisition is the primary bottleneck for training these systems. To address this, we introduce DUET, a dual-robot learning framework for mobile manipulation. For efficient data collection, we build a unified, synchronized, VR-based teleoperation system that collects in-domain data from two heterogeneous robot embodiments. We further develop a complementary tracking pipeline that records human-human coordination and collaborative mobile manipulation priors. For efficient learning, we introduce an Action Chunking Transformer-based architecture that first pretrains collaborative policies on efficiently collected human-human demonstrations and then finetunes them on a minimal set of real-robot teleoperation trajectories. We develop a benchmark of four collaborative tasks to evaluate our framework using a Unitree G1 humanoid and a Dexmate Vega1 mobile manipulator. The results show that harnessing human priors not only yields better task performance than baselines trained only on robot data, but also reduces the total human effort required for data collection. On average, our human data collection pipeline is 5.4x faster than teleoperation, and the resulting policies perform as well as or better than policies trained only on robot data across all tasks.
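The abstract does not spell out the architecture. As general background, action chunking means a policy predicts a short sequence of future actions for each observation rather than a single action. The sketch below shows one way a demonstration trajectory could be sliced into such training samples. The function name `make_action_chunks`, the `chunk_size` parameter, and the repeat-last-action padding are assumptions made for illustration, not details taken from DUET.

```python
from typing import List, Sequence, Tuple

Action = Tuple[float, ...]
Sample = Tuple[object, List[Action], List[bool]]


def make_action_chunks(
    observations: Sequence[object],
    actions: Sequence[Action],
    chunk_size: int,
) -> List[Sample]:
    """Pair each observation with the next `chunk_size` actions.

    Chunks that run past the end of the trajectory are padded by repeating
    the final action (an assumption of this sketch). The boolean mask marks
    padded steps so a training loss could ignore them.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(observations) != len(actions):
        raise ValueError("need exactly one action per observation")

    samples: List[Sample] = []
    for t, obs in enumerate(observations):
        # Real actions available from time step t onward.
        chunk = list(actions[t : t + chunk_size])
        n_real = len(chunk)
        n_pad = chunk_size - n_real
        # Pad short chunks near the end of the trajectory.
        chunk += [actions[-1]] * n_pad
        is_pad = [False] * n_real + [True] * n_pad
        samples.append((obs, chunk, is_pad))
    return samples


if __name__ == "__main__":
    # Hypothetical 3-step trajectory with 1-D actions.
    demo_obs = ["o0", "o1", "o2"]
    demo_actions = [(0.0,), (1.0,), (2.0,)]
    for sample in make_action_chunks(demo_obs, demo_actions, chunk_size=2):
        print(sample)
```

Under these assumptions, the same slicing could be applied to both human-human demonstrations and robot teleoperation trajectories, so that pretraining and finetuning consume samples of the same shape.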
DUET human-data training pipeline