Mobile ALOHA

A low-cost mobile bimanual manipulation platform with whole-body teleoperation, from the Stanford IRIS Lab

Key Significance

  • Democratization of Mobile Bimanual Manipulation: A ~$32,000 platform, versus $200,000+ for existing mobile manipulators, which makes this line of research far more accessible
  • Whole-Body Teleoperation: Enables complex household tasks that require moving while using both arms, beyond tabletop manipulation
  • Co-Training Paradigm: Jointly training with existing static ALOHA datasets yields 80% or higher success on most tasks with only 50 demonstrations per task
  • Open-Source Ecosystem: Hardware design, software, 3D printing files, and assembly tutorials are all publicly released
  • Practical Household Robot Research: Demonstrated the potential of general household robots through real-world tasks such as cooking, cleaning, and calling an elevator

Overview

Mobile ALOHA is a low-cost whole-body teleoperation system developed by Stanford IRIS Lab. It mounts the existing tabletop-only ALOHA system on a mobile base, which makes it possible to learn complex tasks that combine locomotion and bimanual manipulation.

Item | Spec
Development | Stanford IRIS Lab
Authors | Zipeng Fu*, Tony Z. Zhao*, Chelsea Finn
Publication | arXiv: January 2024 / CoRL 2024 presentation
Base | AgileX Tracer AGV
Arms | ViperX 300 6-DoF x 2 (follower) + WidowX 250 x 2 (leader)
Gripper | Custom parallel gripper (3D printed)
Cameras | Wrist x 2 + front x 1 (Logitech C922x)
Compute | Laptop (RTX 3070 Ti, i7-12800H)
Total Cost | ~$32,000
Paper | arXiv:2401.02117
Project | mobile-aloha.github.io

Hardware Configuration

Mobile Base: AgileX Tracer AGV

Source: AgileX TRACER Documentation

Item | Spec
Drive | 2-wheel differential drive + 4 freewheel casters
Motors | 150 W brushless servo x 2
Max Speed | 1.6 m/s (human walking speed)
Payload | 100 kg
Size (L x W x H) | 660 x 516 x 163.5 mm
Ground Clearance | 30 mm
Obstacle Crossing | 10 mm height, 8 degree slope
Operating Time | Up to 4 hours (100 kg load)
Price | ~$7,000
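
Because the Tracer is a two-wheel differential-drive base, a commanded body velocity (linear and angular) maps directly onto left and right wheel speeds. The snippet below is a minimal sketch of that standard kinematic relation. The function name and the 0.40 m track width are assumptions chosen for illustration; the track width is not part of the spec table above.

```python
def wheel_speeds(linear_vel, angular_vel, track_width):
    """Differential-drive inverse kinematics.

    linear_vel:  forward body speed in m/s
    angular_vel: yaw rate in rad/s (positive = counter-clockwise)
    track_width: distance between the two drive wheels in m
    Returns (left, right) wheel ground speeds in m/s.
    """
    half_track = track_width / 2.0
    left = linear_vel - angular_vel * half_track
    right = linear_vel + angular_vel * half_track
    return left, right


# ASSUMED track width, for illustration only (not from the Tracer spec table).
TRACK_WIDTH_M = 0.40

# Drive forward at 1.0 m/s while turning left at 0.5 rad/s.
left, right = wheel_speeds(1.0, 0.5, TRACK_WIDTH_M)

# Turning in place: the wheels spin in opposite directions.
spin_left, spin_right = wheel_speeds(0.0, 1.0, TRACK_WIDTH_M)
```

With only two numbers (linear and angular velocity) describing base motion, the base adds just two dimensions to the robot's action vector.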

Robot Arms: ViperX 300 6-DoF

Source: Trossen Robotics ViperX 300

Item | Spec
Configuration | 2 followers (for autonomous execution)
Arm DoF | 6-DoF (arm body)
Gripper | 1-DoF (open/close)
Arm + Gripper Total DoF | 7-DoF per arm
Horizontal Reach | 75 cm (base center to gripper)
Total Span | 150 cm
Working Payload | 750 g
Servos | DYNAMIXEL XM540-W270-R, XM430-W350-R
Resolution | 4096 positions
Material | 20 mm x 40 mm extruded aluminum
Price | ~$6,130 x 2
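
The 4096-position resolution sets the smallest joint angle the servos can report. The sketch below converts raw encoder ticks to angles, assuming the 4096 positions span one full revolution and that tick 2048 is the joint's zero position. Both assumptions, along with the function name, are illustrative and should be checked against the servo documentation.

```python
import math

TICKS_PER_REV = 4096  # from the spec table: 4096 positions

# Angular size of one encoder tick, in degrees.
deg_per_tick = 360.0 / TICKS_PER_REV


def ticks_to_radians(ticks, center=2048):
    """Convert a raw position reading to radians.

    center is the tick value treated as zero angle (ASSUMED to be mid-range).
    """
    return (ticks - center) * 2.0 * math.pi / TICKS_PER_REV


quarter_turn = ticks_to_radians(3072)  # 1024 ticks past center
```

Under these assumptions one tick corresponds to just under 0.09 degrees.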

Teleoperation Leader Arms: WidowX 250 6-DoF

Item | Spec
Configuration | 2 leaders (for data collection)
Features | 3D printed ergonomic handles
Use | Used only for demonstration data collection
Price | ~$3,550 x 2

Sensors and Compute

Item | Spec
Cameras | Logitech C922x x 3
Resolution | 640 x 480
Control Frequency | 50 Hz (camera streaming and policy execution)
Placement | 2 wrist + 1 front
Compute | Consumer-grade laptop
GPU | NVIDIA RTX 3070 Ti (8 GB VRAM)
CPU | Intel i7-12800H
Communication | USB serial (arms) + CAN bus (base)
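
To get a feel for what three 640 x 480 cameras at 50 Hz demand of a laptop, the following back-of-the-envelope calculation may help. It assumes uncompressed 8-bit RGB frames (3 bytes per pixel), which is an assumption made for illustration; webcams often compress frames over USB, so the actual bus traffic may be lower.

```python
WIDTH, HEIGHT = 640, 480   # from the table above
NUM_CAMERAS = 3            # 2 wrist + 1 front
RATE_HZ = 50               # streaming and policy execution rate
BYTES_PER_PIXEL = 3        # ASSUMED: uncompressed 8-bit RGB

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * NUM_CAMERAS * RATE_HZ
megabytes_per_second = bytes_per_second / 1e6

# Time available per control step for capture, inference, and command output.
step_budget_ms = 1000.0 / RATE_HZ

print(f"{megabytes_per_second:.2f} MB/s of raw pixels, {step_budget_ms:.0f} ms per step")
```

At 50 Hz, everything in one control cycle, including the policy's forward pass, has to fit in roughly 20 ms on average.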

Power System

Item | Spec
Battery | 1.26 kWh
Weight | 14 kg
Position | Bottom of base (doubles as counterweight)
Features | Untethered wireless operation

Base Weight

Item | Spec
AgileX Tracer Weight | 30 kg
Battery Weight | 14 kg

Cost Breakdown (~$32,000)

Source: Mobile ALOHA Project Page, Trossen Robotics

Component | Price (USD) | Notes
AgileX Tracer AGV | ~$7,000 | Mobile base
ViperX 300 6-DoF x 2 | ~$12,260 | Follower arms
WidowX 250 6-DoF x 2 | ~$7,100 | Leader arms (for teleop)
Battery (1.26 kWh) | ~$2,000 | Estimate
Cameras (C922x x 3) | ~$300 | RGB webcams
3D Printed Parts | ~$500 | Grippers, mounts, etc.
Other Hardware | ~$2,840 | Brackets, cables, etc.
Total | ~$32,000 | Officially stated on project page
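
As a quick sanity check, the line items in the breakdown can be summed and compared against the stated total. The dictionary keys below are shorthand labels introduced for this sketch; the prices are the approximate figures from the table.

```python
# Approximate component prices in USD, copied from the cost breakdown.
costs_usd = {
    "tracer_base": 7_000,
    "viperx_300_x2": 12_260,   # 2 x ~$6,130 follower arms
    "widowx_250_x2": 7_100,    # 2 x ~$3,550 leader arms
    "battery": 2_000,
    "cameras_x3": 300,
    "printed_parts": 500,
    "other_hardware": 2_840,
}

total_usd = sum(costs_usd.values())

# Fraction of the budget spent on the four robot arms.
arms_share = (costs_usd["viperx_300_x2"] + costs_usd["widowx_250_x2"]) / total_usd

print(f"total: ${total_usd:,}  arms: {arms_share:.1%}")
```

The four arms account for roughly 60% of the total, so the leader and follower arms dominate the cost of the platform.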

Comparison: Existing commercial mobile manipulators (e.g., Clearpath + dual arms) are $200,000+


Physical Specifications

Source: Mobile ALOHA Project Page

Item | Spec
Footprint | 90 cm x 135 cm
Arm Reach Height | 65 cm to 200 cm
Arm Forward Extension | 100 cm (from base)
Total Weight | 75 kg
Pull Force | 100 N at 1.5 m height
Movement Speed | Up to 1.6 m/s

Differences from Static ALOHA

Item | ALOHA (Static) | Mobile ALOHA
Base | Fixed table | AgileX Tracer (mobile)
Action Dimensions | 14-DoF (arms + grippers) | 16-DoF (arms + grippers + base velocity)
Task Range | Tabletop manipulation | Full indoor environment
Teleop Method | Hand-operate leader arms | Whole-body teleop (walk while manipulating)
Cost | ~$20,000 | ~$32,000
Stability | Fixed to table | Battery at the bottom of the base acts as a counterweight

Action Space Expansion

DoF Explanation: Each ViperX 300 arm is 6-DoF (arm) + 1-DoF (gripper) = 7-DoF

ALOHA: 14-DoF joint positions
       [arm1(6) + gripper1(1) + arm2(6) + gripper2(1)]

Mobile ALOHA: 16-DoF
       [arm1(6) + gripper1(1) + arm2(6) + gripper2(1) + base_linear_vel(1) + base_angular_vel(1)]

Because the base's linear and angular velocities are simply appended to the 14-dimensional arm action, existing imitation learning algorithms can be applied with minimal modification.
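
The layout above can be illustrated with a small helper that assembles one 16-dimensional action. This is a sketch only: the function name and argument names are hypothetical, and the ordering follows the bracketed layout shown above rather than any particular codebase.

```python
def build_action(arm1, gripper1, arm2, gripper2, base_linear_vel, base_angular_vel):
    """Concatenate arm, gripper, and base commands into one 16-dim action.

    arm1, arm2:      6 joint positions each
    gripper1/2:      one scalar each (open/close)
    base_*_vel:      one scalar each
    """
    if len(arm1) != 6 or len(arm2) != 6:
        raise ValueError("each arm needs exactly 6 joint positions")
    return [*arm1, gripper1, *arm2, gripper2, base_linear_vel, base_angular_vel]


action = build_action(
    arm1=[0.0] * 6, gripper1=1.0,
    arm2=[0.0] * 6, gripper2=0.0,
    base_linear_vel=0.2, base_angular_vel=-0.1,
)

arm_part = action[:14]   # same layout as a static ALOHA action
base_part = action[14:]  # [linear velocity, angular velocity]
```

A policy that already outputs 14 values per step only needs its output layer widened by two to drive the base as well.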


Co-Training: Core Technique

Motivation

Mobile bimanual manipulation datasets are scarce, but static bimanual manipulation data is abundant. Co-training improves performance by training on both types of data together.

Method

Training Data = Mobile ALOHA demos (50) + Static ALOHA datasets (existing)

Mobile data: Full 16-DoF actions
Static data: 14-DoF actions (base velocity padded with 0)
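
A minimal sketch of how the two data sources could be mixed is shown below. It zero-pads static actions to 16 dimensions, as described above, and draws each training sample from one of the two pools. The function names and the 50/50 mixing probability are assumptions for illustration, since the mixing ratio is not stated here, and real training would also carry images and joint states alongside the actions.

```python
import random

ARM_DIMS = 14      # static ALOHA action size
ACTION_DIMS = 16   # Mobile ALOHA action size


def pad_static_action(action14):
    """Append zero base velocities so a static action matches the mobile layout."""
    if len(action14) != ARM_DIMS:
        raise ValueError("expected a 14-dim static action")
    return list(action14) + [0.0] * (ACTION_DIMS - ARM_DIMS)


def sample_batch(mobile_actions, static_actions, batch_size, p_mobile=0.5, rng=None):
    """Draw a mixed batch; p_mobile is an ASSUMED mixing probability."""
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < p_mobile:
            batch.append(list(rng.choice(mobile_actions)))
        else:
            batch.append(pad_static_action(rng.choice(static_actions)))
    return batch


# Toy data: one mobile action (16 dims) and one static action (14 dims).
mobile_pool = [[0.1] * 14 + [0.3, -0.2]]
static_pool = [[0.5] * 14]
batch = sample_batch(mobile_pool, static_pool, batch_size=8, rng=random.Random(0))
```

Every element of the batch has the same length, so a single policy network can be trained on both sources without any architectural branching.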

Effect

Condition | Average Success Rate
Mobile data only | ~50%
With co-training | ~84%
Improvement | +34 percentage points
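
The improvement is an absolute difference in success rate, which is easy to confuse with a relative gain. The short calculation below shows both, using the approximate averages from the table; the variable names are illustrative.

```python
baseline_pct = 50.0    # mobile data only (approximate)
cotrained_pct = 84.0   # with co-training (approximate)

# Absolute improvement, in percentage points.
gain_points = cotrained_pct - baseline_pct

# Relative improvement over the baseline, in percent.
relative_gain_pct = gain_points / baseline_pct * 100.0

print(f"+{gain_points:.0f} points absolute, +{relative_gain_pct:.0f}% relative")
```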

Demonstrated Tasks

Source: arXiv:2401.02117 Table 1

Success Rates (with co-training; 50 demos per task unless noted)

Task | Success Rate | Description
Wipe Wine | 100% | Wipe wine spill
Call Elevator | 95% | Call elevator and board
Use Cabinet | 85% | Open wall cabinet and store pot
High Five | 85% | High five (20 demos)
Rinse Pan | 80% | Rinse pan at kitchen sink
Push Chairs | 80% | Organize chairs
Cook Shrimp | 40% | Stir-fry shrimp (75 sec, only 20 demos used)

Task Categories

Cooking

  • Stir-fry and serve shrimp
  • Handle pots/pans
  • Rinse at sink

Cleaning/Organizing

  • Wipe wine spill
  • Push chairs to organize
  • Store items in cabinet
  • Use vacuum cleaner

Navigation + Manipulation

  • Press elevator button and board
  • Transport items between rooms

Interaction

  • High five
  • Hand items to people

Technical Details

Supported Algorithms

Algorithm | Description
ACT | Action Chunking Transformer
Diffusion Policy | Diffusion-based action generation
VINN | Visual Imitation through Nearest Neighbors

Simulation Environments

  • Transfer Cube
  • Bimanual Insertion

Training Settings

Item | Value
Number of Demos | 50 per task
Control Frequency | 50 Hz
Image Resolution | 640 x 480
Number of Cameras | 3 (2 wrist + 1 front)

Open-Source Resources

Public Materials

Resource | Link
Paper | arXiv:2401.02117
Project Page | mobile-aloha.github.io
GitHub (Hardware) | mobile-aloha
GitHub (Algorithms) | act-plus-plus
Assembly Tutorial | Included in project page
3D Printing Files | Included in GitHub

Tutorial Contents

  • 3D printing guide
  • Assembly sequence
  • Software installation
  • Calibration methods
  • Teleoperation usage

Research Team and Support

Authors

Name | Role
Zipeng Fu | Co-first author
Tony Z. Zhao | Co-first author
Chelsea Finn | Advisor

Support

  • Stanford Robotics Center
  • Steve Cousins
  • Stanford IRIS Lab members

Subsequent Developments

ALOHA 2 (Google DeepMind, Stanford, Hoku Labs, 2024)

Improved version jointly developed by Google DeepMind, Stanford, and Hoku Labs:

  • Gripper Operation Force Reduction: Force required to operate the leader gripper reduced from 14.68 N to 0.84 N (roughly a 17x reduction)
  • 2x Gripper Strength: Follower gripper maximum gripping force increased from 12.8 N to 27.9 N
  • Material Upgrade: PLA/acrylic replaced with 3D-printed carbon fiber nylon
  • Passive Gravity Compensation: Rubber bands replaced with adjustable hanging retractors, 42% efficiency improvement
  • Large-Scale Data Collection: Capable of collecting hundreds to thousands of demonstrations per robot per day

Commercialization

Trossen Robotics sells ALOHA kits:

  • ALOHA Solo
  • ALOHA Bimanual Kit
  • Mobile ALOHA compatible parts

Significance and Impact

Academic Impact

  • Co-training Effect Demonstrated: Data from related static tasks improves performance on mobile tasks
  • Low-cost Research Platform: High-quality research is possible at about $32,000
  • Reproducibility: The fully open-source release enables replication in labs worldwide

Industrial Implications

“Mobile ALOHA has demonstrated something unique: relatively cheap robot hardware can solve really complex problems.” - Lerrel Pinto, Associate Professor of Computer Science, NYU

  • Demonstrated feasibility of household robots
  • Complex tasks possible even with low-cost hardware
  • Dramatically reduced data collection costs

References

Paper

@article{fu2024mobile,
  author    = {Fu, Zipeng and Zhao, Tony Z. and Finn, Chelsea},
  title     = {Mobile ALOHA: Learning Bimanual Mobile Manipulation
               with Low-Cost Whole-Body Teleoperation},
  journal   = {arXiv preprint arXiv:2401.02117},
  year      = {2024},
  note      = {Presented at CoRL 2024}
}

See Also