ALOHA

Low-cost open-source bimanual teleoperation platform



Overview

Item | Details
Full Name | A Low-cost Open-source Hardware System for Bimanual Teleoperation
Developer | Stanford University, UC Berkeley, Meta; built with Interbotix arms (Trossen Robotics)
Key Researchers | Tony Z. Zhao, Vikash Kumar, Sergey Levine, Chelsea Finn
Publication | RSS 2023
Configuration | Interbotix ViperX-300 (6DoF) x 2 (Follower) + Interbotix WidowX-250 (6DoF) x 2 (Leader)
Total Cost | ~$20k–$32k depending on what’s included (leaders, cameras, compute)
Size | Not standardized; depends on frame/table build (the paper gives no single canonical dimension)
Applications | VLA data collection, bimanual manipulation research, imitation learning

Key Significance

1. Standard Platform for VLA Research

ALOHA is more than a hardware kit: it has become core infrastructure for modern robot imitation learning research. At a low five-figure cost (often quoted in the $20k–$30k+ range depending on what’s included), it offers capability comparable to commercial bimanual robots ($200,000+), which greatly lowers the barrier to entry for this research.

2. Original Platform for ACT (Action Chunking with Transformers)

ALOHA was designed as the development platform for the ACT algorithm. To address the compounding error problem of simple behavior cloning, ACT predicts sequences of actions (action chunks) rather than single actions.

3. Major Data Source for Open X-Embodiment

Within the Open X-Embodiment dataset (22 robots, 1M+ trajectories), an effort led by Google DeepMind, ALOHA is one of the main sources of bimanual manipulation data. It serves as a key data source for RT-X model training.

4. Open Source Ecosystem

Hardware designs, software, and data collection code are all publicly available, enabling replication and extension by research labs worldwide.


Importance of Bimanual Manipulation

Two arms working together can do more than the sum of what each arm does alone. The added capability comes from coordinating the arms in space and time.

Tasks Requiring Bimanual Operation

Task Type | Examples
Stabilization-Manipulation | Holding an object with one arm, screwing with the other
Cooperative Transport | Carrying large boxes, trays
Tool Use | Sweeping with a broom, pushing with a mop
Cooking | Mixing ingredients, opening lids, stir-frying with a spatula
Assembly | Tying zip ties, chain assembly

Coordination Paradigms

  • Leader-Follower: Primary arm leads the task, secondary arm supports
  • Synergistic: Both arms cooperate equally for simultaneous tasks

Hardware Configuration

Complete Cost Breakdown

Costs vary substantially by SKU (e.g., 5DoF vs 6DoF), whether you count the leader station, and what compute/cameras you include. A typical build is in the low five-figure USD range.

Note: Costs based on original paper and Trossen Robotics pricing (2023). May vary by exchange rate and purchase timing.

ViperX 300 6DoF Specifications

Spec | Value
Degrees of Freedom | 6 DoF (arm) + 1 DoF (gripper)
Reach | 750 mm
Payload | 750 g (recommended at 50% extension)
Motors | DYNAMIXEL X-Series
Waist / Shoulder / Elbow / Forearm Roll / Wrist Angle | XM540-W270
Wrist Rotate / Gripper | XM430-W350
Communication | RS485 (1 Mbps), U2D2 interface

Camera System

Version | Camera | Features
ALOHA (Original) | Logitech C922x webcams x 4 | RGB 480x640 @ 50Hz, 2 stationary + 2 wrist-mounted
ALOHA 2 | Intel RealSense D405 x 4 | RGB + Depth, global shutter, wide FOV, left/right wrist + top/bottom views
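
The image stream listed for the original ALOHA is sizeable, which helps explain the USB requirements noted later on this page. The arithmetic below is a rough estimate only: it assumes uncompressed 8-bit RGB (3 bytes per pixel), which is an assumption of this sketch and not a figure from the paper. Real webcams typically send compressed frames, so actual bus traffic is lower. The function name raw_image_bandwidth is hypothetical.

```python
# Rough upper bound on raw image bandwidth for 4 cameras at 480x640 @ 50Hz.
# Assumption (not from the source): uncompressed 8-bit RGB, 3 bytes per pixel.

def raw_image_bandwidth(num_cameras: int, height: int, width: int,
                        fps: int, bytes_per_pixel: int = 3) -> int:
    """Return the raw image bandwidth in bytes per second."""
    return num_cameras * height * width * bytes_per_pixel * fps


bytes_per_second = raw_image_bandwidth(num_cameras=4, height=480, width=640, fps=50)
print(f"{bytes_per_second / 1e6:.1f} MB/s")  # 184.3 MB/s
```

At this rate one minute of raw frames is about 11 GB, so recorded episodes are usually stored compressed.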

Gripper

  • ALOHA (Original): Scissor-head gripper, XL430-W250-T motor
  • ALOHA 2: Low-friction rail design, XC430-W150-T motor (10x lower opening/closing force, plastic gears replaced with low-friction metal gears)

Teleoperation Method

ALOHA’s teleoperation uses backdriving-based puppeteering.

Operating Principle

User → Physical manipulation of leader arm → Read joint positions → Synchronize follower arm
  1. Physical Backdriving: User directly moves leader arm (WidowX) by hand
  2. Real-time Synchronization: Leader joint positions immediately reflected to follower (ViperX)
  3. Data Collection: RGB images + joint states recorded simultaneously
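
The three steps above can be sketched as a fixed-rate loop. This is a minimal illustration under stated assumptions, not the ALOHA codebase: SimArm, teleop_episode, and capture_images are hypothetical names, the arm is simulated in memory, and the 50Hz rate is borrowed from the camera and logging rates quoted on this page. The leader's joint positions are stored as the action and the follower's joint positions plus images as the observation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SimArm:
    """Hypothetical in-memory arm for illustration; not the Interbotix API."""
    joints: List[float] = field(default_factory=lambda: [0.0] * 7)  # 6 arm joints + gripper

    def read_joint_positions(self) -> List[float]:
        return list(self.joints)

    def command_joint_positions(self, target: List[float]) -> None:
        # A real follower tracks the target with its motor controllers;
        # the simulated arm simply jumps to it.
        self.joints = list(target)


def teleop_episode(leader: SimArm, follower: SimArm,
                   capture_images: Callable[[], Dict[str, object]],
                   num_steps: int, hz: float = 50.0) -> List[dict]:
    """Mirror the leader onto the follower and record one episode."""
    period = 1.0 / hz
    episode = []
    for _ in range(num_steps):
        start = time.monotonic()
        action = leader.read_joint_positions()           # step 1: operator backdrives the leader
        observation = {"qpos": follower.read_joint_positions(),
                       "images": capture_images()}
        follower.command_joint_positions(action)         # step 2: follower mirrors the leader
        episode.append({"observation": observation, "action": action})  # step 3: record
        time.sleep(max(0.0, period - (time.monotonic() - start)))       # hold the loop rate
    return episode


leader = SimArm(joints=[0.1] * 7)
follower = SimArm()
episode = teleop_episode(leader, follower, capture_images=lambda: {}, num_steps=3)
print(len(episode), follower.joints == leader.joints)  # 3 True
```

On real hardware the two arm objects would wrap the motor bus for each arm, and capture_images would return one frame per camera.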

Key Advantages

Advantage | Description
Intuitiveness | Natural interface of directly moving a robot arm
Low Latency | Lower latency compared to joystick/VR controllers
Force Feedback | Sensing physical resistance enables delicate manipulation
Low Cost | Uses only existing arm encoders without additional sensors

Required Environment

  • 6+ USB3 ports (4 robots + 2 cameras)
  • Connection instability possible when using USB hub

ALOHA Version Comparison

Item | ALOHA (Original) | ALOHA 2
Release | 2023 (RSS) | 2024
Developers | Stanford, UC Berkeley, Meta | Google, Stanford, Hoku Labs
Gripper | Scissor-head, high friction (XL430-W250-T) | Low-friction rail design (XC430-W150-T)
Gravity Compensation | Rubber bands | Passive kinematic mechanism (off-the-shelf components)
Camera | Logitech C922x webcams x 4 | Intel RealSense D405 x 4
Frame | Basic | 48” x 30” table + aluminum cage
Software | ROS | ROS 2 (50Hz logging)
Durability | Moderate | Enhanced

Extension to Mobile ALOHA

Mobile ALOHA is an extended version with ALOHA mounted on a mobile base (AgileX Tracer).

Item | Details
Base | AgileX Tracer AGV (~$7,000): differential drive, max speed 1.6 m/s, max payload 100 kg
Total Cost | ~$32,000 (including onboard power and compute)
Dimensions | 90 cm x 135 cm, weight 75 kg
Movement Method | Operator is physically tethered to the system and backdrives the wheels while walking
Data | Base velocity and arm puppeteering recorded simultaneously
Learning | Autonomous execution with 50 demonstrations per task (co-training raises success rates by up to 90%)
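
Recording base velocity together with arm puppeteering means each timestep carries one combined action. A minimal sketch follows, assuming 14 arm dimensions (two arms with 6 joints and 1 gripper each) and 2 base dimensions (linear and angular velocity). The constant names and make_mobile_action are hypothetical and do not come from the Mobile ALOHA code.

```python
from typing import List, Sequence

ARM_DIMS = 14   # assumption: 2 arms x (6 joints + 1 gripper)
BASE_DIMS = 2   # assumption: linear and angular velocity of the base


def make_mobile_action(arm_targets: Sequence[float],
                       base_velocity: Sequence[float]) -> List[float]:
    """Concatenate arm joint targets and base velocity into one action vector."""
    if len(arm_targets) != ARM_DIMS:
        raise ValueError(f"expected {ARM_DIMS} arm values, got {len(arm_targets)}")
    if len(base_velocity) != BASE_DIMS:
        raise ValueError(f"expected {BASE_DIMS} base values, got {len(base_velocity)}")
    return [*arm_targets, *base_velocity]


action = make_mobile_action(arm_targets=[0.0] * 14, base_velocity=[0.3, -0.1])
print(len(action))  # 16
```

Keeping the base velocity in the same vector lets one policy output drive the arms and the base at every step.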

Mobile ALOHA Training Task Examples

  • Putting pot in cabinet
  • Calling elevator
  • Pushing in chair
  • Stir-frying shrimp
  • Cleaning wine spill
  • High-five

See Mobile ALOHA for details.


VLA Research Applications

ALOHA is a key evaluation/training platform for various VLA (Vision-Language-Action) models.

ACT (Action Chunking with Transformers)

Item | Details
Core Idea | Predict action sequences (chunks) instead of single actions
Architecture | Conditional VAE + Transformer Encoder/Decoder
Problem Solved | Shortens the effective task horizon k-fold (k = chunk length), which mitigates compounding error
Performance | Reports ~80–90% success on some tasks with ~10 minutes of demonstrations (task/data-regime dependent)
Tasks | Fine-grained bimanual manipulation such as opening transparent cups and battery insertion
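
The effect of chunking on the decision horizon can be seen in a short execution loop: the policy is queried once per chunk, so an episode of T steps needs about T / k policy decisions instead of T. The sketch below is illustrative only. run_with_chunks and dummy_policy are hypothetical names, the episode length and chunk length are example values, and the policy is a stub standing in for a trained model.

```python
import math
from typing import Callable, List, Sequence

Action = Sequence[float]


def run_with_chunks(policy: Callable[[object], List[Action]],
                    get_observation: Callable[[], object],
                    apply_action: Callable[[Action], None],
                    episode_len: int, chunk_len: int) -> int:
    """Run one episode chunk by chunk; return how many times the policy was queried."""
    queries = 0
    step = 0
    while step < episode_len:
        chunk = policy(get_observation())        # one prediction yields chunk_len actions
        if len(chunk) != chunk_len:
            raise ValueError("policy returned a chunk of unexpected length")
        queries += 1
        for action in chunk[: episode_len - step]:   # do not run past the episode end
            apply_action(action)
            step += 1
    return queries


K = 100  # example chunk length


def dummy_policy(observation: object) -> List[Action]:
    """Stub policy: returns K all-zero 14-dimensional actions."""
    return [[0.0] * 14 for _ in range(K)]


executed: List[Action] = []
queries = run_with_chunks(dummy_policy, lambda: None, executed.append,
                          episode_len=400, chunk_len=K)
print(queries, len(executed))  # 4 400
```

With single-step prediction (chunk_len=1) the same episode would need 400 queries, each one a chance for error to compound.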

OpenVLA

Item | Details
Parameters | 7B
Base | Llama 2 + DINOv2 + SigLIP
Training Data | 970k real robot demonstrations (including ALOHA)
Performance | 16.5% higher absolute task success rate than RT-2-X (55B)

Physical Intelligence Pi Series

Model | Features
Pi-0 | 3B PaliGemma VLM + 300M action expert (flow matching, a diffusion-style method)
Pi-0-FAST | Faster training and inference through tokenized action output
Pi-0.5 | Open-world generalization through heterogeneous data co-training

Pi-0 is benchmarked on several robot platforms, including ALOHA, and significantly outperforms existing baselines such as OpenVLA and Octo.

OpenVLA-OFT

OpenVLA-OFT achieves high-frequency, language-conditioned control on ALOHA with a 7B VLA policy. On the LIBERO benchmark it reports a 97.1% success rate, surpassing Pi-0, Diffusion Policy, and other baselines.


Software Ecosystem

Package | Details
ROS / ROS 2 | Drivers, URDF, Gazebo simulation
MoveIt | Motion planning support
LeRobot | Hugging Face robot learning library integration
ACT Code | Official training/inference code released
