ALOHA

Low-cost open-source bimanual teleoperation platform


Overview

| Item | Details |
| --- | --- |
| Full Name | A Low-cost Open-source Hardware System for Bimanual Teleoperation |
| Developer | Stanford IRIS Lab (Chelsea Finn) + Trossen Robotics |
| Key Researchers | Tony Z. Zhao, Zipeng Fu, Chelsea Finn |
| Publication | RSS 2023 |
| Configuration | 2x ViperX 300 arms (follower) + 2x WidowX 250 arms (leader) |
| Total Cost | Approximately $32,000 (including webcams and laptop) |
| Size | L 121.9 x W 101.6 x H 203.2 cm |
| Applications | VLA data collection, bimanual manipulation research, imitation learning |

Key Significance

1. Standard Platform for VLA Research

ALOHA is not just hardware, but has established itself as core infrastructure for modern robot imitation learning research. At a price of $32,000, it provides performance comparable to commercial bimanual robots ($200,000+), revolutionizing research accessibility.

2. Original Platform for ACT (Action Chunking with Transformers)

ALOHA was designed as the development platform for the ACT algorithm. To address the compounding-error problem of naive behavior cloning, ACT introduced the approach of predicting action sequences (action chunks) rather than single actions.

3. Major Data Source for Open X-Embodiment

In the Open X-Embodiment dataset (22 robots, 1M+ trajectories) led by Google DeepMind, ALOHA is one of the platforms providing the richest bimanual manipulation data. It serves as a key data source for RT-X model training.

4. Open Source Ecosystem

Hardware designs, software, and data collection code are all publicly available, enabling replication and extension by research labs worldwide.


Importance of Bimanual Manipulation

Human bimanual manipulation achieves capabilities beyond the simple sum of two independent arms; the difference comes from spatio-temporal coordination between the two hands.

Tasks Requiring Bimanual Operation

| Task Type | Examples |
| --- | --- |
| Stabilization + Manipulation | Holding an object with one arm while screwing with the other |
| Cooperative Transport | Carrying large boxes or trays |
| Tool Use | Sweeping with a broom, pushing with a mop |
| Cooking | Mixing ingredients, opening lids, stir-frying with a spatula |
| Assembly | Tying zip ties, chain assembly |

Coordination Paradigms

  • Leader-Follower: Primary arm leads the task, secondary arm supports
  • Synergistic: Both arms cooperate equally for simultaneous tasks

Hardware Configuration

Complete Cost Breakdown

| Component | Cost | Notes |
| --- | --- | --- |
| ViperX 300 x 2 + WidowX 250 x 2 | ~$9,680 | 2 followers + 2 leaders |
| Cameras, sensors, mounts | ~$5,000 | Wrist cameras + external views |
| Laptop (with GPU) | ~$5,000 | Consumer-grade GPU |
| Frame and other hardware | ~$12,320 | Aluminum extrusion, cables, power, etc. |
| Total | ~$32,000 | Can be reduced further with 3D printing |

Note: Costs based on original paper and Trossen Robotics pricing (2023). May vary by exchange rate and purchase timing.

ViperX 300 Specifications

| Spec | Value |
| --- | --- |
| Degrees of Freedom | 6 DoF |
| Reach | 750 mm |
| Payload | 750 g |
| Motors | DYNAMIXEL X-Series |
| Waist/Shoulder | XM540-W270 (4096-step position feedback, ±0.1 mm repeatability) |
| Wrist/Gripper | XM430-W350 (enhanced thermal management) |

Camera System

| Version | Camera | Features |
| --- | --- | --- |
| ALOHA (Original) | Consumer webcams | RGB, multiple viewpoints |
| ALOHA 2 | Intel RealSense D405 | Wide FOV, depth, global shutter |

Gripper

  • Follower: Parallel gripper with enhanced grip tape
  • Leader (ALOHA 2): Replaced with XC430-W150-T motor (low-friction metal gears)

Teleoperation Method

ALOHA’s teleoperation uses backdriving-based puppeteering.

Operating Principle

User → Physical manipulation of leader arm → Read joint positions → Synchronize follower arm
  1. Physical Backdriving: User directly moves leader arm (WidowX) by hand
  2. Real-time Synchronization: Leader joint positions immediately reflected to follower (ViperX)
  3. Data Collection: RGB images + joint states recorded simultaneously
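
The loop below is a minimal Python sketch of this pipeline, not the official ALOHA code: `leader`, `follower`, and their `read_joint_positions()` / `command_joint_positions()` methods are hypothetical wrappers around the DYNAMIXEL servo bus (the real system uses the Interbotix ROS drivers), and camera capture is shown with OpenCV.

```python
import time
import cv2  # OpenCV, used here only for webcam capture

RATE_HZ = 50  # control/recording frequency used by the ALOHA paper

def teleop_and_record(leader, follower, cameras, num_steps):
    """Mirror leader joint positions onto the follower and log each timestep.

    `leader` / `follower` are hypothetical arm handles exposing
    read_joint_positions() and command_joint_positions(); `cameras` is a
    dict mapping camera name -> cv2.VideoCapture.
    """
    episode = []
    for _ in range(num_steps):
        t0 = time.time()

        # 1. Physical backdriving: the user moves the leader arm by hand; we only read it.
        target_q = leader.read_joint_positions()      # joint angles (rad)

        # 2. Real-time synchronization: command the follower to the same joint angles.
        follower.command_joint_positions(target_q)

        # 3. Data collection: record images + proprioception + the commanded action.
        frames = {}
        for name, cap in cameras.items():
            ok, frame = cap.read()
            if ok:
                frames[name] = frame
        episode.append({
            "observation/qpos": follower.read_joint_positions(),
            "observation/images": frames,
            "action": target_q,                       # leader joints serve as the action label
        })

        # Keep a fixed control rate.
        time.sleep(max(0.0, 1.0 / RATE_HZ - (time.time() - t0)))
    return episode
```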

Key Advantages

| Advantage | Description |
| --- | --- |
| Intuitiveness | Natural interface of directly moving a robot arm |
| Low Latency | Lower latency than joystick or VR controllers |
| Force Feedback | Sensing physical resistance enables delicate manipulation |
| Low Cost | Uses only the arms' existing joint encoders, no additional sensors |

Required Environment

  • 6+ USB3 ports (4 robots + 2 cameras)
  • Connection instability possible when using USB hub

ALOHA Version Comparison

| Item | ALOHA (Original) | ALOHA 2 |
| --- | --- | --- |
| Release | 2023 (RSS) | 2024 |
| Gripper | High friction | Low-friction rail design |
| Gravity Compensation | Rubber bands | Passive kinematic mechanism |
| Camera | Webcams | Intel RealSense D405 |
| Frame | Basic | 20x20 mm aluminum extrusion |
| Durability | Moderate | Enhanced |

Extension to Mobile ALOHA

Mobile ALOHA is an extended version with ALOHA mounted on a mobile base (AgileX Tracer).

| Item | Details |
| --- | --- |
| Base | AgileX Tracer AGV (~$7,000) |
| Movement Method | The operator is physically tethered to the system and backdrives the wheels while walking |
| Data | Base velocity + arm puppeteering recorded simultaneously (see the sketch below) |
| Learning | Autonomous execution possible with 50 demonstrations |
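
As a concrete illustration of the recorded data, the snippet below sketches how a Mobile ALOHA action vector can be assembled: the 14 arm joint targets from puppeteering (7 per arm, including the gripper) are concatenated with the base's linear and angular velocity into a 16-dimensional action. The function name and argument layout are illustrative, not taken from the released code.

```python
import numpy as np

def mobile_aloha_action(left_arm_q, right_arm_q, base_lin_vel, base_ang_vel):
    """Assemble a 16-dim Mobile ALOHA action vector.

    left_arm_q / right_arm_q: 7 values each (6 joints + 1 gripper),
    read from the leader arms during puppeteering.
    base_lin_vel / base_ang_vel: velocities of the backdriven
    AgileX Tracer base (m/s, rad/s).
    """
    assert len(left_arm_q) == 7 and len(right_arm_q) == 7
    return np.concatenate([
        np.asarray(left_arm_q),
        np.asarray(right_arm_q),
        [base_lin_vel, base_ang_vel],   # 2 base DoFs appended to 14 arm DoFs
    ])  # -> shape (16,)
```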

Mobile ALOHA Training Task Examples

  • Putting pot in cabinet
  • Calling elevator
  • Pushing in chair
  • Stir-frying shrimp
  • Cleaning wine spill
  • High-five

See Mobile ALOHA for details.


VLA Research Applications

ALOHA is a key evaluation/training platform for various VLA (Vision-Language-Action) models.

ACT (Action Chunking with Transformers)

| Item | Details |
| --- | --- |
| Core Idea | Predict sequences of actions (chunks) instead of single actions (see the sketch below) |
| Architecture | Conditional VAE with a Transformer encoder/decoder |
| Problem Solved | Reduces the effective decision horizon, and hence compounding error, by a factor of k (k = chunk length) |
| Performance | 80-90% success rate from 10 minutes of demonstrations |
| Tasks | Fine-grained bimanual manipulation such as opening a translucent cup and slotting a battery |
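
The following is a minimal sketch, not the official ACT code, of why chunking shrinks the decision horizon: `policy(obs)` is a hypothetical callable returning the next k actions at once, and `env` is assumed to follow a gym-style reset/step API.

```python
def run_chunked_policy(policy, env, episode_len, chunk_size):
    """Open-loop action chunking: query the policy once every `chunk_size` steps.

    `policy(obs)` returns an array of shape (chunk_size, action_dim).
    Executing k actions per query means the policy makes episode_len / k
    decisions instead of episode_len, cutting compounding error by a factor of k.
    """
    obs = env.reset()
    t = 0
    while t < episode_len:
        chunk = policy(obs)                      # predict the next k actions at once
        for action in chunk:
            obs, reward, done, info = env.step(action)
            t += 1
            if done or t >= episode_len:
                return
```

The released ACT implementation additionally supports temporal ensembling: the policy is queried at every timestep and the overlapping predictions for the current step are averaged with exponentially decaying weights, which smooths the executed trajectory at the cost of more inference calls.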

OpenVLA

| Item | Details |
| --- | --- |
| Parameters | 7B |
| Base | Llama 2 backbone with DINOv2 + SigLIP visual encoders |
| Training Data | 970k real-robot demonstrations from Open X-Embodiment |
| Performance | 16.5% higher absolute success rate than RT-2-X (55B) |

Physical Intelligence Pi Series

| Model | Features |
| --- | --- |
| Pi-0 | 3B PaliGemma VLM + 300M flow-matching action expert |
| Pi-0-FAST | Faster training via tokenized (FAST) action output |
| Pi-0.5 | Open-world generalization through heterogeneous data co-training |

Pi-0 is benchmarked on a range of robot platforms including ALOHA, significantly outperforming prior baselines such as OpenVLA and Octo.

OpenVLA-OFT

Achieves high-frequency, language-conditioned control on ALOHA with a 7B VLA policy, and reaches a 97.1% average success rate on the LIBERO benchmark, surpassing Pi-0, Diffusion Policy, and other baselines.


Software Ecosystem

| Package | Details |
| --- | --- |
| ROS / ROS 2 | Drivers, URDF, Gazebo simulation |
| MoveIt | Motion planning support |
| LeRobot | Hugging Face robot learning library integration |
| ACT Code | Official training/inference code released |
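
To give a feel for how recorded episodes flow into these tools, here is a small loading sketch in Python. It assumes the per-episode HDF5 layout used by the public ACT/ALOHA data-collection scripts (`/observations/qpos`, `/observations/images/<camera>`, `/action`); exact keys and camera names vary by setup, so treat this as an illustration rather than an official loader.

```python
import h5py
import numpy as np

def load_aloha_episode(path):
    """Load one recorded ALOHA episode stored as HDF5.

    Assumed layout (ACT/ALOHA data-collection convention):
      /observations/qpos          (T, 14)       joint positions of both follower arms
      /observations/images/<cam>  (T, H, W, 3)  uint8 RGB frames per camera
      /action                     (T, 14)       leader joint positions (teleop targets)
    Camera names such as 'cam_high' or 'cam_left_wrist' depend on the rig.
    """
    with h5py.File(path, "r") as f:
        qpos = np.array(f["/observations/qpos"])
        actions = np.array(f["/action"])
        images = {
            cam: np.array(f[f"/observations/images/{cam}"])
            for cam in f["/observations/images"].keys()
        }
    return qpos, images, actions
```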
