SmolVLA

HuggingFace's Lightweight Open-Source VLA Model


Key Significance

  • Extreme Efficiency: 450M parameters matching the LIBERO benchmark performance (87.3%) of the 7x larger Pi0 (3.3B)
  • Runs Anywhere: Works on a MacBook, a consumer GPU, or even a CPU, dramatically lowering the barrier to VLA research
  • Community Data Only: Trained on 487 LeRobot community datasets (10M frames), reaching state-of-the-art results with public data alone
  • Asynchronous Inference: Decouples action prediction from execution for ~30% faster response and 2x task throughput
  • Fast Training: Trains in 4 hours (20K steps) on a single A100
  • Flow Matching: Pairs a SmolVLM-2 backbone with a flow matching action expert
  • Fully Reproducible: Anyone can reproduce the model using only the open-source code and public datasets

SmolVLA Overview

SmolVLA: Achieving Pi0-level performance with 450M parameters


Overview

SmolVLA is a lightweight VLA model developed by HuggingFace that matches or exceeds the performance of models up to 10x its size with just 450M parameters. It runs on consumer GPUs and MacBooks and was trained entirely on LeRobot community data.

| Item | Details |
|---|---|
| Published | June 2025 |
| Developer | HuggingFace |
| Parameters | 450M |
| Paper | arXiv:2506.01844 |
| Blog | huggingface.co/blog/smolvla |
| Model | lerobot/smolvla_base |
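
At the interface level, the model maps multi-camera images, the robot's proprioceptive state, and a language instruction to a chunk of future actions. A minimal sketch of that contract, with a random stub in place of the real network (the shapes, key names, and chunk size here are illustrative assumptions, not SmolVLA's actual dimensions):

```python
import numpy as np

CHUNK_SIZE = 50   # actions predicted per forward pass (illustrative)
ACTION_DIM = 6    # e.g. joint targets for a 6-DoF arm (illustrative)

def smolvla_stub(observation: dict) -> np.ndarray:
    """Stand-in for SmolVLA: consumes an observation dict and
    returns an action chunk of shape (CHUNK_SIZE, ACTION_DIM)."""
    assert observation["images"].ndim == 4          # (num_cams, H, W, 3)
    assert observation["state"].ndim == 1           # proprioceptive state
    assert isinstance(observation["instruction"], str)
    rng = np.random.default_rng(0)
    return rng.standard_normal((CHUNK_SIZE, ACTION_DIM))

obs = {
    "images": np.zeros((2, 224, 224, 3), dtype=np.uint8),  # two camera views
    "state": np.zeros(6, dtype=np.float32),
    "instruction": "pick up the red cube",
}
chunk = smolvla_stub(obs)
print(chunk.shape)  # (50, 6)
```

Predicting a whole chunk per forward pass, rather than one action at a time, is what makes the asynchronous inference scheme described below possible.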

Architecture

SmolVLA = SmolVLM-2 (VLM) + Flow Matching Action Expert

+---------------------------------------------------------+
|                  SmolVLA Architecture                   |
+---------------------------------------------------------+
|  Inputs:                                                |
|  +----------+  +----------+  +----------+               |
|  | Multiple |  | Robot    |  | Language |               |
|  | Camera   |  | State    |  | Instruct |               |
|  | Views    |  |          |  |          |               |
|  +----+-----+  +----+-----+  +----+-----+               |
|       |             |             |                     |
|       +-------------+-------------+                     |
|                     |                                   |
|              +------v------+                            |
|              |  SmolVLM-2  |  Compact VLM               |
|              |  (Context)  |                            |
|              +------+------+                            |
|                     |                                   |
|              +------v------+                            |
|              |   Action    |  Flow Matching             |
|              |   Expert    |                            |
|              +------+------+                            |
|                     |                                   |
|              +------v------+                            |
|              |   Action    |                            |
|              |   Chunk     |                            |
|              +-------------+                            |
+---------------------------------------------------------+
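
The action expert generates continuous actions via flow matching: starting from Gaussian noise, it integrates a learned velocity field toward the target action chunk. A toy sketch of the inference-time integration (Euler steps over the linear interpolation path; the closed-form "velocity field" below is a stand-in for the trained expert, not SmolVLA's actual network):

```python
import numpy as np

def velocity_field(x, t, target):
    """Toy stand-in for the trained action expert.
    For the linear path x_t = (1-t)*noise + t*target, the
    conditional velocity pointing at the target is (target - x_t)/(1-t)."""
    return (target - x) / max(1.0 - t, 1e-3)

def sample_action_chunk(target, num_steps=10, seed=0):
    """Integrate noise -> actions with Euler steps, as in flow matching."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)   # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_field(x, t, target)
    return x

target_chunk = np.linspace(0.0, 1.0, 12).reshape(4, 3)  # pretend "true" actions
sampled = sample_action_chunk(target_chunk)
print(np.abs(sampled - target_chunk).max())  # ~0: the flow reaches the target
```

In the real model the velocity field is predicted by the action expert conditioned on the SmolVLM-2 context, so the flow lands on instruction-appropriate actions rather than a known target.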

Training Data

| Item | Details |
|---|---|
| Data | 10M frames |
| Source | 487 LeRobot community datasets |
| Episodes | <30K (~1/10 of other VLAs) |
| Environments | Varied, from labs to homes |

Performance

Benchmark Results

LIBERO Benchmark:

| Model | Parameters | Success Rate |
|---|---|---|
| SmolVLA | 0.45B | 87.3% |
| Pi0 | 3.3B | ~87% |

-> Equal performance from a ~7x smaller model

Other Comparisons

  • Outperforms ACT in both simulation and real-world environments
  • Validated on LIBERO, Meta-World, SO100, and SO101

Key Advantages

| Feature | Description |
|---|---|
| Lightweight | 450M parameters (10x smaller) |
| Efficient | Runs on a MacBook or CPU |
| Fast Training | 4 hours (20K steps) on a single A100 |
| Async Inference | 30% faster response, 2x throughput |
| Open-Source | Complete release of model, code, and data |

Asynchronous Inference

SmolVLA's differentiating feature is the separation of action prediction from action execution:

  • Low-latency control
  • Real-time operation even in resource-constrained environments
  • 30% response speed improvement
  • 2x task throughput

Hardware Requirements

| Environment | Details |
|---|---|
| MacBook | Supported |
| Consumer GPU | Supported |
| CPU only | Supported |
| Training (single A100) | 4 hours |

Impact

Significance of SmolVLA:

  • Accessibility: VLA research possible without expensive infrastructure
  • Efficiency: SOTA performance with less data, smaller model
  • Community: Contributes to LeRobot ecosystem development
  • Reproducibility: Fully open-source, public datasets only
