The Challenge of Evaluation

Why evaluating VLA models is so difficult


The Core Problem

Evaluating a vision-language-action (VLA) model means running it on a physical robot and observing the resulting behavior. This raises several problems:

  • Risk of hardware failure
  • Risk of environmental damage (e.g., breaking dishes)
  • Time and cost constraints
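
The time and cost constraint can be made concrete with a little statistics. The sketch below is an illustration only, not taken from any particular evaluation pipeline. It assumes each physical trial is an independent pass/fail outcome and uses the Wilson score interval (a standard textbook choice, picked here for convenience) to show how uncertain a success rate remains when only a handful of robot trials are affordable. The function name `wilson_interval` and the trial counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass/fail success rate.

    Assumes trials are independent and identically distributed, which
    real robot rollouts may only roughly satisfy.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical trial budgets, all with the same observed 70% success rate.
    for successes, trials in [(7, 10), (70, 100), (700, 1000)]:
        low, high = wilson_interval(successes, trials)
        print(f"{successes}/{trials} successes: interval {low:.3f} to {high:.3f}")
```

Under these assumptions, ten trials leave an interval roughly half the unit range wide, so two policies with noticeably different true success rates can be hard to tell apart. Narrowing the interval takes many more physical rollouts, which is where the time, cost, and hardware risk listed above accumulate.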


Differences from LLM Evaluation

Aspect                  LLM      VLA
Evaluation Environment  Digital  Physical World
Evaluation Cost         Low      High
Reproducibility         High     Low
Risk                    None     Present

Solution Attempts

Automation Through World Models

  • 1X’s approach

Distributed Evaluation Systems

  • RoboArena
