Physical AI - The Age of Physical Intelligence is Coming

After LLM-driven AGI, the era of physical intelligence is dawning

1. LLM, AGI, and Physical AI

The AGI race, which began with LLMs, is an effort to reach human-level intelligence or beyond. As of 2026, that goal seems reachable in the near future.

But this intelligence is limited to Cognitive Intelligence.

  • Coding, math, reasoning, research, literature…

Solving Physical Intelligence is a problem of an entirely different kind.

Just as ChatGPT changed the world, Physical AI is expected to transform the world of physical labor.


2. Latest Demos (2026.01)

CES 2026

Boston Dynamics Atlas + LBM

The star of CES 2026: Boston Dynamics' Atlas

Does this demo have “intelligence”?


Boston Dynamics, the pioneer of Classical Robotics, is also transitioning to Physical AI.

  • 450M-parameter Diffusion Transformer: Co-developed with Toyota Research Institute (TRI)
  • Whole-body control with a single model: Walking and manipulation integrated
  • Deformable-object manipulation, such as tying rope and unfolding cloth

→ Details: LBM (Large Behavior Model)

Sharpa CraftNet

Sharpa unveiled CraftNet and the North humanoid.

  • VTLA (Vision-Tactile-Language-Action): Presented as the first commercial model to integrate tactile sensing into a VLA
  • Demos that require tactile sensing, such as pinwheel folding and playing-card handling
  • CES 2026 Innovation Award winner

Figure Helix 02

Figure AI presents Helix 02 as the first model to give a humanoid fully autonomous whole-body control.

  • 61 consecutive actions over 4 minutes: No resets, no human intervention
  • System 0/1/2 architecture: High-speed control at up to 1 kHz
  • Replaced 109,504 lines of C++ code with a 10M-parameter neural network

3. What is Physical AI?

An AI system, built on end-to-end VLA models, that performs general-purpose physical tasks that were impossible with earlier rule-based approaches.

Why is Physical AI Getting Attention?

What was impossible before is now possible.

| Past (Classical Robotics) | Present (Physical AI) |
|---|---|
| Only predetermined actions | Adapts to various situations |
| Couldn't fold laundry | Can fold laundry |
| Struggled with deformable objects | Can handle plastic wrap and ropes |

Classical Robotics vs Physical AI

| Aspect | Classical Robotics | Physical AI (VLA) |
|---|---|---|
| Architecture | Modular (perception → planning → control) | End-to-end, single integrated model |
| Learning | Rule-based, with partial ML | Fully data-driven learning |
| Generalization | Bound to the training environment | Can generalize zero-shot to new settings |
| Knowledge | Domain-specific | Inherits world knowledge from LLMs/VLMs |
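To make the architectural contrast concrete, here is a toy Python sketch. Every name in it (`Observation`, `detect_object`, `plan_grasp`, `control`, `EndToEndPolicy`) is hypothetical and stands in for real perception, planning, or learned components; it only shows where the hand-written interfaces sit in each design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    image: List[float]            # toy stand-in for camera pixels
    joint_positions: List[float]  # robot proprioception
    instruction: str              # natural-language task


# Classical robotics: hand-designed modules with fixed interfaces.
def detect_object(image: List[float]) -> int:
    """Perception (toy): index of the brightest pixel."""
    return max(range(len(image)), key=lambda i: image[i])


def plan_grasp(target_index: int, n_joints: int) -> List[float]:
    """Planning (toy): a hand-written rule mapping the target to a joint goal."""
    return [target_index / 10.0] * n_joints


def control(goal: List[float], current: List[float], gain: float = 0.5) -> List[float]:
    """Control: proportional command toward the planned joint goal."""
    return [gain * (g - c) for g, c in zip(goal, current)]


def modular_pipeline(obs: Observation) -> List[float]:
    target = detect_object(obs.image)                     # perception
    goal = plan_grasp(target, len(obs.joint_positions))   # planning
    return control(goal, obs.joint_positions)             # control (instruction unused)


# Physical AI: one learned function from the full observation to an action.
@dataclass
class EndToEndPolicy:
    network: Callable[[Observation], List[float]]  # e.g. a VLA; any callable here

    def act(self, obs: Observation) -> List[float]:
        return self.network(obs)


obs = Observation(image=[0.1, 0.9, 0.3], joint_positions=[0.0, 0.0], instruction="pick up the cup")
print(modular_pipeline(obs))
policy = EndToEndPolicy(network=lambda o: [0.0] * len(o.joint_positions))  # untrained placeholder
print(policy.act(obs))
```

Note that the modular pipeline never reads `instruction`, because none of its hand-written rules know what to do with it. The end-to-end policy receives the image, joint state, and instruction together and leaves it to training to decide how to use them.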

→ Details: Definition of Physical AI


4. VLA, RFM, LBM Terminology

Evolution from LLM to VLA

LLM (Language) → VLM (+ Vision) → VLA (+ Action)

| Term | Full Name | Description |
|---|---|---|
| VLA | Vision-Language-Action | Model integrating vision, language, and action |
| LBM | Large Behavior Model | Essentially the same as VLA; actions are called "behaviors" |
| RFM | Robot Foundation Model | Foundation model for robots |

Why VLA is Special

LLMs have common sense, and because VLAs are built on top of them, VLAs inherit that common sense too.

  • Inherits world knowledge from LLMs trained on internet-scale data
  • Can work in an unfamiliar cafe, with new menu items
  • Can handle packages of various shapes, clothes of various designs

→ Details: What are RFM & VLA?


5. Data Scaling Problem

VLAs cannot simply follow the LLM success formula, for several reasons.

Difference Between LLM and VLA

| Aspect | LLM | VLA |
|---|---|---|
| Data source | Internet (effectively unlimited) | Real robot actions (limited) |
| Collection cost | Low | High |
| Evaluation | Can be automated | Requires operating a physical robot |

Solution Attempts

| Approach | By | Description |
|---|---|---|
| Teleoperation | Tesla, Google, PI | Direct data collection |
| Simulation | NVIDIA | Omniverse + Cosmos |
| Community | HuggingFace | Open-source collaboration |
| World Model | 1X, NVIDIA | Synthetic data generation |

→ Details: Action Data Scaling Problem


6. VLA & RFM Progress

Convergent Evolution: System 1/2 Architecture

In 2025, different research groups independently arrived at similar structures.

| System | Role | Frequency |
|---|---|---|
| System 2 | High-level planning, language/vision understanding | 7-10 Hz |
| System 1 | Low-level motor control | 100-200 Hz |

Models that adopted this:

  • GR00T N1.6 (NVIDIA)
  • Figure Helix (Figure AI)
  • Gemini Robotics (Google DeepMind)
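The dual-system split can be pictured as two loops running at different rates. The sketch below is a hypothetical simulation, not code from GR00T, Helix, or Gemini Robotics: `system2_plan` and `system1_control` are toy stand-ins, and the rates (10 Hz and 200 Hz) are picked from the ranges in the table above. It only illustrates that System 1 runs many control steps on the most recent System 2 output before the next plan arrives.

```python
SYSTEM2_HZ = 10    # slow loop: planning, language/vision understanding
SYSTEM1_HZ = 200   # fast loop: low-level motor control
DURATION_S = 1.0


def system2_plan(update_index: int) -> float:
    """Toy plan: a target joint position that changes with each slow update.
    In a real system this would be a latent vector from a VLM backbone."""
    return 0.1 * (update_index + 1)


def system1_control(target: float, position: float, gain: float = 0.2) -> float:
    """Toy fast controller: move a fraction of the way toward the latest target."""
    return position + gain * (target - position)


def run(duration_s: float = DURATION_S):
    ticks = int(duration_s * SYSTEM1_HZ)   # total fast-loop ticks
    ratio = SYSTEM1_HZ // SYSTEM2_HZ       # fast ticks per slow update
    position, target = 0.0, 0.0
    plan_updates = 0
    for tick in range(ticks):
        if tick % ratio == 0:              # System 2 fires every `ratio` ticks
            target = system2_plan(plan_updates)
            plan_updates += 1
        position = system1_control(target, position)  # System 1 fires every tick
    return ticks, plan_updates, position


ticks, plans, final_position = run()
print(f"{ticks} System 1 steps, {plans} System 2 updates, position={final_position:.3f}")
```

With these rates, one simulated second gives 200 System 1 steps but only 10 System 2 updates, so the fast controller keeps acting smoothly even while the slow model is still "thinking".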

Convergent Evolution: Continuous Action Generation

Action output moved from RT-2's “Action as Language” (discrete tokens) to Flow Matching / Diffusion (continuous values).
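To see what "Action as Language" means in practice: RT-2 discretizes each continuous action dimension into 256 bins so that actions can be emitted as tokens. The sketch below assumes uniform bins over a normalized range of [-1, 1] (the range is an illustrative assumption) and shows the round trip from value to token and back. The leftover error is the quantization loss that continuous methods avoid.

```python
NUM_BINS = 256              # RT-2 uses 256 bins per action dimension
LOW, HIGH = -1.0, 1.0       # assumed normalized action range (illustrative)


def to_token(value: float) -> int:
    """Map a continuous action value to a bin index in [0, NUM_BINS - 1]."""
    clipped = min(max(value, LOW), HIGH)
    fraction = (clipped - LOW) / (HIGH - LOW)            # in [0, 1]
    return min(int(fraction * NUM_BINS), NUM_BINS - 1)   # HIGH falls into the last bin


def from_token(token: int) -> float:
    """Map a bin index back to the center of its bin."""
    bin_width = (HIGH - LOW) / NUM_BINS
    return LOW + (token + 0.5) * bin_width


action = [0.1234, -0.5, 0.9999]   # hypothetical 3-dimensional action
tokens = [to_token(a) for a in action]
recovered = [from_token(t) for t in tokens]
max_error = max(abs(a - r) for a, r in zip(action, recovered))
print(tokens, max_error)
```

The error is bounded by half a bin width (about 0.004 here). That is small, but fixed, and each action dimension costs one token to generate.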

| Model | Method | Features |
|---|---|---|
| π0 | Flow Matching | 50 Hz control |
| GR00T N1 | Diffusion Transformer | Dual system |
| SmolVLA | Flow Matching | Lightweight (450M parameters) |
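Flow Matching replaces token prediction with a learned velocity field that carries random noise to an action. The sketch below is a toy, not π0's or SmolVLA's implementation. It uses a common linear (rectified-flow) interpolation, x_t = (1 - t)·noise + t·action, whose training target is the velocity action - noise. Instead of a trained network, it plugs in the exact velocity field for a dataset holding a single demonstrated action, (a* - x)/(1 - t). The step count and action values are illustrative assumptions.

```python
import random


def training_pair(action, noise, t):
    """Build one flow-matching training example on the linear path.
    A network would be trained to predict `target_velocity` from (x_t, t)."""
    x_t = [(1 - t) * n + t * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    return x_t, target_velocity


def oracle_velocity(x, t, demo_action):
    """Stand-in for a trained network: the exact velocity field when the
    dataset holds one action. It points straight from x to that action."""
    return [(a - xi) / (1 - t) for xi, a in zip(x, demo_action)]


def sample(demo_action, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) toward t=1 (action) with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in demo_action]   # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                                    # t stays below 1, so no division by zero
        v = oracle_velocity(x, t, demo_action)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


demo = [0.3, -0.2, 0.8]   # hypothetical 3-DoF action
x_t, v_target = training_pair(demo, [0.0, 0.0, 0.0], t=0.5)
print(x_t, v_target)
print(sample(demo))       # ends at (numerically) the demo action
```

With a real dataset, the velocity field is learned rather than known. π0 also generates a whole chunk of future actions this way instead of a single vector. The output stays continuous throughout, so there is no bin-width error.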

→ Details: VLA & RFM Progress


7. Physical vs Cognitive Intelligence

Moravec’s Paradox

“High-level reasoning requires relatively little computation, but low-level sensorimotor skills require enormous computational resources.”

| Easy (for AI) | Hard (for AI) |
|---|---|
| Beat the chess world champion | Take keys out of a pocket |
| Beat Go masters | Wash dishes |
| Write complex equations | Pick up fruit |

An AI (Deep Blue) beat the world chess champion in 1997, yet robots still struggle to take keys out of a pocket.

Why is Physical Intelligence Hard?

  1. Evolutionary perspective: Motor, sensory, and perceptual skills evolved over ~1 billion years; abstract thinking evolved over only ~millions of years
  2. Dimensionality problem: Text is abstract and low-dimensional, while the physical world is high-dimensional and demands real-time interaction
  3. Learning difference: Humans learn through their own experience, while VLAs learn mainly through imitation learning
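Imitation learning, in its simplest form (behavior cloning), is supervised regression from observed states to the actions a demonstrator took. The sketch below fits a one-dimensional linear policy to a handful of hypothetical teleoperation demonstrations using closed-form least squares. Real VLAs do the same kind of supervised fitting, but with large networks and image and language inputs. The data and variable meanings here are invented for illustration.

```python
# Hypothetical demonstrations: (state, action) pairs recorded via teleoperation.
# State = distance to target (m), action = commanded velocity (m/s).
demos = [(0.0, 0.0), (0.1, 0.05), (0.2, 0.10), (0.4, 0.20), (0.8, 0.40)]


def fit_linear_policy(pairs):
    """Behavior cloning with a linear policy a = w * s + b,
    fitted by ordinary least squares (minimizes squared action error)."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in pairs)
    var = sum((s - mean_s) ** 2 for s, _ in pairs)
    w = cov / var
    b = mean_a - w * mean_s
    return lambda s: w * s + b


policy = fit_linear_policy(demos)
print(policy(0.3))   # imitates the demonstrator: about 0.15
```

The policy never acts or receives feedback during training. It only copies what it was shown, which is exactly the difference from learning through one's own experience that the list points out.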

→ Details: Physical vs Cognitive Intelligence


8. Conclusion: It Still Seems Possible

Reasons for optimism:

  • Much of human labor doesn’t necessarily require tactile sensing
  • Robots can add senses humans don’t have (depth cameras, etc.)
  • Robots can perform movements that are impossible for the human body
  • As Tesla FSD showed in autonomous driving, a working implementation does not have to mirror the human approach

Humanity, having learned so much from LLMs, seems likely to find the answer soon.


Learn More

Fundamentals Guide

  1. Definition of Physical AI
  2. What are RFM & VLA?
  3. Action Data Scaling Problem
