Physical AI - The Age of Physical Intelligence is Coming

1. LLM, AGI, and Physical AI

The AGI race that began with LLMs is a push toward human-level intelligence and beyond. As of 2026, that goal looks achievable in the near future.

But this intelligence is limited to Cognitive Intelligence: coding, math, reasoning, research, literature…

Solving Physical Intelligence is a different kind of problem altogether.

Just as ChatGPT changed the world, Physical AI is expected to transform the world of physical labor.

2. Latest Demos (January 2026)

CES 2026

Boston Dynamics Atlas + LBM

The star of this CES: Boston Dynamics' Atlas.

Does this demo have “intelligence”?

Boston Dynamics, the pioneer of Classical Robotics, is also transitioning to Physical AI.

450M Diffusion Transformer: Co-developed with Toyota Research Institute (TRI)
Single-model whole-body control: Walking and manipulation integrated
Deformable-object manipulation such as rope tying and cloth unfolding

→ Details: LBM (Large Behavior Model)

Sharpa CraftNet

Sharpa unveiled CraftNet and the North humanoid.

VTLA (Vision-Tactile-Language-Action): First commercial model integrating tactile sensing into VLA
Pinwheel folding, playing card handling, and other demos requiring tactile sensing
CES 2026 Innovation Award winner

Figure Helix 02

Figure AI presents Helix 02 as the first model to control a humanoid’s whole body fully autonomously.

61 consecutive actions over 4 minutes: No resets, no human intervention
System 0/1/2 architecture: High-speed control up to 1 kHz
Replaced 109,504 lines of C++ code with a 10M-parameter neural network

3. What is Physical AI?

Physical AI is an AI system built on end-to-end VLA models that performs general-purpose physical tasks that earlier rule-based approaches could not handle.

Why is Physical AI Getting Attention?

What was impossible before is now possible.

Past (Classical Robotics)	Present (Physical AI)
Only predetermined actions	Adapts to various situations
Couldn’t fold laundry	Can fold laundry
Struggled with deformable objects	Can handle plastic wrap, ropes

Classical Robotics vs Physical AI

Aspect	Classical Robotics	Physical AI (VLA)
Architecture	Modular (perception → planning → control)	End-to-end integrated
Learning	Rule-based + partial ML	Data-driven full learning
Generalization	Bound to training environment	Zero-shot generalization
Knowledge	Domain-specific	Inherits World Knowledge (LLM/VLM)

→ Details: Definition of Physical AI

4. VLA, RFM, LBM Terminology

Evolution from LLM to VLA

LLM → VLM → VLA
Language → + Vision → + Action

Term	Full Name	Description
VLA	Vision-Language-Action	Model integrating vision + language + action
LBM	Large Behavior Model	Frames action as behavior; used here interchangeably with VLA
RFM	Robot Foundation Model	Foundation Model for robots
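
The LLM → VLM → VLA progression can be read as a change in input and output types. The sketch below is only an illustration of that idea: the `Image` and `Action` types and the `toy_vla` function are hypothetical stand-ins, and real VLAs typically also take robot state as input and run a neural network rather than the hand-written rule used here.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]              # hypothetical stand-in for a camera frame


@dataclass
class Action:
    joint_deltas: List[float]          # hypothetical continuous action vector


# Each step keeps the previous modalities and adds one more.
LLM = Callable[[str], str]             # language in, language out
VLM = Callable[[Image, str], str]      # + vision as input
VLA = Callable[[Image, str], Action]   # + action as output


def toy_vla(image: Image, instruction: str) -> Action:
    """Hand-written placeholder with the VLA signature (not a real model)."""
    pixel_count = max(1, sum(len(row) for row in image))
    brightness = sum(map(sum, image)) / pixel_count
    direction = 1.0 if "up" in instruction else -1.0
    return Action(joint_deltas=[direction * brightness])


policy: VLA = toy_vla
print(policy([[0.5, 0.5]], "move up"))
```

The point is the signature, not the body: the same image and instruction that a VLM would answer in text, a VLA answers with a motor command.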

Why VLA is Special

LLMs have common sense, and VLAs built on them inherit it.

Inherits World Knowledge from LLMs trained on internet-scale data
Can work at a different cafe, with new menu items
Can handle packages of various shapes, clothes of various designs

→ Details: What are RFM & VLA?

5. Data Scaling Problem

There are reasons why VLAs can’t simply reuse the LLM success formula.

Difference Between LLM and VLA

Aspect	LLM	VLA
Data Source	Internet (practically unlimited)	Real robot actions (limited)
Collection Cost	Low	High
Evaluation	Can be automated	Requires physical robot operation

Solution Attempts

Approach	By	Description
Teleoperation	Tesla, Google, PI	Direct data collection
Simulation	NVIDIA	Omniverse + Cosmos
Community	HuggingFace	Open-source collaboration
World Model	1X, NVIDIA	Synthetic data generation

→ Details: Action Data Scaling Problem

6. VLA & RFM Progress

Convergent Evolution: System 1/2 Architecture

In 2025, different research groups independently arrived at similar structures.

System	Role	Frequency
System 2	High-level planning, language/vision understanding	7-10 Hz
System 1	Low-level motor control	100-200 Hz

Models that adopted this:

GR00T N1.6 (NVIDIA)
Figure Helix (Figure AI)
Gemini Robotics (Google DeepMind)
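
As a rough illustration of the two-rate structure in the table above, the sketch below runs a slow planner and a fast controller inside one loop. The function names, the proportional controller, and the specific rates (10 Hz and 200 Hz) are assumptions chosen for the example; it is not how GR00T, Helix, or Gemini Robotics are implemented.

```python
def system2_plan(observation: float, instruction: str) -> float:
    """Hypothetical slow planner: turns an instruction into a target value."""
    return 1.0 if "raise" in instruction else 0.0


def system1_control(position: float, target: float, gain: float = 0.05) -> float:
    """Hypothetical fast controller: one proportional step toward the target."""
    return position + gain * (target - position)


def run_dual_rate(instruction: str, seconds: float = 1.0,
                  slow_hz: int = 10, fast_hz: int = 200):
    steps_per_plan = fast_hz // slow_hz      # fast ticks between two replans
    position, target = 0.0, 0.0
    plan_calls = control_calls = 0
    for tick in range(int(seconds * fast_hz)):
        if tick % steps_per_plan == 0:       # System 2 runs only on some ticks
            target = system2_plan(position, instruction)
            plan_calls += 1
        position = system1_control(position, target)  # System 1 runs every tick
        control_calls += 1
    return position, plan_calls, control_calls


position, plans, controls = run_dual_rate("raise the arm")
print(plans, controls, round(position, 3))
```

In one simulated second the planner is called 10 times and the controller 200 times: the slow system only updates the target, while the fast system keeps tracking the most recent one.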

Convergent Evolution: Continuous Action Generation

RT-2’s “Action as Language” (discrete tokens) → Flow Matching / Diffusion (continuous values)

Model	Method	Features
π0	Flow Matching	50Hz control
GR00T N1	Diffusion Transformer	Dual system
SmolVLA	Flow Matching	450M lightweight
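
To make the shift from discrete tokens to continuous values concrete, here is a minimal sketch contrasting the two action representations. The 256 uniform bins, the [-1, 1] action range, and the 10 integration steps are assumptions for illustration, and the learned velocity network is replaced by the closed-form field for a single target action, so this is a toy rather than any model’s actual sampler.

```python
import random


def tokenize(action, low=-1.0, high=1.0, bins=256):
    """Discrete route: map a continuous value to one of `bins` token ids."""
    clipped = min(max(action, low), high)
    return min(int((clipped - low) / (high - low) * bins), bins - 1)


def detokenize(token, low=-1.0, high=1.0, bins=256):
    """Map a token id back to the center of its bin."""
    return low + (token + 0.5) * (high - low) / bins


def velocity(x, t, target):
    """Toy stand-in for a learned velocity network.

    For straight-line paths toward a single target action, the exact
    field is (target - x) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def sample_flow(target, steps=10, seed=0):
    """Continuous route: Euler-integrate the field from noise (t=0) toward t=1."""
    x = random.Random(seed).gauss(0.0, 1.0)   # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt, target)
    return x


wanted = 0.3
print(detokenize(tokenize(wanted)))   # snapped to a bin center
print(sample_flow(wanted))            # continuous value near the target
```

The round trip through tokens can lose up to half a bin width, while the integrated sample is not tied to a grid.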

→ Details: VLA & RFM Progress

7. Physical vs Cognitive Intelligence

Moravec’s Paradox

“High-level reasoning requires relatively little computation, but low-level sensorimotor skills require enormous computational resources.”

Easy (for AI)	Hard (for AI)
Beat chess world champion	Take keys out of pocket
Beat Go masters	Wash dishes
Write complex equations	Pick up fruit

An AI beat the world chess champion back in 1997 (Deep Blue), yet robots still struggle to take keys out of a pocket.

Why is Physical Intelligence Hard?

Evolutionary perspective: Motor, sensory, and perceptual skills evolved over ~1 billion years; abstract thinking over only ~millions of years
Dimensionality problem: Text is an abstracted, low-dimensional signal; the physical world is high-dimensional and demands real-time interaction
Learning difference: Humans learn through experience, VLAs mainly through imitation learning

→ Details: Physical vs Cognitive Intelligence

8. Conclusion: It Still Seems Possible

Reasons for optimism:

Much of human labor doesn’t necessarily require tactile sensing
Robots can add senses humans don’t have (depth cameras, etc.)
Robots can make movements impossible for human bodies
As Tesla FSD showed in autonomous driving, the implementation doesn’t have to mirror the human approach

Having learned so much from building LLMs, humanity seems likely to find the answer soon.

Learn More

Fundamentals Guide

Insights

Key Models

π0 / π0.5 - Physical Intelligence
GR00T N1 - NVIDIA
Figure Helix - Figure AI
LBM - Boston Dynamics + TRI
CraftNet - Sharpa