Physical AI: The Emergence of the Term
On March 18, 2024, NVIDIA CEO Jensen Huang put “Physical AI” front and center during his GTC keynote.
“The next wave of AI will be AI learning about the physical world… Physical AI, AI that can perceive, reason, plan and act.”
— Jensen Huang, GTC 2024 (NVIDIA Blog)
“The ChatGPT moment for general robotics is just around the corner.”
— Jensen Huang (HPC Wire)
ChatGPT has been shaking the world since its release in late 2022, and Jensen Huang predicts a similar moment is coming soon for the robotics industry. Physical AI has become a central topic in the AI industry, but the term has no unified definition yet and is used in different ways. Let’s take a closer look.
Various Definitions
Definitions of Physical AI can be broadly divided into two categories.
Broad Definition: “AI with a Body”
This is the broad definition, used primarily by NVIDIA and the general media.
Source: NVIDIA Glossary - Generative Physical AI
“The next big thing is Physical AI, AI with a body.”
— Jensen Huang
From this perspective, Physical AI includes all AI that interacts with the physical world:
- Autonomous vehicles
- Drones
- Industrial robots
- Humanoids
- Digital twins
It makes sense for NVIDIA, as a GPU vendor, to unify its various GPU use cases under the “Physical AI” keyword.
Narrow Definition: VLA or End-to-End Learning-based General-Purpose Robots
Companies actually developing robot AI, such as Physical Intelligence, Google DeepMind, and Figure AI, use a narrower definition.
“Vision-language-action (VLA) model… can ‘see’ (vision), ‘understand’ (language) and ‘act’ (action) within the physical world.”
From this perspective, Physical AI means general-purpose robot AI based on VLA (Vision-Language-Action) models. VLA is the core technology that makes Physical AI practically achievable: it enables robot behaviors that were previously out of reach, which is the biggest reason Physical AI is drawing attention. For that reason, I find the narrow definition more compelling. Let’s now examine how this new VLA technology differs from past approaches.
Our Definition
We adopt the narrow definition and will explore it in depth.
AI systems and their ecosystem that perform generalist physical tasks—previously impossible with rule-based (Specialist) approaches—based on End-to-End VLA models
Why this definition?
1. A Threshold Exists
Just as LLMs had the “emergence of GPT,” robots also have a clear threshold.
July 2023: The Emergence of RT-2
“The concept of Vision-Language-Action (VLA) models was pioneered in July 2023 by Google DeepMind with RT-2.” — Wikipedia: Vision-language-action model
RT-2 was the first model trained on both web data and robot data, treating robot actions like language tokens. This enabled:
- Generalization to objects and commands not in the training data
- Execution of commands that require reasoning
- Multi-step planning through chain-of-thought reasoning
Sources: Google DeepMind RT-2 Blog, arXiv 2307.15818
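RT-2’s core trick—treating robot actions like language tokens—can be sketched in a few lines. The following is an illustrative toy, not RT-2’s actual code; it assumes each continuous action dimension is discretized into 256 uniform bins, so an action vector becomes a short sequence of token ids that a language model can emit like ordinary text:

```python
import numpy as np

NUM_BINS = 256  # RT-2-style discretization: 256 bins per action dimension

def tokenize_action(action, low, high):
    """Map a continuous action vector to discrete token ids in 0..255."""
    action = np.clip(action, low, high)
    # Normalize each dimension to [0, 1], then bucket into NUM_BINS bins.
    normalized = (action - low) / (high - low)
    return np.minimum((normalized * NUM_BINS).astype(int), NUM_BINS - 1)

def detokenize_action(tokens, low, high):
    """Map token ids back to continuous values (bin centers)."""
    normalized = (tokens + 0.5) / NUM_BINS
    return low + normalized * (high - low)

# Example: a hypothetical 7-DoF action (6 end-effector deltas + gripper).
low = np.full(7, -1.0)
high = np.full(7, 1.0)
action = np.array([0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.8])

tokens = tokenize_action(action, low, high)
recovered = detokenize_action(tokens, low, high)
print(tokens)
print(np.max(np.abs(recovered - action)))  # quantization error < bin width
```

Because actions are now just tokens, the same transformer that answers questions about an image can also “answer” with a motor command, which is what lets web-scale knowledge flow into control.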
2. What Was Impossible Before Is Now Possible
Physical AI is not just a marketing term: previously impossible tasks have actually become possible.
Representative Example: Folding Laundry
Laundry has been called the “holy grail” of robot manipulation.
“Towels are deformable, constantly changing shape, bending unpredictably… There’s no fixed geometry to memorize, and no single ‘correct’ grasp point.” — Knowable Magazine
Objects like these—constantly changing shape, wrinkling unpredictably, with no fixed geometry—are called deformable objects. They are extremely difficult for rule-based robots.
But now, numerous success stories are being reported.
- Physical Intelligence’s π0: Successfully folding laundry with 50Hz continuous actions (Physical Intelligence Blog)
- Figure’s Helix: “First autonomous laundry folding based on end-to-end neural network” (Figure AI)
Dyna Robotics laundry folding robot demo. Filmed at CoRL 2025, September 2025 — Jong Hyun Park
Deformable Object Manipulation
Plastic-wrapped logistics packages, flexible cables, food ingredients — tasks previously impossible with rule-based approaches are becoming possible through VLA. To demonstrate this capability, Figure AI released a YouTube video of a logistics robot continuously processing plastic-wrapped packages for one hour.
Figure AI’s 1-hour continuous logistics demo — handling various deformable objects including plastic-wrapped packages
3. Inheriting LLM’s World Knowledge
What makes VLA special is that it inherits the “common sense about the world” from LLM/VLM.
Previous rule-based robots lacked this common sense. To execute “pick up the cup,” everything had to be manually programmed: what a cup is, what picking up means. If a new cup shape appeared in a store, the robot might fail. A VLA trained on internet-scale data, by contrast, knows what a cup is: even when seeing a particular cup for the first time, it can recognize it as a cup and pick it up much as a human would.
Evidence Supporting the Definition of Physical AI
Major Companies’ Approaches
All major companies are taking a similar approach: general-purpose robot foundation models built on VLA-based end-to-end learning. The details differ—whether tactile sensing is included, whether a world model is used instead of a VLA, whether the hardware is humanoid-shaped—but the overall direction is the same.
- Physical Intelligence
- π0, π0.5 (VLA + Flow Matching, 50Hz continuous actions)
- Google DeepMind
- RT-2 → RT-X → Gemini Robotics
- Figure AI
- Helix (proprietary VLA)
- 1X Technologies
- World Model + Redwood AI
- Tesla (Optimus)
- End-to-end neural network, same approach as FSD for autonomous driving
Classical Robotics Is Also Changing
Interestingly, even the leading players in Classical Robotics are transitioning to the Physical AI era.
Boston Dynamics’ Transition
Boston Dynamics has long been known for its modular approach (perception → planning → control) and Model Predictive Control (MPC). Atlas’s backflips and Spot’s stable locomotion were products of this approach.
However, they recently began Large Behavior Model (LBM) research in collaboration with Toyota Research Institute (TRI).
Source: Boston Dynamics Blog - Large Behavior Models and Atlas Find New Footing
“The specific architecture used for Atlas is a 450-million-parameter diffusion transformer. This model outputs a continuous stream of actions at 30Hz to control all 50 of Atlas’s degrees of freedom.” — IEEE Spectrum
Boston Dynamics is also moving toward end-to-end models: even the leader of classical robotics is transitioning to the Physical AI paradigm.
Physical AI vs Classical Robotics
| Aspect | Classical Robotics | Physical AI (VLA) |
|---|---|---|
| Architecture | Modular (perception → planning → control) | End-to-end integration |
| Learning | Rule-based + partial ML | Full data-driven learning |
| Generalization | Dependent on training environment | Zero-shot generalization possible |
| Knowledge | Domain-specific, manual input | World Knowledge inheritance (LLM/VLM) |
| Examples | Boston Dynamics Spot (2015~), industrial robot arms | π0, OpenVLA, GR00T, Gemini Robotics (2023~) |
| Limitations | Vulnerable to new environments/objects | Data collection cost, safety verification |
Table: Key differences between Classical Robotics and Physical AI (VLA)
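The architectural contrast in the table can be made concrete with a toy sketch. Every function below is a hypothetical stub (none of these names come from a real robotics stack); the point is only the shape of the interfaces—hand-designed symbolic hand-offs in the classical pipeline versus a single learned mapping in the VLA case:

```python
# Classical Robotics: modular pipeline with explicit, hand-designed interfaces.
def perceive(image):
    # Stub object detector: emits a symbolic scene description.
    return {"object": "cup", "position": (0.3, 0.1)}

def plan_motion(scene):
    # Stub planner: produces a symbolic plan over detected objects.
    return [f"move_to{scene['position']}", "close_gripper"]

def control(plan):
    # Stub controller (e.g. MPC/PID in reality): plan -> motor commands.
    return [f"exec:{step}" for step in plan]

def classical_step(image):
    # Errors compound across the fixed perception -> planning -> control seams.
    return control(plan_motion(perceive(image)))

# Physical AI (VLA): one learned model, (observation, language) -> action.
def vla_step(image, instruction):
    # Stub for a single end-to-end network; in reality a transformer maps
    # pixels and text directly to a continuous action vector.
    return [0.0] * 7  # e.g. a 7-DoF action

print(classical_step([0.0]))
print(vla_step([0.0], "pick up the cup"))
```

Note that in the modular stack the intermediate representations (`scene`, `plan`) are designed by engineers and fixed, while in the VLA the intermediate representation is learned and never exposed—which is exactly why the former is brittle to new objects and the latter can generalize.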
Summary: The Boundaries of Physical AI
Included (Physical AI):
- VLA models (π0, OpenVLA, GR00T, Gemini Robotics, etc.)
- End-to-end learning-based robot systems
- Cross-embodiment datasets (Open X-Embodiment)
- Simulation/hardware ecosystem for VLA
Excluded (Classical Robotics):
- Rule-based industrial robots
- Traditional autonomous driving stacks with modular separation
- Single-task Specialist robots
- Systems that only execute pre-programmed motions
On the Boundary:
- Hybrid approaches (high-level: learning, low-level: MPC)
While any AI technology that interacts with the physical world could be called Physical AI, that usage amounts to a marketing term. Deep-learning-based object recognition, and robot control built on it, were already established fields; they are not what is driving the current rise of the “Physical AI” keyword. In this document, therefore, we limit Physical AI to the “general-purpose” technology newly emerging in the era of LLMs, and focus our discussion accordingly.
Next Document
Let’s learn why the “Generalist robots” that Physical AI pursues are now possible, and how they differ from past “Specialist robots.”