GR00T (Project GR00T / GR00T N1 / N1.5 / N1.6)

NVIDIA's Foundation Model for Humanoid Robots

GR00T (Generalist Robot 00 Technology)


Key Significance

  • First Open Humanoid Foundation Model: GR00T N1 is presented as the first open VLA model for humanoid robots
  • Dual-System Architecture: Inspired by human cognition; separates System 2 (VLM, planning and reasoning) from System 1 (Diffusion Transformer, real-time motion)
  • Demonstrates the Power of Synthetic Data: Generated 780K trajectories (equivalent to 6,500 hours of demonstrations) in just 11 hours via Omniverse; adding them gave a 40% performance improvement over training on real data only
  • NVIDIA Ecosystem Integration: Vertical hardware-software integration spanning Isaac Sim/Lab, Omniverse, and Jetson Thor
  • Common Platform for the Entire Industry: Collaboration with major humanoid companies, including Figure AI, Boston Dynamics, and Unitree
  • Jetson Thor Dedicated Hardware: Blackwell-based robotics computing platform announced with 800 TFLOPS of 8-bit floating-point performance
  • Representative Case of Simulation-Based Scaling: Shows that large-scale training is possible without relying solely on real-world data collection
  • Continuous Improvement (N1.5/N1.6): Language instruction compliance rose from 46.6% to 93.3% (N1.5); N1.6 doubled the DiT depth and introduced the Cosmos VLM

Figure: GR00T N1 architecture, a dual-system structure with System 2 (VLM) + System 1 (Diffusion Transformer)

Figure: GR00T N1.6 demo showing various manipulation tasks


Overview

GR00T (Generalist Robot 00 Technology) is a general-purpose foundation model for humanoid robots, announced by NVIDIA at GTC 2024 and open-sourced starting with the N1 series in 2025.

Item | Details
First Announced | March 18, 2024 (GTC 2024)
N1 Released | March 2025 (GTC 2025)
Company | NVIDIA
Paper | arXiv:2503.14734
GitHub | NVIDIA/Isaac-GR00T
License | Apache 2.0 (Open Source)

Versions

Project GR00T (2024.03)

First announced at GTC 2024, where NVIDIA presented its vision of a foundation model for humanoid robots.

  • Announced goals: natural language understanding, human motion imitation, simulation-based learning
  • Jetson Thor computing platform announced simultaneously

GR00T N1 (2025.03)

Presented by NVIDIA as the world’s first open foundation model for humanoid robots.

Item | Details
Type | Vision-Language-Action (VLA)
Architecture | Dual-system architecture
Action Generation | Diffusion Transformer
VLM | Eagle2-1B
DiT | 16 layers
Training | 1K H100 GPUs, 250K steps
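To make "Action Generation: Diffusion Transformer" more concrete, the sketch below shows the general idea of iterative sampling: a chunk of future actions starts as random noise and is refined over a few steps by following a velocity field predicted by a network. This is a minimal illustration, not NVIDIA's implementation. The Euler integrator, the step count, the chunk shape, and the toy velocity function (which stands in for the trained DiT) are all assumptions made for this example.

```python
import random


def sample_action_chunk(velocity_fn, horizon, action_dim, num_steps=4, seed=0):
    """Integrate a velocity field from noise (t=0) to actions (t=1) with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise: one row per future timestep in the chunk.
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)  # in a real system this would be the network call
        x = [[xi + dt * vi for xi, vi in zip(row, vrow)] for row, vrow in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained model: always points at a fixed target chunk."""
    def velocity(x, t):
        # Velocity that would carry x to the target at constant speed by t = 1.
        return [[(ti - xi) / (1.0 - t) for xi, ti in zip(row, trow)]
                for row, trow in zip(x, target)]
    return velocity


# Toy target: 4 future timesteps, 2 action dimensions (arbitrary illustrative sizes).
target = [[0.1 * step, -0.1 * step] for step in range(4)]
chunk = sample_action_chunk(make_toy_velocity(target), horizon=4, action_dim=2)
```

With this toy field the sampler lands on the target chunk after the final step. A trained model would instead condition the velocity on images, language, and robot state.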

GR00T N1.5 (2025.05)

Significantly improved language instruction compliance.

Item | Details
Key Improvement | Frozen VLM (VLM weights fixed during training)
Loss Function | FLARE loss (Future Latent Representation Alignment)
Language Compliance | 46.6% → 93.3% (about +100% relative)
VLM | Eagle2-1B (Frozen)
DiT | 16 layers

Major Improvements:

  • Preserves the VLM's language understanding by freezing its weights
  • Balances action prediction and language compliance with the added FLARE loss
  • Roughly 2x improvement in language instruction compliance (46.6% → 93.3%)
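The idea of a frozen VLM can be illustrated without any deep-learning framework: frozen parameters are simply skipped by the optimizer, so only the action head changes during training. The sketch below is a toy under stated assumptions. The `Param` class, the `freeze` and `sgd_step` helpers, and all numeric values are hypothetical and are not taken from the GR00T codebase. In PyTorch, the equivalent effect is usually obtained by setting `requires_grad` to `False` on the frozen module's parameters.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Toy scalar parameter with a trainable flag (hypothetical helper)."""
    value: float
    trainable: bool = True


def freeze(params):
    """Mark parameters as frozen so optimizer steps leave them untouched."""
    for p in params:
        p.trainable = False


def sgd_step(params, grads, lr=0.1):
    """Plain SGD update that skips frozen parameters."""
    for p, g in zip(params, grads):
        if p.trainable:
            p.value -= lr * g


# Illustrative values only: two "VLM" weights and two "action head" weights.
vlm_params = [Param(1.0), Param(-2.0)]
action_head_params = [Param(0.5), Param(0.25)]

freeze(vlm_params)
sgd_step(vlm_params + action_head_params, grads=[1.0, 1.0, 1.0, 1.0])
```

After the step the VLM values are unchanged while the action-head values have moved, which is the property the frozen-VLM design relies on to keep language understanding intact.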

GR00T N1.6 (2025.06)

Model scale expansion and real-world performance improvement.

Item | Details
Key Improvement | 2x larger DiT, Cosmos VLM
VLM | Cosmos-2B (2x compared to Eagle2-1B)
DiT | 32 layers (2x compared to 16)
Action Space | Relative Action Space
Real World | Improved sim-to-real transfer performance

Major Improvements:

  • Diffusion Transformer expanded to 32 layers (2x)
  • Enhanced visual understanding with Cosmos-2B VLM
  • Improved generalization with Relative Action Space
  • Better sim-to-real transfer performance
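A relative action space expresses targets as offsets from the robot's current state rather than as absolute positions. The source does not specify the exact formulation used in N1.6, so the sketch below shows one common convention (every action in a chunk is an offset from the state at the start of the chunk) and should be read as an assumption. The function names and the numbers are illustrative.

```python
def to_relative(current_state, absolute_chunk):
    """Express each absolute target as an offset from the current state."""
    return [[a - s for a, s in zip(action, current_state)] for action in absolute_chunk]


def to_absolute(current_state, relative_chunk):
    """Recover absolute targets by adding the offsets back onto the current state."""
    return [[d + s for d, s in zip(delta, current_state)] for delta in relative_chunk]


# Two-dimensional toy state and a chunk of two future targets (illustrative values).
state = [0.5, 1.0]
absolute = [[0.75, 1.0], [1.0, 1.5]]

relative = to_relative(state, absolute)        # [[0.25, 0.0], [0.5, 0.5]]
restored = to_absolute(state, relative)        # same as `absolute`

# The same relative chunk replayed from a different start state yields a shifted motion.
shifted = to_absolute([1.5, 2.0], relative)    # [[1.75, 2.0], [2.0, 2.5]]
```

The last line hints at why relative actions can help generalization: the same learned motion applies regardless of where the robot starts.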

Version Comparison

Item | N1 | N1.5 | N1.6
VLM | Eagle2-1B | Eagle2-1B (Frozen) | Cosmos-2B
DiT Layers | 16 | 16 | 32
Language Compliance | 46.6% | 93.3% | 93%+
Action Space | Absolute | Absolute | Relative
Key Contribution | First open model | Language compliance improvement | Scale expansion

Architecture

GR00T N1 uses a dual-system architecture inspired by principles of human cognition: a slow, deliberate System 2 and a fast, reactive System 1.

System 2 (Slow Thinking)

  • Built on a Vision-Language Model (VLM)
  • Forms action plans by interpreting the environment and the instructions
  • Deliberate, methodical decision-making

System 1 (Fast Thinking)

  • Built on a Diffusion Transformer (DiT)
  • Converts plans into precise, continuous motions
  • Corresponds to human reflexes and intuition
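The split between a slow System 2 and a fast System 1 can be sketched as a control loop in which the planner runs only every few ticks while the action generator runs on every tick, reusing the latest plan. This is a schematic under stated assumptions: `run_dual_system`, the callback names, and the replanning interval of 12 ticks are hypothetical and do not reflect the actual rates or interfaces of GR00T.

```python
def run_dual_system(num_ticks, replan_every, plan_fn, act_fn, observe_fn):
    """Run a two-rate loop: plan occasionally (System 2), act every tick (System 1)."""
    plan = None
    log = []
    for tick in range(num_ticks):
        obs = observe_fn(tick)
        if tick % replan_every == 0:
            plan = plan_fn(obs)        # slow path: refresh the plan
        action = act_fn(plan, obs)     # fast path: runs on every tick
        log.append((tick, plan, action))
    return log


# Toy stand-ins for the camera, the VLM, and the action head.
plan_calls = []


def toy_observe(tick):
    return {"tick": tick}


def toy_plan(obs):
    plan_calls.append(obs["tick"])
    return f"plan@{obs['tick']}"


def toy_act(plan, obs):
    return (plan, obs["tick"])


log = run_dual_system(num_ticks=30, replan_every=12,
                      plan_fn=toy_plan, act_fn=toy_act, observe_fn=toy_observe)
```

In this toy run the planner is called 3 times (ticks 0, 12, and 24) while the action function is called 30 times.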

Training

Data Sources

Source | Description
Human Demonstration Data | Actual human actions (teleoperation)
Synthetic Data | Generated by NVIDIA Omniverse/Isaac Sim
Human Video | For motion pretraining

Training Data Distribution

Figure: GR00T training data distribution, a combination of real and synthetic data

Power of Synthetic Data

Efficiency of synthetic data generation through Omniverse:

Item | Value
Generated Synthetic Trajectories | 780,000
Real-Time Equivalent | 6,500 hours (about 9 months of continuous collection)
Generation Time | 11 hours
Performance Improvement | +40% over real data only
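The figures in the table can be cross-checked with a few lines of arithmetic. The snippet below uses only the numbers stated above. The derived quantities (speedup, average trajectory length, months) are back-of-the-envelope calculations, and the 30.44-day month is an assumed average, not a figure from NVIDIA.

```python
trajectories = 780_000
equivalent_hours = 6_500
generation_hours = 11

# How much faster than real time the data was produced (about 591x).
speedup = equivalent_hours / generation_hours

# Average length of one trajectory implied by the totals (30 seconds).
avg_seconds_per_trajectory = equivalent_hours * 3600 / trajectories

# Continuous collection time in days and months (about 271 days, roughly 9 months).
equivalent_days = equivalent_hours / 24
equivalent_months = equivalent_days / 30.44  # assumed average month length

print(f"speedup: {speedup:.0f}x")
print(f"average trajectory: {avg_seconds_per_trajectory:.0f} s")
print(f"continuous collection: {equivalent_days:.0f} days (~{equivalent_months:.1f} months)")
```

The "9 months continuous" figure is therefore consistent with 6,500 hours of round-the-clock collection.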

Training Infrastructure

Item | Details
GPUs | 1,000 × H100
Training Steps | 250K steps
Framework | Isaac Lab + Omniverse

Capabilities

Performable Tasks

  • Grasping objects
  • Moving objects with one arm or both arms
  • Transferring objects between arms
  • Multi-step tasks requiring long context
  • Combining general skills

Key Features

Feature | Description
Natural Language Understanding | Understands and executes language instructions
Motion Imitation | Learns by observing human behavior
Generalization | Generalizes to common tasks

Hardware: Jetson Thor

A new computing platform designed to run GR00T on the robot.

Item | Spec
GPU Architecture | NVIDIA Blackwell
AI Performance | 800 TFLOPS (8-bit floating point)
Transformer Engine | Built-in
Design | Modular, optimized for performance/power/size

Ecosystem: Isaac Platform

GR00T is part of NVIDIA’s Isaac robotics platform.

Component | Role
Isaac Sim | Simulation environment
Isaac Lab | Reinforcement learning framework
Omniverse | Synthetic data generation
GR00T | Foundation model

Industry Partners

NVIDIA is collaborating with major humanoid robot companies:

  • 1X Technologies
  • Agility Robotics
  • Apptronik
  • Boston Dynamics
  • Figure AI
  • Fourier Intelligence
  • Sanctuary AI
  • Unitree Robotics
  • XPENG Robotics

Cross-Embodiment

GR00T is designed to support various robot forms:

Supported Robot | Type
Fourier GR-1 | Humanoid
Unitree G1/H1 | Humanoid
Agility Digit | Humanoid
ALOHA | Bimanual manipulator
Franka | Single arm
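Supporting robots with different numbers of joints means one model must handle action vectors of different sizes. The source does not describe how GR00T does this, so the sketch below shows one generic technique only: padding each embodiment's action vector to a shared width and keeping a mask of the valid entries. The embodiment names and dimensions are placeholders invented for this example and are not real joint counts for any robot in the table.

```python
# Placeholder action dimensions per embodiment (hypothetical, for illustration only).
EMBODIMENT_ACTION_DIMS = {"humanoid": 6, "bimanual": 4, "single_arm": 2}
SHARED_DIM = max(EMBODIMENT_ACTION_DIMS.values())


def pad_action(embodiment, action):
    """Pad an embodiment-specific action to the shared width and return a validity mask."""
    dim = EMBODIMENT_ACTION_DIMS[embodiment]
    if len(action) != dim:
        raise ValueError(f"{embodiment} expects {dim} values, got {len(action)}")
    padding = SHARED_DIM - dim
    padded = list(action) + [0.0] * padding
    mask = [1] * dim + [0] * padding  # 1 marks a real action entry, 0 marks padding
    return padded, mask


def unpad_action(embodiment, padded):
    """Drop the padding to recover the embodiment-specific action."""
    return list(padded[:EMBODIMENT_ACTION_DIMS[embodiment]])


padded, mask = pad_action("single_arm", [0.3, -0.1])
recovered = unpad_action("single_arm", padded)
```

A learned alternative is to give each embodiment its own small input and output layers around a shared backbone. The padding approach is shown here because it is the simplest to demonstrate.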

Benchmarks

LIBERO Benchmark (Simulation)

Task | N1 Success Rate
LIBERO-Object | 96.7%
LIBERO-Spatial | 92.5%
LIBERO-Goal | 85.0%
LIBERO-Long | 78.3%

Language Instruction Compliance (Language Following)

Version | Compliance Rate
N1 | 46.6%
N1.5 | 93.3%
N1.6 | 93%+
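The jump from N1 to N1.5 can be stated in percentage points, as a relative gain, or as a ratio, and the three are easy to confuse. The short calculation below uses only the two rates from the table; the variable names are illustrative.

```python
n1_rate = 46.6    # N1 language compliance, percent
n15_rate = 93.3   # N1.5 language compliance, percent

absolute_gain_points = n15_rate - n1_rate                  # about 46.7 percentage points
relative_gain_percent = absolute_gain_points / n1_rate * 100   # about +100.2%
ratio = n15_rate / n1_rate                                 # about 2.002x

print(f"+{absolute_gain_points:.1f} points, "
      f"+{relative_gain_percent:.1f}% relative, {ratio:.3f}x")
```

So "+100%" and "about 2x" describe the same relative change, while the absolute change is about 46.7 percentage points.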

Impact

Significance of GR00T series:

N1 (2025.03)

  • First open humanoid foundation model
  • Demonstrated that large-scale synthetic data generated in simulation can be used effectively for training
  • Separated planning and execution with a dual-system architecture

N1.5 (2025.05)

  • Addressed the loss of language understanding during training by freezing the VLM
  • Balanced action learning and language compliance with the FLARE loss

N1.6 (2025.06)

  • Improved expressiveness through model scale expansion
  • Enhanced generalization with a relative action space
  • Provides a common platform for the entire humanoid industry


See Also

  • Jim Fan - NVIDIA GEAR Lab, GR00T Research Lead