robotics · VLA · VLM

Frontier of General Purpose Robotics

May 3, 2026

original: https://www.adampatni.com/posts/robotics_deep_dive/

Architecture & Systems

Hierarchical generalist policies:
- a "thinker" that handles high-level reasoning for long-horizon tasks
- a fast low-level controller that produces smooth, reactive movement

Hierarchical Generalist Policies

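High-level thinker (System 2: slow, deliberate)
Low-level controller (System 1: fast, reactive)

Based on Daniel Kahneman's research on human cognition, which distinguishes fast, intuitive "System 1" thinking from slow, deliberate "System 2" thinking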
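
To make the thinker/controller split concrete, here is a minimal sketch of a two-rate loop. Everything in it is an illustrative assumption rather than any vendor's architecture: `Subgoal`, `high_level_thinker`, `low_level_controller`, the 1-D "position" state, and the proportional gain are all hypothetical stand-ins. The only point it shows is that the thinker runs once every `replan_every` control steps while the controller runs at every step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subgoal:
    """Hypothetical high-level output: a short instruction plus a 1-D target."""
    instruction: str
    target: float


def high_level_thinker(task: List[Subgoal], step: int, replan_every: int) -> Subgoal:
    """Slow stand-in for the thinker: selects the subgoal for the current interval."""
    index = min(step // replan_every, len(task) - 1)
    return task[index]


def low_level_controller(position: float, subgoal: Subgoal, gain: float = 0.5) -> float:
    """Fast stand-in for the controller: one proportional step toward the target."""
    return position + gain * (subgoal.target - position)


def run_episode(task: List[Subgoal], total_steps: int, replan_every: int) -> List[float]:
    position = 0.0
    trajectory: List[float] = []
    subgoal = task[0]
    for step in range(total_steps):
        # The thinker is consulted only once per replanning interval.
        if step % replan_every == 0:
            subgoal = high_level_thinker(task, step, replan_every)
        # The controller acts at every control step.
        position = low_level_controller(position, subgoal)
        trajectory.append(position)
    return trajectory


task = [Subgoal("reach the cup", 1.0), Subgoal("move to the shelf", 3.0)]
trajectory = run_episode(task, total_steps=20, replan_every=10)
```

In this toy setup the thinker is called twice and the controller twenty times; real systems differ in how the two levels communicate (language, latent vectors, or both).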

Recent advances:

- Physical Intelligence’s HiRobot advanced the approach with hierarchical policies in π0.5
- DeepMind’s Gemini Robotics adapted Gemini 2.0
- Humanoid systems began translating this template into tangible capability through parallel efforts: NVIDIA’s Isaac GR00T and GR00T N1.5, Figure’s Helix (in logistics), and 1X’s Redwood (in domestic trials)

Action Tokenization, Decoders, Controllers

Action tokenizers unlock robotic transformers

For robotic transformers to operate on action data, that data must first be tokenized.
Action sequences from demonstrations are split into action "tokens": compact "codes" that represent motion primitives.
The tokens are transmitted, decoded, and turned into smooth actuator commands by the robot controller.
Tokenization imposes limits on latency, control frequency, and smoothness (how "natural" the movements look and feel).
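
As a rough illustration of the tokenize/decode round trip, the sketch below uses per-dimension uniform binning, which is the simplest scheme and not the motion-primitive codes described above. The bin count, the normalized action range, and the function names are assumptions made for the example. It shows one source of the smoothness limit: decoding can only recover an action to within half a bin width.

```python
from typing import List

NUM_BINS = 256          # assumed vocabulary size per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized action range


def tokenize(action: List[float]) -> List[int]:
    """Map each continuous action dimension to an integer bin index."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)
        fraction = (clipped - LOW) / (HIGH - LOW)
        # The top edge of the range would land in bin NUM_BINS, so clamp it.
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize(tokens: List[int]) -> List[float]:
    """Map each bin index back to the center of its bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (token + 0.5) * width for token in tokens]


action = [0.10, -0.75, 1.0]
tokens = tokenize(action)
recovered = detokenize(tokens)

# Worst-case round-trip error is bounded by half a bin width.
half_bin = (HIGH - LOW) / NUM_BINS / 2
max_error = max(abs(a - r) for a, r in zip(action, recovered))
```

More bins shrink the quantization error but enlarge the vocabulary; learned or compressed tokenizers trade these off differently.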

Latency problem

The current SOTA is not fast enough for fine, reactive manipulation.
GEN-1 is the fastest current SOTA (https://generalistai.com/blog/apr-02-2026-GEN-1). It comes from Generalist, a group of former GDM researchers and engineers.
ACT (Action Chunking with Transformers) showed that predicting several actions at once produces smoother movements and faster control loops.
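
The effect of chunking on the control loop can be sketched as follows. This is a toy under stated assumptions, not ACT itself: `predict_chunk` is a hypothetical stand-in for a learned policy (here a linear ramp toward a 1-D target), `CHUNK_SIZE` is an arbitrary choice, and the temporal ensembling that ACT proposes for blending overlapping chunks is omitted. It shows only that the policy is queried once per chunk instead of once per control step.

```python
from collections import deque
from typing import Deque, List, Tuple

CHUNK_SIZE = 8  # assumed number of actions predicted per inference call


def predict_chunk(observation: float, target: float) -> List[float]:
    """Hypothetical policy stand-in: a linear ramp of CHUNK_SIZE actions."""
    step = (target - observation) / CHUNK_SIZE
    return [observation + step * (i + 1) for i in range(CHUNK_SIZE)]


def control_loop(target: float, total_steps: int) -> Tuple[List[float], int]:
    position = 0.0
    buffer: Deque[float] = deque()
    executed: List[float] = []
    inference_calls = 0
    for _ in range(total_steps):
        # Query the policy only when the buffered chunk is exhausted.
        if not buffer:
            buffer.extend(predict_chunk(position, target))
            inference_calls += 1
        # Execute one buffered action per control step.
        position = buffer.popleft()
        executed.append(position)
    return executed, inference_calls


executed, inference_calls = control_loop(target=1.0, total_steps=32)
```

With these numbers the loop runs 32 control steps on 4 inference calls, so a slow model can still feed a fast controller, at the cost of reacting to new observations only at chunk boundaries.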