Frontier of General Purpose Robotics
original: https://www.adampatni.com/posts/robotics_deep_dive/
Architecture & Systems
- Hierarchical generalist policies split the model into two parts:
- a "thinker" for high-level reasoning on long-horizon tasks
- a fast low-level controller for smooth, reactive movement
Hierarchical Generalist Policies
- High-level thinker
- Low-level controller
- Based on Daniel Kahneman's research on human cognition (slow, deliberate thinking vs. fast, automatic responses)
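The thinker/controller split can be sketched as a two-rate loop: the thinker is called rarely to pick a subgoal, and the controller is called on every step to produce an action. This is only a minimal illustration, not how any of the systems named in these notes are implemented. `HierarchicalPolicy`, `replan_every`, and the toy thinker, controller, and observation functions are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Observation = List[float]
Action = List[float]


@dataclass
class HierarchicalPolicy:
    """Hypothetical two-rate policy: a rarely-called thinker, an always-called controller."""
    thinker: Callable[[str, Observation], str]        # (task, observation) -> subgoal
    controller: Callable[[str, Observation], Action]  # (subgoal, observation) -> action
    replan_every: int = 10  # controller steps per thinker call (assumed ratio)

    def run(self, task: str, observe: Callable[[int], Observation],
            steps: int) -> Tuple[List[Action], int]:
        actions: List[Action] = []
        subgoal = ""
        thinker_calls = 0
        for t in range(steps):
            obs = observe(t)
            # Slow path: re-plan only every `replan_every` steps.
            if t % self.replan_every == 0:
                subgoal = self.thinker(task, obs)
                thinker_calls += 1
            # Fast path: the controller reacts to every observation.
            actions.append(self.controller(subgoal, obs))
        return actions, thinker_calls


def toy_thinker(task: str, obs: Observation) -> str:
    # Stand-in for a large model: choose a subgoal from the current state.
    return "reach" if obs[0] < 0.5 else "grasp"


def toy_controller(subgoal: str, obs: Observation) -> Action:
    # Stand-in for a reactive controller: move toward 0.5 while reaching, else hold.
    if subgoal == "reach":
        return [0.5 - obs[0]]
    return [0.0]


def toy_observe(t: int) -> Observation:
    # Fake sensor reading that ramps from 0.0 up to 1.0.
    return [min(1.0, t / 20.0)]


policy = HierarchicalPolicy(toy_thinker, toy_controller, replan_every=10)
actions, thinker_calls = policy.run("pick up the cup", toy_observe, steps=25)
print(len(actions), thinker_calls)
```

The ratio between the two rates is the main design knob in this sketch: a larger `replan_every` means fewer expensive thinker calls, but the controller spends longer acting on a stale subgoal.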
Recent advances:
- Physical Intelligence’s HiRobot advanced the approach with hierarchical policies in π0.5
- DeepMind’s Gemini Robotics adapted Gemini 2.0
- Humanoid systems began translating this template into tangible capability through parallel efforts from NVIDIA’s Isaac GR00T and GR00T N1.5, Figure’s Helix (in logistics), and 1X’s Redwood (in domestic trials)
Action Tokenization, Decoders, Controllers
Action tokenizers unlock robotic transformers
- For robotic transformers to operate on action data, that data must first be tokenized
- Sequences from demonstrations get split into action "tokens": compact "codes" representing motion primitives
- Tokens are transmitted, decoded, and turned into smooth actuator commands by the robot controller
- Tokenization imposes limits on latency, control frequency, and smoothness (how "natural" the movements look and feel)
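To make the tokenize/decode round trip concrete, here is a minimal sketch using uniform per-dimension binning. This is a deliberately simpler scheme than the learned "motion primitive" codes described above, and it is swapped in only for illustration. The function names, the `[-1, 1]` action range, and the 256-bin count are all assumptions. It shows one way tokenization limits smoothness: the decoded action can differ from the original by up to half a bin width.

```python
from typing import List


def tokenize(action: List[float], low: float = -1.0, high: float = 1.0,
             bins: int = 256) -> List[int]:
    """Map each continuous action dimension to an integer token in [0, bins - 1]."""
    tokens = []
    for x in action:
        x = min(max(x, low), high)                     # clip to the assumed range
        idx = int((x - low) / (high - low) * bins)     # uniform bin index
        tokens.append(min(idx, bins - 1))              # x == high lands in the top bin
    return tokens


def detokenize(tokens: List[int], low: float = -1.0, high: float = 1.0,
               bins: int = 256) -> List[float]:
    """Decode each token to the center of its bin."""
    width = (high - low) / bins
    return [low + (tok + 0.5) * width for tok in tokens]


action = [0.3, -0.72, 0.999]
tokens = tokenize(action)
decoded = detokenize(tokens)
# Worst-case round-trip error is half a bin width: (high - low) / (2 * bins).
max_error = max(abs(a - d) for a, d in zip(action, decoded))
print(tokens, max_error)
```

Raising `bins` shrinks the quantization error but enlarges the vocabulary the transformer has to predict over, which is one face of the precision trade-off noted in the bullets above.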
Latency problem
- Current SOTA is not fast enough for fine reactive manipulation
- GEN 1, from Generalist (a group of former GDM researchers and engineers), is the current fastest SOTA: https://generalistai.com/blog/apr-02-2026-GEN-1
- ACT (Action Chunking with Transformers) showed that predicting several actions at once leads to smoother movements, which in turn speeds up control loops
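The effect of chunking on the control loop can be sketched with a toy executor: the policy is queried only when the queue of predicted actions is empty, so a chunk of k actions cuts the number of policy calls by roughly a factor of k. This is not the ACT implementation (it omits, for example, any blending of overlapping chunks). `run_chunked`, `toy_policy`, and `max_control_hz` are hypothetical names, and the rate formula is an amortized upper bound that assumes model inference is the only bottleneck.

```python
from typing import Callable, List, Tuple


def run_chunked(policy: Callable[[int, int], List[float]], steps: int,
                chunk_size: int) -> Tuple[List[float], int]:
    """Execute `steps` actions, querying the policy only when the queue runs dry."""
    executed: List[float] = []
    queue: List[float] = []
    policy_calls = 0
    for t in range(steps):
        if not queue:
            queue = list(policy(t, chunk_size))  # one inference yields a whole chunk
            policy_calls += 1
        executed.append(queue.pop(0))
    return executed, policy_calls


def toy_policy(t: int, k: int) -> List[float]:
    # Stand-in for a learned model: predicts the next k positions along a ramp.
    return [0.01 * (t + i) for i in range(k)]


def max_control_hz(inference_ms: float, chunk_size: int) -> float:
    """Amortized action rate if each inference call yields `chunk_size` actions."""
    return chunk_size * 1000.0 / inference_ms


single, calls_single = run_chunked(toy_policy, steps=100, chunk_size=1)
chunked, calls_chunked = run_chunked(toy_policy, steps=100, chunk_size=20)
print(calls_single, calls_chunked)
print(max_control_hz(100, 1), max_control_hz(100, 20))
```

The cost in this sketch is reactivity: actions later in a chunk were predicted from an older observation, so a larger `chunk_size` trades responsiveness for fewer inference calls.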