notes
robotics · VLA · VLM

Frontier of General Purpose Robotics

original: https://www.adampatni.com/posts/robotics_deep_dive/


Architecture & Systems

  • Hierarchical generalist policies:
    • split into a thinker for high level reasoning for long horizon tasks
    • fast low-level controller for smooth reactive movement

Hierarchical Generalist Policies

  1. High level thinker
  2. low level controller
  • Based on Daniel Kahnemans research on human cognition

Recent advances:

Physical Intelligence’s HiRobot advanced the approach with hierarchical policies in π0.5; meanwhile, DeepMind’s Gemini Robotics adapted Gemini 2.0, and humanoid systems began translating this template into tangible capability through parallel efforts from NVIDIA’s Isaac GR00T and GR00T N1.5, Figure’s Helix (in logistics), and 1X’s Redwood (in domestic trials).

Action Tokenization, Decoders, Controllers

action tokenizers unlock robotic transformers

  • To enable robotic transformers to operate on action data, action data must be tokenized.
  • sequences from demonstrations get split into acktion "tokens", compact "codes" representing motion primitives
  • tokens are transmitted, decoded and turned into smooth actuator commands by the robot controller.
  • Tokenization imposes limits for latency, control frequency and smoothness (how "natural" the movements look and feel)

Latency problem

Improving tokenizers for fine grained control