Plate I · Prime the World
world
A quiet world is painted before it is computed.
Wash the boundary first. Agents learn the shape of the page before they choose a path.
timestep 000 → wet boundary
Plate II · Release Agents
agent
Three droplets separate along a lesson-sized Z.
Each dot is an agent. Each darkening edge is memory becoming a usable state.
observe
perturb
reward
state wash
When uncertainty blooms, the agent does not need more neon. It needs a softer world model.
Plate III · Bend the Policy
perturb
Pull a marble chip. Watch the policy stretch and recoil.
Constraints are tactile here: a dragged parameter bends the trajectory before it dries.
Plate IV · Read the Marble
reward
The dry trace becomes a formula carved in stone.
Hover the inscription to thicken the corresponding path, then let it settle back into marble.
carved update
πt+1 = dry(πt + λ · reward)
The model is not a dashboard. It is an explanation that can be handled.