General Enquiries
office@sim-ai.org
+1 617 555 0144
Working Papers · Volume XII · 2026
Sim-AI.org is an independent research organization advancing the theoretical foundations and applied practice of simulation-based artificial intelligence. We publish peer-reviewed working papers, technical notes, and reproducible artifacts from our laboratories and visiting fellows.
Showing 12 of 184 publications
We introduce a variational framework that couples differentiable simulators with amortized posterior inference under non-stationary observation models. Our approach derives a tractable lower bound that decomposes the simulator likelihood into a forward-invariant component and a residual correction term learned by a neural normalising flow. We prove identifiability under mild regularity conditions, give explicit rates of contraction, and present a battery of experiments on synthetic mechanical systems, epidemiological compartments, and large-scale climate emulators. The proposed estimator dominates existing simulation-based inference baselines by 14–37% in posterior coverage while remaining computationally competitive with maximum-likelihood approaches at every problem scale we studied.
Reward specification remains the central impediment to deploying capable agents inside open-world simulators. We survey ten years of work spanning preference modelling, demonstration-based proxies, constitutional rule sets, and process supervision. We propose a taxonomy distinguishing intrinsic, extrinsic, and adjudicated reward signals, and we evaluate sixty-three published systems against a unified protocol. Our analysis isolates three persistent failure modes — proxy gaming, distributional drift in preference data, and credit assignment leakage across temporally extended subtasks — and offers concrete recommendations for the next generation of reward specification methods.
Heterogeneous multi-agent simulators support populations whose reward structures, observation models, and time scales differ across subgroups. We study the equilibrium selection problem for such systems and prove a refinement theorem that extends Markov-perfect equilibrium to settings with asynchronous information arrival. The constructed selection rule is computable in time polynomial in the number of agent classes, and we demonstrate empirically that it stabilises learning dynamics in a 4096-agent supply-chain simulator without sacrificing throughput.
Differentiable simulators expose pathwise derivatives of state trajectories with respect to interventions. We define counterfactual sensitivity as the operator norm of these derivatives over a structured intervention class, and we develop a Monte-Carlo estimator with provable variance reduction. Application to a rigid-body manipulation suite reveals that several published learning algorithms exhibit pathological sensitivity in regimes ostensibly outside their training distribution.
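As a rough illustration of the quantity defined above, the following sketch estimates a lower bound on the operator norm of a simulator's derivative by sampling unit intervention directions. Everything in it is an assumption made for illustration: the toy linear simulator, the function names, and the use of finite differences in place of true pathwise derivatives. It is a plain sampling bound, not the variance-reduced estimator from the paper.

```python
import math
import random


def simulate(u, steps=10):
    """Hypothetical toy simulator: x <- A x + u with A = diag(0.5, 0.9), x0 = 0."""
    x = [0.0, 0.0]
    for _ in range(steps):
        x = [0.5 * x[0] + u[0], 0.9 * x[1] + u[1]]
    return x


def directional_derivative(u, v, eps=1e-6):
    """Finite-difference stand-in for the pathwise derivative along direction v."""
    base = simulate(u)
    moved = simulate([u[0] + eps * v[0], u[1] + eps * v[1]])
    return [(m - b) / eps for m, b in zip(moved, base)]


def sensitivity_lower_bound(u, n_samples=200, seed=0):
    """Largest observed ||J v|| over random unit directions v.

    The maximum over sampled directions can only underestimate the
    operator norm, so the result is a lower bound on it.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        norm = math.hypot(g[0], g[1])
        v = [g[0] / norm, g[1] / norm]
        d = directional_derivative(u, v)
        best = max(best, math.hypot(d[0], d[1]))
    return best


estimate = sensitivity_lower_bound([0.0, 0.0])
# For this toy system the Jacobian is diagonal, so the exact operator norm
# is the larger geometric sum: (1 - 0.9**10) / (1 - 0.9).
exact = (1 - 0.9 ** 10) / (1 - 0.9)
```

In this toy case `estimate` should approach `exact` from below as `n_samples` grows. A real differentiable simulator would supply the derivative directly through automatic differentiation rather than finite differences.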
Procedurally generated worlds offer near-unbounded variation, but reward sparsity grows with horizon length. We propose a population-based curriculum that schedules world generators according to an estimate of student-frontier learnability. Our scheduler outperforms prioritised level replay and adversarial curricula on the SIM-Procgen-XL benchmark while requiring 38% fewer environment interactions.
SIM-Atlas is a curated trajectory corpus comprising 1.97 petabytes of state, observation, action, and reward tuples sampled from 142 deterministic and stochastic simulators across robotics, climate, biology, and economics. We document collection protocols, dataset statistics, and license terms, and we provide reference dataloaders that achieve 92% of theoretical peak I/O throughput on commodity NVMe storage.
Inverse simulation seeks parameters of a forward simulator that reproduce observed trajectories. We obtain matching upper and lower bounds on its sample complexity in terms of the simulator's effective Lipschitz dimension and the spectral gap of its transition kernel. The bounds tighten previous results by a factor of the trajectory horizon and resolve an open question on the sufficiency of stationary observations.
We argue that process supervision applied at the level of simulator-embedded deliberation is strictly more sample-efficient than outcome supervision in long-horizon planning tasks. We instantiate this claim in a tutoring simulator and a software-engineering benchmark, and we provide an analytic decomposition that explains the observed gap.
We present a runtime that delivers bit-deterministic replay of stochastic simulators executing across heterogeneous CPU/GPU/TPU clusters. The runtime reconstructs all sources of non-determinism via a hierarchical RNG manifest and recovers reproducibility with less than 4% performance overhead.
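One way to picture a hierarchical RNG manifest is sketched below. It is only a minimal illustration under stated assumptions: the class `RngManifest`, the helper `derive_seed`, and the path labels such as `"node0"` and `"gpu0"` are hypothetical and do not describe the runtime's actual design. Each random stream is seeded from a root seed plus its position in a hierarchy, so the values a worker draws do not depend on the order in which workers run.

```python
import hashlib
import random


def derive_seed(root_seed, path):
    """Derive a child seed from the root seed and a hierarchical path of labels."""
    key = f"{root_seed}/" + "/".join(path)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class RngManifest:
    """Hypothetical manifest: records every stream handed out, keyed by its path."""

    def __init__(self, root_seed):
        self.root_seed = root_seed
        self.entries = {}  # path tuple -> derived seed

    def stream(self, *path):
        seed = derive_seed(self.root_seed, path)
        self.entries[path] = seed
        return random.Random(seed)


def run(manifest, order):
    """Simulate workers drawing random numbers in the given scheduling order."""
    out = {}
    for worker in order:
        rng = manifest.stream("node0", worker)
        out[worker] = [rng.random() for _ in range(3)]
    return out


# Two runs with the same root seed but different scheduling orders.
first = run(RngManifest(42), ["gpu0", "gpu1", "cpu0"])
replay = run(RngManifest(42), ["cpu0", "gpu1", "gpu0"])
```

Here `first` and `replay` hold the same draws per worker even though the workers were visited in a different order. Bit-exact replay on real accelerators would also have to control sources this sketch ignores, such as floating-point reduction order.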
When pairs of simulated and observed trajectories agree on a sufficient set of summary statistics, the latent causal structure of the underlying generative process becomes identifiable up to a small equivalence class. We characterise this class, give a constructive recovery procedure, and validate it on synthetic twin populations of climate and electricity-market simulators.
When agents in a simulated continuous-double-auction market are forced to share a finite communication channel, they spontaneously develop a discrete protocol whose information-theoretic structure mirrors that of natural pidgins. We document the emergence under varying bandwidths and population sizes and discuss implications for the study of artificial language origins.
A programmatic essay on the unification of simulation, learning, and inference. We articulate three theses — universality of differentiable simulators, equivalence of model-based and model-free agents in the asymptotic regime, and the centrality of compositional priors — and chart a research agenda spanning theory, systems, and application domains.
Active research programmes
Foundational results on identifiability, sample complexity, equilibrium selection, and information geometry of simulator-augmented inference. We treat the simulator as a first-class mathematical object and develop the corresponding analytic vocabulary.
Heterogeneous populations, asynchronous information arrival, emergent communication, and equilibrium refinement in open-world environments at scales up to one million interacting agents.
Curricula, exploration, and credit assignment in long-horizon tasks instantiated inside procedurally generated worlds. Our experimental programme couples theoretical bounds with empirical benchmarking.
Identifiability of latent causal structure from synthetic-twin trajectories, counterfactual sensitivity in differentiable simulators, and the use of simulators as instruments for causal estimation.
Reward specification, process supervision, and adjudicated signals for capable simulator-embedded agents. We treat alignment as inseparable from the design of the underlying simulation.
People
Mariana Ferreira
Director, Theory Group
Foundations of simulation intelligence
Director since 2018
Adaeze Okoye
Senior Investigator
Inverse simulation, identifiability
Joined 2019
Henrik Lindqvist
Senior Investigator
Differentiable simulators, counterfactuals
Joined 2020
Renji Tanaka
Investigator
Multi-agent equilibrium selection
Joined 2021
Dimitri Petrov
Investigator
Reinforcement learning, systems
Joined 2022
Sophie Bauer
Investigator
Alignment, infrastructure
Joined 2023
Contact
office@sim-ai.org
+1 617 555 0144
press@sim-ai.org
For interview requests and embargoed releases.
fellowship@sim-ai.org
Three-, six-, and twelve-month residencies. Applications reviewed quarterly.
Sim-AI Research Organization
140 Brattle Street, Suite 4N
Cambridge, MA 02138, USA