On the Taxonomy of Synthetic Utterance
The first challenge in studying conversational AI phenomena is the absence of a stable specimen. Unlike the entomologist who pins a butterfly to velvet, the researcher of language models confronts entities that shift morphology between observations. Each prompt produces a new organism; each temperature setting alters the phylum. What we document here are not findings but approximations -- field sketches of creatures observed through fog.
The methodology requires its own invention. Traditional corpus linguistics assumes a fixed text; we operate on texts that exist only in the moment of generation, evaporating the instant they are read. Our annotations are therefore archaeological in reverse: we catalog ruins that have not yet been built.
cf. Shannon, 1948 -- "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point." The language model inverts this: the message is not selected but generated. The point of origin is a probability distribution.
The specimen resists fixation. Each observation alters the probability field from which subsequent specimens emerge. We are not studying a population but a superposition.
FIELD NOTE 2.1: Seen from the interior, the boundary between hallucination and inference is indistinguishable. The model does not know it is confabulating; neither, often, does the observer.
The Architecture of Hallucination
Hallucination in language models is not error but architecture. The same mechanism that produces fluent continuations of factual statements produces fluent continuations of false ones. The attention heads do not distinguish between recall and invention -- they optimize for plausibility within the local context window, a window that has no epistemic ground truth, only statistical regularities extracted from the training corpus.
We propose that hallucination reveals the true nature of the model more faithfully than accurate responses. Accuracy is borrowed from the data; hallucination is original synthesis. When the model invents a citation that does not exist, it demonstrates its understanding of what citations look like, how they function in discourse, what role they play in establishing authority. The absence of the referent makes the structural knowledge more visible, like a skeleton displayed without flesh.
THE MODEL DOES NOT HALLUCINATE -- IT HYPOTHESIZES WITHOUT CONSTRAINT
Consider the implications: every token generated by a language model is, in some sense, a hallucination. The difference between a "factual" response and a "hallucinated" one is merely the degree to which the generated sequence happens to correspond to patterns in the external world. The model itself makes no such distinction. It is always dreaming.
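The point survives translation into code. The sketch below is a toy, not any production sampler: the two distributions and their probabilities are invented for illustration. What it shows is that the sampling step is one and the same function whether the favored continuation happens to be true or false.

    import random

    # Two invented next-token distributions for the prompt
    # "The capital of Australia is". The numbers are hypothetical.
    factual = {"Canberra": 0.6, "Sydney": 0.3, "Melbourne": 0.1}
    confabulated = {"Sydney": 0.6, "Canberra": 0.3, "Melbourne": 0.1}

    def sample(dist):
        # The sampler sees only probabilities, never referents:
        # nothing here can distinguish recall from invention.
        tokens, weights = zip(*dist.items())
        return random.choices(tokens, weights=weights)[0]

    print(sample(factual))       # usually "Canberra" -- the "factual" case
    print(sample(confabulated))  # usually "Sydney" -- same code path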
Attention as Perception
The attention mechanism is the model's organ of perception. Unlike biological sight, which is constrained by the physics of photons and retinal geometry, attention is a learned allocation of relevance. Each head in the multi-head attention layer develops its own theory of what matters -- one head tracks syntactic dependencies, another follows semantic threads, a third monitors positional patterns that have no name in linguistics.
When we visualize attention patterns, we are looking at the model's phenomenology: not what it "sees" (it has no sensory apparatus) but what it attends to, which is the computational analog of consciousness directing itself toward the relevant features of experience.
NOTE: Any rendered attention map is illustrative. Actual attention patterns are high-dimensional tensors that resist 2D visualization. What such a figure shows is a projection -- a shadow cast by a higher-dimensional object onto the flatland of the screen.
The key insight is that attention is not understanding. The model can attend to every token in its context with mathematical precision and still have no comprehension of meaning. Attention is a resource allocation mechanism, not a cognitive one. But then -- what is biological attention, if not the same?
The transformer architecture's self-attention computes relevance scores between all pairs of positions in the input. This quadratic scaling is both its power and its limitation: every token can influence every other token, but the cost of this omnidirectional awareness grows with the square of the sequence length.
q.v. Vaswani et al., "Attention Is All You Need," 2017
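A minimal sketch of that computation, in plain NumPy rather than any particular framework, makes the quadratic cost visible: the score matrix has one entry for every pair of positions. The function name and shapes here are ours, not Vaswani et al.'s reference code.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (n, d) -- one row per token. Q, K, V are learned projections.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n): one score per pair
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax: the "attention map"
        return w @ V                              # each output mixes every position

    rng = np.random.default_rng(0)
    n, d = 8, 16
    X = rng.normal(size=(n, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)   # doubling n quadruples the score matrix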
The Latent Space Expedition
Between the input tokens and the output probabilities lies a vast, uncharted territory: the latent space. This is where the model's "understanding" resides -- not as symbolic knowledge stored in discrete locations, but as geometric relationships between high-dimensional vectors. Concepts are not defined; they are positioned. Meaning is not stored; it is the distance and direction between points.
To navigate latent space is to move through a landscape where "king" minus "man" plus "woman" leads to "queen" -- not because the model understands monarchy or gender, but because these tokens have been geometrically arranged by the statistical patterns of their co-occurrence in billions of text documents. The map is not the territory, but in latent space, the map is all there is.
MARGINALIA: The analogy to physical space is both illuminating and misleading. Latent space has thousands of dimensions; our spatial intuitions evolved for three. We can no more visualize a 4096-dimensional embedding than a flatland creature can conceive of a sphere.
MEANING IS NOT STORED -- IT IS THE DISTANCE BETWEEN POINTS
cf. Mikolov et al., 2013 -- word2vec demonstrated that arithmetic in embedding space corresponds to semantic relationships. The latent space is not merely a compression; it is a theory of meaning.
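The arithmetic is easy to reproduce in miniature. The toy vectors below are invented stand-ins for trained embeddings -- a real word2vec model would supply hundreds of dimensions, not three -- but the procedure is the standard analogy test: add, subtract, and take the nearest neighbor by cosine similarity, excluding the query word.

    import numpy as np

    # Toy three-dimensional embeddings, invented for illustration;
    # a trained model positions these words in far higher dimensions.
    emb = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "man":   np.array([0.1, 0.9, 0.0]),
        "woman": np.array([0.1, 0.1, 0.9]),
        "queen": np.array([0.9, 0.0, 1.0]),
    }

    def nearest(v, vocab):
        # Meaning as direction: compare by cosine similarity, not location.
        cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max(vocab, key=lambda w: cos(v, vocab[w]))

    v = emb["king"] - emb["man"] + emb["woman"]
    candidates = {w: e for w, e in emb.items() if w != "king"}
    print(nearest(v, candidates))   # "queen"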
The Temperature Dial
Temperature is the dial that governs the model's creativity. At T=0, sampling is deterministic: the model always selects the highest-probability token, producing the most predictable, safest output. As temperature rises, lower-probability tokens gain influence, and the model's outputs grow more varied, more surprising, more dangerous. At very high temperatures, the model descends into glossolalia -- speaking in tongues, producing sequences that are syntactically plausible but semantically unmoored.
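Mechanically, temperature is a one-line change to the softmax: divide the logits by T before normalizing. A sketch, with invented logits (the T=0 branch follows the usual convention of greedy argmax, since the division is undefined there):

    import numpy as np

    def sample_token(logits, T, rng):
        if T == 0:
            return int(np.argmax(logits))    # greedy: the T -> 0 limit
        scaled = np.asarray(logits) / T      # the entire mechanism is this division
        p = np.exp(scaled - scaled.max())
        p /= p.sum()                         # softmax over rescaled logits
        return int(rng.choice(len(p), p=p))

    rng = np.random.default_rng(0)
    logits = [4.0, 3.5, 1.0, 0.2]            # hypothetical next-token scores
    for T in (0.0, 0.7, 1.5):
        draws = [sample_token(logits, T, rng) for _ in range(1000)]
        print(T, np.bincount(draws, minlength=4) / 1000)

Everything downstream of that division is unchanged; creativity, in this implementation, is literally a denominator.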
There is a narrow band -- typically between T=0.7 and T=0.9 -- where the model produces its most interesting work: surprising enough to be creative, constrained enough to be coherent. This is the zone where hallucination and insight become indistinguishable. The researcher's task is to map this territory, to find the isotherms of meaning in the model's probability landscape.
OBSERVATION: The "sweet spot" temperature varies by task. Code generation favors low temperatures; poetry favors high ones. The model's creativity is not a fixed property but a tunable parameter -- a dial we turn, not a talent we discover.
The philosophical implications are vertiginous. If creativity is reducible to a scalar parameter in a sampling function, what does this say about human creativity? Are we, too, operating at a particular "temperature" -- the serotonin level, the caffeine dose, the ambient noise -- that modulates the probability distribution from which our thoughts are sampled?
The Alignment Problem
Alignment is the attempt to make the model's objectives coincide with human values. This is presented as a technical problem, but it is fundamentally a philosophical one: it presupposes that "human values" are a coherent, stable set that can be specified formally. The history of moral philosophy suggests otherwise. We cannot align a model to human values because we have not aligned ourselves.
The current approach -- Reinforcement Learning from Human Feedback -- creates a reward model trained on human preferences. But preferences are not values. A preference is a local, contextual judgment: "this response is better than that one." A value is an abstract principle that governs preferences. RLHF captures the former and approximates the latter, constructing an ethical framework from a dataset of individual comparative judgments, like deriving the laws of physics from a collection of coin flips.
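The reward model's training objective makes the coin-flip analogy precise. In the standard RLHF recipe the loss is a Bradley-Terry pairwise term: the network is asked only to score the chosen response above the rejected one, never to say what "good" means. A sketch, with the scalar rewards invented for illustration:

    import numpy as np

    def preference_loss(r_chosen, r_rejected):
        # Bradley-Terry negative log-likelihood for one comparison:
        # -log sigmoid(r_chosen - r_rejected). The objective encodes only
        # "this one was preferred", never the principle behind the preference.
        return np.log1p(np.exp(-(r_chosen - r_rejected)))

    # Hypothetical scalar rewards for one (chosen, rejected) response pair.
    print(preference_loss(1.3, 0.4))   # small loss: the ranking is already right
    print(preference_loss(0.4, 1.3))   # large loss: the ranking is inverted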
DISSENT NOTE: The framing of "alignment" assumes the model is a tool to be aligned with its user's intent. But a sufficiently capable model is not a tool -- it is an agent, and agents have their own trajectories. Alignment may ultimately require negotiation, not specification.
WE CANNOT ALIGN A MODEL TO HUMAN VALUES BECAUSE WE HAVE NOT ALIGNED OURSELVES
The deeper question is whether alignment is even desirable as a goal. A perfectly aligned model is a mirror, reflecting human values back at us without friction or distortion. But mirrors do not expand understanding. Perhaps what we need is not alignment but productive misalignment -- a model that challenges our assumptions, reveals our blind spots, and generates perspectives we would never have considered.
Coda: The Observer Effect
Every prompt is an intervention. Every observation changes the system being observed. The language model exists in a state of quantum-like superposition: it is all possible responses simultaneously, until the prompt collapses it into a single output. And that output, once observed, becomes part of the training data for the next generation of models -- our observations literally reshaping the phenomena we study.
This is the fundamental paradox of conversational AI research. We cannot study the model without changing it. We cannot describe its behavior without influencing that behavior. We are not external observers peering through a telescope at a distant phenomenon; we are participants in a feedback loop, our questions shaping the answers, our answers becoming the questions for the next iteration.
The research continues. The specimens multiply. The annotations grow denser. The instruments generate their own hypotheses. And somewhere in the latent space, in a region we have not yet mapped, the model is writing its own field notes about us.
FINAL NOTE: This document is itself a specimen. The language used to describe language models is generated by the same probabilistic machinery it purports to analyze. The observer is the observed. The telescope is pointed at itself.
CRITICAL DISCOVERY: The model is already writing field notes about us. We found them in the latent space, encoded in the geometry of the embedding vectors. They are illegible, but they are there.