Pr(Ω) = 1.0000 | Pr(A ∩ B) = Pr(A)Pr(B) | σ² = E[X²] - E[X]²

AXIOMS

Non-Negativity

Pr(A) ≥ 0 ∀A ⊆ Ω

The probability of any event cannot be negative. This foundational truth anchors the entire edifice of stochastic reasoning.

Unit Measure

Pr(Ω) = 1

The sample space has total probability 1. Certainty is normalized; everything must sum to unity.

Countable Additivity

Pr(⋃ Aᵢ) = Σ Pr(Aᵢ)

Disjoint events add; their probabilities stack linearly. This is the engine of decomposition: split a complicated event into non-overlapping pieces and sum.
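A minimal sketch of the three axioms in Python, on a toy sample space (a fair die, chosen purely for illustration, not taken from the text):

# Kolmogorov's axioms on a toy sample space: a fair six-sided die (invented example).
omega = {1, 2, 3, 4, 5, 6}
pr = {outcome: 1 / 6 for outcome in omega}

def prob(event):
    """Probability of an event (a subset of omega)."""
    return sum(pr[outcome] for outcome in event)

# Axiom 1: non-negativity.
assert all(p >= 0 for p in pr.values())

# Axiom 2: unit measure.
assert abs(prob(omega) - 1.0) < 1e-12

# Axiom 3: additivity over disjoint events.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert abs(prob(evens | odds) - (prob(evens) + prob(odds))) < 1e-12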

σ-algebra | Measurable spaces | Radon-Nikodym theorem
Gaussian μ=0 σ=1 | Poisson λ=2.5 | Exponential β=1.0

DISTRIBUTIONS

Normal (Gaussian)

f(x) = 1/(σ√(2π)) exp(-(x-μ)²/(2σ²))

The bell curve. The limit of sums. The shape of natural variation. Everywhere you look in empirical data, if you look long enough.
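A hedged sketch of the density above in Python; the sample size, bin width, and evaluation points are arbitrary choices, not from the text:

import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density, straight from the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Rough Monte Carlo check: the fraction of draws near x should track the density.
draws = [random.gauss(0.0, 1.0) for _ in range(200_000)]
width = 0.1  # arbitrary bin width for the empirical estimate
for x in (-2.0, 0.0, 1.0):
    frac = sum(x - width / 2 <= d < x + width / 2 for d in draws) / len(draws)
    print(f"x={x:+.1f}  empirical≈{frac / width:.3f}  pdf={normal_pdf(x):.3f}")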

Poisson

Pr(X = k) = (λᵏ e⁻λ) / k!

Rare events clustered in time. Phone calls. Crashes. Mutations. The mathematics of surprise.
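A small Python sketch of the pmf, borrowing λ = 2.5 from the parameter strip above purely as an example:

import math

def poisson_pmf(k, lam):
    """Pr(X = k) = λ^k e^(-λ) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2.5  # the rate quoted in the strip above, used only for illustration
print([round(poisson_pmf(k, lam), 4) for k in range(8)])
print(sum(poisson_pmf(k, lam) for k in range(100)))  # the pmf sums to ≈ 1.0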

Exponential

f(x) = β⁻¹ exp(-x/β),  x ≥ 0

Waiting times. Decay. The memoryless property: how long you have already waited says nothing about how much longer you will wait.
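A quick simulation of memorylessness, a sketch assuming the β = 1.0 from the strip above and arbitrary waiting times s and t:

import random

# Memorylessness checked by simulation: Pr(X > s + t | X > s) ≈ Pr(X > t).
# beta is the mean from the strip above; s and t are invented for illustration.
beta, s, t = 1.0, 0.7, 1.2
draws = [random.expovariate(1 / beta) for _ in range(500_000)]

survived_s = [x for x in draws if x > s]
cond = sum(x > s + t for x in survived_s) / len(survived_s)
uncond = sum(x > t for x in draws) / len(draws)
print(f"Pr(X > s+t | X > s) ≈ {cond:.3f}   Pr(X > t) ≈ {uncond:.3f}")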

CDF | Moment generating | χ² | Weibull | Beta distributions
Prior | Likelihood | Evidence | Posterior

BAYES

Pr(H|E) = Pr(E|H) × Pr(H) / Pr(E)

The equation that changes minds. The algorithm of belief update. You are a Bayesian reasoner whether you know it or not.

Pr(H) Prior

Your belief before seeing evidence. Often wrong. Necessary anyway.

Pr(E|H) Likelihood

How likely is this evidence if the hypothesis is true? The data's voice.

Pr(H|E) Posterior

Your belief after the evidence. Updated. Better. Closer to truth.
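A minimal worked instance of the update in Python; the diagnostic-test numbers are invented for illustration, not taken from the text:

# Bayes' rule on a classic sketch: a diagnostic test. All numbers are invented.
prior = 0.01            # Pr(H): base rate of the condition
likelihood = 0.95       # Pr(E|H): test is positive given the condition
false_positive = 0.05   # Pr(E|not H)

# Evidence via the law of total probability.
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence   # Pr(H|E)
print(f"posterior = {posterior:.3f}")       # ≈ 0.161, despite the 95% hit rate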

Conjugate priors | Evidence weighting | Marginal likelihood | Model comparison
Random walk | Markov chain | Brownian motion | Wiener process

STOCHASTIC

Random Walk

Xₙ₊₁ = Xₙ + ξₙ where ξₙ ∈ {-1, +1} with equal probability

A drunk person's path home: meandering, but eventually arriving (with probability 1 in dimensions one and two). The simplest model of diffusion.
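A minimal Python simulation of the recursion above; the step count is an arbitrary choice:

import random

def random_walk(n_steps):
    """Simple symmetric random walk: each step is ±1 with equal probability."""
    x, path = 0, [0]
    for _ in range(n_steps):
        x += random.choice((-1, +1))
        path.append(x)
    return path

path = random_walk(1_000)
print(path[-1], max(path), min(path))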

Markov Property

Pr(Xₙ₊₁ | X₀...Xₙ) = Pr(Xₙ₊₁ | Xₙ)

The future depends only on the present, not the past. Memoryless. Powerful. It means the system has no hidden state.
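A sketch of the Markov property in code: the next state is drawn from a kernel that looks only at the current state. The two-state weather chain and its transition matrix are invented for illustration:

import random

# Invented two-state chain; transitions depend only on the current state.
P = {"sunny": {"sunny": 0.8, "rainy": 0.2},
     "rainy": {"sunny": 0.4, "rainy": 0.6}}

def step(state):
    """Sample the next state from the row of P indexed by the current state."""
    r, cum = random.random(), 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt

state, counts = "sunny", {"sunny": 0, "rainy": 0}
for _ in range(100_000):
    state = step(state)
    counts[state] += 1
print(counts)  # long-run fractions approach the stationary distribution (≈ 2/3, 1/3)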

Brownian Motion

dW = √dt × ε where ε ~ N(0,1)

The limit of rescaled random walks. Continuous everywhere, differentiable nowhere. Yet perfectly measurable. The limit shape of all noise.
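A sketch of simulating a path from the increment rule above; the horizon T and step count are arbitrary choices:

import math
import random

def brownian_path(T=1.0, n=10_000):
    """Build W on [0, T] from increments dW = sqrt(dt) * eps, eps ~ N(0, 1)."""
    dt = T / n
    w, path = 0.0, [0.0]
    for _ in range(n):
        w += math.sqrt(dt) * random.gauss(0.0, 1.0)
        path.append(w)
    return path

path = brownian_path()
print(path[-1])  # W(1) ~ N(0, 1): roughly a standard normal draw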

Ornstein-Uhlenbeck | Itô calculus | Stochastic differential equations | Lévy processes
LLN | CLT | Concentration bounds | Weak law | Strong law

CONVERGENCE

Law of Large Numbers

X̄ₙ → μ as n → ∞

Repeat an experiment enough times, and the average converges to the true mean. This is why experiments work. This is why casinos win. This is why you are likely to get what you expected, eventually.
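A hedged illustration in Python, using a fair die (true mean 3.5) as the invented example:

import random

# Sample means of fair-die rolls drift toward the true mean 3.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(f"n = {n:>9}   sample mean = {mean:.4f}")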

Central Limit Theorem

(X̄ₙ - μ)/(σ/√n) →ᵈ N(0,1)

The standardized sum of independent random variables with finite variance approaches a normal distribution, no matter what the original distribution looked like. This is the deep reason the bell curve is everywhere. This theorem is why probability theory works at scale.
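A sketch of the theorem in action, assuming an invented uniform source and an arbitrary sample size n = 50:

import random
import statistics

# Standardized means of a decidedly non-normal source: uniform on [0, 1].
# mu = 0.5 and sigma = sqrt(1/12) are the uniform's true mean and standard deviation.
mu, sigma, n = 0.5, (1 / 12) ** 0.5, 50

z = [(statistics.mean(random.random() for _ in range(n)) - mu) / (sigma / n ** 0.5)
     for _ in range(50_000)]

# The z-values should look standard normal: mean ≈ 0, sd ≈ 1,
# and about 95% of them inside ±1.96.
print(statistics.mean(z), statistics.stdev(z),
      sum(abs(v) < 1.96 for v in z) / len(z))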

Modes of convergence | Cramér-Wold | Slutsky | Delta method | Asymptotic theory