parallel.quest
A structured reference for parallel computation patterns, concurrency models, and distributed systems design.
Getting Started
Everything you need to begin exploring parallel computation patterns and models.
Prerequisites
Before diving into parallel computation patterns, ensure you have a foundational understanding of basic programming constructs and of operating system concepts such as processes and threads, plus familiarity with at least one systems-level or high-level language (e.g., C, Go, Rust, Python, or Java).
- Understanding of basic data structures and algorithms
- Familiarity with process and thread models
- Experience with at least one concurrent runtime
Quick Start Guide
The fastest way to get started is to choose a concurrency model that matches your problem domain. For CPU-bound tasks, look at thread-based parallelism. For I/O-bound workloads, consider async/await or event-loop patterns. For distributed workloads, explore message-passing architectures.
// Example: Basic parallel map
parallel_map(data, |item| {
    transform(item)
});
Project Structure
This reference is organized into conceptual sections. Each section covers a distinct domain within parallel computing, from foundational concurrency primitives to high-level distributed system patterns.
Core Concepts
Fundamental primitives and mental models for understanding parallel systems.
Threads and Processes
A thread is the smallest unit of execution scheduled by the operating system. A process is an independent execution environment containing one or more threads. Understanding the distinction and its implications for shared memory, context switching, and scheduling is essential for reasoning about parallel behavior.
- Thread -- lightweight, shares memory space with other threads in the same process
- Process -- isolated memory, higher overhead for creation and communication
- Green threads -- user-space threads managed by the runtime, not the OS kernel
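To make the distinction concrete, here is a minimal sketch in Python (one runtime among many; the behavior generalizes): threads share their process's memory directly, while a child process works on its own copy.
import threading
import multiprocessing

counter = {"value": 0}

def bump():
    counter["value"] += 1              # mutates this process's memory

if __name__ == "__main__":
    # Threads share the process's address space: the mutation is visible here.
    t = threading.Thread(target=bump)
    t.start(); t.join()
    print(counter["value"])            # 1

    # A child process works on its own copy: the parent's dict is unchanged.
    p = multiprocessing.Process(target=bump)
    p.start(); p.join()
    print(counter["value"])            # still 1 in the parent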
Synchronization Primitives
Synchronization ensures correctness when multiple threads access shared resources. The primary primitives include mutexes, semaphores, condition variables, and atomic operations. Each has distinct performance characteristics and use cases.
- Mutex -- mutual exclusion lock, ensures only one thread accesses a critical section
- Semaphore -- generalized mutex allowing N concurrent accesses
- Condition variable -- allows threads to wait for a specific condition
- Atomic operations -- lock-free primitives for simple shared state
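As one concrete mapping of these primitives, Python's threading module covers the first three (atomics are typically provided by lower-level runtimes). A minimal sketch:
import threading

lock = threading.Lock()                 # mutex
slots = threading.Semaphore(4)          # at most 4 concurrent accesses
ready = threading.Condition()           # wait/notify on a shared condition

items = []

def producer():
    with ready:
        items.append("work")
        ready.notify()                  # wake one waiting consumer

def consumer():
    with ready:
        while not items:                # re-check: wakeups can be spurious
            ready.wait()
        return items.pop()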
Concurrency vs. Parallelism
Concurrency is the composition of independently executing processes. Parallelism is the simultaneous execution of computations. A concurrent program may run on a single core, interleaving tasks. A parallel program requires multiple cores executing tasks at the same time. The two are complementary but often conflated.
// Concurrent: tasks interleave on one core
schedule(task_a, task_b); // time-sliced
// Parallel: tasks run simultaneously
execute_on(core_1, task_a);
execute_on(core_2, task_b);
Deadlocks and Race Conditions
A deadlock occurs when two or more threads are waiting for each other to release resources, creating a cycle of dependency. A race condition occurs when the outcome depends on the non-deterministic ordering of operations. Both are fundamental hazards in concurrent programming and must be addressed through careful design, lock ordering, and testing.
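The classic lock-ordering deadlock fits in a few lines. This Python sketch is deliberately broken to show the hazard:
import threading

a = threading.Lock()
b = threading.Lock()

def worker_1():
    with a:                 # holds a, then wants b
        with b:
            pass

def worker_2():
    with b:                 # holds b, then wants a: the opposite order
        with a:
            pass

# Running worker_1 and worker_2 concurrently can deadlock: each thread
# holds one lock and waits forever for the other. Acquiring a before b
# in *both* workers (consistent lock ordering) removes the cycle.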
Guides
Practical walkthroughs for implementing common parallel patterns.
Building a Thread Pool
A thread pool maintains a set of reusable worker threads that process tasks from a shared queue. This pattern avoids the overhead of creating and destroying threads for each task. The pool size is typically matched to the number of available CPU cores for compute-bound work.
pool = ThreadPool(workers=cpu_count())
results = pool.map(process_item, dataset)
pool.shutdown()
Key considerations include queue sizing, backpressure handling, and graceful shutdown. For I/O-bound tasks, consider a larger pool size or an async alternative.
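The snippet above uses this reference's ThreadPool API. To see what such a pool does internally, here is a minimal sketch in Python built from a blocking queue and a shutdown sentinel (MiniPool is illustrative, not part of the API above):
import threading
import queue
import os

class MiniPool:
    _STOP = object()                            # shutdown sentinel

    def __init__(self, workers=None):
        self.tasks = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers or os.cpu_count() or 1)
        ]
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is self._STOP:
                break
            fn, args = task
            fn(*args)                           # errors are swallowed here; a real
                                                # pool would capture them in a Future

    def submit(self, fn, *args):
        self.tasks.put((fn, args))

    def shutdown(self):
        for _ in self.threads:
            self.tasks.put(self._STOP)          # one sentinel per worker
        for t in self.threads:
            t.join()
In practice, Python's concurrent.futures.ThreadPoolExecutor implements this pattern with futures and error propagation built in.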
Message Passing with Channels
Channels provide a typed conduit for sending values between concurrent processes. Unlike shared-memory approaches, message passing avoids data races on the transferred data by moving ownership rather than sharing it. This model is central to languages like Go and Rust.
let (sender, receiver) = channel();
// Producer
spawn(move || {
    sender.send(compute_result());
});
// Consumer
let result = receiver.recv();
Async/Await Patterns
The async/await model allows writing asynchronous code that reads like synchronous code. An async function returns a future (or promise) that represents a value not yet computed. The runtime schedules these futures cooperatively, switching between tasks at await points.
async function fetchAll(urls) {
  // Start every request, then wait for all responses together.
  const responses = await Promise.all(
    urls.map(url => fetch(url))
  );
  // r.json() is itself async, so the parses must be awaited too.
  return Promise.all(responses.map(r => r.json()));
}
Best suited for I/O-bound workloads where tasks spend most of their time waiting on external resources.
Map-Reduce Implementation
Map-Reduce is a programming model for processing large data sets in parallel. The map phase applies a function to each element independently. The reduce phase aggregates the results. This pattern scales naturally across multiple machines and is the foundation of frameworks like Hadoop and Spark.
// Map phase: transform each item
mapped = parallel_map(data, transform_fn)
// Reduce phase: aggregate results
result = reduce(mapped, accumulate_fn, initial)
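A concrete rendering of the pseudocode, using Python's multiprocessing.Pool for the parallel map and functools.reduce for the aggregation (single-machine only; distributed frameworks split both phases across nodes):
from multiprocessing import Pool
from functools import reduce

def transform(n):
    return n * n                         # map: independent per-element work

def accumulate(total, value):
    return total + value                 # reduce: fold results together

if __name__ == "__main__":
    data = range(1, 1_000_001)
    with Pool() as pool:
        mapped = pool.map(transform, data, chunksize=10_000)
    result = reduce(accumulate, mapped, 0)
    print(result)                        # sum of squares of 1..1,000,000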
API Reference
Complete reference for parallel computation primitives and utilities.
ThreadPool
Creates a pool of worker threads for executing tasks concurrently.
ThreadPool(workers: int, queue_size: int = 0)
Methods:
.submit(fn, *args) -> Future
.map(fn, iterable) -> List[Result]
.shutdown(wait: bool = True)
.active_count() -> int
Parameters:
workers -- number of worker threads to spawn
queue_size -- maximum pending tasks (0 = unbounded)
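A usage sketch against this API. The Future type's accessor is not documented above, so result() below is an assumption:
pool = ThreadPool(workers=4, queue_size=256)

future = pool.submit(process_item, item)   # returns immediately with a Future
value = future.result()                    # hypothetical blocking accessor

batch = pool.map(process_item, dataset)    # blocks until all items finish
pool.shutdown(wait=True)                   # drain the queue, then join workers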
Channel
A typed, thread-safe communication channel between producers and consumers.
channel(buffer: int = 0) -> (Sender, Receiver)
Sender:
.send(value: T) -> Result
.try_send(value: T) -> Result
Receiver:
.recv() -> T
.try_recv() -> Option[T]
.iter() -> Iterator[T]
Parameters:
buffer -- channel buffer size (0 = synchronous/rendezvous channel)
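A usage sketch against this API. The spawn() helper, the source() producer, and the convention that iter() ends when all senders are dropped are assumptions, not documented behavior:
sender, receiver = channel(buffer=16)      # small buffer smooths bursts

def produce():
    for item in source():                  # source() is a placeholder
        sender.send(item)                  # blocks when the buffer is full

spawn(produce)                             # assumed task-spawning helper

for item in receiver.iter():               # assumed to end when senders close
    handle(item)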
Mutex and RwLock
Synchronization primitives for protecting shared state.
Mutex(value: T)
.lock() -> Guard[T]
.try_lock() -> Option[Guard[T]]
RwLock(value: T)
.read() -> ReadGuard[T]
.write() -> WriteGuard[T]
Usage note: Prefer RwLock when reads significantly outnumber writes. Use Mutex when writes are comparable in frequency to reads or when the critical section is very short.
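Guard semantics are not specified above, so here is the same idea with Python's threading.Lock as a stand-in (the standard library has no reader-writer lock):
import threading

lock = threading.Lock()
balance = {"amount": 0}

def deposit(n):
    with lock:                      # critical section: one thread at a time
        balance["amount"] += n      # read-modify-write is now atomic

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(balance["amount"])            # always 100 with the lock in place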
Atomic Types
Lock-free atomic operations for simple shared counters and flags.
AtomicInt(initial: int = 0)
.load(ordering) -> int
.store(value, ordering)
.fetch_add(value, ordering) -> int
.compare_exchange(current, new, ordering) -> Result
Ordering options: Relaxed, Acquire, Release, AcqRel, SeqCst. Use the weakest ordering that guarantees correctness.
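A sketch of a compare-and-swap retry loop against the AtomicInt API above; treating the returned Result as truthy on success is an assumption:
counter = AtomicInt(0)
counter.fetch_add(1, "Relaxed")             # plain atomic increment

# CAS retry loop: double the value only if nobody changed it
# between our load and our exchange.
while True:
    current = counter.load("Acquire")
    if counter.compare_exchange(current, current * 2, "AcqRel"):
        break                               # success semantics assumed truthy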
Configuration
Runtime settings, tuning parameters, and environment configuration.
Runtime Settings
Configure the parallel runtime behavior through environment variables or a configuration file.
# Environment variables
PARALLEL_WORKERS=8 # Worker thread count
PARALLEL_QUEUE_SIZE=1024 # Task queue capacity
PARALLEL_STACK_SIZE=2M # Per-thread stack size
PARALLEL_LOG_LEVEL=info # Logging verbosity
Tuning for Performance
Performance tuning depends on your workload profile. CPU-bound tasks benefit from matching worker count to physical cores. I/O-bound tasks can use 2-4x the core count. Memory-constrained environments may require reducing stack size or queue capacity.
- CPU-bound: workers = physical cores (not hyperthreads)
- I/O-bound: workers = 2-4x core count
- Mixed: separate pools for CPU and I/O tasks
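A small Python sketch for deriving these counts. Note that os.cpu_count() reports logical cores; detecting physical cores typically needs a third-party helper such as psutil:
import os

logical = os.cpu_count() or 1        # logical cores, hyperthreads included

io_workers = 4 * logical             # I/O-bound: oversubscribe, tasks mostly wait
cpu_workers = logical                # CPU-bound fallback; for true physical-core
                                     # counts, psutil.cpu_count(logical=False)
                                     # is a commonly used third-party option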
Scheduler Options
The runtime supports multiple scheduling strategies, selectable at initialization.
Scheduler options:
work-stealing -- default, best for unbalanced loads
round-robin -- simple, predictable distribution
priority -- tasks with higher priority run first
affinity -- pin tasks to specific cores
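The reference does not specify the initialization entry point, so the Runtime constructor below is purely hypothetical; only the scheduler names match the options above:
# Hypothetical entry point; scheduler names match the options listed above.
runtime = Runtime(scheduler="work-stealing", workers=8)
runtime.spawn(task)              # work-stealing rebalances uneven task costs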
FAQ
Frequently asked questions about parallel patterns and this reference.
When should I use threads vs. async?
Use threads for CPU-bound computation where you need true parallelism across cores. Use async/await for I/O-bound workloads where tasks spend most time waiting on external resources (network, disk, database). For mixed workloads, consider running async I/O within a thread pool.
How do I avoid deadlocks?
Follow a consistent lock ordering across all threads. Use timeouts on lock acquisition. Prefer message passing over shared memory when possible. Use lock-free data structures for simple shared state. Design systems to minimize the number of locks held simultaneously.
What is the GIL and does it affect me?
The Global Interpreter Lock (GIL) is a mutex in CPython that prevents multiple native threads from executing Python bytecodes at once. It effectively limits CPU-bound parallelism in Python to one core. Workarounds include using multiprocessing, C extensions that release the GIL, or alternative Python implementations (PyPy, Jython).
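For example, the multiprocessing route sidesteps the GIL by giving each worker its own interpreter:
from multiprocessing import Pool

def crunch(n):
    # CPU-bound: threads would serialize on the GIL, processes do not
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:            # one process (and one GIL) per worker
        print(pool.map(crunch, [10**6] * 8))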
How do I choose a channel buffer size?
A buffer size of 0 (synchronous channel) couples sender and receiver timing, which is useful for coordination. Small buffers (1-16) smooth out brief timing differences. Larger buffers (100+) help when producers are bursty. Unbounded channels risk memory exhaustion. Start with a small buffer and profile under real load.
Is this reference language-specific?
No. The concepts, patterns, and mental models presented here are language-agnostic. Code examples use pseudocode or simplified syntax from various languages (Rust, Go, Python, JavaScript) to illustrate concepts. The patterns apply across languages and runtimes.