parallel.quest

A structured reference for parallel computation patterns, concurrency models, and distributed systems design.


Getting Started

Everything you need to begin exploring parallel computation patterns and models.

Prerequisites

Before diving into parallel computation patterns, ensure you have a foundational understanding of basic programming constructs, operating system concepts such as processes and threads, and familiarity with at least one systems-level or high-level language (e.g., C, Go, Rust, Python, or Java).

  • Understanding of basic data structures and algorithms
  • Familiarity with process and thread models
  • Experience with at least one concurrent runtime
Quick Start Guide

The fastest way to get started is to choose a concurrency model that matches your problem domain. For CPU-bound tasks, look at thread-based parallelism. For I/O-bound workloads, consider async/await or event-loop patterns. For distributed workloads, explore message-passing architectures.

// Example: Basic parallel map
parallel_map(data, |item| { transform(item) });
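As a concrete illustration, here is a minimal Python sketch of the parallel map idea, built on the standard library's ThreadPoolExecutor; the names parallel_map and transform mirror the pseudocode above and are not part of any specific runtime.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(item):
    # Placeholder transformation: square each input value.
    return item * item

def parallel_map(data, fn, max_workers=4):
    # Executor.map fans work out to the pool and returns
    # results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, data))

results = parallel_map([1, 2, 3, 4], transform)
# results == [1, 4, 9, 16]
```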
Project Structure

This reference is organized into conceptual sections. Each section covers a distinct domain within parallel computing: from foundational concurrency primitives to high-level distributed system patterns. Navigate using the sidebar or expand sections inline to discover relevant content.


Core Concepts

Fundamental primitives and mental models for understanding parallel systems.

Threads and Processes

Threads are the smallest unit of execution scheduled by the operating system. A process is an independent execution environment containing one or more threads. Understanding the distinction and the implications for shared memory, context switching, and scheduling is essential for reasoning about parallel behavior.

  • Thread -- lightweight, shares memory space with other threads in the same process
  • Process -- isolated memory, higher overhead for creation and communication
  • Green threads -- user-space threads managed by the runtime, not the OS kernel
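To make the shared-memory distinction concrete, the following Python sketch shows a thread mutating the parent's list while a forked child process mutates only its own copy. The explicit "fork" start method is a POSIX-only assumption; on Windows this would need a spawn-safe `if __name__ == "__main__"` guard.

```python
import threading
import multiprocessing

def append_one(bucket):
    bucket.append(1)

# Threads share the parent's memory, so the mutation is visible here.
shared = []
t = threading.Thread(target=append_one, args=(shared,))
t.start()
t.join()
# shared == [1]

# A forked process works on a copy of the parent's address space,
# so the parent's list is untouched. ("fork" is POSIX-only.)
ctx = multiprocessing.get_context("fork")
not_shared = []
p = ctx.Process(target=append_one, args=(not_shared,))
p.start()
p.join()
# not_shared == []
```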
Synchronization Primitives

Synchronization ensures correctness when multiple threads access shared resources. The primary primitives include mutexes, semaphores, condition variables, and atomic operations. Each has distinct performance characteristics and use cases.

  • Mutex -- mutual exclusion lock, ensures only one thread accesses a critical section
  • Semaphore -- counting primitive that permits up to N threads to proceed concurrently
  • Condition variable -- allows threads to wait for a specific condition
  • Atomic operations -- lock-free primitives for simple shared state
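The mutex case can be sketched in a few lines of Python, using threading.Lock to guard a shared counter; without the lock, the read-modify-write on `counter` could lose updates under contention.

```python
import threading

counter = 0
lock = threading.Lock()  # the mutex guarding the shared counter

def increment(n):
    global counter
    for _ in range(n):
        with lock:       # critical section: read-modify-write under the lock
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 40_000; without the lock, updates could be lost
```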
Concurrency vs. Parallelism

Concurrency is the composition of independently executing processes. Parallelism is the simultaneous execution of computations. A concurrent program may run on a single core, interleaving tasks. A parallel program requires multiple cores executing tasks at the same time. The two concepts are complementary but often conflated.

// Concurrent: tasks interleave on one core
schedule(task_a, task_b); // time-sliced

// Parallel: tasks run simultaneously
execute_on(core_1, task_a);
execute_on(core_2, task_b);
Deadlocks and Race Conditions

A deadlock occurs when two or more threads are waiting for each other to release resources, creating a cycle of dependency. A race condition occurs when the outcome depends on the non-deterministic ordering of operations. Both are fundamental hazards in concurrent programming and must be addressed through careful design, lock ordering, and testing.
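One standard defence against deadlock, consistent lock ordering, can be sketched in Python as follows: both tasks acquire the two locks in the same global order, so no cyclic wait can form even though the second task conceptually wants the locks the other way around.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
done = []  # list.append is atomic in CPython, safe enough for this demo

def transfer_ab():
    # Acquire in the global order: a, then b.
    with lock_a:
        with lock_b:
            done.append("ab")

def transfer_ba():
    # This task "wants" b first, but still follows the same
    # global order, so the two threads can never wait on each other
    # in a cycle.
    with lock_a:
        with lock_b:
            done.append("ba")

t1 = threading.Thread(target=transfer_ab)
t2 = threading.Thread(target=transfer_ba)
t1.start(); t2.start()
t1.join(); t2.join()
# both tasks complete; sorted(done) == ["ab", "ba"]
```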


Guides

Practical walkthroughs for implementing common parallel patterns.

Building a Thread Pool

A thread pool maintains a set of reusable worker threads that process tasks from a shared queue. This pattern avoids the overhead of creating and destroying threads for each task. The pool size is typically matched to the number of available CPU cores for compute-bound work.

pool = ThreadPool(workers=cpu_count())
results = pool.map(process_item, dataset)
pool.shutdown()

Key considerations include queue sizing, backpressure handling, and graceful shutdown. For I/O-bound tasks, consider a larger pool size or an async alternative.
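The queue-plus-workers structure described above can be sketched in Python; MiniPool is a hypothetical name for illustration, and a production pool would also need error propagation, bounded queues, and backpressure.

```python
import queue
import threading

class MiniPool:
    """A minimal thread pool: N workers draining a shared task queue."""

    def __init__(self, workers):
        self.tasks = queue.Queue()
        self.threads = [
            threading.Thread(target=self._worker) for _ in range(workers)
        ]
        for t in self.threads:
            t.start()

    def _worker(self):
        while True:
            item = self.tasks.get()
            if item is None:          # shutdown sentinel
                return
            fn, args = item
            try:
                fn(*args)
            finally:
                self.tasks.task_done()

    def submit(self, fn, *args):
        self.tasks.put((fn, args))

    def shutdown(self):
        self.tasks.join()             # wait for all submitted work
        for _ in self.threads:
            self.tasks.put(None)      # one sentinel per worker
        for t in self.threads:
            t.join()

# Usage: square numbers on 4 reusable workers.
results = []
results_lock = threading.Lock()

def work(i):
    with results_lock:
        results.append(i * i)

pool = MiniPool(workers=4)
for i in range(8):
    pool.submit(work, i)
pool.shutdown()
# sorted(results) == [0, 1, 4, 9, 16, 25, 36, 49]
```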

Message Passing with Channels

Channels provide a typed conduit for sending values between concurrent processes. Unlike shared memory, message passing eliminates data races by transferring ownership of data. This model is central to languages like Go and Rust.

let (sender, receiver) = channel();

// Producer
spawn(move || { sender.send(compute_result()); });

// Consumer
let result = receiver.recv();
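Python has no built-in channel type, but queue.Queue can stand in for one; in this sketch a None sentinel marks end-of-stream, which is a convention of the example rather than of the library, and the small buffer provides natural backpressure.

```python
import queue
import threading

chan = queue.Queue(maxsize=1)  # small buffer -> producer blocks when full

def producer():
    for n in range(3):
        chan.put(n * 10)       # blocks while the buffer is full
    chan.put(None)             # end-of-stream marker

threading.Thread(target=producer).start()

received = []
while (msg := chan.get()) is not None:
    received.append(msg)
# received == [0, 10, 20]
```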
Async/Await Patterns

The async/await model allows writing asynchronous code that reads like synchronous code. An async function returns a future (or promise) that represents a value not yet computed. The runtime schedules these futures cooperatively, switching between tasks at await points.

async function fetchAll(urls) {
  const responses = await Promise.all(urls.map(url => fetch(url)));
  // response.json() is itself asynchronous, so await the parsed bodies too
  return Promise.all(responses.map(r => r.json()));
}

Best suited for I/O-bound workloads where tasks spend most of their time waiting on external resources.
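A runnable Python analogue of the same pattern, using asyncio; asyncio.sleep stands in for real network I/O, and the await point is where the event loop switches between tasks.

```python
import asyncio

async def fetch(url):
    # Stand-in for real I/O: sleeping yields control to the event loop.
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def fetch_all(urls):
    # gather runs all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fetch(u) for u in urls))

bodies = asyncio.run(fetch_all(["a", "b", "c"]))
# bodies == ["body of a", "body of b", "body of c"]
```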

Map-Reduce Implementation

Map-Reduce is a programming model for processing large data sets in parallel. The map phase applies a function to each element independently. The reduce phase aggregates the results. This pattern scales naturally across multiple machines and is the foundation of frameworks like Hadoop and Spark.

// Map phase: transform each item
mapped = parallel_map(data, transform_fn)

// Reduce phase: aggregate results
result = reduce(mapped, accumulate_fn, initial)
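The two phases can be sketched on one machine with the standard library: multiprocessing.Pool for the parallel map and functools.reduce for the aggregation. The explicit "fork" start method keeps the example self-contained on POSIX; real deployments would distribute the map phase across machines with a framework.

```python
from functools import reduce
import multiprocessing

def transform(x):
    return x * x          # map: applied to each element independently

def accumulate(acc, x):
    return acc + x        # reduce: fold the mapped values together

# "fork" is POSIX-only; on Windows, use a spawn-safe __main__ guard instead.
ctx = multiprocessing.get_context("fork")
with ctx.Pool(processes=4) as pool:
    mapped = pool.map(transform, range(10))

total = reduce(accumulate, mapped, 0)
# total == 285 (the sum of squares 0..9)
```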

API Reference

Complete reference for parallel computation primitives and utilities.

ThreadPool

Creates a pool of worker threads for executing tasks concurrently.

ThreadPool(workers: int, queue_size: int = 0)

Methods:
  .submit(fn, *args) -> Future
  .map(fn, iterable) -> List[Result]
  .shutdown(wait: bool = True)
  .active_count() -> int

Parameters:

  • workers -- number of worker threads to spawn
  • queue_size -- maximum pending tasks (0 = unbounded)
Channel

A typed, thread-safe communication channel between producers and consumers.

channel(buffer: int = 0) -> (Sender, Receiver)

Sender:
  .send(value: T) -> Result
  .try_send(value: T) -> Result

Receiver:
  .recv() -> T
  .try_recv() -> Option[T]
  .iter() -> Iterator[T]

Parameters:

  • buffer -- channel buffer size (0 = synchronous/rendezvous channel)
Mutex and RwLock

Synchronization primitives for protecting shared state.

Mutex(value: T)
  .lock() -> Guard[T]
  .try_lock() -> Option[Guard[T]]

RwLock(value: T)
  .read() -> ReadGuard[T]
  .write() -> WriteGuard[T]

Usage note: Always prefer RwLock when reads significantly outnumber writes. Use Mutex when write frequency is comparable to read frequency or when the critical section is very short.
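Python's standard library has no reader-writer lock, so the sketch below builds a minimal one from a condition variable; RWLock is a hypothetical class for illustration and deliberately omits writer priority and fairness, which a production implementation would need.

```python
import threading

class RWLock:
    """Minimal reader-writer lock: many readers, or one writer."""

    def __init__(self):
        self._cond = threading.Cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:          # readers wait out active writers
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:       # last reader wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()        # writers need exclusive access
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

# Usage: multiple readers may hold the lock together; a writer
# then acquires it exclusively once the readers drain.
rw = RWLock()
rw.acquire_read()
rw.acquire_read()
rw.release_read()
rw.release_read()
rw.acquire_write()
rw.release_write()
```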

Atomic Types

Lock-free atomic operations for simple shared counters and flags.

AtomicInt(initial: int = 0)
  .load(ordering) -> int
  .store(value, ordering)
  .fetch_add(value, ordering) -> int
  .compare_exchange(current, new, ordering) -> Result

Ordering options: Relaxed, Acquire, Release, AcqRel, SeqCst. Use the weakest ordering that guarantees correctness.


Configuration

Runtime settings, tuning parameters, and environment configuration.

Runtime Settings

Configure the parallel runtime behavior through environment variables or a configuration file.

# Environment variables
PARALLEL_WORKERS=8        # Worker thread count
PARALLEL_QUEUE_SIZE=1024  # Task queue capacity
PARALLEL_STACK_SIZE=2M    # Per-thread stack size
PARALLEL_LOG_LEVEL=info   # Logging verbosity
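A sketch of how a runtime might read these variables in Python, using the documented values as fallbacks; load_config is a hypothetical helper, not an API of any particular library.

```python
import os

def load_config():
    # Defaults match the documented example values above.
    return {
        "workers": int(os.environ.get("PARALLEL_WORKERS", "8")),
        "queue_size": int(os.environ.get("PARALLEL_QUEUE_SIZE", "1024")),
        "stack_size": os.environ.get("PARALLEL_STACK_SIZE", "2M"),
        "log_level": os.environ.get("PARALLEL_LOG_LEVEL", "info"),
    }

config = load_config()
```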
Tuning for Performance

Performance tuning depends on your workload profile. CPU-bound tasks benefit from matching worker count to physical cores. I/O-bound tasks can use 2-4x the core count. Memory-constrained environments may require reducing stack size or queue capacity.

  • CPU-bound: workers = physical cores (not hyperthreads)
  • I/O-bound: workers = 2-4x core count
  • Mixed: separate pools for CPU and I/O tasks
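The sizing rules above can be expressed as a small helper; pool_size and io_multiplier are illustrative names. Note that os.cpu_count() reports logical cores (including hyperthreads), so for CPU-bound work it is an upper bound rather than the physical core count the guideline asks for.

```python
import os

def pool_size(workload, io_multiplier=3):
    # os.cpu_count() counts logical cores; on SMT machines it
    # overcounts physical cores, so treat the CPU-bound value
    # as a ceiling.
    cores = os.cpu_count() or 1
    if workload == "cpu":
        return cores                   # CPU-bound: one worker per core
    if workload == "io":
        return cores * io_multiplier   # I/O-bound: 2-4x core count
    raise ValueError("mixed workloads: use separate CPU and I/O pools")
```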
Scheduler Options

The runtime supports multiple scheduling strategies, selectable at initialization.

Scheduler options:
  work-stealing -- default, best for unbalanced loads
  round-robin   -- simple, predictable distribution
  priority      -- tasks with higher priority run first
  affinity      -- pin tasks to specific cores

FAQ

Frequently asked questions about parallel patterns and this reference.

When should I use threads vs. async?

Use threads for CPU-bound computation where you need true parallelism across cores. Use async/await for I/O-bound workloads where tasks spend most time waiting on external resources (network, disk, database). For mixed workloads, consider running async I/O within a thread pool.

How do I avoid deadlocks?

Follow a consistent lock ordering across all threads. Use timeouts on lock acquisition. Prefer message passing over shared memory when possible. Use lock-free data structures for simple shared state. Design systems to minimize the number of locks held simultaneously.

What is the GIL and does it affect me?

The Global Interpreter Lock (GIL) is a mutex in CPython that prevents multiple native threads from executing Python bytecodes at once. It effectively limits CPU-bound parallelism in Python to one core. Workarounds include using multiprocessing, C extensions that release the GIL, or alternative Python implementations (PyPy, Jython).

How do I choose a channel buffer size?

A buffer size of 0 (synchronous channel) couples sender and receiver timing, which is useful for coordination. Small buffers (1-16) smooth out brief timing differences. Larger buffers (100+) help when producers are bursty. Unbounded channels risk memory exhaustion. Start with a small buffer and profile under real load.

Is this reference language-specific?

No. The concepts, patterns, and mental models presented here are language-agnostic. Code examples use pseudocode or simplified syntax from various languages (Rust, Go, Python, JavaScript) to illustrate concepts. The patterns apply across languages and runtimes.