Fixed-size aligned pool allocator for high-performance simulations. More...

#include <cstddef>
#include <cstdlib>
#include <new>
#include <mutex>
#include <cassert>

Classes
struct	PoolAllocator
	Fast aligned bump allocator for multithreaded simulations. More...

Detailed Description

Fixed-size aligned pool allocator for high-performance simulations.

This header defines PoolAllocator, a fast linear allocator (aka bump allocator) for use in performance-critical systems such as simulation engines, numerical benchmarks, and cache-sensitive applications.

Overview

Rather than relying on general-purpose allocators (new, malloc, std::allocator), which incur heap metadata overhead, internal fragmentation, and non-deterministic latency, this allocator uses a fixed-size, prealigned buffer and allocates memory by incrementing a single offset.

There is no deallocation — memory is reclaimed in bulk using reset().

Why Use a Bump Allocator?

Speed: Allocation is a single pointer addition — faster than malloc, tcmalloc, or jemalloc.
Determinism: No locks, no heap state → consistent latency even under heavy use.
Cache-Friendliness: Allocations are contiguous, aligned, and predictable.
SIMD-Compatible: Default alignment of 64 bytes supports AVX2, AVX-512, and L1 cache line size.
Thread-Safe by Design: When used with thread_local, there is no need for synchronization.
Zero Fragmentation: Linear growth ensures optimal packing and no reuse holes.

How It Works — Internals

Each call to allocate<T>() performs:

uintptr_t raw = reinterpret_cast<uintptr_t>(base + offset);
uintptr_t aligned = (raw + (align - 1)) & ~(align - 1);
offset = aligned + sizeof(T) - base;
return reinterpret_cast<T*>(aligned);

The backing buffer is allocated once via std::aligned_alloc(64, size) at construction. All alignment is handled manually at runtime — there is no dependency on STL allocators.

Alignment Model

Default: 64 bytes (aligned with most modern cache lines and SIMD registers)
You may pass allocate<T>(align) to override alignment (e.g., for 32-byte loads)
Aligned memory guarantees safe usage in:
- _mm256_load_pd (AVX2)
- _mm512_load_pd (AVX-512)
- vld1q_f64 (NEON)

Usage Example

PoolAllocator pool(64 * 1024);          // Preallocate 64KB
double* x = pool.allocate<double>();    // 64-byte aligned
MyStruct* p = pool.allocate<MyStruct>(32);  // Custom alignment
pool.reset();                           // Reuse entire buffer in next frame

Real-World Applications

Simulation & Benchmarking

Run per-thread simulations (e.g., Monte Carlo) without heap overhead
Avoid heap contention in multithreaded performance tests

High-Frequency Trading (HFT)

Allocate temporary order objects or pricing state
Reset after each tick/frame with no GC stalls
Guarantees low-latency, zero-fragmentation behavior in hot loops

Game Engines (ECS, Physics, AI)

Per-frame entity updates and physics scratch space
Used widely in Unity’s C# Allocator.TempJob and custom C++ engines

Embedded Systems / DSP / Real-Time

Fixed memory budgets + deterministic behavior required
Avoids malloc/free variability on constrained targets

Comparisons

Allocator	Alloc Speed	Dealloc Speed	Fragmentation	Thread Safety	Notes
`malloc/free`	Medium	Medium	High	✖ (unless locked)	General-purpose heap
`PoolAllocator`	O(1)	reset()	None	✅ (via thread_local)	Requires manual control
STL allocator	Varies	Safe	Medium	Thread-safe (some)	Safer, but slower

Limitations

No individual deallocation: Must call reset() to reuse
Fixed capacity: Will return nullptr on overflow; user must size correctly
Not suitable for long-lived or variably-sized lifetimes
No bounds checking — this is a low-level tool for trusted code paths

Requirements

C++17 or higher
Platform support for std::aligned_alloc()
Users must manage buffer reuse manually

Designed and tuned by Leon for use in SIMD-intensive Monte Carlo benchmarking, this allocator emphasizes low-latency allocation, predictable memory behavior, and tight coupling to modern cache and instruction pipelines.

Definition in file pool.hpp.

Classes