Monte Carlo Benchmarking Engine
High-performance SIMD Monte Carlo engine (AVX2/NEON) with custom memory allocators and perf logging.
 
Loading...
Searching...
No Matches
pool.hpp File Reference

Fixed-size aligned pool allocator for high-performance simulations. More...

#include <cstddef>
#include <cstdlib>
#include <new>
#include <mutex>
#include <cassert>

Go to the source code of this file.

Classes

struct  PoolAllocator
 Fast aligned bump allocator for multithreaded simulations. More...
 

Detailed Description

Fixed-size aligned pool allocator for high-performance simulations.

This header defines PoolAllocator, a fast linear allocator (aka bump allocator) for use in performance-critical systems such as simulation engines, numerical benchmarks, and cache-sensitive applications.


Overview

Rather than relying on general-purpose allocators (new, malloc, std::allocator), which incur heap metadata overhead, internal fragmentation, and non-deterministic latency, this allocator uses a fixed-size, prealigned buffer and allocates memory by incrementing a single offset.

There is no deallocation — memory is reclaimed in bulk using reset().


Why Use a Bump Allocator?

  • Speed: Allocation is a single pointer addition — faster than malloc, tcmalloc, or jemalloc.
  • Determinism: No locks, no heap state → consistent latency even under heavy use.
  • Cache-Friendliness: Allocations are contiguous, aligned, and predictable.
  • SIMD-Compatible: Default alignment of 64 bytes supports AVX2, AVX-512, and L1 cache line size.
  • Thread-Safe by Design: When used with thread_local, there is no need for synchronization.
  • Zero Fragmentation: Linear growth ensures optimal packing and no reuse holes.

How It Works — Internals

Each call to allocate<T>() performs:

uintptr_t raw = reinterpret_cast<uintptr_t>(base + offset);
uintptr_t aligned = (raw + (align - 1)) & ~(align - 1);
offset = aligned + sizeof(T) - base;
return reinterpret_cast<T*>(aligned);

The backing buffer is allocated once via std::aligned_alloc(64, size) at construction. All alignment is handled manually at runtime — there is no dependency on STL allocators.


Alignment Model

  • Default: 64 bytes (aligned with most modern cache lines and SIMD registers)
  • You may pass allocate<T>(align) to override alignment (e.g., for 32-byte loads)
  • Aligned memory guarantees safe usage in:
    • _mm256_load_pd (AVX2)
    • _mm512_load_pd (AVX-512)
    • vld1q_f64 (NEON)

Usage Example

PoolAllocator pool(64 * 1024); // Preallocate 64KB
double* x = pool.allocate<double>(); // 64-byte aligned
MyStruct* p = pool.allocate<MyStruct>(32); // Custom alignment
pool.reset(); // Reuse entire buffer in next frame
Fast aligned bump allocator for multithreaded simulations.
Definition pool.hpp:137

Real-World Applications

Simulation & Benchmarking

  • Run per-thread simulations (e.g., Monte Carlo) without heap overhead
  • Avoid heap contention in multithreaded performance tests

High-Frequency Trading (HFT)

  • Allocate temporary order objects or pricing state
  • Reset after each tick/frame with no GC stalls
  • Guarantees low-latency, zero-fragmentation behavior in hot loops

Game Engines (ECS, Physics, AI)

  • Per-frame entity updates and physics scratch space
  • Used widely in Unity’s C# Allocator.TempJob and custom C++ engines

Embedded Systems / DSP / Real-Time

  • Fixed memory budgets + deterministic behavior required
  • Avoids malloc/free variability on constrained targets

Comparisons

Allocator Alloc Speed Dealloc Speed Fragmentation Thread Safety Notes
malloc/free Medium Medium High ✖ (unless locked) General-purpose heap
PoolAllocator O(1) reset() None ✅ (via thread_local) Requires manual control
STL allocator Varies Safe Medium Thread-safe (some) Safer, but slower

Limitations

  • No individual deallocation: Must call reset() to reuse
  • Fixed capacity: Will return nullptr on overflow; user must size correctly
  • Not suitable for long-lived or variably-sized lifetimes
  • No bounds checking — this is a low-level tool for trusted code paths

Requirements

  • C++17 or higher
  • Platform support for std::aligned_alloc()
  • Users must manage buffer reuse manually

Designed and tuned by Leon for use in SIMD-intensive Monte Carlo benchmarking, this allocator emphasizes low-latency allocation, predictable memory behavior, and tight coupling to modern cache and instruction pipelines.

Definition in file pool.hpp.