Monte Carlo Benchmarking Engine
High-performance SIMD Monte Carlo engine (AVX2/NEON) with custom memory allocators and perf logging.
 
combine_batch_parquets.py File Reference

Combines multiple per-method parquet log files into a single file.


Namespaces

namespace  pipeline
 
namespace  pipeline.combine_batch_parquets
 

Variables

 pipeline.combine_batch_parquets.batch_dir = Path(sys.argv[1])
 
 pipeline.combine_batch_parquets.output_path = Path(sys.argv[2])
 
 pipeline.combine_batch_parquets.global_db_path = Path(DB_PATH)
 
list pipeline.combine_batch_parquets.files = [f for f in batch_dir.glob("perf_results_*.parquet") if f.name != output_path.name]
 
 pipeline.combine_batch_parquets.merged = pl.concat([pl.read_parquet(f) for f in files], how="vertical_relaxed").sort("Timestamp")
 
 pipeline.combine_batch_parquets.compression
 
 pipeline.combine_batch_parquets.db = pl.read_parquet(global_db_path)
 

Detailed Description

Combines multiple per-method parquet log files into a single file.

Description:
Scans the given batch directory for all perf_results_*.parquet files, excluding the output file itself. Merges them with a relaxed vertical concatenation (pl.concat with how="vertical_relaxed"), sorts the rows by "Timestamp", and saves the result as a single zstd-compressed parquet file.

After merging, the result is also appended to a global historical file specified by DB_PATH, which is configured via .env and loaded in scripts/config.py.

Compression:
  • All output parquet files are compressed using Zstandard (zstd).
Usage:
$ python3 combine_batch_parquets.py <batch_dir> <output_file>
Arguments:
<batch_dir>    Folder containing individual .parquet logs
<output_file>  Path to final combined .parquet file
Output:
  • A merged parquet file containing all batch results
  • Updated global Parquet DB with new rows appended
Notes:
  • Global Parquet path is loaded from .env via scripts/config.py
  • This script uses polars for fast DataFrame operations and I/O
  • Intended to be run after run_perf.sh completes all method benchmarks
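
The contents of scripts/config.py are not shown on this page. As a rough illustration of how a DB_PATH value could be resolved from a .env file, here is a standard-library-only sketch; the function names load_env_file and resolve_db_path, the KEY=VALUE parsing rules, and the precedence of real environment variables over the file are all assumptions, and the actual module may use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(env_path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_db_path(env_path: Path) -> Path:
    """Return DB_PATH, preferring a real environment variable over the file."""
    file_values = load_env_file(env_path) if env_path.exists() else {}
    db_path = os.environ.get("DB_PATH") or file_values.get("DB_PATH")
    if not db_path:
        raise KeyError("DB_PATH is not set in the environment or the .env file")
    return Path(db_path)
```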

Definition in file combine_batch_parquets.py.