Combines multiple per-method parquet log files into a single file.
Namespaces

namespace  pipeline
namespace  pipeline.combine_batch_parquets
Variables

pipeline.combine_batch_parquets.batch_dir = Path(sys.argv[1])
pipeline.combine_batch_parquets.output_path = Path(sys.argv[2])
pipeline.combine_batch_parquets.global_db_path = Path(DB_PATH)
list pipeline.combine_batch_parquets.files = [f for f in batch_dir.glob("perf_results_*.parquet") if f.name != output_path.name]
pipeline.combine_batch_parquets.merged = pl.concat([pl.read_parquet(f) for f in files], how="vertical_relaxed").sort("Timestamp")
pipeline.combine_batch_parquets.compression
pipeline.combine_batch_parquets.db = pl.read_parquet(global_db_path)
Detailed Description

Combines multiple per-method parquet log files into a single file.

Collects perf_results_*.parquet files, excluding the output file itself. Merges them using vertical concat, sorts by "Timestamp", and saves the result as a single compressed parquet. After merging, the result is also appended to a global historical file specified by DB_PATH, which is configured via .env and loaded in scripts/config.py.

Arguments:
    <batch_dir>    Directory containing .parquet logs
    <output_file>  Path to final combined .parquet file

Environment: DB_PATH, loaded from .env via scripts/config.py.
Dependencies: polars for fast DataFrame operations and I/O.
Run after: run_perf.sh completes all method benchmarks.

Definition in file combine_batch_parquets.py.