Monte Carlo Benchmarking Engine
High-performance SIMD Monte Carlo engine (AVX2/NEON) with custom memory allocators and perf logging.
 
schema_to_clickhouse.py File Reference

Converts a Polars schema to ClickHouse-compatible SQL.


Namespaces

namespace  pipeline
 
namespace  pipeline.schema_to_clickhouse
 

Functions

 pipeline.schema_to_clickhouse.polars_to_clickhouse_dtype (dtype, nullable)
 Converts a Polars data type to a valid ClickHouse column type.
 
 pipeline.schema_to_clickhouse.generate_clickhouse_table (table_name="benchmark.performance")
 Generates a CREATE TABLE SQL statement for ClickHouse.
 

Detailed Description

Converts a shared Python/Polars schema definition into a ClickHouse-compatible CREATE TABLE statement, so that the same schema drives both data preprocessing with Polars and persistent storage in ClickHouse.

The schema is defined as a dictionary mapping field names to (dtype, nullable) pairs. Supported input dtypes include both string representations (e.g., "Utf8") and Polars type objects or classes (e.g., pl.Int64).

Example Output

    CREATE TABLE IF NOT EXISTS benchmark.performance (
        `Method` String,
        `Cycles` Int64,
        `IPC` Float64,
        ...
    ) ENGINE = MergeTree()
    ORDER BY (Method, Timestamp);

Features
  • Handles both string-based and Polars-native dtype declarations
  • Wraps a column type in Nullable(...) when the field is declared nullable
  • Raises errors for unsupported or unrecognized dtypes
  • Keeps output consistent with the expected schema in pipeline.schema
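
The dtype handling listed above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the function name to_clickhouse_type, the _CLICKHOUSE_TYPES table, and the choice of ValueError are assumptions, and the real polars_to_clickhouse_dtype may support a different set of types. The sketch does not import Polars; it resolves a dtype class or instance by its class name, which is assumed to match the string form (for example, pl.Int64 and "Int64").

```python
# Hypothetical lookup table; the real module may cover more (or other) types.
_CLICKHOUSE_TYPES = {
    "Utf8": "String",
    "String": "String",
    "Int32": "Int32",
    "Int64": "Int64",
    "Float32": "Float32",
    "Float64": "Float64",
}


def to_clickhouse_type(dtype, nullable=False):
    """Map a dtype name, class, or instance to a ClickHouse column type."""
    if isinstance(dtype, str):
        name = dtype
    else:
        # Classes expose __name__; instances fall back to their class name.
        name = getattr(dtype, "__name__", None) or type(dtype).__name__

    try:
        base = _CLICKHOUSE_TYPES[name]
    except KeyError:
        # Unsupported or unrecognized dtypes are rejected, not guessed.
        raise ValueError(f"Unsupported dtype: {dtype!r}") from None

    # Nullable fields get the ClickHouse Nullable(...) wrapper.
    return f"Nullable({base})" if nullable else base
```

With this sketch, to_clickhouse_type("Utf8", True) yields "Nullable(String)", and an unknown name such as "Complex128" raises ValueError.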

Usage

    $ python3 schema_to_clickhouse.py
    → prints the CREATE TABLE statement for use in the ClickHouse CLI

Expected Schema Format

    SCHEMA = {
        "FieldName": (pl.Int64, False),
        "OtherField": ("Utf8", True),
        ...
    }
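
As a rough illustration of how a schema in this format could be turned into the statement shown under Example Output, consider the sketch below. It is not the module's implementation: build_create_table, TYPE_MAP, and the order_by parameter are hypothetical names, the schema uses string dtype names only to avoid a Polars dependency, and the "Notes" field is invented to show a nullable column.

```python
# Assumed subset of the dtype mapping (string names only).
TYPE_MAP = {"Utf8": "String", "Int64": "Int64", "Float64": "Float64"}

# Illustrative schema; "Notes" is a hypothetical nullable field.
SCHEMA = {
    "Method": ("Utf8", False),
    "Cycles": ("Int64", False),
    "IPC": ("Float64", False),
    "Notes": ("Utf8", True),
}


def build_create_table(schema, table_name="benchmark.performance",
                       order_by=("Method",)):
    """Assemble a CREATE TABLE statement from {field: (dtype, nullable)}."""
    columns = []
    for field, (dtype, nullable) in schema.items():
        if dtype not in TYPE_MAP:
            raise ValueError(f"Unsupported dtype for {field!r}: {dtype!r}")
        ch_type = TYPE_MAP[dtype]
        if nullable:
            ch_type = f"Nullable({ch_type})"
        # Backticks quote the column name, as in the example output.
        columns.append(f"    `{field}` {ch_type}")

    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n"
        + ",\n".join(columns)
        + "\n) ENGINE = MergeTree()\n"
        + f"ORDER BY ({', '.join(order_by)});"
    )


if __name__ == "__main__":
    print(build_create_table(SCHEMA))
```

Dictionaries preserve insertion order in Python 3.7+, so the columns appear in the order the schema declares them. The ORDER BY columns are passed explicitly here because the sort key is a table design decision that the (dtype, nullable) pairs do not encode.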

Definition in file schema_to_clickhouse.py.