Functions
pl.DataFrame safe_vector_cast (pl.DataFrame df, dict schema)
    Cast a Polars DataFrame to match a declared schema, handling 'NA' strings as nulls.
safe_div (numerator, denominator)
    Safely performs division, handling 'NA' values and invalid input.
safe_div_percent (numerator, denominator)
    Computes percentage-based division safely, with 'NA' fallback.
pipeline.utils.safe_div (numerator, denominator)
Safely performs division, handling 'NA' values and invalid input.
Returns the rounded quotient of numerator and denominator. If either argument is invalid or is the string "NA", the string "NA" is returned instead.
Parameters
numerator | Numerator of the division (can be int, float, or "NA").
denominator | Denominator of the division (can be int, float, or "NA").
Definition at line 80 of file utils.py.
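The behavior described above can be illustrated with a minimal sketch. This is not the code in utils.py: the name safe_div_sketch, the ndigits parameter and its default of 2, and the choice to treat division by zero as invalid input are all assumptions made for illustration, since the source does not state the rounding precision or what counts as invalid.

```python
def safe_div_sketch(numerator, denominator, ndigits=2):
    """Hypothetical stand-in for safe_div; ndigits is an assumed parameter."""
    # Propagate the "NA" sentinel if either operand carries it.
    if numerator == "NA" or denominator == "NA":
        return "NA"
    try:
        # float() rejects None and non-numeric strings; a zero denominator
        # raises ZeroDivisionError. All are treated as invalid input here
        # (an assumption, not confirmed by the source).
        return round(float(numerator) / float(denominator), ndigits)
    except (TypeError, ValueError, ZeroDivisionError):
        return "NA"
```

Under these assumptions, safe_div_sketch(1, 4) gives 0.25, while safe_div_sketch("NA", 2) and safe_div_sketch(1, 0) both give "NA". Note that the return type is mixed (float or str), so callers need to check for "NA" before doing further arithmetic.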
pipeline.utils.safe_div_percent (numerator, denominator)
Computes percentage-based division safely, with 'NA' fallback.
Like safe_div, but multiplies the quotient by 100 to express it as a percentage. If either argument is invalid or is the string "NA", the string "NA" is returned.
Parameters
numerator | Numerator of the division (can be int, float, or "NA").
denominator | Denominator of the division (can be int, float, or "NA").
Definition at line 102 of file utils.py.
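A minimal sketch of the percentage variant follows. As before, this is an illustration and not the code in utils.py: the name safe_div_percent_sketch, the ndigits parameter, and the handling of a zero denominator are assumptions. Whether the real function rounds before or after multiplying by 100 is also not stated; this sketch rounds the final percentage.

```python
def safe_div_percent_sketch(numerator, denominator, ndigits=2):
    """Hypothetical stand-in for safe_div_percent; ndigits is assumed."""
    # Propagate the "NA" sentinel if either operand carries it.
    if numerator == "NA" or denominator == "NA":
        return "NA"
    try:
        # Scale the quotient to a percentage, then round the result.
        ratio = float(numerator) / float(denominator)
        return round(ratio * 100, ndigits)
    except (TypeError, ValueError, ZeroDivisionError):
        # Non-numeric input and division by zero fall back to "NA"
        # (an assumption about what "invalid" covers).
        return "NA"
```

With these assumptions, safe_div_percent_sketch(1, 4) gives 25.0 and safe_div_percent_sketch(1, "NA") gives "NA".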
pl.DataFrame pipeline.utils.safe_vector_cast (pl.DataFrame df, dict schema)
Cast a Polars DataFrame to match a declared schema, handling 'NA' strings as nulls.
This function aligns a raw input DataFrame (typically read from CSV) with a declared schema. For each column whose schema entry sets allow_na to True, string values such as "NA" are replaced with nulls before casting.
Parameters
df | The input Polars DataFrame to cast.
schema | Dictionary in the format { column_name: (dtype, allow_na) }.
Exceptions
ValueError | If any schema field is missing from the DataFrame.
Definition at line 33 of file utils.py.