pyspark_ds_toolbox.ml.data_prep package
Submodules
pyspark_ds_toolbox.ml.data_prep.class_weights module
Module dedicated to class weighting tools.
- pyspark_ds_toolbox.ml.data_prep.class_weights.binary_classifier_weights(dfs: pyspark.sql.dataframe.DataFrame, col_target: str) → pyspark.sql.dataframe.DataFrame
- Adds a class weight column for a binary classification response column.
- Parameters
- dfs (pyspark.sql.dataframe.DataFrame) – Training dataset with the col_target column. 
- col_target (str) – Name of the column that contains the response variable for the model. It should contain only the values 0 and 1. 
 
- Raises
- ValueError – If the unique values in the col_target column are not 0 and 1. 
- Returns
- The dfs object with a weight_{col_target} column. 
- Return type
- pyspark.sql.dataframe.DataFrame 
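
The weighting formula is not stated in this reference, so the following is only a minimal sketch of one common scheme (balanced weights, `n_total / (2 * n_class)`), written in plain Python over a list of dicts rather than a Spark DataFrame. The function name `balanced_binary_weights` and the formula are assumptions for illustration; only the `ValueError` check and the `weight_{col_target}` output name mirror the documented behavior.

```python
from collections import Counter
from typing import List


def balanced_binary_weights(rows: List[dict], col_target: str) -> List[dict]:
    """Hypothetical helper: return copies of rows with a weight_{col_target} key."""
    counts = Counter(row[col_target] for row in rows)

    # Mirror the documented contract: the target must contain only 0 and 1.
    if set(counts) != {0, 1}:
        raise ValueError(f"{col_target} must contain only the values 0 and 1.")

    n_total = len(rows)
    # Assumed scheme (not confirmed by the library docs):
    # weight = n_total / (n_classes * n_class), with n_classes = 2.
    weights = {label: n_total / (2 * n) for label, n in counts.items()}

    return [{**row, f"weight_{col_target}": weights[row[col_target]]} for row in rows]


# Three negatives and one positive: the rare class gets the larger weight.
rows = [{"id": 1, "y": 0}, {"id": 2, "y": 0}, {"id": 3, "y": 0}, {"id": 4, "y": 1}]
weighted = balanced_binary_weights(rows, "y")
```

Under this assumed scheme the minority class receives a weight above 1 and the majority class a weight below 1, so both classes contribute equally in total to a weighted loss.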
 
pyspark_ds_toolbox.ml.data_prep.features_vector module
Module dedicated to tools for building Spark feature vectors.
- pyspark_ds_toolbox.ml.data_prep.features_vector.get_features_vector(num_features: Optional[List[str]] = None, cat_features: Optional[List[str]] = None, output_col='features') → List
- Assembles a features vector to be used with ML algorithms.
- Parameters
- num_features (List[str]) – List of column names of numeric features. 
- cat_features (List[str]) – List of column names of categorical features (StringIndexer). 
- output_col (str) – Name of the output column. 
 
- Raises
- TypeError – If both num_features and cat_features are None. 
- Returns
- A list of PySpark indexers, encoders, and assemblers. 
- Return type
- [List] 
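
To make the shape of the returned list concrete, here is a rough, Spark-free sketch of how such a list of stages might be ordered: indexers for the categorical columns, then encoders, then a single assembler. The `Stage` tuple, the function name `plan_feature_stages`, and the `_indexed` / `_encoded` column suffixes are all hypothetical; the library's own stage objects and intermediate column names may differ.

```python
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    """Hypothetical stand-in for a pipeline stage (not a PySpark class)."""
    kind: str
    input_cols: List[str]
    output_col: str


def plan_feature_stages(
    num_features: Optional[List[str]] = None,
    cat_features: Optional[List[str]] = None,
    output_col: str = "features",
) -> List[Stage]:
    # Mirror the documented contract: at least one feature list is required.
    if num_features is None and cat_features is None:
        raise TypeError("Provide num_features, cat_features, or both.")

    num_features = num_features or []
    cat_features = cat_features or []

    # Assumed intermediate column names; the real suffixes are not documented here.
    indexers = [Stage("indexer", [c], f"{c}_indexed") for c in cat_features]
    encoders = [Stage("encoder", [f"{c}_indexed"], f"{c}_encoded") for c in cat_features]

    # The assembler combines numeric columns with the encoded categorical ones.
    assembler = Stage(
        "assembler",
        num_features + [f"{c}_encoded" for c in cat_features],
        output_col,
    )
    return indexers + encoders + [assembler]


stages = plan_feature_stages(num_features=["age", "income"], cat_features=["country"])
```

With the real function, a list like this would typically be passed as the `stages` argument of a `pyspark.ml.Pipeline`, so that fitting the pipeline applies each step in order.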
 
Module contents
Sub-package dedicated to data preparation tools for machine learning.