Change Log

0.4.2

Fixed

  • pyspark_ds_toolbox.ml.feature_selection.information_value.feature_selection_with_iv(): fixed the behavior of the bucket_fraction argument.

Changed

  • pyspark_ds_toolbox.ml.feature_selection.information_value.feature_selection_with_iv(): the returned dict entry dfs_iv, a Spark DataFrame, was replaced by df_iv, a pandas DataFrame.

0.4.1

Fixed

  • pyspark_ds_toolbox.ml.feature_selection.information_value.feature_selection_with_iv(): fixed the behavior of the num_features and cat_features arguments.

0.4.0

Added

  • Module pyspark_ds_toolbox.ml.feature_selection.information_value, with:

    • feature_selection_with_iv()

    • compute_woe_iv()

    • WeightOfEvidenceComputer()

0.3.4

Breaking Changes

  • pyspark_ds_toolbox.ml.data_prep.features_vector.get_features_vector(): now returns a list of PySpark indexers, encoders and assemblers to be used in pipelines.

  • pyspark_ds_toolbox.ml.classification.baseline_classifiers.py: models are now returned as pipelines.

0.3.3

Changed

  • pyspark_ds_toolbox.ml.classification.baseline_binary_classfiers now has an mlflow_experiment_name argument.

Fixed

  • pyspark_ds_toolbox.ml.feature_importance.native_spark.

0.3.2

Changed

  • Functionalities from module pyspark_ds_toolbox.wrangling were refactored into pyspark_ds_toolbox.wrangling.reshape.py and pyspark_ds_toolbox.wrangling.data_quality.py;

  • Functionalities from module pyspark_ds_toolbox.ml.data_prep were refactored into pyspark_ds_toolbox.ml.data_prep.class_weights.py and pyspark_ds_toolbox.ml.data_prep.features_vector.py.

0.3.1

Changed

  • Module pyspark_ds_toolbox.ml.classification.baseline_binary_classfiers now also returns feature scores.

0.3.0

Added

  • Module pyspark_ds_toolbox.ml.feature_importance, with the function:

    • extract_features_score()

Changed

  • Module pyspark_ds_toolbox.ml.shap_values became pyspark_ds_toolbox.ml.feature_importance.shap_values

0.2.0

Added

  • Module pyspark_ds_toolbox.ml.classification

Changed

  • Module pyspark_ds_toolbox.ml.eval became pyspark_ds_toolbox.ml.classification.eval

0.1.4

Fixed

  • Class pyspark_ds_toolbox.stats.association.Association can now properly receive only numerical or only categorical features.

0.1.3

Added

  • CHANGELOG.md file

Changed

  • The pyspark dependency is now >=3.2.

  • Class pyspark_ds_toolbox.stats.association.Association now uses pyspark.pandas.frame.DataFrame instead of databricks.koalas.frame.DataFrame.