ds_toolbox.econometrics package

Submodules

ds_toolbox.econometrics.causal_regression module

Causal Regression

This module fits a model on training data and evaluates, on test data, the elasticity of a response variable with respect to a treatment. The objective is to segment the units of the dataset (customers, stores, etc.) according to the sensitivity of their response to the treatment. The steps are based on chapters 19-21 of the book Causal Inference for The Brave and True, available at https://github.com/matheusfacure/python-causality-handbook/tree/master.

class ds_toolbox.econometrics.causal_regression.CausalRegression(df_train: pandas.core.frame.DataFrame, df_test: pandas.core.frame.DataFrame, y: str, t: str, numeric_regressors: Union[None, List], categorical_regressors: Union[None, List], h: float = 0.01)

Bases: object

Class that fits a model on training data and evaluates, on test data, the elasticity of a response variable with respect to a treatment. The objective is to segment the units of the dataset (customers, stores, etc.) according to the sensitivity of their response to the treatment. The steps are based on chapters 19-21 of the book Causal Inference for The Brave and True, available at https://github.com/matheusfacure/python-causality-handbook/tree/master.

Args:

df_train (pd.DataFrame): Training DataFrame with the columns y, t, numeric_regressors and categorical_regressors.
df_test (pd.DataFrame): Test DataFrame with the columns y, t, numeric_regressors and categorical_regressors.
y (str): Column name of the response variable.
t (str): Column name of the treatment variable.
numeric_regressors (Union[None, List]): Column names of the numeric regressors.
categorical_regressors (Union[None, List]): Column names of the categorical regressors.
h (float, optional): Increment added to each treatment value in order to estimate the elasticity. Defaults to 0.01.

Attributes:

formula_multiplicative_treatment_term (str): Formula of the multiplicative model, e.g., ‘y ~ t*categorical_variables + t*numeric_variables + e’.
m_elasticity (sm.regression.linear_model.RegressionResultsWrapper): Model of formula_multiplicative_treatment_term fitted on df_train.
formula_y_x (str): Formula of y as a function of the categorical and numeric features, e.g., ‘y ~ categorical_variables + numeric_variables + e’.
my (sm.regression.linear_model.RegressionResultsWrapper): Model of formula_y_x fitted on df_train.
formula_t_x (str): Formula of t as a function of the categorical and numeric features, e.g., ‘t ~ categorical_variables + numeric_variables + e’.
mt (sm.regression.linear_model.RegressionResultsWrapper): Model of formula_t_x fitted on df_train.
test_unbiased (pd.DataFrame): df_test with the original columns, the unbiased columns and the predicted elasticity.
df_elasticity_ci (pd.DataFrame): DataFrame with the cumulative elasticity (see the method plot_cumulative_elasticity_curve).
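The attributes my and mt model y and t from the regressors, and test_unbiased holds "unbiased" columns. One common way to build such columns is residualization in the spirit of the Frisch-Waugh-Lovell theorem: regress y and t on the regressors and keep the residuals. The sketch below illustrates that idea only; it is not necessarily what CausalRegression does internally. It assumes numeric regressors only, uses plain numpy least squares instead of statsmodels, and relies on a hypothetical helper `residualize` and simulated toy data.

```python
import numpy as np
import pandas as pd


def residualize(df: pd.DataFrame, target: str, regressors: list) -> pd.Series:
    """Return the OLS residuals of `target` regressed on `regressors` plus an intercept."""
    X = np.column_stack([np.ones(len(df))] + [df[c].to_numpy(dtype=float) for c in regressors])
    values = df[target].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    return pd.Series(values - X @ beta, index=df.index, name=f"{target}_res")


# Hypothetical toy data: the treatment t is confounded by the regressor x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
t = 2.0 * x + rng.normal(size=500)
y = -1.5 * t + 3.0 * x + rng.normal(size=500)
df = pd.DataFrame({"x": x, "t": t, "y": y})

# Remove the part of t and y explained by x (analogous to the models stored in mt and my).
df["t_res"] = residualize(df, "t", ["x"])
df["y_res"] = residualize(df, "y", ["x"])

# Regressing the y residuals on the t residuals recovers the coefficient of t.
# No intercept is needed because both residual columns have mean zero.
slope = np.sum(df["t_res"] * df["y_res"]) / np.sum(df["t_res"] ** 2)
```

By the Frisch-Waugh-Lovell theorem, `slope` equals the coefficient of t in the full regression y ~ t + x, which is why residualized columns can be treated as debiased with respect to the regressors.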

Methods:

fit_causal_regression: Fits the causal regression. This method is called when the class is instantiated.
plot_cumulative_elasticity_curve: Plots the cumulative elasticity curve. See chapter 20 of the book referenced in the class description.

fit_causal_regression()

This method computes the causal regression. The steps are [to-do].

plot_cumulative_elasticity_curve(title: str = 'Cumulative Elasticity', min_units: int = 30, steps: int = 100, z: float = 1.96)

Plots the cumulative elasticity curve. See chapter 20 of the book referenced in the class description.

Args:

title (str, optional): Plot title. Defaults to ‘Cumulative Elasticity’.
min_units (int, optional): Number of units in the first bin. Defaults to 30.
steps (int, optional): Total number of buckets. Defaults to 100.
z (float, optional): z-value of the normal distribution used for the confidence interval; the default gives a 95% confidence interval. Defaults to 1.96.

Returns:

seaborn plot: The cumulative elasticity curve.

ds_toolbox.econometrics.causal_regression.compute_cum_elasticity(df: pandas.core.frame.DataFrame, predicted_elasticity: str, y: str, t: str, min_units: int = 30, steps: int = 100, z: float = 1.96) → pandas.core.frame.DataFrame

Computes the cumulative elasticity from a DataFrame with the predicted elasticity. The result is used to evaluate whether a causal model detects heterogeneity in the treatment response.

Args:

df (pd.DataFrame): DataFrame containing an elasticity prediction.
predicted_elasticity (str): Column name of the predicted elasticity.
y (str): Column name of the response variable (y).
t (str): Column name of the treatment variable (t).
min_units (int, optional): Number of units to add in each step. Defaults to 30.
steps (int, optional): Total number of buckets. Defaults to 100.
z (float, optional): z-value of the normal distribution used for the confidence interval; the default gives a 95% confidence interval. Defaults to 1.96.

Raises:

ValueError: Raised if ‘min_units’ is greater than the number of units in the dataset (df.shape[0]).

Returns:

pd.DataFrame: DataFrame with the cumulative elasticity and its confidence interval.
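The cumulative elasticity ranks units by predicted elasticity and re-estimates the elasticity on ever larger top-ranked subsets of the data. If the model captures heterogeneity, the elasticity of the top-ranked units differs markedly from the overall one, and the curve converges to the overall elasticity once all units are included. A minimal pandas sketch of that procedure, assuming the elasticity of each subset is the simple OLS slope of y on t with a normal-approximation interval; the names `ols_slope_ci`, `cumulative_elasticity` and the output column `n_units` are hypothetical, and compute_cum_elasticity may bucket the units or lay out its output differently.

```python
import numpy as np
import pandas as pd


def ols_slope_ci(df, y, t, z=1.96):
    """OLS slope of y on t with a normal-approximation confidence interval."""
    t_dev = df[t] - df[t].mean()
    slope = np.sum(t_dev * (df[y] - df[y].mean())) / np.sum(t_dev ** 2)
    intercept = df[y].mean() - slope * df[t].mean()
    resid = df[y] - (intercept + slope * df[t])
    se = np.sqrt(np.sum(resid ** 2) / (len(df) - 2) / np.sum(t_dev ** 2))
    return {"elasticity": slope, "lower_ci": slope - z * se, "upper_ci": slope + z * se}


def cumulative_elasticity(df, predicted_elasticity, y, t, min_units=30, steps=100, z=1.96):
    """Elasticity (with CI) of the top-k units ranked by predicted elasticity, for growing k."""
    size = df.shape[0]
    if min_units > size:
        raise ValueError("min_units is greater than the number of units in df")
    ordered = df.sort_values(predicted_elasticity, ascending=False).reset_index(drop=True)
    # In this sketch, min_units is the size of the first (smallest) subset.
    step = max(size // steps, 1)  # guard against a zero step on small datasets
    n_rows = list(range(min_units, size, step)) + [size]
    rows = [{"n_units": n, **ols_slope_ci(ordered.head(n), y, t, z)} for n in n_rows]
    return pd.DataFrame(rows)


# Hypothetical toy data where the prediction equals the true per-unit elasticity.
rng = np.random.default_rng(1)
n = 1000
true_elast = rng.uniform(-3.0, -1.0, size=n)
t = rng.normal(10.0, 3.0, size=n)
y = true_elast * t + rng.normal(size=n)
toy = pd.DataFrame({"t": t, "y": y, "pred": true_elast})
curve = cumulative_elasticity(toy, "pred", "y", "t")
```

On this toy data the first rows of `curve` sit near the least negative elasticities, while the last row equals the elasticity of the whole sample.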

ds_toolbox.econometrics.causal_regression.create_sm_formula(y: str, numeric_regressors: Union[None, List] = None, categorical_regressors: Union[None, List] = None, treatment_col: Union[None, str] = None)

Creates a formula string to be passed to a statsmodels.formula.api function.

Args:

y (str): Name of the y variable.
numeric_regressors (Union[None, List], optional): List with the names of the numeric regressors. Defaults to None.
categorical_regressors (Union[None, List], optional): List of strings with the names of the categorical regressors. Defaults to None.
treatment_col (Union[None, str], optional): Name of the treatment variable. Defaults to None.

Raises:

ValueError: At least one of numeric_regressors or categorical_regressors must not be None. If both are None, a ValueError is raised.

Returns:

str: The formula to be passed to a statsmodels.formula.api function.
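The statsmodels formula API (for example statsmodels.formula.api.ols) accepts patsy formula strings, in which C(col) marks a column as categorical and t*x expands to t + x + t:x, that is, both main effects plus their interaction. A minimal sketch of a formula builder that follows the pattern shown in the CausalRegression attributes, assuming categorical regressors are wrapped in C(); `build_formula` is a hypothetical name, and the exact string produced by create_sm_formula may differ, for instance in term order.

```python
from typing import List, Optional


def build_formula(y: str,
                  numeric_regressors: Optional[List[str]] = None,
                  categorical_regressors: Optional[List[str]] = None,
                  treatment_col: Optional[str] = None) -> str:
    """Build a patsy-style formula string such as 'y ~ t*C(c) + t*x'."""
    if numeric_regressors is None and categorical_regressors is None:
        raise ValueError("numeric_regressors and categorical_regressors cannot both be None")
    # C(...) tells patsy to dummy-encode a column as categorical.
    terms = [f"C({c})" for c in (categorical_regressors or [])]
    terms += list(numeric_regressors or [])
    if treatment_col is not None:
        # t*x expands to t + x + t:x, i.e. main effects plus the interaction.
        terms = [f"{treatment_col}*{term}" for term in terms]
    return f"{y} ~ " + " + ".join(terms)


formula = build_formula("sales", numeric_regressors=["temp"],
                        categorical_regressors=["weekday"], treatment_col="price")
# formula == "sales ~ price*C(weekday) + price*temp"
```

Without treatment_col the same helper yields the plain formulas of the formula_y_x and formula_t_x kind, e.g. "sales ~ temp + cost".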

ds_toolbox.econometrics.causal_regression.elasticity_ci(df: pandas.core.frame.DataFrame, y: str, t: str, z: float = 1.96) → dict

Computes the confidence interval of a linear coefficient of a regression. Used to compute the elasticity confidence interval.

Args:

df (pd.DataFrame): DataFrame with the y and t columns.
y (str): Column name of the y variable.
t (str): Column name of the treatment variable.
z (float, optional): z-value of the normal distribution. Defaults to 1.96.

Returns:

dict: dict in the form {‘elasticity’: float, ‘lower_ci’: float, ‘upper_ci’: float}.
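Under the usual homoskedastic OLS assumptions, the standard error of the slope b of y on t is SE = sqrt( (sum of squared residuals / (n - 2)) / sum((t_i - t_mean)^2) ), and the interval is b ± z * SE. A minimal sketch with pandas and numpy that returns the documented dict layout; the function name `slope_confidence_interval` is hypothetical, and the library may compute the standard error differently.

```python
import numpy as np
import pandas as pd


def slope_confidence_interval(df: pd.DataFrame, y: str, t: str, z: float = 1.96) -> dict:
    """OLS slope of y on t and its normal-approximation confidence interval."""
    n = df.shape[0]
    t_dev = df[t] - df[t].mean()
    slope = np.sum(t_dev * (df[y] - df[y].mean())) / np.sum(t_dev ** 2)
    intercept = df[y].mean() - slope * df[t].mean()
    resid = df[y] - (intercept + slope * df[t])
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    sigma2 = np.sum(resid ** 2) / (n - 2)
    se = np.sqrt(sigma2 / np.sum(t_dev ** 2))
    return {"elasticity": slope, "lower_ci": slope - z * se, "upper_ci": slope + z * se}


example = pd.DataFrame({"t": [1, 2, 3, 4, 5], "y": [1.0, 2.5, 2.9, 4.2, 5.1]})
result = slope_confidence_interval(example, "y", "t")
# result["elasticity"] is 0.99 (up to floating-point rounding)
```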

ds_toolbox.econometrics.causal_regression.linear_coefficient(df: pandas.core.frame.DataFrame, y: str, x: str) → numpy.float64

Computes the linear regression coefficient (OLS).

Args:

df (pd.DataFrame): DataFrame with the y and x columns.
y (str): Column name of the y variable.
x (str): Column name of the regressor variable.

Returns:

np.float64: The linear coefficient.
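For a single regressor, the OLS slope has the closed form Cov(x, y) / Var(x) = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean)^2). A minimal sketch of that formula, using a hypothetical function name `ols_slope` and simulated data, cross-checked against numpy.polyfit, whose first coefficient for a degree-1 fit is the slope.

```python
import numpy as np
import pandas as pd


def ols_slope(df: pd.DataFrame, y: str, x: str) -> np.float64:
    """Slope of the simple OLS regression y = a + b*x, computed as Cov(x, y) / Var(x)."""
    x_dev = df[x] - df[x].mean()
    y_dev = df[y] - df[y].mean()
    return np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)


# Hypothetical data with a true slope of 2.
rng = np.random.default_rng(2)
data = pd.DataFrame({"x": rng.normal(size=200)})
data["y"] = 1.0 + 2.0 * data["x"] + rng.normal(scale=0.5, size=200)

slope = ols_slope(data, "y", "x")
# np.polyfit returns coefficients from the highest degree down, so index 0 is the slope.
reference = np.polyfit(data["x"], data["y"], deg=1)[0]
```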

ds_toolbox.econometrics.causal_regression.predict_elast(model: statsmodels.regression.linear_model.RegressionResultsWrapper, df_test: pandas.core.frame.DataFrame, t: str, h: float = 0.01) → pandas.core.series.Series

Refer to the CausalRegression class for documentation. This function is used internally.

Args:

model (sm.regression.linear_model.RegressionResultsWrapper): A fitted statsmodels elasticity model.
df_test (pd.DataFrame): Test DataFrame.
t (str): Treatment column name.
h (float, optional): Increment added to the treatment column to compute the elasticity. Defaults to 0.01.

Returns:

pd.Series: Elasticity of the response variable with respect to the treatment column, predicted by model for each row of df_test.
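The h parameter suggests a numerical estimate: increase the treatment by h, predict again, and divide the change in the prediction by h. This is a forward finite-difference approximation of the derivative of the prediction with respect to t. A minimal sketch, assuming only that the model exposes a predict method that accepts a DataFrame (as statsmodels results fitted through the formula API do); `finite_difference_elasticity` and the stand-in `ToyInteractionModel` are hypothetical names used for illustration.

```python
import pandas as pd


def finite_difference_elasticity(model, df_test: pd.DataFrame, t: str, h: float = 0.01) -> pd.Series:
    """Approximate d(prediction)/dt for each row by nudging the treatment by h."""
    shifted = df_test.assign(**{t: df_test[t] + h})
    return (model.predict(shifted) - model.predict(df_test)) / h


class ToyInteractionModel:
    """Hypothetical stand-in for a fitted model: y = 5 + 3*t + 2*t*x."""

    def predict(self, df: pd.DataFrame) -> pd.Series:
        return 5 + 3 * df["t"] + 2 * df["t"] * df["x"]


df_test = pd.DataFrame({"t": [1.0, 2.0, 3.0], "x": [0.0, 1.0, -1.0]})
elasticity = finite_difference_elasticity(ToyInteractionModel(), df_test, "t")
# The exact derivative is 3 + 2*x, so elasticity is approximately [3, 5, 1].
```

For a model that is linear in t, such as one fitted with a multiplicative treatment term, the forward difference matches the exact derivative up to floating-point rounding; the choice of h matters mainly for models that are non-linear in the treatment.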

Module contents