API Reference
Wiggum
- class wiggum.trend_components.BinClassStats[source]¶
Trend class for computing classification statistics from confusion matrix components, based on the comparison of values from two columns of the data.
Methods
get_distance
(row[, col_a, col_b])distance for confusion matrix stats is
get_trends
(data_df, trend_col_name) Compute a trend between two variables, a prediction and a ground truth. Requires a precompute step that augments the data with row-wise labels, for speed.
is_computable
([labeled_df])check if this trend can be computed based on data and metadata available
- get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
distance for confusion matrix stats is
- get_trends(data_df, trend_col_name)[source]¶
Compute a trend between two variables, a prediction and a ground truth. Requires a precompute step that augments the data with row-wise labels, for speed.
- Parameters
- data_df : DataFrame or DataFrameGroupBy
data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by the LabeledDataFrame get trend functions. For each groundtruth and prediction pair there must be an accuracy column named like groundtruthvar_predictionvar_acc.
- trend_col_name : {'subgroup_trend', 'agg_trend'}
which type of trend is to be computed
- Returns
- reg_df : DataFrame
result DataFrame with rows for accuracy (acc), true positive rate (tpr), positive predictive value (ppr), and true negative rate (tnr)
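As a rough illustration of the statistics listed above, the following plain-Python sketch first labels each row with its confusion matrix cell (a stand-in for the row-wise precompute step) and then derives the four rates. The names `label_row` and `classification_stats` are hypothetical and not part of wiggum, which operates on DataFrames; the input values are made up.

```python
from collections import Counter
import math


def label_row(truth, pred):
    """Return the confusion matrix cell ('TP', 'FP', 'TN', 'FN') for one row."""
    if pred:
        return "TP" if truth else "FP"
    return "FN" if truth else "TN"


def _ratio(num, den):
    # Ratios with an empty denominator are undefined and reported as NaN.
    return num / den if den else math.nan


def classification_stats(truths, preds):
    """Compute acc, tpr, ppr, and tnr from paired binary labels."""
    counts = Counter(label_row(t, p) for t, p in zip(truths, preds))
    tp, fp, tn, fn = (counts[k] for k in ("TP", "FP", "TN", "FN"))
    return {
        "acc": _ratio(tp + tn, tp + fp + tn + fn),  # accuracy
        "tpr": _ratio(tp, tp + fn),  # true positive rate
        "ppr": _ratio(tp, tp + fp),  # positive predictive value (labelled ppr above)
        "tnr": _ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical ground truth and prediction columns.
stats = classification_stats([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Labelling rows once and counting afterwards means the same labels can be reused for the whole dataset and for every subgroup.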
- is_computable(labeled_df=None)[source]¶
check if this trend can be computed based on data and metadata available
- Parameters
- self : Trend
a trend object with a set_vars Parameters
- labeled_df : LabeledDataFrame {None} (optional)
data to use if trend is not already configured
- Returns
- computable : bool
True if the requirements of get_trends are met
- See also:
- get_trends() for description of how this trend computes and
- class wiggum.trend_components.BinaryWeightedRank[source]¶
statRank-compatible varTypeMixin for computing means of binary-valued variables only; sets stat to wg.trend_components.w_avg
Methods
get_trend_vars
(labeled_df)set target, trendgroup, and var_weight_list for computing rank trends
alias of
str
- class wiggum.trend_components.ContinuousOrdinalRegression[source]¶
regression-compatible varTypeMixin; sets list-formatted regression_vars and uses continuous dependent variables and ordinal independent variables
Methods
get_trend_vars
(labeled_df)set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property
trend_value_type
alias of
float
set_weights_regression
- get_trend_vars(labeled_df)[source]¶
set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property
- Parameters
- labeled_df : LabeledDataFrame
object to parse for variable types
- Returns
- regression_vars : list of strings
list of all trend variables with type set to ordinal or continuous
- var_weight_list : list of strings
list of variables to be used as weights for each regression_vars
- class wiggum.trend_components.ContinuousRegression[source]¶
regression-compatible varTypeMixin for working with continuous variables; sets list-formatted regression_vars and symmetric_vars = True
Methods
get_trend_vars
(labeled_df)set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property
trend_value_type
alias of
float
set_weights_regression
- get_trend_vars(labeled_df)[source]¶
set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property
- Parameters
- labeled_df : LabeledDataFrame
object to parse for variable types
- Returns
- regression_vars : list of strings
list of all trend variables with type set to ordinal or continuous
- var_weight_list : list of strings
list of variables to be used as weights for each regression_vars
- class wiggum.trend_components.CorrelationSignTrend[source]¶
Trends based on a correlation whose type is specified as a property; the distance is a binary comparison of the signs.
Methods
compute_correlation_table
(data_df, ...)common code for computing correlations for any correlation based trend
get_distance
(row[, col_a, col_b]) distance between the subgroup and aggregate trends for a row of a result_df; binary: 0 for same sign, 1 for opposite sign
get_trends
(data_df, trend_col_name)Compute a trend, its quality and return a partial result_df
is_computable
([labeled_df])check if this trend can be computed based on data and metadata available
wrap_reg_df
(reg_df, groupby_name) add the groupby variable or drop the subgroup column
- get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
distance between the subgroup and aggregate trends for a row of a result_df; binary: 0 for same sign, 1 for opposite sign
- Parameters
- row : pd.Series
row of a result_df DataFrame
- Returns
- <>_dist : float
distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
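The sign comparison described above can be sketched in a few lines of plain Python. This is only an illustration: `sign` and `sign_distance` are hypothetical names, the row is a plain dict standing in for a pd.Series, and treating zero as its own sign is an assumption the reference does not address.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_distance(row, col_a="subgroup_trend", col_b="agg_trend"):
    """0.0 when both correlations share a sign, 1.0 otherwise."""
    return 0.0 if sign(row[col_a]) == sign(row[col_b]) else 1.0


# Made-up correlations: negative within the subgroup, positive in aggregate.
row = {"subgroup_trend": -0.42, "agg_trend": 0.31}
dist = sign_distance(row)
```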
- get_trends(data_df, trend_col_name)[source]¶
Compute a trend, its quality and return a partial result_df
- Parameters
- data_df : DataFrame or DataFrameGroupBy
data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by the LabeledDataFrame get trend functions
- trend_col_name : {'subgroup_trend', 'agg_trend'}
which type of trend is to be computed
- Returns
- reg_df : DataFrame
partial result_df; multiple can be merged together to form a complete result_df
- class wiggum.trend_components.LinearRegression[source]¶
Methods
get_distance
(row[, col_a, col_b])compute angle between the overall and subgroup slopes for a row of a dataframe.
get_distance_unnormalized
(row[, col_a, col_b])compute angle between the overall and subgroup slopes for a row of a dataframe.
get_trends
(data_df, trend_col_name) Compute linear regressions and return a partial result_df
is_computable
([labeled_df]) check if this trend can be computed based on the data and metadata available. This requires that the regression vars be a list of tuple or list of length at least 2.
- get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.
- Parameters
- row : pd.Series
row of a result_df DataFrame
- Returns
- angle : float
angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
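One way to read "angle between the slopes" is the acute angle between the two regression lines. The sketch below computes that angle in degrees with the standard library. It is an illustration under that assumption, not wiggum's formula, and `slope_angle` is a hypothetical name; the reference does not say how the normalized and unnormalized variants differ, so no normalization is applied here.

```python
import math


def slope_angle(slope_a, slope_b):
    """Acute angle, in degrees, between two lines given by their slopes."""
    theta = abs(math.degrees(math.atan(slope_a) - math.atan(slope_b)))
    # atan returns values in (-90, 90) degrees, so theta lies in [0, 180).
    # Fold it so the smaller of the two angles between the lines is returned.
    return 180.0 - theta if theta > 90.0 else theta


# Made-up slopes: the subgroup trend rises while the aggregate trend is flat.
angle = slope_angle(1.0, 0.0)
```

The result is non-negative and symmetric in its arguments, which is what makes it usable as a distance.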
- get_distance_unnormalized(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.
- Parameters
- row : pd.Series
row of a result_df DataFrame
- Returns
- angle : float
angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
- get_trends(data_df, trend_col_name)[source]¶
Compute linear regressions and return a partial result_df
- Parameters
- data_df : DataFrame or DataFrameGroupBy
data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by the LabeledDataFrame get trend functions
- trend_col_name : {'subgroup_trend', 'agg_trend'}
which type of trend is to be computed
- Returns
- reg_df : DataFrame
partial result_df; multiple can be merged together to form a complete result_df
- is_computable(labeled_df=None)[source]¶
check if this trend can be computed based on the data and metadata available. This requires that the regression vars be a list of tuple or list of length at least 2.
- Parameters
- self : Trend
a trend object with a set_vars Parameters
- labeled_df : LabeledDataFrame {None} (optional)
data to use if trend is not already configured
- Returns
- computable : bool
True if the requirements of get_trends are met
- See also:
- get_trends() for description of how this trend computes and
- class wiggum.trend_components.OrdinalRegression[source]¶
regression-compatible varTypeMixin; sets list-formatted regression_vars and symmetric_vars = True
Methods
get_trend_vars
(labeled_df)set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property
trend_value_type
alias of
float
set_weights_regression
- get_trend_vars(labeled_df)[source]¶
set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property
- Parameters
- labeled_df : LabeledDataFrame
object to parse for variable types
- Returns
- regression_vars : list of strings
list of all ordinal trend variables
- var_weight_list : list of strings
list of variables to be used as weights for each regression_vars
- class wiggum.trend_components.PredictionClass[source]¶
for binary classification performance stats
Methods
alias of
float
get_trend_vars
- class wiggum.trend_components.StatBinRankTrend[source]¶
Compute a trend that determines whether the alphabetically ordered values of a two-valued categorical variable are > or < when ordered by a statistic of another variable. Quality is based on the ratio, and the distance is 0/1 loss.
Methods
get_distance
(row[, col_a, col_b])0/1 loss on ><
get_trends
(data_df, trend_col_name)Compute a trend between a binary ranking variable
is_computable
([labeled_df])check if this trend can be computed based on data and metadata available
is_SP
- get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
0/1 loss on ><
- Parameters
- row : pd.Series
row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists
- Returns
- 0_1_loss : float
0/1 loss distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
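To make the ">" or "<" comparison and its 0/1 loss concrete, here is a minimal plain-Python sketch. The function names, the dict-based input, and the example values are all hypothetical. Representing the trend as a single symbol and mapping ties to ">" are assumptions for illustration, not something the reference specifies.

```python
def bin_rank_trend(stat_by_value):
    """Compare the statistic of the alphabetically first value to the second.

    Expects a dict with exactly two category values as keys.
    Returns '<' or '>' (ties map to '>' in this sketch).
    """
    first, second = sorted(stat_by_value)
    return "<" if stat_by_value[first] < stat_by_value[second] else ">"


def zero_one_loss(subgroup_trend, agg_trend):
    """0.0 if the two comparison symbols agree, 1.0 otherwise."""
    return 0.0 if subgroup_trend == agg_trend else 1.0


# Made-up statistics: overall F < M, but within one subgroup F > M.
agg = bin_rank_trend({"F": 0.30, "M": 0.45})
sub = bin_rank_trend({"M": 0.51, "F": 0.62})
loss = zero_one_loss(sub, agg)
```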
- get_trends(data_df, trend_col_name)[source]¶
Compute a trend between a binary ranking variable
- Parameters
- data_df : DataFrame or DataFrameGroupBy
data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by the LabeledDataFrame get trend functions
- trend_col_name : {'subgroup_trend', 'agg_trend'}
which type of trend is to be computed (TODO: could infer this by type of above?)
- Returns
- reg_df : DataFrame
partial result_df; multiple can be merged together to form a complete result_df
- is_computable(labeled_df=None)[source]¶
check if this trend can be computed based on data and metadata available
- Parameters
- self : Trend
a trend object with a set_vars Parameters
- labeled_df : LabeledDataFrame {None} (optional)
data to use if trend is not already configured
- Returns
- computable : bool
True if the requirements of get_trends are met
- See also:
- get_trends() for description of how this trend computes and
- class wiggum.trend_components.StatRankTrend[source]¶
Compute a trend that is the ascending ranking of categorical variables. Quality is based on the trend vs. actual Kendall tau distance, and the subgroup vs. aggregate distance is 1-tau.
The distances are continuous values.
Methods
get_distance
(row[, col_a, col_b]) Kendall tau distance as a permutation distance
get_trends
(data_df, trend_col_name)Compute a trend that is the ascending ranking of categorical variables
is_computable
([labeled_df])check if this trend can be computed based on data and metadata available
- get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶
Kendall tau distance as a permutation distance
- Parameters
- row : pd.Series
row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists
- Returns
- tau_dist : float
permutation distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
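The 1-tau distance between two rankings can be sketched by counting concordant and discordant pairs. The code below is a plain-Python illustration, assuming both lists are orderings of the same distinct items (at least two) with no ties; `kendall_tau` and `tau_distance` are hypothetical names. Whether wiggum rescales the value is not stated in the reference.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two orderings of the same distinct items."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    # combinations() keeps list order, so x always precedes y in rank_a.
    for x, y in combinations(rank_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def tau_distance(subgroup_trend, agg_trend):
    """1 - tau: 0.0 for identical orderings, 2.0 for fully reversed ones."""
    return 1.0 - kendall_tau(subgroup_trend, agg_trend)


# Made-up rankings that differ by one adjacent swap.
dist = tau_distance(["a", "c", "b"], ["a", "b", "c"])
```

Because tau varies in steps of 2 / (number of pairs), the distance takes graded values rather than just 0 or 1.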
- get_trends(data_df, trend_col_name)[source]¶
Compute a trend that is the ascending ranking of categorical variables
- Parameters
- data_df : DataFrame or DataFrameGroupBy
data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by the LabeledDataFrame get trend functions
- trend_col_name : {'subgroup_trend', 'agg_trend'}
which type of trend is to be computed (TODO: could infer this by type of above?)
- Returns
- reg_df : DataFrame
partial result_df; multiple can be merged together to form a complete result_df
- is_computable(labeled_df=None)[source]¶
check if this trend can be computed based on data and metadata available
- Parameters
- self : Trend
a trend object with a set_vars Parameters
- labeled_df : LabeledDataFrame {None} (optional)
data to use if trend is not already configured
- Returns
- computable : bool
True if the requirements of get_trends are met
- See also:
- get_trends() for description of how this trend computes and
- class wiggum.trend_components.Trend(labeled_df=None)[source]¶
baseclass for abstraction and building trend objects. All trend objects must inherit this class in order to have a constructor (__init__). This may be overloaded to define a different constructor.
- Parameters
- self
- labeled_df : LabeledDataFrame or None
if passed, get_trend_vars is called on initialization using labeled_df as the target dataset to compute trends on
Methods
return the type that the trend values for this trend type should be
is_SP
(row, thresh) by default, checks whether the distance is above a threshold; operates row-wise and can be applied to a DataFrame with the apply method
load
(content_dict)load a trend from a dictionary of the content
- is_SP(row, thresh)[source]¶
By default, checks whether the distance is above a threshold; operates row-wise and can be applied to a DataFrame with the apply method
- Parameters
- row : pd.Series
row of a result df to apply the threshold to
- thresh : float scalar
threshold to compare the distance to
- Returns
- boolean value indicating whether the distance is over the threshold
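A threshold check of this kind might look like the sketch below. The column name `distance` and the function name `is_sp` are assumptions for illustration, and plain dicts stand in for rows of a result df.

```python
def is_sp(row, thresh, dist_col="distance"):
    """True when the row's distance exceeds thresh.

    dist_col is an assumed column name, not taken from the reference.
    """
    return row[dist_col] > thresh


# Made-up rows of a result table.
rows = [{"distance": 0.1}, {"distance": 0.8}]
flags = [is_sp(r, thresh=0.5) for r in rows]
```

With pandas, the same function could be applied row-wise via `DataFrame.apply(is_sp, axis=1, thresh=0.5)`, since `apply` forwards extra keyword arguments to the function.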
- class wiggum.trend_components.WeightedRank[source]¶
common parts for all continuous variable trends
Methods
alias of
str
get_trend_vars
- wiggum.trend_components.w_avg(df, avcol, wcol)[source]¶
compute a weighted average and use the std to define a confidence interval
compatible with DataFrame.apply() and the get_trends functions in wiggum.trend_components.categorical
- Parameters
- df : DataFrame or DataFrameGroupBy
passed as the source of apply, the data to extract columns from for computing a weighted average
- avcol : string
name of column in df to take the average of
- wcol : string
name of column in df to use for weighting
- Returns
- stat_data : pandas Series
with 'stat', 'max', 'min' values defining the statistic and a confidence interval, and 'count' defining the power of the computation
- stat : float
mean of df[avcol] weighted row-wise by df[wcol]
- max : float
mean + std, to be used for the upper limit of the confidence interval
- min : float
mean - std, to be used for the lower limit of the confidence interval
- count : int
sum of df[wcol]
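The returned fields can be illustrated with a small standard-library sketch. It assumes a weighted population standard deviation, which the reference does not specify, and uses plain lists in place of DataFrame columns; `weighted_avg` is a hypothetical name, not the library function.

```python
import math


def weighted_avg(values, weights):
    """Weighted mean with a mean +/- std interval and the total weight."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    # Assumption: the spread is a weighted population standard deviation.
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    std = math.sqrt(var)
    return {"stat": mean, "max": mean + std, "min": mean - std, "count": total}


# Made-up values and equal weights.
result = weighted_avg([1.0, 3.0], [1, 1])
```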
Wiggum App: interactive visualization
The app is Flask-powered and uses both Python and JavaScript for the computation and visualization.