API Reference

Wiggum

class wiggum.trend_components.BinClassStats[source]

trend class for computing classification statistics from confusion matrix components, based on the comparison of values from two columns of the data

Methods

get_distance(row[, col_a, col_b])

distance for confusion matrix stats is

get_trends(data_df, trend_col_name)

Compute a trend between two variables that are prediction and ground truth; requires a precompute step to augment the data with row-wise labels for speed

is_computable([labeled_df])

check if this trend can be computed based on data and metadata available

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]

distance for confusion matrix stats is

get_trends(data_df, trend_col_name)

Compute a trend between two variables that are prediction and ground truth; requires a precompute step to augment the data with row-wise labels for speed

Parameters
data_df : DataFrame or DataFrameGroupBy

data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions. For each groundtruth and prediction pair there must be an accuracy column named like groundtruthvar_predictionvar_acc.

trend_col_name : {'subgroup_trend', 'agg_trend'}

which type of trend is to be computed

Returns
reg_df : DataFrame

returns result df with rows for accuracy (acc), true positive rate (tpr), positive predictive value (ppr), and true negative rate (tnr)
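
As a rough illustration of the four statistics listed above, the sketch below computes them from paired ground-truth and prediction values in plain Python. The helper name `confusion_stats` and the sample lists are hypothetical, and the dictionary keys simply mirror the row labels named above; this is not Wiggum's implementation, which works on precomputed row-wise labels in a DataFrame.

```python
def confusion_stats(truth, pred):
    """Return acc, tpr, ppr, and tnr for paired binary labels (hypothetical helper)."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else float("nan")

    return {
        "acc": ratio(tp + tn, len(pairs)),  # accuracy
        "tpr": ratio(tp, tp + fn),          # true positive rate
        "ppr": ratio(tp, tp + fp),          # positive predictive value
        "tnr": ratio(tn, tn + fp),          # true negative rate
    }


truth = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0]
stats = confusion_stats(truth, pred)
```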

is_computable(labeled_df=None)[source]

check if this trend can be computed based on data and metadata available

Parameters
self : Trend

a trend object with a set_vars Parameters

labeled_df : LabeledDataFrame {None} (optional)

data to use if trend is not already configured

Returns
computable : bool

True if requirements of get_trends are filled

See also:
get_trends() for description of how this trend computes and
class wiggum.trend_components.BinaryWeightedRank[source]

statRank-compatible varTypeMixin for computing means of binary-valued variables only; sets stat to wg.trend_components.w_avg

Methods

get_trend_vars(labeled_df)

set target, trendgroup, and var_weight_list for computing rank trends

trend_value_type

alias of str

get_trend_vars(labeled_df)[source]

set target, trendgroup, and var_weight_list for computing rank trends

Parameters
labeled_df : LabeledDataFrame

object to parse for variable types

Returns
regression_vars : list of strings

list of all trend variables with type set to ordinal or continuous

trend_value_type

alias of str

class wiggum.trend_components.ContinuousOrdinalRegression[source]

regression-compatible varTypeMixin; sets list-formatted regression_vars and uses continuous dependent variables and ordinal independent variables

Methods

get_trend_vars(labeled_df)

set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property

trend_value_type

alias of float

set_weights_regression

get_trend_vars(labeled_df)[source]

set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property

Parameters
labeled_df : LabeledDataFrame

object to parse for variable types

Returns
regression_vars : list of strings

list of all trend variables with type set to ordinal or continuous

var_weight_list : list of strings

list of variables to be used as weights for each regression_vars

class wiggum.trend_components.ContinuousRegression[source]

regression-compatible varTypeMixin for working with continuous variables; sets list-formatted regression_vars and symmetric_vars = True

Methods

get_trend_vars(labeled_df)

set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property

trend_value_type

alias of float

set_weights_regression

get_trend_vars(labeled_df)[source]

set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property

Parameters
labeled_df : LabeledDataFrame

object to parse for variable types

Returns
regression_vars : list of strings

list of all trend variables with type set to ordinal or continuous

var_weight_list : list of strings

list of variables to be used as weights for each regression_vars

class wiggum.trend_components.CorrelationSignTrend[source]

trends based on a correlation whose type is specified as a property; the distance is a binary comparison of the signs

Methods

compute_correlation_table(data_df, ...)

common code for computing correlations for any correlation based trend

get_distance(row[, col_a, col_b])

distance between the subgroup and aggregate trends for a row of a result_df; binary: 0 for same sign, 1 for opposite sign

get_trends(data_df, trend_col_name)

Compute a trend, its quality and return a partial result_df

is_computable([labeled_df])

check if this trend can be computed based on data and metadata available

wrap_reg_df(reg_df, groupby_name)

add the groupby variable or drop the subgroup column

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]

distance between the subgroup and aggregate trends for a row of a result_df; binary: 0 for same sign, 1 for opposite sign

Parameters
row : pd.Series

row of a result_df DataFrame

Returns
<>_dist : float

distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
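
The sign comparison described above can be sketched in a few lines. This is an illustrative stand-in rather than Wiggum's code: the function name `sign_distance` is hypothetical, a plain dict stands in for the pd.Series row, and treating a zero correlation as positive is an assumption.

```python
import math


def sign_distance(row, col_a="subgroup_trend", col_b="agg_trend"):
    """Return 0.0 when both trends share a sign, 1.0 otherwise (illustrative)."""
    # copysign(1, x) yields +1.0 or -1.0; zero counts as positive here (assumption).
    sign_a = math.copysign(1, row[col_a])
    sign_b = math.copysign(1, row[col_b])
    return 0.0 if sign_a == sign_b else 1.0


same = sign_distance({"subgroup_trend": 0.3, "agg_trend": 0.9})
opposite = sign_distance({"subgroup_trend": -0.4, "agg_trend": 0.7})
```

Because a pd.Series supports the same `row[col]` lookup, a function of this shape could be passed to DataFrame.apply with axis=1.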

get_trends(data_df, trend_col_name)

Compute a trend, its quality and return a partial result_df

Parameters
data_df : DataFrame or DataFrameGroupBy

data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions

trend_col_name : {'subgroup_trend', 'agg_trend'}

which type of trend is to be computed

Returns
reg_df : DataFrame

partial result_df; multiple can be merged together to form a complete result_df

class wiggum.trend_components.LinearRegression[source]

Methods

get_distance(row[, col_a, col_b])

compute angle between the overall and subgroup slopes for a row of a dataframe.

get_distance_unnormalized(row[, col_a, col_b])

compute angle between the overall and subgroup slopes for a row of a dataframe.

get_trends(data_df, trend_col_name)

Compute linear regressions and return a partial result_df

is_computable([labeled_df])

check if this trend can be computed based on data and metadata available; this requires that the regression vars be a list of tuple or list of length at least 2.

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]

compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.

Parameters
row : pd.Series

row of a result_df DataFrame

Returns
angle : float

angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
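
One way to picture an angle between two slopes is to convert each slope to the angle its line makes with the positive x axis and take the absolute difference. The sketch below does exactly that under stated assumptions: the trend columns hold regression slopes, a dict stands in for the pd.Series row, and `slope_angle` is a hypothetical name. Any normalization that get_distance applies is not reproduced here.

```python
import math


def slope_angle(row, col_a="subgroup_trend", col_b="agg_trend"):
    """Absolute angle in degrees between two lines given by their slopes (illustrative)."""
    # atan maps a slope to the angle of its line relative to the positive x axis.
    theta_a = math.degrees(math.atan(row[col_a]))
    theta_b = math.degrees(math.atan(row[col_b]))
    return abs(theta_a - theta_b)


perpendicular = slope_angle({"subgroup_trend": 1.0, "agg_trend": -1.0})
half_right = slope_angle({"subgroup_trend": 1.0, "agg_trend": 0.0})
```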

get_distance_unnormalized(row, col_a='subgroup_trend', col_b='agg_trend')[source]

compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.

Parameters
row : pd.Series

row of a result_df DataFrame

Returns
angle : float

angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df

get_trends(data_df, trend_col_name)

Compute linear regressions and return a partial result_df

Parameters
data_df : DataFrame or DataFrameGroupBy

data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions

trend_col_name : {'subgroup_trend', 'agg_trend'}

which type of trend is to be computed

Returns
reg_df : DataFrame

partial result_df; multiple can be merged together to form a complete result_df

is_computable(labeled_df=None)[source]

check if this trend can be computed based on data and metadata available; this requires that the regression vars be a list of tuple or list of length at least 2.

Parameters
self : Trend

a trend object with a set_vars Parameters

labeled_df : LabeledDataFrame {None} (optional)

data to use if trend is not already configured

Returns
computable : bool

True if requirements of get_trends are filled

See also:
get_trends() for description of how this trend computes and
class wiggum.trend_components.OrdinalRegression[source]

regression-compatible varTypeMixin; sets list-formatted regression_vars and symmetric_vars = True

Methods

get_trend_vars(labeled_df)

set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property

trend_value_type

alias of float

set_weights_regression

get_trend_vars(labeled_df)[source]

set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property

Parameters
labeled_df : LabeledDataFrame

object to parse for variable types

Returns
regression_vars : list of strings

list of all ordinal trend variables

var_weight_list : list of strings

list of variables to be used as weights for each regression_vars

class wiggum.trend_components.PredictionClass[source]

for binary classification performance stats

Methods

trend_value_type

alias of float

get_trend_vars

trend_value_type

alias of float

class wiggum.trend_components.StatBinRankTrend[source]

Compute a trend that determines whether the alphabetically ordered values of a two-valued categorical variable are > or < when ordered by a statistic of another variable. Quality is based on the ratio, and the distance is 0/1 loss.

Methods

get_distance(row[, col_a, col_b])

0/1 loss on ><

get_trends(data_df, trend_col_name)

Compute a trend between a binary ranking variable

is_computable([labeled_df])

check if this trend can be computed based on data and metadata available

is_SP

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]

0/1 loss on ><

Parameters
row : pd.Series

row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists

Returns
0_1_loss : float

0/1 loss distance between the subgroup_trend and agg_trend compatible with assignment to a cell of a result_df

get_trends(data_df, trend_col_name)

Compute a trend between a binary ranking variable

Parameters
data_df : DataFrame or DataFrameGroupBy

data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions

trend_col_name : {'subgroup_trend', 'agg_trend'}

which type of trend is to be computed. TODO: could infer this by type of above?

Returns
reg_df : DataFrame

partial result_df; multiple can be merged together to form a complete result_df
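
To make the > / < trend and its 0/1 loss concrete, here is a minimal sketch. It assumes an unweighted mean as the statistic, breaks ties as "<", and uses hypothetical names (`binary_rank_trend`, `zero_one_loss`) and made-up group labels; Wiggum's own get_trends returns a partial result_df rather than a bare symbol.

```python
from statistics import mean


def binary_rank_trend(values_by_group):
    """Compare the statistic of the alphabetically first group to the second."""
    # Sorting the keys gives the alphabetical order; exactly two groups are expected.
    first, second = sorted(values_by_group)
    return ">" if mean(values_by_group[first]) > mean(values_by_group[second]) else "<"


def zero_one_loss(trend_a, trend_b):
    """0.0 if the two trends agree, 1.0 if they differ."""
    return 0.0 if trend_a == trend_b else 1.0


agg = binary_rank_trend({"F": [3.0, 4.0], "M": [5.0, 6.0]})
sub = binary_rank_trend({"F": [7.0, 8.0], "M": [5.0, 6.0]})
dist = zero_one_loss(sub, agg)
```

Here the subgroup reverses the aggregate ordering, so the distance is 1.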

is_computable(labeled_df=None)[source]

check if this trend can be computed based on data and metadata available

Parameters
self : Trend

a trend object with a set_vars Parameters

labeled_df : LabeledDataFrame {None} (optional)

data to use if trend is not already configured

Returns
computable : bool

True if requirements of get_trends are filled

See also:
get_trends() for description of how this trend computes and
class wiggum.trend_components.StatRankTrend[source]

Compute a trend that is the ascending ranking of categorical variables. Quality is based on the trend vs. actual Kendall tau distance, and the distance in subgroup vs. aggregate is 1-tau

the distances are a continuous value

Methods

get_distance(row[, col_a, col_b])

Kendall tau distance as a permutation distance

get_trends(data_df, trend_col_name)

Compute a trend that is the ascending ranking of categorical variables

is_computable([labeled_df])

check if this trend can be computed based on data and metadata available

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]

Kendall tau distance as a permutation distance

Parameters
row : pd.Series

row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists

Returns
tau_dist : float

permutation distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
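
A small pure-Python sketch of a Kendall tau based distance between two rankings follows. It assumes both lists contain the same items with no ties and at least two entries, and it returns 1 - tau as the class description states; whether Wiggum rescales the value further is not shown here. The names `kendall_tau` and `tau_distance` are hypothetical.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall tau between two orderings of the same items (no ties)."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # x precedes y in order_a; check whether order_b agrees.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def tau_distance(row, col_a="subgroup_trend", col_b="agg_trend"):
    """Distance between two ranking lists stored in a row, as 1 - tau."""
    return 1 - kendall_tau(row[col_a], row[col_b])


row = {"subgroup_trend": ["a", "b", "c"], "agg_trend": ["a", "c", "b"]}
dist = tau_distance(row)
identical = tau_distance({"subgroup_trend": ["a", "b"], "agg_trend": ["a", "b"]})
```

With this definition identical rankings give 0 and fully reversed rankings give 2.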

get_trends(data_df, trend_col_name)

Compute a trend that is the ascending ranking of categorical variables

Parameters
data_df : DataFrame or DataFrameGroupBy

data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions

trend_col_name : {'subgroup_trend', 'agg_trend'}

which type of trend is to be computed. TODO: could infer this by type of above?

Returns
reg_df : DataFrame

partial result_df; multiple can be merged together to form a complete result_df

is_computable(labeled_df=None)[source]

check if this trend can be computed based on data and metadata available

Parameters
self : Trend

a trend object with a set_vars Parameters

labeled_df : LabeledDataFrame {None} (optional)

data to use if trend is not already configured

Returns
computable : bool

True if requirements of get_trends are filled

See also:
get_trends() for description of how this trend computes and
class wiggum.trend_components.Trend(labeled_df=None)[source]

baseclass for abstraction and building trend objects. All trend objects must inherit this class in order to have a constructor (__init__). This may be overloaded to define a different constructor.

Parameters
self
labeled_df : LabeledDataFrame or None

if passed, get_trend_vars is called on initialization using labeled_df as the target dataset to compute trends on

Methods

get_trend_value_type()

return the type that the trend values for this trend type should be

is_SP(row, thresh)

default is whether the distance is above a threshold; operates row-wise and can be applied to a DataFrame with the apply method

load(content_dict)

load a trend from a dictionary of the content

get_trend_value_type()[source]

return the type that the trend values for this trend type should be

is_SP(row, thresh)[source]

default is whether the distance is above a threshold; operates row-wise and can be applied to a DataFrame with the apply method

Parameters
row : pd.Series

row of a result df to apply the threshold to

thresh : float scalar

threshold to compare the distance to

Returns
boolean value indicating whether the distance is over the threshold
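
The threshold check can be sketched as a plain row-wise function. The column name "distance" and the function name `is_sp` are assumptions made for illustration, and dicts stand in for pd.Series rows.

```python
def is_sp(row, thresh):
    """Flag a result row whose distance exceeds the threshold (hypothetical)."""
    # "distance" is an assumed column name for the value compared to thresh.
    return row["distance"] > thresh


rows = [{"distance": 0.1}, {"distance": 0.8}]
flags = [is_sp(r, 0.5) for r in rows]
```

With pandas, a function of this shape could be applied to every row via `result_df.apply(is_sp, axis=1, args=(0.5,))`.
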
load(content_dict)[source]

load a trend from a dictionary of the content

Parameters
content_dict : Dictionary

the dictionary that results from saving a trend object via the trend.__dict__ output

Returns
self : a trend object

with all of the parameters set according to the dictionary
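
The round trip through `__dict__` can be pictured with a stand-in class. `SketchTrend` and its attributes are hypothetical and not part of Wiggum; the sketch only assumes that loading means copying the saved attributes back onto the instance and returning it.

```python
class SketchTrend:
    """Hypothetical stand-in for a trend class; not part of Wiggum."""

    def __init__(self):
        self.name = None
        self.regression_vars = []

    def load(self, content_dict):
        # Copy every saved attribute back onto the instance.
        self.__dict__.update(content_dict)
        return self


saved = {"name": "lin_reg", "regression_vars": [("x", "y")]}
trend = SketchTrend().load(saved)
```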

class wiggum.trend_components.WeightedRank[source]

common parts for all continuous variable trends

Methods

trend_value_type

alias of str

get_trend_vars

get_trend_vars(labeled_df)[source]

trend_value_type

alias of str

wiggum.trend_components.w_avg(df, avcol, wcol)[source]
compute a weighted average and use the std to define a confidence interval

compatible with DataFrame.apply() and get_trends functions in wiggum.trend_components.categorical

Parameters
df : DataFrame or DataFrameGroupBy

passed as the source of apply, the data to extract columns from for computing a weighted average

avcol : string

name of column in df to take the average of

wcol : string

name of column in df to use for weighting

Returns
stat_data : pandas Series

with 'stat', 'max', 'min' values defining the statistic and a confidence interval and 'count' defining the power of the computation

stat : float

mean of df[avcol] weighted row wise by df[wcol]

max : float

mean + std to be used for upper limit of confidence interval

min : float

mean - std

count : int

sum of wcol
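
The returned fields can be illustrated with a small pure-Python sketch. Which standard deviation w_avg uses is not stated above, so the weighted population standard deviation here is an assumption, as are the function name `weighted_stats` and the use of plain lists in place of DataFrame columns.

```python
import math


def weighted_stats(values, weights):
    """Weighted mean with mean +/- std bounds and the total weight (illustrative)."""
    total = sum(weights)
    stat = sum(v * w for v, w in zip(values, weights)) / total
    # Assumption: weighted population variance around the weighted mean.
    var = sum(w * (v - stat) ** 2 for v, w in zip(values, weights)) / total
    std = math.sqrt(var)
    return {"stat": stat, "max": stat + std, "min": stat - std, "count": total}


result = weighted_stats([1.0, 3.0], [1, 3])
```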

Wiggum App: interactive visualization

The app is Flask-powered and includes both Python and JavaScript to power the computation and visualization.