API Reference¶

Wiggum

class wiggum.trend_components.BinClassStats[source]¶

Trend class for computing classification statistics from confusion matrix components, based on the comparison of values from two columns of the data

Methods

`get_distance`(row[, col_a, col_b])	distance for confusion matrix stats is
`get_trends`(data_df, trend_col_name)	Compute a trend between two variables that are prediction and ground truth, requires a precompute step to augment the data with row-wise labels for speed
`is_computable`([labeled_df])	check if this trend can be computed based on data and metadata available

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶: distance for confusion matrix stats is

get_trends(data_df, trend_col_name)[source]¶

Compute a trend between two variables that are prediction and ground truth, requires a precompute step to augment the data with row-wise labels for speed

Parameters

data_df (DataFrame or DataFrameGroupBy): data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions. For each groundtruth and prediction pair there must be an accuracy column named like groundtruthvar_predictionvar_acc.
trend_col_name ({'subgroup_trend', 'agg_trend'}): which type of trend is to be computed

Returns

reg_df (DataFrame): result df with rows for accuracy (acc), true positive rate (tpr), positive predictive value (ppr), and true negative rate (tnr)
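
The statistics listed above can be illustrated with plain pandas. This is a minimal sketch, not wiggum's implementation: the helper `confusion_stats` and the column names `truth`, `pred`, and `group` are hypothetical, and the `ppr` key simply follows the abbreviation used in this reference.

```python
import pandas as pd

def confusion_stats(df, truth_col, pred_col):
    """Hypothetical helper: classification stats from two binary columns."""
    truth = df[truth_col].astype(bool)
    pred = df[pred_col].astype(bool)
    # Confusion matrix components from the row-wise comparison
    tp = int((truth & pred).sum())
    tn = int((~truth & ~pred).sum())
    fp = int((~truth & pred).sum())
    fn = int((truth & ~pred).sum())
    return pd.Series({
        "acc": (tp + tn) / len(df),  # accuracy
        "tpr": tp / (tp + fn),       # true positive rate
        "ppr": tp / (tp + fp),       # positive predictive value
        "tnr": tn / (tn + fp),       # true negative rate
    })

data = pd.DataFrame({
    "truth": [1, 1, 1, 0, 0, 0, 0, 1],
    "pred":  [1, 1, 0, 0, 0, 1, 0, 1],
    "group": list("aaaabbbb"),
})

# Aggregate statistics on the whole DataFrame
agg_stats = confusion_stats(data, "truth", "pred")
# Subgroup statistics on a grouped DataFrame
sub_stats = data.groupby("group")[["truth", "pred"]].apply(
    confusion_stats, "truth", "pred"
)
```

The same function serves both the whole DataFrame and the grouped one, which mirrors the `data_df` parameter accepting either form. The sketch does not guard against empty confusion matrix cells.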

is_computable(labeled_df=None)[source]¶

check if this trend can be computed based on data and metadata available

Parameters

self (Trend): a trend object with a set_vars
labeled_df (LabeledDataFrame or None, optional): data to use if trend is not already configured

Returns

computable (bool): True if requirements of get_trends are filled
See also:
get_trends() for description of how this trend computes and

class wiggum.trend_components.BinaryWeightedRank[source]¶

statRank-compatible varTypeMixin for computing means of only binary-valued variables; sets stat to wg.trend_components.w_avg

Methods

`get_trend_vars`(labeled_df)	set target, trendgroup, and var_weight_list for computing rank trends
`trend_value_type`	alias of `str`

get_trend_vars(labeled_df)[source]¶

set target, trendgroup, and var_weight_list for computing rank trends

Parameters

labeled_df (LabeledDataFrame): object to parse for variable types

Returns

regression_vars (list of strings): list of all trend variables with type set to ordinal or continuous

trend_value_type¶: alias of str

class wiggum.trend_components.ContinuousOrdinalRegression[source]¶

regression-compatible varTypeMixin; sets list-formatted regression_vars and uses continuous dependent vars and ordinal independent vars

Methods

`get_trend_vars`(labeled_df)	set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property
`trend_value_type`	alias of `float`

set_weights_regression

get_trend_vars(labeled_df)[source]¶

set regression_vars for regression of pairs of ordinal and continuous trend variables, by assigning regression_vars as an instance property

Parameters

labeled_df (LabeledDataFrame): object to parse for variable types

Returns

regression_vars (list of strings): list of all trend variables with type set to ordinal or continuous
var_weight_list (list of strings): list of variables to be used as weights for each of the regression_vars

class wiggum.trend_components.ContinuousRegression[source]¶

regression-compatible varTypeMixin for working with continuous variables; sets list-formatted regression_vars and symmetric_vars = True

Methods

`get_trend_vars`(labeled_df)	set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property
`trend_value_type`	alias of `float`

set_weights_regression

get_trend_vars(labeled_df)[source]¶

set regression_vars for regression of pairs of continuous trend variables, by assigning regression_vars as an instance property

Parameters

labeled_df (LabeledDataFrame): object to parse for variable types

Returns

regression_vars (list of strings): list of all trend variables with type set to ordinal or continuous
var_weight_list (list of strings): list of variables to be used as weights for each of the regression_vars

class wiggum.trend_components.CorrelationSignTrend[source]¶

trends based on a correlation whose type is specified as a property; the distance is a binary comparison of the signs

Methods

`compute_correlation_table`(data_df, ...)	common code for computing correlations for any correlation based trend
`get_distance`(row[, col_a, col_b])	distance between the subgroup and aggregate trends for a row of a result_df: binary, 0 for same sign, 1 for opposite sign
`get_trends`(data_df, trend_col_name)	Compute a trend, its quality and return a partial result_df
`is_computable`([labeled_df])	check if this trend can be computed based on data and metadata available
`wrap_reg_df`(reg_df, groupby_name)	add the groupby variable or drop the subgroup column

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶

distance between the subgroup and aggregate trends for a row of a result_df: binary, 0 for same sign, 1 for opposite sign

Parameters

row (pd.Series): row of a result_df DataFrame

Returns

<>_dist (float): distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
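
As an illustration of the sign comparison described above, here is a minimal sketch using `DataFrame.apply`. The function name `sign_distance` and the output column `distance` are assumptions made for the example; the default column names follow the `get_distance` signature.

```python
import numpy as np
import pandas as pd

def sign_distance(row, col_a="subgroup_trend", col_b="agg_trend"):
    """Illustrative only: 0.0 if both correlations share a sign, else 1.0."""
    return float(np.sign(row[col_a]) != np.sign(row[col_b]))

# Toy result_df with one row per (subgroup, variable pair)
result_df = pd.DataFrame({
    "subgroup_trend": [0.4, -0.3, 0.1],
    "agg_trend": [0.6, 0.5, -0.2],
})

# Row-wise application yields a value that fits in a result_df cell
result_df["distance"] = result_df.apply(sign_distance, axis=1)
```

Returning a float rather than a bool keeps the value compatible with numeric distance columns produced by other trend types.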

get_trends(data_df, trend_col_name)[source]¶

Compute a trend, its quality and return a partial result_df

Parameters

data_df (DataFrame or DataFrameGroupBy): data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions
trend_col_name ({'subgroup_trend', 'agg_trend'}): which type of trend is to be computed

Returns

reg_df (DataFrame): partial result_df; multiple can be merged together to form a complete result_df

class wiggum.trend_components.LinearRegression[source]¶

Methods

`get_distance`(row[, col_a, col_b])	compute angle between the overall and subgroup slopes for a row of a dataframe.
`get_distance_unnormalized`(row[, col_a, col_b])	compute angle between the overall and subgroup slopes for a row of a dataframe.
`get_trends`(data_df, trend_col_name)	Compute linear regressions and return a partial result_df
`is_computable`([labeled_df])	check if this trend can be computed based on data and metadata available; this requires that the regression vars be a list of tuples or lists of length at least 2.

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶

compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.

Parameters

row (pd.Series): row of a result_df DataFrame

Returns

angle (float): angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df

get_distance_unnormalized(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶

compute angle between the overall and subgroup slopes for a row of a dataframe. This is the angle closest to the positive x axis and is always positive valued, to be used as a distance.

Parameters

row (pd.Series): row of a result_df DataFrame

Returns

angle (float): angle in degrees between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
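
To make the idea of an angle between two slopes concrete, here is a rough sketch. It assumes each trend value is a regression slope and takes the absolute difference of the two line angles; the function name `slope_angle_distance` is hypothetical, and whatever normalization separates `get_distance` from `get_distance_unnormalized` is not reproduced here.

```python
import math

def slope_angle_distance(slope_a, slope_b):
    """Illustrative only: positive angle in degrees between two lines."""
    # Angle of each line relative to the positive x axis, in (-90, 90)
    angle_a = math.degrees(math.atan(slope_a))
    angle_b = math.degrees(math.atan(slope_b))
    # Absolute difference is always non-negative, so it can act as a distance
    return abs(angle_a - angle_b)

same = slope_angle_distance(2.0, 2.0)       # identical slopes
opposite = slope_angle_distance(1.0, -1.0)  # perpendicular lines
```

Working in angles rather than raw slopes keeps the distance bounded, since a slope difference can grow without limit as a line approaches vertical.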

get_trends(data_df, trend_col_name)[source]¶

Compute linear regressions and return a partial result_df

Parameters

data_df (DataFrame or DataFrameGroupBy): data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions
trend_col_name ({'subgroup_trend', 'agg_trend'}): which type of trend is to be computed

Returns

reg_df (DataFrame): partial result_df; multiple can be merged together to form a complete result_df

is_computable(labeled_df=None)[source]¶

check if this trend can be computed based on data and metadata available; this requires that the regression vars be a list of tuples or lists of length at least 2.

Parameters

self (Trend): a trend object with a set_vars
labeled_df (LabeledDataFrame or None, optional): data to use if trend is not already configured

Returns

computable (bool): True if requirements of get_trends are filled
See also:
get_trends() for description of how this trend computes and

class wiggum.trend_components.OrdinalRegression[source]¶

regression-compatible varTypeMixin; sets list-formatted regression_vars and symmetric_vars = True

Methods

`get_trend_vars`(labeled_df)	set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property
`trend_value_type`	alias of `float`

set_weights_regression

get_trend_vars(labeled_df)[source]¶

set regression_vars for regression of pairs of ordinal variables, by assigning regression_vars as an instance property

Parameters

labeled_df (LabeledDataFrame): object to parse for variable types

Returns

regression_vars (list of strings): list of all ordinal trend variables
var_weight_list (list of strings): list of variables to be used as weights for each of the regression_vars

class wiggum.trend_components.PredictionClass[source]¶

for binary classification performance stats

Methods

`trend_value_type`	alias of `float`

get_trend_vars

trend_value_type¶: alias of float

class wiggum.trend_components.StatBinRankTrend[source]¶

Compute a trend that determines whether the alphabetically ordered values of a two-valued categorical variable are > or < when ordered by a statistic of another variable. Quality is based on the ratio and the distance is 0/1 loss.

Methods

`get_distance`(row[, col_a, col_b])	0/1 loss on ><
`get_trends`(data_df, trend_col_name)	Compute a trend between a binary ranking variable
`is_computable`([labeled_df])	check if this trend can be computed based on data and metadata available

is_SP

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶

0/1 loss on ><

Parameters

row (pd.Series): row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists

Returns

0_1_loss (float): 0/1 loss distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df

get_trends(data_df, trend_col_name)[source]¶

Compute a trend between a binary ranking variable

Parameters

data_df (DataFrame or DataFrameGroupBy): data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions
trend_col_name ({'subgroup_trend', 'agg_trend'}): which type of trend is to be computed. TODO: could infer this by type of above?

Returns

reg_df (DataFrame): partial result_df; multiple can be merged together to form a complete result_df

is_computable(labeled_df=None)[source]¶

check if this trend can be computed based on data and metadata available

Parameters

self (Trend): a trend object with a set_vars
labeled_df (LabeledDataFrame or None, optional): data to use if trend is not already configured

Returns

computable (bool): True if requirements of get_trends are filled
See also:
get_trends() for description of how this trend computes and

class wiggum.trend_components.StatRankTrend[source]¶

Compute a trend that is the ascending ranking of categorical variables. Quality is based on the trend vs actual Kendall tau distance, and the distance in subgroup vs aggregate is 1-tau

the distances are a continuous value

Methods

`get_distance`(row[, col_a, col_b])	Kendall tau distance as a permutation distance
`get_trends`(data_df, trend_col_name)	Compute a trend that is the ascending ranking of categorical variables
`is_computable`([labeled_df])	check if this trend can be computed based on data and metadata available

get_distance(row, col_a='subgroup_trend', col_b='agg_trend')[source]¶

Kendall tau distance as a permutation distance

Parameters

row (pd.Series): row of a result_df DataFrame. The agg_trend and subgroup_trend columns must contain lists

Returns

tau_dist (float): permutation distance between the subgroup_trend and agg_trend, compatible with assignment to a cell of a result_df
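
The 1-tau distance between two rankings can be sketched in pure Python. This is an assumption-laden illustration, not the library's code: `one_minus_tau` is a hypothetical name, both lists are assumed to be orderings of the same items with no ties, and each list needs at least two items.

```python
from itertools import combinations

def one_minus_tau(ranking_a, ranking_b):
    """Illustrative only: 1 - Kendall tau between two orderings (no ties)."""
    # Position of every item in the second ranking
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    # Walk the first ranking in order and record where each item sits in b
    seq = [pos_b[item] for item in ranking_a]
    pairs = list(combinations(seq, 2))
    # A pair is concordant when both rankings order it the same way
    concordant = sum(1 for x, y in pairs if x < y)
    discordant = len(pairs) - concordant
    tau = (concordant - discordant) / len(pairs)
    return 1 - tau

identical = one_minus_tau(["a", "b", "c"], ["a", "b", "c"])
one_swap = one_minus_tau(["a", "b", "c"], ["a", "c", "b"])
reversed_order = one_minus_tau(["a", "b", "c"], ["c", "b", "a"])
```

Under this definition the distance is continuous-valued, ranging from 0 for identical rankings to 2 for fully reversed ones.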

get_trends(data_df, trend_col_name)[source]¶

Compute a trend that is the ascending ranking of categorical variables

Parameters

data_df (DataFrame or DataFrameGroupBy): data to compute trends on; may be a whole, unmodified DataFrame or a grouped DataFrame as passed by LabeledDataFrame get trend functions
trend_col_name ({'subgroup_trend', 'agg_trend'}): which type of trend is to be computed. TODO: could infer this by type of above?

Returns

reg_df (DataFrame): partial result_df; multiple can be merged together to form a complete result_df

is_computable(labeled_df=None)[source]¶

check if this trend can be computed based on data and metadata available

Parameters

self (Trend): a trend object with a set_vars
labeled_df (LabeledDataFrame or None, optional): data to use if trend is not already configured

Returns

computable (bool): True if requirements of get_trends are filled
See also:
get_trends() for description of how this trend computes and

class wiggum.trend_components.Trend(labeled_df=None)[source]¶

Base class for abstraction and building trend objects. All trend objects must inherit from this class in order to have a constructor (__init__). This may be overloaded to define a different constructor.

Parameters

self
labeled_df (LabeledDataFrame or None): if passed, get_trend_vars is called on initialization using labeled_df as the target dataset to compute trends on

Methods

`get_trend_value_type`()	return the type that the trend values for this trend type should be
`is_SP`(row, thresh)	default is if it's above a threshold, operates rowwise and can be applied to a DataFrame with the apply method
`load`(content_dict)	load a trend from a dictionary of the content

get_trend_value_type()[source]¶: return the type that the trend values for this trend type should be

is_SP(row, thresh)[source]¶

default is if it’s above a threshold, operates rowwise and can be applied to a DataFrame with the apply method

Parameters

row (pd.Series): row of a result df to apply the threshold to
thresh (float scalar): threshold to compare the distance to

Returns

boolean value: True if the distance is over the threshold
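
A row-wise threshold of this kind is typically used through `DataFrame.apply`. The sketch below is a hedged illustration: the function `is_sp`, the `dist_col` parameter, and the column names `distance` and `SP` are all assumptions, since this reference does not state which column holds the distance.

```python
import pandas as pd

def is_sp(row, thresh, dist_col="distance"):
    """Illustrative only: flag a row whose distance exceeds the threshold."""
    return row[dist_col] > thresh

# Toy result df with a precomputed distance per row
result_df = pd.DataFrame({"distance": [0.1, 0.7, 0.5]})

# axis=1 passes each row as a pd.Series; extra keywords reach the function
result_df["SP"] = result_df.apply(is_sp, axis=1, thresh=0.4)
```

Passing `thresh` as a keyword through `apply` avoids wrapping the function in a lambda.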

load(content_dict)[source]¶

load a trend from a dictionary of the content

Parameters

content_dict (Dictionary): the dictionary that results from saving a trend object via the trend.__dict__ output

Returns

self (a trend object): with all of the parameters set according to the dictionary
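
The round trip through `__dict__` can be sketched with a stand-in class. `SketchTrend` and its attributes are hypothetical and only mimic the pattern described above; the real `Trend.load` may do more than update attributes.

```python
class SketchTrend:
    """Hypothetical stand-in for a trend object."""

    def __init__(self):
        self.name = None
        self.regression_vars = []

    def load(self, content_dict):
        # Restore every saved attribute, then return self for chaining
        self.__dict__.update(content_dict)
        return self

# Configure a trend and save its state as a plain dictionary
original = SketchTrend()
original.name = "lin_reg"
original.regression_vars = ["x", "y"]
saved = dict(original.__dict__)

# Rebuild an equivalent object from the saved dictionary
restored = SketchTrend().load(saved)
```

Returning `self` allows construction and loading in a single expression, matching the documented return value.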

class wiggum.trend_components.WeightedRank[source]¶

common parts for all continuous variable trends

Methods

`trend_value_type`	alias of `str`

get_trend_vars

get_trend_vars(labeled_df)[source]¶

trend_value_type¶: alias of str

wiggum.trend_components.w_avg(df, avcol, wcol)[source]¶

compute a weighted average and use the std to define a confidence interval; compatible with DataFrame.apply() and get_trends functions in wiggum.trend_components.categorical

Parameters

df (DataFrame or DataFrameGroupBy): passed as the source of apply, the data to extract columns from for computing a weighted average
avcol (string): name of column in df to take the average of
wcol (string): name of column in df to use for weighting

Returns

stat_data (pandas Series): with 'stat', 'max', 'min' values defining the statistic and a confidence interval, and 'count' defining the power of the computation
stat (float): mean of df[avcol] weighted row wise by df[wcol]
max (float): mean + std, to be used for upper limit of confidence interval
min (float): mean - std
count (int): sum of wcol
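
The return structure described above can be approximated as follows. This is a sketch under stated assumptions, not the body of `w_avg`: the name `weighted_avg_sketch` is hypothetical, the toy columns `rate`, `n`, and `group` are invented, and a weighted standard deviation is assumed because this reference does not say which std is used.

```python
import numpy as np
import pandas as pd

def weighted_avg_sketch(df, avcol, wcol):
    """Illustrative only: weighted mean with a mean +/- std interval."""
    values = df[avcol].to_numpy(dtype=float)
    weights = df[wcol].to_numpy(dtype=float)
    mean = np.average(values, weights=weights)
    # Assumption: weighted standard deviation around the weighted mean
    std = np.sqrt(np.average((values - mean) ** 2, weights=weights))
    return pd.Series({
        "stat": mean,
        "max": mean + std,
        "min": mean - std,
        "count": weights.sum(),
    })

data = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "rate": [0.2, 0.4, 0.5, 0.9],
    "n": [10, 30, 20, 20],
})

# Whole DataFrame, then one row of statistics per subgroup
overall = weighted_avg_sketch(data, "rate", "n")
per_group = data.groupby("group")[["rate", "n"]].apply(
    weighted_avg_sketch, "rate", "n"
)
```

Because the function returns a Series, applying it to a grouped DataFrame produces one row per group with `stat`, `max`, `min`, and `count` columns.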

Wiggum App: interactive visualization

The app is Flask-powered and includes both Python and JavaScript to power the computation and visualization.
