User Manual for Wiggum App¶
Once wiggum is installed via pip, the Flask app is a command line tool. To launch the Flask app from the command line, type :
This will start the server on http://127.0.0.1:5000/ from the current working directory. Currently, launching from any location can create issues with loading files, so the app should be launched from the code directory.
The data preparation page of Wiggum app is the place to load the data, set metadata, and to augment the data.
The first step to using Wiggum is to load a dataset. You may load data as a .csv file or load from a folder that was created by Wiggum that contains the data, metadata, and results from a prior analysis.
Data was saved before
Uploading a .csv¶
Click “Choose File” in the upper left corner of the page, then select a data file from your local disk.
Select “Upload”. The data will now appear in a metadata table below.
Load from a prior analysis¶
Click “Choose Files” in the upper left corner of page, then select a folder created by Wiggum from a prior analysis.
Select “Upload”. The data will appear in a metadata table below with your previous metadata selections.
Wiggum requires that you set labels for each column or variable in your data. This will be done once data has been loaded. The user must set the variable type, the variable role, and if applicable, whether to use the variable as a weighting variable.
Setting Variable Type¶
There are four different possible variable types. Wiggum will provide its initial suggestion for variable types based on certain characteristics, but the user is free to change them.
Binary variables for Wiggum are for values that are only true/false, so use them for boolean variables. For other variables that may only have two values, but are not true/false, label them as categorical.
Ordinal variables are those that require an order, like years.
Categorical variables are those that have qualitative values based on some characteristic, such as eye color, gender, or type.
Continuous variables are values that are doubles, floats, or integers. These are usually numeric values.
Setting Variable Roles¶
There are five different possible roles for a variable.
Splitby variables are generally categories that you want to split by and analyze, such as categorical or ordinal variables.
Common examples are gender, race, or age group.
Independent variables are those that do not depend on another variable, or one that you wish to visualize on the y-axis.
Dependent variables are the variables that you wish to measure based on the independent variables.
Common examples are time or other continuous variables.
Ignore variables are those that the user does not want to consider for calculations or visualization. These can be variables that are irrelevant in consideration of the mix effect, or variables that the user wants to use as a weighting variable.
Other Metadata Settings¶
isCount can be used for ignore variables. It is most useful whenever you have a population variable, or a ‘count’ variable, that you want
to use as a weight for certain categories or trend variables. In order to use a variable as isCount, mark the variable role as ‘ignore’ then select ‘Y’ for isCount.
In order to utilize an ignore, isCount variable properly, it must be set as the weighting_var, or weighting variable, for at least one other variable in the metadata table.
The weighting_var can be set to any variable in the table, but is best utilized if it is an ignore, isCount variable.
Data augmentation is useful for you if you want to combine certain categorical variables to analyze. In the metadata entry screen, there is a checkbox for quantiles and intersection for each variable. Intersection is used for categorical variables.
Checking the intersection checkbox for more than one categorical variable creates a new column in your data representing the intersection of the two variables. For example, if you wanted to discover trends related to a categorical combination like ‘White Female’, check the intersection box for the variables race and gender.
Check the quantile box for continuous variables that you want to discretize.
There are multiple trends that you can choose from to analyze your data.
Pearson Correlation is measure of linear correlation between two variables. It includes a numerical co-efficient ranging form -1.0 to +1.0. The co-efficient is used to determine how positively or negatively the association between the two variables are. Rank Trend ******* Rank trend is used to measure ordinal association. It has a coefficient that measures he degree of similarity between two rankings, and it can be used to assess the significance of the relationship between them. Linear Regression ************** Linear Regression trend is used to show or predict the relationship between two variables, by fitting a linear equation to the observed data. Each observation consists of two variables, a dependent variable and an independent variable.
Using heatmaps to explore details¶
You can click on specific squares in the heatmaps to visualize trends. A detail view will appear in the window that highlights the trend of the square you clicked. Use these detailed views to explore your data.
Press rank button