Detecting Simpson's Paradox
published in FLAIRS 2018
Toward scalable, interactive detection of Simpson’s Paradox.
Simpson’s paradox is the phenomenon in which an association observed in a whole population reverses within the subpopulations defined by a categorical variable. Detecting occurrences of the paradox surfaces surprising and interesting patterns in a data set for the user. We show that our approach detects such occurrences in both real and synthetic data sets, and that doing so uncovers hidden, surprising patterns in the data.
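To make the definition concrete, the following is a minimal sketch of a reversal check on toy data. It is not the method from the paper. It assumes the association is measured by the sign of the Pearson correlation, and that a reversal means every subgroup has the opposite sign to the whole population. The function names and the toy rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def shows_simpsons_paradox(rows):
    """rows: iterable of (x, y, group) tuples.

    Assumption (not from the paper): the paradox is flagged when the
    aggregate correlation is nonzero and every group has the opposite sign.
    """
    rows = list(rows)
    overall = sign(pearson([r[0] for r in rows], [r[1] for r in rows]))

    # Partition the (x, y) pairs by the categorical variable.
    groups = defaultdict(list)
    for x, y, group in rows:
        groups[group].append((x, y))

    group_signs = [
        sign(pearson([p[0] for p in pts], [p[1] for p in pts]))
        for pts in groups.values()
    ]
    return overall != 0 and all(s == -overall for s in group_signs)


# Hypothetical toy data: y falls with x inside each group,
# but rises with x once the two groups are pooled.
toy_rows = [
    (1, 5, "A"), (2, 4, "A"), (3, 3, "A"),
    (6, 10, "B"), (7, 9, "B"), (8, 8, "B"),
]

print(shows_simpsons_paradox(toy_rows))
```

A check like this covers a single pair of variables and a single grouping variable. Applying it to a real data set would mean repeating it over the candidate variable combinations, and a looser criterion (for example, only some subgroups reversing) may be more appropriate depending on how the reversal is defined.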