Last week Meta open-sourced balance, a new Python package that offers a simple workflow and a set of methods for dealing with biased data samples when using them to make inferences about a population of interest.
Bias in data is a common problem in classification or survey analysis. A similar issue arises in observational studies when comparing the treated vs untreated groups, and in any data that suffers from selection bias.
The main workflow involves three steps:
(1) understanding the initial bias in the data relative to a target we would like to infer,
(2) adjusting the data to correct for the bias by producing weights for each unit in the sample based on propensity scores, and
(3) evaluating the final bias and the variance inflation after applying the fitted weights.
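To make the three steps concrete, here is a minimal sketch in plain Python. It does not use balance's own API. The toy data, the single categorical covariate, and the helper names (`shares`, `max_share_gap`, `fit_weights`, `design_effect`) are all assumptions made for illustration. With a single categorical covariate, a saturated propensity model reduces to simple cell counts, which keeps the example short.

```python
from collections import Counter


def shares(values, weights=None):
    """Return the (weighted) share of each category in `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    total = sum(totals.values())
    return {key: totals[key] / total for key in totals}


def max_share_gap(sample_shares, target_shares):
    """Largest absolute difference in category shares (a crude bias measure)."""
    keys = set(sample_shares) | set(target_shares)
    return max(
        abs(sample_shares.get(k, 0.0) - target_shares.get(k, 0.0)) for k in keys
    )


def fit_weights(sample, target):
    """Inverse propensity weights for one categorical covariate.

    A saturated model estimates P(in sample | category) = n_s / (n_s + n_t),
    so the weight (1 - p) / p reduces to n_t / n_s for the unit's category.
    """
    n_s, n_t = Counter(sample), Counter(target)
    return [n_t[value] / n_s[value] for value in sample]


def design_effect(weights):
    """Kish's approximate design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical toy data: the sample over-represents the "young" group.
sample = ["young"] * 6 + ["old"] * 2
target = ["young"] * 10 + ["old"] * 10

# Step 1: measure the initial bias relative to the target.
bias_before = max_share_gap(shares(sample), shares(target))

# Step 2: fit one weight per unit in the sample.
weights = fit_weights(sample, target)

# Step 3: re-measure the bias and check the variance inflation.
bias_after = max_share_gap(shares(sample, weights), shares(target))
deff = design_effect(weights)

print(f"bias before: {bias_before:.3f}")
print(f"bias after:  {bias_after:.3f}")
print(f"design effect: {deff:.3f}")
```

A design effect above 1 is the price of the correction: the weighted estimate is less biased, but its variance is roughly that of a smaller unweighted sample.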
The package provides a set of tools for adjusting and visualizing biased datasets, including:
✅ Resample and change the data based on the feedback
✅ Data visualization tools
✅ Inverse propensity weighting in the form of a logistic regression model
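As a rough illustration of inverse propensity weighting with a logistic regression model, the sketch below pools a sample and a target, fits P(unit is in the sample | age) by gradient descent, and weights each sample unit by (1 - p) / p. This is not balance's implementation, and the ages, learning rate, step count, and function names are assumptions chosen to keep the example self-contained.

```python
import math


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(a + b * x) by plain gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (p - y) / n
            grad_b += (p - y) * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical ages: the sample skews younger than the target.
sample_age = [20, 22, 25, 25, 30, 35, 40, 50]
target_age = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]

# Standardize the covariate on the pooled data for stable gradient steps.
pooled = sample_age + target_age
pooled_mean = sum(pooled) / len(pooled)
pooled_std = (sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)) ** 0.5


def scale(age):
    return (age - pooled_mean) / pooled_std


# Label 1 = unit comes from the sample, 0 = unit comes from the target.
xs = [scale(x) for x in pooled]
ys = [1] * len(sample_age) + [0] * len(target_age)
a, b = fit_logistic(xs, ys)


def ipw_weight(age):
    """Weight (1 - p) / p, where p is the fitted propensity to be sampled."""
    p = 1.0 / (1.0 + math.exp(-(a + b * scale(age))))
    return (1.0 - p) / p


weights = [ipw_weight(x) for x in sample_age]

unweighted_mean = sum(sample_age) / len(sample_age)
weighted_mean = sum(w * x for w, x in zip(weights, sample_age)) / sum(weights)
target_mean = sum(target_age) / len(target_age)

print(f"unweighted sample mean age: {unweighted_mean:.2f}")
print(f"weighted sample mean age:   {weighted_mean:.2f}")
print(f"target mean age:            {target_mean:.2f}")
```

Older respondents are rare in the sample relative to the target, so they receive a low propensity and a high weight, which pulls the weighted mean age toward the target's.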
The core workflow in balance deals with fitting weights to a sample and evaluating them. For each unit in the sample (such as a respondent to a survey), balance fits a weight that can be (loosely) interpreted as the number of people from the target population that this respondent represents. The weighting aims to mitigate coverage and non-response biases.
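The "number of people represented" reading is easiest to see when the weights are rescaled to sum to the population size. The snippet below is a small sketch of that rescaling. The raw weights, the population size of 1000, and the function name `scale_to_population` are hypothetical, and the rescaling is shown only as an illustration, not as a description of what balance does internally.

```python
def scale_to_population(weights, population_size):
    """Rescale weights so that they sum to `population_size`."""
    total = sum(weights)
    return [w * population_size / total for w in weights]


# Hypothetical fitted weights for four respondents.
raw_weights = [1.0, 1.0, 2.0, 4.0]
population_size = 1000

scaled = scale_to_population(raw_weights, population_size)
print(scaled)  # [125.0, 125.0, 250.0, 500.0]
```

Under this scaling, the last respondent stands in for 500 of the 1000 people in the target population, four times as many as each of the first two.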