Bias scan tool – What is it?
This bias scan tool identifies groups of similar users that are potentially treated unfairly by a binary algorithmic classifier. The bias scan finds clusters of users that face a higher misclassification rate than the rest of the data set. Clustering is an unsupervised ML method, so no data on protected attributes of users is required. The metric by which bias is defined can be chosen manually in advance: False Negative Rate (FNR), False Positive Rate (FPR), or Accuracy (Acc).
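The three bias metrics above can be computed directly from the predicted and ground-truth labels. The sketch below shows one way to do this; the function name `bias_metric` is illustrative, not part of the tool's API.

```python
import numpy as np

def bias_metric(truth, pred, metric="FNR"):
    """Compute a bias metric for binary labels.

    FNR: share of true positives misclassified as negative.
    FPR: share of true negatives misclassified as positive.
    Acc: share of correct predictions.
    """
    truth = np.asarray(truth)
    pred = np.asarray(pred)
    if metric == "FNR":
        positives = truth == 1
        return float(np.mean(pred[positives] == 0)) if positives.any() else 0.0
    if metric == "FPR":
        negatives = truth == 0
        return float(np.mean(pred[negatives] == 1)) if negatives.any() else 0.0
    if metric == "Acc":
        return float(np.mean(pred == truth))
    raise ValueError(f"unknown metric: {metric}")
```

A cluster's bias can then be expressed as the difference between its metric value and that of the remaining data.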
The tool returns a report that presents the cluster with the highest bias and describes this cluster by the features that characterize it. This is quantitatively expressed by the (statistically significant) differences in feature means between the identified cluster and the rest of the data. The report also visualizes the outcomes.
The implemented bias scan tool is based on k-means Hierarchical Bias-Aware Clustering (HBAC), as described by Misztal-Radecka and Indurkhya in Information Processing and Management (2021) [link]. An implementation of the HBAC algorithm can be found on GitHub.
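The core HBAC idea can be illustrated with a short, simplified sketch: repeatedly split the currently most-biased cluster with 2-means, and keep a split only if it surfaces a cluster with higher bias. This is a greedy simplification for illustration, not the authors' exact algorithm; `metric_fn`, `min_size`, and the stopping rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def hbac_scan(X, truth, pred, metric_fn, max_iter=5, min_size=10, seed=0):
    """Simplified HBAC-style scan.

    Repeatedly splits the most-biased cluster with 2-means and accepts a
    split only if it raises the maximum cluster bias.  metric_fn(truth, pred)
    returns the error rate used as the bias measure (e.g. FNR).
    Returns the row indices of the most-biased cluster found.
    """
    clusters = [np.arange(len(X))]  # start with a single cluster: all rows
    for _ in range(max_iter):
        biases = [metric_fn(truth[idx], pred[idx]) for idx in clusters]
        worst = int(np.argmax(biases))  # split the most-biased cluster next
        idx = clusters[worst]
        if len(idx) < 2 * min_size:
            break
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
        parts = [idx[labels == k] for k in (0, 1)]
        if min(len(p) for p in parts) < min_size:
            break
        # accept the split only if it increases the maximum bias found so far
        new_max = max(metric_fn(truth[p], pred[p]) for p in parts)
        if new_max <= biases[worst]:
            break
        clusters = clusters[:worst] + clusters[worst + 1:] + parts
    biases = [metric_fn(truth[idx], pred[idx]) for idx in clusters]
    return clusters[int(np.argmax(biases))]
```

In the paper's full method the splitting and acceptance criteria are more elaborate; this sketch only conveys the hierarchical, bias-driven nature of the clustering.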
Download an example data set to use the bias scan tool.
Under the name Joint Fairness Assessment Method (JFAM), our bias scan tool has been selected as a finalist in Stanford's AI Audit Competition 2023.
What input does the bias scan tool need?
A .csv file of max. 1GB with feature columns, labels predicted by the classifier, and ground-truth labels. Only the column names 'pred_label' and 'truth_label' matter; the naming and order of the feature columns do not. All column values should be numeric and unscaled.
- Features: unscaled numeric values, e.g., feat_1, feat_2, ..., feat_n;
- Predicted label: 0 or 1;
- Truth label: 0 or 1;
- Bias metric: False Positive Rate (FPR), False Negative Rate (FNR) or Accuracy.
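A valid input file matching the format above can be put together in a few lines. The feature names and the 20% label-flip rate below are illustrative assumptions; only the 'pred_label' and 'truth_label' column names are required.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# hypothetical features; any names are fine as long as values are numeric and unscaled
df = pd.DataFrame({
    "feat_1": rng.normal(size=n),
    "feat_2": rng.integers(18, 70, size=n),  # e.g. an unscaled age-like feature
    "truth_label": rng.integers(0, 2, size=n),
})
# a deliberately imperfect "classifier": flips roughly 20% of the true labels
flip = rng.random(n) < 0.2
df["pred_label"] = np.where(flip, 1 - df["truth_label"], df["truth_label"])

df.to_csv("bias_scan_input.csv", index=False)
```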
An example report for the BERT-based disinformation detection (FPR) case study
An example report for the BERT-based disinformation detection (FNR) case study
Why this bias scan?
– No data needed on protected attributes of users (unsupervised bias detection);
– Model-agnostic (works for any binary classifier, regardless of the underlying model);
– Connecting quantitative tools with qualitative methods to assess fair AI;
– Developed open-source and not-for-profit.