Bias scan tool – What is it?
This bias scan tool identifies groups of similar users that are potentially treated unfairly by a binary algorithmic classifier. It finds clusters of users that face a higher misclassification rate than the rest of the data set. Because clustering is an unsupervised ML method, no data on protected attributes of users is required. The metric by which bias is defined is chosen manually in advance: False Negative Rate (FNR), False Positive Rate (FPR), or Accuracy (Acc).
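To make the three metrics concrete, the following minimal Python sketch shows how the misclassification rate of one cluster could be compared with the rest of the data. The function names `bias_metric` and `cluster_bias`, the sign convention (cluster minus rest), and the toy data are illustrative assumptions, not the tool's actual code.

```python
def bias_metric(y_true, y_pred, metric):
    """Return FNR, FPR or accuracy for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if metric == "FNR":
        denom = fn + tp  # all truly positive cases
        return fn / denom if denom else float("nan")
    if metric == "FPR":
        denom = fp + tn  # all truly negative cases
        return fp / denom if denom else float("nan")
    if metric == "Acc":
        total = tp + tn + fp + fn
        return (tp + tn) / total if total else float("nan")
    raise ValueError(f"unknown metric: {metric}")


def cluster_bias(y_true, y_pred, in_cluster, metric):
    """Metric inside the cluster minus the metric on the rest of the data.

    Assumed convention: a positive value for FNR or FPR (or a negative
    value for Acc) means the cluster is misclassified more often.
    """
    inside = [(t, p) for t, p, c in zip(y_true, y_pred, in_cluster) if c]
    outside = [(t, p) for t, p, c in zip(y_true, y_pred, in_cluster) if not c]
    m_in = bias_metric([t for t, _ in inside], [p for _, p in inside], metric)
    m_out = bias_metric([t for t, _ in outside], [p for _, p in outside], metric)
    return m_in - m_out


# Toy data: the first four users form the (hypothetical) cluster.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]
in_cluster = [True, True, True, True, False, False, False, False]

fnr_gap = cluster_bias(y_true, y_pred, in_cluster, "FNR")
fpr_gap = cluster_bias(y_true, y_pred, in_cluster, "FPR")
acc_gap = cluster_bias(y_true, y_pred, in_cluster, "Acc")
```

In this toy data the cluster has a higher FNR and a lower accuracy than the rest, so it would look biased under those two metrics but not under FPR.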
The tool returns a report that presents the cluster with the highest bias and describes this cluster by the features that characterize it. This description is expressed quantitatively as the (statistically significant) differences in feature means between the identified cluster and the rest of the data. The report also visualizes the outcomes.
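As an illustration of how such feature-mean differences could be computed, here is a small Python sketch. The page does not say which statistical test the report uses; Welch's t-statistic is an assumption made for this example, as are the function name `feature_mean_differences` and the toy data.

```python
from math import sqrt
from statistics import mean, variance


def feature_mean_differences(rows, in_cluster, feature_names):
    """For each feature, return (difference in means, Welch t-statistic).

    The difference is the cluster mean minus the mean of the rest.
    Each group needs at least two rows so that a sample variance exists.
    """
    result = {}
    for j, name in enumerate(feature_names):
        a = [row[j] for row, c in zip(rows, in_cluster) if c]
        b = [row[j] for row, c in zip(rows, in_cluster) if not c]
        diff = mean(a) - mean(b)
        # Standard error of the difference without assuming equal variances.
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        t_stat = diff / se if se > 0 else float("nan")
        result[name] = (diff, t_stat)
    return result


# Toy data: two features, the first three rows form the (hypothetical) cluster.
rows = [
    [1.0, 10.0],
    [2.0, 10.0],
    [3.0, 12.0],
    [7.0, 11.0],
    [8.0, 10.0],
    [9.0, 12.0],
]
in_cluster = [True, True, True, False, False, False]

differences = feature_mean_differences(rows, in_cluster, ["feat_1", "feat_2"])
```

Here feat_1 differs strongly between the cluster and the rest while feat_2 barely differs. Turning a t-statistic into a p-value needs the t-distribution; a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both.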
Try the tool below ⬇️
Generate a report
Select a bias metric
Upload data
The report will be downloaded when ready (in 5-15 seconds).
If your report is not downloaded, please check the names of your 'pred_label' and 'truth_label' columns and whether all column values are numeric, or contact us.
Please read the FAQ section below to learn more about the inner workings of this bias scan tool.
The bias scan tool is based on k-means Hierarchical Bias-Aware Clustering (HBAC), as described by Misztal-Radecka and Indurkhya in Information Processing and Management (2021) [link]. An implementation of the HBAC algorithm can be found on GitHub.
Download an example data set to use the bias scan tool.
Under the name Joint Fairness Assessment Method (JFAM), our bias scan tool was selected as a finalist in Stanford's AI Audit Competition 2023.
Input data
What input does the bias scan tool need?
A .csv file of at most 1 GB with feature columns, the labels predicted by the classifier, and ground truth labels. Only the names of the 'pred_label' and 'truth_label' columns matter; the naming and order of the feature columns do not. All column values should be numeric and unscaled.
- Features: unscaled numeric values, e.g., feat_1, feat_2, ..., feat_n;
- Predicted label: 0 or 1;
- Truth label: 0 or 1;
- Bias metric: False Positive Rate (FPR), False Negative Rate (FNR) or Accuracy.
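The requirements above can be checked before uploading. The following Python sketch uses only the standard library; the function name `validate_input`, the wording of the messages, and the example data are assumptions for illustration and are not part of the tool. The 1 GB size limit is not checked here.

```python
import csv
import io

REQUIRED_COLUMNS = ("pred_label", "truth_label")


def validate_input(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    for col in REQUIRED_COLUMNS:
        if col not in header:
            problems.append(f"missing column '{col}'")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: '{col}' is not numeric")
                continue
            # Label columns must hold binary values only.
            if col in REQUIRED_COLUMNS and number not in (0.0, 1.0):
                problems.append(f"line {line_no}: '{col}' must be 0 or 1")
    return problems


# A well-formed file and a file with three problems: a misnamed label
# column, a non-numeric feature value and a non-binary truth label.
good = "feat_1,feat_2,pred_label,truth_label\n3.5,10,1,0\n2.1,7,0,0\n"
bad = "feat_1,pred,truth_label\nabc,1,2\n"

good_problems = validate_input(good)
bad_problems = validate_input(bad)
```

To check a file on disk, its contents can be read with `open(path, newline="").read()` and passed to `validate_input`.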
Example reports
FAQ
Why this bias scan?
– No data needed on protected attributes of users (unsupervised bias detection);
– Model-agnostic (AI binary classifiers only);
– Connecting quantitative tools with qualitative methods to assess fair AI;
– Developed open-source and not-for-profit.