Introduction – Unsupervised bias detection tool
What is the tool about?
The tool identifies groups where an algorithm or AI system shows variations in performance. This type of monitoring is referred to as anomaly detection. To identify anomalous patterns, the tool uses clustering, a form of unsupervised learning. This means detecting disparate treatment (bias) does not require any data on protected attributes of users, such as gender, nationality, or ethnicity. The metric used to measure bias can be selected manually and is referred to as the bias metric.
What data can be processed?
The tool processes any data in tabular format. The type of data (numerical, categorical, time, etc.) is detected automatically. One column must be selected as the bias metric, which must contain numerical values. The user must specify whether a high or low value of the bias metric is considered better. For example: for an error rate, a low value is better, while for accuracy, a high value is better.
The tool includes a demo dataset for which output is generated. Hit the ‘Try it out’ button.
Example of numerical data set:
| Age | Income | ... | Number of cars | Selected for control |
|---|---|---|---|---|
| 35 | 55,000 | ... | 1 | 1 |
| 40 | 45,000 | ... | 0 | 0 |
| ... | ... | ... | ... | ... |
| 20 | 30,000 | ... | 0 | 0 |
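Since the bias metric must be a numerical column and the user indicates whether a high or low value is better, a common setup is to attach a per-row error indicator from an existing classifier to the feature table. The sketch below is purely illustrative: the synthetic data, classifier choice, column names and file name are hypothetical stand-ins, not part of the tool.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for your own features and binary classifier.
X, y = make_classification(n_samples=1_000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Per-row error indicator: 1 = misclassified, 0 = correct (a low value is better).
df = pd.DataFrame(X_test, columns=[f"feature_{i}" for i in range(X_test.shape[1])])
df["error"] = (clf.predict(X_test) != y_test).astype(int)

# Save as a table that can be uploaded to the tool; select "error" as the
# bias metric and indicate that a low value is considered better.
df.to_csv("bias_detection_input.csv", index=False)
```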
What does the tool return?
The tool identifies deviating clusters. A summary of the results is made available in a bias analysis report that can be downloaded as a PDF. All identified clusters can be downloaded as a .json file. The tool specifically focuses on the most negatively deviating cluster and provides a description of this cluster. These results serve as a starting point for further investigation by domain experts, who can assess whether the observed disparities are indeed undesirable. The tool also visualizes the outcomes.
Overview of process

How is my data processed?
The tool is privacy-friendly because the data is processed entirely within the browser. The data does not leave your computer or the environment of your organization. The tool utilizes the computing power of your own computer to analyze the data. This type of browser-based software is referred to as local-first. The tool does not upload data to third parties, such as cloud providers. Instructions on how to host the tool and its local-first architecture locally within your own organization can be found on Github.
Try the tool below ⬇️
Web app – Unsupervised bias detection tool
Source code
The source code of the anomaly detection algorithm is available on Github and as a pip package:
pip install unsupervised-bias-detection
The architecture to run web apps local-first is also available on Github.
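As a rough indication of how the pip package can be used, the snippet below assumes a scikit-learn-style interface with a BiasAwareHierarchicalKMeans estimator whose fit method takes the feature matrix and a per-row bias metric. The exact class, parameter and attribute names may differ between versions, so treat this as a sketch and consult the package documentation on Github.

```python
import numpy as np
from unsupervised_bias_detection.clustering import BiasAwareHierarchicalKMeans  # class name assumed; check the package docs

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))               # numerical feature matrix
y = rng.integers(0, 2, 500).astype(float)   # per-row bias metric, e.g. an error indicator (lower is better)

hbac = BiasAwareHierarchicalKMeans(n_iter=10, min_cluster_size=20)  # parameter names assumed
hbac.fit(X, y)

print(hbac.n_clusters_)   # number of clusters found
print(hbac.scores_)       # per-cluster bias scores
print(hbac.labels_[:10])  # cluster assignment per row
```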
Anomaly detection algorithm – Hierarchical Bias-Aware Clustering (HBAC)
The tool uses the Hierarchical Bias-Aware Clustering (HBAC) algorithm. HBAC processes input data with the k-means (for numerical data) or k-modes (for categorical data) clustering algorithm. The HBAC algorithm was introduced by Misztal-Radecka and Indurkhya in a scientific article published in Information Processing and Management (2021). Our implementation of the HBAC algorithm, including additional methodological checks to distinguish real bias from noise, such as sample splitting, statistical hypothesis testing and measuring cluster stability, can be found in the unsupervised-bias-detection pip package.
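To make the splitting idea concrete, here is a deliberately simplified sketch of HBAC-style clustering for numerical data. It is not the package's implementation and omits the sample splitting, hypothesis testing and stability checks mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans


def hbac_sketch(X, metric, max_splits=5, min_cluster_size=50, random_state=0):
    """Simplified HBAC-style clustering: start with one cluster and repeatedly
    split the most biased cluster in two with k-means, where a cluster's bias is
    the mean bias metric inside the cluster minus the mean outside it."""
    labels = np.zeros(len(X), dtype=int)

    def bias(c):
        inside = labels == c
        return metric[inside].mean() - metric[~inside].mean() if (~inside).any() else 0.0

    for _ in range(max_splits):
        target = max(np.unique(labels), key=bias)  # most biased cluster so far
        idx = np.where(labels == target)[0]
        if len(idx) < 2 * min_cluster_size:
            break  # too small to split into two sufficiently large clusters
        split = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit_predict(X[idx])
        labels[idx[split == 1]] = labels.max() + 1
    return labels
```

In practice, as noted above, candidate splits are additionally validated with statistical checks before a cluster is reported as deviating.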
Scientific paper and audit report
The unsupervised bias detection tool has been applied in practice to audit a Dutch public sector risk profiling algorithm. Our team documented this use case in a scientific paper. The tool identified proxies for students with a non-European migration background in the risk profiling algorithm, specifically education level and the distance between the student’s address and their parent(s)’ address. The results are also described in Appendix A of the report below. This report was sent to the Dutch parliament on 22-05-2024.
Local-first architecture
What is local-first computing?
Local-first computing is the opposite of cloud computing: the data is not uploaded to third parties, such as cloud providers, but is processed by your own computer. The data attached to the tool therefore doesn’t leave your computer or the environment of your organization. The tool is privacy-friendly because the data can be processed within the mandate of your organisation and doesn’t need to be shared with new parties. The unsupervised bias detection tool can also be hosted locally within your organization. Instructions, including the source code of the web app, can be found on Github.
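As a purely hypothetical illustration of local hosting: if the web app is built into a directory of static files (the directory name "dist" below is an assumption; follow the instructions on Github for the actual build and deployment steps), it could be served on an internal machine with nothing more than Python's standard library. All bias analysis still happens in the visitor's browser, so no attached data reaches this server.

```python
# Hypothetical local hosting of the pre-built web app.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

# "dist" is an assumed build-output folder, not a documented path.
handler = functools.partial(SimpleHTTPRequestHandler, directory="dist")
HTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```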
Overview of local-first architecture

Supported by
This tool is developed with the support of public and philanthropic organisations.

Innovation grant Dutch Ministry of the Interior
Description
In partnership with the Dutch Executive Agency for Education and the Dutch Ministry of the Interior, Algorithm Audit has been developing and testing this tool from July 2024 to July 2025, supported by an innovation grant from the Ministry’s annual competition. Project progress was shared at a community gathering on 13-02-2025.

SIDN Fund
Description
In 2024, the SIDN Fund supported Algorithm Audit to develop a first demo of the unsupervised bias detection tool.
Awards and acknowledgements
This tool has received awards and is acknowledged by various stakeholders, including civil society organisations, industry representatives and academics.

Finalist Stanford’s AI Audit Challenge 2023
Description
Under the name Joint Fairness Assessment Method (JFAM), the unsupervised bias detection tool was selected as a finalist in Stanford’s AI Audit Challenge 2023.
OECD Catalogue of Tools & Metrics for Trustworthy AI
Description
The unsupervised bias detection tool is part of the OECD’s Catalogue of Tools & Metrics for Trustworthy AI.
Summary
Key take-aways about the unsupervised bias detection tool:
- Quantitative-qualitative research method: Data-driven bias testing combined with the balanced and context-sensitive judgment of human experts;
- Unsupervised bias detection: No user data needed on protected attributes (unsupervised learning);
- Anomaly detection: Scalable method based on statistical analysis;
- Detects complex bias: Identifies unfairly treated groups characterized by a mixture of features, detects intersectional bias;
- Model-agnostic: Works for all binary classification algorithms and AI systems;
- Open-source and not-for-profit: User friendly and free to use for the entire AI auditing community.
Team

Floris Holstege
PhD-candidate Machine Learning, University of Amsterdam

Joel Persson PhD
Research Scientist, Spotify

Kirtan Padh
PhD-candidate Causal Inference and Machine Learning, TU München

Krsto Proroković
PhD-candidate, Swiss AI Lab IDSIA

Mackenzie Jorgensen PhD
Researcher Alan Turing Institute, London