Oops? Interdisciplinary Stories of Sociotechnical Error | Fake It Till You Make It: Synthetic Data and Algorithmic Bias
Abstract
This article interrogates the claim that synthetic data is a risk-free and ethical solution to algorithmic bias. Synthetic data refers to artificial intelligence (AI)-generated datasets that substitute for real-life data in training machine learning (ML) models. We examine how bias is supposedly corrected by synthetic data through three key methods: (1) rebalancing the dataset, (2) deconstructing and reconstructing data, and (3) classification. We argue that proponents of synthetic data presume that algorithmic bias resides in the lack of diversity in training data—rather than in sociopolitical inequalities—and mobilize prescriptions for fairness, bias correction, and equitable representations of gender and race. Synthetic data practices assume that an unbiased, neutral state exists and can be achieved artificially. Analyzing this technical “solution” shows how mainstream ML discourse understands bias as a sociotechnical error. Building on existing literature on algorithmic bias, this article shows how synthetic data, despite promises to the contrary, actually exacerbates existing forms of inequality and creates new ones.
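To make the first of the three methods concrete, the sketch below illustrates what "rebalancing the dataset" with synthetic samples typically looks like in ML practice. It is a minimal, hypothetical example, not the pipeline analyzed in the article: the toy data, the group sizes, and the `synthesize` helper (a naive Gaussian jitter standing in for a generative model) are all assumptions introduced here for illustration.

```python
# Illustrative sketch of method (1), "rebalancing the dataset": synthetic
# examples of an under-represented group are generated until both groups
# appear in equal numbers. All names and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 90 rows from a majority group, 10 from a minority group.
majority = rng.normal(loc=0.0, scale=1.0, size=(90, 4))
minority = rng.normal(loc=1.0, scale=1.0, size=(10, 4))

def synthesize(rows: np.ndarray, n_new: int, noise: float = 0.1) -> np.ndarray:
    """Create n_new synthetic rows by jittering randomly chosen real rows.

    A stand-in for a generative model; real pipelines would use GANs,
    diffusion models, or similar.
    """
    picks = rng.integers(0, len(rows), size=n_new)
    return rows[picks] + rng.normal(scale=noise, size=(n_new, rows.shape[1]))

# "Rebalance" by padding the minority group with synthetic rows.
synthetic_minority = synthesize(minority, n_new=len(majority) - len(minority))
balanced_X = np.vstack([majority, minority, synthetic_minority])
balanced_y = np.array(
    [0] * len(majority) + [1] * (len(minority) + len(synthetic_minority))
)

# Equal group counts after rebalancing: (180, 4) and [90 90].
print(balanced_X.shape, np.bincount(balanced_y))
```

The design choice the article questions is visible even in this toy version: the synthetic minority rows are derived from the few real rows already present, so "diversity" is manufactured from whatever the original, unequal data happened to contain.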