Big Data, Big Questions | A Dozen Ways to Get Lost in Translation: Inherent Challenges in Large Scale Data Sets
As noted by the late Susan Leigh Star, technoscientific research always involves simplification and standardization. In recent years, the collection and analysis of large-scale data sets (LSDS) have become the norm. These are often convenience samples analyzed with data mining techniques. Moreover, these data are often used as the basis for public and private policy and action. At the same time, the term “large-scale” suggests completeness, while the ease of collection and analysis suggests that little else need be done. Both impressions tend to crowd out other interpretations; hence, understanding the limits of LSDS should be of the utmost concern. This article discusses a number of the issues that arise from the necessary but potentially problematic simplifications and standardizations found in LSDS.