Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

Alexandra Olteanu; Carlos Castillo; Fernando Diaz; Emre Kıcıman

doi:10.3389/fdata.2019.00013

Front Big Data. 2019 Jul 11;2:13. doi: 10.3389/fdata.2019.00013. eCollection 2019.

Authors

Alexandra Olteanu¹ ², Carlos Castillo³, Fernando Diaz², Emre Kıcıman⁴

Affiliations

¹ Microsoft Research, New York, NY, United States.
² Microsoft Research, Montreal, QC, Canada.
³ Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain.
⁴ Microsoft Research, Redmond, WA, United States.

Abstract

Social data in digital form (including user-generated content, expressed or implicit relations between people, and behavioral traces) are at the core of popular applications and platforms and drive the research agenda of many researchers. The promises of social data are many. They include understanding "what the world thinks" about a social issue, brand, celebrity, or other entity, and enabling better decision-making in a variety of fields such as public policy, healthcare, and economics. Many academics and practitioners have warned against the naïve usage of social data. Biases and inaccuracies occur at the source of the data, and others are introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. This paper recognizes that the rigor with which different researchers address these issues varies widely. We identify a variety of menaces in the practices around social data use and organize them into a framework that helps to identify them.

"For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated." -Ursula Franklin.

Keywords: biases; ethics; evaluation; social media; user data.

Publication types

Review