Seek and you may (not) find: A multi-institutional analysis of where research data are shared

PLoS One. 2024 Apr 25;19(4):e0302426. doi: 10.1371/journal.pone.0302426. eCollection 2024.

Abstract

Research data sharing has become an expected component of scientific research and scholarly publishing practice over the last few decades, due in part to requirements for federally funded research. As part of a larger effort to better understand the workflows and costs of public access to research data, this project conducted a high-level analysis of where academic research data are most frequently shared. To do this, we used the DataCite and Crossref application programming interfaces (APIs) to retrieve Publisher field values indicating which data repositories were used by researchers from six academic research institutions between 2012 and 2022. We also ran a preliminary analysis of the quality of the metadata associated with these published datasets, comparing the extent to which information was missing from metadata fields deemed important for public access to research data. Results show that the top 10 publishers accounted for 89.0% to 99.8% of the datasets connected with the institutions in our study. Known data repositories, including the institutional data repositories hosted by those institutions, were initially missing from our sample because of varying metadata standards and practices. We conclude that the metadata quality landscape for published research datasets is uneven: key information, such as author affiliation, is often incomplete or missing from source data repositories and aggregators. To enhance the findability, accessibility, interoperability, and reusability (FAIRness) of research data, we provide a set of concrete recommendations that repositories and data authors can follow to improve the scholarly metadata associated with shared datasets.
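The two measurements described above, publisher concentration and the share of records missing a given metadata field, can be illustrated with a small offline sketch. It assumes dataset records have already been retrieved and look loosely like DataCite REST API items (an `attributes` object with `publisher`, `publicationYear`, and `creators[].affiliation`); that shape, the helper names (`publisher_of`, `top_n_share`, `has_affiliation`, `missing_rate`), and the `SAMPLE` records are hypothetical and are not the study's code or data.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical toy records, loosely modeled on DataCite-style JSON items.
# They are made up for illustration and are NOT data from the study.
SAMPLE: List[Record] = [
    {"attributes": {"publisher": "Zenodo", "publicationYear": 2020,
                    "creators": [{"name": "A", "affiliation": ["Univ X"]}]}},
    {"attributes": {"publisher": "Zenodo", "publicationYear": 2021,
                    "creators": [{"name": "B", "affiliation": []}]}},
    {"attributes": {"publisher": "Dryad", "publicationYear": 2019,
                    "creators": [{"name": "C"}]}},
    {"attributes": {"publisher": "figshare", "publicationYear": None,
                    "creators": [{"name": "D", "affiliation": ["Univ Y"]}]}},
]


def publisher_of(record: Record) -> str:
    """Return the Publisher value (assumed to be a string), or a placeholder if absent."""
    return record.get("attributes", {}).get("publisher") or "(missing)"


def top_n_share(records: List[Record], n: int = 10) -> Tuple[List[Tuple[str, int]], float]:
    """Count datasets per publisher and return the top n with their combined share."""
    counts = Counter(publisher_of(r) for r in records)
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    share = covered / len(records) if records else 0.0
    return top, share


def has_affiliation(record: Record) -> bool:
    """True if at least one creator lists a non-empty affiliation."""
    creators = record.get("attributes", {}).get("creators") or []
    return any(creator.get("affiliation") for creator in creators)


def missing_rate(records: List[Record], present: Callable[[Record], bool]) -> float:
    """Fraction of records for which the `present` check fails."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not present(r))
    return missing / len(records)


if __name__ == "__main__":
    top, share = top_n_share(SAMPLE, n=2)
    print("Top publishers:", top, f"share={share:.1%}")
    print("Missing affiliation:", f"{missing_rate(SAMPLE, has_affiliation):.1%}")
    year_present = lambda r: r["attributes"].get("publicationYear") is not None
    print("Missing publication year:", f"{missing_rate(SAMPLE, year_present):.1%}")
```

Treating an empty affiliation list the same as an absent one reflects the abstract's point that affiliation is often "incomplete or missing"; a real analysis would need to decide how to handle partial or free-text affiliations.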

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research
  • Humans
  • Information Dissemination* / methods
  • Metadata*

Grants and funding

Funding for this research and the Realities of Academic Data Sharing (RADS) Initiative was provided by the National Science Foundation (NSF), award #2135874, EAGER grant: Completing the Lifecycle: Developing Evidence Based Models of Research Data Sharing. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.