Semantic Academic Profiler (SAP): a framework for researcher assessment based on semantic topic modeling

Scientometrics. 2022;127(8):5005-5026. doi: 10.1007/s11192-022-04449-9. Epub 2022 Jul 11.

Abstract

Recent efforts have focused on identifying multidisciplinary teams and detecting co-authorship networks by using topic modeling to identify researchers' expertise. Though promising, none of these efforts performs a real-life evaluation of the quality of the generated topics. This paper proposes the Semantic Academic Profiler (SAP), a framework that summarizes the articles written by researchers to automatically build research profiles and supports online evaluation of those profiles. SAP exploits and extends state-of-the-art topic modeling strategies based on CluWords to consider n-grams, and introduces a new visual interface that highlights the main topics related to articles, researchers, and institutions. To evaluate SAP's capability to summarize the profiles of such entities, as well as its usefulness for supporting online assessments of topic quality, we perform and contrast two types of evaluation over an extensive repository of Brazilian curricula vitae: (1) an offline evaluation, in which we use a traditional metric, normalized pointwise mutual information (NPMI), to measure the quality of several data representation strategies, namely (i) TFIDF, (ii) TFIDF with bi-grams, (iii) CluWords, and (iv) CluWords with bi-grams; and (2) an online evaluation through an A/B test in which researchers evaluate their own profiles. We also perform an online assessment of the SAP user interface through a usability test following the System Usability Scale (SUS) methodology. Our experiments indicate that CluWords with bi-grams is the best solution and that the SAP interface is very useful. We also observed essential differences between the online and offline assessments, indicating that using both together is important for a comprehensive quality evaluation. This type of study is scarce in the literature, and our findings open new lines of investigation in the topic modeling area.
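
To make the offline metric concrete, the following is a minimal sketch of a document-level NPMI score for a topic's top words. It is not taken from the SAP implementation: the function names (`npmi_pair`, `topic_npmi`) and the toy corpus are hypothetical, and co-occurrence is assumed to be counted per document, which may differ from the windowing and reference corpus used in the paper.

```python
import math
from itertools import combinations


def npmi_pair(w1, w2, docs):
    """NPMI of two words, with probabilities estimated as document frequencies.

    `docs` is a list of sets of tokens (an assumption of this sketch).
    """
    n = len(docs)
    p1 = sum(w1 in d for d in docs) / n
    p2 = sum(w2 in d for d in docs) / n
    p12 = sum(w1 in d and w2 in d for d in docs) / n
    if p12 == 0:
        # Common convention: words that never co-occur get the minimum score.
        return -1.0
    if p12 == 1:
        # Both words occur in every document: PMI is 0 and the normalizer is 0.
        # Returning 0.0 here is a choice made for this sketch only.
        return 0.0
    pmi = math.log(p12 / (p1 * p2))
    return pmi / -math.log(p12)


def topic_npmi(top_words, docs):
    """Average NPMI over all unordered pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi_pair(a, b, docs) for a, b in pairs) / len(pairs)


# Toy corpus (hypothetical): each document is reduced to its set of tokens.
docs = [
    {"topic", "model"},
    {"topic", "model"},
    {"graph"},
    {"graph", "network"},
]

coherent = npmi_pair("topic", "model", docs)
unrelated = npmi_pair("topic", "graph", docs)
score = topic_npmi(["topic", "model", "graph"], docs)
```

Scores range from -1 (the words never co-occur) to 1 (the words always co-occur), so a topic whose top words tend to appear in the same documents receives a higher average. The same function could be applied to bi-gram features by treating each bi-gram as a single token.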

Keywords: Semantic Academic Profiler; Topic modeling; Word embeddings.