May 30, 2023 to June 2, 2023 Natural, Exact and Technical Sciences
Facultad de Matemática y Computación
Timezone: America/Havana

Early detection of potentially influential research articles using text-mining techniques

Not scheduled
20m
Facultad de Matemática y Computación

Speakers

Lucie Dvořáčková, Michal Cerny (Prague University of Economics and Business)

Description

A newly published scientific article is too young to demonstrate its influence on the scientific community through a high citation count, because citations take time to accumulate. Text-mining techniques, however, can help detect potentially significant work at an early stage, while the citation count is still low. Using the CORD-19 corpus of biomedical articles, we first apply the Word2Vec embedding model to the abstracts of almost 1,000,000 articles. The main contribution lies in the second step of our approach: we make the black-box neural network model interpretable by defining scores for words (or n-tuples of words) that measure their potential to contribute to the future high citability of a young article. The words (or word connections) with the highest scores then identify the concepts whose presence in an abstract contributes most to high citability. The model is able to identify particular drugs or COVID mutations as significant entities. Moreover, it identifies some surprising word connections that, by appearing in similar contexts, point to unexpected links between seemingly unrelated research areas. (Many thanks also to T. Klieger, M. P. Joachimiak and V. Sklenak.)
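The abstract does not specify how the word scores are defined, so the following is only a rough illustration of the general idea of scoring words by their association with high citability. It swaps in a much simpler technique than the Word2Vec-based approach described above: a smoothed log-odds ratio of a word's presence in highly cited versus other abstracts. The function name `word_scores`, the smoothing parameter `alpha`, and the four toy abstracts (including the made-up term `drugx`) are all assumptions introduced for this sketch and do not come from the talk.

```python
import math
from collections import Counter


def word_scores(abstracts, highly_cited, alpha=1.0):
    """Smoothed log-odds of a word appearing in highly cited vs. other abstracts.

    Hypothetical stand-in for the scores mentioned in the abstract;
    not the authors' definition.
    """
    hi_docs, lo_docs = Counter(), Counter()
    n_hi = n_lo = 0
    for text, is_hi in zip(abstracts, highly_cited):
        words = set(text.lower().split())  # count each word once per abstract
        if is_hi:
            n_hi += 1
            hi_docs.update(words)
        else:
            n_lo += 1
            lo_docs.update(words)

    scores = {}
    for word in set(hi_docs) | set(lo_docs):
        # Additive smoothing keeps both probabilities strictly inside (0, 1).
        p_hi = (hi_docs[word] + alpha) / (n_hi + 2 * alpha)
        p_lo = (lo_docs[word] + alpha) / (n_lo + 2 * alpha)
        scores[word] = math.log(p_hi / (1 - p_hi)) - math.log(p_lo / (1 - p_lo))
    return scores


# Toy data, invented for illustration only.
abstracts = [
    "drugx trial reduces mortality",
    "drugx dosing in severe patients",
    "survey of hospital logistics",
    "hospital staffing survey results",
]
highly_cited = [True, True, False, False]

scores = word_scores(abstracts, highly_cited)
top_word = max(scores, key=scores.get)
```

In this toy setting, a positive score means the word appears more often in the highly cited group and a negative score means the opposite. A realistic pipeline would need to define the "highly cited" label from citation data and, as the abstract indicates, work with embeddings rather than raw word counts.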

Primary author

Co-author

Michal Cerny (Prague University of Economics and Business)

Presentation materials

There are no materials yet.