Speakers
Description
A newly published scientific article is "too young" to demonstrate its influence on the scientific community through a high citation count, because accumulating citations takes time. Text-mining techniques, however, can help detect potentially significant work at an early stage, while its citation count is still low. Using the CORD-19 corpus of biomedical articles, we first train a Word2Vec embedding model on the abstracts of almost 1,000,000 articles. The main contribution lies in the second step of our approach: we make the black-box neural-network model interpretable by defining scores for words (or n-tuples of words) that measure their potential to contribute to the future high citability of a young article. The words (or word combinations) with the highest scores identify the concepts whose presence in an abstract contributes most to high citability. The model is able to identify specific drugs and COVID-19 mutations as significant entities. Moreover, it identifies surprising word pairings: words that appear in similar contexts can reveal unexpected connections between seemingly unrelated research areas. (Many thanks also to T. Klieger, M. P. Joachimiak and V. Sklenak.)
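How the word scores are defined is not specified here. The following is a purely illustrative sketch, not the speakers' method. It assumes word embeddings have already been trained (for instance with Word2Vec) and builds a hypothetical "citability direction" as the difference between the mean vectors of highly cited and less cited abstracts. Each word is then scored by the projection of its vector onto that direction, and an n-tuple of words is scored through the mean of its word vectors. The function names, the citation threshold and the toy data are all made up for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the vectors; a zero vector if the input is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def abstract_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical: represent an abstract by the mean embedding of its in-vocabulary tokens."""
    return mean_vector([emb[t] for t in tokens if t in emb], dim)


def citability_direction(abstracts: Sequence[Sequence[str]], citations: Sequence[int],
                         emb: Dict[str, Vector], threshold: int) -> Vector:
    """Hypothetical: mean vector of highly cited abstracts minus that of the others."""
    dim = len(next(iter(emb.values())))
    high = [abstract_vector(a, emb, dim) for a, c in zip(abstracts, citations) if c >= threshold]
    low = [abstract_vector(a, emb, dim) for a, c in zip(abstracts, citations) if c < threshold]
    hi_mean, lo_mean = mean_vector(high, dim), mean_vector(low, dim)
    return [h - l for h, l in zip(hi_mean, lo_mean)]


def score(vec: Vector, direction: Vector) -> float:
    """Projection of a vector onto the unit-length citability direction."""
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    return sum(v * d for v, d in zip(vec, direction)) / norm


def tuple_score(words: Sequence[str], emb: Dict[str, Vector], direction: Vector) -> float:
    """Hypothetical: score an n-tuple of words via the mean of their embeddings."""
    return score(mean_vector([emb[w] for w in words], len(direction)), direction)


# Made-up 2-D embeddings and citation counts, for illustration only.
emb = {"drug_x": [1.0, 0.0], "variant_y": [0.8, 0.2],
       "weather": [0.0, 1.0], "survey": [0.1, 0.9]}
abstracts = [["drug_x", "variant_y"], ["variant_y"], ["weather", "survey"], ["survey"]]
citations = [120, 90, 3, 1]

direction = citability_direction(abstracts, citations, emb, threshold=50)
ranking = sorted(emb, key=lambda w: score(emb[w], direction), reverse=True)
print(ranking)  # ['drug_x', 'variant_y', 'survey', 'weather']
```

Plain lists keep the sketch dependency-free; on a corpus of roughly a million abstracts, one would instead use embeddings from a trained model and vectorized array code.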