May 30, 2023 to June 2, 2023 Natural, Exact and Technical Sciences
Facultad de Matemática y Computación
America/Havana time zone

Alternative Topic-Modeling Methods: A Comparison

Not scheduled
20m
Facultad de Matemática y Computación

Speaker

Jean-Charles Lamirel (University of Strasbourg - University of Paris 1)

Description

In the increasingly influential context of big data, unsupervised mining of textual content, such as topic modeling, is becoming a strategic task. Topic models, LDA among them, are therefore used extensively for tasks that go on to exploit the extracted topics. However, earlier works that evaluated LDA results subjectively have shown that LDA produces poor output when applied to complex data, for example when analyzing research through scientific papers. Increasing the accuracy of topic-modeling methods thus remains a main concern, both for extracting suitable topics and for improving the quality of subsequent studies. The objective of this paper is twofold. First, we propose improvements to an alternative topic-modeling approach based on neural clustering and feature maximization. Second, we propose a quantitative comparison between LDA and this new alternative method, which we call CFMf. Using a large-scale reference corpus of full-text philosophy of science articles (N=16,917), we compare the topics produced by CFMf with those obtained with LDA. The results show a highly significant improvement (+50%) on key quantitative performance measures such as coherence, independently of the number of topics. Moreover, by applying the principles of feature maximization to LDA results, we show that it can correct LDA topic descriptions and significantly increase LDA performance. We discuss these promising results and highlight rich directions for upcoming research.
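
The abstract does not spell out how feature maximization scores the terms of a topic. As a rough illustration only, the sketch below assumes the feature F-measure formulation often associated with feature maximization: for each cluster and term, the harmonic mean of feature recall (the share of the term's total weight that falls in the cluster) and feature precision (the share of the cluster's total weight carried by the term). The function name `feature_f_measure`, the input layout, and the toy data are hypothetical and are not taken from the CFMf method itself.

```python
from collections import defaultdict


def feature_f_measure(doc_weights, labels):
    """Score each (cluster, feature) pair with a feature F-measure.

    doc_weights: list of {feature: weight} dicts, one per document (assumed layout).
    labels: cluster id of each document, in the same order.
    Returns {cluster: {feature: score}}.
    """
    # Sum the weight of every feature inside every cluster.
    cluster_feature = defaultdict(lambda: defaultdict(float))
    for doc, cluster in zip(doc_weights, labels):
        for feature, weight in doc.items():
            cluster_feature[cluster][feature] += weight

    # Total weight of each feature over all clusters (denominator of recall).
    feature_total = defaultdict(float)
    for features in cluster_feature.values():
        for feature, weight in features.items():
            feature_total[feature] += weight

    scores = {}
    for cluster, features in cluster_feature.items():
        # Total weight of all features in this cluster (denominator of precision).
        cluster_total = sum(features.values())
        scores[cluster] = {}
        for feature, weight in features.items():
            recall = weight / feature_total[feature] if feature_total[feature] else 0.0
            precision = weight / cluster_total if cluster_total else 0.0
            denom = recall + precision
            scores[cluster][feature] = 2 * recall * precision / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy data: three documents in two clusters (hypothetical weights).
    docs = [{"gene": 2, "cell": 1}, {"gene": 1}, {"logic": 3, "cell": 1}]
    for cluster, features in feature_f_measure(docs, [0, 0, 1]).items():
        ranked = sorted(features, key=features.get, reverse=True)
        print(cluster, ranked)
```

Under this assumption, the highest-scoring terms of a cluster serve as its topic description, and the same scoring could in principle be applied to document groups derived from LDA output, which is one possible reading of the correction step mentioned in the abstract.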

Primary author

Jean-Charles Lamirel (University of Strasbourg - University of Paris 1)

Presentation materials

There are no materials yet.