Mining Web Log Data for News Topic Modeling Using Latent Dirichlet Allocation

Published in IEEE 2018 5th International Conference on Information Science and Control Engineering (ICISCE), 2018

The growth of e-news platforms, the most popular and accessible media for sharing information, has increased the volume of digital news articles. Users’ navigation across news articles on an e-news platform, captured as web log data, shows which articles users read. The articles users read can indicate topics of interest and public unrest regarding a particular event, field, or aspect. Understanding these topics is important, especially for journalists writing subsequent news and for government policy-making. This study responds to the need to extract topics from the news articles that the public reads. Latent Dirichlet allocation was used as the topic modeling algorithm, applied to a list of news article titles and categories obtained from user web log data across 5 e-news publisher domains in Indonesia. The topic modeling process yielded 12 topics of news articles. The results give e-news platforms insight into the reading focus of their users.
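To make the method concrete, the following is a minimal sketch of latent Dirichlet allocation fitted with collapsed Gibbs sampling on tokenized article titles. It is not the implementation used in the study: the inference method, the hyperparameters `alpha` and `beta`, the helper names `fit_lda` and `top_words`, and the toy `titles` are all assumptions made for illustration, and the toy run uses 2 topics rather than the 12 reported above.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=50, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, hypothetical helper).

    docs is a list of token lists, e.g. one tokenized article title each.
    Returns the vocabulary, per-document topic counts, per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a constant factor.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical tokenized titles, standing in for titles taken from web logs.
titles = [
    ["election", "candidate", "debate", "vote"],
    ["vote", "count", "election", "result"],
    ["football", "league", "match", "goal"],
    ["goal", "striker", "football", "transfer"],
]

vocab, doc_topic, topic_word = fit_lda(titles, num_topics=2)
for topic, words in enumerate(top_words(vocab, topic_word)):
    print(topic, words)
```

Each row of `doc_topic` counts how many tokens of a title were assigned to each topic, so normalizing a row gives a rough topic mixture for that title. On a real corpus, a maintained library implementation would usually be preferable to this hand-written sampler.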
