Expert Systems with Applications, 2018, 103: 106-117.
Xu Y, Yin J, Huang J, et al.
Abstract
Traditional topic modeling has been widely studied and is popularly employed in expert systems and information systems. However, traditional topic models cannot discover structural relations among topics, and thus miss the chance to explore the data more deeply. Hierarchical topic modeling can learn topics and also discover the hierarchical topic structure in text data, but purely unsupervised models tend to generate weak topic hierarchies. To solve this problem, we propose a novel knowledge-based hierarchical topic model (KHTM), which can incorporate prior knowledge into topic hierarchy building. A key novelty of this model is that it can mine prior knowledge automatically from the topic hierarchies of multi-domain corpora. In this paper, knowledge is represented as word pairs that satisfy a frequent co-occurrence requirement, and the knowledge is organized in a hierarchical structure. We also propose an iterative learning algorithm. For evaluation, we crawled two new multi-domain datasets and conducted comprehensive experiments. The experimental results show that our algorithm and model generate more coherent topics and a more reasonable hierarchical structure.
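To make the knowledge representation more concrete, the sketch below mines frequently co-occurring word pairs from the topic hierarchies of several domains and keeps them grouped by hierarchy level. This is an illustrative assumption, not KHTM's actual mining procedure: the input layout (`domain -> [(level, ranked_words)]`), the function name `mine_hierarchical_knowledge`, and the parameters `top_n` and `min_support` are all hypothetical, and the support threshold stands in for whatever "frequent co-occurrence" criterion the model really uses.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_hierarchical_knowledge(domain_hierarchies, top_n=10, min_support=2):
    """Hypothetical sketch: find word pairs that co-occur among the top words
    of at least `min_support` topics at the same hierarchy level, pooled
    across all domains.

    domain_hierarchies: {domain: [(level, [word ranked by probability, ...]), ...]}
    Returns: {level: set of (word_a, word_b) pairs with word_a < word_b}
    """
    support = defaultdict(Counter)  # level -> Counter of unordered word pairs
    for topics in domain_hierarchies.values():
        for level, words in topics:
            # Sort the top-N words so (a, b) and (b, a) are counted as one pair.
            top = sorted(set(words[:top_n]))
            support[level].update(combinations(top, 2))
    # Keep only pairs whose support meets the (assumed) frequency threshold.
    return {
        level: {pair for pair, count in counts.items() if count >= min_support}
        for level, counts in support.items()
    }


# Toy topic hierarchies for two hypothetical domains (level 0 = more general).
hierarchies = {
    "camera": [(0, ["price", "quality", "battery"]), (1, ["battery", "life", "charge"])],
    "phone": [(0, ["price", "quality", "screen"]), (1, ["battery", "life", "hours"])],
}
knowledge = mine_hierarchical_knowledge(hierarchies, top_n=3, min_support=2)
print(knowledge)  # {0: {('price', 'quality')}, 1: {('battery', 'life')}}
```

Grouping the pairs by level is one simple way to keep the mined knowledge hierarchical. A pair such as `('battery', 'life')` is retained only because it recurs across domains, which reflects the idea of extracting knowledge that is shared by multiple corpora.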