
Unsupervised topic model based text network construction for learning word embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Distributed word embeddings have proven remarkably effective at capturing word-level semantic and syntactic regularities in language for many natural language processing tasks. One recently proposed semi-supervised representation learning method, Predictive Text Embedding (PTE), utilizes both semantically labeled and unlabeled data in information networks to learn text embeddings that achieve state-of-the-art performance compared to other embedding methods. However, PTE uses supervised label information to construct one of its networks, and many other possible ways of constructing such information networks remain untested. We present two unsupervised methods for constructing a large-scale semantic information network from documents, building on topic models, which have emerged as a powerful technique for finding useful structure in unstructured text collections by learning distributions over words. The first method uses Latent Dirichlet Allocation (LDA) to build a topic model over the text and constructs a word-topic network with edge weights proportional to the word-topic probability distributions. The second method trains an unsupervised neural network to learn the word-document distribution, with a single hidden layer representing a topic distribution. The two weight matrices of the neural network are directly reinterpreted as the edge weights of heterogeneous text networks that can be used to train word embeddings, yielding an effective low-dimensional representation that preserves the semantic closeness of words and documents for NLP tasks. We conduct extensive experiments to evaluate the effectiveness of our methods.
Original language: English
Title of host publication: Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Editors: M. Arif Wani, Taghi M. Khoshgoftaar, Dingding Wang, Huanjing Wang, Naeem Seliya
Place of Publication: USA
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 155-161
Number of pages: 7
ISBN (Electronic): 9781728145495
DOIs
State: Published - Dec 1 2019
Event: 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019 - Boca Raton, United States
Duration: Dec 16 2019 - Dec 19 2019

Conference

Conference: 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Country/Territory: United States
City: Boca Raton
Period: 12/16/19 - 12/19/19

Keywords

  • Document Clustering
  • Document Topic Information Network
  • LDA
  • Learning Word Embeddings
  • Topic Model
  • Unsupervised Text Analysis
