20 August 2016

Words and Vectors

Clustering has become an active research area in Natural Language Processing, driven by deep learning techniques that derive vector representations of meaning from text. Word2Vec is a widely used technique for producing such vectors. Its input is a text corpus and its output is a set of feature vectors, one per word, in which words that appear in similar contexts end up close together and can therefore be clustered. Many libraries provide implementations of word embeddings, including Gensim, DL4J, Spark, and others. The following are some variants of the same Word2Vec approach.

Doc2Vec (aka Paragraph2Vec, Sentence2Vec, Text2Vec)
Phrase2Vec
Sequence2Vec
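
As a rough illustration of how a text corpus becomes training input for Word2Vec, the sketch below generates the (center, context) word pairs that the skip-gram flavor of Word2Vec learns from. It is plain Python and not any library's API: the function name `skipgram_pairs`, the toy corpus, and the fixed window size are assumptions made for illustration, and the training step that turns these pairs into vectors is left out.

```python
from typing import Iterable, List, Tuple


def skipgram_pairs(
    sentences: Iterable[List[str]], window: int = 2
) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for each word and its neighbors.

    A neighbor is any other word in the same sentence at most `window`
    positions away. A fixed window is a simplification for illustration.
    """
    pairs = []
    for tokens in sentences:
        for i, center in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy corpus, already tokenized and lowercased.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
]

pairs = skipgram_pairs(corpus, window=1)
for center, context in pairs:
    print(center, "->", context)
```

In this toy corpus "cat" and "dog" are paired with the same context words ("the" and "sat"), which is the kind of signal that would lead a trained model to give them similar vectors.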