N-grams are sequences of N consecutive words taken from a text. N-grams with N = 1 are called unigrams; likewise, N = 2 gives bigrams, N = 3 gives trigrams, and the series continues in the same way for larger N.
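As a minimal sketch of the definition above, the following Python function slides a window of length N over a list of tokens. The function name `ngrams` and the whitespace tokenization are assumptions made for illustration; real pipelines usually use a proper tokenizer.

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) of consecutive tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window of length n starts at every index that leaves room for n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization, used here only to keep the example short.
tokens = "the quick brown fox".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

A sentence with fewer than N tokens simply yields an empty list, which is usually the convenient behavior when counting n-grams over many short texts.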

GloVe: machine learning library for NLP

GloVe

GloVe aims to solve this problem by tying the embedding of each individual word to the structure of the entire observed corpus. To do this, the model is trained on global word-word co-occurrence counts, which makes efficient use of corpus statistics, and it minimizes a least-squares error. The result is a word vector space with meaningful substructure. In such a space, the similarity of two words can be measured by the distance between their vectors.
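To illustrate how similarity is read off from vector distance, here is a small sketch using cosine similarity, a common choice for comparing word embeddings. The three-dimensional vectors below are made up for the example and are not real GloVe output; actual GloVe vectors typically have 50 to 300 dimensions. The helper names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy embeddings, for illustration only.
toy_vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king", toy_vectors))
# queen
```

With pretrained vectors the same comparison applies unchanged; only the dictionary of vectors would be loaded from a file instead of written by hand.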

In addition to these two models, several more recently developed techniques are also in use: FastText, Poincaré Embeddings, sense2vec, Skip-Thought, and Adaptive Skip-Gram.

Machine translation

Machine translation is the conversion of text in one natural language into text with equivalent content in another language, carried out by a program or machine without human intervention. Statistical machine translation relies on statistics about how neighboring words are used together. Machine translation systems are widely used commercially, since translation between the world's languages is an industry worth $40 billion per year (source: doctranslator).

Traditional machine translation systems rely on a parallel corpus: a set of texts, each of which is translated into one or more other languages. For example, given a source language f (French) and a target language e (English), one builds a statistical model based on Bayes' rule that combines a translation model p(f | e), trained on the parallel corpus, with a language model p(e), trained only on an English corpus.
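By Bayes' rule, p(e | f) is proportional to p(f | e) · p(e), so a translation can be chosen by picking the English candidate e that maximizes that product. The sketch below shows this selection step on a tiny hand-written example. All probabilities, the candidate sentences, and the function name `decode` are invented for illustration; a real system estimates the two models from corpora and searches a far larger candidate space.

```python
# Hypothetical translation model p(f | e), keyed by (french, english).
translation_model = {
    ("le chat", "the cat"): 0.7,
    ("le chat", "cat the"): 0.7,   # same words, so the translation model cannot tell them apart
    ("le chat", "the dog"): 0.05,
}

# Hypothetical language model p(e): fluent English gets higher probability.
language_model = {
    "the cat": 0.02,
    "cat the": 0.0001,
    "the dog": 0.02,
}


def decode(f, candidates):
    """Return the candidate e that maximizes p(f | e) * p(e)."""
    def score(e):
        return translation_model.get((f, e), 0.0) * language_model.get(e, 0.0)
    return max(candidates, key=score)


best = decode("le chat", list(language_model))
print(best)
# the cat
```

The example shows why the two models are kept separate: the translation model rewards candidates that cover the right words, while the language model, which needs only monolingual English text, rules out the ungrammatical word order.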