【NLP】Representation Learning for Natural Language Processing


  • To build an effective machine learning system, we first transform the useful information in raw data into internal representations such as feature vectors.
  • Conventional machine learning systems rely on careful feature engineering as a preprocessing step to build feature representations from raw data.
  • The distributional hypothesis, which states that linguistic objects with similar distributions have similar meanings, is the basis for distributed word representation learning.
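The distributional hypothesis can be illustrated with a toy co-occurrence count: words that appear in similar contexts end up with similar count vectors. The corpus and window choice below are assumptions made purely for illustration.

```python
import numpy as np

# Toy corpus: "cat" and "dog" occur in similar contexts; "bed" does not.
corpus = [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "she slept on the bed",
]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within each sentence (window = whole sentence).
M = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    ws = s.split()
    for a in ws:
        for b in ws:
            if a != b:
                M[idx[a], idx[b]] += 1

def cos(a, b):
    """Cosine similarity between the co-occurrence rows of two words."""
    va, vb = M[idx[a]], M[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Distributionally, cat is closer to dog than to bed.
print(cos("cat", "dog") > cos("cat", "bed"))  # True
```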

One-hot representation
  • assigns a unique index to each word → a high-dimensional sparse representation
  • cannot capture the semantic relatedness among words (in one-hot representation, the difference between cat and dog is as large as the difference between cat and bed)
  • is inflexible for dealing with new words in real-world scenarios
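A minimal sketch of the first two limitations: one-hot vectors are |V|-dimensional with a single nonzero entry, and any two distinct words are orthogonal, so the representation cannot tell that cat is closer to dog than to bed. The toy vocabulary is an assumption for illustration.

```python
import numpy as np

# Toy vocabulary; the word-to-index mapping is arbitrary.
vocab = {"cat": 0, "dog": 1, "bed": 2}

def one_hot(word, vocab):
    """Return the one-hot vector for a word: |V| dimensions, a single 1."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Any two distinct one-hot vectors are orthogonal, so this representation
# says cat/dog are exactly as unrelated as cat/bed.
print(cosine(one_hot("cat", vocab), one_hot("dog", vocab)))  # 0.0
print(cosine(one_hot("cat", vocab), one_hot("bed", vocab)))  # 0.0
```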

Representation learning
  • Representation learning aims to learn informative representations of objects from raw data automatically.
  • Distributed representation has proven more efficient because its low dimensionality helps prevent the sparsity issue.
  • Deep learning is a typical approach for representation learning.
Development of representation learning in NLP

N-gram Model: predicts the next item in a sequence based on its previous n-1 items; a probabilistic language model.
Bag-of-words: disregards the order of words in the document. ① Each word that appears in the document corresponds to a unique nonzero dimension. ② A score (e.g., the number of occurrences) can be computed for each word to indicate its weight.
TF-IDF: builds on BoW; researchers additionally take the importance of different words into consideration rather than treating all words equally.
Neural Probabilistic Language Model (NPLM): first assigns a distributed vector to each word, then uses a neural network to predict the next word (e.g., feed-forward, recurrent, and LSTM-based recurrent neural network language models).
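The BoW and TF-IDF ideas above can be sketched in a few lines: counts discard word order, and TF-IDF down-weights words that appear in many documents. The two-document corpus is an assumption for illustration, and the idf formula shown (log of N over document frequency, without smoothing) is one common variant.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

# Bag-of-words: word order is discarded; each word maps to its count.
bow = [Counter(d) for d in docs]

def tf_idf(word, doc_counts, all_counts):
    """tf-idf = term frequency x log(N / document frequency)."""
    tf = doc_counts[word]
    df = sum(1 for c in all_counts if word in c)
    idf = math.log(len(all_counts) / df)
    return tf * idf

# "the" occurs in every document, so its idf (and tf-idf) is 0;
# "cat" is specific to the first document, so it keeps a positive weight.
print(tf_idf("the", bow[0], bow))  # 0.0
print(tf_idf("cat", bow[0], bow))  # log(2) ~ 0.693
```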

Word embeddings: 

Word2Vec, GloVe, fastText

Inspired by NPLM, many methods emerged that embed words into distributed representations. In the NLP pipeline, word embeddings map discrete words into informative low-dimensional vectors.
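Word2Vec's skip-gram variant trains on (center, context) word pairs extracted with a sliding window over the corpus; this toy extractor shows the training data such a model sees. The sample sentence and window size are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window, the raw
    training examples used by skip-gram word embedding models."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "representation learning for nlp".split()
# e.g. ('representation', 'learning'), ('learning', 'representation'), ...
print(skipgram_pairs(tokens, window=1))
```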

Pre-trained Language Models (PLM):

ELMo, BERT

  • take complicated context in text into consideration
  • calculate dynamic representations for words based on their context, which is especially useful for words with multiple meanings
  • follow a pretraining then fine-tuning pipeline
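The "dynamic representation" idea can be shown with a deliberately tiny toy, not a real ELMo/BERT: here the same word gets different vectors in different contexts simply by mixing its static embedding with the sentence mean. Real PLMs achieve this with deep self-attention networks; the embeddings and mixing weights below are made-up values for illustration.

```python
import numpy as np

# Toy static embeddings (made-up 2-d values for illustration only).
emb = {
    "bank": np.array([1.0, 0.0]),
    "river": np.array([0.0, 1.0]),
    "money": np.array([1.0, 1.0]),
}

def contextual(word, sentence):
    """Toy contextualizer: mix a word's static vector with the sentence
    mean, so the same word gets different representations in different
    contexts (the effect PLMs achieve with deep networks)."""
    ctx = np.mean([emb[w] for w in sentence], axis=0)
    return 0.5 * emb[word] + 0.5 * ctx

v_river = contextual("bank", ["river", "bank"])
v_money = contextual("bank", ["money", "bank"])
print(np.allclose(v_river, v_money))  # False: "bank" differs by context
```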

The Pre-trained language model family


Applications

Neural Relation Extraction
  • Sentence-Level NRE: a basic form of sentence-level NRE consists of three components: (a) an input encoder that gives a representation for each input word (word embeddings, position embeddings, part-of-speech (POS) tag embeddings, WordNet hypernym embeddings); (b) a sentence encoder that computes either a single vector or a sequence of vectors to represent the original sentence; (c) a relation classifier that calculates the conditional probability distribution over all relations.

  • Bag-Level NRE: utilizes information from multiple sentences (bag level) rather than a single sentence (sentence level) to decide whether a relation holds between two entities. A basic form of bag-level NRE consists of four components: (a) an input encoder similar to sentence-level NRE, (b) a sentence encoder similar to sentence-level NRE, (c) a bag encoder that computes a vector representing all related sentences in a bag, and (d) a relation classifier similar to sentence-level NRE that takes bag vectors as input instead of sentence vectors.
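The bag-level pipeline above can be sketched with placeholder tensors: random vectors stand in for sentence-encoder outputs, plain averaging stands in for the bag encoder (real systems often use selective attention instead), and a linear layer plus softmax stands in for the relation classifier. All shapes and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# (a)+(b) Stand-in: 3 sentences mentioning the same entity pair, each
# already encoded as a 4-d vector by a hypothetical sentence encoder.
sentence_vecs = rng.normal(size=(3, 4))

# (c) Bag encoder: plain averaging here; real systems often weight the
# sentences with selective attention instead.
bag_vec = sentence_vecs.mean(axis=0)

# (d) Relation classifier: a linear layer + softmax over relation types.
W = rng.normal(size=(4, 5))  # 5 hypothetical relation types
logits = bag_vec @ W
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.sum())  # softmax output is a probability distribution
```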

Topic Model
  • Topic modeling algorithms do not require any prior annotations or labeling of the documents. 

A topic model is a generative model: every word in a document is assumed to be produced by first choosing a topic with some probability, and then choosing a word from that topic with some probability.

For each document in the collection, we generate the words in a two-stage process:

1. Randomly choose a distribution over topics.

2. For each word in the document,

    • Randomly choose a topic from the distribution over topics in step #1.

    • Randomly choose a word from the corresponding distribution over the vocabulary.
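The two-stage process above can be simulated directly. The vocabulary and the two topic-word distributions below are made-up assumptions for illustration; stage 1 draws the document's topic distribution (here from a Dirichlet prior, as in LDA), and stage 2 draws a topic and then a word for each position.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "game", "score"]
# Hypothetical topic-word distributions: topic 0 ~ biology, topic 1 ~ sports.
topics = np.array([[0.45, 0.45, 0.05, 0.05],
                   [0.05, 0.05, 0.45, 0.45]])

# Stage 1: randomly choose this document's distribution over topics.
theta = rng.dirichlet([0.5, 0.5])

# Stage 2: for each word position, choose a topic from theta, then a
# word from that topic's distribution over the vocabulary.
doc = []
for _ in range(6):
    z = rng.choice(2, p=theta)
    w = rng.choice(vocab, p=topics[z])
    doc.append(str(w))
print(doc)
```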
