
NLP Introductory Notes


I recently found a course on Bilibili for learning NLP, so I will use this blog post to record the knowledge points covered in class.

Long Short Term Memory (LSTM) 模型
  • LSTM uses a “conveyor belt” to get longer memory than SimpleRNN.

  • Each of the following blocks has a parameter matrix:

    • Forget gate
    • Input gate
    • New values
    • Output values
  • Number of parameters: 4 × shape(h) × [shape(h) + shape(x)]
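As a sanity check, the count can be computed with a small helper. Note that implementations such as Keras also add a bias vector of length shape(h) per gate by default, which the formula above omits; the `use_bias` flag below covers both conventions.

```python
def lstm_param_count(h, x, use_bias=False):
    """Number of trainable parameters in one LSTM layer.

    Each of the four gates (forget, input, new values, output) has a
    weight matrix of shape (h, h + x); with use_bias=True each gate
    also carries a bias vector of length h (the Keras default).
    """
    per_gate = h * (h + x) + (h if use_bias else 0)
    return 4 * per_gate

# e.g. hidden size 32, input (embedding) dimension 100
print(lstm_param_count(32, 100))                 # 4 * 32 * (32 + 100) = 16896
print(lstm_param_count(32, 100, use_bias=True))  # 16896 + 4 * 32 = 17024
```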

Making RNNs More Effective
  • SimpleRNN and LSTM are two kinds of RNNs; always use LSTM instead of SimpleRNN.
  • Use Bi-RNN instead of RNN whenever possible.
  • Stacked RNN may be better than a single RNN layer (if the training set size n is big).
  • Pretrain the embedding layer (if n is small).
Stacked RNN

Bidirectional RNN
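The bidirectional idea can be sketched in plain NumPy: run one SimpleRNN left-to-right and another right-to-left with separate parameters, then concatenate their final states. All shapes below are illustrative.

```python
import numpy as np

def simple_rnn(xs, Wh, Wx, b):
    """Run a SimpleRNN over a sequence xs (shape: steps x input_dim),
    returning the final hidden state h = tanh(Wh @ h + Wx @ x + b)."""
    h = np.zeros(Wh.shape[0])
    for x in xs:
        h = np.tanh(Wh @ h + Wx @ x + b)
    return h

def bi_rnn(xs, fwd, bwd):
    """Bidirectional wrapper: concatenate the final state of a
    left-to-right pass with that of a right-to-left pass."""
    return np.concatenate([simple_rnn(xs, *fwd), simple_rnn(xs[::-1], *bwd)])

rng = np.random.default_rng(0)
h_dim, x_dim, steps = 4, 3, 5
make = lambda: (rng.normal(size=(h_dim, h_dim)),
                rng.normal(size=(h_dim, x_dim)),
                np.zeros(h_dim))
xs = rng.normal(size=(steps, x_dim))
out = bi_rnn(xs, make(), make())
print(out.shape)  # forward and backward states concatenated
```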


Text Generation: Train a Neural Network
  1. Partition text to (segment, next_char) pairs.
  2. One-hot encode the characters.
    • Character → v × 1 vector (v = vocabulary size).
    • Segment → l × v matrix (l = segment length).
  3. Build and train a neural network
    • l × v matrix ⇒ LSTM ⇒ Dense ⇒ v × 1 vector
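Steps 1 and 2 can be sketched in NumPy; the text, segment length l, and vocabulary are illustrative.

```python
import numpy as np

text = "hello world, hello nlp"
l = 5                               # segment length
chars = sorted(set(text))           # character vocabulary
v = len(chars)
idx = {c: i for i, c in enumerate(chars)}

# 1. Partition text into (segment, next_char) pairs
pairs = [(text[i:i + l], text[i + l]) for i in range(len(text) - l)]

# 2. One-hot encode: each segment -> l x v matrix, each next_char -> v vector
X = np.zeros((len(pairs), l, v))
y = np.zeros((len(pairs), v))
for n, (seg, nxt) in enumerate(pairs):
    for t, c in enumerate(seg):
        X[n, t, idx[c]] = 1
    y[n, idx[nxt]] = 1

print(X.shape, y.shape)
```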
Text Generation
  1. Propose a seed segment
  2. Repeat the following:
    a) Feed the segment (with one-hot) to the neural network.
    b) The neural network outputs probabilities.
    c) next_char ← sample from the probabilities.
    d) Append next_char to the segment.
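The generation loop above can be sketched as follows. The `predict` function stands in for the trained network; the toy version here puts all probability mass on one character just to make the loop visible.

```python
import numpy as np

def sample(probs, rng):
    """Draw the index of the next character from the network's output
    probabilities (sampling, rather than a greedy argmax)."""
    return rng.choice(len(probs), p=probs)

def generate(seed, predict, n_chars, chars, rng):
    """Repeat: feed segment -> probabilities -> sample next_char -> append."""
    segment = seed
    for _ in range(n_chars):
        probs = predict(segment)      # stand-in for the trained network
        segment += chars[sample(probs, rng)]
    return segment

chars = ['a', 'b', '!']
predict = lambda seg: np.array([0.0, 0.0, 1.0])  # toy "network"
print(generate("ab", predict, 3, chars, np.random.default_rng(0)))  # ab!!!
```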
Machine Translation and the Seq2Seq Model


How to Improve Seq2Seq
  1. Bi-LSTM instead of LSTM (Encoder only!!!)
    • Encoder’s final states (h_t and c_t) have all the information of the English sentence.
    • If the sentence is long, the final states have forgotten early inputs.
    • Bi-LSTM (left-to-right and right-to-left) has longer memory.
    • Use Bi-LSTM in the encoder; use unidirectional LSTM in the decoder.
  2. Word-Level Tokenization
    • Word-Level tokenization instead of char-level.
      • The average length of English words is 4.5 letters.
      • The sequences will be 4.5x shorter.
      • Shorter sequence → less likely to forget.
    • But you will need a large dataset:
      • # of (frequently used) chars is ~10^2 → one-hot suffices.
      • # of (frequently used) words is ~10^4 → must use embedding.
      • Embedding layer has many parameters → overfitting!
  3. Multi-Task Learning (there is still only one encoder, but the training data is effectively doubled, so the encoder can be trained better)
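The ~4.5x shortening claimed for word-level tokenization can be illustrated with plain Python; the sample sentence is arbitrary.

```python
text = "the quick brown fox jumps over the lazy dog"
char_seq = list(text)      # char-level tokens (including spaces)
word_seq = text.split()    # word-level tokens
print(len(char_seq), len(word_seq))
print(len(char_seq) / len(word_seq))  # ratio near the quoted ~4.5x
```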
Attention

One drawback of the Seq2Seq model is that it cannot retain the complete information of a very long sentence, so individual words may be forgotten; the decoder then has no access to the full sentence and cannot produce a correct translation.

For this reason, researchers introduced the Attention mechanism, whose key properties are:

  • Attention tremendously improves Seq2Seq model.
  • With attention, Seq2Seq model does not forget source input.
  • With attention, the decoder knows where to focus.
  • Downside: much more computation
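The mechanism can be sketched as dot-product attention in NumPy. This is a simplification: classic Seq2Seq attention scores each encoder state against the decoder state with a small trained network, whereas this sketch uses a plain dot product. The key point survives: the context vector is a weighted sum over all encoder states, so nothing from the source is forgotten.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Score every encoder state against the current decoder state,
    softmax the scores into weights (where the decoder focuses), and
    return the weighted sum of ALL encoder states as the context."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)
    context = weights @ encoder_states
    return context, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))   # 6 encoder states, hidden size 4
s = rng.normal(size=4)        # current decoder state
context, weights = attention(s, H)
print(weights.sum())          # the weights form a distribution over positions
```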
Transformer and BERT
  1. Transformer is a Seq2Seq model; it has an encoder and a decoder.
  2. Transformer model is not RNN.
  3. Transformer is based on attention and self-attention.
  4. BERT is for pre-training Transformer’s encoder.
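A minimal sketch of the self-attention the Transformer is built on: every position attends to every other position in the same sequence, with no recurrence. Single head, random weight matrices for illustration; real Transformers add multi-head projections, masking, and layer normalization.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: queries, keys,
    and values are all projections of the same sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))
W = lambda: rng.normal(size=(d, d))
out = self_attention(X, W(), W(), W())
print(out.shape)  # same sequence length, no RNN needed
```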
Original article: https://www.mshxw.com/it/286441.html