[Week 6 - Day 2] RNN, LSTM, GRU

Basics of Recurrent Neural Networks (RNNs)

Basic structure

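Input: the input vector xt at each timestep and the hidden state vector h(t-1) computed by the RNN at the previous timestep
Output: the hidden state ht of the current timestep
The same module, with the same parameters, is called recursively to process the input arriving at each timestep
The hidden state vector ht is passed on as input to the next timestep and is also used to compute the output yt
fW: a function parameterized by the linear transformation matrices W that the RNN module needs, ht = fW(h(t-1), xt)
- passes through the tanh nonlinearity: ht = tanh(Whh * h(t-1) + Wxh * xt)
Output layer y: when a prediction is needed, apply the linear transformation matrix Why to the hidden state (yt = Why * ht)
- binary classification: apply the sigmoid function to a scalar value
- multi-class classification: apply the softmax function to a vector whose dimension equals the number of classes, giving a probability distribution over the classes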
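The recurrence and the two output heads described above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's code: the layer sizes, the random weights, the bias `b_h`, and the names `rnn_step` and `w_bin` are all assumptions made for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes = 3, 4, 5  # illustrative sizes (assumption)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)                    # bias term is an assumption
W_hy = rng.normal(scale=0.1, size=(num_classes, hidden_dim))
w_bin = rng.normal(scale=0.1, size=hidden_dim)  # hypothetical binary head

h = np.zeros(hidden_dim)
xs = rng.normal(size=(6, input_dim))          # a toy sequence of 6 timesteps
for x_t in xs:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # the same parameters at every step

class_probs = softmax(W_hy @ h)   # multi-class: distribution over the classes
binary_prob = sigmoid(w_bin @ h)  # binary: a single probability
```

Note that the loop reuses `W_xh` and `W_hh` at every timestep, which is the parameter sharing the notes describe.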

Types of RNNs

one-to-one

Both the input and the output are a single timestep (not a sequence)

one-to-many

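The input is a single timestep
- when there is no additional input to feed at the later timesteps, an all-zero input of the same size is given
The output spans multiple timesteps
ex) image captioning: given a single image as input, predict or generate its description word by word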

many-to-one

The input is a sequence
The output is produced at the last timestep
ex) sentiment classification: given an input sentence, each word is processed at its own timestep and positive/negative is output at the last timestep

many-to-many

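Both the input and the output are sequences
ex) machine translation: the whole input is read first, then the output words are generated
ex) POS tagging, video classification: a prediction is made every time an input arrives (no delay)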
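The input/output patterns above differ mainly in which timesteps receive a real input and which hidden states feed the output layer. A rough NumPy sketch, assuming a bias-free vanilla RNN and toy sizes (the helper `run_rnn` is hypothetical):

```python
import numpy as np

def run_rnn(xs, W_xh, W_hh, h0):
    """Unroll a vanilla RNN and return the hidden state at every timestep."""
    h, hs = h0, []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4  # illustrative sizes (assumption)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
xs = rng.normal(size=(T, input_dim))
hs = run_rnn(xs, W_xh, W_hh, np.zeros(hidden_dim))

many_to_one = hs[-1]  # only the last hidden state feeds the output layer
many_to_many = hs     # every hidden state feeds the output layer (no delay)

# one-to-many: a real input at the first step, all-zero inputs afterwards
xs_one_to_many = np.zeros((T, input_dim))
xs_one_to_many[0] = xs[0]
hs_one_to_many = run_rnn(xs_one_to_many, W_xh, W_hh, np.zeros(hidden_dim))
```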

Character-level Language Model

The simplest task for an RNN

Language Model

Given the sequence of characters or words so far, predict what comes next

Character Level Language Model

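ex) "hello" (many-to-many)
1. Build the vocabulary from the unique characters: [h, e, l, o]
2. Represent each character as a one-hot vector
3. Feed h, e, l, l in order -> predict e, l, l, o
4. Compute the hidden state, map it to an output vector with Why (same dimension as the vocabulary), and pass it through a softmax layer
5. Apply a loss so that the prediction moves closer to the ground-truth vector
6. Inference: feed only the first character 'h', then use each prediction as the next input; a sequence of unbounded length can be generated this way
ex) stock price prediction
ex) training on many words, sentences, and paragraphs
- spaces, periods, commas, and newlines are also added to the vocabulary, so the whole text can be treated as one character sequence
ex) writing papers in LaTeX, writing C code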
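The "hello" walkthrough above can be sketched as follows. The data preparation follows the listed steps; the generation loop uses untrained random weights and greedy argmax decoding, both of which are assumptions for illustration (names such as `generate` and `one_hot` are hypothetical, and the generated text is arbitrary until the model is trained).

```python
import numpy as np

text = "hello"
vocab = sorted(set(text))                       # unique characters
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def one_hot(idx, size):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

inputs = [char_to_idx[ch] for ch in text[:-1]]  # h, e, l, l
targets = [char_to_idx[ch] for ch in text[1:]]  # e, l, l, o
X = np.stack([one_hot(i, len(vocab)) for i in inputs])

def generate(first_char, n_steps, W_xh, W_hh, W_hy):
    """Feed each predicted character back in as the next input."""
    h = np.zeros(W_hh.shape[0])
    idx = char_to_idx[first_char]
    out = [first_char]
    for _ in range(n_steps):
        h = np.tanh(W_hh @ h + W_xh @ one_hot(idx, len(vocab)))
        logits = W_hy @ h               # one score per vocabulary entry
        idx = int(np.argmax(logits))    # softmax is monotonic, so argmax is the same
        out.append(vocab[idx])
    return "".join(out)

rng = np.random.default_rng(0)
hidden_dim = 8                                  # illustrative size (assumption)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, len(vocab)))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(len(vocab), hidden_dim))
sample = generate("h", 4, W_xh, W_hh, W_hy)     # untrained, so not "hello" in general
```

Sampling from the softmax distribution instead of taking the argmax is a common alternative when more varied output is wanted.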

Backpropagation through time(BPTT)

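The training procedure of the character-level model
At each timestep: input character -> hidden state -> output layer -> compare with the ground truth to obtain a loss
The matrices Wxh, Whh, and Why are learned through backpropagation
For long sequences, the amount that can be processed at once is limited, so the sequence is truncated and training proceeds on chunks of limited length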
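One way to picture the truncation is to cut the long sequence into fixed-length (input, target) chunks and backpropagate within each chunk only. The helper below is a hypothetical sketch of that chunking; in typical implementations the hidden state is still carried forward from one chunk to the next, which is an assumption not stated in these notes.

```python
def truncated_chunks(sequence, chunk_len):
    """Yield (inputs, targets) pairs of at most chunk_len steps each.

    targets is the same span shifted one position to the right.
    """
    for start in range(0, len(sequence) - 1, chunk_len):
        end = min(start + chunk_len, len(sequence) - 1)
        yield sequence[start:end], sequence[start + 1:end + 1]

chunks = list(truncated_chunks("hello world", 4))
```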

Searching for Interpretable Cells
- information is stored in the individual dimensions of the hidden state
- ex) a quote detection cell that tracks whether a quotation mark ("") is currently open
- ex) an if-statement cell

Vanishing/Exploding Gradient Problem in RNN

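The examples so far were produced with LSTM or GRU; the original (vanilla) RNN runs into problems
Multiplying by the same matrix at every timestep makes the gradient vanish when the factor is smaller than 1 and explode when it is larger than 1
ex) the gradient that passes through the square matrix Whh
- during backpropagation the gradient shrinks rapidly, so it cannot be carried in any meaningful amount to distant timesteps
- LSTM improves long-term dependency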
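The effect of multiplying by the same factor at every timestep can be checked numerically. The sketch below is a simplification that ignores the tanh derivative and uses scaled identity matrices as stand-ins for Whh; the factors 0.5 and 1.5 and the 20 timesteps are arbitrary choices for illustration.

```python
import numpy as np

def backprop_factor(w_hh, n_steps):
    """Scalar stand-in for multiplying by W_hh once per timestep."""
    return w_hh ** n_steps

small = backprop_factor(0.5, 20)  # factor below 1: shrinks toward 0
large = backprop_factor(1.5, 20)  # factor above 1: blows up

def repeated_norm(W_hh, n_steps):
    """Norm of a gradient vector after n_steps multiplications by W_hh^T."""
    g = np.ones(W_hh.shape[0])
    for _ in range(n_steps):
        g = W_hh.T @ g
    return np.linalg.norm(g)

shrinking = repeated_norm(0.5 * np.eye(3), 20)
growing = repeated_norm(1.5 * np.eye(3), 20)
```

After only 20 steps the two cases already differ by roughly nine orders of magnitude, which is why distant timesteps receive almost no learning signal in the vanishing case.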

Long Short-Term Memory(LSTM) & Gated Recurrent Unit (GRU)

LSTM: a more advanced model than the vanilla RNN
GRU: a lightweight version of the LSTM

LSTM

Mitigates the vanishing/exploding gradient problem
Improves handling of long-term dependencies

{ct,ht} = LSTM(xt, c(t-1), h(t-1))
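cell state vector: carries the more complete information
hidden state vector: the cell state vector processed once more
xt and h(t-1) are linearly transformed, the resulting vector is split into four parts, and each part is passed through a sigmoid or tanh layer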

Input gate

Forget gate
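- the previous cell state vector is multiplied element-wise by the forget gate value (the linearly transformed input passed through a sigmoid) to decide which information to forget

Output gate

Gate gate - tanh
- the meaningful information computed at the current timestep
- c~t, the vector produced by tanh (range -1 to 1), is multiplied element-wise by the input gate
  - when a single linear transformation cannot easily produce exactly the information to add, c~t is made larger and the input gate then scales it down by the appropriate ratio
- the result is added to the previous timestep's cell state scaled by the forget gate: ct = ft * c(t-1) + it * c~t

hidden state computation

- tanh is applied to ct to bring it into the range -1 to 1, then the result is multiplied by the output gate value, scaling the information held in the cell state down by an appropriate ratio: ht = ot * tanh(ct)
- the hidden state holds only the information directly needed for the prediction
- ex) "hello"
  - the fact that a quotation mark is open -> cell state vector
  - the next character "o" -> hidden state vector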
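The gate computations above can be put together in a single LSTM step. This is a minimal NumPy sketch under stated assumptions: xt and h(t-1) are concatenated before one linear transformation, the four chunks are ordered input/forget/output/gate, and the bias `b`, the sizes, and the name `lstm_step` are illustrative rather than taken from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4 * hidden, input + hidden)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b  # one linear transformation
    i = sigmoid(z[0 * hidden:1 * hidden])      # input gate, in (0, 1)
    f = sigmoid(z[1 * hidden:2 * hidden])      # forget gate, in (0, 1)
    o = sigmoid(z[2 * hidden:3 * hidden])      # output gate, in (0, 1)
    g = np.tanh(z[3 * hidden:4 * hidden])      # gate gate c~t, in (-1, 1)
    c_t = f * c_prev + i * g                   # forget old info, add new info
    h_t = o * np.tanh(c_t)                     # expose only what the output needs
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                   # illustrative sizes (assumption)
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
```

The cell state `c` is the running memory, while `h` is the filtered view of it that is passed to the output layer and to the next timestep.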

GRU

A lighter version of the LSTM architecture that uses less memory and runs faster
The LSTM's cell state vector and hidden state vector are merged into a single hidden state vector

ht = (1 - zt) * h(t-1) + zt * h~t

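zt plays the role of the LSTM's input gate
1 - zt takes the place of the forget gate
As the input weight zt grows, the forget weight 1 - zt shrinks, so the hidden state vector is a weighted average of h(t-1) and the newly computed candidate h~t
Despite being a lighter model, it performs comparably to the LSTM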
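A GRU step built around the update equation above might look like the sketch below. The update rule follows the notes; the reset gate `r` and the form of the candidate h~t come from the standard GRU formulation and are not covered in these notes, so treat them, along with the bias-free weights and the name `gru_step`, as assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; every weight has shape (hidden, input + hidden)."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(W_z @ xh)   # update gate: weight on the new candidate
    r = sigmoid(W_r @ xh)   # reset gate (standard GRU; not in the notes)
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))  # candidate h~t
    return (1.0 - z) * h_prev + z * h_tilde  # weighted average of old and new

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                 # illustrative sizes (assumption)
shape = (hidden_dim, input_dim + hidden_dim)
W_z = rng.normal(scale=0.1, size=shape)
W_r = rng.normal(scale=0.1, size=shape)
W_h = rng.normal(scale=0.1, size=shape)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(x_t, h, W_z, W_r, W_h)
```

Because a single gate `z` controls both how much is kept and how much is added, the GRU needs three weight matrices where the LSTM sketch needs four, and it carries no separate cell state.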

Backpropagation in LSTM and GRU

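The cell state is updated by adding the needed information to the previous timestep's cell state (the + operation) rather than by repeatedly multiplying by the same matrix, which mitigates the vanishing gradient problem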
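A short worked comparison may make the role of the addition clearer. It is a simplification: the gates are treated as constants, so only the direct path through the cell state is shown, and the vanilla RNN is written as h_t = tanh(W_hh h_{t-1} + W_xh x_t).

```latex
% LSTM: direct path through the additive cell-state update
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\quad\Rightarrow\quad
\frac{\partial c_t}{\partial c_{t-1}} = \operatorname{diag}(f_t),
\qquad
\frac{\partial c_T}{\partial c_t} = \prod_{k=t+1}^{T} \operatorname{diag}(f_k)

% Vanilla RNN: the same matrix W_{hh} appears at every timestep
\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\!\left(1 - h_t^{2}\right) W_{hh}
```

When the forget gate stays close to 1, the LSTM product stays close to the identity, whereas the vanilla RNN product contains one copy of W_hh per timestep.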


newnu blog
