[DL Wizard] Forwardpropagation, Backpropagation and Gradient Descent with PyTorch: Translation and Summary
FNN structure (see the sketch below)
- Linear function : hidden size = 32
- Non-linear function : sigmoid
- Linear function : output size = 1
- Non-linear function : sigmoid
output size = 1 (outputs 0 or 1 depending on the flower species)
input size = 2 (two flower features as input)
train data size = 100
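For orientation, here is a minimal sketch of the same architecture using PyTorch's built-in layers. This is only an illustration, not the tutorial's code (the tutorial builds the weight matrices by hand below), and nn.Linear also adds bias terms, which the manual version omits.
import torch.nn as nn
# Equivalent architecture expressed with built-in layers (illustrative sketch)
model_sketch = nn.Sequential(
    nn.Linear(2, 32),   # input size 2 -> hidden size 32
    nn.Sigmoid(),
    nn.Linear(32, 1),   # hidden size 32 -> output size 1
    nn.Sigmoid(),       # squashes the output into (0, 1)
)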
1) Loading the Data
import torch
import torch.nn as nn
torch.manual_seed(2)
from sklearn import datasets
from sklearn import preprocessing
iris = datasets.load_iris()
# iris.data[:, :2] : load only the first 2 features instead of all of them
X = torch.tensor(preprocessing.normalize(iris.data[:, :2]), dtype=torch.float)
y = torch.tensor(iris.target.reshape(-1,1), dtype=torch.float)
X.size() # [150,2]
y.size() # [150,1]
# The original dataset has 3 flower classes (0, 1, 2); here we use only the 2 classes 0 and 1
# y[y<2].size() : size of the labels that are 0 or 1 -> [100] (boolean indexing returns a 1-D tensor)
# y[y<2].size()[0] = 100
X = X[ :y[y<2].size()[0]]
y = y[ :y[y<2].size()[0]]
X.size() # [100,2]
y.size() # [100,1]
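The slicing above works because the iris samples are ordered by class, so the first 100 rows are exactly classes 0 and 1. A boolean mask makes that assumption explicit; a small sketch (not part of the tutorial, recomputed from the raw data loaded above):
# Equivalent selection with an explicit boolean mask
X_full = torch.tensor(preprocessing.normalize(iris.data[:, :2]), dtype=torch.float)
y_full = torch.tensor(iris.target.reshape(-1, 1), dtype=torch.float)
mask = (y_full < 2).squeeze()                # [150] boolean, True for classes 0 and 1
X_alt, y_alt = X_full[mask], y_full[mask]    # [100, 2], [100, 1]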
2) Creating the FNN Model Class
class FNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Dimensions for input, hidden and output
        self.input_dim = 2
        self.hidden_dim = 32
        self.output_dim = 1
        # Learning rate definition
        self.learning_rate = 0.001
        # Our parameters (weights)
        # w1: 2 x 32
        self.w1 = torch.randn(self.input_dim, self.hidden_dim)
        # w2: 32 x 1
        self.w2 = torch.randn(self.hidden_dim, self.output_dim)

    def sigmoid(self, s):
        return 1 / (1 + torch.exp(-s))

    def sigmoid_first_order_derivative(self, s):
        # Expects s to already be a sigmoid output: d(sigmoid)/dx = s * (1 - s)
        return s * (1 - s)
    # Forward propagation
    def forward(self, X):
        # First linear layer: [100, 2] x [2, 32] -> [100, 32]
        # (".dot" does not broadcast in PyTorch, so we use matmul)
        self.y1 = torch.matmul(X, self.w1)
        # First non-linearity
        self.y2 = self.sigmoid(self.y1)
        # Second linear layer: [100, 32] x [32, 1] -> [100, 1]
        self.y3 = torch.matmul(self.y2, self.w2)
        # Second non-linearity
        y4 = self.sigmoid(self.y3)
        return y4
    # Backward propagation
    def backward(self, X, l, y4):
        # Derivative of the binary cross entropy cost w.r.t. the final output y4
        self.dC_dy4 = y4 - l
        '''
        Gradients for w2: partial derivative of cost w.r.t. w2
        dC/dw2
        '''
        self.dy4_dy3 = self.sigmoid_first_order_derivative(y4)
        self.dy3_dw2 = self.y2
        # Y4 delta: dC_dy4 dy4_dy3
        self.y4_delta = self.dC_dy4 * self.dy4_dy3
        # This is our gradient for w2: dC_dy4 dy4_dy3 dy3_dw2
        self.dC_dw2 = torch.matmul(torch.t(self.dy3_dw2), self.y4_delta)
        '''
        Gradients for w1: partial derivative of cost w.r.t. w1
        dC/dw1
        '''
        self.dy3_dy2 = self.w2
        self.dy2_dy1 = self.sigmoid_first_order_derivative(self.y2)
        # Y2 delta: (dC_dy4 dy4_dy3) dy3_dy2 dy2_dy1
        self.y2_delta = torch.matmul(self.y4_delta, torch.t(self.dy3_dy2)) * self.dy2_dy1
        # Gradients for w1: (dC_dy4 dy4_dy3) dy3_dy2 dy2_dy1 dy1_dw1
        self.dC_dw1 = torch.matmul(torch.t(X), self.y2_delta)
        # Gradient descent on the weights from our 2 linear layers
        self.w1 -= self.learning_rate * self.dC_dw1
        self.w2 -= self.learning_rate * self.dC_dw2

    def train(self, X, l):
        # Forward propagation
        y4 = self.forward(X)
        # Backward propagation and gradient descent
        self.backward(X, l, y4)
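As a point of comparison (not part of the tutorial), roughly the same two-layer network and weight update can be written with autograd computing the gradients instead of deriving them by hand; a minimal sketch, reusing X and y from step 1. Note that autograd's gradients for this loss will not exactly match the hand-derived ones above: for a sigmoid output followed by binary cross entropy, the chain rule gives dC/dy3 = y4 - l directly, whereas the backward method above multiplies (y4 - l) by the sigmoid derivative once more.
# Autograd-based sketch of the same forward pass and one gradient-descent step
w1 = torch.randn(2, 32, requires_grad=True)   # same shapes as self.w1 / self.w2
w2 = torch.randn(32, 1, requires_grad=True)
lr = 0.001

y2 = torch.sigmoid(torch.matmul(X, w1))       # first linear layer + sigmoid
y4 = torch.sigmoid(torch.matmul(y2, w2))      # second linear layer + sigmoid
loss = -(y * torch.log(y4) + (1 - y) * torch.log(1 - y4)).mean()
loss.backward()                               # fills w1.grad and w2.grad

with torch.no_grad():                         # manual gradient-descent step
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad
    w1.grad.zero_()
    w2.grad.zero_()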
3) Training the FNN Model
# Instantiate our model class and assign it to our model object
model = FNN()
# Loss list for plotting of loss behaviour
loss_lst = []
# Number of times we want our FNN to look at all 100 samples we have;
# num_epochs = 101 so the loop runs from epoch 0 through epoch 100
num_epochs = 101
# Let's train our model with 100 epochs
for epoch in range(num_epochs):
    # Get our predictions
    y_hat = model(X)
    # Cross entropy loss; remember this can never be negative by the nature of the equation,
    # but that does not mean the loss can't be negative for other loss functions
    cross_entropy_loss = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat))
    # We have to take the mean cross entropy loss over all our samples, 100 in this 2-class iris dataset
    mean_cross_entropy_loss = torch.mean(cross_entropy_loss).detach().item()
    # Print our mean cross entropy loss
    if epoch % 20 == 0:
        print('Epoch {} | Loss: {}'.format(epoch, mean_cross_entropy_loss))
    loss_lst.append(mean_cross_entropy_loss)
    # (1) Forward propagation: to get our predictions to pass to our cross entropy loss function
    # (2) Back propagation: get our partial derivatives w.r.t. parameters (gradients)
    # (3) Gradient descent: update our weights with our gradients
    model.train(X, y)
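After training, a quick way to check the fit (not part of the tutorial) is to threshold the sigmoid output at 0.5 and compare the predictions against the labels on the training data:
# Accuracy check on the training data (illustrative; the tutorial stops at the loss)
with torch.no_grad():
    preds = (model(X) > 0.5).float()              # threshold the sigmoid output at 0.5
    accuracy = (preds == y).float().mean().item()
print('Training accuracy: {:.2f}'.format(accuracy))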