
[DL Wizard] Learning Rate Scheduling: Translation and Summary

Learning Rate Scheduling - Deep Learning Wizard



Two Learning Rate Schedules

1) Step-wise Decay

2) Reduce on Loss Plateau Decay



Step-wise Decay


- next learning rate = previous learning rate * gamma


- Two hyperparameters: step_size, gamma
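The decay rule above can be written in closed form: after `epoch` completed epochs, the learning rate has been multiplied by gamma once per `step_size` epochs. The following is a minimal pure-Python sketch of that rule, not PyTorch's own implementation; the function name `step_decay_lr` and its argument names are hypothetical and chosen only for illustration.

```python
def step_decay_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate in effect during `epoch` (0-indexed) under step-wise decay.

    Hypothetical helper for illustration: the rate is multiplied by `gamma`
    once every `step_size` epochs.
    """
    num_decays = epoch // step_size  # how many decays have happened so far
    return initial_lr * gamma ** num_decays


if __name__ == "__main__":
    # Compare decaying every epoch with decaying every 2 epochs.
    for step_size in (1, 2):
        rates = [step_decay_lr(0.1, 0.1, step_size, e) for e in range(4)]
        print("step_size =", step_size, "->", rates)
```

With step_size = 1 the rate shrinks by a factor of gamma every epoch, while with step_size = 2 each rate is held for two epochs before it shrinks.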


import torch
from torch.optim.lr_scheduler import StepLR

# Instantiate optimizer class (model is assumed to be defined earlier)
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, nesterov=True)

# Instantiate step learning scheduler class

# step_size: number of epochs between decays
# step_size = 1: after every epoch, new_lr = lr * gamma
# step_size = 2: after every 2 epochs, new_lr = lr * gamma
# gamma: multiplicative decay factor

scheduler = StepLR(optimizer, step_size=1, gamma=0.1)

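# Train the model (criterion, train_loader and num_epochs are assumed to be defined earlier)
for epoch in range(num_epochs):
    # Print the learning rate used for this epoch
    print('Epoch:', epoch, 'LR:', scheduler.get_last_lr())
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each 28x28 image into a 784-dimensional vector
        images = images.view(-1, 28*28).requires_grad_()
        # Clear gradients
        optimizer.zero_grad()
        # Forward pass to get output
        outputs = model(images)
        # Calculate loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        # Get gradients
        loss.backward()
        # Update parameters
        optimizer.step()
    # Decay learning rate once per epoch, after the optimizer updates
    scheduler.step()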



Reduce on Loss Plateau Decay

- Reduce the learning rate when the loss stops decreasing and stays at the same value.

* plateau (verb): to reach a state or level of little or no growth or decline, especially to stop increasing or progressing

- Patience: how many epochs to wait after the loss stops decreasing before reducing the learning rate


- If training runs for many epochs, set patience larger as well

- Two hyperparameters: patience, decay factor
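To make the roles of patience and the decay factor concrete, here is a minimal pure-Python sketch of the plateau rule described above. It is a simplification under stated assumptions, not PyTorch's implementation: the function name `plateau_schedule` is hypothetical, any strict improvement counts as progress (no threshold), and there is no cooldown period.

```python
def plateau_schedule(metrics, initial_lr=0.1, factor=0.1, patience=0, mode="min"):
    """Return the learning rate in effect after each epoch's metric is seen.

    Hypothetical helper for illustration. `metrics` holds one value per epoch
    (a loss for mode="min", an accuracy for mode="max").
    """
    lr = initial_lr
    best = None
    bad_epochs = 0  # consecutive epochs without improvement
    history = []
    for value in metrics:
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best

        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1

        # Tolerate `patience` bad epochs; reduce on the next one.
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.8, 0.7, 0.9, 0.9]
    print(plateau_schedule(losses, factor=0.5, patience=0))
    print(plateau_schedule(losses, factor=0.5, patience=1))
```

With patience = 0 the rate drops on the first epoch that fails to improve; with patience = 1 a single bad epoch is tolerated and the rate drops only on the second consecutive one.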


import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Instantiate Optimizer Class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, nesterov=True)

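# Instantiate the reduce-on-plateau scheduler class

# lr = lr * factor
# mode='max': track a metric that should increase, here validation accuracy
# patience: number of epochs without improvement to tolerate before decreasing LR
        # patience = 0: reduce LR after 1 bad epoch
# factor: multiplicative decay factor
# verbose is deprecated since PyTorch 2.2, so it is omitted here;
# read optimizer.param_groups[0]['lr'] to inspect the current LR

scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=0)

# Each epoch, after validation, pass the tracked metric to the scheduler
# (accuracy is a placeholder name for your validation accuracy):
# scheduler.step(accuracy)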


