[Hyper-parameter Tuning] Hyperparameter Tuning

2020. 2. 5. 13:49ㆍnlp

Hyper-parameter Tuning Techniques in Deep Learning

The process of setting the hyper-parameters requires expertise and extensive trial and error. There are no simple and easy ways to set…

towardsdatascience.com

Hyper-parameters

1) learning rate

2) momentum

3) batch size

4) weight decay

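What is momentum?

- Standard gradient descent update: W := W - α·dW, where α is the learning rate

- Weight update with momentum: instead of using dW directly, use V_dW

- Here β is the hyperparameter called momentum

- Range of β: [0, 1]

- To compute the updated weights, the previous value (the running V_dW) is blended with the newly computed gradient dW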
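
The bullets above describe the momentum update without spelling out the formula. A minimal sketch, assuming the common exponentially weighted average form V_dW = β·V_dW + (1 - β)·dW followed by W = W - α·V_dW. The function name `momentum_step`, the default values, and the toy objective f(w) = w² are illustrative assumptions, not taken from the source article:

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One gradient descent step with momentum (assumed moving-average form)."""
    # Blend the previous velocity V_dW with the newly computed gradient dW.
    v = beta * v + (1.0 - beta) * grad
    # Update the weight with the blended value instead of the raw gradient.
    w = w - lr * v
    return w, v


# Toy example: minimize f(w) = w**2, whose gradient is 2 * w.
w, v = 5.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, grad=2.0 * w)
print(w)  # ends up very close to the minimum at 0
```

With β = 0 the velocity equals the raw gradient, so the step reduces to plain gradient descent. Values of β closer to 1 give more weight to the accumulated history.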

Relationship between learning rate and momentum

- The learning rate should start small and then increase (cyclical learning rate)

- Momentum should start large and then decrease (cyclical momentum)

cf. It is better to keep weight decay constant
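
The opposing schedules can be sketched as one triangular cycle in which the learning rate rises to a peak and falls back while momentum does the reverse. The triangular shape, the function name `cyclical_schedule`, and the default bounds are assumptions chosen for illustration. Weight decay is left out of the schedule because it is held constant:

```python
def cyclical_schedule(step, total_steps, lr_min=0.001, lr_max=0.01,
                      mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for one triangular cycle."""
    half = total_steps / 2.0
    # frac is 0 at both ends of the cycle and 1 at the midpoint.
    frac = step / half if step <= half else (total_steps - step) / half
    # Learning rate goes up while momentum goes down, and vice versa.
    lr = lr_min + frac * (lr_max - lr_min)
    momentum = mom_max - frac * (mom_max - mom_min)
    return lr, momentum


for step in (0, 25, 50, 75, 100):
    print(step, cyclical_schedule(step, total_steps=100))
```

At the start of the cycle the pair is (lr_min, mom_max), and at the midpoint it is (lr_max, mom_min).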

Batch size

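- Small batch size: regularization has to be balanced to prevent overfitting (small batches already add some regularization through noisier gradient estimates, while large batches add less)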

- Large batch size: a somewhat larger learning rate can be used as well
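
The note above only says that a larger batch size allows a larger learning rate. One widely used heuristic for choosing how much larger is to scale the learning rate in proportion to the batch size. The sketch below assumes that linear rule; the function name `scaled_learning_rate` and the numbers are illustrative, and the rule is a starting point to validate rather than a guarantee:

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    """Scale a reference learning rate in proportion to the batch size.

    This is the linear scaling heuristic (an assumption here, not a rule
    stated in the source article).
    """
    return base_lr * batch_size / base_batch_size


# A learning rate tuned at batch size 256, carried over to batch size 1024.
print(scaled_learning_rate(0.1, base_batch_size=256, batch_size=1024))
```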

Weight decay

- Weight decay: penalizes large weights to suppress overfitting

- The greater the need for regularization, the larger the weight decay value to use
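
How the penalty acts on large weights can be seen in a single update step. The sketch below assumes the L2 form of weight decay, in which `weight_decay * w` is added to each gradient before the step. The function name `sgd_step_with_weight_decay` and the default values are hypothetical:

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD step where each gradient gets an extra weight_decay * w term."""
    # The extra term pulls every weight toward zero, and the pull is
    # proportional to the weight's own size.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero data gradient only the decay term acts:
# the large weight shrinks by more than the small one.
print(sgd_step_with_weight_decay([10.0, 0.1], [0.0, 0.0]))
```

Raising `weight_decay` strengthens this pull, which matches the guidance to use a larger value when more regularization is needed.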


codlingual