Naive Bayes and Text Classification

2019. 12. 4. 14:58ㆍnlp

Text Classification

1. Explicit coding (hand-written rules)

2. Supervised learning

2.1. Generative / Joint Model (e.g. Naive Bayes, Language Model)

2.2. Discriminative / Conditional Model (e.g. Logistic Regression, Maximum Entropy Model)

Two assumptions of Naive Bayes (NB)

1) Bag of Words (BoW): word order does not matter

2) All features are independent of one another given the class
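To make the Bag of Words assumption concrete, here is a minimal sketch. The helper name `bag_of_words` and the whitespace tokenization are assumptions made for illustration, not something from the original post.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

# Two made-up sentences with the same words in a different order.
a = bag_of_words("the movie was great great fun")
b = bag_of_words("great fun the movie was great")

print(a == b)       # True: same words, different order
print(a["great"])   # 2
```

Under BoW the two sentences are indistinguishable, because only the per-word counts are kept.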

[ NB formula ]

* d = document, c = class, f = feature
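The formula itself is not present in this text. As a hedged reconstruction, the standard textbook form of the NB decision rule, written with the symbols defined above and assuming the document d is represented by its features f_1, ..., f_n, is:

```latex
c_{NB} = \operatorname*{argmax}_{c} P(c \mid d)
       = \operatorname*{argmax}_{c} \frac{P(d \mid c)\, P(c)}{P(d)}
       = \operatorname*{argmax}_{c} P(c) \prod_{i=1}^{n} P(f_i \mid c)
```

P(d) is the same for every class, so it can be dropped from the argmax. The product in the last step follows from the independence assumption.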

[ Computing NB ]

1) Simple counting
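A minimal sketch of simple counting, assuming the usual maximum-likelihood estimates: P(c) is the fraction of documents in class c, and P(f | c) is count(f, c) divided by the total feature count in c. The toy training set and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (tokens, class) pairs, made up for illustration.
train = [
    ("fun great fun".split(), "pos"),
    ("great acting".split(), "pos"),
    ("boring plot".split(), "neg"),
]

doc_counts = Counter(c for _, c in train)   # number of documents per class
feat_counts = defaultdict(Counter)          # feat_counts[c][f] = count(f, c)
for tokens, c in train:
    feat_counts[c].update(tokens)

def prior(c):
    """P(c) = (documents in class c) / (all documents)."""
    return doc_counts[c] / len(train)

def likelihood(f, c):
    """P(f | c) = count(f, c) / (total feature count in class c)."""
    return feat_counts[c][f] / sum(feat_counts[c].values())

print(prior("pos"))               # 2/3
print(likelihood("fun", "pos"))   # 2/5 = 0.4
print(likelihood("fun", "neg"))   # 0.0, "fun" never appears in a neg document
```

A single zero likelihood makes the whole product for that class zero, which is the problem smoothing addresses.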

2) Add-one Smoothing

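Pretend that each feature was seen one more time in each class than it actually was.

|V| in the denominator is the number of distinct features that appear across all classes (no duplicates, i.e. set(feature)).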
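A minimal sketch of add-one smoothing, assuming the standard form P(f | c) = (count(f, c) + 1) / (total feature count in c + |V|). The toy data and the name `smoothed_likelihood` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (tokens, class) pairs, made up for illustration.
train = [
    ("fun great fun".split(), "pos"),
    ("great acting".split(), "pos"),
    ("boring plot".split(), "neg"),
]

feat_counts = defaultdict(Counter)          # feat_counts[c][f] = count(f, c)
for tokens, c in train:
    feat_counts[c].update(tokens)

# |V|: distinct features over all classes, i.e. len(set(feature)).
vocab = {f for tokens, _ in train for f in tokens}

def smoothed_likelihood(f, c):
    """P(f | c) = (count(f, c) + 1) / (total feature count in c + |V|)."""
    total = sum(feat_counts[c].values())
    return (feat_counts[c][f] + 1) / (total + len(vocab))

print(len(vocab))                          # 5
print(smoothed_likelihood("fun", "pos"))   # (2 + 1) / (5 + 5) = 0.3
print(smoothed_likelihood("fun", "neg"))   # (0 + 1) / (2 + 5), no longer zero
```

Adding |V| to the denominator keeps the smoothed probabilities for each class summing to 1 over the vocabulary.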

※ Source

codlingual