[ML] Nearest Neighbor Method - KNN(3)

이태홍 2022. 11. 11. 21:09

🤔 KNN (K-Nearest Neighbors Classifier)

KNN, as the name says, classifies a data point using its K nearest neighbors (data points).

It is a very simple algorithm, but it performs better than you might expect, so it plays a role like the power-level scouter from Dragon Ball.

In other words, any model that performs worse than KNN can safely be filtered out.

In this post, we will learn about KNN.

🔎 KNN (k-Nearest Neighbors Classifier)

We have finally arrived at KNN, the goal we set out to study.

You have worked hard learning the various concepts so far, so let's take a breath before moving on.

KNN is very intuitive and easy.

That is true not only of how it works but also of how easy it is to learn.

KNN, as the name says, classifies a data point using the K data points closest to it.

Drawn as a figure, it looks like the following.

✍️ Algorithm

Let's walk through how KNN classifies data.

 

 

1. Prepare training data with their class labels (a.k.a. reference vectors)

Load the training data, in which every sample already has a class label.

This training data is later used to classify new data as it arrives.

 


2. New test data arrives (without its class label)

Add the new test data.

At this point the test data must not be assigned to any class yet.

 


3. Calculate the distance between the test data and all training data

Measure the distance from the test data to each training sample.

 

 

 

4. Select the k nearest neighbors

Decide k, that is, how many nearby training samples are consulted when classifying the test data.

 

 

 

5. Vote among the k neighbors

Select the training samples closest to the test data and check which class most of the selected samples belong to.

The class that contains the most of them becomes the class of the test data.
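
The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy coordinates, and the choice of Euclidean distance are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 3: distance from the query to every training point
    # (Euclidean distance is an assumption of this sketch).
    distances = [math.dist(point, query) for point in train_points]
    # Step 4: indices of the k training points with the smallest distances.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Step 5: majority vote over the labels of those k neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Step 1: labeled training data (toy values made up for illustration).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

# Step 2: a new test point that has no class label yet.
query = (2.0, 2.0)

print(knn_predict(train_points, train_labels, query, k=3))  # prints "A"
```

All three nearest neighbors of `(2.0, 2.0)` belong to class "A", so the vote is unanimous here.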

🔎 Distribution of Test Data Depending on K

 

When K is large, the model is less sensitive to individual points, but it captures the overall trend very well.

Conversely, when K is small, the model becomes very complex and overly sensitive, so it is more likely to misclassify new test data.

To find a suitable value, try each candidate k and choose the one with the lowest error.
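
One way to run this search is sketched below. The post does not say which data the error is measured on, so this sketch assumes a small held-out validation set; the names `error_rate` and `best_k`, the candidate list, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    # `train` is a list of (point, label) pairs; Euclidean distance is assumed.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, validation, k):
    # Fraction of validation points whose predicted label is wrong.
    wrong = sum(knn_predict(train, point, k) != label
                for point, label in validation)
    return wrong / len(validation)


def best_k(train, validation, candidates):
    # min() keeps the first candidate with the lowest error,
    # so ties go to whichever k appears earlier in `candidates`.
    return min(candidates, key=lambda k: error_rate(train, validation, k))


train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.2), "A"), ((1.4, 1.4), "A"),
    ((1.1, 1.1), "B"),  # noisy point inside the "A" cluster
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.8, 5.2), "B"),
]
validation = [((1.15, 1.15), "A"), ((0.9, 0.9), "A"),
              ((5.1, 5.1), "B"), ((4.9, 4.9), "B")]

candidates = [1, 3, 5]
print({k: error_rate(train, validation, k) for k in candidates})
print(best_k(train, validation, candidates))  # prints 3
```

On this toy data k = 1 misclassifies one of the four validation points because of the noisy neighbor, while k = 3 and k = 5 make no mistakes, so the search returns 3.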

In addition, instead of looking only at how the neighbors are distributed across classes, you can assign a weight to each data point when deciding the result.
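
The post does not specify how the weights are chosen. A common choice, assumed here purely for illustration, is to weight each neighbor by the inverse of its distance so that closer neighbors count more. The function names and the toy data below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def majority_knn_predict(train, query, k):
    # Plain voting: every one of the k neighbors counts equally.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def weighted_knn_predict(train, query, k, eps=1e-9):
    # Weighted voting: each neighbor adds 1 / distance to its class score.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for point, label in nearest:
        # eps avoids division by zero when a neighbor coincides with the query.
        scores[label] += 1.0 / (math.dist(point, query) + eps)
    return max(scores, key=scores.get)


train = [((1.0, 1.0), "A"), ((3.0, 3.0), "B"), ((3.2, 3.0), "B")]
query = (1.5, 1.5)

print(majority_knn_predict(train, query, k=3))  # prints "B" (two votes to one)
print(weighted_knn_predict(train, query, k=3))  # prints "A" (the close point dominates)
```

Here the single nearby "A" point outweighs the two distant "B" points once distance is taken into account, which plain majority voting cannot express.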