이태홍
홍'story

[ML] Ensemble Method(2) - Bagging & Random Forest

2022. 11. 26. 18:20

🤔 Ensemble Method

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than any of those algorithms could achieve on its own.

Unlike a statistical ensemble in statistical mechanics, an ensemble in machine learning refers to a concrete, finite set of alternative models, but it typically allows a much more flexible structure among those alternatives.

This post looks at Bagging, one of the ensemble techniques.

🔎 About Ensemble

An ensemble takes two or more base learners and combines their outputs into a single prediction.

The learners must differ from one another, because each one needs to bring a different viewpoint to the problem.

The previous post gives an example of this, so take a look if you are curious:

[ML] Ensemble Method(1) - Bias-Variance Dilemma
https://2t-hong.tistory.com/119

In short, giving each individual model a different viewpoint and then combining the models can yield much better results than using a single model.
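To get a feel for why combining different viewpoints helps, here is a minimal sketch under an idealized assumption that is not part of the original post: every base learner is right with the same probability, and the learners make their mistakes independently. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(n_learners, p_correct):
    """Probability that more than half of n independent learners are right.

    Assumes an odd number of learners, each correct with probability p_correct.
    """
    need = n_learners // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_learners, k) * p_correct**k * (1 - p_correct) ** (n_learners - k)
        for k in range(need, n_learners + 1)
    )


# Each learner alone is right only 60% of the time (an illustrative number).
accuracies = {n: majority_accuracy(n, 0.6) for n in (1, 5, 25, 101)}
for n, acc in accuracies.items():
    print(f"{n:>3} learners -> majority is right with probability {acc:.3f}")
```

Real base learners are trained on overlapping data and are therefore correlated, so the gain in practice is smaller than this idealized calculation suggests. That is exactly why the diversity of viewpoints matters so much.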

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

🔎 Type Of Ensemble Method

The two representative ensemble approaches are Bagging and Boosting.

Bagging is built from base learners with low bias and high variance. In other words, it uses base learners that overfit.

Boosting, in contrast, is built from base learners with high bias and low variance. In other words, it uses base learners that underfit.

This post takes a closer look at Bagging.

✍ Bagging

Bagging is short for Bootstrap Aggregating, and it uses low-bias / high-variance models as its base learners.

Suppose we have a fixed training data set, as in the figure above.

From this set we create n bootstraps using a sampling technique called bootstrapping. You can think of a bootstrap as a sample set.

Bootstrapping samples with replacement: we fix the size of each bootstrap and then draw that many items from the training set, putting each item back before the next draw.

(As an aside, when a bootstrap is the same size as the training set, it contains on average about 63.2% of the distinct training examples, and the remaining 36.8% or so are left out.)
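The sampling step can be sketched in a few lines. This is only an illustration using Python's standard library. The helper names `bootstrap_sample` and `unique_fraction` are hypothetical, and the data set is just a list of row indices.

```python
import random


def bootstrap_sample(data, size=None, rng=None):
    """Draw one bootstrap: `size` items sampled from `data` with replacement."""
    if rng is None:
        rng = random.Random()
    if size is None:
        size = len(data)  # common choice: same size as the training set
    return [rng.choice(data) for _ in range(size)]


def unique_fraction(n, rng):
    """Fraction of the n original rows that appear at least once in one bootstrap."""
    sample = bootstrap_sample(list(range(n)), rng=rng)
    return len(set(sample)) / n


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
n = 10_000
fractions = [unique_fraction(n, rng) for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)
print(f"mean fraction of distinct rows per bootstrap: {mean_fraction:.3f}")
```

The printed value should land close to 1 - 1/e ≈ 0.632, because each row is missed by a single draw with probability 1 - 1/n and therefore by all n draws with probability (1 - 1/n)^n ≈ 1/e.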

 

 

 

In Bagging, all of the base learners are trained in parallel.

Because Bagging needs low-bias models, each base learner is allowed to overfit. This is what lets the learners develop sufficiently different viewpoints.

Most Bagging setups use decision trees as base learners, so every model that gets built is a fully grown tree.

The predictions produced by these models are then combined by voting.

Some people call this a democratic algorithm.
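Here is a minimal sketch of the voting step for classification. The names `majority_vote` and `bagging_predict` are hypothetical, and the three base learners are toy threshold rules made up for illustration, not trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base learners.

    Ties go to the label that was seen first among the tied ones.
    """
    return Counter(predictions).most_common(1)[0][0]


def bagging_predict(base_learners, x):
    """Ask every base learner for a label and aggregate with one vote each."""
    return majority_vote([learner(x) for learner in base_learners])


# Three toy base learners that disagree near the decision boundary.
learners = [
    lambda x: "pos" if x > 0.4 else "neg",
    lambda x: "pos" if x > 0.5 else "neg",
    lambda x: "pos" if x > 0.6 else "neg",
]
print(bagging_predict(learners, 0.55))  # two of the three learners vote "pos"
```

Every learner gets exactly one vote regardless of how good it is, which is where the "democratic" nickname comes from. For regression, the usual counterpart is to average the predictions instead of voting.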

 

The most important point is that each base learner ends up with a sufficiently different viewpoint.

Bagging performs very well: it gives much better results than a single decision tree.

It also has few hyperparameters to set. You only need to choose the number of base learners.

If you choose too many base learners, some of them may end up with overlapping viewpoints. That is, they are likely to produce similar results.

Adding more learners therefore brings diminishing returns, so it is enough to choose a sufficiently large number of base learners.
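One common way to reason about this is offered here as a sketch rather than something taken from the original post. Assume the predictions f_b(x) of the B base learners are identically distributed with variance σ² and pairwise correlation ρ (all of these symbols are introduced here for illustration). The variance of their average is then:

```latex
\begin{aligned}
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right)
  &= \frac{1}{B^{2}}\left(B\sigma^{2} + B(B-1)\rho\sigma^{2}\right) \\
  &= \rho\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
\end{aligned}
```

The second term shrinks as B grows, but the first term does not depend on B at all. Under these assumptions, once B is reasonably large, extra learners add little, and the only way to improve further is to lower ρ, that is, to make the viewpoints more different.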

 

 

 

To summarize, Bagging is easy to use because it has few hyperparameters, yet it delivers high performance.

Bagging typically uses around 30 base learners.

Random Forest is the ultimate form of Bagging, created for people who want to use 100 or more base learners.

It starts from the question of how the base learners can still have different viewpoints even when there are more than 100 of them.

✍ Random Forest

Random Forest has a much simpler structure than many other algorithms, yet it performs very well.

Unlike plain Bagging, Random Forest also selects the variables at random.

The figure below illustrates this.

With this approach, the base learners can have more diverse viewpoints.

Characteristics

Plain Bagging randomly selects only the data points (rows). Random Forest additionally selects the variables (features) at random.
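As a rough sketch of that double randomness, the hypothetical helper `draw_tree_view` below picks the rows and the columns that one tree gets to see. It draws the feature subset once per tree, following the description in this post. Note that Breiman's original algorithm and common library implementations instead re-draw the candidate features at every split.

```python
import random


def draw_tree_view(n_rows, n_features, max_features, rng):
    """Pick the rows and columns a single tree is allowed to see.

    Rows are a bootstrap (sampled with replacement); columns are a random
    subset (sampled without replacement).
    """
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    col_idx = sorted(rng.sample(range(n_features), max_features))
    return row_idx, col_idx


rng = random.Random(42)  # fixed seed only so the sketch is repeatable
X = [[r * 10 + c for c in range(6)] for r in range(8)]  # toy table: 8 rows, 6 features
rows, cols = draw_tree_view(len(X), len(X[0]), max_features=2, rng=rng)
subset = [[X[r][c] for c in cols] for r in rows]
print("columns used:", cols, "| subset shape:", len(subset), "x", len(subset[0]))
```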

 

Advantages: it runs quickly, it has few hyperparameters, and it is reasonably interpretable (for example, through feature importances).

Disadvantage: base learners that frequently give the same opinion may be generated. (Once a reasonably large number of base learners is used, adding more of them does not change performance critically.)

You can think of Random Forest as having two hyperparameters:

1. How many trees should be built?

2. How many variables should be considered at a time?
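If you use scikit-learn, these two questions map onto the `n_estimators` and `max_features` parameters of `RandomForestClassifier`. The sketch below assumes scikit-learn is installed. The synthetic data set and the specific values are arbitrary choices for illustration, not recommendations from this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,     # 1. how many trees to build
    max_features="sqrt",  # 2. how many variables each split may consider
    random_state=0,
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
print(f"trees: {len(forest.estimators_)}, test accuracy: {accuracy:.3f}")
```

In scikit-learn, `max_features` limits the candidate variables at each split rather than once per tree, so it is a slightly finer-grained version of the second question.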

 

Like Bagging, RF has very few hyperparameters to think about, yet it achieves high performance.

Unlike a neural network (NN), there is no architecture or loss function to design.

Training procedure

The training procedure is as follows.
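The original illustration is not available here, so the following is only a sketch of the procedure as this post describes it: draw a bootstrap, pick random variables, grow an unpruned tree, and finally vote. Every function name is hypothetical, the split criterion (Gini impurity) is an assumption, and the feature subset is drawn once per tree to match the description above.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Try every threshold on the allowed features; return (feature, threshold) or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, features):
    """Grow an unpruned tree: split until a node is pure or no split lowers impurity."""
    split = best_split(X, y, features) if len(set(y)) > 1 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label (must not be a tuple)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], features),
            grow_tree([X[i] for i in right], [y[i] for i in right], features))


def predict_tree(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_forest(X, y, n_trees, max_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # step 1: bootstrap the rows
        feats = rng.sample(range(len(X[0])), max_features)     # step 2: random variables
        forest.append(                                          # step 3: grow a full tree
            grow_tree([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, row):
    """Step 4: every tree votes and the majority label wins."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Toy data: 4 features, but only the first two determine the label.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(60)]
y = [int(row[0] + row[1] > 1.0) for row in X]
forest = fit_forest(X, y, n_trees=25, max_features=2)
train_acc = sum(predict_forest(forest, r) == t for r, t in zip(X, y)) / len(X)
print(f"trees: {len(forest)}, training accuracy: {train_acc:.2f}")
```

The two arguments `n_trees` and `max_features` correspond to the two hyperparameters discussed earlier. Because each tree is independent of the others, the loop in `fit_forest` could also be run in parallel.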

 

 

 

 

 

 

 

 

 

 

 

The next post will cover Boosting.

 
