@Xs0ul I ran it, and the feature importance ranking is as follows:
Feature ranking:
1. feature person_year_total_income (0.712042)
2. feature year_total_income (0.107312)
3. feature member_count (0.041349)
4. feature subsidy_total (0.026403)
5. feature reason (0.020928)
6. feature arable_land (0.017534)
7. feature living_space (0.016653)
8. feature wood_land (0.010882)
9. feature help_plan (0.009243)
10. feature washing_machine (0.006374)
11. feature fridge (0.005236)
12. feature is_danger_house (0.005180)
13. feature tv (0.005083)
14. feature is_debt (0.005042)
15. feature bank_number (0.003775)
16. feature xin_nong_he_total (0.002452)
17. feature call_number (0.002253)
18. feature debt_total (0.001464)
19. feature xin_yang_lao_total (0.000796)
20. feature bank_name (0.000000)
21. feature standard (0.000000)
22. feature is_back_poor (0.000000)
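Whether a household has left poverty is decided mainly by person_year_total_income (per-capita annual income): anything above 2800 yuan is over the national poverty line. So I removed the 2 features person_year_total_income and year_total_income. After rerunning, the prediction accuracy is 81.34%, and the feature importance ranking is as follows: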
Feature ranking:
1. feature subsidy_total (0.198893)
2. feature arable_land (0.176897)
3. feature living_space (0.146558)
4. feature reason (0.129572)
5. feature member_count (0.113734)
6. feature wood_land (0.082290)
7. feature help_plan (0.024511)
8. feature washing_machine (0.020852)
9. feature tv (0.020510)
10. feature is_danger_house (0.019875)
11. feature is_debt (0.014723)
12. feature fridge (0.014228)
13. feature bank_number (0.012896)
14. feature xin_nong_he_total (0.010757)
15. feature call_number (0.007313)
16. feature debt_total (0.003950)
17. feature xin_yang_lao_total (0.002437)
18. feature bank_name (0.000005)
19. feature standard (0.000000)
20. feature is_back_poor (0.000000)
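I also counted the class distribution of the test data: poor 41289, out of poverty 7089. If I guess "poor" (the majority class) for every sample, the accuracy is 85.3% (41289 / 48378).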
Does this mean the random forest model is doing worse than blind guessing?
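
For reference, here is a minimal sketch of the comparison I am making. The class counts and the 81.34% figure are the ones quoted above; the label names and the `majority_baseline` helper are just names I made up for illustration.

```python
# Class counts in the test set, as quoted in the post.
counts = {"poor": 41289, "out_of_poverty": 7089}


def majority_baseline(class_counts):
    """Return (majority label, accuracy of always predicting that label)."""
    total = sum(class_counts.values())
    label = max(class_counts, key=class_counts.get)
    return label, class_counts[label] / total


label, baseline_acc = majority_baseline(counts)
model_acc = 0.8134  # random forest accuracy after dropping the two income features

print(f"always guess {label!r}: {baseline_acc:.1%}")
print(f"random forest:        {model_acc:.1%}")
print("model beats baseline:", model_acc > baseline_acc)
```

On plain accuracy the constant guess comes out ahead, but with classes this imbalanced accuracy alone may not be the right yardstick. Per-class precision and recall would probably say more about whether the model has learned anything.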