V2EX  ›  Python

Beginner question: how do I combine two column vectors into an n×2 matrix?

acone2003 · 2018-06-08 06:12:33 +08:00 · 5124 views
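I've recently been learning the random forest classification algorithm and ran into a problem. I tried every method I could find online, but none of them gave the correct result, so I'm asking for help here.
Here is the situation: `test_lables` holds the true labels of the test samples for a binary classification task (692 samples), and `test_hat` holds the predicted values. I want to combine the two into a 692×2 matrix so that each prediction sits next to its true label. The source code is: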

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
#from sklearn import datasets

# Training set: columns 0-23 are features, column 24 is the label
dataframe = pd.read_csv( "D:/Research/TuPo_sel0.Train.csv", header = None )
train_features = dataframe.iloc[ :, 0:24]
train_lables = dataframe.iloc[:, 24]

# Validation set used as the test set, same column layout
test_data = pd.read_csv( "D:/Research/TuPo_sel0.Valid.csv", header = None )
test_features = test_data.iloc[ :, 0:24 ]
test_lables = test_data.iloc[ :, 24 ]

# Baseline: a classifier that predicts uniformly at random
dummy = DummyClassifier( strategy = 'uniform', random_state = 1 )
dummy.fit( train_features, train_lables )
print( "dummy_score =", dummy.score( test_features, test_lables ) )

style = 1

if style == 1:
    max_features = 19
    n_estimators = 400
    randomforest = RandomForestClassifier( max_features = max_features, n_estimators = n_estimators, random_state=1, n_jobs=-1 )
    model = randomforest.fit( train_features, train_lables )
    test_hat = model.predict( test_features )
    # Attempt to pair each prediction with its true label; this yields shape (1384,), not (692, 2)
    test_hat1 = np.hstack( ( test_hat, test_lables ) )
    test_hat1.reshape( -1, 2 )
    print( test_hat1.shape )
    print( test_hat1 )
    print( "max_features =", max_features, "; n_estimators =", n_estimators,
           "; randomforest_score =", randomforest.score( test_features, test_lables ) )
```

The output is:

```
runfile('D:/Python Programs/TryLoadData.py', wdir='D:/Python Programs')
dummy_score = 0.5447976878612717
(1384,)
[0 0 1 ... 0 0 0]
max_features = 19 ; n_estimators = 400 ; randomforest_score = 0.6416184971098265
```

How should I change the code to get the correct result?
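For readers hitting the same issue, a minimal sketch of what went wrong, using small made-up arrays (`pred` and `true` are hypothetical stand-ins for `test_hat` and `test_lables`). `np.hstack` on two 1-D arrays joins them end to end (692 + 692 = 1384 elements) instead of placing them side by side, and `ndarray.reshape` returns a new array rather than changing `test_hat1` in place, so the unassigned `reshape` call has no effect. Even if its result were assigned, reshaping the concatenated array would pair consecutive elements, not each prediction with its label.

```python
import numpy as np

# Hypothetical stand-ins for test_hat and test_lables (5 samples instead of 692)
pred = np.array([0, 1, 1, 0, 1])
true = np.array([0, 1, 0, 0, 1])

# hstack on 1-D arrays concatenates them end to end: shape (10,), not (5, 2)
joined = np.hstack((pred, true))
print(joined.shape)  # (10,)

# reshape returns a new array; calling it without assignment leaves `joined` unchanged
joined.reshape(-1, 2)
print(joined.shape)  # still (10,)

# Even when assigned, reshape(-1, 2) pairs consecutive elements of the
# concatenated array: the first row is (pred[0], pred[1]), not (pred[0], true[0])
wrong_pairs = joined.reshape(-1, 2)
print(wrong_pairs[0])  # [0 1]
```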
#1 · acone2003 (OP) · 2018-06-08 06:35:55 +08:00
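One more question while I'm at it: how do I compute the precision on the test set, i.e. the fraction of samples predicted as 1 that are actually 1?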
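For reference, the quantity described here is the precision of class 1: true positives divided by all samples predicted as 1, i.e. TP / (TP + FP). A minimal NumPy sketch of that definition; the helper name `precision_for_positive` and the toy labels are hypothetical, chosen only for illustration.

```python
import numpy as np

def precision_for_positive(y_true, y_pred, positive=1):
    """Hypothetical helper: fraction of samples predicted as `positive`
    whose true label is also `positive`. Returns nan if nothing was
    predicted as `positive`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    predicted_pos = y_pred == positive          # mask of predicted positives
    n_predicted = predicted_pos.sum()           # TP + FP
    if n_predicted == 0:
        return float("nan")
    true_pos = np.sum(predicted_pos & (y_true == positive))  # TP
    return float(true_pos / n_predicted)

y_true = [0, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 0, 1, 0]
print(precision_for_positive(y_true, y_pred))  # 2 of 3 predicted 1s are correct
```

In the thread's code this would be called as `precision_for_positive(test_lables, test_hat)`.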
#2 · enenaaa · 2018-06-08 09:02:17 +08:00
```python
test_hat1 = np.hstack((test_hat.reshape(-1, 1), test_lables.values.reshape(-1, 1)))
```
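A small self-contained sketch of this approach, assuming toy data: `test_hat` comes back from `predict()` as a NumPy array, while `test_lables` was sliced out of a DataFrame with `.iloc` and is therefore a pandas Series. Recent pandas versions no longer provide `Series.reshape`, so the sketch reshapes the Series' underlying array via `.values`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: predictions are a NumPy array, labels are a pandas Series
test_hat = np.array([0, 1, 1, 0, 1])
test_lables = pd.Series([0, 1, 0, 0, 1])

# Turn each 1-D vector into an (n, 1) column, then stack the columns side by side.
# Column 0 holds the prediction, column 1 the true label.
test_hat1 = np.hstack((test_hat.reshape(-1, 1), test_lables.values.reshape(-1, 1)))
print(test_hat1.shape)  # (5, 2)
print(test_hat1[2])     # [1 0] -> predicted 1, true label 0
```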

To review the results (including per-class precision), look at the summary report from `metrics.classification_report` in scikit-learn.
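A minimal sketch of the scikit-learn route, using made-up labels rather than the thread's data. `precision_score` returns the precision of the positive class (by default `pos_label=1` for binary labels), and `classification_report` returns a text table with precision, recall, F1 and support for each class.

```python
from sklearn.metrics import classification_report, precision_score

# Hypothetical true labels and predictions for a binary problem
y_true = [0, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 0, 1, 0]

# Precision of class 1: TP / (TP + FP)
p = precision_score(y_true, y_pred, pos_label=1)
print(p)  # 2 of the 3 predicted 1s are correct

# Per-class precision, recall, F1 and support as a formatted string
report = classification_report(y_true, y_pred)
print(report)
```

With the thread's variables, the calls would be `precision_score(test_lables, test_hat)` and `classification_report(test_lables, test_hat)`; note that the true labels come first.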
#3 · acone2003 (OP) · 2018-06-08 09:18:39 +08:00
Thanks, enenaaa, that solved it!
#4 · necomancer · 2018-06-08 09:51:42 +08:00
```python
np.vstack([a, b]).T
```
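A short sketch comparing this one-liner with two other NumPy functions that produce the same n×2 result from two equal-length 1-D vectors; `a` and `b` here are made-up stand-ins for the predictions and labels. `vstack` builds a 2×n array that `.T` transposes (as a view), while `column_stack` and `stack(..., axis=1)` place the vectors as columns directly.

```python
import numpy as np

# Hypothetical 1-D vectors of equal length
a = np.array([0, 1, 1, 0, 1])   # e.g. predictions
b = np.array([0, 1, 0, 0, 1])   # e.g. true labels

# Stack as two rows (shape (2, n)), then transpose to n rows x 2 columns
m1 = np.vstack([a, b]).T
# column_stack treats 1-D inputs as columns
m2 = np.column_stack((a, b))
# stack along a new second axis
m3 = np.stack((a, b), axis=1)

print(m1.shape)  # (5, 2)
print(np.array_equal(m1, m2) and np.array_equal(m2, m3))  # True
```

If the labels are a pandas Series, all three accept it directly, since NumPy converts it to an array.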