Using xgboost and lightgbm

xgboost:
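The code below predicts a 10-class target. max_depth controls the depth of each tree, objective sets the training objective (multi:softprob means multi-class classification that outputs a probability for every class), and num_class is the number of classes.
Here the input x has shape n×40000 and y has shape n×1, i.e. each row of y is a single digit from 0 to 9 giving the class the sample belongs to.
eval_metric sets the metric computed on the eval_set data during training (here mlogloss, the multi-class log loss); it is the value printed each round and monitored by early stopping, and it does not change the loss being optimized. After training, predict returns the class labels (1-D, values 0-9) and predict_proba returns the predicted class probabilities (10 columns, one probability per class).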
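To make multi:softprob and mlogloss concrete, here is a minimal numpy sketch (not xgboost's own code), assuming the model produces one raw score per class for each sample: softmax turns those scores into per-class probabilities like the ones predict_proba returns, the argmax of each row is the label predict returns, and mlogloss is the mean negative log of the probability assigned to the true class. The helper names softmax and mlogloss are hypothetical, and the toy data uses 3 classes instead of 10.

```python
import numpy as np


def softmax(raw_scores):
    """Row-wise softmax: (n, num_class) raw scores -> (n, num_class) probabilities."""
    # subtract each row's maximum first for numerical stability
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mlogloss(y_true, proba, eps=1e-15):
    """Multi-class log loss: mean of -log(probability of the true class)."""
    proba = np.clip(proba, eps, 1.0)  # avoid log(0)
    y_true = np.asarray(y_true).ravel().astype(int)  # accepts an (n, 1) column or an (n,) vector
    return float(-np.mean(np.log(proba[np.arange(len(y_true)), y_true])))


# toy example: 2 samples, 3 classes
raw = np.array([[2.0, 1.0, 0.1],
                [0.5, 2.5, 0.3]])
proba = softmax(raw)           # predict_proba-style output: each row sums to 1
labels = proba.argmax(axis=1)  # predict-style output: index of the largest probability
print(labels)                  # [0 1]
print(mlogloss([0, 1], proba))
```

A perfectly confident, correct model would give an mlogloss of 0; the value grows as the probability on the true class shrinks.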

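import pickle

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier


def xgboost(x, y):
    # since xgboost 1.6, eval_metric and early_stopping_rounds are constructor arguments
    # (passing them to fit() was deprecated and is rejected in xgboost 2.0)
    model = XGBClassifier(max_depth=25, objective='multi:softprob', num_class=10,
                          eval_metric="mlogloss", early_stopping_rounds=10)
    x_train, x_test_valid, y_train, y_test_valid = train_test_split(x, y, test_size=0.002, random_state=1)
    # early stopping monitors the LAST entry in eval_set, i.e. the held-out split
    model.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_test_valid, y_test_valid)], verbose=True)

    # make prediction
    preds = model.predict(x_test_valid)           # 1-D labels, 0-9
    print(preds[0:10])
    prob_pre = model.predict_proba(x_test_valid)  # shape (n, 10), one probability per class
    print(prob_pre[0:10])
    test_accuracy = accuracy_score(y_test_valid, preds)
    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
    if test_accuracy > 0.999:
        # save the model
        with open("xgboostmodel2.pickle.dat", "wb") as f:
            pickle.dump(model, f)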
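The early_stopping_rounds=10 setting stops training once the validation metric has not improved for 10 consecutive boosting rounds; XGBoost watches the last entry in eval_set, and after training the best round is available as model.best_iteration. The pure-Python function below is a hypothetical sketch of that patience rule for a lower-is-better metric such as mlogloss, not xgboost's internal implementation.

```python
def early_stopping_round(val_scores, patience):
    """Return (stop_round, best_round), both 0-based.

    val_scores: validation metric per boosting round (lower is better, e.g. mlogloss).
    patience:   plays the role of early_stopping_rounds.
    """
    best_score = float("inf")
    best_round = -1
    for rnd, score in enumerate(val_scores):
        if score < best_score:
            # improvement: remember this round and reset the patience window
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            # `patience` rounds in a row without improvement: stop here
            return rnd, best_round
    # ran out of rounds before patience was exhausted
    return len(val_scores) - 1, best_round


print(early_stopping_round([0.9, 0.7, 0.6, 0.61, 0.62, 0.63], patience=3))  # (5, 2)
```

For a higher-is-better metric the comparison would flip; the model kept for prediction corresponds to best_round, not to the round where training stopped.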

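lightgbm: usage is similar to xgboost. One thing to watch: the code below uses LGBMRegressor with objective='multiclass', so its predict method already returns the predicted class probabilities; for the 0-9 example, predict outputs a 10-column probability matrix. LGBMRegressor has no predict_proba method, so you need your own function to turn those probabilities into labels (the classifier wrapper, LGBMClassifier, does provide predict for labels and predict_proba for probabilities). Also, for the same multi-class probability task, lightgbm names the objective 'multiclass' rather than xgboost's 'multi:softprob'.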
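Converting probabilities to labels amounts to taking the column index of the largest value in each row. A minimal vectorized sketch with numpy follows; prob_to_label is a hypothetical name, shown as an alternative to the loop in turn_prob_to_label. It returns integer labels, which compare directly with integer y values.

```python
import numpy as np


def prob_to_label(prob_data):
    """(n, num_class) probability matrix -> 1-D array of integer labels.

    np.argmax returns the first column on ties, matching list.index(max(...)).
    """
    return np.asarray(prob_data).argmax(axis=1)


preds = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])
print(prob_to_label(preds))  # [1 0]
```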

import pickle

import lightgbm as lgb
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def turn_prob_to_label(prob_data):
    # convert the predicted probabilities into 1-D integer labels
    # (int rather than str, so they compare equal to the integer y values)
    result = []
    for each_data in prob_data:
        each_data = list(each_data)
        result.append(each_data.index(max(each_data)))
    return result


def lightgbm(x, y):
    # LGBMRegressor + objective='multiclass': predict() returns an (n, 10) probability matrix
    model = LGBMRegressor(max_depth=30, objective='multiclass', num_class=10)
    x_train, x_test_valid, y_train, y_test_valid = train_test_split(x, y, test_size=0.2, random_state=1)
    # lightgbm 4.x takes early stopping and per-round logging as callbacks, not fit() keywords
    model.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_test_valid, y_test_valid)],
              eval_metric="multi_error",
              callbacks=[lgb.early_stopping(30), lgb.log_evaluation(1)])

    preds = model.predict(x_test_valid)
    test_result = np.array(turn_prob_to_label(preds))
    print(test_result[0:10])
    print(y_test_valid[0:10])
    test_accuracy = accuracy_score(y_test_valid, test_result)
    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
    if test_accuracy > 0.999:
        with open("lgboostmodel2.pickle.dat", "wb") as f:
            pickle.dump(model, f)
        # loaded_model = pickle.load(open("lgboostmodel2.pickle.dat", "rb"))
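The lightgbm code monitors eval_metric="multi_error", the multi-class error rate. With the default top-1 setting this is essentially 1 minus accuracy: the share of rows whose highest-scoring class is not the true label. The numpy sketch below is a hypothetical helper for illustration; because it uses argmax, rows with exactly tied top scores may be counted differently from LightGBM's own metric.

```python
import numpy as np


def multi_error(y_true, proba):
    """Top-1 error rate: fraction of rows whose argmax class differs from the true label."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(proba).argmax(axis=1)
    return float(np.mean(y_pred != y_true))


y = [0, 1, 2, 1]
proba = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.5, 0.3, 0.2],   # misclassified: predicts 0, true label is 2
                  [0.1, 0.6, 0.3]])
print(multi_error(y, proba))  # 0.25
```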

After training, you can use pickle to save and load the model.
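A minimal sketch of the pickle round trip follows, using with blocks so the files are closed properly. save_model and load_model are hypothetical helper names, and a plain dict stands in for a trained model so the example runs without xgboost or lightgbm installed; a fitted XGBClassifier or LGBMRegressor would be passed the same way.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable object (e.g. a fitted model) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load an object written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# a dict stands in for a trained model in this demo
path = os.path.join(tempfile.mkdtemp(), "model.pickle.dat")
save_model({"max_depth": 25, "num_class": 10}, path)
restored = load_model(path)
print(restored["num_class"])  # 10
```

Pickled models are generally tied to the library version that wrote them; both libraries also have their own formats (XGBClassifier.save_model, and model.booster_.save_model for lightgbm) that are usually more portable across versions.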

