Printing the decision path of a specific sample in a random forest classifier

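I found this code in the scikit-learn documentation and adapted it to fit your problem.

Since a RandomForestClassifier is a collection of DecisionTreeClassifier estimators, we can iterate over the individual trees and retrieve the decision path of the sample in each one. I hope this helps: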

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = RandomForestClassifier(n_estimators=10,
                                   random_state=0)
estimator.fit(X_train, y_train)

# Each decision tree in the forest has an attribute called tree_ which stores
# the entire tree structure and allows access to low level attributes. The
# binary tree tree_ is represented as a number of parallel arrays. The i-th
# element of each array holds information about the node `i`. Node 0 is the
# tree's root. NOTE: Some of the arrays only apply to either leaves or split
# nodes, resp. In this case the values of nodes of the other type are arbitrary!
#
# Among those arrays, we have:
#   - children_left, id of the left child of the node
#   - children_right, id of the right child of the node
#   - feature, feature used for splitting the node
#   - threshold, threshold value at the node
#
# The forest itself has no tree_ attribute, so we collect these arrays from
# every tree in estimator.estimators_:
n_nodes_ = [t.tree_.node_count for t in estimator.estimators_]
children_left_ = [t.tree_.children_left for t in estimator.estimators_]
children_right_ = [t.tree_.children_right for t in estimator.estimators_]
feature_ = [t.tree_.feature for t in estimator.estimators_]
threshold_ = [t.tree_.threshold for t in estimator.estimators_]


def explore_tree(estimator, n_nodes, children_left, children_right, feature, threshold,
                 suffix='', print_tree=False, sample_id=0, feature_names=None):
    # Default to the column indices as names. (The original fell back to the
    # per-node `feature` array, which cannot pass the length check below.)
    if feature_names is None:
        feature_names = list(range(X.shape[1]))

    assert len(feature_names) == X.shape[1], "The feature names do not match the number of features."

    # The tree structure can be traversed to compute various properties such
    # as the depth of each node and whether or not it is a leaf.
    node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
    is_leaves = np.zeros(shape=n_nodes, dtype=bool)
    stack = [(0, -1)]  # seed is the root node id and its parent depth
    while len(stack) > 0:
        node_id, parent_depth = stack.pop()
        node_depth[node_id] = parent_depth + 1

        # If we have a test node
        if children_left[node_id] != children_right[node_id]:
            stack.append((children_left[node_id], parent_depth + 1))
            stack.append((children_right[node_id], parent_depth + 1))
        else:
            is_leaves[node_id] = True

    print("The binary tree structure has %s nodes" % n_nodes)

    if print_tree:
        print("Tree structure: \n")
        for i in range(n_nodes):
            if is_leaves[i]:
                print("%snode=%s leaf node." % (node_depth[i] * "\t", i))
            else:
                print("%snode=%s test node: go to node %s if X[:, %s] <= %s else to "
                      "node %s."
                      % (node_depth[i] * "\t",
                         i,
                         children_left[i],
                         feature[i],
                         threshold[i],
                         children_right[i]))
            print("\n")
        print()

    # First let's retrieve the decision path of each sample. The decision_path
    # method allows to retrieve the node indicator functions. A non zero element of
    # indicator matrix at the position (i, j) indicates that the sample i goes
    # through the node j.
    node_indicator = estimator.decision_path(X_test)

    # Similarly, we can also have the leaves ids reached by each sample.
    leave_id = estimator.apply(X_test)

    # Now, it's possible to get the tests that were used to predict a sample or
    # a group of samples. First, let's make it for the sample.
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print(X_test[sample_id, :])

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:
        # tabulation = " " * node_depth[node_id]  # -> indents each level of the tree
        tabulation = ""
        if leave_id[sample_id] == node_id:
            print("%s==> Predicted leaf index \n" % (tabulation))
            # No `continue` here, so the leaf is also printed below. Its
            # feature and threshold are the placeholder value -2, which is
            # why the last "decision" line in the output is not a real test.

        if X_test[sample_id, feature[node_id]] <= threshold[node_id]:
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("%sdecision id node %s : (X_test[%s, '%s'] (= %s) %s %s)"
              % (tabulation,
                 node_id,
                 sample_id,
                 feature_names[feature[node_id]],
                 X_test[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))
    print("%sPrediction for sample %d: %s" % (tabulation,
                                              sample_id,
                                              estimator.predict(X_test)[sample_id]))

    # For a group of samples, we have the following common node.
    sample_ids = [sample_id, 1]
    common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) ==
                    len(sample_ids))

    common_node_id = np.arange(n_nodes)[common_nodes]

    print("\nThe following samples %s share the node %s in the tree"
          % (sample_ids, common_node_id))
    print("It is %s %% of all nodes." % (100 * len(common_node_id) / n_nodes,))

    for sample_id_ in sample_ids:
        print("Prediction for sample %d: %s" % (sample_id_,
                                                estimator.predict(X_test)[sample_id_]))
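If you would rather get the path back as data than as printed text, something along these lines might work. It is only a minimal sketch on a single DecisionTreeClassifier: the helper name `path_as_rules`, the toy dataset sizes, and `max_depth=3` are my own assumptions for illustration and are not part of the scikit-learn example. It skips the leaf node, so no placeholder "-2" test shows up in the result.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data; the sizes are arbitrary choices for illustration.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)


def path_as_rules(tree, x):
    """Hypothetical helper: list the split tests one sample passes through.

    Returns (rules, leaf_id), where each rule is a tuple
    (node_id, feature_index, operator, threshold). The leaf is excluded.
    """
    x = np.asarray(x).reshape(1, -1)
    indicator = tree.decision_path(x)   # sparse CSR matrix with a single row
    leaf_id = tree.apply(x)[0]          # id of the leaf the sample ends in
    t = tree.tree_
    rules = []
    for node_id in indicator.indices:   # ids of the nodes the sample visits
        if node_id == leaf_id:
            continue                    # a leaf has no test to report
        f = t.feature[node_id]
        op = "<=" if x[0, f] <= t.threshold[node_id] else ">"
        rules.append((int(node_id), int(f), op, float(t.threshold[node_id])))
    return rules, int(leaf_id)


rules, leaf_id = path_as_rules(tree, X[0])
for node_id, f, op, thr in rules:
    print("node %d: X[%d] %s %s" % (node_id, f, op, thr))
print("leaf:", leaf_id)
```

For a forest, the same helper could be called on each element of `estimators_`.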

To print the different trees in the random forest, you can iterate over the estimators like this:

for i, e in enumerate(estimator.estimators_):
    print("Tree %d\n" % i)
    explore_tree(e, n_nodes_[i], children_left_[i],
                 children_right_[i], feature_[i], threshold_[i],
                 suffix=i, sample_id=1,
                 feature_names=["Feature_%d" % j for j in range(X.shape[1])])
    print('\n' * 2)
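As an aside, the forest also has its own `decision_path` method, which returns one indicator matrix for all trees together with an array of offsets marking where each tree's nodes begin. The snippet below is a rough sketch of how that could be used to get the node ids per tree for one sample; the toy data and the variable names (`forest`, `paths`) are assumptions of mine, not taken from the code above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and a small forest; sizes are arbitrary choices for illustration.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

sample = X[:1]  # keep it 2-D: one row, six features

# indicator has one column per node of every tree, laid side by side.
# n_nodes_ptr[k]:n_nodes_ptr[k + 1] is the column range belonging to tree k.
indicator, n_nodes_ptr = forest.decision_path(sample)
row = indicator.toarray()[0]  # dense is fine for a single sample

paths = []
for k in range(len(forest.estimators_)):
    start, stop = n_nodes_ptr[k], n_nodes_ptr[k + 1]
    local_ids = np.flatnonzero(row[start:stop])  # node ids within tree k
    paths.append(local_ids)
    print("Tree %d: nodes %s" % (k, local_ids.tolist()))
```

The ids in `paths[k]` index into `forest.estimators_[k].tree_`, so they can be looked up in that tree's `feature` and `threshold` arrays just like in `explore_tree`.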

Here is the output for the trees labeled Tree 1 and Tree 2 of the RandomForestClassifier, for sample_id = 1:

Tree 1

The binary tree structure has 115 nodes
[ 2.36609963  1.32658511 -0.08002818  0.88295736  2.24224824 -0.71469736]
Rules used to predict sample 1: 
decision id node 0 : (X_test[1, 'Feature_3'] (= 0.8829573603562209) > 0.7038955688476562)
decision id node 86 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -1.4465678930282593)
decision id node 92 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 0.7020512223243713)
decision id node 102 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) > -1.2842652797698975)
decision id node 106 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -0.4031955599784851)
decision id node 110 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 0.717217206954956)
decision id node 112 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) <= 3.0181679725646973)
==> Predicted leaf index 

decision id node 113 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) > -2.0)
Prediction for sample 1: 1.0

The following samples [1, 1] share the node [  0  86  92 102 106 110 112 113] in the tree
It is 6.956521739130435 % of all nodes.
Prediction for sample 1: 1.0
Prediction for sample 1: 1.0


Tree 2

The binary tree structure has 135 nodes
[ 2.36609963  1.32658511 -0.08002818  0.88295736  2.24224824 -0.71469736]
Rules used to predict sample 1: 
decision id node 0 : (X_test[1, 'Feature_3'] (= 0.8829573603562209) > 0.5484486818313599)
decision id node 88 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -0.7239605188369751)
decision id node 102 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) > -1.6143207550048828)
decision id node 110 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 2.3399271965026855)
decision id node 130 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) <= -0.5680553913116455)
decision id node 131 : (X_test[1, 'Feature_0'] (= 2.366099632530947) <= 2.4545814990997314)
==> Predicted leaf index 

decision id node 132 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) > -2.0)
Prediction for sample 1: 0.0

The following samples [1, 1] share the node [  0  88 102 110 130 131 132] in the tree
It is 5.185185185185185 % of all nodes.
Prediction for sample 1: 0.0
Prediction for sample 1: 0.0
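Note that the two trees above disagree on the same sample (1.0 versus 0.0), which is normal for an ensemble. As far as I know, scikit-learn's forest averages the class probabilities of its trees and picks the most probable class, rather than counting hard votes. The following is a small sketch of that comparison on toy data; the dataset and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data; sizes are arbitrary choices for illustration.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]

# Each fitted tree predicts a class index stored as a float (hence the
# "1.0" / "0.0" in the printed output); map it back through classes_.
per_tree = np.array([t.predict(sample)[0] for t in forest.estimators_])
per_tree_labels = forest.classes_[per_tree.astype(int)]
print("Per-tree predictions:", per_tree_labels.tolist())

# Average the per-tree class probabilities and take the most probable class.
mean_proba = np.mean([t.predict_proba(sample)[0] for t in forest.estimators_],
                     axis=0)
forest_label = forest.classes_[np.argmax(mean_proba)]
print("Averaged probabilities:", mean_proba)
print("Forest prediction:", forest.predict(sample)[0])
```

So the per-tree decision paths explain each tree's individual answer, while the final prediction depends on how all of the trees combine.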
