Python: Plotting a Logistic Regression Decision Boundary with plt.tricontour (Contour Plots over an Irregular Grid)

Preface

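I have recently been working on a logistic regression assignment that requires plotting the decision boundary. The idea is as follows:
For logistic regression, the decision boundary lies where $\theta^T X = 0$, with $\theta = [\theta_0, \theta_1, \theta_2, \cdots, \theta_n]$ and $X = [X_0, X_1, X_2, \cdots, X_n]$. We substitute the trained $\theta$, evaluate $\theta^T X$ on a grid, and draw the zero-level contour with plt.contour(xx, yy, zz, levels=[0]). The level is passed as a list because a bare integer is interpreted as a number of levels rather than as the level value.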
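
The reason $\theta^T X = 0$ marks the boundary is that logistic regression predicts with the sigmoid of the score and classifies at a probability of 0.5, and the sigmoid of 0 is exactly 0.5. The sketch below illustrates this with a made-up theta and three made-up samples; none of these values come from the assignment.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for illustration only: [theta_0, theta_1, theta_2]
theta = np.array([-1.0, 2.0, 0.5])

# Three samples, each with the leading constant feature X_0 = 1
X = np.array([
    [1.0, 0.5, 0.0],   # score = -1 + 1.0 + 0.0 = 0    (on the boundary)
    [1.0, 1.0, 1.0],   # score = -1 + 2.0 + 0.5 = 1.5  (positive side)
    [1.0, 0.0, 0.0],   # score = -1                    (negative side)
])

scores = X @ theta          # theta^T X for every sample
probs = sigmoid(scores)     # predicted probability of the positive class
```

Samples with a positive score get a probability above 0.5 and samples with a negative score get one below 0.5, so the zero-level contour of the score separates the two predicted classes.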

In this problem the decision boundary of the given data is not linear, so a polynomial feature transformation is needed. poly_feat returns the polynomial combinations of the two features up to degree 5, such as $x_1^5, x_1 x_2^4, x_1^2 x_2^3, \cdots, x_2^5$.

from sklearn.preprocessing import PolynomialFeatures

# polynomial feature transformation; X[:, 1:] drops the leading column of ones
poly_feat = PolynomialFeatures(degree=5, include_bias=True)
X_poly = poly_feat.fit_transform(X[:, 1:])

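With the degree-5 polynomial transformation, an X with only three features (the first of which is the constant 1) becomes a sample matrix with 21 features, which makes it possible to draw a nonlinear decision boundary.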
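
The count of 21 can be checked directly: for two inputs there is one monomial $x_1^i x_2^j$ for every pair with $i + j \le 5$, which gives $(5+1)(5+2)/2 = 21$ terms including the bias. A minimal sketch with two made-up samples (the input values are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up samples with two raw features each
raw = np.array([[0.5, -0.2],
                [1.0,  2.0]])

poly = PolynomialFeatures(degree=5, include_bias=True)
expanded = poly.fit_transform(raw)

# One column per monomial x1^i * x2^j with i + j <= 5, bias column first
n_features = expanded.shape[1]
```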

import numpy as np
from matplotlib import pyplot as plt

# data, X, y and batch_gradient_descent are assumed to be defined earlier (not shown)
epochs = 1000000
lr = 0.01
lamb = 0
degree = 5
poly_feat = PolynomialFeatures(degree, include_bias=True)
X_poly = poly_feat.fit_transform(X[:, 1:])
theta = np.zeros((X_poly.shape[1], 1))
final_theta = batch_gradient_descent(X_poly, y, theta, epoch=epochs, lr=lr, lamb=lamb)
test1 = np.array(data['Test 1'])  # feature 1, unordered
test2 = np.array(data['Test 2'])  # feature 2, unordered
Test1, Test2 = np.meshgrid(test1, test2)
score_mesh = np.zeros((test2.size, test1.size))  # rows follow y (test2), columns follow x (test1)
# build the score mesh by iterating over every pair of feature values
for idx1, t1 in enumerate(test1):
    for idx2, t2 in enumerate(test2):
        poly = poly_feat.transform(np.array([t1, t2]).reshape(1, -1))  # polynomial features of one point
        score_mesh[idx2, idx1] = (poly @ final_theta).item()
cs = plt.contour(test1, test2, score_mesh, levels=[0])  # zero-level contour on the unordered axes
# label for the legend (ContourSet.collections is deprecated since Matplotlib 3.8;
# on newer versions build the entry from cs.legend_elements() instead)
cs.collections[0].set_label('lamb = ' + str(lamb))
# scatter plot of the data
positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]

plt.scatter(positive['Test 1'], positive['Test 2'], s=20, c='c', marker='o', label='Accepted')
plt.scatter(negative['Test 1'], negative['Test 2'], s=30, c='m', marker='x', label='Not Accepted')
plt.legend()
plt.xlabel('Test 1 Score')
plt.ylabel('Test 2 Score')

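Here test1 and test2 are as shown in the figure below; both are unordered sequences.

The resulting decision boundary is shown in the figure below.

As can be seen, the decision boundary is a mess: several lines appear at level 0. The cause is that the X and Y passed to the contour plot do not form a regular grid. Because X and Y are neither monotonically increasing nor monotonically decreasing, neighbouring grid cells are not neighbours in space, and the contouring produces many spurious line segments. There are several ways to fix this: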

Solution 1

Build a regular grid: construct the score mesh from monotonically increasing (or decreasing) X and Y.

import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

# data, X, y and batch_gradient_descent are assumed to be defined earlier (not shown)
epochs = 1000000
lr = 0.01
degree = 5
for lambs in [0]:
    poly_feat = PolynomialFeatures(degree, include_bias=True)
    X_poly = poly_feat.fit_transform(X[:, 1:])
    theta = np.zeros((X_poly.shape[1], 1))
    final_theta = batch_gradient_descent(X_poly, y, theta, epoch=epochs, lr=lr, lamb=lambs)
    xk = np.linspace(-1, 1, test1.size)  # ordered sequence built with np.linspace
    yk = xk
    xx, yy = np.meshgrid(xk, yk)
    score_mesh = np.zeros((yk.size, xk.size))  # rows follow y, columns follow x
    for idx1, t1 in enumerate(xk):
        for idx2, t2 in enumerate(yk):
            poly = poly_feat.transform(np.array([t1, t2]).reshape(1, -1))
            score_mesh[idx2, idx1] = (poly @ final_theta).item()
    cs = plt.contour(xx, yy, score_mesh, levels=[0])
    # label for the legend (ContourSet.collections is deprecated since Matplotlib 3.8)
    cs.collections[0].set_label('lamb = ' + str(lambs))

positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]

plt.scatter(positive['Test 1'], positive['Test 2'], s=20, c='c', marker='o', label='Accepted')
plt.scatter(negative['Test 1'], negative['Test 2'], s=30, c='m', marker='x', label='Not Accepted')
plt.legend()
plt.xlabel('Test 1 Score')
plt.ylabel('Test 2 Score')

The resulting contour line is shown in the figure below.
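
The element-by-element loop over the regular grid can also be written in vectorised form: stack all grid points into one array, transform them in a single call, and reshape the scores back to the grid shape. The sketch below is not the assignment code. It uses a hypothetical degree-2 theta that encodes $x_1^2 + x_2^2 - 0.5$ as a stand-in for the trained parameters, and checks the vectorised result against the loop.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for trained parameters: with degree-2 features
# [1, x1, x2, x1^2, x1*x2, x2^2] this theta encodes x1^2 + x2^2 - 0.5.
poly_feat = PolynomialFeatures(degree=2, include_bias=True)
theta = np.array([-0.5, 0.0, 0.0, 1.0, 0.0, 1.0]).reshape(-1, 1)

xk = np.linspace(-1, 1, 20)
yk = np.linspace(-1, 1, 20)
xx, yy = np.meshgrid(xk, yk)   # both have shape (len(yk), len(xk))

# Vectorised: one row per grid point, then reshape back to the grid shape
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
score_vec = (poly_feat.fit_transform(grid_points) @ theta).reshape(xx.shape)

# Reference: the element-by-element loop
score_loop = np.zeros((yk.size, xk.size))
for i, x in enumerate(xk):
    for j, y in enumerate(yk):
        feats = poly_feat.transform(np.array([[x, y]]))
        score_loop[j, i] = (feats @ theta).item()
```

Either array can then be passed to plt.contour(xx, yy, score, levels=[0]); the vectorised version avoids one transform call per grid point.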

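Solution 2

Keep the irregular data test1 and test2, but evaluate the score only at the sample points and let plt.tricontour interpolate over an irregular triangulated network to obtain the contour line: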

import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

# data, X, y and batch_gradient_descent are assumed to be defined earlier (not shown)
epochs = 1000000
lr = 0.01
degree = 5
for lambs in [0]:
    poly_feat = PolynomialFeatures(degree, include_bias=True)
    X_poly = poly_feat.fit_transform(X[:, 1:])
    theta = np.zeros((X_poly.shape[1], 1))
    final_theta = batch_gradient_descent(X_poly, y, theta, epoch=epochs, lr=lr, lamb=lambs)
    # read the features before they are used; (test1[i], test2[i]) is one sample point
    test1 = np.array(data['Test 1'])  # feature 1, unordered
    test2 = np.array(data['Test 2'])  # feature 2, unordered
    # one score per sample point, no mesh needed
    score_mesh_flat = poly_feat.transform(np.stack([test1, test2], axis=1)) @ final_theta
    cs = plt.tricontour(test1, test2, score_mesh_flat.flatten(), levels=[0])
    # label for the legend (ContourSet.collections is deprecated since Matplotlib 3.8)
    cs.collections[0].set_label('lamb = ' + str(lambs))

positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]

plt.scatter(positive['Test 1'], positive['Test 2'], s=20, c='c', marker='o', label='Accepted')
plt.scatter(negative['Test 1'], negative['Test 2'], s=30, c='m', marker='x', label='Not Accepted')
plt.legend()
plt.xlabel('Test 1 Score')
plt.ylabel('Test 2 Score')
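
To try plt.tricontour without the assignment data, the following self-contained sketch uses synthetic scattered points and a known score function whose zero level set is a circle. The random points and the function are assumptions made for illustration; the only thing carried over from the code above is the call pattern of three equal-length 1-D arrays plus levels=[0].

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Scattered, unordered sample locations (synthetic, not the assignment data)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)

# Known score function: zero on a circle of radius sqrt(0.5)
z = x**2 + y**2 - 0.5

fig, ax = plt.subplots()
cs = ax.tricontour(x, y, z, levels=[0])  # x, y, z are 1-D arrays of equal length
ax.scatter(x, y, s=5)                    # the sample points the contour is built from
ax.set_aspect("equal")
```

Calling fig.savefig with a file name of your choice writes the figure to disk. Because the points are paired samples rather than grid axes, their order does not matter here.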

Solution 3

Keep the original data and plt.contour, but sort the data first (test1.sort(); test2.sort()), so that both axes are monotonically increasing.

import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import PolynomialFeatures

# data, X, y and batch_gradient_descent are assumed to be defined earlier (not shown)
epochs = 1000000
lr = 0.01
degree = 5
for lambs in [0]:
    poly_feat = PolynomialFeatures(degree, include_bias=True)
    X_poly = poly_feat.fit_transform(X[:, 1:])
    theta = np.zeros((X_poly.shape[1], 1))
    final_theta = batch_gradient_descent(X_poly, y, theta, epoch=epochs, lr=lr, lamb=lambs)
    test1 = np.array(data['Test 1'])
    test2 = np.array(data['Test 2'])
    test1.sort()  # sort the unordered sequences in place
    test2.sort()
    Test1, Test2 = np.meshgrid(test1, test2)
    score_mesh = np.zeros((test2.size, test1.size))  # rows follow y (test2), columns follow x (test1)
    for idx1, t1 in enumerate(test1):
        for idx2, t2 in enumerate(test2):
            poly = poly_feat.transform(np.array([t1, t2]).reshape(1, -1))
            score_mesh[idx2, idx1] = (poly @ final_theta).item()
    cs = plt.contour(Test1, Test2, score_mesh, levels=[0])
    # label for the legend (ContourSet.collections is deprecated since Matplotlib 3.8)
    cs.collections[0].set_label('lamb = ' + str(lambs))

positive = data[data['Accepted'].isin([1])]
negative = data[data['Accepted'].isin([0])]

plt.scatter(positive['Test 1'], positive['Test 2'], s=20, c='c', marker='o', label='Accepted')
plt.scatter(negative['Test 1'], negative['Test 2'], s=30, c='m', marker='x', label='Not Accepted')
plt.legend()
plt.xlabel('Test 1 Score')
plt.ylabel('Test 2 Score')
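
Sorting each axis independently is safe here because the mesh is evaluated on every combination of an x value and a y value, so the pairing of the original samples plays no role. The sketch below checks monotonicity before and after sorting on a few made-up values standing in for data['Test 1'] and data['Test 2']. The helper is_monotonic is a hypothetical name introduced for illustration, and np.sort is used instead of the in-place .sort() so the original order stays available.

```python
import numpy as np

# Made-up unordered feature values (illustrative stand-ins for the real columns)
test1 = np.array([0.05, -0.09, -0.21, 0.18, -0.51])
test2 = np.array([0.70, 0.68, 0.69, 0.50, 0.47])

def is_monotonic(a):
    """True if a is entirely non-decreasing or entirely non-increasing."""
    d = np.diff(a)
    return bool(np.all(d >= 0) or np.all(d <= 0))

before = (is_monotonic(test1), is_monotonic(test2))

# np.sort returns sorted copies and leaves the original sample order untouched
x_axis = np.sort(test1)
y_axis = np.sort(test2)
after = (is_monotonic(x_axis), is_monotonic(y_axis))
```

The sorted axes are monotonic but not evenly spaced, which plt.contour accepts; the grid is simply denser where the data are denser.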

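How It Works

plt.contour and plt.tricontour both draw contour lines, so why do they behave so differently on irregularly spaced data?
The reason is that plt.contour implements a contouring algorithm for regular grids and requires X and Y to be monotonically increasing or decreasing, whereas plt.tricontour implements a contouring algorithm for an irregular triangulated network.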
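
To make the "irregular triangulated network" concrete: when it is given only point coordinates, plt.tricontour builds a Delaunay triangulation of them, and the same structure can be constructed directly with matplotlib.tri.Triangulation. A minimal sketch with five made-up points (the corners of a unit square plus its centre):

```python
import numpy as np
from matplotlib.tri import Triangulation

# Five made-up points listed in no particular order
x = np.array([0.0, 1.0, 0.0, 1.0, 0.5])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5])

tri = Triangulation(x, y)   # Delaunay triangulation of the points
triangles = tri.triangles   # one row per triangle, three point indices each
```

Contour lines are then interpolated along the edges of these triangles, which is why the points need no ordering and no grid structure.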
