Linear Regression: Vectorized Multivariate Gradient Descent (Python Implementation)

【Vectorization】
For linear regression with a single variable, we assume the hypothesis function:
$h_{\theta}(x) = \theta_{0} + \theta_{1} x$
If there is more than one variable, the hypothesis takes the following form, where n-1 is the number of features in the data set:
$h_{\theta}(x) = \theta_{0} + \sum_{i=1}^{n-1} \theta_{i} x_{i}$
To make the structure uniform, we introduce $x_{0} = 1$, and the expression becomes:
$h_{\theta}(x) = \sum_{i=0}^{n-1} \theta_{i} x_{i}$
Vectorizing this, the expression can be written as:
$h_{\theta}(x) = \sum_{i=0}^{n-1} \theta_{i} x_{i} = \theta^{T} x$
where $x = [x_{0}, x_{1}, \dots, x_{n-1}]^{T}$ and $\theta = [\theta_{0}, \theta_{1}, \dots, \theta_{n-1}]^{T}$ are column vectors. Note again that $x_{0}$ in $x$ is an attribute that always equals 1.
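As a small illustration of why the summation and the inner product are the same thing, the sketch below evaluates the hypothesis both ways. The parameter and feature values are made up for the example, and the variable names (`h_sum`, `h_dot`, `h_all`) are not from the original article.

```python
import numpy as np

# Illustrative values only: theta_0 (intercept) plus two feature weights
theta = np.array([2.0, 0.5, -1.0])
features = np.array([4.0, 3.0])

# Prepend the constant attribute x_0 = 1 so the intercept fits the same pattern
x = np.concatenate(([1.0], features))

# Hypothesis as an explicit summation over the components
h_sum = sum(theta[i] * x[i] for i in range(len(x)))

# The same hypothesis as the inner product theta^T x
h_dot = theta @ x

# For a whole data set, stack the samples as rows of X; one product gives every prediction
X = np.array([[1.0, 4.0, 3.0],
              [1.0, 2.0, 0.0]])
h_all = X @ theta

print(h_sum, h_dot, h_all)
```

The last line is the form the implementation later in this article relies on: one matrix product replaces a loop over samples and a loop over features.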
The gradient descent formula is derived from the cost function, where m is the number of training examples:
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^{2}$
Taking the partial derivative of $J(\theta)$ with respect to $\theta_{j}$:
$\frac{\partial J(\theta)}{\partial \theta_{j}} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_{j}^{(i)}$
For the matrix form of this derivative, please consult the standard matrix calculus identities.
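For readers who would rather see the step than look it up, the derivative can be sketched with the chain rule, using only the definitions above. The design matrix $X$ (whose rows are the $(x^{(i)})^{T}$) and the target vector $y$ are notation introduced here for illustration.
Since $h_{\theta}(x^{(i)}) = \sum_{k} \theta_{k} x_{k}^{(i)}$, we have $\frac{\partial h_{\theta}(x^{(i)})}{\partial \theta_{j}} = x_{j}^{(i)}$, and therefore:
$\frac{\partial J(\theta)}{\partial \theta_{j}} = \frac{1}{2m} \sum_{i=1}^{m} 2 (h_{\theta}(x^{(i)}) - y^{(i)}) \frac{\partial h_{\theta}(x^{(i)})}{\partial \theta_{j}} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_{j}^{(i)}$
Collecting these partial derivatives for all $j$ into one vector gives the matrix form:
$\nabla_{\theta} J(\theta) = \frac{1}{m} X^{T} (X \theta - y)$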
The gradient descent update rule is then as follows, where $\alpha$ is the learning rate and all $\theta_{j}$ are updated simultaneously:
$\theta_{j} := \theta_{j} - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_{j}^{(i)}$
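The update rule can be checked numerically. The sketch below computes the gradient once with explicit loops that follow the summation and once in matrix form, then applies a single update. The function names and the tiny data set are assumptions made for this example, not part of the article's listing.

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def gradient_loop(X, y, theta):
    """Partial derivatives computed term by term, exactly as in the summation."""
    m, n = X.shape
    grad = np.zeros(n)
    for j in range(n):
        for i in range(m):
            grad[j] += (X[i] @ theta - y[i]) * X[i, j]
    return grad / m

def gradient_vectorized(X, y, theta):
    """The same gradient written as (1/m) * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / len(y)

def gradient_step(X, y, theta, alpha):
    """One simultaneous update of every theta_j."""
    return theta - alpha * gradient_vectorized(X, y, theta)

# Made-up data: the first column is the constant attribute x_0 = 1
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y = np.array([1.0, -0.5, 3.0])
theta = np.zeros(2)

theta_next = gradient_step(X, y, theta, alpha=0.1)
print(cost(X, y, theta), cost(X, y, theta_next))
```

Because the whole gradient is computed from the old `theta` before anything is overwritten, the simultaneous-update requirement is satisfied without a temporary per-component buffer.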

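【Advantages of Vectorization】
Compared with a for loop, vectorization computes over the whole data set in one operation, which is noticeably more efficient. The matrix operations themselves are optimized as well: NumPy executes them in compiled code rather than in the Python interpreter, and they can take advantage of the machine's parallel computing capability. There is a drawback too: vectorized code is harder to understand than the equivalent for loop.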
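A rough way to see the difference is to time both styles on the same input. The sketch below is only indicative: the array size, the random data, and the function names are arbitrary choices for this example, and the measured times depend on the machine, so no specific numbers are claimed.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 3))      # arbitrary synthetic data
theta = np.array([0.5, -1.0, 2.0])   # arbitrary parameters

def predict_loop(X, theta):
    """Predictions computed with two nested Python loops."""
    out = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        total = 0.0
        for j in range(X.shape[1]):
            total += X[i, j] * theta[j]
        out[i] = total
    return out

def predict_vectorized(X, theta):
    """The same predictions as a single matrix-vector product."""
    return X @ theta

start = time.perf_counter()
loop_result = predict_loop(X, theta)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = predict_vectorized(X, theta)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.6f} s")
```

Both functions return the same values up to floating-point rounding; only the time taken differs.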

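【Related Topic: Feature Scaling】

Feature scaling can effectively speed up gradient descent: it reduces the number of iterations and makes the algorithm converge faster. If some features take values over a large range while others take relatively small values, the contour plot of the cost function has a long, flat shape, as in the figure (source: Andrew Ng's lecture notes).
Gradient descent then follows a zigzag path (the red trajectory in the figure). For a data set whose features have similar ranges, the contour plot looks like the second figure (source: Andrew Ng's lecture notes).

Clearly, the rounder the contours, the faster the iteration converges.
For variables with a large range, the usual remedy is to scale all features to roughly the range -1 to 1. The following formula is commonly used to scale each feature:
$x_{n} := \dfrac{x_{n} - \mu_{n}}{s_{n}}$
where $\mu_{n}$ is the mean of the feature and $s_{n}$ is its standard deviation.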
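The scaling formula can be sketched in a few lines of NumPy. The helper name `standardize` and the sample values are assumptions made for this example. The sketch uses `ddof=1` (the sample standard deviation) because that is the default of pandas' `DataFrame.std()`, which the implementation below relies on; NumPy's own default is `ddof=0`.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit (sample) standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # ddof=1 matches pandas' default
    return (X - mu) / sigma, mu, sigma

# Made-up values: one column with a large range, one with a small range
X = np.array([[2000.0, 3.0],
              [1500.0, 2.0],
              [3000.0, 4.0],
              [1000.0, 1.0]])

X_scaled, mu, sigma = standardize(X)
print(X_scaled)
```

Returning `mu` and `sigma` is a deliberate choice: any new input must be scaled with the same values before it is passed to a model trained on the scaled data. Note also that standardization centers the values around 0 but does not strictly confine them to the interval from -1 to 1.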

The full implementation is as follows:

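# Multivariate gradient descent; the data set is "ex1data2.txt" from Andrew Ng's machine learning course
# The multivariate linear regression gradient descent is implemented here in vectorized form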

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


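def readData(path, name=None):
    # Load the CSV file; `name` lists the column labels
    # (when it is None, pandas takes the labels from the first row of the file)
    data = pd.read_csv(path, names=name)
    # Feature scaling: standardize every column, including the target column
    data = (data - data.mean()) / data.std()
    # Prepend the constant column x_0 = 1 for the intercept term
    data.insert(0, 'First', 1)
    return data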


def costFunction(X, Y, theta):
    # X is m x n, Y is 1 x m, theta is 1 x n; all are np.matrix, so * is matrix multiplication
    inner = np.power(((X * theta.T) - Y.T), 2)
    return np.sum(inner) / (2 * len(X))

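def gradientDescent(data, theta, alpha, iterations):
    # Cost after each iteration, kept for plotting
    eachIterationValue = np.zeros((iterations, 1))
    # All columns except the last are features; the last column is the target
    X = np.matrix(data.iloc[:, 0:-1].values)
    Y = np.matrix(data.iloc[:, -1].values)
    m = X.shape[0]
    colNum = X.shape[1]
    for i in range(iterations):
        # Residuals for the whole data set (m x 1), computed once per iteration
        error = (X * theta.T) - Y.T
        # Fresh buffer each iteration so that temp never aliases theta
        temp = np.matrix(np.zeros(theta.shape))
        for j in range(colNum):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / m) * np.sum(term))
        theta = temp
        eachIterationValue[i, 0] = costFunction(X, Y, theta)
    return theta, eachIterationValue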

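if __name__ == "__main__":
    data = readData('ex1data2.txt', ['Size', 'Bedrooms', 'Price'])
    # readData has already standardized the columns, so no further scaling is needed here
    # Start from a float row vector of zeros, one entry per column of X
    theta = np.matrix(np.zeros((1, 3)))

    iterations = 1500
    alpha = 0.01

    theta, eachIterationValue = gradientDescent(data, theta, alpha, iterations)

    print(theta)

    # Plot the cost against the iteration number to check convergence
    plt.plot(np.arange(iterations), eachIterationValue)
    plt.title('CostFunction')
    plt.show()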

Running the script produces the result figure, a plot of the cost function value at each iteration.
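The listing above still loops over the columns when updating the parameters. As a complement, the sketch below shows one way the update could be written without that inner loop, using plain NumPy arrays instead of `np.matrix`. It is not the author's code: the function name `gradient_descent`, the synthetic data, and the learning rate are assumptions for this example, and the least-squares solution from `np.linalg.lstsq` is used only as a reference to compare against.

```python
import numpy as np

def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent with a fully vectorized update."""
    m, n = X.shape
    theta = np.zeros(n)
    costs = np.empty(iterations)
    for k in range(iterations):
        error = X @ theta - y                         # residuals, shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)   # all theta_j updated at once
        costs[k] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, costs

# Synthetic data (hypothetical): two roughly standardized features plus the constant column
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
X = np.column_stack([np.ones(100), features])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * rng.normal(size=100)

theta, costs = gradient_descent(X, y, alpha=0.1, iterations=1500)

# Reference solution from the least-squares solver, for comparison only
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta, theta_exact)
```

On well-scaled data like this, the two parameter vectors should agree closely, which is a convenient sanity check for any gradient descent implementation.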
