
Programming Exercise 1: Linear Regression


目录

1 Simple Octave/MATLAB function

2 Linear regression with one variable

2.1 Plotting the Data

2.2 Gradient Descent

2.2.1 Update Equations

2.2.2 Implementation

2.2.3 Computing the cost J(θ)

2.2.4 Gradient descent

2.3 Debugging

2.4 Visualizing J(θ)

3 Linear regression with multiple variables

3.1 Feature Normalization

3.2 Gradient Descent

3.2.1 Optional (ungraded) exercise: Selecting learning rates

3.3 Normal Equations


1 Simple Octave/MATLAB function

Return a 5x5 identity matrix:

>>> import numpy as np
>>> A=np.eye(5)
>>> A
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

2 Linear regression with one variable

Implement linear regression with one variable to predict profits for a food truck.

ex1data1.txt: the first column is the population of a city; the second column is the profit of a food truck in that city (a negative value for profit indicates a loss).

2.1 Plotting the Data

Use a scatter plot to visualize the data:

# 2.1 Plotting the Data
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex1data1.txt'
# names sets the column names; the file has no header row, so pass
# header=None explicitly (data is a DataFrame)
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
# Check that the data was read correctly (head() shows the first five rows)
print(data.head())

# Option 1: plot straight from the DataFrame
# (x and y must match the column names passed to names above)
# data.plot(kind='scatter', x='Population', y='Profit')
fig, ax = plt.subplots()
ax.scatter(data['Population'], data['Profit'])
'''
x = data.values[:, 0]
y = data.values[:, 1]
# Option 2
plt.scatter(x, y)
# Option 3
# Use the "o" marker (plt.plot draws a line plot by default)
# plt.plot(x, y, "bo")
'''
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

2.2 Gradient Descent

Fit the linear regression parameters θ to the dataset using gradient descent.

2.2.1 Update Equations

Batch gradient descent: each update uses all of the training samples.
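The batch update rule can be written as one vectorized step. A minimal sketch on toy data (not the exercise dataset), with a hypothetical `batch_update` helper:

```python
import numpy as np

# One batch-gradient-descent step: the gradient is averaged over ALL m
# samples, so every theta_j moves simultaneously:
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
def batch_update(X, y, theta, alpha):
    m = len(X)
    error = X @ theta - y        # h(x) - y for every sample at once
    grad = (X.T @ error) / m     # average gradient over the whole batch
    return theta - alpha * grad  # simultaneous update of all parameters

# Toy check on exact data y = 2x: updates should drive theta toward [0, 2].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # leading column of ones
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = batch_update(X, y, theta, 0.1)
print(theta)  # close to [0, 2]
```

Because the data lie exactly on y = 2x, repeated updates recover the line almost exactly.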

2.2.2 Implementation

1. Add another dimension to the data to accommodate the θ0 intercept term
(add an additional first column to X and set it to all ones).

2. Initialize the parameters to 0 and the learning rate alpha to 0.01.

# 2.2 Gradient Descent
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

path = 'ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])

# 2.2.2
# Insert a column of ones at position 0, named 'Ones'
data.insert(0, 'Ones', 1)
# Initialize X and y
cols = data.shape[1]           # number of columns
X = data.iloc[:, :-1]          # X is everything except the last column
y = data.iloc[:, cols-1:cols]  # y is the last column
print(X.head())                # head() shows the first 5 rows
# Convert X and y to numpy matrices
X = np.matrix(X.values)
y = np.matrix(y.values)
# Initialize theta
theta = np.matrix(np.array([0, 0]))
# Learning rate 0.01
alpha = 0.01
iterations = 1500

2.2.3 Computing the cost J(θ)

With θ initialized to zeros, the cost comes out to about 32.07.

# 2.2.3
# Compute J(θ)
def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

print(computeCost(X, y, theta))

2.2.4 Gradient descent

J(θ) is parameterized by the vector θ, not X and y

To verify that gradient descent is working correctly, check that the value of J(θ) decreases with each step.

J(θ) should converge to a steady value by the end of the algorithm

1. Use the final parameters to plot the linear fit.

2. Make predictions on profits in areas of 35,000 and 70,000 people.

# 2.2.4
# Compute θ
def gradientDescent(X, y, theta, alpha, iterations):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])  # ravel() flattens to 1-D
    cost = np.zeros(iterations)
    m = len(X)
    for i in range(iterations):  # run for `iterations` rounds
        error = (X * theta.T) - y

        for j in range(parameters):  # update all parameters simultaneously
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / m) * np.sum(term))

        theta = temp
        cost[i] = computeCost(X, y, theta)
        print(i, "cost", cost[i])
    return theta, cost

g, cost = gradientDescent(X, y, theta, alpha, iterations)
print("θ=", g)

# Plot the linear fit
x = np.linspace(data.Population.min(), data.Population.max())
f = g[0, 0] + (g[0, 1] * x)
fig, ax = plt.subplots()
ax.plot(x, f, 'r', label='Linear regression')
ax.scatter(data.Population, data.Profit, label='Training data')
ax.legend(loc=2)
ax.set_xlabel('Population of City in 10,000s')  # on an Axes object use set_xlabel
ax.set_ylabel('Profit in $10,000s')
ax.set_title('Training data with linear regression fit')
plt.show()

# Predict profits for populations of 35,000 and 70,000
predict1 = [1, 3.5] * g.T
print("predict1:", predict1)
predict2 = [1, 7] * g.T
print("predict2:", predict2)
   Ones  Population
0     1      6.1101
1     1      5.5277
2     1      8.5186
3     1      7.0032
4     1      5.8598
(97, 2) (1, 2) (97, 1)
32.072733877455676
0 cost 6.737190464870007
1 cost 5.9315935686049555
2 cost 5.901154707081388
3 cost 5.895228586444221
4 cost 5.8900949431173295
5 cost 5.885004158443647
6 cost 5.879932480491418
7 cost 5.874879094762575
8 cost 5.869843911806385
9 cost 5.8648268653129305
10 cost 5.859827889932181
……
1498 cost 4.483411453374869
1499 cost 4.483388256587726
θ= [[-3.63029144  1.16636235]]
predict1: [[0.45197679]]
predict2: [[4.53424501]]
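The double loop over parameters in gradientDescent can be collapsed into a single matrix expression. A sketch with a hypothetical `gradient_descent_vec` on synthetic stand-in data (ex1data1.txt is not reproduced here):

```python
import numpy as np

# Vectorized batch gradient descent: the inner for-j loop becomes one
# matrix product, updating every parameter at once.
def gradient_descent_vec(X, y, theta, alpha, iterations):
    m = len(X)
    cost = np.zeros(iterations)
    for i in range(iterations):
        error = X @ theta - y                        # (m, 1) residuals
        theta = theta - (alpha / m) * (X.T @ error)  # all parameters at once
        cost[i] = (error.T @ error).item() / (2 * m)  # J(θ) before this update
    return theta, cost

# Synthetic stand-in roughly shaped like ex1data1.txt (97 noisy points).
rng = np.random.default_rng(0)
x = rng.uniform(5.0, 15.0, size=(97, 1))
y = -3.6 + 1.17 * x + rng.normal(0.0, 0.5, size=(97, 1))
X = np.hstack([np.ones((97, 1)), x])
theta, cost = gradient_descent_vec(X, y, np.zeros((2, 1)), 0.01, 1500)
```

With a stable learning rate the recorded cost sequence is non-increasing, which is exactly the sanity check described in 2.3 below.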

2.3 Debugging 

Printing the dimensions of variables (the shape attribute in NumPy) will help you debug.

# 2.3 Debugging
# Check the dimensions
print(X.shape, theta.shape, y.shape)

2.4 Visualizing J(θ)

Plot the cost over a 2-dimensional grid of θ0 and θ1 values.

I was still puzzling over how to do this in Python (I thought Z had to be defined on the 2-D arrays created by meshgrid, but the theta for the next iteration is only known after the current iteration, so z seemed uncomputable).

Silly me: just generate the x and y grids and compute z at each (x, y) directly.

# 2.4 Visualizing J(θ)
# Continues from the 2.2 script (uses X, y, computeCost, and g from above)
from matplotlib import cm  # needed for the cm.coolwarm colormap

theta0 = np.linspace(-10, 10, 100)  # start, stop, num (num defaults to 50)
theta1 = np.linspace(-1, 4, 100)
xn = np.size(theta0, 0)  # with axis 0, returns the length along the rows
print(xn)
yn = np.size(theta1, 0)
J = np.zeros((xn, yn))
for i in range(xn):
    for j in range(yn):
        t = np.matrix([theta0[i], theta1[j]])
        J[i, j] = computeCost(X, y, t)


# Surface plot
theta0, theta1 = np.meshgrid(theta0, theta1)
fig = plt.figure()
ax = fig.add_subplot(121, projection='3d')
ax.plot_surface(theta0, theta1, J.T, rstride=5, cstride=1, cmap=cm.coolwarm)
ax.set_xlabel('θ0')
ax.set_ylabel('θ1')
ax.set_title('(a) Surface')

# Contour plot
ax2 = fig.add_subplot(122)
ax2.contour(theta0, theta1, J.T, np.logspace(-2, 3, 20))  # logspace: geometric sequence of contour levels
ax2.plot(g[0, 0], g[0, 1], 'rx')
ax2.set_xlabel('θ0')
ax2.set_ylabel('θ1')
ax2.set_title('(b) Contour, showing minimum')
plt.show()

3 Linear regression with multiple variables

Implement linear regression with multiple variables to predict the prices of houses.

ex1data2.txt: the first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

# 3 Linear regression with multiple variables
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

path = 'ex1data2.txt'
data = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])

# Initialize X and y
cols = data.shape[1]   # number of columns
X = data.iloc[:, :-1]  # X is everything except the last column

3.1 Feature Normalization

Feature scaling can make gradient descent converge much more quickly:

•Subtract the mean value of each feature from the dataset.
•Divide the feature values by their respective “standard deviations.”
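Those two bullet points amount to z-scoring each column. A tiny sketch with made-up house data (not ex1data2.txt):

```python
import pandas as pd

# Z-score each feature: subtract the column mean, divide by the column std.
df = pd.DataFrame({'Size': [1200.0, 1500.0, 2100.0, 900.0],
                   'Bedrooms': [2.0, 3.0, 4.0, 1.0]})
mu = df.mean()      # per-column means
sigma = df.std()    # per-column sample std (pandas defaults to ddof=1)
norm = (df - mu) / sigma

# Each normalized column now has mean ~0 and sample std ~1.
print(norm.mean().round(6).tolist(), norm.std().round(6).tolist())
```

Keep `mu` and `sigma` around: any new input (like the 1650 sq ft house predicted later) must be normalized with the same statistics.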

# 3.1 Feature Normalization
m = X.mean()  # df.mean() = df.mean(0): column means; df.mean(1): row means
s = X.std()
X = (X - m) / s
# Insert a column of ones at position 0, named 'Ones'
X.insert(0, 'Ones', 1)
y = data.iloc[:, cols-1:cols]  # y is the last column
print(X.head())                # head() shows the first 5 rows
# Convert X and y to numpy matrices
X = np.matrix(X.values)
y = np.matrix(y.values)
# Initialize theta
theta = np.matrix(np.array([0, 0, 0]))
# Learning rate (try each in turn)
# alpha = 0.01
# alpha = 0.03
# alpha = 0.1
alpha = 0.3
# iterations = 500
iterations = 100

3.2 Gradient Descent

3.2.1 Optional (ungraded) exercise: Selecting learning rates

Use the best learning rate you found to run gradient descent until convergence and find the final values of θ.

Then predict the price of a house with 1650 square feet and 3 bedrooms.

# 3.2 Gradient Descent (essentially the same as 2.2)
# Compute J(θ)
def computeCost(X, y, theta):
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

# Compute θ
def gradientDescent(X, y, theta, alpha, iterations):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])  # ravel() flattens to 1-D
    cost = np.zeros(iterations)
    m = len(X)
    for i in range(iterations):  # run for `iterations` rounds
        error = (X * theta.T) - y

        for j in range(parameters):  # update all parameters simultaneously
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / m) * np.sum(term))

        theta = temp
        cost[i] = computeCost(X, y, theta)
        print(i, "cost", cost[i])
    return theta, cost

g, cost = gradientDescent(X, y, theta, alpha, iterations)
print("θ=", g)

# Predict 1650 sq ft, 3 bedrooms (normalize with the same m, s from 3.1)
test = ([1650, 3] - m) / s
print(test)
predict = [1, test['Size'], test['Bedrooms']] * g.T
print("predict:", predict)

# 3.2.1 Optional (ungraded) exercise: Selecting learning rates
iters = np.arange(0, iterations, 1)
plt.plot(iters, cost)
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
# plt.title('alpha = 0.01')
# plt.title('alpha = 0.03')
# plt.title('alpha = 0.1')
plt.title('alpha = 0.3')
plt.show()
θ= [[340412.65957447 110630.99273722  -6649.41672919]]
predict: [[293081.49780324]]

3.3 Normal Equations

The closed-form solution to linear regression is θ = (XᵀX)⁻¹Xᵀy.

Gradient descent vs. the normal equation:

Gradient descent: you must choose a learning rate α, it takes many iterations, and it needs feature scaling; but it still works well when the number of features n is large, and it applies to many kinds of models.

Normal equation: no learning rate α to choose, the answer comes from a single computation, and no feature scaling is needed; but it becomes expensive when n is large, because computing the matrix inverse costs O(n³) — generally acceptable while n is below about 10,000. It applies only to linear models, not to logistic regression or other models.
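To make "a single computation" concrete: the normal equation is one linear solve. A sketch on random data, using `np.linalg.solve` rather than forming the inverse explicitly (cheaper and numerically safer):

```python
import numpy as np

# theta = (X^T X)^{-1} X^T y, computed via a linear solve instead of an
# explicit inverse.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
true_theta = np.array([4.0, 2.0, -1.0])
y = X @ true_theta  # noiseless, so recovery should be essentially exact

theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)
```

No learning rate, no iterations, no feature scaling: one solve recovers the coefficients.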

# 3.3 Normal Equations
import pandas as pd
import numpy as np

path = 'ex1data2.txt'
data = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])

# Insert a column of ones at position 0, named 'Ones'
data.insert(0, 'Ones', 1)
# Initialize X and y
cols = data.shape[1]           # number of columns
X = data.iloc[:, :-1]          # X is everything except the last column
y = data.iloc[:, cols-1:cols]  # y is the last column

# Normal equation
def normalEqn(X, y):
    theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)  # np.linalg.inv(): matrix inverse
    return theta

theta = normalEqn(X, y)  # raw (unnormalized) features work fine here
print(theta)

predict = np.sum([1, 1650, 3] * theta.T)  # sum is needed here, otherwise three numbers are printed
print("predict:", predict)
[[89597.9095428 ]
 [  139.21067402]
 [-8738.01911233]]
predict: 293081.46433489426
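For what it's worth, np.linalg.lstsq solves the same least-squares problem via a factorization, without ever forming (XᵀX)⁻¹. A sketch on a handful of made-up rows shaped like ex1data2.txt:

```python
import numpy as np

# Least squares via np.linalg.lstsq vs. the explicit normal-equation
# inverse; both recover the same theta.
X = np.array([[1.0, 2100.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1400.0, 2.0],
              [1.0, 3000.0, 4.0]])
y = np.array([400000.0, 330000.0, 369000.0, 232000.0, 540000.0])

theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y
prediction = np.array([1.0, 1650.0, 3.0]) @ theta_lstsq
```

For badly scaled features like these (sizes in the thousands next to a ones column), the factorization route is the numerically safer choice.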

This was so hard, and I'm so bad at this... crying.
