2021-10-17_Python

2021-10-17

Linear Regression Practice Notes

1. Data analysis with Excel
2. Linear regression without a third-party library
3. Linear regression with sklearn
Summary

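1. Data analysis with Excel

Open the prepared dataset in Excel and select the first 20 data points. Click Insert, choose Scatter chart, click the + button at the top right of the chart, and under Trendline choose Linear Forecast. Then, under More Options, tick "Display Equation on chart" and "Display R-squared value on chart". Repeating this with different numbers of data points gives the following results: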
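20 data points: [figure: scatter plot with linear trendline, equation and R^2]

2000 data points: [figure: scatter plot with linear trendline, equation and R^2]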
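
Excel's linear trendline is an ordinary least-squares fit, so the equation and R^2 it displays can be cross-checked in Python. The sketch below is only an illustration under stated assumptions: it uses synthetic height/weight-like numbers (not the dataset from this post) and `numpy.polyfit`, which is a different tool from Excel's but produces the same least-squares line.

```python
import numpy as np

# Synthetic stand-in data (hypothetical, NOT the dataset used in this post).
rng = np.random.default_rng(0)
height = rng.uniform(60, 75, size=20)                      # x values
weight = 3.0 * height - 80 + rng.normal(0, 10, size=20)    # noisy y values

# A degree-1 polynomial fit is the least-squares straight line.
slope, intercept = np.polyfit(height, weight, 1)

# R^2 = 1 - SS_res / SS_tot, the value shown next to a trendline.
predicted = slope * height + intercept
ss_res = np.sum((weight - predicted) ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.4f}")
```

Swapping the synthetic arrays for the first 20, 200, or 2000 rows of the real spreadsheet should give numbers comparable to the chart labels.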

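2. Linear regression without a third-party library

The code below implements the least-squares fit by hand; xlrd is used only to read the spreadsheet and NumPy only for basic arithmetic: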

import xlrd
import numpy as np


def read_xls(count):
    """Read the first `count` data rows (columns 1 and 2) from the workbook."""
    workbook = xlrd.open_workbook(r'D:\Download\weights_heights.xls')
    sheet = workbook.sheet_by_index(1)
    tables = []
    if not isinstance(count, int):
        return tables
    for i in range(count):
        # Skip row 0 (presumably the header row); data starts at row 1.
        tables.append({'l1': sheet.cell_value(i + 1, 1),
                       'l2': sheet.cell_value(i + 1, 2)})
    return tables


def fit(x, y):
    """Least-squares fit of y = r*x + r1; returns (slope r, intercept r1)."""
    x_ = np.mean(x)
    y_ = np.mean(y)
    t = 0.0   # sum of (x - mean_x) * (y - mean_y)
    t1 = 0.0  # sum of (x - mean_x)^2
    for i in range(len(x)):
        t += (x[i] - x_) * (y[i] - y_)
        t1 += np.square(x[i] - x_)
    r = t / t1
    r1 = y_ - r * x_
    return r, r1


tables = read_xls(2000)  # change to 20 or 200 for the smaller runs
x = [row['l1'] for row in tables]
y = [row['l2'] for row in tables]

t = fit(x, y)  # solve for slope and intercept
print(t)

# Compute R^2 = 1 - SS_res / SS_tot
y_mean = np.mean(y)
t1 = 0.0  # residual sum of squares
t2 = 0.0  # total sum of squares
for i in range(len(x)):
    t1 += np.square((t[0] * x[i] + t[1]) - y[i])
    t2 += np.square(y_mean - y[i])
R2 = 1 - (t1 / t2)
print(R2)
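
For reference, the loops in `fit` and the R^2 calculation correspond to the usual closed-form least-squares expressions. They are written here with the code's own names, `r` for the slope and `r1` for the intercept:

```latex
\begin{aligned}
r &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad r_1 = \bar{y} - r\,\bar{x}, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{\sum_{i=1}^{n}(\bar{y} - y_i)^2},
\qquad \hat{y}_i = r\,x_i + r_1 .
\end{aligned}
```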

With 20 data points: ŷ = 4.1280x - 152.2337, R^2 = 0.3254
With 200 data points: ŷ = 3.4316x - 105.3530, R^2 = 0.3099
With 2000 data points: ŷ = 2.9555x - 73.6607, R^2 = 0.2483
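
The same calculation can be written without explicit loops. The following is a minimal sketch, assuming the inputs are plain numeric sequences; the function name `fit_vectorized` and the sample numbers are made up for illustration and do not come from the dataset.

```python
import numpy as np


def fit_vectorized(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    slope = np.sum(dx * dy) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (slope * x + intercept)
    r_squared = 1 - np.sum(residuals ** 2) / np.sum(dy ** 2)
    return slope, intercept, r_squared


# Hypothetical sample values chosen to lie exactly on y = 2x + 1.
slope, intercept, r_squared = fit_vectorized([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r_squared)
```

On data that lies exactly on a line, the recovered slope and intercept match that line and R^2 is 1; on real, noisy data such as the height/weight set, R^2 is lower.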

3. Linear regression with sklearn

The code is as follows:

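import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Read the "weights_heights" sheet of the workbook.
data = pd.read_excel(r"D:\Download\weights_heights.xls", sheet_name="weights_heights")
# First 200 rows, columns 1 and 2 (change 200 to 20 or 2000 for the other runs).
ss = data.iloc[:200, [1, 2]]
# Random 80/20 train/test split.
x_tr, x_te, y_tr, y_te = train_test_split(
    ss.iloc[:, [0]], ss.iloc[:, [1]], test_size=0.2, train_size=0.8)
model = linear_model.LinearRegression()
model.fit(x_tr, y_tr)
# Slope, intercept, and R^2 measured on the held-out test set.
print(model.coef_, model.intercept_, model.score(x_te, y_te))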

With 20 data points: ŷ = 4.1488x - 152.7636, R^2 = 0.4053
With 200 data points: ŷ = 3.4595x - 107.3113, R^2 = 0.2323
With 2000 data points: ŷ = 2.9434x - 72.9775, R^2 = 0.2476
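
Because `train_test_split` shuffles the rows randomly, the coefficients and the test-set R^2 change from run to run unless a seed is fixed. A minimal sketch of a reproducible split follows; the synthetic data, the helper name `fit_once`, and the value `random_state=42` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, NOT the dataset used in this post).
rng = np.random.default_rng(0)
X = rng.uniform(60, 75, size=(200, 1))
y = 3.0 * X[:, 0] - 80 + rng.normal(0, 10, size=200)


def fit_once(seed):
    """Fit on an 80/20 split with a fixed seed; return (slope, intercept, test R^2)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    return model.coef_[0], model.intercept_, model.score(X_te, y_te)


first = fit_once(42)
second = fit_once(42)  # same seed, so the same split and the same numbers
print(first)
print(second)
```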

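Summary

Using a third-party library greatly speeds up development and improves efficiency. The results are also close to those of the hand-written implementation: the fitted lines nearly coincide for each sample size. The remaining differences are expected, because the sklearn model is trained on a random 80% of the rows and its R^2 is computed on the held-out 20%, whereas the hand-written version fits and scores on all rows.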
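
One way to check this is to fit the same data with both approaches and no train/test split. The sketch below uses synthetic data (an assumption, not the dataset from this post) and a hypothetical helper `manual_fit` that mirrors the hand-written closed-form calculation; on identical inputs the two should agree to floating-point precision.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical, NOT the dataset used in this post).
rng = np.random.default_rng(1)
x = rng.uniform(60, 75, size=500)
y = 3.0 * x - 80 + rng.normal(0, 10, size=500)


def manual_fit(x, y):
    """Closed-form least-squares slope and intercept."""
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    return slope, y.mean() - slope * x.mean()


slope, intercept = manual_fit(x, y)
model = LinearRegression().fit(x.reshape(-1, 1), y)  # sklearn expects 2-D X

print(slope, intercept)
print(model.coef_[0], model.intercept_)
```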
