
Web scraping: pandas and data cleaning ------- 周东海 (Zhou Donghai)


1. pandas basics: Series and DataFrame

import pandas as pd

print(pd.__version__)
# Define a dictionary: each key becomes a column name, each list becomes that column's values
mydataset = {
    'sites' : ["Google","Runoob","WiKi"],
    'number' : [1,2,3]
}

# Convert the dictionary to a DataFrame so pandas can work with it
mydf=pd.DataFrame(mydataset)
print(mydf)

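# A Series is a single labelled column; name= sets its name
a = [1,2,3]
mysr = pd.Series(a,name="aha")
print(mysr)
print(mysr[1])    # the default labels are 0, 1, 2, so this prints 2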

# index= replaces the default 0, 1, 2 labels with custom ones
b = ["Google","Runoob","WiKi"]
myvar = pd.Series(b,index=["x","y","z"])
print(myvar['y'])    # Runoob

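# From a dictionary: the keys become the index labels
sites = {1:"Google",2:"Runoob",3:"WiKi"}
myvar2 = pd.Series(sites)
print(myvar2[3])    # 3 is a label here, not a position: prints WiKi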

# index= with a dictionary keeps only the listed keys
sites = {1:"Google",2:"Runoob",3:"WiKi"}
myvar3 = pd.Series(sites,index=[1,2])
print(myvar3)
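Because the dictionary keys are the integers 1, 2, 3, `myvar2[3]` above is a lookup by label, not by position. A minimal sketch that makes the intent explicit with `.loc` (label) and `.iloc` (position); the key `4` is a hypothetical label used only to show what `index=` does with a key the dictionary lacks.

```python
import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "WiKi"}
s = pd.Series(sites)  # dict keys become the index labels

# .loc looks up by index label, so label 3 -> "WiKi"
by_label = s.loc[3]
# .iloc looks up by 0-based position, so position 0 -> "Google"
by_position = s.iloc[0]

# index= keeps only the listed keys; a key missing from the dict becomes NaN
picked = pd.Series(sites, index=[1, 2, 4])

print(by_label)                # WiKi
print(by_position)             # Google
print(picked.isna().tolist())  # [False, False, True]
```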
import pandas as pd

# Define a list of rows, one inner list per row
data = [['Google',10],['Runoob',12],['Wiki',13]]
# Convert the list to a DataFrame, naming the columns
mydf=pd.DataFrame(data,columns=['name','age'])
print(mydf)

# From a dictionary: one key per column
data = {'site':['Google','Runoob','Wiki'],'age' : [10,12,13]}
mydf2=pd.DataFrame(data)
print(mydf2)

# From a list of dictionaries: one dictionary per row, keys become columns
data = [{'a':1,'b':2,'c':3},{'a':5,'b':10,'c':20}]
mydf3=pd.DataFrame(data)
print(mydf3)
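The three constructors above differ only in the shape of their input. As a sketch (the variable names are illustrative), the same three rows can be written as a list of lists, a dictionary of lists, or a list of dictionaries, and pandas builds identical frames, which `DataFrame.equals` can confirm. The last lines show that a key missing from one record becomes `NaN`.

```python
import pandas as pd

# The same three rows expressed in the three input shapes
rows = [['Google', 10], ['Runoob', 12], ['Wiki', 13]]
by_column = {'name': ['Google', 'Runoob', 'Wiki'], 'age': [10, 12, 13]}
records = [{'name': 'Google', 'age': 10},
           {'name': 'Runoob', 'age': 12},
           {'name': 'Wiki', 'age': 13}]

df_rows = pd.DataFrame(rows, columns=['name', 'age'])  # list of lists: one inner list per row
df_columns = pd.DataFrame(by_column)                    # dict of lists: one key per column
df_records = pd.DataFrame(records)                      # list of dicts: one dict per row

print(df_rows.equals(df_columns) and df_columns.equals(df_records))  # True

# With a list of dicts, a key missing from one record becomes NaN in that row
sparse = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 5, 'c': 20}])
print(sparse)
```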



data = {
    "calories":[420,280,390],
    "duration":[50,40,45]
}
mydf4=pd.DataFrame(data)
print(mydf4.loc[0])        # one row label -> returned as a Series
print(mydf4.loc[[0,1]])    # a list of row labels -> returned as a DataFrame
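`.loc` selects by label, and the shape of what you pass decides the shape of what comes back. A small sketch on the same `calories`/`duration` data:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 280, 390], "duration": [50, 40, 45]})

row = df.loc[0]                # one label -> a Series holding that row
rows = df.loc[[0, 1]]          # a list of labels -> a DataFrame
cell = df.loc[1, "calories"]   # (row label, column label) -> a single value

print(type(row).__name__, type(rows).__name__)  # Series DataFrame
print(cell)                                     # 280
```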

data = {
    "calories":[420,280,390],
    "duration":[50,40,45]
}
# index= gives the rows custom labels instead of 0, 1, 2
mydf5=pd.DataFrame(data,index=['row1','row2','row3'])
print(mydf5)
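With custom row labels, `.loc` takes those labels while `.iloc` still counts positions from 0. A sketch on the same data; note that a `.loc` label slice includes its end label, unlike ordinary Python slicing.

```python
import pandas as pd

data = {"calories": [420, 280, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["row1", "row2", "row3"])

by_label = df.loc["row2"]      # select by the custom row label
by_position = df.iloc[1]       # the same row by 0-based position
# Unlike Python slicing, a .loc label slice includes its end label
first_two = df.loc["row1":"row2"]

print(by_label.equals(by_position))  # True
print(len(first_two))                # 2
```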

2. Data cleaning

import pandas as pd

df = pd.read_csv('./sss.csv')

# print(df)

# Print one column, then check which of its values are missing (True = missing)
# print(df['NUM_BEDROOMS'])
# print(df['NUM_BEDROOMS'].isnull())

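# inplace=True modifies df itself and returns None, so don't assign the result
# df.dropna(inplace=True)
# Without inplace, dropna() returns a new DataFrame with incomplete rows removed
# df2 = df.dropna()
# print(df2)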

# Only consider missing values in the listed columns
# df3 = df.dropna(subset=['ST_NUM'])
# print(df3)

# Replace every missing value (dirty data) with a constant
# df4=df.fillna('666')
# print(df4)
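The commented-out alternatives above need sss.csv. As a sketch that runs without the file, the same calls on a small hypothetical frame (the column names are borrowed from the script, the values are made up) show what each returns. Note in particular that `dropna(inplace=True)` returns `None`, which is why its result should not be assigned to a variable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for sss.csv (column names reused, values made up)
df = pd.DataFrame({
    "PID": [100001000.0, np.nan, 100003000.0, 100004000.0],
    "ST_NUM": [104.0, 197.0, np.nan, 201.0],
    "NUM_BEDROOMS": [3.0, 3.0, np.nan, 1.0],
})

missing = df["NUM_BEDROOMS"].isnull()        # True where the value is missing
any_row = df.dropna()                        # drop rows with a missing value in ANY column
st_num_only = df.dropna(subset=["ST_NUM"])   # drop rows only when ST_NUM is missing
filled = df.fillna(666)                      # replace every missing value with one constant

result = df.dropna(inplace=True)             # inplace=True edits df itself...
print(result)                                # ...and returns None

print(missing.tolist())                          # [False, False, True, False]
print(len(any_row), len(st_num_only), len(df))   # 2 3 2
```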

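# Replace the missing values in one column only; assigning the result back is safer
# than df['PID'].fillna(..., inplace=True), which newer pandas versions warn about
df['PID'] = df['PID'].fillna(123456)
print(df)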

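# Fill missing values with the column mean
avg = df['ST_NUM'].mean()
# avg = df['ST_NUM'].median()    # median instead of mean
# avg = df['ST_NUM'].mode()[0]   # mode() returns a Series, so take its first value
# Fill only ST_NUM; df.fillna(avg) would put the ST_NUM mean into every column
df['ST_NUM'] = df['ST_NUM'].fillna(avg)
print(df)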
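A runnable sketch of the three fill strategies named in the script (mean, median, mode), on a hypothetical `ST_NUM` column with made-up values. Each filled version is stored in a new, illustratively named column so the results can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; values are made up for illustration
df = pd.DataFrame({"ST_NUM": [104.0, 197.0, np.nan, 201.0, 201.0]})

mean_val = df["ST_NUM"].mean()      # NaN is skipped: (104 + 197 + 201 + 201) / 4
median_val = df["ST_NUM"].median()  # middle of the sorted non-missing values
mode_val = df["ST_NUM"].mode()[0]   # mode() returns a Series; take its first entry

# Assign each filled column back instead of using chained inplace=True
df["ST_NUM_mean"] = df["ST_NUM"].fillna(mean_val)
df["ST_NUM_median"] = df["ST_NUM"].fillna(median_val)
df["ST_NUM_mode"] = df["ST_NUM"].fillna(mode_val)

print(mean_val, median_val, mode_val)  # 175.75 199.0 201.0
```

The mean is pulled toward outliers, while the median and mode are not; which one suits a column depends on its distribution.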

import pandas as pd

data={
    "Date":['2020/12/01','2020/12/02','20201226'],
    "duration":[50,40,45]
}
df = pd.DataFrame(data,index=['day1','day2','day3'])

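# Convert the strings to datetimes; format='mixed' (pandas >= 2.0) parses each value
# separately, so '20201226' is accepted alongside the '2020/12/01' layout
df['Date']=pd.to_datetime(df['Date'], format='mixed')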
print(df)
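The `Date` column mixes two layouts, `2020/12/01` and `20201226`. In pandas 2.x, `pd.to_datetime` infers one format from the first value and can raise a `ValueError` on the odd one out. A sketch of two ways to handle this, assuming pandas 2.0 or later: `format="mixed"` parses each string separately, while a fixed `format=` with `errors="coerce"` turns non-matching strings into `NaT` so they can be cleaned like any other missing value.

```python
import pandas as pd

dates = pd.Series(["2020/12/01", "2020/12/02", "20201226"])

# format="mixed" (pandas >= 2.0) parses each string on its own
mixed = pd.to_datetime(dates, format="mixed")

# Alternative: enforce one layout and turn non-matching strings into NaT
strict = pd.to_datetime(dates, format="%Y/%m/%d", errors="coerce")

print(mixed.dt.strftime("%Y-%m-%d").tolist())  # ['2020-12-01', '2020-12-02', '2020-12-26']
print(strict.isna().tolist())                  # [False, False, True]
```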

Please credit when reposting: this article was reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/849830.html