
Notes on learning the pandas library



Contents

1. What problem does pandas solve
   Getting to know the DataFrame through an example
   About columns
2. Reading and writing tabular data
   Reading data
   Writing data
3. Selecting subsets of a table
4. Creating plots in pandas
   Single column, plotted with plt
   Multiple columns, object-oriented (O-O) style
5. Creating new columns and renaming columns
   Renaming columns
6. Calculating summary statistics
   Aggregating statistics
   Aggregating statistics grouped by category
   Counting records by category
7. Sorting
   Sorting table rows by the values of a column
8. Combining data from multiple tables
   Concatenating tables along rows or columns
   Joining two tables with merge


1. What problem does pandas solve

What kind of data does pandas handle?
When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas helps you explore, clean, and process your data.

In pandas, a data table is called a DataFrame.

Getting to know the DataFrame through an example
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

print(df)

print(df.describe())  # by default, only numeric columns (here: age) are summarized

 

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.
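To see the "different types per column" point concretely, the sketch below inspects `dtypes` on the same three-row table. It assumes pandas 1.x or 2.x, where text columns are reported as `object`; the `include="all"` call is an extra not used in the original notes.

```python
import pandas as pd

# Same three-row table as above
df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# One dtype per column: an integer type for "age", a text type for "name" and "sex"
print(df.dtypes)

# describe() keeps only numeric columns by default, so the result has a single column, "age"
summary = df.describe()
print(summary)

# include="all" also summarizes the text columns (count, unique, top, freq)
print(df.describe(include="all"))
```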

About columns

Each column in a DataFrame is a Series.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

print(df["age"])

Selecting a single column this way returns a plain Series.
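The difference between single and double brackets is easy to miss (section 3 uses `t_data[["age"]]`). This small sketch, on a trimmed copy of the same table, checks which type each form returns.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "sex": ["male", "female", "male"]})

# Single brackets: one column, returned as a Series
ages = df["age"]
# Double brackets (a list of names): a DataFrame, even with a single column
ages_frame = df[["age"]]

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame

# The Series keeps the table's row index and takes the column name as .name
print(ages.name, ages.index.tolist())  # age [0, 1, 2]
```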

2. Reading and writing tabular data

Reading data
import numpy as np
import pandas as pd


ti_data = pd.read_excel("titanic.xlsx")  # read an Excel file (.xlsx needs the openpyxl package)

print(ti_data)
print(ti_data.dtypes)   # data type of each column
print(ti_data.head(3))  # only the first three rows
print(ti_data.tail(2))  # only the last two rows
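Because titanic.xlsx is not included here, this sketch swaps in `pd.read_csv` on a small in-memory CSV so it runs anywhere. The fourth row, "Doe, Mr. John", is a made-up placeholder with a missing age. It shows the same inspection calls plus `info()`, and how one missing value turns an integer column into floats.

```python
import io
import pandas as pd

# Hypothetical stand-in for titanic.xlsx: CSV text held in memory.
# The last row ("Doe, Mr. John") is a made-up placeholder with no age.
csv_text = """name,age,sex
"Braund, Mr. Owen Harris",22,male
"Allen, Mr. William Henry",35,female
"Bonnell, Miss. Elizabeth",58,male
"Doe, Mr. John",,male
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.dtypes)   # "age" is float64: the empty field becomes NaN
print(data.head(3))  # first three rows
print(data.tail(2))  # last two rows
data.info()          # row count, non-null count per column and dtypes
```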

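Writing data
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Write the table to an Excel file (needs an Excel engine such as openpyxl).
# The row index is also written, as an extra first column, unless index=False is passed.
df.to_excel("df.xlsx")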
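Writing the index by default is why reading df.xlsx back in the later sections shows an extra "Unnamed: 0" column. The sketch below reproduces the effect with `to_csv` and an in-memory buffer, chosen only so it runs without an Excel engine; `to_excel` takes the same `index` parameter.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
                   "age": [22, 35]})

# Default: the row index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
first = pd.read_csv(with_index)
print(first.columns.tolist())  # ['Unnamed: 0', 'name', 'age']

# index=False writes only the data columns, so the round trip is clean
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
roundtrip = pd.read_csv(without_index)
print(roundtrip.columns.tolist())  # ['name', 'age']
```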

3. Selecting subsets of a table

The original table (the df.xlsx file written above):

import numpy as np
import pandas as pd

t_data = pd.read_excel('df.xlsx')
print(t_data)

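age = t_data[["age"]]  # select specific columns (double brackets return a DataFrame)
print(age)

age30 = t_data[t_data["age"] > 30]  # keep only the rows that satisfy a condition
print(age30)


# Selecting rows and columns together
print('by label (loc)')
sex_age = t_data.loc[t_data["age"] > 30, 'age']  # rows where age > 30, column "age" only
print(sex_age)


print('by position (iloc)')
row_col = t_data.iloc[1:2, 1:3]  # row at position 1, columns at positions 1 and 2 (slice ends are excluded)
print(row_col)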
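A few more selection patterns, sketched on the same three rows: combining conditions, `isin`, and picking several columns with `loc`. Conditions are combined with `&` / `|` (not `and` / `or`), and each comparison needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Two conditions combined with &; each comparison is parenthesized
between = df[(df["age"] > 25) & (df["age"] < 50)]

# isin() keeps rows whose value appears in the given list
picked = df[df["age"].isin([22, 58])]

# loc: a boolean mask for rows, a list of labels for columns
names_over_30 = df.loc[df["age"] > 30, ["name", "age"]]

# iloc: integer positions; the end of each slice is excluded
top_left = df.iloc[0:2, 0:2]

print(between, picked, names_over_30, top_left, sep="\n\n")
```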

4. Creating plots in pandas

Single column, plotted with plt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')

t_data["ID"] = [1, 2, 3]  # add a column "ID" with the values 1, 2, 3
print(t_data)

ax = t_data["age"].plot()  # Series.plot() returns a matplotlib Axes object
ax.set_title("age")
plt.show()

Multiple columns, object-oriented (O-O) style
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')

t_data["ID"] = [1, 2, 3]  # add a column "ID" with the values 1, 2, 3
print(t_data)

fig, axs = plt.subplots(figsize=(12, 4))

t_data.plot(ax=axs)  # plots every numeric column: age, ID and the "Unnamed: 0" index column saved by to_excel

axs.set_title("age and ID")

plt.show()

 

5. Creating new columns and renaming columns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')
print(t_data)
print('Modified table')

t_data["ID"] = [1, 2, 3]  # add a column "ID" with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] ** 3  # new column: the cube of each age
print(t_data)
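New columns can also combine several existing columns, and `assign()` adds a column to a copy instead of modifying the table in place. A sketch using only the numeric columns; the names `age_cubed`, `age_per_id` and `age_next_year` are made up for the example. Plain names without spaces or quotes (unlike "age's cubic") also allow attribute access such as `df.age_cubed`.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "ID": [1, 2, 3]})

# Column arithmetic is element-wise, row by row
df["age_cubed"] = df["age"] ** 3
df["age_per_id"] = df["age"] / df["ID"]

# assign() returns a new DataFrame; df itself is left unchanged
df2 = df.assign(age_next_year=df["age"] + 1)
print(df2)
```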

 

Renaming columns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')
print(t_data)
print('Modified table')

# Rename a column ("年龄" is Chinese for "age")
t_data = t_data.rename(
    columns={
        "age": "年龄"
    }
)

print(t_data)
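`rename` also accepts a function instead of a dict, and a complete list of new names can be assigned to `df.columns`. A minimal sketch; the new names are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Braund, Mr. Owen Harris"], "age": [22], "sex": ["male"]})

# Dict form: only the listed columns are renamed; rename() returns a new DataFrame
renamed = df.rename(columns={"age": "Age"})

# Function form: applied to every column name
upper = df.rename(columns=str.upper)

# Replace all names at once; the list length must equal the number of columns
df.columns = ["full_name", "age_years", "gender"]

print(renamed.columns.tolist())  # ['name', 'Age', 'sex']
print(upper.columns.tolist())    # ['NAME', 'AGE', 'SEX']
print(df.columns.tolist())       # ['full_name', 'age_years', 'gender']
```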

 

6. Calculating summary statistics

Aggregating statistics
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')
print(t_data)


print("mean of age:  ",t_data["age"].mean())
print(t_data.describe())
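To get several statistics in one call, `agg()` accepts a dict that maps each column to a list of function names. A sketch on the numeric columns only; cells for statistics not requested for a column come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "ID": [1, 2, 3]})

# Different statistics per column; the rows of the result are the statistic names
stats = df.agg({"age": ["min", "max", "mean", "median"], "ID": ["count"]})
print(stats)
```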

 

Aggregating statistics grouped by category
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')
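t_data["ID"] = [1, 2, 3]  # add a column "ID" with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] ** 3
print(t_data)

# Group by "name" and average the selected numeric columns.
# Averaging the whole table would also hit the text column "sex",
# which raises a TypeError in pandas 2.x, hence the column selection:
# group = t_data.groupby("name").mean()
group = t_data[["age", "ID", "name"]].groupby("name").mean()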
print(group)
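Because every name in this table is unique, grouping by "name" yields one group per row, so the means simply repeat the original values. Grouping by a column with repeated values, such as "sex", shows what `groupby` is for. In this sketch the "age" column is selected before aggregating, which also avoids averaging text columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Mean age for each distinct value of "sex"
mean_age_by_sex = df.groupby("sex")["age"].mean()

# Several statistics per group in one call
age_stats_by_sex = df.groupby("sex")["age"].agg(["mean", "max", "count"])

print(mean_age_by_sex)
print(age_stats_by_sex)
```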

 

Counting records by category
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



t_data = pd.read_excel('df.xlsx')
t_data["ID"] = [1, 2, 3]  # add a column "ID" with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] **3
print(t_data)

print('first way')
print(t_data["age"].value_counts())

print('second way')
print(t_data.groupby("age")["age"].count())  # the groupby-based counting from the previous section
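The two approaches give the same counts but order them differently: `value_counts()` sorts by count (largest first), while a groupby result is ordered by group key. Also, `count()` skips missing values whereas `size()` counts every row. A sketch on the "sex" column, plus a two-row table with one missing age to show the `count()`/`size()` difference:

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "male"], "age": [22, 35, 58]})

# value_counts(): one count per distinct value, largest count first
counts = df["sex"].value_counts()

# groupby().size(): rows per group, ordered by the group key
sizes = df.groupby("sex").size()

print(counts)
print(sizes)

# count() ignores NaN, size() does not (the None becomes NaN)
with_missing = pd.DataFrame({"sex": ["male", "male"], "age": [22, None]})
non_null = with_missing.groupby("sex")["age"].count()  # male -> 1
all_rows = with_missing.groupby("sex")["age"].size()   # male -> 2
print(non_null)
print(all_rows)
```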

 

7. Sorting

Sort table rows by the values of a column
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


t_data = pd.read_excel('df.xlsx')
print(t_data)

# Sort by age, ascending
print(t_data.sort_values(by="age"))
# Sort by age, descending
print(t_data.sort_values(by="age", ascending=False))
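`sort_values` also takes a list of columns, with a matching list for `ascending`, and `ignore_index=True` renumbers the rows after sorting. A sketch on the "sex" and "age" columns:

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "male"], "age": [22, 35, 58]})

# Sort by "sex" ascending, then by "age" descending within each sex
ordered = df.sort_values(by=["sex", "age"], ascending=[True, False])

# ignore_index=True renumbers the rows 0..n-1 instead of keeping the old labels
ordered_reset = df.sort_values(by="age", ascending=False, ignore_index=True)

print(ordered)
print(ordered_reset)
```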

8. Combining data from multiple tables

Concatenating tables along rows or columns

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


data1 = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"]
    }
)
print(data1)

data2 = pd.DataFrame(
    {
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"]
    }
)
print(data2)


# Concatenate: axis=1 places the tables side by side (adds columns);
# axis=0, the default, would stack them vertically (adds rows)
data = pd.concat([data1, data2], axis=1)
print(data)
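To contrast the two directions: with `axis=0` (the default) rows are stacked and columns are aligned by name; with `axis=1` tables are placed side by side and rows are aligned by index label. A sketch with tiny one-row tables:

```python
import pandas as pd

top = pd.DataFrame({"name": ["Braund, Mr. Owen Harris"], "age": [22]})
bottom = pd.DataFrame({"name": ["Allen, Mr. William Henry"], "age": [35]})
extra = pd.DataFrame({"sex": ["male"]})

# axis=0: stack rows; ignore_index=True renumbers them 0..n-1
rows = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1: side by side; both tables have row label 0, so they line up
cols = pd.concat([top, extra], axis=1)

print(rows)
print(cols)
```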

 

 

Joining two tables with merge

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


data1 = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris",
                 "Allen, Mr. William Henry",
                 "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58]
    }
)
print(data1)

data2 = pd.DataFrame(
    {
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"]
    }
)
print(data2)


# Merge on the shared column "age", keeping every row of data1 (how='left')
data = pd.merge(data1, data2, how='left', on='age')
print(data)
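In the merge above every age appears in both tables, so no rows are lost. This sketch uses two small made-up tables where one age has no match, to show how `how="left"` and `how="inner"` differ; `indicator=True` adds a `_merge` column recording where each row came from. Merging on "age" only works because the ages happen to be unique; a real identifier column is a safer key.

```python
import pandas as pd

# Illustrative tables: age 35 has no partner in `info`
people = pd.DataFrame({"name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry"],
                       "age": [22, 35]})
info = pd.DataFrame({"age": [22, 58], "sex": ["male", "male"]})

# how="left": keep every row of `people`; unmatched rows get NaN in "sex"
left = pd.merge(people, info, how="left", on="age", indicator=True)

# how="inner": keep only keys present in both tables
inner = pd.merge(people, info, how="inner", on="age")

print(left)
print(inner)
```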

 

Tables can also be merged on several columns at once by passing a list of column names to on.
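A minimal sketch of a multi-column merge, with hypothetical tables: two passengers share the name "Braund", so "name" alone is not a unique key, and the "ticket" column is invented for the example. Passing `on=["name", "age"]` makes a row match only when both values agree.

```python
import pandas as pd

# Hypothetical data: the name "Braund" appears twice with different ages
passengers = pd.DataFrame({"name": ["Braund", "Braund", "Allen"], "age": [22, 40, 35]})
tickets = pd.DataFrame({"name": ["Braund", "Allen"], "age": [22, 35], "ticket": ["A", "B"]})

# Both key columns must match; the 40-year-old Braund has no ticket row
merged = pd.merge(passengers, tickets, how="left", on=["name", "age"])
print(merged)
```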

Reposted from www.mshxw.com; please credit the source when reposting.
Article URL: https://www.mshxw.com/it/740878.html