Pandas: Getting to Know Your Data

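Contents:

1. Look up the first few rows, the number of rows and columns, and the index

2. Group by the values of a column, count them, and show the most frequent value

3. Sum a quantity column

4. Convert the data type of a column

5. Count the distinct values in a column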

import pandas as pd
import numpy as np

# Look at the first n rows and the last n rows
df.head(10)  # first 10 rows
df.tail(10)  # last 10 rows


# How many rows does the data have?
df.shape[0]
df.info()  # also prints the number of entries, the columns, and their dtypes

# How many columns does the data have?
df.shape[1]
df.columns  # an Index object holding the column labels

# Look up the index of the data
df.index
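
As a quick check of the calls above, here is a minimal sketch on a tiny hand-made DataFrame. The column names and values are made up for illustration; they stand in for whatever file `df` was actually loaded from.

```python
import pandas as pd

# Hypothetical toy data; the real df would normally come from a file.
df = pd.DataFrame({
    "item_name": ["Burrito", "Bowl", "Burrito", "Chips"],
    "quantity": [1, 2, 1, 3],
})

n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
first_two = df.head(2)           # first 2 rows
last_row = df.tail(1)            # last row only
column_names = list(df.columns)  # column labels as a plain list
row_index = df.index             # a RangeIndex when no index was set

print(n_rows, n_cols)
print(column_names)
print(row_index)
```

Unpacking `df.shape` once is often handier than indexing it twice with `[0]` and `[1]`.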

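# Group by the values of a column, count them, and show the most frequent value
new_df = df.groupby("item_name")
new_df = new_df.count()
new_df = new_df.sort_values("quantity", ascending=False)  # sort by any counted column (here quantity); ascending=False means descending order
new_df.head(1)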
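
The group, count, and sort steps can be tried on a small example. This is only a sketch: the data is invented, and `value_counts().idxmax()` is shown as an alternative shortcut, not as the method used in these notes.

```python
import pandas as pd

# Hypothetical toy data with one clearly most frequent item.
df = pd.DataFrame({
    "item_name": ["Burrito", "Bowl", "Burrito", "Chips", "Burrito"],
    "quantity": [1, 2, 1, 3, 1],
})

# count() reports the number of non-null entries per group and column.
counts = df.groupby("item_name").count()
counts = counts.sort_values("quantity", ascending=False)

most_common = counts.index[0]           # group label of the top row
top_count = counts["quantity"].iloc[0]  # how many rows that item has

# Shortcut: count the values of the column directly and take the label
# with the highest count.
same_item = df["item_name"].value_counts().idxmax()

print(most_common, top_count, same_item)
```

Note that `count()` counts rows per item. To find the item with the largest total quantity ordered, `sum()` would be used in place of `count()`.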

# Sum the quantity column
total_item = df.quantity.sum()
total_item

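# Convert the data type of a column
# Example: a column holds prices formatted as "$" plus a number, such as $7.25;
# convert it to float
dollarizer = lambda x: float(x.strip().lstrip("$"))  # drop surrounding whitespace and the leading "$"
df.item_price = df.item_price.apply(dollarizer)
# In pandas, dtype('O') (object) is the dtype typically used for string data, and dtype('float64') is a floating-point dtype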
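
A minimal sketch of the price conversion on a few made-up strings. It strips whitespace and the leading `$` instead of slicing by position, so values with or without a trailing space both convert. The sample prices are assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices; the second one carries a trailing space.
prices = pd.Series(["$7.25", "$2.39 ", "$11.00"])

# Element-wise conversion with apply().
to_float = lambda x: float(x.strip().lstrip("$"))
converted = prices.apply(to_float)

# Vectorized alternative using the .str accessor.
converted_vec = prices.str.strip().str.lstrip("$").astype(float)

print(converted.dtype)
print(converted.tolist())
```

The `.str` version avoids a Python-level function call per element, which tends to matter on large columns.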

# Count the distinct values in a column
number = df.columnname.value_counts().count()

2. The users data

# Look at the first 25 rows and the last 10 rows
users.head(25)
users.tail(10)

# Number of rows and number of columns
users.shape[0]
users.shape[1]

# Look up the index of the data
users.index

# Data type of each column
users.dtypes

# Print a single column: occupation
users.occupation
users['occupation']

# Count how many distinct values a column has
# How many different occupations are in the dataset?
users.occupation.nunique()
# or
users.occupation.value_counts().count()
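
Both ways of counting distinct values skip missing entries by default. A small sketch with an invented `users` frame that contains one missing occupation:

```python
import pandas as pd

# Hypothetical data; None marks a missing occupation.
users = pd.DataFrame({
    "occupation": ["student", "engineer", "student", None, "artist"],
})

n_unique = users.occupation.nunique()                  # missing values ignored
n_counts = users.occupation.value_counts().count()     # same result
n_with_missing = users.occupation.nunique(dropna=False)  # missing counted as a value

print(n_unique, n_counts, n_with_missing)
```

`nunique()` states the intent more directly; `value_counts()` is the better choice when the per-value counts are needed as well.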

# Find the most frequent value in a column
# Most frequent occupation
users.occupation.value_counts().head(1).index[0]
users.occupation.value_counts().head()

# Show the least frequent values in a column
users.age.value_counts().tail()
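
Because `value_counts()` sorts by count in descending order, the most frequent value sits at the top and the least frequent at the bottom. A minimal sketch on invented data without ties at either end:

```python
import pandas as pd

# Hypothetical users; counts are chosen so that there are no ties.
users = pd.DataFrame({
    "occupation": ["student", "student", "student", "engineer", "engineer", "artist"],
    "age": [25, 25, 25, 30, 30, 41],
})

occupation_counts = users.occupation.value_counts()
most_frequent = occupation_counts.index[0]  # label with the highest count
most_frequent_n = occupation_counts.iloc[0]

age_counts = users.age.value_counts()
least_frequent_age = age_counts.index[-1]   # label with the lowest count

print(most_frequent, most_frequent_n, least_frequent_age)
```

When several values share the same count, which of them appears first or last should not be relied on.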

# Descriptive statistics for the data; by default only numeric columns are summarized
users.describe()

# Descriptive statistics for a single column
users.occupation.describe()
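
For a text column, `describe()` reports `count`, `unique`, `top`, and `freq` instead of the mean and quantiles. A short sketch on invented data:

```python
import pandas as pd

# Hypothetical users with one numeric and one text column.
users = pd.DataFrame({
    "occupation": ["student", "student", "student", "engineer", "engineer", "artist"],
    "age": [25, 25, 25, 30, 30, 41],
})

numeric_summary = users.describe()            # numeric columns only by default
text_summary = users.occupation.describe()    # count / unique / top / freq
full_summary = users.describe(include="all")  # every column in one table

print(list(numeric_summary.columns))
print(text_summary["unique"], text_summary["top"], text_summary["freq"])
```

In `full_summary`, statistics that do not apply to a column's type show up as NaN.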

# Mean of a numeric column
users.age.mean()

3. The food data

# Size of the data
food.shape
food.shape[0]
food.shape[1]
food.info()

# Print all column labels
food.columns

# Data type of a single column
food.dtypes['-glucose_100g']
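
A column label such as `-glucose_100g` is not a valid Python identifier, so attribute access (`food.some_column`) is not an option and bracket indexing is needed. A minimal sketch on an invented two-row frame; only the column name is taken from the notes above.

```python
import pandas as pd

# Hypothetical stand-in for the food data.
food = pd.DataFrame({
    "product_name": ["a", "b"],
    "-glucose_100g": [1.5, None],
})

dtype_from_table = food.dtypes["-glucose_100g"]  # look up in the dtypes Series
dtype_from_column = food["-glucose_100g"].dtype  # ask the column itself

print(dtype_from_table, dtype_from_column)
```

Both lookups return the same dtype object; the missing value does not prevent the column from being float64.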

#查询数据的索引
food.index
