Pandas_Python

Pandas

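Introduction to Pandas

Introduction to Pandas data structures: the Series type

A Series is a one-dimensional array-like object that holds a sequence of values together with an associated array of data labels (the index).

If no index is specified, the default index is the integers from 0 to N-1.

The values and index attributes return the values and the index, respectively.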

import pandas as pd

obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

print(obj.values)
print(obj.index)
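
As a minimal sketch of what those two attributes hold (the variable names `s`, `values` and `index` are illustrative and not part of the original notes): `values` is a NumPy array, and the default index is a `RangeIndex` covering 0 to N-1.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 7, -5, 3])

# .values exposes the underlying data as a NumPy array
values = s.values
print(type(values).__name__, values.tolist())

# With no index given, pandas builds a default RangeIndex covering 0..N-1
index = s.index
print(list(index))
```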

Often you will want to specify the index labels explicitly.

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

obj2.index

Index(['d', 'b', 'a', 'c'], dtype='object')

Some properties of Series

print(obj2['b'])

# A value can also be changed through its index label
obj2['d'] = 6
# A list of labels selects several values at once (fancy indexing)
obj2[['c', 'a', 'd']]

c    3
a   -5
d    6
dtype: int64
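
The bracket syntax above selects by label. As a hedged sketch, the same selections can be written with the explicit `.loc` (label-based) and `.iloc` (position-based) accessors; the names `s`, `by_label`, `by_position` and `subset` are illustrative.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# .loc selects by label, .iloc selects by integer position
by_label = s.loc['a']
by_position = s.iloc[2]

# A list of labels returns a new Series in the requested order
subset = s.loc[['c', 'a', 'd']]
print(by_label, by_position)
print(subset)
```

Being explicit about label versus position can help avoid ambiguity when the index itself consists of integers.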

# Writing obj2[obj2 > 0] directly also works and gives the same result
obj2[obj2.values > 0]

d    6
b    7
c    3
dtype: int64
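
A small check of the claim that both mask forms give the same result, assuming the same data as above (the names `s`, `from_series` and `from_array` are illustrative):

```python
import pandas as pd

s = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Mask built from the Series itself: a boolean Series sharing the same index
from_series = s[s > 0]
# Mask built from the raw NumPy array: a plain boolean array, applied by position
from_array = s[s.values > 0]

print(from_series.equals(from_array))
```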

# NumPy math functions can be applied to a Series as well
import numpy as np

np.exp(obj2)

d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
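
The point of the example above is that the index labels survive the operation. A minimal sketch with two other element-wise operations (the names `s`, `doubled` and `absolute` are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Scalar arithmetic and NumPy ufuncs act element-wise and return a Series
doubled = s * 2
absolute = np.abs(s)

# The index labels are carried over unchanged
print(doubled)
print(absolute)
```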

A Series can be thought of as a fixed-length, ordered dict that maps index labels to values.

'b' in obj2

True

'r' in obj2

False
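
One consequence of the dict analogy is that `in` tests the index labels, not the values. The following sketch illustrates this, along with the dict-like `get` method; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# `in` looks at the index labels, the same way it looks at dict keys
label_present = 'b' in s
# 7 is a value, not a label, so this membership test is False
value_as_label = 7 in s
# To test membership among the values, check the underlying array
value_present = 7 in s.values

# .get mirrors dict.get and returns a default for a missing label
missing = s.get('r', 0)
print(label_present, value_as_label, value_present, missing)
```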

Creating a Series from a dict

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

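When an index is passed along with the dict, the result follows the order of that index. A label with no matching key in the dict gets NaN (Not a Number), which pandas uses to mark missing (NA) values; dict keys that are not in the index are left out.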

states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
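
Note that the dtype changed from int64 to float64 in the output above. As a hedged sketch of why: NaN is a floating-point value, so integer data is upcast when a missing entry appears. The names `from_dict` and `reindexed` are illustrative.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

from_dict = pd.Series(sdata)
reindexed = pd.Series(sdata, index=states)

# 'Utah' is not in states, so it is left out; 'California' has no value, so it is NaN
print('Utah' in reindexed, pd.isna(reindexed['California']))
# NaN is a float, so the integer data is upcast to a float dtype
print(from_dict.dtype, reindexed.dtype)
```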

Use isnull and notnull to detect missing values

pd.isnull(obj4)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

pd.notnull(obj4)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
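
The same checks are also available as methods on the Series. A minimal sketch, assuming data shaped like `obj4` above, that counts the missing entries and shows two common follow-ups (`dropna` and `fillna`); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 35000.0, 16000.0, 71000.0],
              index=['California', 'Ohio', 'Oregon', 'Texas'])

# isna() is the method form; isnull() is an alias for it
mask = s.isna()
n_missing = int(mask.sum())   # True counts as 1

# Two common follow-ups: drop missing entries or replace them
dropped = s.dropna()
filled = s.fillna(0)
print(n_missing, list(dropped.index), filled['California'])
```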

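Automatic alignment is very handy: in arithmetic operations, values are matched by index label, and a label that is missing from either operand produces NaN.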

obj3 + obj4

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
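
If NaN for unmatched labels is not what you want, the `add` method accepts a `fill_value`. A hedged sketch follows; the data in `a` and `b` is made up for illustration and differs from `obj3` and `obj4`.

```python
import pandas as pd

a = pd.Series({'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000})
b = pd.Series({'California': 1000, 'Ohio': 35000, 'Oregon': 16000, 'Texas': 71000})

# Plain + aligns on labels; labels present on only one side give NaN
total = a + b

# add(..., fill_value=0) treats a label missing from one side as 0 instead
filled_total = a.add(b, fill_value=0)
print(total)
print(filled_total)
```

Note that `fill_value` only helps when a label is missing from one side; if the value is missing on both sides, the result is still NaN.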

Both the Series object itself and its index have a name attribute

obj3.index.name = 'state'
obj3.name = 'population'
obj3

state
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
Name: population, dtype: int64
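
One place these names show up is when a Series is converted to a table. As a minimal sketch (the two-entry data and the name `df` are illustrative), `reset_index()` moves the index into a column and uses both names as column headers:

```python
import pandas as pd

s = pd.Series({'Ohio': 35000, 'Texas': 71000}, name='population')
s.index.name = 'state'

# reset_index() turns the index into a column; both names become column headers
df = s.reset_index()
print(list(df.columns))
```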

The index can be changed in place by assignment; the new labels are matched to the values by position.

obj

0    4
1    7
2   -5
3    3
dtype: int64

obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj

Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
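
Because the labels are matched by position, the new index must have the same length as the Series. A hedged sketch of what happens otherwise (the flag `mismatch_rejected` is an illustrative name):

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])

# Labels are matched to values purely by position
s.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
print(s['Jeff'])

# A list of the wrong length is rejected with a ValueError
try:
    s.index = ['Bob', 'Steve']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```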
