Merging pandas DataFrames (the two pandas data structures)

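pandas has two main data structures: Series (a one-dimensional labeled array) and DataFrame (a two-dimensional labeled table). A DataFrame consists of its data plus two sets of labels: a row index (index) and a column index (columns), which matches the constructor form pd.DataFrame(data, index=[...], columns=[...]).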
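To make the two structures concrete, here is a minimal sketch. The variable names and values (s, df, 'r1', 'c1', and so on) are made up for illustration and do not come from the original article.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# A DataFrame: two-dimensional data with a row index and column labels.
df = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['c1', 'c2'])

print(list(s.index))     # ['x', 'y', 'z']
print(list(df.index))    # ['r1', 'r2']
print(list(df.columns))  # ['c1', 'c2']

# Each column of a DataFrame is itself a Series that shares the row index.
col = df['c1']
print(type(col).__name__)  # Series
print(list(col.index))     # ['r1', 'r2']
```

Seen this way, a DataFrame behaves like a collection of Series that share one row index.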

1 Creating a DataFrame in pandas

1.1 pd.DataFrame(ndarray data, index=['row label 1', 'row label 2'], columns=['column label 1', 'column label 2'])

import numpy as np
import pandas as pd

# Build a 3x6 DataFrame from an ndarray, with explicit row and column labels
a = pd.DataFrame(np.arange(18).reshape(3, 6), index=['a', 'b', 'c'], columns=['A', 'B', 'C', 'D', 'E', 'F'])
print(a)
#     A   B   C   D   E   F
# a   0   1   2   3   4   5
# b   6   7   8   9  10  11
# c  12  13  14  15  16  17

1.2 pd.DataFrame(dict data)

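import pandas as pd

a = pd.DataFrame([{'a': 0, 'b': 3, 'c': 6}, {'a': 1, 'b': 4, 'c': 7}, {'a': 2, 'b': 8, 'c': 5}])  # a list of dicts, one dict per row
b = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 8], 'c': [6, 7, 5]})  # a dict of lists, one list per column
c = pd.DataFrame(dict(a=[0, 1, 2], b=[3, 4, 8], c=[6, 7, 5]))  # the same dict, built with dict()
print(a)  # b and c print the same table
#    a  b  c
# 0  0  3  6
# 1  1  4  7
# 2  2  8  5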

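2 DataFrame attributes

Like an ndarray, a DataFrame has attributes such as shape and dtypes.

2.1 df.shape: the shape of the DataFrame, as a (rows, columns) tuple

2.2 df.dtypes: the data type of each column (a DataFrame has dtypes; a single column, being a Series, has dtype)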
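A small sketch of these two attributes follows. The column names n, x and s are illustrative assumptions. The exact dtype reported for the string column depends on the pandas version, so the code does not state it.

```python
import pandas as pd

df = pd.DataFrame({'n': [1, 2, 3], 'x': [0.5, 1.5, 2.5], 's': ['a', 'b', 'c']})

# shape is a (number of rows, number of columns) tuple.
print(df.shape)  # (3, 3)

# dtypes is a Series mapping each column label to that column's dtype.
print(df.dtypes)

# A single column is a Series, which exposes the singular dtype attribute.
print(df['x'].dtype)  # float64

# Type checks that do not depend on the exact integer width.
print(pd.api.types.is_integer_dtype(df['n']))  # True
print(pd.api.types.is_float_dtype(df['x']))    # True
```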

2.3 df[df.index == some row label]: select the row with that index label

import pandas as pd

A = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [2, 4, 5]})
print(A[A.index == 0])  # the row whose index label is 0
#    a  b  c
# 0  1  2  2
print(A[A.a == 1])  # the rows where column a equals 1
#    a  b  c
# 0  1  2  2

2.4 df.columns: the column labels

2.5 df.head(): show only the first few rows (the first five by default)

2.6 df.tail(): show only the last few rows (the last five by default)
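The following sketch exercises columns, head() and tail() on a ten-row frame. The frame and its column names (even, odd) are invented for this example.

```python
import numpy as np
import pandas as pd

# Ten rows, two columns: 0..19 laid out row by row.
df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['even', 'odd'])

print(list(df.columns))  # ['even', 'odd']

# With no argument, head() and tail() return five rows.
print(len(df.head()))  # 5
print(len(df.tail()))  # 5

# Passing n changes how many rows are returned.
first = df.head(2)
last = df.tail(2)
print(list(first.index))  # [0, 1]
print(list(last.index))   # [8, 9]
```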

3 Indexing a DataFrame

3.1 Selecting a column: df['column label']

import pandas as pd

A = pd.DataFrame(dict(a=[1, 2, 3], b=[2, 3, 4], c=[23, 5, 2]))
print(A['a'])  # a single column comes back as a Series
# 0    1
# 1    2
# 2    3
# Name: a, dtype: int64

3.2 Selecting a single value: df['column label']['row label']

import numpy as np
import pandas as pd

B = pd.DataFrame(np.arange(12).reshape(3, 4), index=['a', 'b', 'c'], columns=['AA', 'BB', 'CC', 'DD'])
print(B['AA']['a'])  # column 'AA', then row label 'a'; B['AA'][0] relies on a deprecated positional fallback
# 0

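Note that you cannot select a row with df['row label']. In pandas, a slice (for example df[0:2]) or a boolean array inside the square brackets selects rows, whereas a string inside the square brackets is looked up as a column label.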
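The rule can be checked with a short sketch that reuses the shape of the frame from the example above. The variable names (col, rows, found, row) are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['AA', 'BB', 'CC', 'DD'])

# A string in square brackets is looked up among the column labels.
col = df['AA']
print(col.tolist())  # [0, 4, 8]

# A slice in square brackets selects rows instead.
rows = df[0:2]
print(list(rows.index))  # ['a', 'b']

# A row label in plain square brackets raises KeyError,
# because 'a' is not a column label.
try:
    df['a']
    found = True
except KeyError:
    found = False
print(found)  # False

# Use .loc to select a row by its label.
row = df.loc['a']
print(row.tolist())  # [0, 1, 2, 3]
```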

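3.3 Selecting rows: df.loc[] and df.iloc[]

import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(15).reshape(3, 5), index=['a', 'b', 'c'], columns=['aa', 'bb', 'cc', 'dd', 'ee'])
print(a)
#    aa  bb  cc  dd  ee
# a   0   1   2   3   4
# b   5   6   7   8   9
# c  10  11  12  13  14
print(a.loc['a':'b', 'aa':'dd'])  # label slices include both endpoints
#    aa  bb  cc  dd
# a   0   1   2   3
# b   5   6   7   8
print(a.iloc[0:1, 0:3])  # position slices exclude the end
#    aa  bb  cc
# a   0   1   2

3.3.1 df.loc[labels]: square brackets with index labels; slices are closed intervals that include both endpoints

3.3.2 df.iloc[positions]: square brackets with integer positions; slices include the start but exclude the end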

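The same two access styles exist when reading from a Series: by index label or by integer position. Both are needed because the index labels of a Series can be redefined, while positions never change and always start at 0.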
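A brief sketch of label versus position on a Series follows. The labels and values are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# Label-based and position-based access reach the same element.
print(s.loc['y'])  # 20
print(s.iloc[1])   # 20

# Redefine the index labels; the positions are unaffected.
s.index = ['p', 'q', 'r']
print(s.iloc[1])   # 20
print(s.loc['q'])  # 20

# The old label no longer exists in the index.
print('y' in s.index)  # False
```

Using .loc and .iloc explicitly keeps the intent unambiguous, especially when the index labels are themselves integers.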

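3.4 Other ways to select rows

df[df.index == some row label]: select the row with that index label

df[df.some_column == a value in that column]: select the one or more rows where that column equals the value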
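Both forms rely on a boolean mask: the comparison yields one True or False per row, and indexing with that mask keeps the True rows. The sketch below uses invented data in which the value 1 appears twice in column a, so the filter returns more than one row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1], 'b': [2, 3, 4], 'c': [2, 4, 5]})

# The comparison produces a boolean Series aligned with the rows.
mask = df.a == 1
print(mask.tolist())  # [True, False, True]

# Indexing with the mask keeps only the rows where it is True.
matches = df[mask]
print(list(matches.index))  # [0, 2]

# Conditions can be combined with & (and) and | (or);
# each condition needs its own parentheses.
both = df[(df['a'] == 1) & (df['b'] > 2)]
print(list(both.index))  # [2]
```

Writing df['a'] instead of df.a also works for column labels that are not valid Python identifiers or that clash with DataFrame method names.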
