The Series object in pandas

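The source data is a set of movie ratings. Original file: https://pan.baidu.com/s/1o1DQqAYr9QHqBZ52z4LMZw?pwd=pyth
Extraction code: pyth
Pulling a single column out of the data gives you a Series.
A Series is somewhat like a dictionary, because it holds key-value pairs (here called the index and the values).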

import pandas as pd
fandango = pd.read_csv('fandango_score_comparison.csv')
series_film = fandango['FILM']
print(type(fandango))  # type of the whole table: DataFrame
print(type(series_film))  # a single column taken from it is a Series
print(series_film[0:5])
series_rt = fandango['RottenTomatoes']
print(series_rt[0:5])
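
The dictionary comparison can be made concrete without the CSV file. The following is a minimal sketch that assumes a tiny made-up table; the film names and scores are hypothetical stand-ins, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the ratings table (made-up rows, not the real CSV)
df = pd.DataFrame({
    "FILM": ["Film A (2015)", "Film B (2015)", "Film C (2014)"],
    "RottenTomatoes": [74, 85, 80],
})

column = df["RottenTomatoes"]      # selecting one column yields a Series
print(type(df).__name__)           # DataFrame
print(type(column).__name__)       # Series

# A Series pairs each index label with a value, much like dict keys and values
print(list(column.index))          # default integer labels
print(column.tolist())             # the values
as_dict = column.to_dict()         # index label -> value
print(as_dict)
```

Unlike a plain dict, a Series keeps its entries in order and allows duplicate index labels.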

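A feature of Series: when you build a Series you can use arbitrary values (for example, strings) as the index labels, and positional integer slicing still works.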

# import the Series object from pandas
from pandas import Series
series_film = fandango['FILM']
film_names = series_film.values
print(type(film_names))  # numpy.ndarray: much of pandas is built on top of NumPy
rt_scores = series_rt.values  # take one column of values to build a Series from
series_custom = Series(rt_scores, index=film_names)  # use film_names as the index
# Row indexes are usually integers; because film_names was used as the index
# when this Series was built, rows can be selected by film name
series_custom[['Minions (2015)', 'Leviathan (2014)']]
# Result:
'''
Minions (2015)      54
Leviathan (2014)    99
dtype: int64
'''
# Label indexing works above; in the next example positional slicing still works as well

# integer (positional) slicing is also available
series_custom = Series(rt_scores, index=film_names)
series_custom[['Minions (2015)', 'Leviathan (2014)']]
fiveten = series_custom[5:10]  # slices by position as usual
print(fiveten)
print(fiveten)
'''
The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64
'''
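
When both labels and positions are in play, the `.loc` and `.iloc` accessors state which one is meant. A small sketch, assuming a made-up four-film Series (the names and scores are hypothetical):

```python
import pandas as pd

# Hypothetical scores indexed by made-up film names
scores = pd.Series([74, 85, 80, 18],
                   index=["Film A", "Film B", "Film C", "Film D"])

by_label = scores.loc[["Film B", "Film D"]]    # label-based selection
by_position = scores.iloc[1:3]                 # position-based slice, end excluded
label_slice = scores.loc["Film B":"Film C"]    # label slices include both ends

print(by_label)
print(by_position)
print(label_slice)
```

Plain `[]` guesses between the two modes from the argument type, so the explicit accessors are the safer choice, especially when the index itself holds integers.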

When the index is sorted, the value attached to each label moves with it; likewise, when the values are sorted, the labels move with them.

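original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)  # string sort; original_index itself is unchanged
# Reorder the Series to follow sorted_index; each value moves with its label
sorted_by_index = series_custom.reindex(sorted_index)
print(original_index[0:3])  # first three labels in the original order
print(series_custom[0:3])  # the original Series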
'''
['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)']
Avengers: Age of Ultron (2015)    74
Cinderella (2015)                 85
Ant-Man (2015)                    80
'''
print(sorted_by_index[0:3])  # the Series after sorting by index
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
print(series_custom[["'71 (2015)","5 Flights Up (2015)","A Little Chaos (2015)"]])  # look up the same three labels in the original Series to compare
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
dtype: int64
'''
# sort_index() orders by label and sort_values() orders by value; in both cases label and value stay paired
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
print(sc2[0:3])
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
print(sc3[0:3])  # ordered by score, ascending, so the three lowest-rated films are printed first
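
The difference between the three operations is easier to see on a tiny example. This sketch assumes a made-up three-film Series (hypothetical names and scores) chosen so that label order and score order differ:

```python
import pandas as pd

# Hypothetical data: label order and value order are deliberately different
s = pd.Series([74, 40, 85], index=["Film B", "Film C", "Film A"])

by_index = s.sort_index()                 # order by label
by_value = s.sort_values()                # order by value, ascending
reindexed = s.reindex(sorted(s.index))    # same result as sort_index() here

print(by_index)
print(by_value)
print(by_index.equals(reindexed))
```

In every case a label and its value travel together; only the row order changes, and the original `s` is left untouched because each call returns a new Series.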

The values in a Series are treated as an ndarray, the core data type in NumPy, so NumPy's ndarray operations apply to a Series as well.

# The values in a Series object are treated as an ndarray, the core data type in NumPy
import numpy as np
# Add each value to the value with the same label (element-wise addition)
print(np.add(series_custom, series_custom))
# Apply the sine function to each value
np.sin(series_custom)
# Return the highest value (a single value, not a Series)
np.max(series_custom)
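
A short sketch of what these NumPy calls return, assuming a made-up three-film Series (hypothetical names and scores). Element-wise functions give back a Series with the same index, while reductions give back a scalar:

```python
import numpy as np
import pandas as pd

# Hypothetical scores indexed by made-up film names
s = pd.Series([10, 20, 30], index=["Film A", "Film B", "Film C"])

doubled = np.add(s, s)    # element-wise; still a Series with the same index
roots = np.sqrt(s)        # element-wise; dtype becomes float
highest = np.max(s)       # reduction; a single scalar
raw = s.values            # the underlying numpy.ndarray

print(doubled)
print(roots)
print(highest)
print(type(raw).__name__)
```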

# Like an ndarray, a Series can be filtered with a sequence of True/False values
# series_custom > 50 actually returns a Series object with a boolean value for each film
criteria_one = series_custom > 50
criteria_two = series_custom < 75
print(criteria_one)
'''
Avengers: Age of Ultron (2015)                True
Cinderella (2015)                             True
Ant-Man (2015)                                True
Do You Believe? (2015)                       False
Hot Tub Time Machine 2 (2015)                False
                                             ...  
Mr. Holmes (2015)                             True
'71 (2015)                                    True
Two Days, One Night (2014)                    True
Gett: The Trial of Viviane Amsalem (2015)     True
Kumiko, The Treasure Hunter (2015)            True
Length: 146, dtype: bool
'''
series_greater_than_50 = series_custom[criteria_one]  # keep only the films scoring above 50
both_criteria = series_custom[criteria_one & criteria_two]  # keep scores above 50 and below 75
print(both_criteria)
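
The two-step pattern (build a boolean mask, then index with it) can be sketched on a small made-up Series; the film names and scores below are hypothetical:

```python
import pandas as pd

# Hypothetical scores indexed by made-up film names
s = pd.Series([74, 85, 18, 60],
              index=["Film A", "Film B", "Film C", "Film D"])

# Parentheses are required: & binds more tightly than the comparisons
mask = (s > 50) & (s < 75)
filtered = s[mask]        # keeps only rows where the mask is True

print(mask)
print(filtered)
```

The mask is itself a Series of booleans with the same index as `s`, which is why it can be printed, combined with `&` or `|`, and reused.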

Adding two Series objects that share the same index

# data alignment: both Series use the same index
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2  # the indexes match, so values with the same label are added together
print(rt_mean)
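
Alignment is by label, not by position, which matters once the two indexes are not identical. The sketch below assumes two made-up Series (hypothetical names and scores) whose labels are in a different order and only partly overlap:

```python
import pandas as pd

# Hypothetical data: different row order, and one label unique to each side
critics = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])
users = pd.Series([90, 86, 70], index=["Film C", "Film A", "Film D"])

mean = (critics + users) / 2   # values are matched by label before adding

print(mean)
```

Labels present on both sides are averaged, whatever their position; labels present on only one side ("Film B" and "Film D" here) come out as NaN.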
