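The source data is a set of movie ratings. Original file:
Link: https://pan.baidu.com/s/1o1DQqAYr9QHqBZ52z4LMZw?pwd=pyth
Extraction code: pyth
Pulling a single column out of the data gives you a Series. A Series object is a bit like a dictionary, because it holds key-value pairs (here called index and value).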
import pandas as pd
fandango = pd.read_csv('fandango_score_comparison.csv')
series_film = fandango['FILM']
print(type(fandango))  # type of the whole table: DataFrame
print(type(series_film))  # a single column taken out of it is a Series
print(series_film[0:5])
series_rt = fandango['RottenTomatoes']
print(series_rt[0:5])
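To make the dictionary comparison concrete without needing the CSV, here is a minimal sketch on a tiny hand-made table. The film names and scores are made up for illustration and are not rows from the real file.

```python
import pandas as pd

# Hypothetical mini-table standing in for the CSV (names and scores are made up).
df = pd.DataFrame({
    "FILM": ["Film A", "Film B", "Film C"],
    "RottenTomatoes": [74, 85, 80],
})

column = df["RottenTomatoes"]      # selecting one column yields a Series
print(type(df).__name__)           # DataFrame
print(type(column).__name__)       # Series

# Like a dict, a Series pairs each index entry (the key) with a value.
print(column.index.tolist())       # [0, 1, 2]
print(column.tolist())             # [74, 85, 80]
```

With no index given, pandas falls back to the default integer index 0, 1, 2, which is what the first column of the printed Series shows.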
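A feature of Series: when you build a Series object you can use values such as the film names as the index (a label index), and integer positions can still be used as well.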
# import the Series object from pandas
from pandas import Series
series_film = fandango['FILM']
film_names = series_film.values
print(type(film_names))  # numpy.ndarray: parts of pandas are built on top of NumPy
rt_scores = series_rt.values  # take one column's values to build the Series from
series_custom = Series(rt_scores, index=film_names)  # use film_names as the index
# Row indexes are usually integers; because film_names was used as the index
# when building the Series, we can index by label here (integers still work too)
print(series_custom[['Minions (2015)', 'Leviathan (2014)']])
# Result:
'''
Minions (2015)      54
Leviathan (2014)    99
dtype: int64
'''
# Label indexing works above; integer positions are still available, as shown below
fiveten = series_custom[5:10]  # slices as usual
print(fiveten)
'''
The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64
'''
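The same two access styles can be tried on a small Series. This sketch uses made-up names and scores, and it uses the explicit `.loc` (label) and `.iloc` (position) accessors rather than the plain `[]` used in the notes above, since they make it obvious which kind of lookup is meant.

```python
import pandas as pd

# Made-up scores; the film names are placeholders, not rows from the CSV.
scores = pd.Series([74, 85, 80, 18],
                   index=["Film A", "Film B", "Film C", "Film D"])

by_label = scores.loc[["Film B", "Film D"]]   # select by index label
by_position = scores.iloc[1:3]                # select by integer position, end exclusive

print(by_label.tolist())               # [85, 18]
print(by_position.tolist())            # [85, 80]
print(type(scores.values).__name__)    # ndarray
```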
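When a Series is reordered by its index, each value moves together with its index label. Likewise, when it is sorted by value, the index labels move together with the values.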
original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)  # sort the strings; original_index itself is unchanged
sorted_by_index = series_custom.reindex(sorted_index)  # reorder to follow sorted_index; each value moves with its label
print(original_index[0:3])
print(series_custom[0:3])  # the original Series
'''
['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)']
Avengers: Age of Ultron (2015)    74
Cinderella (2015)                 85
Ant-Man (2015)                    80
'''
print(sorted_by_index[0:3])  # the reordered Series
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
# look up the same three labels in the original Series to compare the values
print(series_custom[["'71 (2015)", "5 Flights Up (2015)", "A Little Chaos (2015)"]])
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
dtype: int64
'''
# sort by index or by value; labels and values stay paired either way
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
print(sc2[0:3])
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
print(sc3[0:3])  # ascending by score, so the three lowest-rated films come first
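A small sketch of the three reordering calls, again on made-up data, makes it easier to check by eye that labels and values stay paired. None of these calls modifies the original Series; each returns a new one.

```python
import pandas as pd

# Made-up scores, deliberately not in alphabetical or numeric order.
scores = pd.Series([80, 97, 52], index=["Film C", "Film A", "Film B"])

by_index = scores.sort_index()                       # order by label
by_value = scores.sort_values()                      # order by score, ascending
reordered = scores.reindex(["Film B", "Film C", "Film A"])  # order given explicitly

print(by_index.index.tolist(), by_index.tolist())
# ['Film A', 'Film B', 'Film C'] [97, 52, 80]
print(by_value.index.tolist(), by_value.tolist())
# ['Film B', 'Film C', 'Film A'] [52, 80, 97]
print(reordered.tolist())   # [52, 80, 97]
print(scores.tolist())      # [80, 97, 52]  (the original is unchanged)
```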
The values in a Series object are treated as an ndarray, the core data type in NumPy, so NumPy's ndarray operations work on a Series as well.
import numpy as np
# Add each value to itself, element-wise
print(np.add(series_custom, series_custom))
# Apply sine function to each value
np.sin(series_custom)
# Return the highest value (will return a single value not a Series)
np.max(series_custom)
# As with an ndarray, a sequence of True/False values can be used as an index to filter the data
# series_custom > 50 actually returns a Series object with a boolean value for each film
series_greater_than_50 = series_custom[series_custom > 50]
criteria_one = series_custom > 50
criteria_two = series_custom < 75
both_criteria = series_custom[criteria_one & criteria_two]  # keep scores above 50 and below 75
print(both_criteria)
print(criteria_one)  # the boolean mask itself; its output is shown below
'''
Avengers: Age of Ultron (2015) True
Cinderella (2015) True
Ant-Man (2015) True
Do You Believe? (2015) False
Hot Tub Time Machine 2 (2015) False
...
Mr. Holmes (2015) True
'71 (2015) True
Two Days, One Night (2014) True
Gett: The Trial of Viviane Amsalem (2015) True
Kumiko, The Treasure Hunter (2015) True
Length: 146, dtype: bool
'''
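The NumPy calls and the boolean filtering above can be checked on a four-row Series. This is a sketch on made-up scores, not the real data.

```python
import numpy as np
import pandas as pd

# Made-up scores; the film names are placeholders.
scores = pd.Series([74, 85, 18, 60],
                   index=["Film A", "Film B", "Film C", "Film D"])

doubled = np.add(scores, scores)   # element-wise; the result is still a Series
highest = np.max(scores)           # reduces to a single value, not a Series

mask = (scores > 50) & (scores < 75)   # boolean Series, one entry per film
selected = scores[mask]                # keeps only the rows where mask is True

print(doubled.tolist())          # [148, 170, 36, 120]
print(int(highest))              # 85
print(mask.tolist())             # [True, False, False, True]
print(selected.index.tolist())   # ['Film A', 'Film D']
```

The parentheses around each comparison matter, because `&` binds more tightly than `>` and `<` in Python.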
Adding two Series objects that share the same index
# data alignment: same index
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2  # the indexes match, so values with the same label are added together
print(rt_mean)
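The addition matches values by label, not by position. A minimal sketch on made-up scores shows this by giving the second Series the same labels in a different order, and then, as a side note that goes slightly beyond the case above, what happens when a label is missing from one side.

```python
import pandas as pd

# Made-up scores; both Series carry the same labels, in a different order.
critics = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])
users = pd.Series([90, 86, 80], index=["Film C", "Film A", "Film B"])

mean = (critics + users) / 2     # values are paired up by label before adding
print(mean["Film A"])            # 80.0  -> (74 + 86) / 2
print(mean["Film B"])            # 82.5  -> (85 + 80) / 2
print(mean["Film C"])            # 85.0  -> (80 + 90) / 2

# A label present on only one side has nothing to pair with, so it becomes NaN.
partial = pd.Series([50], index=["Film A"])
total = critics + partial
print(int(total.isna().sum()))   # 2
```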



