栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 软件开发 > 后端开发 > Python

pandas最快入门(二)-查询与新增

Python 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

pandas最快入门(二)-查询与新增

Pandas df.loc查询数据
import pandas as pd
读取数据
df=pd.read_excel('000001.SZ.xlsx',encoding='gbk')
df.head()
ts_codetrade_dateopenhighlowclosechangepct_chgvolamountpre_close
0000001.SZ2016010412.0012.0311.2311.33-0.66-5.50563497.87660376.153111.99
1000001.SZ2016010511.2711.5711.1511.400.070.62663269.95755531.353711.33
2000001.SZ2016010611.4211.5611.3911.530.131.14515706.44591698.520411.40
3000001.SZ2016010711.4111.4110.9110.94-0.59-5.12174761.10194869.493511.53
4000001.SZ2016010811.2111.2910.9011.120.181.65747527.58831334.546210.94
#设置索引为编号,方便筛选
df.set_index('trade_date',inplace=True)
#替换掉ts_code后面的后缀
df.loc[:,'ts_code']=df['ts_code'].str.replace('.SZ','').astype('int32')
df.head()
ts_codeopenhighlowclosechangepct_chgvolamountpre_close
trade_date
20160104112.0012.0311.2311.33-0.66-5.50563497.87660376.153111.99
20160105111.2711.5711.1511.400.070.62663269.95755531.353711.33
20160106111.4211.5611.3911.530.131.14515706.44591698.520411.40
20160107111.4111.4110.9110.94-0.59-5.12174761.10194869.493511.53
20160108111.2111.2910.9011.120.181.65747527.58831334.546210.94
使用label值查询数据
#得到单个值
df.loc[20160108,'open']
11.21
#得到一个series
df.loc[[20160107,20160108],'open']
trade_date
20160107    11.41
20160108    11.21
Name: open, dtype: float64
#得到一个Dataframe
df.loc[[20160107,20160108],['open','low']]
openlow
trade_date
2016010711.4110.91
2016010811.2110.90
使用数值区间进行范围查询
df.loc[20160106:20160108,'open']
trade_date
20160106    11.42
20160107    11.41
20160108    11.21
Name: open, dtype: float64
df.loc[20160107,'open':'low']
open    11.41
high    11.41
low     10.91
Name: 20160107, dtype: float64
df.loc[20160106:20160108,'open':'low']
openhighlow
trade_date
2016010611.4211.5611.39
2016010711.4111.4110.91
2016010811.2111.2910.90
使用条件表达式查询
#bool条件语句
df['open']<11.5
trade_date
20160104    False
20160105     True
20160106     True
20160107     True
20160108     True
            ...  
20201225    False
20201228    False
20201229    False
20201230    False
20201231    False
Name: open, Length: 1218, dtype: bool
#利用条件语句进行查询
#&表示与,每个条件表达式需要加括号
df.loc[(df['open']<11.3) & (df['high']>11.5),:]
ts_codeopenhighlowclosechangepct_chgvolamountpre_close
trade_date
20160105111.2711.5711.1511.400.070.62663269.957.555314e+0511.33
20170828111.1011.5411.1011.350.242.161603938.321.828380e+0611.11
20170904111.1811.7211.1711.720.514.551352325.571.551875e+0611.21
20170921111.2611.5111.2011.460.171.51692407.937.886053e+0511.29
20171027111.1911.5611.1811.560.383.401360086.421.548544e+0611.18
20171107111.2712.0911.2511.920.645.672477163.262.914269e+0611.28
自定义函数或lambda函数查询
#lambda表达式-传入df,返回冒号后面的内容
df.loc[lambda df: (df['open']<11.3) & (df['high']>11.5),:]
ts_codeopenhighlowclosechangepct_chgvolamountpre_close
trade_date
20160105111.2711.5711.1511.400.070.62663269.957.555314e+0511.33
20170828111.1011.5411.1011.350.242.161603938.321.828380e+0611.11
20170904111.1811.7211.1711.720.514.551352325.571.551875e+0611.21
20170921111.2611.5111.2011.460.171.51692407.937.886053e+0511.29
20171027111.1911.5611.1811.560.383.401360086.421.548544e+0611.18
20171107111.2712.0911.2511.920.645.672477163.262.914269e+0611.28
#自定义查询函数,返回条件语句
def query_my_data(df):
    return (df.index>20160105) & (df['close']==11.54)

df.loc[query_my_data,:]
ts_codeopenhighlowclosechangepct_chgvolamountpre_close
trade_date
20170912111.3811.5411.2711.540.161.4100846183.42962032.17211.38
20171031111.5511.5811.3911.54-0.02-0.1700627491.38721237.47411.56
20171102111.3611.5811.2611.540.141.2300604308.48689229.87311.40
20190222111.3511.5711.1511.540.181.58451184047.791343020.33611.36
Pnada新增数据列 直接赋值的方法
#计算当日股票价格变动
df.loc[:,"today_change"]=df['close']-df['open']
df.head()
ts_codeopenhighlowclosechangepct_chgvolamountpre_closetoday_change
trade_date
20160104112.0012.0311.2311.33-0.66-5.50563497.87660376.153111.99-0.67
20160105111.2711.5711.1511.400.070.62663269.95755531.353711.330.13
20160106111.4211.5611.3911.530.131.14515706.44591698.520411.400.11
20160107111.4111.4110.9110.94-0.59-5.12174761.10194869.493511.53-0.47
20160108111.2111.2910.9011.120.181.65747527.58831334.546210.94-0.09
df.apply方法

对series进行操作,需指定axis

def get_change(x):
    if x['today_change']>0:
        return "up"
    else:
        return "down"

#需要设置axis=1,表示series的index是columns
df.loc[:,"当日变动"]=df.apply(get_change,axis=1)
#统计
df['当日变动'].value_counts()
down    621
up      597
Name: 当日变动, dtype: int64
df.head()
ts_codeopenhighlowclosechangepct_chgvolamountpre_closetoday_change当日变动
trade_date
20160104112.0012.0311.2311.33-0.66-5.50563497.87660376.153111.99-0.67down
20160105111.2711.5711.1511.400.070.62663269.95755531.353711.330.13up
20160106111.4211.5611.3911.530.131.14515706.44591698.520411.400.11up
20160107111.4111.4110.9110.94-0.59-5.12174761.10194869.493511.53-0.47down
20160108111.2111.2910.9011.120.181.65747527.58831334.546210.94-0.09down
条件选择赋值
df['pct_observe']=''
df.loc[df['pct_chg']>0,'pct_observe']='+'
df.loc[df['pct_chg']<=0,'pct_observe']='-'
df.head()
ts_codeopenhighlowclosechangepct_chgvolamountpre_closetoday_change当日变动pct_observe
trade_date
20160104112.0012.0311.2311.33-0.66-5.50563497.87660376.153111.99-0.67down-
20160105111.2711.5711.1511.400.070.62663269.95755531.353711.330.13up+
20160106111.4211.5611.3911.530.131.14515706.44591698.520411.400.11up+
20160107111.4111.4110.9110.94-0.59-5.12174761.10194869.493511.53-0.47down-
20160108111.2111.2910.9011.120.181.65747527.58831334.546210.94-0.09down+

源代码和相应数据下载请点击 https://github.com/MiaoChenglin125/Data-Acquisition/tree/master/pandas%E6%93%8D%E4%BD%9C

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/283341.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号