Extracting the first day of the month from a datetime column in pandas

The simplest and fastest approach is to convert the column to a NumPy array with `values` and then cast it to month resolution:

    df['month'] = df['purchase_date'].values.astype('datetime64[M]')
    print(df)
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
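For readers who want to run the cast end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the output shown above, and the sketch assumes the column is timezone-naive, since `values` on a timezone-aware column yields UTC-based values.

```python
import pandas as pd

# Sample data rebuilt from the printed output above.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "purchase_date": pd.to_datetime([
        "2015-01-23 14:05:21",
        "2015-02-05 05:07:30",
        "2015-02-18 17:08:51",
        "2015-03-21 17:07:30",
        "2015-03-11 18:32:56",
        "2015-03-03 11:02:30",
    ]),
})

# Casting the underlying NumPy array to month resolution truncates
# every timestamp to the start of its month.
df["month"] = df["purchase_date"].values.astype("datetime64[M]")
print(df)
```

The cast works on the raw array, so it skips most of the pandas overhead, which is a plausible reason it comes out fastest in the timings.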

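Another solution combines `floor` with `pd.offsets.MonthBegin(1)`. Note that subtracting `MonthBegin(1)` from a date that already falls on the 1st of a month moves it back to the 1st of the previous month, so both variants give the expected result only when no date falls on the 1st, as in the sample data: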

    df['month'] = df['purchase_date'].dt.floor('d') - pd.offsets.MonthBegin(1)
    print(df)
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01

    df['month'] = (df['purchase_date'] - pd.offsets.MonthBegin(1)).dt.floor('d')
    print(df)
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
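The offset arithmetic has an edge case for dates that already sit on the 1st of a month. The sketch below illustrates it on three made-up timestamps (not part of the original data) and shows one possible workaround, which is an assumption of this example rather than part of the answer above: roll forward to the month end with `pd.offsets.MonthEnd(0)` before subtracting `MonthBegin(1)`.

```python
import pandas as pd

# Hypothetical dates: two on the 1st of March, one later in March.
dates = pd.Series(pd.to_datetime([
    "2015-03-01 00:00:00",
    "2015-03-01 09:30:00",
    "2015-03-21 17:07:30",
]))

# A date already on the 1st is "on offset", so subtracting
# MonthBegin(1) steps back a whole month.
naive = dates.dt.floor("d") - pd.offsets.MonthBegin(1)

# MonthEnd(0) rolls every date forward to its month end first,
# so the subtraction always lands on the 1st of the same month.
safe = dates.dt.floor("d") + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(1)

print(pd.DataFrame({"date": dates, "naive": naive, "safe": safe}))
```

In this sketch the first two rows of `naive` fall in February, while `safe` stays in March for all three rows.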

The last solution creates a month period with `to_period`:

    df['month'] = df['purchase_date'].dt.to_period('M')
    print(df)
       user_id       purchase_date   month
    0        1 2015-01-23 14:05:21 2015-01
    1        2 2015-02-05 05:07:30 2015-02
    2        3 2015-02-18 17:08:51 2015-02
    3        4 2015-03-21 17:07:30 2015-03
    4        5 2015-03-11 18:32:56 2015-03
    5        6 2015-03-03 11:02:30 2015-03

... and then converts it back to datetimes with `to_timestamp`, although this is somewhat slower:

    df['month'] = df['purchase_date'].dt.to_period('M').dt.to_timestamp()
    print(df)
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
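To make the two steps of the period round trip easier to see, here is a small sketch on a standalone Series. The three timestamps are illustrative (the last one is made up to show a date on the 1st), and the variable names are assumptions of this example.

```python
import pandas as pd

purchase_date = pd.Series(pd.to_datetime([
    "2015-01-23 14:05:21",
    "2015-02-05 05:07:30",
    "2015-03-01 00:00:00",  # hypothetical date already on the 1st
]))

# Step 1: month periods, stored with a period dtype rather than datetimes.
periods = purchase_date.dt.to_period("M")
print(periods.dtype)

# Step 2: back to datetimes; by default each period maps to its start.
month_start = periods.dt.to_timestamp()
print(month_start)
```

Because a period identifies the month itself, a date on the 1st stays in its own month with this approach.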

With so many solutions available, here is how they compare.

Timings:

    rng = pd.date_range('1980-04-03 15:41:12', periods=100000, freq='20H')
    df = pd.DataFrame({'purchase_date': rng})
    print(df.head())

    In [300]: %timeit df['month1'] = df['purchase_date'].values.astype('datetime64[M]')
    100 loops, best of 3: 9.2 ms per loop

    In [301]: %timeit df['month2'] = df['purchase_date'].dt.floor('d') - pd.offsets.MonthBegin(1)
    100 loops, best of 3: 15.9 ms per loop

    In [302]: %timeit df['month3'] = (df['purchase_date'] - pd.offsets.MonthBegin(1)).dt.floor('d')
    100 loops, best of 3: 12.8 ms per loop

    In [303]: %timeit df['month4'] = df['purchase_date'].dt.to_period('M').dt.to_timestamp()
    1 loop, best of 3: 399 ms per loop

    # MaxU solution
    In [304]: %timeit df['month5'] = df['purchase_date'].dt.normalize() - pd.offsets.MonthBegin(1)
    10 loops, best of 3: 24.9 ms per loop

    # MaxU solution 2
    In [305]: %timeit df['month'] = df['purchase_date'] - pd.offsets.MonthBegin(1, normalize=True)
    10 loops, best of 3: 28.9 ms per loop

    # Wen solution
    In [306]: %timeit df['month6'] = pd.to_datetime(df.purchase_date.astype(str).str[0:7] + '-01')
    1 loop, best of 3: 214 ms per loop
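The `%timeit` lines above need IPython. As a rough sketch of how a similar comparison could be run in a plain Python script, the example below uses the standard `timeit` module on a smaller frame. The `candidates` dictionary, the reduced size of 10,000 rows, and the use of `pd.Timedelta(hours=20)` as the frequency are choices made for this example, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Smaller than the original benchmark so the script finishes quickly.
rng = pd.date_range("1980-04-03 15:41:12", periods=10_000,
                    freq=pd.Timedelta(hours=20))
df = pd.DataFrame({"purchase_date": rng})

# Three of the approaches from the answer, wrapped as zero-argument callables.
candidates = {
    "numpy cast": lambda: df["purchase_date"].values.astype("datetime64[M]"),
    "floor - MonthBegin": lambda: (
        df["purchase_date"].dt.floor("d") - pd.offsets.MonthBegin(1)
    ),
    "period round trip": lambda: (
        df["purchase_date"].dt.to_period("M").dt.to_timestamp()
    ),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, each running the callable 10 times.
    best = min(timeit.repeat(func, number=10, repeat=3))
    timings[name] = best / 10 * 1000  # milliseconds per call
    print(f"{name}: {timings[name]:.2f} ms per call")
```

Taking the minimum over repeats follows the same "best of 3" convention that `%timeit` reports in the session above.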

