Pandas groupby month and year

You can use either resample or Grouper (which resamples under the hood).

First, make sure that the datetime column actually holds datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

    In [11]: df1
    Out[11]:
                abc  xyz
    Date
    2013-06-01  100  200
    2013-06-03  -20   50
    2013-08-15   40   -5
    2014-01-20   25   15
    2014-02-21   60   80

    In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by month)

    In [13]: g.sum()
    Out[13]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

    In [14]: df1.resample("M").sum()  # the same (older pandas spelled this resample("M", how='sum'))
    Out[14]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80
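For readers who want to reproduce the session above on a recent pandas, here is a minimal, self-contained sketch. Two things in it are my assumptions rather than part of the original session: it passes a `pd.offsets.MonthEnd()` object instead of the `"M"` string (newer pandas releases renamed that alias to `"ME"`, and the offset object sidesteps the difference), and it passes `min_count=1` so that months with no rows come out as NaN, as in the output shown, rather than 0.

```python
import pandas as pd

# Same data as df1 above; the Date strings are parsed, then used as the index.
df1 = pd.DataFrame(
    {
        "Date": ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"],
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)
df1["Date"] = pd.to_datetime(df1["Date"])
df1 = df1.set_index("Date")

# Offset object used in place of the "M" / "ME" string alias (assumption, see above).
month_end = pd.offsets.MonthEnd()

# min_count=1 keeps months without any rows as NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)
by_resample = df1.resample(month_end).sum(min_count=1)

print(by_grouper)
print(by_resample)
```

Both frames should have one row per calendar month from June 2013 through February 2014, labelled by month end.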

Note: pd.Grouper(freq="M") used to be written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
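As a side note, Grouper also takes a `key` argument naming a datetime column, so the dates do not have to be moved into the index first. The sketch below is illustrative only: the frame mirrors the sample data used in this answer, and the `pd.offsets.MonthEnd()` object stands in for the `"M"` alias as an assumption to keep the code independent of alias renames across pandas versions.

```python
import pandas as pd

# Sample data with the dates kept in an ordinary column.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
        ),
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

# key= points the Grouper at the Date column rather than at the index.
monthly = df.groupby(pd.Grouper(key="Date", freq=pd.offsets.MonthEnd())).sum()

print(monthly)
```

The result is indexed by month end, with the months between the first and last date included even when they contain no rows.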

I had thought the following would work, but it doesn't (because as_index is not respected? I'm not sure). I'm including it for interest's sake.

If it's a column (it has to be a datetime64 column! As I said, hit it with to_datetime), you can use a PeriodIndex:

    In [21]: df
    Out[21]:
            Date  abc  xyz
    0 2013-06-01  100  200
    1 2013-06-03  -20   50
    2 2013-08-15   40   -5
    3 2014-01-20   25   15
    4 2014-02-21   60   80

    In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
    Out[22]:
    <class 'pandas.tseries.period.PeriodIndex'>
    [2013-06, ..., 2014-02]
    Length: 5, Freq: M

    In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

    In [24]: g = df.groupby(per)

    In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
    Out[25]:
             abc  xyz
    2013-06   80  250
    2013-08   40   -5
    2014-01   25   15
    2014-02   60   80

To get the desired result we have to reindex…
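One possible way to do that reindexing, offered as a sketch rather than as the answer's own code: build the full run of monthly periods with `pd.period_range` and reindex the grouped sums against it. The names `g_sum`, `full_range`, and `filled` are made up for this illustration, and the numeric columns are selected explicitly because recent pandas versions refuse to sum the leftover datetime column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
        ),
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

# Group by monthly period; only the numeric columns are summed.
per = df["Date"].dt.to_period("M")
g_sum = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last observed period, gaps included.
full_range = pd.period_range(g_sum.index.min(), g_sum.index.max(), freq="M")

# Months that had no rows appear as NaN after reindexing.
filled = g_sum.reindex(full_range)

print(filled)
```

If zeros are preferred over NaN for the empty months, `reindex` accepts `fill_value=0`.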

