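You can use the TimeGrouper function in a groupby/apply. With a TimeGrouper you don't need to create your period column. I know you're not trying to compute the mean, but I will use it as an example: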

    >>> df.groupby(pd.TimeGrouper('5Min'))['val'].mean()
    time
    2014-04-03 16:00:00    14390.000000
    2014-04-03 16:05:00    14394.333333
    2014-04-03 16:10:00    14396.500000

Or an example with an explicit apply:

    >>> df.groupby(pd.TimeGrouper('5Min'))['val'].apply(lambda x: len(x) > 3)
    time
    2014-04-03 16:00:00    False
    2014-04-03 16:05:00    False
    2014-04-03 16:10:00     True

Docstring for TimeGrouper:

    Docstring for resample:class TimeGrouper@21

    TimeGrouper(self, freq = 'Min', closed = None, label = None,
    how = 'mean', nperiods = None, axis = 0, fill_method = None,
    limit = None, loffset = None, kind = None, convention = None, base = 0,
    **kwargs)

    Custom groupby class for time-interval grouping

    Parameters
    ----------
    freq : pandas date offset or offset alias for identifying bin edges
    closed : closed end of interval; left or right
    label : interval boundary to use for labeling; left or right
    nperiods : optional, integer
    convention : {'start', 'end', 'e', 's'}
        If axis is PeriodIndex

    Notes
    -----
    Use begin, end, nperiods to generate intervals that cannot be derived
    directly from the associated object

Edit
I don't know of an elegant way to create the period column, but the following will work:

    >>> new = df.groupby(pd.TimeGrouper('5Min'), as_index=False).apply(lambda x: x['val'])
    >>> df['period'] = new.index.get_level_values(0)
    >>> df
                         id    val  period
    time
    2014-04-03 16:01:53  23  14389       0
    2014-04-03 16:01:54  28  14391       0
    2014-04-03 16:05:55  24  14393       1
    2014-04-03 16:06:25  23  14395       1
    2014-04-03 16:07:01  23  14395       1
    2014-04-03 16:10:09  23  14395       2
    2014-04-03 16:10:23  26  14397       2
    2014-04-03 16:10:57  26  14397       2
    2014-04-03 16:11:10  26  14397       2

It works because the groupby here with as_index=False actually returns the period column you want as part of the MultiIndex, and I just grab that part of the MultiIndex and assign it to a new column in the original dataframe. You could do anything in the apply; I just want the index:

    >>> new
       time
    0  2014-04-03 16:01:53    14389
       2014-04-03 16:01:54    14391
    1  2014-04-03 16:05:55    14393
       2014-04-03 16:06:25    14395
       2014-04-03 16:07:01    14395
    2  2014-04-03 16:10:09    14395
       2014-04-03 16:10:23    14397
       2014-04-03 16:10:57    14397
       2014-04-03 16:11:10    14397
    >>> new.index.get_level_values(0)
    Int64Index([0, 0, 1, 1, 1, 2, 2, 2, 2], dtype='int64')
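
If you would rather skip the groupby/apply round trip, here is a minimal sketch of another way to get the same period column. It is my own assumption, not the method shown above: it floors each timestamp to the start of its 5-minute bin with `DatetimeIndex.floor` and then numbers the bins with `pd.factorize`. The sample frame is rebuilt from the output printed above, and the names `bin_start`, `codes` and `bins` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the output shown in this answer.
index = pd.DatetimeIndex(
    [
        "2014-04-03 16:01:53", "2014-04-03 16:01:54",
        "2014-04-03 16:05:55", "2014-04-03 16:06:25",
        "2014-04-03 16:07:01", "2014-04-03 16:10:09",
        "2014-04-03 16:10:23", "2014-04-03 16:10:57",
        "2014-04-03 16:11:10",
    ],
    name="time",
)
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=index,
)

# Round every timestamp down to the start of its 5-minute bin.
bin_start = df.index.floor("5min")

# factorize assigns 0, 1, 2, ... to the distinct bins in order of appearance.
# Assumption: only bins that contain at least one row get a number.
codes, bins = pd.factorize(bin_start)
df["period"] = codes

print(df)
```

Because the numbering comes from the observed bins only, a 5-minute window with no rows does not consume a period number here; whether that is what you want depends on your data.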

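One caveat for newer pandas: `pd.TimeGrouper` was deprecated in pandas 0.21 and removed in 1.0, and `pd.Grouper(freq=...)` plays the same role there. Below is a sketch of the mean and apply examples from the start of this answer under that assumption. The sample frame is rebuilt from the output printed earlier, and the variable names `means` and `more_than_three` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the output shown in this answer.
index = pd.DatetimeIndex(
    [
        "2014-04-03 16:01:53", "2014-04-03 16:01:54",
        "2014-04-03 16:05:55", "2014-04-03 16:06:25",
        "2014-04-03 16:07:01", "2014-04-03 16:10:09",
        "2014-04-03 16:10:23", "2014-04-03 16:10:57",
        "2014-04-03 16:11:10",
    ],
    name="time",
)
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=index,
)

# Mean of 'val' per 5-minute bin; pd.Grouper(freq=...) stands in for TimeGrouper.
means = df.groupby(pd.Grouper(freq="5min"))["val"].mean()

# Arbitrary per-bin logic via apply: does the bin hold more than three rows?
more_than_three = df.groupby(pd.Grouper(freq="5min"))["val"].apply(
    lambda x: len(x) > 3
)

print(means)
print(more_than_three)
```

I create a fresh `pd.Grouper` for each call instead of sharing one object, just to keep the two examples independent of each other.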


