How to create a group ID based on 5-minute intervals in a pandas time series?

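You can use the TimeGrouper function in a groupby/apply. With a TimeGrouper you do not need to create a period column. I know you are not trying to compute the mean, but I will use it as an example: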

>>> df.groupby(pd.TimeGrouper('5Min'))['val'].mean()
time
2014-04-03 16:00:00    14390.000000
2014-04-03 16:05:00    14394.333333
2014-04-03 16:10:00    14396.500000

Or an example with an explicit apply:

>>> df.groupby(pd.TimeGrouper('5Min'))['val'].apply(lambda x: len(x) > 3)
time
2014-04-03 16:00:00    False
2014-04-03 16:05:00    False
2014-04-03 16:10:00     True
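For readers on a recent pandas: pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0, and pd.Grouper(freq=...) is its replacement. Below is a minimal sketch of the same two groupby calls written with pd.Grouper. The DataFrame is rebuilt by hand from the rows printed in this answer, and the names means and more_than_three are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the rows printed in this answer.
times = pd.DatetimeIndex(
    [
        "2014-04-03 16:01:53", "2014-04-03 16:01:54",
        "2014-04-03 16:05:55", "2014-04-03 16:06:25",
        "2014-04-03 16:07:01", "2014-04-03 16:10:09",
        "2014-04-03 16:10:23", "2014-04-03 16:10:57",
        "2014-04-03 16:11:10",
    ],
    name="time",
)
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=times,
)

# pd.Grouper(freq=...) bins the DatetimeIndex into 5-minute intervals,
# which is the role pd.TimeGrouper played in older pandas versions.
grouper = pd.Grouper(freq="5min")

# Mean of 'val' per 5-minute bin.
means = df.groupby(grouper)["val"].mean()

# Explicit apply: does each bin hold more than three rows?
more_than_three = df.groupby(grouper)["val"].apply(lambda x: len(x) > 3)

print(means)
print(more_than_three)
```

Both results are indexed by the left edge of each 5-minute bin, the same layout as the output shown above.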

Docstring for TimeGrouper:

Docstring for resample:class TimeGrouper@21

TimeGrouper(self, freq = 'Min', closed = None, label = None,
how = 'mean', nperiods = None, axis = 0, fill_method = None,
limit = None, loffset = None, kind = None, convention = None, base = 0,
**kwargs)

Custom groupby class for time-interval grouping

Parameters
----------
freq : pandas date offset or offset alias for identifying bin edges
closed : closed end of interval; left or right
label : interval boundary to use for labeling; left or right
nperiods : optional, integer
convention : {'start', 'end', 'e', 's'}
    If axis is PeriodIndex

Notes
-----
Use begin, end, nperiods to generate intervals that cannot be derived
directly from the associated object
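To illustrate the closed and label parameters from the docstring, here is a small sketch using pd.Grouper, which accepts the same two parameters in current pandas. The timestamps are taken from the sample data in this answer, and the variable names are assumptions made for the example.

```python
import pandas as pd

# Timestamps taken from the sample data in this answer.
times = pd.DatetimeIndex(
    [
        "2014-04-03 16:01:53", "2014-04-03 16:01:54",
        "2014-04-03 16:05:55", "2014-04-03 16:06:25",
        "2014-04-03 16:07:01", "2014-04-03 16:10:09",
        "2014-04-03 16:10:23", "2014-04-03 16:10:57",
        "2014-04-03 16:11:10",
    ],
    name="time",
)
s = pd.Series(range(len(times)), index=times)

# Default for a 5-minute frequency: bins are closed on the left
# and labeled with their left edge, e.g. [16:00, 16:05) -> 16:00.
left_counts = s.groupby(pd.Grouper(freq="5min")).size()

# closed='right', label='right': bins are closed on the right
# and labeled with their right edge, e.g. (16:00, 16:05] -> 16:05.
right_counts = s.groupby(
    pd.Grouper(freq="5min", closed="right", label="right")
).size()

print(left_counts)
print(right_counts)
```

With this data no timestamp falls exactly on a bin edge, so the rows per bin are identical in both cases and only the labels shift by five minutes.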

Edit

I don't know of an elegant way to create the period column, but the following will work:

>>> new = df.groupby(pd.TimeGrouper('5Min'),as_index=False).apply(lambda x: x['val'])
>>> df['period'] = new.index.get_level_values(0)
>>> df
                     id    val  period
time
2014-04-03 16:01:53  23  14389       0
2014-04-03 16:01:54  28  14391       0
2014-04-03 16:05:55  24  14393       1
2014-04-03 16:06:25  23  14395       1
2014-04-03 16:07:01  23  14395       1
2014-04-03 16:10:09  23  14395       2
2014-04-03 16:10:23  26  14397       2
2014-04-03 16:10:57  26  14397       2
2014-04-03 16:11:10  26  14397       2

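It works because the groupby here with as_index=False actually returns the period column you want as part of the multiindex, and I just grab that part of the multiindex and assign it to a new column in the original dataframe. You could do anything in the apply; I just want the index: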

>>> new
   time
0  2014-04-03 16:01:53    14389
   2014-04-03 16:01:54    14391
1  2014-04-03 16:05:55    14393
   2014-04-03 16:06:25    14395
   2014-04-03 16:07:01    14395
2  2014-04-03 16:10:09    14395
   2014-04-03 16:10:23    14397
   2014-04-03 16:10:57    14397
   2014-04-03 16:11:10    14397
>>> new.index.get_level_values(0)
Int64Index([0, 0, 1, 1, 1, 2, 2, 2, 2], dtype='int64')
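As an alternative sketch (a different technique from the groupby/apply above, not the one used in this answer), the same period column can be built without any groupby: floor each timestamp to its 5-minute bin with DatetimeIndex.floor and number the distinct bins with pd.factorize. The DataFrame is rebuilt from the rows shown above, and the names bins and codes are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the rows printed in this answer.
times = pd.DatetimeIndex(
    [
        "2014-04-03 16:01:53", "2014-04-03 16:01:54",
        "2014-04-03 16:05:55", "2014-04-03 16:06:25",
        "2014-04-03 16:07:01", "2014-04-03 16:10:09",
        "2014-04-03 16:10:23", "2014-04-03 16:10:57",
        "2014-04-03 16:11:10",
    ],
    name="time",
)
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=times,
)

# Round every timestamp down to the start of its 5-minute bin.
bins = df.index.floor("5min")

# factorize assigns 0, 1, 2, ... to the distinct bins in order of
# first appearance; with a sorted index this is chronological order.
codes, unique_bins = pd.factorize(bins)
df["period"] = codes

print(df)
```

Note that factorize numbers only the bins that actually contain rows, so an empty 5-minute interval in the data does not leave a gap in the IDs.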
