Here's a vectorized approach -
np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Sample run -
In [208]: a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

In [209]: np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Out[209]: array([1, 3, 7, 1, 1, 2, 3, 2, 2])
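To make the one-liner easier to follow, here is a minimal sketch that splits it into its individual steps. The function name run_lengths_diff is hypothetical and only used for illustration; the NumPy calls are the same ones used in the expression above.

```python
import numpy as np

def run_lengths_diff(a):
    """Lengths of runs of equal consecutive values (approach #1, step by step)."""
    a = np.asarray(a)
    # np.diff(a) is nonzero exactly where a[i] != a[i+1];
    # adding 1 turns those positions into the start index of the next run.
    starts = np.flatnonzero(np.diff(a)) + 1
    # Prepend 0 (start of the first run) and append a.size (one past the end)
    # so every run is enclosed by two boundaries.
    bounds = np.r_[0, starts, a.size]
    # The gap between consecutive boundaries is the length of each run.
    return np.diff(bounds)

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(run_lengths_diff(a))  # [1 3 7 1 1 2 3 2 2]
```

The run lengths always sum to a.size, which is a cheap sanity check. Note that for an empty input this expression returns [0] rather than an empty array.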
A faster one, with boolean concatenation -
np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
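A minimal sketch of the boolean-concatenation variant as a reusable function. The name run_lengths_bool and the empty-input guard are my own additions (assumptions, not part of the original expression): without the guard, an empty array would produce [1] because the two padding True values are always present.

```python
import numpy as np

def run_lengths_bool(a):
    """Run lengths via a boolean change mask (approach #2)."""
    a = np.asarray(a)
    if a.size == 0:
        # Guard added for this sketch: no elements means no runs.
        return np.zeros(0, dtype=np.intp)
    # True wherever an element differs from its predecessor, padded with True
    # at both ends so the first start and the end are always boundaries.
    mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
    # Positions of the boundaries; their differences are the run lengths.
    return np.diff(np.flatnonzero(mask))

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(run_lengths_bool(a))  # [1 3 7 1 1 2 3 2 2]
```

Unlike approach #1, this builds the boundary mask directly from a comparison, so it needs no subtraction, no +1 offset and no np.r_ call.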
Runtime test
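For the setup, let's create a bigger dataset with islands of 0s and 1s and, to keep the benchmark comparable with the given sample, let's have the island lengths vary between 1 and 6 (np.random.randint(1,7,n) excludes its upper bound) -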
In [257]: n = 100000 # creates 100000 islands, i.e. 50000 pairs of 0s and 1s

In [258]: a = np.repeat(np.arange(n)%2, np.random.randint(1,7,(n)))

# Approach #1 proposed in this post
In [259]: %timeit np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
100 loops, best of 3: 2.13 ms per loop

# Approach #2 proposed in this post
In [260]: %timeit np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
1000 loops, best of 3: 1.21 ms per loop

# @Vineet Jain's soln (needs: from itertools import groupby)
In [261]: %timeit [ sum(1 for i in g) for k,g in groupby(a)]
10 loops, best of 3: 61.3 ms per loop
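If you want to rerun the comparison outside IPython, here is a rough standalone sketch using the standard timeit module. The function names and the seeded generator are assumptions for this sketch; it first checks that all three approaches return identical run lengths, since timings are only meaningful if the results agree. It prints whatever timings your machine produces; no numbers are implied here.

```python
import timeit
from itertools import groupby

import numpy as np

def approach1(a):
    return np.diff(np.r_[0, np.flatnonzero(np.diff(a)) + 1, a.size])

def approach2(a):
    return np.diff(np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True]))))

def groupby_lengths(a):
    # Pure-Python reference: count the items in each group of equal values.
    return [sum(1 for _ in g) for _, g in groupby(a)]

n = 100000
rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
# Alternating 0/1 islands, each repeated 1 to 6 times (high bound is exclusive).
a = np.repeat(np.arange(n) % 2, rng.integers(1, 7, n))

# All three must agree before their timings are worth comparing.
r1, r2, r3 = approach1(a), approach2(a), groupby_lengths(a)
assert np.array_equal(r1, r2) and r1.tolist() == r3

for name, fn in [("approach1", approach1), ("approach2", approach2),
                 ("groupby", groupby_lengths)]:
    # Best of 3 repeats, 5 calls each, reported per call.
    t = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{name}: {t * 1e3:.2f} ms per call")
```

Because neighbouring islands always hold different values, the number of runs found equals n exactly.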



