Approach #1
Here's one using np.unique:

    _, tags, count = np.unique(labels, return_counts=1, return_inverse=1)
    sizes = count[tags]
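To make the role of `tags` and `count` concrete, here is a minimal sketch on a toy array. The input values are made up for illustration; only the `np.unique` call itself comes from the approach above. Since `np.unique` sorts the distinct values, this variant should also work for labels that are negative or not integers.

```python
import numpy as np

# Hypothetical toy input: each element is a group label.
labels = np.array([3, 1, 3, 7, 1, 3])

# uniq  -> sorted distinct labels
# tags  -> for each element, the index of its label within uniq
# count -> number of occurrences of each distinct label
uniq, tags, count = np.unique(labels, return_inverse=True, return_counts=True)

# Look up each element's group size through its tag.
sizes = count[tags]

print(uniq)   # [1 3 7]
print(tags)   # [1 0 1 2 0 1]
print(count)  # [2 3 1]
print(sizes)  # [3 2 3 1 2 3]
```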
Approach #2
If labels contains only non-negative integers, np.bincount is simpler and more efficient:

    sizes = np.bincount(labels)[labels]
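The same toy array (made-up values, used only for illustration) shows why indexing the bin counts with `labels` gives the per-element group size. This is a sketch of the one-liner above split into two steps.

```python
import numpy as np

# Hypothetical toy input; np.bincount requires non-negative integers.
labels = np.array([3, 1, 3, 7, 1, 3])

# counts[k] is how many times the value k occurs, for k = 0 .. labels.max().
counts = np.bincount(labels)

# Index the counts with the labels themselves to get each element's group size.
sizes = counts[labels]

print(counts)  # [0 2 0 3 0 0 0 1]
print(sizes)   # [3 2 3 1 2 3]
```

Note that `counts` has `labels.max() + 1` entries, so this approach is likely a poor fit when the label values are very large but sparse.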
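Runtime test
The setup draws labels from 60,000 possible non-negative integer values (0 to 59,999). Two such arrays, of lengths 100,000 and 1,000,000, are timed, comparing a loop over the unique labels against the np.bincount one-liner.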
Setup #1:

    In [192]: np.random.seed(0)
         ...: labels = np.random.randint(0,60000,(100000))

    In [193]: %%timeit
         ...: sizes = np.zeros(labels.shape)
         ...: for num in np.unique(labels):
         ...:     mask = labels == num
         ...:     sizes[mask] = np.count_nonzero(mask)
    1 loop, best of 3: 2.32 s per loop

    In [194]: %timeit np.bincount(labels)[labels]
    1000 loops, best of 3: 376 µs per loop

    In [195]: 2320/0.376 # Speedup figure
    Out[195]: 6170.212765957447
Setup #2:

    In [196]: np.random.seed(0)
         ...: labels = np.random.randint(0,60000,(1000000))

    In [197]: %%timeit
         ...: sizes = np.zeros(labels.shape)
         ...: for num in np.unique(labels):
         ...:     mask = labels == num
         ...:     sizes[mask] = np.count_nonzero(mask)
    1 loop, best of 3: 43.6 s per loop

    In [198]: %timeit np.bincount(labels)[labels]
    100 loops, best of 3: 5.15 ms per loop

    In [199]: 43600/5.15 # Speedup figure
    Out[199]: 8466.019417475727
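Outside IPython, a comparable measurement could be sketched with the standard `timeit` module. The function names `sizes_loop` and `sizes_bincount` are hypothetical wrappers around the two snippets timed above, and the input here is deliberately scaled down (an assumption, so the script finishes quickly). Set `n_values` and `n_labels` to the sizes used above to approximate the original runs; absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def sizes_loop(labels):
    """Loop-based version from the timings: one boolean mask per unique label."""
    sizes = np.zeros(labels.shape)
    for num in np.unique(labels):
        mask = labels == num
        sizes[mask] = np.count_nonzero(mask)
    return sizes


def sizes_bincount(labels):
    """Vectorized version: count every value once, then index by label."""
    return np.bincount(labels)[labels]


# Scaled-down input (assumption); the runs above used 60000 values
# with 100000 and 1000000 labels.
n_values, n_labels = 5000, 20000
np.random.seed(0)
labels = np.random.randint(0, n_values, n_labels)

# Both versions should agree on the group sizes before timing them.
same = np.array_equal(sizes_loop(labels), sizes_bincount(labels))

# Best-of-3 timings, reported per call.
t_loop = min(timeit.repeat(lambda: sizes_loop(labels), number=1, repeat=3))
t_bin = min(timeit.repeat(lambda: sizes_bincount(labels), number=100, repeat=3)) / 100

print(f"results match: {same}")
print(f"loop: {t_loop:.4f} s, bincount: {t_bin:.6f} s, speedup: {t_loop / t_bin:.0f}x")
```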



