工作中遇到一个问题,即找出图片当中数量占比最高的像素值。
因时间紧迫,使用两层循环嵌套 naive 的完成了需求,效率非常慢,被人吐槽(可耻!)。
故开始寻找优化方法,发现利用 numpy 可以实现这个功能。
因此本文打算从最 naive 的方法开始,逐步比较不同方法,并试图探寻 numpy 执行如此迅速的原因。
以下是本篇章节,可以根据标题选择性观看。
- naive 循环嵌套基于numpy的实现numpy高效背后的原因
naive 循环嵌套
1 载入图片;
2 遍历所有的pixel,统计每个像素值的个数;
3 然后寻找其中像素值最多的个数
这里主要耗时存在于 步骤2 中,对于所有pixel的遍历,即两次for循环的嵌套
512x512的图片,时间大概在 1.5s 左右
import cv2
import time
# load img
img = cv2.imread('./lena.jpg')
start_t = time.time()
pixel_num = {}
(w,h,_) = img.shape
# search for each pixel and record the max times
max_value = None
max_times = 0
for i in range(w):
for j in range(h):
[b,g,r] = img[i,j]
pixel = (b,g,r)
if pixel in pixel_num.keys():
pixel_num[pixel] += 1
if pixel_num[pixel] > max_times:
max_times = pixel_num[pixel]
max_value = pixel
else:
pixel_num[pixel] = 1
end_t = time.time()
print(f'the most pixel is {max_value}, occupying {max_times/float(w*h):.6f}, time: {end_t-start_t:.6f}')
基于numpy的实现
本篇是基于np.ravel_multi_index() 和 np.bincount()
np.ravel_multi_index(),作用是把多维度数据映射到一维数据
np.bincount(),作用是统计一维数据中各个元素出现的次数
512x512的图片,时间大概在 0.25s 左右,比上面naive的方法(1.5s)快了83%
import cv2
import numpy as np
import time
# load img
img = cv2.imread('./lena.jpg')
start_t = time.time()
(w,h,_) = img.shape
img_2D = np.reshape(img,(-1,3))
# 统计b,g,r 最大像素的值
b_max = np.max(img_2D[:,0])+1
g_max = np.max(img_2D[:,1])+1
r_max = np.max(img_2D[:,2])+1
img_tuple = (img_2D[:,0],img_2D[:,1],img_2D[:,2])
nbin = np.array([b_max,g_max,r_max])
# 将三维数值 双射 到一个一维数据
xy = np.ravel_multi_index(img_tuple, nbin) # 0.007s
# 统计这个数组中每个元素出现的个数
H = np.bincount(xy, minlength=nbin.prod()) # 0.055s
H = H.reshape(nbin)
# 得到最多出现像素的次数
max_num = H.max() # 0.15s
# 得到最多出现像素在结果中的像素值
max_value = np.unravel_index(np.argmax(H),H.shape)
end_t = time.time()
print(f'the most pixel is {max_value}, occupying {max_num/float(w*h):.6f}, time: {end_t-start_t:.6f}')
numpy高效背后的原因
理论上如果要找到出现次数最多的元素,必定需要遍历所有的元素,即算法复杂度为 O(WH)
但是为什么上述两者的效率可以相差70%?
记录每个步骤的时间,发现其中 np.ravel_multi_index()(0.007s) 和 np.bincount()(0.055s) 很快,得到最大值(0.15s)上花费较多时间
为了能够更好的探究效率为何如此之高的原因,查看了numpy实现 bincount 的C语言源代码
本质上也还是一个遍历,应该就是C语言的遍历比python快把
static PyObject *
arr_bincount(PyObject *NPY_UNUSED(self), PyObject *args, PyObject *kwds)
{
PyArray_Descr *type;
PyObject *list = NULL, *weight=Py_None;
PyObject *lst=NULL, *ans=NULL, *wts=NULL;
intp *numbers, *ians, len , mxi, mni, ans_size;
int i;
double *weights , *dans;
static char *kwlist[] = {"list", "weights", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|O",
kwlist, &list, &weight)) {
goto fail;
}
if (!(lst = PyArray_ContiguousFromAny(list, PyArray_INTP, 1, 1))) {
goto fail;
}
len = PyArray_SIZE(lst);
if (len < 1) {
PyErr_SetString(PyExc_ValueError,
"The first argument cannot be empty.");
goto fail;
}
numbers = (intp *) PyArray_DATA(lst);
mxi = mxx(numbers, len);
mni = mnx(numbers, len);
if (numbers[mni] < 0) {
PyErr_SetString(PyExc_ValueError,
"The first argument of bincount must be non-negative");
goto fail;
}
ans_size = numbers [mxi] + 1;
type = PyArray_DescrFromType(PyArray_INTP);
if (weight == Py_None) {
if (!(ans = PyArray_Zeros(1, &ans_size, type, 0))) {
goto fail;
}
ians = (intp *)(PyArray_DATA(ans));
for (i = 0; i < len; i++)
ians [numbers [i]] += 1;
Py_DECREF(lst);
}
else {
if (!(wts = PyArray_ContiguousFromAny(weight, PyArray_DOUBLE, 1, 1))) {
goto fail;
}
weights = (double *)PyArray_DATA (wts);
if (PyArray_SIZE(wts) != len) {
PyErr_SetString(PyExc_ValueError,
"The weights and list don't have the same length.");
goto fail;
}
type = PyArray_DescrFromType(PyArray_DOUBLE);
if (!(ans = PyArray_Zeros(1, &ans_size, type, 0))) {
goto fail;
}
dans = (double *)PyArray_DATA (ans);
// 关键:在这里对于所有的numbers进行遍历 然后累加
for (i = 0; i < len; i++) {
dans[numbers[i]] += weights[i];
}
Py_DECREF(lst);
Py_DECREF(wts);
}
return ans;
fail:
Py_XDECREF(lst);
Py_XDECREF(wts);
Py_XDECREF(ans);
return NULL;
}



