
Why is it slower to iterate over a small string than a small list?



TL;DR

  • The actual speed difference is closer to 70% (or more) once a lot of the overhead is removed, for Python 2.

  • Object creation is not at fault. Neither method creates a new object, as one-character strings are cached.

  • The difference is unobvious, but is likely created from a greater number of checks on string indexing, with regards to the type and well-formedness. It is also quite likely thanks to the need to check what to return.

  • List indexing is remarkably fast.
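The caching claim can be checked directly from Python. Note this is a CPython implementation detail, not a language guarantee:

```python
# CPython caches one-character strings with code points below 256,
# so indexing a string hands back the same cached object each time
# rather than allocating a new one.
s = "abcabc"
first_a = s[0]
second_a = s[3]
print(first_a is second_a)  # prints: True
```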



>>> python3 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.388 usec per loop
>>> python3 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.436 usec per loop

This disagrees with what you've found.

You must be using Python 2, then.

>>> python2 -m timeit '[x for x in "abc"]'
1000000 loops, best of 3: 0.309 usec per loop
>>> python2 -m timeit '[x for x in ["a", "b", "c"]]'
1000000 loops, best of 3: 0.212 usec per loop

Let me explain the difference between the versions. I'll examine the compiled code.

For Python 3:

import dis

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>  4           0 LOAD_CONST    1 (<code object <listcomp> at 0x7f4d06b118a0, file "", line 4>)
#>>>              3 LOAD_CONST    2 ('list_iterate.<locals>.<listcomp>')
#>>>              6 MAKE_FUNCTION 0
#>>>              9 LOAD_CONST    3 ('a')
#>>>             12 LOAD_CONST    4 ('b')
#>>>             15 LOAD_CONST    5 ('c')
#>>>             18 BUILD_LIST    3
#>>>             21 GET_ITER
#>>>             22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
#>>>             25 POP_TOP
#>>>             26 LOAD_CONST    0 (None)
#>>>             29 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>> 21           0 LOAD_CONST    1 (<code object <listcomp> at 0x7f4d06b17150, file "", line 21>)
#>>>              3 LOAD_CONST    2 ('string_iterate.<locals>.<listcomp>')
#>>>              6 MAKE_FUNCTION 0
#>>>              9 LOAD_CONST    3 ('abc')
#>>>             12 GET_ITER
#>>>             13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
#>>>             16 POP_TOP
#>>>             17 LOAD_CONST    0 (None)
#>>>             20 RETURN_VALUE

You can see here that the list variant is likely to be slower due to the building of the list each time.

This is the

 9 LOAD_CONST    3 ('a')
12 LOAD_CONST    4 ('b')
15 LOAD_CONST    5 ('c')
18 BUILD_LIST    3

part. The string variant only has

 9 LOAD_CONST    3 ('abc')

You can check that this does seem to make a difference:

def string_iterate():
    [item for item in ("a", "b", "c")]

dis.dis(string_iterate)
#>>> 35           0 LOAD_CONST    1 (<code object <listcomp> at 0x7f4d068be660, file "", line 35>)
#>>>              3 LOAD_CONST    2 ('string_iterate.<locals>.<listcomp>')
#>>>              6 MAKE_FUNCTION 0
#>>>              9 LOAD_CONST    6 (('a', 'b', 'c'))
#>>>             12 GET_ITER
#>>>             13 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
#>>>             16 POP_TOP
#>>>             17 LOAD_CONST    0 (None)
#>>>             20 RETURN_VALUE

This produces just

 9 LOAD_CONST    6 (('a', 'b', 'c'))

since tuples are immutable. Test:

>>> python3 -m timeit '[x for x in ("a", "b", "c")]'
1000000 loops, best of 3: 0.369 usec per loop

Great, back up to speed.
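The tuple folding is also visible without reading any disassembly: the whole tuple appears among the compiled code object's constants (checked on CPython; the exact co_consts layout can vary between versions):

```python
def tuple_iterate():
    # Iterating a tuple literal: because tuples are immutable, the
    # compiler stores the whole tuple as one constant of the code object.
    [item for item in ("a", "b", "c")]

# The folded tuple shows up in co_consts, so nothing has to be
# built at runtime before iteration starts.
print(("a", "b", "c") in tuple_iterate.__code__.co_consts)  # prints: True
```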

For Python 2:

def list_iterate():
    [item for item in ["a", "b", "c"]]

dis.dis(list_iterate)
#>>>  2           0 BUILD_LIST    0
#>>>              3 LOAD_CONST    1 ('a')
#>>>              6 LOAD_CONST    2 ('b')
#>>>              9 LOAD_CONST    3 ('c')
#>>>             12 BUILD_LIST    3
#>>>             15 GET_ITER
#>>>        >>   16 FOR_ITER     12 (to 31)
#>>>             19 STORE_FAST    0 (item)
#>>>             22 LOAD_FAST     0 (item)
#>>>             25 LIST_APPEND   2
#>>>             28 JUMP_ABSOLUTE 16
#>>>        >>   31 POP_TOP
#>>>             32 LOAD_CONST    0 (None)
#>>>             35 RETURN_VALUE

def string_iterate():
    [item for item in "abc"]

dis.dis(string_iterate)
#>>>  2           0 BUILD_LIST    0
#>>>              3 LOAD_CONST    1 ('abc')
#>>>              6 GET_ITER
#>>>        >>    7 FOR_ITER     12 (to 22)
#>>>             10 STORE_FAST    0 (item)
#>>>             13 LOAD_FAST     0 (item)
#>>>             16 LIST_APPEND   2
#>>>             19 JUMP_ABSOLUTE 7
#>>>        >>   22 POP_TOP
#>>>             23 LOAD_CONST    0 (None)
#>>>             26 RETURN_VALUE

The odd thing is that we have the same building of the list, but it's still faster for this. Python 2 is acting strangely fast.

Let's remove the comprehensions and re-time. The

_ =

is to prevent the statement getting optimised out.

>>> python3 -m timeit '_ = ["a", "b", "c"]'
10000000 loops, best of 3: 0.0707 usec per loop
>>> python3 -m timeit '_ = "abc"'
100000000 loops, best of 3: 0.0171 usec per loop

We can see that initialization is not significant enough to account for the difference between the versions (those numbers are small)! We can thus conclude that Python 3 has slower comprehensions. This makes sense, as Python 3 changed comprehensions to have safer scoping.
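The scoping change is easy to see: in Python 2 the comprehension's loop variable leaked into the enclosing scope, while Python 3 runs the comprehension in its own function-like scope, which is where the extra cost comes from:

```python
x = "outer"
[x for x in range(3)]
# Python 3: the comprehension has its own scope, so x is untouched.
# Under Python 2 the loop variable leaked, and x would now be 2.
print(x)  # prints: outer
```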

Well, now improve the benchmark (I'm just removing overhead that isn't iteration). This removes the building of the iterable by pre-assigning it:

>>> python3 -m timeit -s 'iterable = "abc"' '[x for x in iterable]'
1000000 loops, best of 3: 0.387 usec per loop
>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
1000000 loops, best of 3: 0.368 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"' '[x for x in iterable]'
1000000 loops, best of 3: 0.309 usec per loop
>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' '[x for x in iterable]'
10000000 loops, best of 3: 0.164 usec per loop

We can check if calling

iter

is the overhead:

>>> python3 -m timeit -s 'iterable = "abc"' 'iter(iterable)'
10000000 loops, best of 3: 0.099 usec per loop
>>> python3 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.1 usec per loop
>>> python2 -m timeit -s 'iterable = "abc"' 'iter(iterable)'
10000000 loops, best of 3: 0.0913 usec per loop
>>> python2 -m timeit -s 'iterable = ["a", "b", "c"]' 'iter(iterable)'
10000000 loops, best of 3: 0.0854 usec per loop

No. No it is not. The difference is too small, especially for Python 3.

So let's remove yet more unwanted overhead... by making the whole thing slower! The aim is just to have a longer iteration so the time hides the overhead.

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 3.12 msec per loop
>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.77 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' '[x for x in iterable]'
100 loops, best of 3: 2.32 msec per loop
>>> python2 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' '[x for x in iterable]'
100 loops, best of 3: 2.09 msec per loop

This hasn't actually changed much, but it's helped a little.

So remove the comprehension; its overhead is not part of the question:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.71 msec per loop
>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 1.36 msec per loop
>>> python2 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'for x in iterable: pass'
1000 loops, best of 3: 1.27 msec per loop
>>> python2 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'for x in iterable: pass'
1000 loops, best of 3: 935 usec per loop

That's more like it! We can still get slightly faster by using

deque

to iterate. It's basically the same, but it's faster:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop
>>> python3 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop
>>> python2 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 805 usec per loop
>>> python2 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 438 usec per loop
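The deque(iterable, maxlen=0) trick works because a deque with maxlen=0 discards each element as it arrives, so it drains the whole iterable in C without storing any results:

```python
from collections import deque

it = iter("abc")
# maxlen=0 means every appended element is immediately thrown away,
# so this just exhausts the iterator at C speed.
deque(it, maxlen=0)
print(next(it, "exhausted"))  # prints: exhausted
```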

What impresses me is that Unicode is competitive with bytestrings. We can check this explicitly by trying

bytes

and

unicode

in both:

  • bytes

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)).encode("ascii") for _ in range(100000))' 'deque(iterable, maxlen=0)' :(
    1000 loops, best of 3: 571 usec per loop
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)).encode("ascii") for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = b"".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 757 usec per loop
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 438 usec per loop

Here you can see Python 3 actually faster than Python 2.

  • unicode

    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = u"".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 800 usec per loop
    >>> python3 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 394 usec per loop
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = u"".join(unichr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 1.07 msec per loop
    >>> python2 -m timeit -s 'import random; from collections import deque; iterable = [unichr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
    1000 loops, best of 3: 469 usec per loop

Again, Python 3 is faster, although this is to be expected (

str

has had a lot of attention in Python 3).

In fact, the

unicode

-

bytes

difference is very small, which is impressive.

So let's analyse this one case, seeing as it's fast and convenient for me:

>>> python3 -m timeit -s 'import random; from collections import deque; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 777 usec per loop
>>> python3 -m timeit -s 'import random; from collections import deque; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'deque(iterable, maxlen=0)'
1000 loops, best of 3: 405 usec per loop

In fact, we can rule out Tim Peters' 10-times-upvoted answer!

>>> foo = iterable[123]
>>> iterable[36] is foo
True

These are not new objects!

But this is worth mentioning: indexing costs. The difference will likely be in the indexing, so remove the iteration and just index:

>>> python3 -m timeit -s 'import random; iterable = "".join(chr(random.randint(0, 127)) for _ in range(100000))' 'iterable[123]'
10000000 loops, best of 3: 0.0397 usec per loop
>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable[123]'
10000000 loops, best of 3: 0.0374 usec per loop

The difference seems small, but at least half of the cost is overhead:

>>> python3 -m timeit -s 'import random; iterable = [chr(random.randint(0, 127)) for _ in range(100000)]' 'iterable; 123'
100000000 loops, best of 3: 0.0173 usec per loop

so the speed difference is sufficient to decide to blame it. I think.

So why is indexing a list so much faster?

Well, I'll come back to you on that, but my guess is that it's down to the check for interned strings (or cached characters, if that's a separate mechanism). That will be less fast than optimal. But I'll go check the source (although I'm not comfortable in C...) :).


So here's the source:

static PyObject *
unicode_getitem(PyObject *self, Py_ssize_t index)
{
    void *data;
    enum PyUnicode_Kind kind;
    Py_UCS4 ch;
    PyObject *res;

    if (!PyUnicode_Check(self) || PyUnicode_READY(self) == -1) {
        PyErr_BadArgument();
        return NULL;
    }
    if (index < 0 || index >= PyUnicode_GET_LENGTH(self)) {
        PyErr_SetString(PyExc_IndexError, "string index out of range");
        return NULL;
    }
    kind = PyUnicode_KIND(self);
    data = PyUnicode_DATA(self);
    ch = PyUnicode_READ(kind, data, index);
    if (ch < 256)
        return get_latin1_char(ch);

    res = PyUnicode_New(1, ch);
    if (res == NULL)
        return NULL;
    kind = PyUnicode_KIND(res);
    data = PyUnicode_DATA(res);
    PyUnicode_WRITE(kind, data, 0, ch);
    assert(_PyUnicode_CheckConsistency(res, 1));
    return res;
}

Walking from the top, we'll have some checks. These are boring. Then some assigns, which should also be boring. The first interesting line is

ch = PyUnicode_READ(kind, data, index);

but we'd hope that is fast, as we're reading from a contiguous C array by indexing it. The result,

ch

, will be less than 256, so we'll return the cached character in

get_latin1_char(ch)

So we'll run (dropping the first checks)

kind = PyUnicode_KIND(self);
data = PyUnicode_DATA(self);
ch = PyUnicode_READ(kind, data, index);
return get_latin1_char(ch);

where

#define PyUnicode_KIND(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)), \
     ((PyASCIIObject *)(op))->state.kind)

(which is boring because the asserts get compiled out in release builds [so I can take them to be fast], and

((PyASCIIObject *)(op))->state.kind

is (I think) an indirection and a C-level cast);

#define PyUnicode_DATA(op) \
    (assert(PyUnicode_Check(op)), \
     PyUnicode_IS_COMPACT(op) ? _PyUnicode_COMPACT_DATA(op) : \
     _PyUnicode_NONCOMPACT_DATA(op))

(which is also boring for similar reasons, assuming the macros (

Something_CAPITALIZED

) are all fast),

#define PyUnicode_READ(kind, data, index) \
    ((Py_UCS4) \
     ((kind) == PyUnicode_1BYTE_KIND ? \
      ((const Py_UCS1 *)(data))[(index)] : \
      ((kind) == PyUnicode_2BYTE_KIND ? \
       ((const Py_UCS2 *)(data))[(index)] : \
       ((const Py_UCS4 *)(data))[(index)] \
      ) \
     ))

(which involves an index, but really isn't slow at all), and

static PyObject*
get_latin1_char(unsigned char ch)
{
    PyObject *unicode = unicode_latin1[ch];
    if (!unicode) {
        unicode = PyUnicode_New(1, ch);
        if (!unicode)
            return NULL;
        PyUnicode_1BYTE_DATA(unicode)[0] = ch;
        assert(_PyUnicode_CheckConsistency(unicode, 1));
        unicode_latin1[ch] = unicode;
    }
    Py_INCREF(unicode);
    return unicode;
}

This confirms my suspicion:

  • This is cached:

    PyObject *unicode = unicode_latin1[ch];

  • This should be fast. The

    if (!unicode)

    is not run, so it's literally equivalent in this case to

    PyObject *unicode = unicode_latin1[ch];
    Py_INCREF(unicode);
    return unicode;
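As a rough Python-level sketch of the C above (the names _latin1_cache and lookup_char are mine, purely for illustration; the real cache is a 256-entry C array of object pointers):

```python
# Hypothetical Python rendering of CPython's get_latin1_char.
_latin1_cache = [None] * 256

def lookup_char(ch):
    # ch is an integer code point below 256
    obj = _latin1_cache[ch]
    if obj is None:
        # First use: build the one-character string and memoize it.
        obj = chr(ch)
        _latin1_cache[ch] = obj
    return obj

# After the first call, every lookup returns the same cached object.
print(lookup_char(97) is lookup_char(97))  # prints: True
```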

Honestly, after testing that the

assert

s are fast (by disabling them [I think it works on the C-level asserts...]), the only plausibly-slow parts are:

PyUnicode_IS_COMPACT(op)
_PyUnicode_COMPACT_DATA(op)
_PyUnicode_NONCOMPACT_DATA(op)

Which are:

#define PyUnicode_IS_COMPACT(op) \
    (((PyASCIIObject*)(op))->state.compact)

(fast, as before),

#define _PyUnicode_COMPACT_DATA(op) \
    (PyUnicode_IS_ASCII(op) ? \
     ((void*)((PyASCIIObject*)(op) + 1)) : \
     ((void*)((PyCompactUnicodeObject*)(op) + 1)))

(fast if the macro

IS_ASCII

is fast), and

#define _PyUnicode_NONCOMPACT_DATA(op) \
    (assert(((PyUnicodeObject*)(op))->data.any), \
     ((((PyUnicodeObject *)(op))->data.any)))

(also fast, as it's an assert plus an indirection plus a cast).

So we're down (the rabbit hole) to

PyUnicode_IS_ASCII

which is

#define PyUnicode_IS_ASCII(op) \
    (assert(PyUnicode_Check(op)), \
     assert(PyUnicode_IS_READY(op)), \
     ((PyASCIIObject*)op)->state.ascii)

Hmm... that seems fast too...


Well, OK, but let's compare it to

PyList_GetItem

. (Yeah, thanks Tim Peters for giving me more work to do :P.)

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op)->ob_item[i];
}

We can see that in non-error cases this is just going to run:

PyList_Check(op)
Py_SIZE(op)
((PyListObject *)op)->ob_item[i]

where

PyList_Check

is

#define PyList_Check(op) \
    PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LIST_SUBCLASS)

(TABS! TABS!!!) (issue21587) That got fixed and merged in 5 minutes. Like... yeah. Damn. They put Skeet to shame.

#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)

#define PyType_FastSubclass(t,f) PyType_HasFeature(t,f)

#ifdef Py_LIMITED_API
#define PyType_HasFeature(t,f) ((PyType_GetFlags(t) & (f)) != 0)
#else
#define PyType_HasFeature(t,f) (((t)->tp_flags & (f)) != 0)
#endif

So this is normally really trivial (two indirections and a couple of boolean checks), unless

Py_LIMITED_API

is on, in which case... ???

Then there's the indexing and a cast (

((PyListObject *)op)->ob_item[i]

) and we're done.

So there are definitely fewer checks for lists, and the small speed differences certainly imply that this could be relevant.


I think in general there's just more type-checking and indirection

(->)

for Unicode. It seems I'm missing a point, but what?


