Python: Compressing ASCII strings

Using compression does not always reduce the length of a string!

Consider the following code:

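    import bz2
    import zlib

    def comptest(s):
        # The compressors operate on bytes, so encode the ASCII text first.
        data = s.encode('ascii')
        print('original length:', len(data))
        print('zlib compressed length:', len(zlib.compress(data)))
        print('bz2 compressed length:', len(bz2.compress(data)))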

Let's try it on an empty string:

    In [15]: comptest('')
    original length: 0
    zlib compressed length: 8
    bz2 compressed length: 14

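So zlib produces 8 extra bytes and bz2 produces 14. Compression methods usually put a "header" in front of the compressed data for use by the decompression program. This header increases the length of the output.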
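To see how much of that output is framing rather than compressed data, one can compare zlib's normal output with a raw DEFLATE stream, which leaves the wrapper out. The following is a minimal sketch assuming Python 3; `raw_deflate` is a hypothetical helper name introduced here, and the negative `wbits` value is how the zlib module requests a raw stream.

```python
import zlib

def raw_deflate(data):
    """Compress data as a raw DEFLATE stream, without the zlib wrapper."""
    # A negative wbits value asks zlib for a raw stream (no header, no checksum).
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()

for text in (b'', b'test'):
    wrapped = zlib.compress(text)
    raw = raw_deflate(text)
    # The difference is the 2-byte zlib header plus the 4-byte Adler-32 checksum.
    print(len(text), len(wrapped), len(raw), len(wrapped) - len(raw))
```

The last column should be the same for both inputs: the wrapper is a fixed cost, which is why it dominates on very short strings.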

Let's test a single word:

    In [16]: comptest('test')
    original length: 4
    zlib compressed length: 12
    bz2 compressed length: 40

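Even if we subtract the length of the header, compression does not make the word any shorter. That is because there is almost nothing to compress in this case: most of the characters in the string occur only once. Now for a short sentence: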

    In [17]: comptest('This is a compression test of a short sentence.')
    original length: 47
    zlib compressed length: 52
    bz2 compressed length: 73

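Again, the compressed output is larger than the input text. Because the text is so short, it contains little repetition, so it does not compress well.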
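The effect of repetition can be checked directly. The sketch below, assuming Python 3, compresses two made-up inputs of the same size: one that repeats a single word, and one in which every byte value occurs exactly once. The sample data are illustrative and are not part of the measurements above.

```python
import zlib

repetitive = b'test' * 64        # 256 bytes, the same word over and over
distinct = bytes(range(256))     # 256 bytes, no byte value occurs twice

for label, data in (('repetitive', repetitive), ('distinct', distinct)):
    compressed = zlib.compress(data)
    print(label, len(data), '->', len(compressed))
```

The repetitive input should shrink to a small fraction of its size, while the input without any repetition comes out larger than it went in.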

You need a fairly long block of text before compression really starts to pay off:

    In [22]: rings = '''
       ....:     Three Rings for the Elven-kings under the sky,
       ....:     Seven for the Dwarf-lords in their halls of stone,
       ....:     Nine for Mortal Men doomed to die,
       ....:     One for the Dark Lord on his dark throne
       ....:     In the Land of Mordor where the Shadows lie.
       ....:     One Ring to rule them all, One Ring to find them,
       ....:     One Ring to bring them all and in the darkness bind them
       ....:     In the Land of Mordor where the Shadows lie.'''

    In [23]: comptest(rings)
    original length: 410
    zlib compressed length: 205
    bz2 compressed length: 248
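Since compression only helps for some inputs, one practical option is to keep the compressed form only when it is actually shorter. The following is a rough sketch of that idea, assuming Python 3. The names `pack` and `unpack` and the one-byte marker scheme are hypothetical choices made for this illustration, not part of any library.

```python
import zlib

RAW = b'\x00'         # marker: payload is stored as-is (hypothetical scheme)
COMPRESSED = b'\x01'  # marker: payload is zlib-compressed

def pack(text):
    """Return the ASCII text as bytes, compressed only when that is shorter."""
    data = text.encode('ascii')
    squeezed = zlib.compress(data)
    if len(squeezed) < len(data):
        return COMPRESSED + squeezed
    return RAW + data

def unpack(blob):
    """Reverse pack(), using the leading marker byte to pick the decoding."""
    marker, payload = blob[:1], blob[1:]
    if marker == COMPRESSED:
        payload = zlib.decompress(payload)
    return payload.decode('ascii')

short = 'test'
long_text = 'One Ring to rule them all, One Ring to find them. ' * 8
for sample in (short, long_text):
    packed = pack(sample)
    assert unpack(packed) == sample   # round trip must be lossless
    print(len(sample), '->', len(packed))
```

With this approach the worst case costs one extra byte for the marker, instead of the larger overhead seen in the short examples above.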
