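Because the Python standard library's struct module is written in C, using it for this is both fairly easy and fairly fast. Here is how it can be used to do what you want. It also lets you skip columns of characters by specifying a negative value for the number of characters in a field.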
    import struct

    fieldwidths = (2, -10, 24)  # negative widths represent ignored padding fields
    fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                         for fw in fieldwidths)
    fieldstruct = struct.Struct(fmtstring)
    parse = fieldstruct.unpack_from
    print('fmtstring: {!r}, recsize: {} chars'.format(fmtstring, fieldstruct.size))

    # As written this runs on Python 2, where str is a byte string;
    # on Python 3, unpack_from() requires a bytes-like object instead.
    line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
    fields = parse(line)
    print('fields: {}'.format(fields))

Output:
    fmtstring: '2s 10x 24s', recsize: 36 chars
    fields: ('AB', 'MNOPQRSTUVWXYZ0123456789')

The following modifications adapt it to work in both Python 2 and 3 (and to handle Unicode input):
    import struct
    import sys

    fieldstruct = struct.Struct(fmtstring)
    if sys.version_info[0] < 3:
        parse = fieldstruct.unpack_from
    else:
        # converts unicode input to byte string and results back to unicode string
        unpack = fieldstruct.unpack_from
        parse = lambda line: tuple(s.decode() for s in unpack(line.encode()))
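For reference, here is a minimal self-contained sketch of the Python 3 path only. The helper name `make_struct_parser` and its `encoding` parameter are hypothetical, introduced just for illustration. It assumes the input uses a single-byte encoding such as ASCII, because struct counts bytes rather than characters, so multi-byte characters would shift the field boundaries.

```python
import struct

def make_struct_parser(fieldwidths, encoding='ascii'):
    """Hypothetical helper: build a struct-based parser for fixed-width text."""
    # 's' is a byte-string field, 'x' is a pad byte that is skipped on unpack
    fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                         for fw in fieldwidths)
    fieldstruct = struct.Struct(fmtstring)
    unpack = fieldstruct.unpack_from

    def parse(line):
        # struct works on bytes: encode the input, then decode each field
        return tuple(s.decode(encoding) for s in unpack(line.encode(encoding)))

    # optional informational attributes
    parse.size = fieldstruct.size
    parse.fmtstring = fmtstring
    return parse

parse = make_struct_parser((2, -10, 24))
fields = parse('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n')
print(fields)  # ('AB', 'MNOPQRSTUVWXYZ0123456789')
```

Wrapping the setup in a function keeps the format string and the compiled Struct together, so the same parser can be reused for every line of a file.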
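Here is a way to do it with string slices, which you were considering but were concerned might get too ugly. The nice thing about it, besides not being all that ugly, is that it works unchanged in both Python 2 and 3 and can handle Unicode strings. Speed-wise it is, of course, slower than the versions based on the struct module, but it could be sped up slightly by removing the ability to have padding fields.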
    try:
        from itertools import izip_longest  # added in Py 2.6
    except ImportError:
        from itertools import zip_longest as izip_longest  # name change in Py 3.x

    try:
        from itertools import accumulate  # added in Py 3.2
    except ImportError:
        def accumulate(iterable):
            'Return running totals (simplified version).'
            it = iter(iterable)  # accept any iterable, not only iterators
            total = next(it)
            yield total
            for value in it:
                total += value
                yield total

    def make_parser(fieldwidths):
        # cumulative end offset of every field, padding included
        cuts = tuple(cut for cut in accumulate(abs(fw) for fw in fieldwidths))
        pads = tuple(fw < 0 for fw in fieldwidths)  # bool values for padding fields
        flds = tuple(izip_longest(pads, (0,)+cuts, cuts))[:-1]  # ignore final one
        parse = lambda line: tuple(line[i:j] for pad, i, j in flds if not pad)
        # optional informational function attributes
        parse.size = sum(abs(fw) for fw in fieldwidths)
        parse.fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                                   for fw in fieldwidths)
        return parse

    line = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n'
    fieldwidths = (2, -10, 24)  # negative widths represent ignored padding fields
    parse = make_parser(fieldwidths)
    fields = parse(line)
    print('format: {!r}, rec size: {} chars'.format(parse.fmtstring, parse.size))
    print('fields: {}'.format(fields))

Output:
    format: '2s 10x 24s', rec size: 36 chars
    fields: ('AB', 'MNOPQRSTUVWXYZ0123456789')
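The slight speed-up mentioned for the slice-based approach, dropping support for padding fields, might look roughly like the sketch below. The name `make_simple_parser` is hypothetical, the code targets Python 3.2+ only, and the gain has not been measured here; it simply removes the per-field padding check, so every width is returned as a field and the caller discards the ones it does not need.

```python
from itertools import accumulate  # Python 3.2+

def make_simple_parser(fieldwidths):
    """Hypothetical variant: all widths are real fields, no padding support."""
    ends = tuple(accumulate(fieldwidths))   # cumulative end offsets
    starts = (0,) + ends[:-1]               # each field starts where the last ended
    spans = tuple(zip(starts, ends))
    return lambda line: tuple(line[i:j] for i, j in spans)

parse = make_simple_parser((2, 10, 24))
fields = parse('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\n')
print(fields)  # ('AB', 'CDEFGHIJKL', 'MNOPQRSTUVWXYZ0123456789')
```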


