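Many of you have probably come across the problem below. As a Python beginner I found the implementation hard to follow, and although the code is available online, some of the details still confused me. So I'm sharing what I learned, in the hope that it helps other beginners who run into the same problem. I won't spend much time explaining the problem itself; the focus is the code.
1. Problem description: You want to add formatting to a plain-text file. Suppose you need to use a file as a web page, but the person who gave it to you couldn't be bothered to write it in HTML. You don't want to add all the required tags by hand, so you want to write a program that does the job automatically.
2. General idea: To insert tags into the text, you first have to split the text into blocks, producing a collection of text blocks.
3. Preparation: a plain-text document (test_input.txt)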
Welcome to World Wide Spam, Inc.

These are the corporate web pages of World Wide Spam, Inc. We hope you find your stay enjoyable, and that you will sample many of our products.

A short history of the company

World Wide Spam was started in the summer of 2000. The business concept was to ride the dot-com wave and to make money both through bulk email and by selling canned meat online.

After receiving several complaints from customers who weren’t satisfied by their bulk email, World Wide Spam altered their profile, and focused 100% on canned goods. Today, they rank as the world’s 13,892nd online supplier of SPAM.

Destinations

From this page you may visit several of our interesting web pages:

- What is SPAM? (http://wwspam.fu/whatisspam)

- How do they make it? (http://wwspam.fu/howtomakeit)

- Why should I eat it? (http://wwspam.fu/whyeatit)

How to get in touch with us

You can get in touch with us in many ways: By phone (555-1234), by email (wwspam@wwspam.fu) or by visiting our customer feedback page (http://wwspam.fu/feedback).
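One of the first things to do is split the text into paragraphs, that is, text blocks. The code below treats a blank line as the boundary between two blocks.
A text block generator (util.py). (Read this carefully: it means the code turns the text into a series of text blocks.)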
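def lines(file):
    for line in file:
        yield line
    yield '\n'  # add a blank line at the end so the last block gets terminated

def blocks(file):
    block = []  # an empty list that collects the lines of the current block
    for line in lines(file):
        if line.strip():  # true when the line has non-whitespace content
            block.append(line)
        elif block:  # a blank line after some content: the block is complete
            yield ''.join(block).strip()  # join the collected lines into one string
            block = []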
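To check my own understanding, I ran the generator on a small in-memory file. This is only a sketch: the `sample` text and the use of `io.StringIO` are my stand-ins for opening test_input.txt, and the two functions are repeated here so the snippet runs on its own.

```python
import io

def lines(file):
    for line in file:
        yield line
    yield '\n'  # sentinel blank line that terminates the last block

def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

# Made-up sample: three blocks, the second spans two lines,
# and two blank lines in a row separate the second and third.
sample = io.StringIO(
    "Welcome to World Wide Spam, Inc.\n"
    "\n"
    "These are the corporate web pages\n"
    "of World Wide Spam, Inc.\n"
    "\n"
    "\n"
    "Destinations\n"
)

result = list(blocks(sample))
for number, block in enumerate(result, start=1):
    print(number, repr(block))
# 1 'Welcome to World Wide Spam, Inc.'
# 2 'These are the corporate web pages\nof World Wide Spam, Inc.'
# 3 'Destinations'
```

Note that the two consecutive blank lines do not produce an empty block: on the second blank line `block` is already empty, so the `elif block:` branch is skipped.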
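As a beginner (mostly because I'm slow~), this code really did take me a long time to understand (I'm almost embarrassed to admit it).
A step-by-step explanation: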
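First, lines(file) yields the file one line at a time (not one character at a time), followed by one extra '\n'. blocks appends each non-blank line to the list with block.append(line). When it reaches a blank line, it joins the collected lines into a single string with yield ''.join(block).strip(); that string is one text block. It then empties the list and carries on with the loop. So each step of iterating over blocks(file) gives you a "clean" string with no newlines or spaces at either end.
Second, the condition if line.strip(): if the line contains anything other than whitespace, it is appended. If the line is blank (only a newline or spaces) and the list is not empty, the code runs yield ''.join(block).strip(), and that also marks the end of the current block. This explains the yield '\n' in lines(file): it guarantees there is a blank line after the last block, so that block is yielded too, even when the file does not end with an empty line. I have to say, yield really is magical.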
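To convince myself that the extra '\n' really matters, I compared the two cases side by side. This is a sketch under my own assumptions: `collect_blocks` is a hypothetical helper that holds the same loop as blocks() but takes the lines directly, `text` is a made-up input with no blank line at the end, and `itertools.chain` is just my way of appending the sentinel.

```python
import io
from itertools import chain

def collect_blocks(line_source):
    """Group non-blank lines into blocks, using the same loop as blocks()."""
    block = []
    for line in line_source:
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

text = "first block\n\nlast block\n"  # made-up input, no blank line at the end

# Without the sentinel: iterate over the file's lines directly.
without_sentinel = list(collect_blocks(io.StringIO(text)))

# With the sentinel: one extra '\n' after the real lines, as lines() does.
with_sentinel = list(collect_blocks(chain(io.StringIO(text), ['\n'])))

print(without_sentinel)  # ['first block']
print(with_sentinel)     # ['first block', 'last block']
```

Without the sentinel the loop ends while 'last block' is still sitting in the list, so it is never yielded.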
That's all for now.



