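You can use `extract()` to remove the unwanted tags first and then get the text. However, this keeps every `'\n'` and space, so you need a little extra work to remove them.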
    from bs4 import BeautifulSoup as BS

    data = '''<span> I Like <span > to punch </span> your face <span>'''

    soup = BS(data, 'html.parser')

    # The first <span> in the document is the outer one.
    external_span = soup.find('span')

    print("1 HTML:", external_span)
    print("1 TEXT:", external_span.text.strip())

    # Find the first nested <span> and remove it from the tree.
    unwanted = external_span.find('span')
    unwanted.extract()

    print("2 HTML:", external_span)
    print("2 TEXT:", external_span.text.strip())

Result:

    1 HTML: <span> I Like <span > to punch </span> your face <span></span></span>
    1 TEXT: I Like to punch your face
    2 HTML: <span> I Like your face <span></span></span>
    2 TEXT: I Like your face
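The leftover newlines and spaces mentioned above can be collapsed with plain string methods. This is only a minimal sketch, assuming the text has already been pulled out with `.text`; the helper name `normalize_whitespace` and the sample string are made up for illustration.

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so joining yields clean text.
    return " ".join(text.split())


# Hypothetical sample resembling what .text can return after extract()
raw = " I Like \n\n   your face \n"
print(normalize_whitespace(raw))  # I Like your face
```

Unlike `strip()`, which only trims the two ends, this also cleans up the gap left in the middle of the text where the removed tag used to be.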
Alternatively, you can skip every `Tag` object inside the outer span and keep only the `NavigableString` objects (the plain text in the HTML).
    import bs4
    from bs4 import BeautifulSoup as BS

    data = '''<span> I Like <span > to punch </span> your face <span>'''

    soup = BS(data, 'html.parser')

    external_span = soup.find('span')

    text = []

    # Iterating over a tag yields its direct children only.
    for x in external_span:
        # Keep plain text nodes; nested tags are skipped.
        if isinstance(x, bs4.element.NavigableString):
            text.append(x.strip())

    print(" ".join(text))

Result:

    I Like your face
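If you need this in more than one place, the same loop can be wrapped in a small function. The sketch below is one possible way to do it, not the only one: the name `direct_text` is hypothetical, and it additionally drops whitespace-only text nodes so the joined result has no doubled spaces.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def direct_text(tag):
    """Join the text nodes that are direct children of `tag`, ignoring nested tags."""
    pieces = []
    for child in tag.children:
        if isinstance(child, NavigableString):
            piece = child.strip()
            if piece:  # skip whitespace-only nodes between tags
                pieces.append(piece)
    return " ".join(pieces)


data = '''<span> I Like <span > to punch </span> your face <span>'''
soup = BeautifulSoup(data, 'html.parser')
print(direct_text(soup.find('span')))  # I Like your face
```

Note that this approach leaves the parsed tree untouched, whereas `extract()` modifies it.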



