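Update: this now uses a more generic approach (see the edit history for the original answer).

You can extract the outer div's own text by testing whether each of its children is an instance of `NavigableString`.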

    from bs4 import BeautifulSoup, NavigableString

    html = '''<div id="1">
    <div id="2">
        this is the text i do NOT want
    </div>
        this is the text i want here
    </div>'''
    soup = BeautifulSoup(html)
    outer = soup.div
    inner_text = [element for element in outer if isinstance(element, NavigableString)]

This gives a list of the strings that sit directly inside the outer div element.

    >>> inner_text
    [u'\n', u'\n    this is the text i want here\n']
    >>> ''.join(inner_text)
    u'\n\n    this is the text i want here\n'

For the second example:

    html = '''<div id="1">
        this is the text i want here
    </div>'''
    soup2 = BeautifulSoup(html)
    outer = soup2.div
    inner_text = [element for element in outer if isinstance(element, NavigableString)]

    >>> inner_text
    [u'\n    this is the text i want here\n']

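This also works in other cases, for example when the outer div's text comes before any child tag, sits between child tags, is split into several text elements, or is not there at all.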
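As a rough check of those cases, here is a minimal sketch that wraps the same `isinstance` test in a helper. The helper name `direct_text`, the sample markup in `cases`, and the explicit `"html.parser"` argument are my own assumptions for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Join the strings that are direct children of `tag`.

    Hypothetical helper: text nested inside child tags is skipped because
    those children are Tag objects, not NavigableString objects.
    """
    return "".join(
        child for child in tag.children if isinstance(child, NavigableString)
    )


# Made-up markup, one snippet per case listed above.
cases = {
    "before": '<div id="1">wanted<div id="2">not wanted</div></div>',
    "between": '<div id="1"><p>no</p>wanted<p>no</p></div>',
    "multiple": '<div id="1">one <p>no</p>two</div>',
    "none": '<div id="1"><div id="2">not wanted</div></div>',
}

results = {
    name: direct_text(BeautifulSoup(markup, "html.parser").div)
    for name, markup in cases.items()
}

for name, text in results.items():
    print(name, repr(text))
```

One caveat: `Comment` and the other special string classes in bs4 subclass `NavigableString`, so an HTML comment directly inside the div would also pass this test. If that matters, filtering on `type(child) is NavigableString` is one possible way to exclude them.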



