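You can use a CSS selector to find the data you need. In your case, `div > h3 ~ div` will find all `div` elements that are directly inside a `div` element and are preceded by a sibling `h3` element.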

    import bs4

    page = """
    <div ><div ><div >
      <h3>HEADING</h3>
      <div><i ></i> NAME</div>
      <div><i ></i> MOBILE</div>
      <div><i ></i> NUMBER</div>
      <div><i ></i> XYZ_ADDRESS</div>
    </div></div></div>
    """

    soup = bs4.BeautifulSoup(page, 'lxml')

    # find all div elements that are inside a div element
    # and are preceded by an h3 element
    selector = 'div > h3 ~ div'

    # find elements that contain the data we want
    found = soup.select(selector)

    # Extract data from the found elements
    data = [x.text.split(';')[-1].strip() for x in found]

    for x in data:
        print(x)

Edit: To scrape the text in the heading:
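
    heading = soup.find('h3')
    heading_data = heading.text
    print(heading_data)

Edit: Alternatively, you can get the heading and the other `div` elements in one go with a selector such as `div.col-lg-10 > *`. This finds every element that is a direct child of a `div` element of class `col-lg-10`.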

    soup = bs4.BeautifulSoup(page, 'lxml')

    # find all elements directly inside a div element of class col-lg-10
    # (assumes the div that wraps the h3 carries class "col-lg-10" in the real page)
    selector = 'div.col-lg-10 > *'

    # find elements that contain the data we want
    found = soup.select(selector)

    # Extract data from the found elements
    data = [x.text.split(';')[-1].strip() for x in found]

    for x in data:
        print(x)
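
To compare the two selectors side by side, here is a minimal self-contained sketch. It rests on two assumptions that are not part of the answer above: the `class="col-lg-10"` attribute on the innermost `div` is added purely for illustration, and Python's built-in `html.parser` is used in place of `lxml` so no extra parser is needed. The sample HTML is also shortened to two data rows.

```python
import bs4

# Assumption: class="col-lg-10" on the innermost div is added for illustration only.
page = """
<div><div><div class="col-lg-10">
  <h3>HEADING</h3>
  <div><i></i> NAME</div>
  <div><i></i> MOBILE</div>
</div></div></div>
"""

# html.parser ships with Python; swapped in here instead of lxml
soup = bs4.BeautifulSoup(page, "html.parser")

# div elements that follow an h3 sibling, where that h3 is a direct child of a div
siblings = [el.get_text(strip=True) for el in soup.select("div > h3 ~ div")]

# every direct child element of the div with class col-lg-10, heading included
children = [el.get_text(strip=True) for el in soup.select("div.col-lg-10 > *")]

print(siblings)
print(children)
```

The first selector skips the heading because `~` only matches siblings that come after the `h3`, while `> *` returns the `h3` together with the `div` elements that follow it.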


