Finding specific links with BeautifulSoup

Interview Q&A. Updated: 2026-04-03 20:22:34. Published: 1572 days ago.

First, set up a test document and parse it with BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
 <body>
  <div>
   <a href="something">
    yep
   </a>
  </div>
  <div>
   <a href="http://www.nhl.com/ice/boxscore.htm?id=3">
    somelink
   </a>
  </div>
  <a href="http://www.nhl.com/ice/boxscore.htm?id=7">
   another
  </a>
 </body>
</html>
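The session above uses the legacy BeautifulSoup 3 import and the Python 2 print statement. As a rough sketch of the same setup on Python 3, assuming the `bs4` package (beautifulsoup4) is installed, the document can be parsed and its links listed as follows. The `hrefs` variable name and the choice of the built-in "html.parser" backend are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Same test document as in the session above, split across lines for readability.
doc = (
    '<html><body>'
    '<div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>'
    '</body></html>'
)

# Name the parser explicitly; "html.parser" ships with the standard library.
soup = BeautifulSoup(doc, "html.parser")

# Collect the href attribute of every <a> tag, in document order.
hrefs = [a.get("href") for a in soup.find_all("a")]
print(hrefs)
```

In `bs4`, `find_all` is the current spelling of the older `findAll` method.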

Next, we can search for all <a> tags whose href attribute starts with http://www.nhl.com/ice/boxscore.htm?id=. You can use a regular expression for this:

>>> import re
>>> # The "?" is escaped so it matches a literal question mark instead of making the "m" optional.
>>> soup.findAll('a', href=re.compile(r'^http://www.nhl.com/ice/boxscore.htm\?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]
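One detail worth checking in a prefix pattern like this is that `?` is a regex metacharacter: left unescaped, `htm?id=` means "ht, an optional m, then id=", which never matches the literal `?` in the URL. The following is a minimal sketch using only the standard `re` module; the names `PREFIX`, `unescaped`, and `escaped` are illustrative, and the href strings are copied from the test document.

```python
import re

PREFIX = "http://www.nhl.com/ice/boxscore.htm?id="

# Unescaped: "m?" is read as "an optional m", so the literal "?" can never match.
unescaped = re.compile("^" + PREFIX)

# re.escape() backslash-escapes regex metacharacters, so the prefix is matched literally.
escaped = re.compile("^" + re.escape(PREFIX))

hrefs = [
    "something",
    "http://www.nhl.com/ice/boxscore.htm?id=3",
    "http://www.nhl.com/ice/boxscore.htm?id=7",
]

matches_unescaped = [h for h in hrefs if unescaped.search(h)]
matches_escaped = [h for h in hrefs if escaped.search(h)]

print(matches_unescaped)
print(matches_escaped)
```

A pattern built this way can be passed as the `href=` argument when searching for tags, so the prefix does not have to be escaped by hand.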

Article URL: https://www.mshxw.com/it/661060.html
