栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 面试经验 > 面试问答

如何使用php从HTML提取img src,标题和alt?

面试问答 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

如何使用php从HTML提取img src,标题和alt?

编辑:现在我知道了

使用正则表达式解决此类问题不是一个好主意,]并且很可能导致无法维护和不可靠的代码。最好使用HTML解析器。

正则表达式解决方案

在这种情况下,最好将流程分为两部分:

  • 获取所有的img标签
  • 提取他们的元数据

我将假设您的文档不是xHTML严格的,因此您不能使用XML解析器。带有此网页源代码的EG:

preg_match_all('/<img[^>]+>/i',$html, $result);print_r($result);Array(    [0] => Array        ( [0] => <img src="/Content/Img/stackoverflow-logo-250.png" width="250" height="70" alt="logo link to homepage" /> [1] => <img  src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" /> [2] => <img  src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" /> [3] => <img src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG" height=32 width=32 alt="gravatar image" /> [4] => <img  src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />[...]        ))

然后,我们使用循环获取所有img标签属性:

$img = array();foreach( $result as $img_tag){    preg_match_all('/(alt|title|src)=("[^"]*")/i',$img_tag, $img[$img_tag]);}print_r($img);Array(    [<img src="/Content/Img/stackoverflow-logo-250.png" width="250" height="70" alt="logo link to homepage" />] => Array        ( [0] => Array     (         [0] => src="/Content/Img/stackoverflow-logo-250.png"         [1] => alt="logo link to homepage"     ) [1] => Array     (         [0] => src         [1] => alt     ) [2] => Array     (         [0] => "/Content/Img/stackoverflow-logo-250.png"         [1] => "logo link to homepage"     )        )    [<img  src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />] => Array        ( [0] => Array     (         [0] => src="/content/img/vote-arrow-up.png"         [1] => alt="vote up"         [2] => title="This was helpful (click again to undo)"     ) [1] => Array     (         [0] => src         [1] => alt         [2] => title     ) [2] => Array     (         [0] => "/content/img/vote-arrow-up.png"         [1] => "vote up"         [2] => "This was helpful (click again to undo)"     )        )    [<img  src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" />] => Array        ( [0] => Array     (         [0] => src="/content/img/vote-arrow-down.png"         [1] => alt="vote down"         [2] => title="This was not helpful (click again to undo)"     ) [1] => Array     (         [0] => src         [1] => alt         [2] => title     ) [2] => Array     (         [0] => "/content/img/vote-arrow-down.png"         [1] => "vote down"         [2] => "This was not helpful (click again to undo)"     )        )    [<img src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG" height=32 width=32 alt="gravatar image" />] => Array        ( [0] => Array     (         [0] => src="http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"         [1] => alt="gravatar image"     ) [1] => Array     (         [0] => src         [1] => alt     ) [2] => Array     (         [0] => "http://www.gravatar.com/avatar/df299babc56f0a79678e567e87a09c31?s=32&d=identicon&r=PG"         [1] => "gravatar image"     )        )   [..]        ))

正则表达式占用大量CPU,因此您可能需要缓存此页面。如果没有缓存系统,则可以使用ob_start进行调整,并从文本文件加载/保存。

这些东西如何工作?

首先,我们使用preg_ match_ all,该函数获取与模式匹配的每个字符串并将其输出到它的第三个参数中。

正则表达式:

<img[^>]+>

我们将其应用于所有html网页。可以将其读取为 每个以“

<img
” 开头,包含非“>”字符并以>结束的字符串

(alt|title|src)=("[^"]*")

我们先后将其应用于每个img标签。可以将其读为 以“ alt”,“ title”或“ src”开头的每个字符串,然后是“
=“,然后是“”,一堆不是“”并以“”结尾的东西隔离()之间的子字符串

最后,每次您想处理正则表达式时,都拥有快速测试它们的好工具。检查此在线正则表达式测试仪。

编辑:回答第一个评论。

的确,我没有想到使用单引号的人(希望很少)。

好吧,如果仅使用’,只需将所有的’替换为’。

如果您混合两者。然后尝试使用(“ |’)代替,或使用”和[^ø]代替[^“]。



转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/390611.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号