clone element with beautifulsoup

There is no native clone function in BeautifulSoup in versions before 4.4
(released July 2015); you’d have to create a deep copy yourself, which is
tricky as each element maintains links to the rest of the tree.

To clone an element and all its descendants, you’d have to copy all attributes
and reset their parent-child relationships; this has to happen recursively.
This is best done by not copying the relationship attributes and instead
re-seating each recursively cloned element:

    from bs4 import Tag, NavigableString

    def clone(el):
        if isinstance(el, NavigableString):
            return type(el)(el)

        copy = Tag(None, el.builder, el.name, el.namespace, el.nsprefix)
        # work around bug where there is no builder set
        # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
        copy.attrs = dict(el.attrs)
        for attr in ('can_be_empty_element', 'hidden'):
            setattr(copy, attr, getattr(el, attr))
        for child in el.contents:
            copy.append(clone(child))
        return copy

This method is somewhat sensitive to the BeautifulSoup version in use; I
tested it with 4.3, and future versions may add attributes that need to be
copied too.
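
To see why the parent links have to be rebuilt rather than copied, here is a minimal sketch of the same re-seating idea on a toy tree, independent of BeautifulSoup. The `Node` class and the `clone_node` function are hypothetical names made up for this illustration; they are not part of bs4.

```python
class Node:
    """Hypothetical tree node that keeps a back-reference to its parent."""

    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.parent = None
        self.children = []

    def append(self, child):
        # Re-seat the child: detach it from any old parent, then attach it here.
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)


def clone_node(node):
    # Copy only the plain data; the relationship attribute `parent` is
    # deliberately left unset on the new node.
    copy = Node(node.name, node.attrs)
    for child in node.children:
        # append() points each cloned child at the fresh copy.
        copy.append(clone_node(child))
    return copy


root = Node("body")
div = Node("div", {"id": "someid"})
div.append(Node("p"))
root.append(div)

div_copy = clone_node(div)
```

The copy comes back detached (its `parent` is `None`), while every cloned child points at its cloned parent, so the original tree is left untouched.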

You could also monkeypatch this functionality into BeautifulSoup:

    from bs4 import Tag, NavigableString

    def tag_clone(self):
        copy = type(self)(None, self.builder, self.name, self.namespace, self.nsprefix)
        # work around bug where there is no builder set
        # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
        copy.attrs = dict(self.attrs)
        for attr in ('can_be_empty_element', 'hidden'):
            setattr(copy, attr, getattr(self, attr))
        for child in self.contents:
            copy.append(child.clone())
        return copy

    Tag.clone = tag_clone
    NavigableString.clone = lambda self: type(self)(self)

letting you call .clone() on elements directly:

    document2.body.append(document1.find('div', id='someid').clone())

My feature request to the BeautifulSoup project was accepted and tweaked to
use the copy.copy() function; now that BeautifulSoup 4.4 is released, you can
use that version (or newer) and do:

    import copy

    document2.body.append(copy.copy(document1.find('div', id='someid')))
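
For a self-contained run of the `copy.copy()` approach, something like the following should work with BeautifulSoup 4.4 or newer. The two HTML snippets and the choice of the built-in `'html.parser'` are assumptions made for this sketch, not part of the original question.

```python
import copy

from bs4 import BeautifulSoup

# Sample documents invented for this illustration.
document1 = BeautifulSoup('<div id="someid"><p>Hello</p></div>', 'html.parser')
document2 = BeautifulSoup('<html><body></body></html>', 'html.parser')

source = document1.find('div', id='someid')

# copy.copy() on a Tag returns a new Tag that is not attached to any tree.
duplicate = copy.copy(source)

# Appending the copy leaves the original element in document1.
document2.body.append(duplicate)
```

Appending `source` itself instead of the copy would move the element out of `document1`, which is the behaviour the copy is meant to avoid.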

