栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 面试经验 > 面试问答

在集合中查找重复元素并将其分组的快速算法是什么?

面试问答 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

在集合中查找重复元素并将其分组的快速算法是什么?

您可以使用从代表性元素到其他元素的列表/向量/双端队列的映射。这要求插入容器中的比较相对较少,这意味着您可以遍历结果组,而不必执行任何比较。

此示例始终将第一个代表性元素插入映射的双端队列存储中,因为这从逻辑上简化了通过组的后续迭代,但是,如果此重复证明存在问题,则仅执行一次

push_back
仅会很容易
if(!ins_pair.second)

typedef std::map<Type, std::deque<Type> > Storage;void Insert(Storage& s, const Type& t){    std::pair<Storage::iterator, bool> ins_pair( s.insert(std::make_pair(t, std::deque<Type>())) );    ins_pair.first->second.push_back(t);}

这样,遍历各个组就变得相对简单且便宜:

void Iterate(const Storage& s){    for (Storage::const_iterator i = s.begin(); i != s.end(); ++i)    {        for (std::deque<Type>::const_iterator j = i->second.begin(); j != i->second.end(); ++j)        { // do something with *j        }    }}

I performed some experiments for comparison and object counts. In a test with
100000 objects in random order forming 50000 groups (i.e. and average of 2
objects per group) the above method cost the following number of comparisons
and copies:

1630674 comparisons, 443290 copies

(I tried bringing the number of copies down, but only really managed to at the
expense of comparisons which seem to be the higher cost operation in your
scenario.)

Using a multimap, and retaining the previous element in the final iteration to
detect the group transitions cost this:

1756208 comparisons, 100000 copies

Using a single list and popping the front element and performing a linear
search for other group members cost:

1885879088 comparisons, 100000 copies

Yes, that’s ~1.9b comparisons compared to ~1.6m for my best method. To get the
list method to perform anywhere near an optimal number of comparisons it would
have to be sorted and this is going to cost a similar number of comparisons as
building an inherently ordered container would in the first place.

Edit

I took your posted pre and ran the implied algorithm (I had to make some
assumptions about the pre as there as some assumed definitions) over the same
test data set as I used before and I counted:

1885879088 comparisons, 420939 copies

i.e. exactly the same number of comparisons as my dumb list algorithm, but
with more copies. I think that that means we using essentially similar
algorithms for this case. I can’t see any evidence of an alternative sort
order, but it looks like you want a list of the groups which contain more than
one equivalent elements. This can be simply achieved in my

Iterate
function
by adding in
if (i->size > 1)
clause.

I still can’t see any evidence that building a sorted container such as this
map of deques isn’t a good (even if not optimal) strategy.



转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/394792.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号