在集合中查找重复元素并将其分组的快速算法是什么？

您可以使用从代表性元素到其他元素的列表/向量/双端队列的映射。这要求插入容器中的比较相对较少，这意味着您可以遍历结果组，而不必执行任何比较。

此示例始终将第一个代表性元素插入映射的双端队列存储中，因为这从逻辑上简化了通过组的后续迭代，但是，如果此重复证明存在问题，则仅执行一次

push_back

仅会很容易

if(!ins_pair.second)

。

typedef std::map<Type, std::deque<Type> > Storage;void Insert(Storage& s, const Type& t){    std::pair<Storage::iterator, bool> ins_pair( s.insert(std::make_pair(t, std::deque<Type>())) );    ins_pair.first->second.push_back(t);}

这样，遍历各个组就变得相对简单且便宜：

void Iterate(const Storage& s){    for (Storage::const_iterator i = s.begin(); i != s.end(); ++i)    {        for (std::deque<Type>::const_iterator j = i->second.begin(); j != i->second.end(); ++j)        { // do something with *j        }    }}

I performed some experiments for comparison and object counts. In a test with
100000 objects in random order forming 50000 groups (i.e. and average of 2
objects per group) the above method cost the following number of comparisons
and copies:

1630674 comparisons, 443290 copies

(I tried bringing the number of copies down, but only really managed to at the
expense of comparisons which seem to be the higher cost operation in your
scenario.)

Using a multimap, and retaining the previous element in the final iteration to
detect the group transitions cost this:

1756208 comparisons, 100000 copies

Using a single list and popping the front element and performing a linear
search for other group members cost:

1885879088 comparisons, 100000 copies

Yes, that’s ~1.9b comparisons compared to ~1.6m for my best method. To get the
list method to perform anywhere near an optimal number of comparisons it would
have to be sorted and this is going to cost a similar number of comparisons as
building an inherently ordered container would in the first place.

Edit

I took your posted pre and ran the implied algorithm (I had to make some
assumptions about the pre as there as some assumed definitions) over the same
test data set as I used before and I counted:

1885879088 comparisons, 420939 copies

i.e. exactly the same number of comparisons as my dumb list algorithm, but
with more copies. I think that that means we using essentially similar
algorithms for this case. I can’t see any evidence of an alternative sort
order, but it looks like you want a list of the groups which contain more than
one equivalent elements. This can be simply achieved in my

Iterate

function
by adding in

if (i->size > 1)

clause.

I still can’t see any evidence that building a sorted container such as this
map of deques isn’t a good (even if not optimal) strategy.

在集合中查找重复元素并将其分组的快速算法是什么？

面试问答相关栏目本月热门文章