In a project I’m currently working on I had to compare two Java Collections to see if they contained the same elements, ignoring element order. The usual place for such general Collection-related library functions is java.util.Collections, however a fitting method was not found there.

## Apache Commons Algorithm

An answer on Stack Overflow pointed me to a method in an Apache Commons library which did exactly what I wanted. The method is called `isEqualCollection`

and the implementation I looked at can be found in Apache Commons Collections 4.0. The Apache library uses this algorithm to compare two collections A and B:

- Test that both A and B have equal size, if the sizes differ then A is not equal to B and we are done.
- Build a frequency table for collection A.
- Build a frequency table for collection B.
- Test that both frequency tables have the same number of rows, if not then A is not equal to B and we are done.
- For each element in the frequency table of A, check that the element has the same frequency as in B (by lookup in the frequency table of B).

## Improved Algorithm

**Disclaimer** *This article focuses on the Apache Commons algorithm. I have not looked at any other implementations. I am sure that others have already thought about the same improvements. You should see this only as an exercise I did for fun, ***not** an optimal implementation by any means. A complete analysis would have required much more work.

The frequency tables give the number of times an element x occurs in each collection. We can improve the memory usage of the algorithm by using a single frequency table rather than two. Let `fA(x)`

be the number of occurrences of x in A, and let `fB(x)`

be the number of occurrences of x in B. We only need to compute `d(x) = fA(x)-fB(x)`

. If there is any x for which `d(x)`

is not zero, then x occurs more times in A (`d(x) > 0`

) or B (`d(x) < 0`

). The modified algorithm stores `d(x)`

in a frequency table which is initialized by counting occurrences of elements in A, as in step 2 above, then for each element in B we subtract one from the frequency count of the element.

Using a single frequency table still requires iteration over both collections in order to populate the frequency table correctly, but we now have an easy way to add add a fail-fast condition while iterating over elements in B. If we notice that the frequency of any element drops below zero then we know that the element occurs more often in B. This new fail-fast condition may improve the average running time, but it does not affect the worst-case running time.

A slight improvement in worst-case running time is gained by the removal of the table lookup in the final loop. The final loop now only needs to iterate over frequency differences and see if any difference is non-zero.

Let’s review the improved algorithm:

- Test that both A and B have equal size, if the sizes differ then A is not equal to B and we are done.
- Build a frequency table for collection A.
- For each element x in B:
- subtract one from row x in the frequency table
- if the frequency of x is now negative, we are done (A is not equal to B)

- For each row x in the frequency difference table, check that the frequency difference is zero.

## Implementation

Below is my implementation of the improved algorithm. Note that I use a `HashMap`

to implement the frequency table, it may be replaced by an `IdentityHashMap`

if you want to use reference equality rather than object equality as the element equality condition.

```
boolean isEqualCollection(Collection<?> a, Collection<?> b) {
if (a.size() != b.size()) {
return false;
}
Map<Object,Integer> map = new HashMap<Object,Integer>();
for (Object o: a) {
Integer val = map.get(o);
int count;
if (val != null) {
count = val.intValue();
} else {
count = 0;
}
map.put(o, Integer.valueOf(count+1));
}
for (Object o: b) {
Integer val = map.get(o);
int count;
if (val != null) {
count = val.intValue();
if (count == 0) {
return false;
}
} else {
return false;
}
map.put(o, Integer.valueOf(count-1));
}
for (Integer count: map.values()) {
if (count.intValue() != 0) {
return false;
}
}
return true;
}
```

In conclusion, the above implementation is slightly more efficient than the Apache Commons implementation because it uses less memory (by using only one `HashMap`

, rather than two), avoids redundant map lookups (in the final loop), and has an additional fail-fast condition.