Hot questions for Operating MultiSet in Guava

Simplest way to iterate through a Multiset in the order of element frequency?

Question: Consider this example which prints out some device type stats. ("DeviceType" is an enum with a dozenish values.)

Multiset<DeviceType> histogram = getDeviceStats();
for (DeviceType type : histogram.elementSet()) {
    System.out.println(type + ": " + histogram.count(type));
}

What's the simplest, most elegant way to print the distinct elements in the order of their frequency (most common type first)?

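Answer: Use Multisets.copyHighestCountFirst(), which returns an immutable copy of the multiset whose iteration order puts the highest-count elements first. Applied to the example from the question: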

Multiset<DeviceType> histogram = getDeviceStats();
for (DeviceType type : Multisets.copyHighestCountFirst(histogram).elementSet()) {
    System.out.println(type + ": " + histogram.count(type));
}
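
As a variation on the loop above, here is a minimal sketch that iterates entrySet() instead of elementSet(). Each entry carries both the element and its count, so no second lookup on the original histogram is needed. The DeviceType values, the sample() data, and the class name are hypothetical stand-ins, since the question does not show the real enum or getDeviceStats().

```java
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;
import java.util.ArrayList;
import java.util.List;

public class FrequencyOrderSketch {
    // Hypothetical enum; the question's real DeviceType is not shown.
    enum DeviceType { PHONE, TABLET, DESKTOP }

    // Builds "element: count" lines, most frequent element first.
    static List<String> describe(Multiset<DeviceType> histogram) {
        List<String> lines = new ArrayList<>();
        for (Multiset.Entry<DeviceType> entry
                : Multisets.copyHighestCountFirst(histogram).entrySet()) {
            lines.add(entry.getElement() + ": " + entry.getCount());
        }
        return lines;
    }

    // Made-up sample data standing in for getDeviceStats().
    static Multiset<DeviceType> sample() {
        Multiset<DeviceType> histogram = HashMultiset.create();
        histogram.add(DeviceType.PHONE, 3);
        histogram.add(DeviceType.TABLET, 1);
        histogram.add(DeviceType.DESKTOP, 2);
        return histogram;
    }

    public static void main(String[] args) {
        // Prints:
        // PHONE: 3
        // DESKTOP: 2
        // TABLET: 1
        describe(sample()).forEach(System.out::println);
    }
}
```

Elements with equal counts keep the iteration order of the original multiset, so the order among ties depends on the Multiset implementation that was copied.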

Guava Multiset vs Map?

Question: My understanding is that a Multiset is a set with frequencies, but I can always use a Map to represent the frequencies. Is there any other reason to use a Multiset?

Answer: Advantages of a Multiset<E> over a Map<E, Integer>:

  1. No special code required when adding an element that is not already in the collection.
  2. Methods for handling the count of elements directly: count(E), add(E, int), etc.
  3. The intention of the code is clearer. A Multiset<E> obviously maps the elements to their counts. A Map<E, Integer> could map the elements to arbitrary integers.
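
To make the first two points concrete, here is a minimal sketch of the bookkeeping a Map<E, Integer> needs, using only the JDK. The corresponding Multiset calls are noted in the comments. The class name and the sample words are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MapCountingSketch {
    // Counts occurrences with a plain Map; merge() covers the
    // "element not yet present" case that needs special handling.
    static Map<String, Integer> countWords(String... words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // With a Multiset<String> this would be: multiset.add(word);
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("a", "b", "a");

        // get() returns null for a missing key, so a default is needed.
        // Multiset.count(element) returns 0 for absent elements.
        System.out.println(counts.getOrDefault("a", 0)); // prints 2
        System.out.println(counts.getOrDefault("z", 0)); // prints 0

        // Adding several occurrences at once.
        // Multiset equivalent: multiset.add("b", 3);
        counts.merge("b", 3, Integer::sum);
        System.out.println(counts.get("b")); // prints 4
    }
}
```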

How to use Guava's Multisets.toMultiset() when collecting a stream?

Question: I have a list of strings, where each string consists of letters separated by commas. I want to go through the list, split each string on the comma, count how many times each letter occurs, and store the result in a Multiset. Blank strings should be ignored, and the split parts should be trimmed. The multiset should be sorted by key.

The code below works, i.e., it produces the desired Multiset. However, I couldn't figure out how to use the proper collector method (Multisets.toMultiset()), so I resorted to a two-step solution with a temporary list variable, which I would like to eliminate.

I would appreciate it if someone could show me how I should have constructed the call to Multisets.toMultiset() in the collect step. I got stuck on defining the element function and the supplier function, and I couldn't even write code that compiled...

@Test
public void testIt() {
    List<String> temp = Stream.of("b, c", "a", "  ", "a, c")
            .filter(StringUtils::isNotBlank)
            .map(val -> val.split(","))
            .flatMap(Arrays::stream)
            .map(String::trim)
            .collect(Collectors.toList());

    Multiset<String> multiset = ImmutableSortedMultiset.copyOf(temp);

    System.out.println("As list: " + temp);
    System.out.println("As multiset: " + multiset);
    // Output is:
    // As list: [b, c, a, a, c]
    // As multiset: [a x 2, b, c x 2]
}

I'm using Guava 28.1. The example above also uses the StringUtils class from commons-lang3, version 3.9.

Answer: If you really want to omit the second copy step, there are several ways to do it:

1. ImmutableSortedMultiset already provides a collector:

.collect(ImmutableSortedMultiset.toImmutableSortedMultiset(Comparator.naturalOrder()));

2. Since you asked how to do it with Multisets::toMultiset:

.collect(Multisets.toMultiset(Function.identity(), i -> 1, TreeMultiset::create));

3. Or you can write your own Collector using the ImmutableSortedMultiset.Builder:

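.collect(Collector.of(
    ImmutableSortedMultiset::<String>naturalOrder,        // supplier: new builder using natural ordering
    ImmutableSortedMultiset.Builder::add,                  // accumulator: add one element to the builder
    (b1, b2) -> { b1.addAll(b2.build()); return b1; },     // combiner: merge two partial builders
    ImmutableSortedMultiset.Builder::build));              // finisher: build the immutable multiset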
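
For context, here is a minimal sketch that puts the first two options into a complete class built around the question's pipeline. The class and method names are hypothetical. The blank check uses a plain JDK expression as a stand-in for StringUtils::isNotBlank, so the sketch needs only Guava.

```java
import com.google.common.collect.ImmutableSortedMultiset;
import com.google.common.collect.Multiset;
import com.google.common.collect.Multisets;
import com.google.common.collect.TreeMultiset;
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.Function;
import java.util.stream.Stream;

public class ToMultisetSketch {
    // Shared pipeline: drop blank strings, split on commas, trim each part.
    static Stream<String> letters(String... inputs) {
        return Stream.of(inputs)
                .filter(val -> !val.trim().isEmpty()) // stand-in for StringUtils::isNotBlank
                .map(val -> val.split(","))
                .flatMap(Arrays::stream)
                .map(String::trim);
    }

    // Option 1: the collector provided by ImmutableSortedMultiset.
    static Multiset<String> viaImmutableCollector(String... inputs) {
        return letters(inputs).collect(
                ImmutableSortedMultiset.toImmutableSortedMultiset(
                        Comparator.<String>naturalOrder()));
    }

    // Option 2: Multisets.toMultiset with an element function (identity),
    // a count function (each stream element counts once), and a supplier
    // for the mutable, sorted result container.
    static Multiset<String> viaToMultiset(String... inputs) {
        return letters(inputs).collect(
                Multisets.toMultiset(Function.identity(), s -> 1, TreeMultiset::create));
    }

    public static void main(String[] args) {
        // Both lines print: [a x 2, b, c x 2]
        System.out.println(viaImmutableCollector("b, c", "a", "  ", "a, c"));
        System.out.println(viaToMultiset("b, c", "a", "  ", "a, c"));
    }
}
```

The main difference between the two is mutability: option 1 yields an immutable multiset, while option 2 returns a TreeMultiset that can still be modified afterwards.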