Best HashMap initial capacity while indexing a List


I have a list (List<T> list) and I want to index its objects by their ids using a map (HashMap<Integer, T> map). I always use list.size() as the initial capacity in the HashMap constructor, like in the code below. Is this the best initial capacity to use in this case?

Note: I'll never add more items to the map.

List<T> list = myList;
Map<Integer, T> map = new HashMap<Integer, T>(list.size());
for(T item : list) {
    map.put(item.getId(), item);
}

If you wish to avoid rehashing the HashMap, and you know that no other elements will be placed into the HashMap, then you must take into account the load factor as well as the initial capacity. The load factor for a HashMap defaults to 0.75.

The calculation to determine whether rehashing is necessary occurs whenever a new entry is added, e.g. put places a new key/value. So if you specify an initial capacity of list.size() and a load factor of 1, then it will rehash after the last put. So to prevent rehashing, use a load factor of 1 and a capacity of list.size() + 1.

EDIT

Looking at the HashMap source code, it will rehash only if the old size meets or exceeds the threshold; with a capacity of list.size() and a load factor of 1, the old size on the last put is list.size() - 1, which stays below the threshold, so it won't rehash on the last put. So it looks like a capacity of list.size() should be fine.

HashMap<Integer, T> map = new HashMap<Integer, T>(list.size(), 1.0f);

Here's the relevant piece of HashMap source code:

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Prepend the new entry to the linked list in this bucket
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // Resize only if the size *before* this put had already reached the threshold
    if (size++ >= threshold)
        resize(2 * table.length);
}
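
A minimal sketch of the pattern this answer arrives at, assuming (as in the question) that T exposes a getId() method:

Map<Integer, T> map = new HashMap<Integer, T>(list.size(), 1.0f); // with load factor 1, the threshold is at least list.size()
for (T item : list) {
    map.put(item.getId(), item);
}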


The term 'capacity' is misleading and isn't used in the way typically expected: the map will not actually hold that many entries before resizing.

By default the 'load factor' of a HashMap is 0.75, this means that when the number of entries in a HashMap reaches 75% of the capacity supplied, it will resize the array and rehash.

For example if I do:

Map<Integer, Integer> map = new HashMap<>(100);

When I am adding the 75th entry, the map will resize the internal Entry table to twice its current length (2 * table.length). So we can do a few things:

  1. Change the load factor - this could impact the performance of the map
  2. Set the initial capacity to list.size() / 0.75 + 1

The best option is the latter of the two, let me explain what's going on here:

list.size() / 0.75

This divides list.size() by the load factor, returning roughly a third more than list.size(); for example, if my list had a size of 100 it would return 133 (after truncation to int). We then add 1 to it because the map is resized once the number of entries reaches 75% of the initial capacity. So if we had a list with a size of 100 we would set the initial capacity to 134, which means adding all 100 entries from the list would not incur any resize of the map.

End result:

Map<Integer, Integer> map = new HashMap<>((int) (list.size() / 0.75) + 1);
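
For illustration, the same sizing rule wrapped in a small helper (a sketch; the method name and the ToIntFunction accessor are mine, not from the question):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical helper: index a list by id, sized so that the default load factor (0.75)
// does not trigger a resize while the initial entries are added.
static <T> Map<Integer, T> indexBy(List<T> list, ToIntFunction<T> idOf) {
    Map<Integer, T> map = new HashMap<>((int) (list.size() / 0.75) + 1);
    for (T item : list) {
        map.put(idOf.applyAsInt(item), item);
    }
    return map;
}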


Guava's Maps.newHashMapWithExpectedSize uses this helper method to calculate initial capacity for the default load factor of 0.75, based on some expected number of values:

/**
 * Returns a capacity that is sufficient to keep the map from being resized as
 * long as it grows no larger than expectedSize and the load factor is >= its
 * default (0.75).
 */
static int capacity(int expectedSize) {
    if (expectedSize < 3) {
        checkArgument(expectedSize >= 0);
        return expectedSize + 1;
    }
    if (expectedSize < Ints.MAX_POWER_OF_TWO) {
        return expectedSize + expectedSize / 3;
    }
    return Integer.MAX_VALUE; // any large value
}

reference: source

From the newHashMapWithExpectedSize documentation:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth. This behavior cannot be broadly guaranteed, but it is observed to be true for OpenJDK 1.6. It also can't be guaranteed that the method isn't inadvertently oversizing the returned map.
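
In practice, the indexing from the question could then lean on Guava for the sizing (a sketch; assumes Guava is on the classpath and that T exposes getId() as in the question). Note that for an expected size of 100, the helper above returns a capacity of 133:

Map<Integer, T> map = Maps.newHashMapWithExpectedSize(list.size());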


What you're doing is fine. In this way you're sure that the hash map has at least enough capacity for the initial values. If you have more information regarding the usage patterns of the hash map (example: is it updated frequently? are many new elements added frequently?), you might want to set a bigger initial capacity (for instance, list.size() * 2), but never lower. Use a profiler to determine if the initial capacity is falling short too soon.

UPDATE

Thanks to @PaulBellora for suggesting that the initial capacity should be set to (int)Math.ceil(list.size() / loadFactor) (typically, the default load factor is 0.75) in order to avoid an initial resize.
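
Put together, that suggestion looks something like this (a sketch, not code from the original answer):

float loadFactor = 0.75f; // HashMap's default load factor
int initialCapacity = (int) Math.ceil(list.size() / loadFactor);
Map<Integer, T> map = new HashMap<>(initialCapacity, loadFactor);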


According to the reference documentation of java.util.HashMap:

The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

This means, if you know in advance, how many entries the HashMap should store, you can prevent rehashing by choosing an appropriate initial capacity and load factor. However:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).


Comments
  • I would recommend: 1) Declare your variable as Map instead of HashMap; 2) Leave this kind of problem to the JVM, and if you notice with a profiler that it is costing you performance, then start evaluating it.
  • @LuiggiMendoza generally yes, agreed, but this is such a common use-case that we might as well get rid of the resizes
  • @Eric just look at the source and search for "resize(" . grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/… It's probably done the same way. Line 676
  • is it just me or doesn't anybody know that a load of 1.0 is a really bad idea?!
  • @rgettman: if you know how a hash map works internally, you would notice that you not only screw up your inserts but also your reads. All operations would become O(N) instead of O(1), because you'll have to jump from bucket to bucket due to collisions everywhere
  • You know that you set the WRONG ANSWER as the correct answer?
  • Using a load factor of 1 is a bad idea, considering that hash collisions will increase if there are fewer free buckets. It's a tradeoff between space and time, which is why the Java designers use a load factor of 0.75 as the default. If you are not sure about load factor internals, do not touch this default. Now, if you go with a load factor of 0.75 instead of 1, the capacity that shouldn't cause the map to rehash can be calculated with initialCapacity = (expected no. of elements / 0.75) + 1. Period.
  • Looking at the JDK source code, the actual table size is rounded up to the nearest power of 2. Also, re. your statement "By default the 'load factor' of a HashMap is 0.75, this means that when the number of entries in a HashMap reaches 75% of the capacity supplied, it will resize the array and rehash." - to be a bit pedantic, the resize only happens when the entries exceed (not reach) 75% of the capacity. So, for example, with a specified initial capacity of 64 and load factor of 0.5, you can put 32 entries in without resizing.
  • Also 100 / 0.75 = 133, not that it changes anything
  • +1. This is the simplest and the easiest solution for those who don't want to understand the internals of the map, and just want something that works as expected.
  • "the hash map has at least enough capacity for the initial values" - I don't think this is true with the default load factor of 0.75.
  • @PaulBellora the initial capacity is the same size as the one specified in the initialCapacity parameter. The load factor is a measure of how full the hash table is allowed to get before its capacity (initial or not) is automatically increased
  • Right, so with a load factor of 0.75 and an initial capacity of n, putting n values would cause it to resize.
  • @PaulBellora so you're suggesting that the initial capacity should be size()/.75 to avoid an initial resize? makes sense, I'll update my answer
  • @italo in that case both rgettman's answer and my own would be equivalent. Also, if you want to enforce the invariant that the hashmap never changes, maybe you should make it immutable using Collections.unmodifiableMap()