## What is the time complexity of adding and retrieving strings from a hashset?



One of the best ways to answer something like this is to dig into the implementation :)

Notwithstanding some of the optimization magic described in the header of `setobject.c`, adding an object into a set reuses the hash from strings where `hash()` has already been called once (recall, strings are immutable), or calls the type's hash implementation.

For Unicode/bytes objects, we end up via here at `_Py_HashBytes`, which has an optimization for small strings; otherwise it uses the compile-time configured hash function, all of which are naturally somewhat O(n)-ish. But again, this only happens once per string object.

For tuples, the hash implementation can be found here – apparently a simplified, non-cached xxHash.
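A small sketch of the practical consequences: a tuple's hash is derived from its elements' hashes and (unlike `str`) is recomputed on every `hash()` call, which also means a tuple is only hashable if all of its elements are.

```python
# Equal tuples hash equal; the hash is computed from the elements each time.
t = ("a", 1, (2, 3))
assert hash(t) == hash(("a", 1, (2, 3)))

# Any unhashable element makes the whole tuple unhashable:
try:
    hash(([1], 2))
except TypeError:
    pass  # lists are mutable, hence unhashable
else:
    raise AssertionError("expected TypeError")
```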

However, once the hash has been computed, the time complexity for sets should be around O(1).

*EDIT:* A quick, not very scientific benchmark:

```python
import time

def make_string(c, n):
    return c * n

def make_tuple(el, n):
    return (el,) * n

def hashtest(gen, n):
    # First compute how long generation alone takes
    gen_time = time.perf_counter()
    for x in range(n):
        gen()
    gen_time = time.perf_counter() - gen_time
    # Then compute how long hashing and generation takes
    hash_and_gen_time = time.perf_counter()
    for x in range(n):
        hash(gen())
    hash_and_gen_time = time.perf_counter() - hash_and_gen_time
    # Return the two
    return (hash_and_gen_time, gen_time)

for gen in (make_string, make_tuple):
    for obj_length in (10000, 20000, 40000):
        t = f"{gen.__name__} x {obj_length}"
        # Using `b'hello'.decode()` here to avoid any cached hash shenanigans
        hash_and_gen_time, gen_time = hashtest(
            lambda: gen(b"hello".decode(), obj_length), 10000
        )
        hash_time = hash_and_gen_time - gen_time
        print(t, hash_time, obj_length / hash_time)
```

outputs

```
make_string x 10000 0.23490356100000004 42570.66158311665
make_string x 20000 0.47143921999999994 42423.284172241765
make_string x 40000 0.942087403 42458.905482254915
make_tuple x 10000 0.45578034300000025 21940.393335480014
make_tuple x 20000 0.9328520900000008 21439.62608263008
make_tuple x 40000 1.8562772150000004 21548.505620158674
```

which basically says hashing sequences, be they strings or tuples, is linear time, yet hashing strings is a lot faster than hashing tuples.

*EDIT 2:* This demonstrates that strings and bytestrings cache their hashes:

```python
import time

s = ('x' * 500_000_000)

t0 = time.perf_counter()
a = hash(s)
t1 = time.perf_counter()
print(t1 - t0)

t0 = time.perf_counter()
b = hash(s)
t2 = time.perf_counter()
assert a == b
print(t2 - t0)
```

outputs

```
0.26157095399999997
1.201999999977943e-06
```


Wiki is your friend:

https://wiki.python.org/moin/TimeComplexity

For the operations above, it seems they are all O(1) for a `set`.
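As a quick sketch of what that table says for `set`, each of these operations is O(1) on average:

```python
s = set()
s.add("spam")           # add: O(1) average
assert "spam" in s      # membership test: O(1) average
s.discard("spam")       # discard: O(1) average
assert "spam" not in s
```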


Strictly speaking, it depends on the implementation of the hash set and the way you're using it (there may be cleverness that optimizes away some of the time in specialized circumstances), but in general, yes, you should expect that it takes O(n) time to hash a key for an insert or lookup, where n is the size of the key. Hash sets are usually assumed to be O(1), but there's an implicit assumption there that the keys are of fixed size and that hashing them is an O(1) operation (in other words, an assumption that the key size is negligible compared to the number of items in the set).

Optimizing the storage and retrieval of *really big* chunks of data is why databases are a thing. :)
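A minimal sketch of that caveat in CPython terms: looking up the *same* large string object is cheap because its hash is cached, while an equal-but-distinct string pays the O(n) hash (plus an O(n) equality check) again.

```python
key = "x" * 1_000_000
s = {key}

assert key in s                  # reuses key's cached hash: effectively O(1)
assert ("x" * 1_000_000) in s    # fresh object: O(n) hash + O(n) comparison
```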


Average case is O(1). However, the worst case is O(n), with n being the number of elements in the set. This case is caused by hash collisions. You can read more about it here: https://www.geeksforgeeks.org/internal-working-of-set-in-python/
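That worst case can be forced deliberately with a pathological `__hash__`. This hypothetical class makes every instance land in the same bucket, so membership tests degrade toward a linear scan:

```python
class BadHash:
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 42  # every instance collides into one bucket
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.v == other.v

s = {BadHash(i) for i in range(1000)}
assert BadHash(500) in s       # still correct, but probes collisions: ~O(n)
assert BadHash(1000) not in s
```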





##### Comments

- Short answer: no. Long answer: it depends on how hashing strings works in Python. After some research, I see that individual strings are immutable, and they store their hash value once it's been computed. That cuts down **drastically** on lookup times... and the algorithm Python uses is pretty cheap too...
- Thanks for the detailed answer! So my understanding now is: computing the hash for a string takes O(n) time, but this only happens once, during compile time (somewhat similar to the Java string pool). Thus it is safe to claim that we can ignore the length of the strings during set operations, hence O(1). Tuples are similar, and that's one of the reasons they need to be immutable.
- Almost. There are interned strings, which are mostly those encountered while parsing code, etc.; then there are regular immutable strings, for which `hash` is computed exactly once (but strings are not pooled). Tuples are immutable too, but according to a comment near the hashing function, caching their hashes has been found unnecessary.
- Very clear! Thanks!
- Very intuitive answer!