Loading...
Abstract:
Some Map implementations allow null keys and values. This leads to funky behaviour when calling putIfAbsent() and the compute functions. In this newsletter we look a bit more closely at the issues at hand when allowing nulls in maps.
Welcome to the 303rd edition of The Java(tm) Specialists' Newsletter, sent from the wonderful Island of Crete. Last month we had a small version of JCrete, with some of the folks staying on for a few days. We socialized like we haven't for two years, and as a result, I missed the July edition of this newsletter. But we are back!
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
A few days ago, my friend Dmitry Vyazelenko mentioned the strange behavior of nulls in maps. That tweet inspired this newsletter. Thanks Dmitry :-)
When I started coding in Java, the only collections we had available were Vector and Hashtable. Anything else, we had to write ourselves. Well, there was also Stack, a subclass of Vector, and Properties, a subclass of Hashtable. But we did not use them much. At least in those days job interviews were quick: "If you needed to store key/value pairs, which data structure would you use?" "Um, Vector?" "Sorry, next candidate please."
Soon a collection framework was created to fill the gap. It was based on the standard template library for C++ and was named JGL by a company called ObjectSpace. It had some cool features like one-to-many mappings in maps, similar to Eclipse Collection's Multimap. We used JGL for a while, but when Java 1.2 was released, switched over to the Java Collections framework.
Old Hashtable had a quirk that we could insert neither a null key nor a null value. If we desperately wanted to add null, we would create a dummy object representing null and add that instead, and then convert that back to null if we found it. It was messy, and we wished for the ability to add null. Be careful what you wish for, it may be granted!
The Map interface specified that an implementation might accept null keys and values, but it may also reject them. Thus we might see NullPointerExceptions on put() or get(). When the Map interface was introduced in Java 1.2, we had four Map implementations with five different behaviours. What fun:
In Java 1.4, LinkedHashMap and IdentityHashMap were added, and these behaved the same as HashMap.
In Java 5, several things happened. First off, generics were added to the language. These were sneaked in without breaking backwards compatibility. Also a ConcurrentHashMap was added, which had much faster read performance than the Hashtable. The ConcurrentHashMap used an internal lock, thus we could no longer participate in the locking from outside. Hashtable synchronized on "this". With ConcurrentHashMap, any compound actions would need to be supported by the map itself. However, in those days we did not have default methods in interfaces, and so a new interface was born, the ConcurrentMap. The most useful of these compound methods was putIfAbsent(). The only public implementations of the ConcurrentMap in the JDK are the ConcurrentHashMap from Java 5, and later the ConcurrentSkipListMap from Java 6. They both throw a NullPointerException if we attempt to add a null key or value. However, the Javadoc for ConcurrentMap does not prevent an implementation from accepting null, but when the map does allow null, the behavior becomes a bit funky. The meaning of "absent" is that the key either does not exist at all or that the value is null. Thus the following two snippets of code are not equivalent if we can have null values:
if (!map.containsKey(-1)) map.put(-1, 0L); if (map.get(-1) == null) map.put(-1, 0L);However, if we want to be pedantic, only the second one can be converted to putIfAbsent() without change in semantics. For example, if we have put (-1, null) into the map, then the first code snippet would not overwrite it, but the second would.
We see here that if the map contains a mapping to (-1, null), then it is not overwritten in the first if statement, but it is in the second:
public class PutIfAbsentFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); // Java 19 map.put(-1, null); // start with a null value System.out.println("map = " + map); // {-1=null} if (!map.containsKey(-1)) map.put(-1, 0L); System.out.println("map = " + map); // {-1=null} if (map.get(-1) == null) map.put(-1, 0L); System.out.println("map = " + map); // {-1=0} } }At first glance, it seems that computeIfAbsent() works like putIfAbsent(). If the value is null, then the computation is done. However, there is a subtlety. If the computation function returns null, then the entry is not inserted into the map. And if we use computeIfPresent() or compute() and their function return null, then the entry is removed from the map. This behaviour makes sense if maps do not allow null keys and values. But it can get confusing when they do.
public class ComputeIfFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println("map = " + map); // {} map.computeIfAbsent(-1, key -> { System.out.println("computing the value"); return null; // this will not add an entry into the map }); System.out.println("map = " + map); // {} map.computeIfAbsent(-1, key -> { System.out.println("computing the value"); return 0L; // the map now contains -1=0L }); System.out.println("map = " + map); // {-1=0L} map.computeIfPresent(-1, (key, value) -> { System.out.println("computing the value again"); return null; // this removes the entry from the map }); System.out.println("map = " + map); // {} map.put(-1, null); System.out.println("map = " + map); // {-1=null} map.compute(-1, (key, value) -> null); // removes the entry System.out.println("map = " + map); } }The merge() function is subtly different, yet still mostly the same. The value parameter cannot be null, otherwise we get a NullPointerException. However, if the value in the map is null, then the merge() function is not called and instead, the value is simply added. Similarly to the compute() methods, if our function returns null, then the entry is removed. Here is an example:
public class MergeFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println("map = " + map); // {} map.put(-1, null); System.out.println("map = " + map); // {-1=null} map.merge(-1, 1L, (value1, value2) -> 2L); System.out.println("map = " + map); // {-1=1} map.merge(-1, 1L, (value1, value2) -> 2L); System.out.println("map = " + map); // {-1=2} map.merge(-1, 1L, (value1, value2) -> null); System.out.println("map = " + map); // {} } }The compound methods remove(key, value), replace(key, value), and replace(key, oldValue, newValue) treat a null value as being present:
public class ReplaceRemoveFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println(map.replace(-1, null)); // null map.put(-1, 0L); System.out.println(map.replace(-1, null)); // 0 System.out.println(map.replace(-1, null, null)); // true System.out.println(map.remove(-1, null)); // true System.out.println(map.remove(-1, null)); // false } }Allowing null keys in maps has some interesting implications. I sent out some Twitter polls to see how many programmers would know the correct answers:
What does new TreeMap().put(null, null) produce? with possible answers of "No output" (15%), NullPointerException (61%) and "Either, depending on ..." (24%). The correct answer is the last, because it depends on the Java version. Versions 1.2 through 1.6 they accepted the first key to be null, and would only throw a NullPointerException on subsequent entries. This was made more consistent in Java 1.7 to always throw a NullPointerException unless the TreeMap was created with a Comparator that could cope with null keys. See Bug 5045147.
In a WeakHashMap.put(null, "Hello world"), when will the entry be removed in #Java? with possible answers of "Won't be added" (35%), "At the next GC" (14%), "A while after the next GC" (14%), and "Never" (37%). The correct answer is the last one, since when the key is null, a private static final object is used instead as a key. Since this object is always reachable, the WeakReference will never be cleared, and thus the entry will remain in the map forever.
What is the computational time complexity of HashMap.get(null) in #Java, where n is the number of keys with a hash code of 0? Possible answers were "linear - O(n)" (29%), "quadratic - O(n²)" (3%), "constant - O(1)" (47%), and "logarithmic - O(log n)" (21%). The correct answer was the first - linear. Since Java 8, when a lot of keys have the same hash code, the HashMap builds up a red-black tree of entries. Thus ordinarily the complexity would be logarithmic. However, this is only true if the key is Comparable. Since the key is null, it does not implement Comparable, and thus the map needs to search for it linearly.
A little example for the last quiz, where we create a lot of Strings with hashCode=0 and then also add a null key. This time we are using a HashSet instead, which internally delegates to a HashMap:
public class ZeroHashFun { private static final String[] zeroHashCodes = { "ARbyguv", "ARbygvW", "ARbyhVv", "ARbyhWW", "ARbzHuv", "ARbzHvW", "ARbzIVv", "ARbzIWW", "ARcZguv", "ARcZgvW", "ARcZhVv", "ARcZhWW", "ASCyguv", "ASCygvW", "ASCyhVv", "ASCyhWW", "ASCzHuv", "ASCzHvW", "ASCzIVv", "ASCzIWW", "ASDZguv", "ASDZgvW", "ASDZhVv", "ASDZhWW",}; public static void main(String... args) { for (int i = 16; i <= 24; i++) { System.out.println("Testing with size " + (1 << i)); test(1 << i); } } private static void test(int size) { HashSet<String> set = HashSet.newHashSet(size); for (int i = 0; i < size - 1; i++) { StringBuilder sb = new StringBuilder(); for (int j = 1, index = 0; j <= i; j <<= 1, index++) { if ((i & j) != 0) sb.append(zeroHashCodes[index % zeroHashCodes.length]); } set.add(sb.toString()); } set.add(null); // also adding a null key System.out.println("set.size() = " + set.size()); System.out.println("Searching for key not in the set"); String notInMap = zeroHashCodes[1] + zeroHashCodes[0]; for (int repeats = 0; repeats < 10; repeats++) { testLookup(set, notInMap); } System.out.println("Searching for null key"); for (int repeats = 0; repeats < 10; repeats++) { testLookup(set, null); } } private static void testLookup(Set<String> set, String key) { long time = System.nanoTime(); try { set.contains(key); } finally { time = System.nanoTime() - time; System.out.printf("time = %dms%n", (time / 1000000)); } } }Summary of the output. For each of the times, we only show the median. It is clear though from the timing that the contains() method has linear compexity to the number of keys with hashCode==0, since as we double that size, we also take twice as long to call contains():
Testing with size 65536 set.size() = 65536 Searching for key not in the set time = 0ms Searching for null key time = 0ms Testing with size 131072 set.size() = 131072 Searching for key not in the set time = 0ms Searching for null key time = 3ms Testing with size 262144 set.size() = 262144 Searching for key not in the set time = 0ms Searching for null key time = 7ms Testing with size 524288 set.size() = 524288 Searching for key not in the set time = 0ms Searching for null key time = 13ms Testing with size 1048576 set.size() = 1048576 Searching for key not in the set time = 0ms Searching for null key time = 25ms Testing with size 2097152 set.size() = 2097152 Searching for key not in the set time = 0ms Searching for null key time = 55ms Testing with size 4194304 set.size() = 4194304 Searching for key not in the set time = 0ms Searching for null key time = 104ms Testing with size 8388608 set.size() = 8388608 Searching for key not in the set time = 0ms Searching for null key time = 230ms Testing with size 16777216 set.size() = 16777216 Searching for key not in the set time = 0ms Searching for null key time = 454ms(Even though this could be classified as an extremely mild threat, I don't think I'll bother submitting a bug report. It takes longer to create the set than to call contains().)
None of this weirdness with null keys and values would happen if we used maps that did not allow them in the first place. My concluding recommendation thus is to use ConcurrentHashMap instead of HashMap and ConcurrentSkipListMap instead of TreeMap. No more null weirdness, and race conditions are also taken care of.
I hope you enjoyed reading this as much as I enjoyed writing it :-)
Kind regards from Chorafakia
Heinz
Java Specialists Superpack 22 Our entire Java Specialists Training in One Huge BundleLoading...
Loading...