There are plenty of good and popular caching libraries on the JVM, including Ehcache, Guava, and many others. However, in some situations it’s worth exploring other options. Maybe you need better performance. Or you want to allow the cache to grow and fill up the entire heap, yet shrink automatically when your application needs more space elsewhere. In these situations, Java Soft References can be an excellent tool.

Typical caching libraries require you to define a static upper bound, which means you’ll need to be very conservative with sizing your cache and you are still at risk of running into an OutOfMemoryError.

As a refresher, an OutOfMemoryError is a runtime error in Java that occurs when the Java Virtual Machine (JVM) is unable to allocate an object due to insufficient space in the Java heap.

Caching libraries also need to keep track of the size of their elements themselves, which is potentially expensive.

Java References Explained

There are many great articles about the different reference types: strong, soft, weak and phantom. In a nutshell, strongly referenced objects (the common case, e.g. `String s = "abc"`) are never collected by the garbage collector while they remain reachable, and can therefore lead to an OutOfMemoryError if you allocate more than fits on your heap. In contrast, softly referenced objects are collected as a last resort before an OutOfMemoryError is thrown.

You can create a Java Soft Reference using `SoftReference<String> softRef = new SoftReference<>("abc")`.

To access the underlying object, just call `softRef.get()`, which may return null. Note that if you (additionally) hold a strong reference to the same underlying object, it’s not (only) softly referenced any more and can’t be automatically freed.
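As a minimal sketch of that access pattern (the class and method names are made up for illustration), the snippet below copies the result of `get()` into a local variable and checks it for null before using it. It wraps a freshly allocated byte array rather than a string literal, because a literal such as "abc" is interned and typically stays strongly reachable anyway.

```java
import java.lang.ref.SoftReference;

class SoftRefBasics {
    /** Returns the referent's length, or -1 if the reference has already been cleared. */
    static int lengthOrMinusOne(SoftReference<byte[]> ref) {
        // Copy into a local first: the local is a temporary strong reference,
        // so the referent cannot disappear between the null check and the use.
        byte[] data = ref.get();
        return data != null ? data.length : -1;
    }

    public static void main(String[] args) {
        SoftReference<byte[]> softRef = new SoftReference<>(new byte[1024]);
        System.out.println(lengthOrMinusOne(softRef));

        // clear() does by hand what the GC may do under memory pressure.
        softRef.clear();
        System.out.println(lengthOrMinusOne(softRef)); // prints -1
    }
}
```

Calling `ref.get()` twice (once for the check, once for the use) would be a bug, since the GC may clear the reference between the two calls.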

‘The internet’ often discourages the use of Java Soft References, typically without giving a good explanation, so I gave them a try and actually found them to be a good tool to have at my disposal.

If you understand how Java Soft References work, you can quite easily build a very simple and efficient cache that performs very well, uses whatever memory is available, and frees it up again when other parts of your application need it.

Before we look at that, let’s discuss some common pitfalls with Java Soft References.

SoftReference Pitfall 1: Don’t Trust the Javadoc

The Javadoc for SoftReference states: “All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.”

That’s a lie. It was true when Java Soft References were first introduced in Java 1.2, but Java 1.3.1 introduced the JVM flag -XX:SoftRefLRUPolicyMSPerMB. It defaults to 1000 (milliseconds per megabyte of free heap), meaning that if there’s only 10MB of heap available, the garbage collector will only free references that were last used more than 10 seconds ago. Everything else will not be freed, leading to an OutOfMemoryError and breaking the guarantee from the Javadoc.
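The arithmetic behind that 10-second figure can be sketched as follows. This is a simplified model of the rule as described above, not HotSpot's actual implementation; the class and method names are hypothetical.

```java
class SoftRefLruPolicySketch {
    /** Longest idle time (ms) a soft reference may have and still survive a GC, per the simplified rule. */
    static long maxIdleMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    /** True if a reference that has been idle for idleMillis would be cleared under the simplified rule. */
    static boolean wouldBeCleared(long idleMillis, long freeHeapMb, long msPerMb) {
        return idleMillis > maxIdleMillis(freeHeapMb, msPerMb);
    }

    public static void main(String[] args) {
        // 10 MB free with the default of 1000 ms/MB: 10 * 1000 = 10000 ms.
        System.out.println(maxIdleMillis(10, 1000));         // prints 10000
        System.out.println(wouldBeCleared(15_000, 10, 1000)); // prints true
        System.out.println(wouldBeCleared(5_000, 10, 1000));  // prints false
        // With the flag set to 0, any idle time above zero qualifies.
        System.out.println(wouldBeCleared(1, 10, 0));         // prints true
    }
}
```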

(I’ll try to get that changed).

No problem, let’s just set it to -XX:SoftRefLRUPolicyMSPerMB=0 and the javadoc is suddenly true again.
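If a cache depends on that setting, it can be worth checking at startup that the flag was actually passed. The following is one possible sketch using the standard `RuntimeMXBean.getInputArguments()` call; the class name and the -1 convention are assumptions made for this example, and the check only sees arguments the JVM reports through that call.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class SoftRefFlagCheck {
    static final String FLAG = "-XX:SoftRefLRUPolicyMSPerMB=";

    /** Returns the value passed with the flag, or -1 if the flag is absent from the given arguments. */
    static long configuredMsPerMb(List<String> jvmArgs) {
        long result = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                // If the flag is repeated, this sketch keeps the last value it sees.
                result = Long.parseLong(arg.substring(FLAG.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long value = configuredMsPerMb(jvmArgs);
        System.out.println(value < 0
                ? "SoftRefLRUPolicyMSPerMB not set explicitly"
                : "SoftRefLRUPolicyMSPerMB=" + value);
    }
}
```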

SoftReference Pitfall 2: It’s All or Nothing

When the GC figures that memory is running low and that it had better free some softly referenced objects, it will free all of them. This will make our cache very inefficient, because it’s expensive to recreate those objects. It would be better if the GC freed only a small portion of the available Java Soft References.

Working Around Those Pitfalls to Build a Simple yet Efficient Cache

The first javadoc pitfall is easily fixed, but how do we fix the issue that the GC frees all Java Soft References? A very straightforward (if not the most efficient) approach is to simply hold additional strong references to the objects you don’t want to get freed.

Obviously, we need to ensure that we drop these strong references if more memory is needed, e.g. when other softly referenced objects have been freed. That’s why we override the `finalize` method, which gets invoked when the GC frees an object. As long as we always have some softly referenced objects, we’ll not run into an OutOfMemoryError.
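The idea can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the original implementation: the class name, the `maxPinned` parameter, and the policy of pinning only the most recently added entries are all made up for this example. Note also that `finalize` has been deprecated since Java 9, so the compiler will warn about the override.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SoftCacheSketch<K, V> {
    /** Wrapper whose finalizer signals that the GC has started reclaiming cache entries. */
    private final class Entry {
        final V value;
        Entry(V value) { this.value = value; }

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() {
            // Runs on the finalizer thread; keep it fast and allocation-free.
            // Caveat: it also fires for entries collected for other reasons,
            // e.g. after being replaced by a later put() for the same key.
            releasePinned();
        }
    }

    private final Map<K, SoftReference<Entry>> entries = new ConcurrentHashMap<>();
    // Strong references to the most recently added entries, so that a GC cycle
    // that clears soft references cannot wipe out the whole cache at once.
    private final Deque<Entry> pinned = new ArrayDeque<>();
    private final int maxPinned;

    SoftCacheSketch(int maxPinned) { this.maxPinned = maxPinned; }

    void put(K key, V value) {
        Entry entry = new Entry(value);
        entries.put(key, new SoftReference<>(entry));
        pin(entry);
    }

    /** Returns the cached value, or null if it was never cached or has been freed. */
    V get(K key) {
        SoftReference<Entry> ref = entries.get(key);
        if (ref == null) return null;
        Entry entry = ref.get();
        if (entry == null) {
            entries.remove(key, ref); // drop the stale, cleared reference
            return null;
        }
        return entry.value;
    }

    private synchronized void pin(Entry entry) {
        pinned.addLast(entry);
        if (pinned.size() > maxPinned) pinned.removeFirst();
    }

    /** Drops all strong references; the entries are then only softly reachable. */
    synchronized void releasePinned() { pinned.clear(); }

    synchronized int pinnedCount() { return pinned.size(); }

    public static void main(String[] args) {
        SoftCacheSketch<String, byte[]> cache = new SoftCacheSketch<>(2);
        cache.put("a", new byte[1024]);
        cache.put("b", new byte[1024]);
        cache.put("c", new byte[1024]);
        System.out.println(cache.pinnedCount()); // prints 2
    }
}
```

Entries that are only softly reachable are the ones the GC can free under memory pressure; their finalizers then release the pinned entries, which become eligible the next time memory runs low.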

Note that we could also perform other actions in `finalize`, like serializing the object somewhere. If you do, keep in mind that it must be a fast operation that doesn’t require allocating a lot of additional memory, otherwise we risk running out of memory again. Read this thread on the mailing list about overriding `finalize`. The good news is that as of OpenJDK 11, the issue discussed there is fixed altogether.

Summary: The Value of Java SoftReferences

Java Soft References are a simple and powerful concept on the JVM, and it’s very useful to understand how they work. That’s true not only if you want to build your own cache, but also if you use “proper” caching libraries. Some even have the option to use Soft References internally, in which case it’s essential to understand the caveats detailed above.

Your default choice should still be a caching library, but if you need better performance, or want the cache to grow and fill up the entire heap, yet shrink automatically when your application needs more space elsewhere, Java Soft References may be for you. Beware that they are rather low level, and come with their own set of tradeoffs. As always: choose the best tool for the job, and keep Soft References in the back of your head (literally).
