One of the most common problems in building enterprise web applications is leaks. A leak is the consumption of a resource that the program is then unable to release. Leaks come in various types, such as:
- Memory leaks
- Thread and ThreadLocal leaks
- ClassLoader leaks
- System resource leaks
- Connection leaks
As you can see in the picture above, leaks are really nasty: they can bring down an application server after just a few minutes if it is stressed a little bit. That should never happen.
Memory leaks often occur due to performance optimizations, such as caches that hold references to expensive objects or to objects retrieved from external sources such as a database: lookup tables, object caches, object pools etc. You can identify such leaks quickly by analyzing each component: if it keeps adding new entries to an instance without ever removing any, it is a potential source of memory leaks (though not always an actual one).
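To make the "adds but never removes" pattern concrete, here is a minimal sketch of one common counter-measure: a lookup cache with a fixed upper bound, built on `java.util.LinkedHashMap`. The class name `BoundedLookupCache` and the least-recently-used eviction policy are assumptions for illustration; a real application might rely on its caching library's own size limits instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical lookup cache with a fixed upper bound. A plain HashMap that
 * only ever receives put() calls grows until the heap is exhausted; this
 * variant evicts the least recently used entry once maxEntries is exceeded.
 */
public class BoundedLookupCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedLookupCache(int maxEntries) {
        // accessOrder = true: iteration order is least recently accessed first
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true removes the eldest entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLookupCache<Integer, String> cache = new BoundedLookupCache<>(3);
        for (int i = 0; i < 10; i++) {
            cache.put(i, "value-" + i);
        }
        System.out.println(cache.keySet()); // prints [7, 8, 9]
    }
}
```

The bound turns "grows forever" into "grows up to a known size", which is easy to spot as a flat line in a heap graph.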
Thread leaks often occur when low-level libraries create new Threads without being aware of their lifecycle within an application container, such as a Java Enterprise Application Server. Most developers know the unwritten rule that web applications should not create new Threads. The EJB spec even forbids it, whereas Servlet Spec 2.5 neither forbids nor even mentions Thread creation by application developers. Servlet Spec 3.0 introduces a chapter on Asynchronous Processing and also states the following, which clearly allows Thread creation:
“If a thread created by the application uses the container-managed objects, such as the request or response object, those objects must be accessed only within the object’s life cycle as defined in sections 3.10 and 5.6. […]” (Servlet Specification 3.0 PR)
But libraries often create Threads internally, unknown to the application developer. Such Threads are used for background jobs or for Timers that clean up resources periodically. However, since the library does not know when the web application is reloaded or undeployed, it cannot cancel the Timer or shut down the Thread.
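Where you do control the code that starts background work, the alternative is to give the thread an explicit lifecycle tied to the web application's own. A minimal sketch; `ManagedCleaner` and its `cleanUp()` task are hypothetical, and in a servlet container `start()` would be called from `ServletContextListener.contextInitialized()` and `stop()` from `ServletContextListener.contextDestroyed()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical background cleaner whose thread has an explicit lifecycle,
 * so it can be stopped when the web application is undeployed.
 */
public class ManagedCleaner {
    private ScheduledExecutorService scheduler;

    public synchronized void start() {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "managed-cleaner");
            // Daemon status only matters at JVM exit; it does not replace stop()
            t.setDaemon(true);
            return t;
        });
        // Hypothetical periodic task, run once per minute
        scheduler.scheduleAtFixedRate(this::cleanUp, 1, 1, TimeUnit.MINUTES);
    }

    /** Stops the background thread; returns true if it terminated in time. */
    public synchronized boolean stop() {
        if (scheduler == null) {
            return true; // never started or already stopped
        }
        scheduler.shutdownNow(); // cancels the periodic task and interrupts the thread
        try {
            return scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler = null;
        }
    }

    private void cleanUp() {
        // Placeholder: e.g. evict expired entries (hypothetical)
    }
}
```

The key design choice is that the owner of the thread, not the thread itself, decides when it ends, and that owner is notified on undeploy.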
ThreadLocals cause a similar problem when their value is never removed and references application classes. This happens often in web applications, because the container reuses its Threads for many requests and for multiple web application instances, so ThreadLocal values remain after the request that set them. So, how do you find ThreadLocal leaks?
- Take a heap dump
- Run an Object Query Language (OQL) query: SELECT * FROM INSTANCEOF java.lang.ThreadLocal
- For each ThreadLocal found:
  - List objects > with incoming references
  - Expand one level to see the class name and field name of the class which holds the ThreadLocal
  - Open that class in your favourite IDE, check how the ThreadLocal is used and clean it up
If set(null) or remove() is never called, you have a ThreadLocal leak, which in turn leads to a ClassLoader leak.
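The usual fix looks roughly like this minimal sketch: bind the value, do the work, and always remove the value in a `finally` block so it never outlives the request on a pooled thread. `RequestContext` and `runWith` are hypothetical names for illustration.

```java
/**
 * Hypothetical per-request context held in a ThreadLocal. Container threads
 * are pooled, so the value must be removed before the thread goes back to
 * the pool; otherwise it survives the request and pins the ClassLoader that
 * loaded RequestContext.
 */
public class RequestContext {
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    private final String user;

    private RequestContext(String user) {
        this.user = user;
    }

    public static RequestContext current() {
        return CURRENT.get();
    }

    public String user() {
        return user;
    }

    /** Runs the given work with a context bound to the current thread. */
    public static void runWith(String user, Runnable work) {
        CURRENT.set(new RequestContext(user));
        try {
            work.run();
        } finally {
            // remove() drops the whole map entry; set(null) would only clear the value
            CURRENT.remove();
        }
    }
}
```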
The worst kind of Thread leak is coupled with a ClassLoader leak. Every Thread holds a strong reference to its context ClassLoader, which in a web application is often the WebappClassLoader. As long as such a Thread is running, the classes loaded by the WebappClassLoader cannot be unloaded. Marking the Thread as a daemon thread does not help: daemon threads are only stopped automatically when the JVM shuts down, and undeploying a web application does not shut down the JVM.
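The inheritance of the context ClassLoader can be seen in this sketch: `new Thread(...)` copies the creating thread's context ClassLoader, which during a request is usually the WebappClassLoader. `NeutralContextThreadFactory` is a hypothetical helper that resets it to the system ClassLoader, assuming the thread never needs to load webapp classes through its context ClassLoader.

```java
import java.util.concurrent.ThreadFactory;

/**
 * Hypothetical ThreadFactory that does not let new threads keep the creating
 * thread's context ClassLoader (in a container, typically the WebappClassLoader).
 */
public class NeutralContextThreadFactory implements ThreadFactory {
    private final String name;

    public NeutralContextThreadFactory(String name) {
        this.name = name;
    }

    @Override
    public Thread newThread(Runnable r) {
        // The Thread constructor copied the creator's context ClassLoader
        Thread t = new Thread(r, name);
        // Replace it so this thread no longer references the webapp ClassLoader
        t.setContextClassLoader(ClassLoader.getSystemClassLoader());
        return t;
    }
}
```

This removes only one reference: a thread that keeps running a `Runnable` whose class was loaded by the WebappClassLoader still pins that ClassLoader, so the thread must still be stopped on undeploy.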
ClassLoader leaks are very problematic for redeployment scenarios or dynamic applications. Caches or reflective utilities often hold a reference to a ClassLoader, either the WebappClassLoader or the thread context ClassLoader. When those references cannot be reclaimed, the web application cannot be undeployed cleanly. Either the server needs to be restarted, or undeploying the web application leaves open file handles or a “corrupted” web app folder behind. For example, Tomcat expands a .war file into a temporary working directory; when you try to undeploy a web application with such a leak, this directory is not deleted completely. Developers often just delete the folder manually and redeploy, since it is very hard to find the root cause of the leak.
If you search for ClassLoader leaks in webapps, you often stumble upon the example where a custom LogLevel is used to force a leak, which is then found and resolved with profilers and heap dump analyzers. If you look at a heap dump of a typical web application, you will see a lot of logging infrastructure objects. Sometimes they really are to blame, but not always. When hunting down memory leaks, it is crucial to examine all objects and all reference paths. Sometimes objects are only still there because the ClassLoader cannot be reclaimed for some other reason. Don’t blame the first (and easy) find. Go deeper until you’re sure about the cause.
System resource leaks are, for example, open file handles or a temporary folder that fills up all the available disk space. For network sockets, the operating system or network stack takes care of unclosed connections and kills them after a while. File handles, however, have no timeout and stay open, and most operating systems limit the number of open file descriptors.
Rule #1 in preventing leaks: Close the resource you have opened when you don’t need it any more!
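For file handles, Rule #1 translates directly into Java's try-with-resources statement. A minimal sketch; `FirstLineReader` is a hypothetical example class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example class showing deterministic release of a file handle. */
public class FirstLineReader {
    /** Reads the first line of a file, or returns null if the file is empty. */
    public static String firstLine(Path file) throws IOException {
        // try-with-resources calls reader.close() on every exit path,
        // including exceptions, so the file descriptor is always released
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }
}
```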
Problem #1: When? As a developer of a web app, you often don’t know when to close a resource. When is the page being viewed by the user not used any more? When is a scheduler timer not used any more? When is a cache invalidation timer not used any more?
Problem #2: Where? As a developer of a web app, you often don’t know about all the resources your application uses, because they are buried in third-party libraries or in the container itself.
Rule #2 in fixing leaks: Have automated stress tests
Set up an environment with an application server, your web app, a stress testing tool and a profiler. My suggestion:
- Tomcat as app server
- Your web app (use the real thing, no dummy)
- JMeter test plan
Rule #3 in fixing leaks: Move as much as possible out of your web app
For example, a JDBC driver should not be packaged within your webapp. The DriverManager keeps references to the registered drivers and thus to the ClassLoader that loaded them, and connection pools created inside the webapp hold on to it as well. If you package a JDBC driver within your webapp, you will almost certainly run into a leak. Use JNDI to get Connections from the app server. Don’t package libraries which are already available in the app server, such as logging frameworks: if you can omit them, do it. Tomcat even has a workaround class which tries to deregister JDBC drivers loaded by the web application ClassLoader.
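If you cannot move a driver out of the webapp, a hand-written clean-up in the spirit of Tomcat's workaround might look like the following sketch (it is not Tomcat's actual code). `JdbcDriverCleanup` is a hypothetical name; it assumes it runs on undeploy, e.g. from `ServletContextListener.contextDestroyed()`, and that the class itself is loaded by the webapp ClassLoader, because `DriverManager.getDrivers()` only returns drivers visible to the caller's ClassLoader.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: deregisters every JDBC driver whose class was loaded
 * by the given (webapp) ClassLoader, so DriverManager no longer references it.
 */
public final class JdbcDriverCleanup {
    private JdbcDriverCleanup() {}

    /** Returns the number of drivers that were deregistered. */
    public static int deregisterDriversLoadedBy(ClassLoader webappLoader) {
        int count = 0;
        // Copy the enumeration first instead of deregistering while iterating it
        List<Driver> drivers = Collections.list(DriverManager.getDrivers());
        for (Driver driver : drivers) {
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    count++;
                } catch (SQLException e) {
                    // Report and continue with the next driver
                    System.err.println("Could not deregister " + driver + ": " + e);
                }
            }
        }
        return count;
    }
}
```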
Rule #4 in fixing leaks: Trial and error with various configurations
Disable some of your features, run the tests and compare the results (memory heap dumps, VisualVM graphs). This is a very fast way to find out which feature causes a leak.
Rule #5 in fixing leaks: Reevaluate
After you have reached a leak-free state of your webapp, disable the fixes you have made one by one and perform your tests again. You will likely find that some of the workarounds or fixes are unnecessary, because they only addressed a symptom of one of the root causes.
In my current project, I’ve identified the following potential leaks:
- Proprietary Singleton BeanLocator (delegates to Spring’s WebApplicationContext)
- Multiple EhCache Caches
- Dozer Bean Mapper’s JMX Beans for Administration and Statistics, which get registered by default but not unregistered automatically
- Additional Threads whose ContextClassLoader is the WebappClassLoader
- XWS Security’s Timer Thread for cleaning up nonces
- iBatis SqlMaps Mapped Statements Cache
- iBatis SqlMaps ClassInfo Cache
- JDBC Drivers loaded from the WebappClassLoader (e.g. H2 as in-memory database for demo purposes)
- Commons-Pool Eviction Timer, when started from within the web application
- Java Beans Introspector
- Spring’s CachedIntrospectionResults
- AspectJ’s ReflectiveWorld in v1.5.4 (seems to be fixed in v1.6.6)
- Commons-Logging LogFactory
- OpenOffice ODF Toolkit’s TempDirDeleter Timer
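Several entries above come down to "unregister on undeploy". As a minimal sketch using only JDK APIs, the following hook unregisters all MBeans in one JMX domain and flushes the `java.beans.Introspector` cache. The domain `com.example.myapp` used below is hypothetical; library-specific clean-up (EhCache, Spring, Commons-Logging) has its own APIs and is not shown.

```java
import java.beans.Introspector;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Hypothetical undeploy hook. The JMX domain is passed in; use the domain
 * under which your libraries actually register their MBeans.
 */
public final class UndeployCleanup {
    private UndeployCleanup() {}

    /** Unregisters all MBeans in the given domain; returns how many were removed. */
    public static int unregisterMBeans(String domain) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        int removed = 0;
        for (ObjectName name : server.queryNames(new ObjectName(domain + ":*"), null)) {
            server.unregisterMBean(name);
            removed++;
        }
        return removed;
    }

    public static void cleanUp(String domain) throws JMException {
        unregisterMBeans(domain);
        // Drops the Introspector's cached BeanInfo, which references bean classes
        Introspector.flushCaches();
    }
}
```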
When are you finished fixing leaks?
When your webapp can be redeployed a large number of times without PermGen usage cranking up, and when the Memory Analyzer only finds two suspects after you have undeployed all your webapps:
- <system class loader>
If there are other ClassLoaders, you might still have a leak. If you end up with a long-term monitoring graph like the following after 12 hours of continuous redeployments, you have no ClassLoader leaks:
(VisualVM only shows the last few minutes, but the ‘Unloaded classes’ counter in the lower left shows that some smoke-testing was going on.)