A Tale of Threads and A HashMap

Written by CSSCorp | Nov 19, 2014 9:20:01 AM

One of the main objectives of performance testing is to ensure that an application is in no way sub-optimal in terms of resource consumption. Resources generally mean CPU, memory, I/O, and network. But we also have resources that sit on the application's side, such as heap, GC, threads, sessions, and request processors.

So what do you do when you notice something unusual about your web application, especially when your customers find it performing sub-optimally or even inaccessible at times (which gets considerably worse when your application is meant to be visited from across the globe)?

Here's what we did when the pre-staging environment of one of our clients got into a fender bender:

Our monitoring stack, built from vanilla Linux commands, was in place, tracking CPU, memory, I/O, and network metrics. Additionally, we used jstack to collect thread dumps from the application. A little digression: the application was written in Java using popular open-source frameworks (DWR, Hibernate, Spring).

A couple of things saved the day for us:

  • jstack with the "-l" option, which prints additional information about locks held by each thread
  • Linux's very own and popular "top" with the "-H" option, which lists the top CPU-consuming threads (rather than processes)
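The two tools are typically used together: `top -H` reports each thread's ID in decimal, while thread dumps from HotSpot JVMs of that era identify the same native thread as a hexadecimal `nid=0x...` value. The following is a minimal sketch of that conversion; the class and method names are hypothetical, and newer JDKs may print the `nid` in a different format.

```java
// Hypothetical helper (not part of any tool): converts a decimal thread ID,
// as listed by `top -H`, into the hexadecimal nid=0x... form that older
// HotSpot thread dumps use, so the hot thread can be searched for in the dump.
final class ThreadIdToNid {

    static String toNid(long linuxThreadId) {
        // Long.toHexString produces lowercase hex without a prefix.
        return "nid=0x" + Long.toHexString(linuxThreadId);
    }

    public static void main(String[] args) {
        // Example: a thread that top -H lists with ID 12345.
        System.out.println(toNid(12345L)); // prints "nid=0x3039"
    }
}
```

Searching the thread dump for the resulting string leads to the stack trace of the thread that is burning CPU.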

Upon examining the thread dumps, we pretty much ended up looking at threads like this one:

And this one:

In conclusion, one of the frameworks used a HashMap in a way that was not safe for concurrent access. When several threads accessed it at the same time, its internal structure ended up with a circular reference, and the threads executing that code looped endlessly over their access operations.

Here's The Drill

In a nutshell, picture two people trying to pass each other in a corridor: Alphonse moves to his left to let Gaston pass, while Gaston moves to his right to let Alphonse pass. They are still blocking each other, so each steps aside again, and the two keep getting in each other's way without either making progress. That is precisely the definition of a livelock (courtesy: Oracle).
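The corridor analogy can be sketched as a small, deterministic simulation. This is only an illustration of the idea, not the code from the incident; the class, enum, and method names are hypothetical, and the loop is capped so that it always terminates.

```java
// Single-threaded simulation of the corridor analogy (illustrative only).
final class CorridorLivelock {

    enum Side { LEFT, RIGHT }

    static Side other(Side side) {
        return side == Side.LEFT ? Side.RIGHT : Side.LEFT;
    }

    // Returns how many sidesteps were taken before the two could pass,
    // or maxRounds if they were still blocking each other at the cap.
    static int sidestepsTaken(boolean bothYield, int maxRounds) {
        // Sides are measured in the corridor's frame: same side means blocked.
        Side alphonse = Side.LEFT;
        Side gaston = Side.LEFT;
        int rounds = 0;
        while (alphonse == gaston && rounds < maxRounds) {
            alphonse = other(alphonse);      // Alphonse politely steps aside
            if (bothYield) {
                gaston = other(gaston);      // Gaston does the same, so they block again
            }
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(sidestepsTaken(true, 1000));  // prints 1000: never resolved
        System.out.println(sidestepsTaken(false, 1000)); // prints 1: only one yields
    }
}
```

Both parties stay busy the whole time, yet neither makes progress, which is what distinguishes a livelock from a deadlock.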

A touch of synchronization in the framework code that caused the livelock was all that was needed to avoid it.
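As a rough idea of what such a fix can look like, here is a minimal sketch that guards a shared HashMap with `Collections.synchronizedMap`. It is not the actual framework patch; the `SynchronizedRegistry` class and its methods are hypothetical stand-ins for whatever shared map the framework held.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared map inside a framework.
final class SynchronizedRegistry {

    // The wrapper serializes every call on a single lock, so concurrent
    // writers can no longer corrupt the HashMap's internal structure.
    private final Map<Integer, String> entries =
            Collections.synchronizedMap(new HashMap<>());

    void register(int key, String value) { entries.put(key, value); }

    String lookup(int key) { return entries.get(key); }

    int size() { return entries.size(); }

    // Fills one registry from several threads and returns its final size.
    static int fillConcurrently(int threads, int perThread) {
        SynchronizedRegistry registry = new SynchronizedRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // distinct key range per thread
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register(base + i, "v" + (base + i));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // prints 4000
    }
}
```

`java.util.concurrent.ConcurrentHashMap` is a common alternative when lock contention on a single wrapper becomes a concern.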

Here's a wonderful blog post on the HashMap race condition that walks through every level of the livelock we encountered. Be sure to check it out!