Lessons in scaling

In my last post I wrote about some strange threading behavior in Tomcat 6. After a couple of weeks of chasing it through iBatis, our app, our connection pool, and even the NIO connector, we finally narrowed it down to the connection pool.

For some unknown reason, DBCP wasn't taking connections back into the pool. The threads holding those connections would hang, and eventually Tomcat itself would hang. How on earth DBCP has lived as long as it has with this problem is beyond me. We tried everything, even pulling the connection out of iBatis manually and closing it ourselves. Nothing worked. I could get DBCP to lock up with a pool of 150 connections and a load of only 50 users. Go figure.
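Why an unreturned connection is so deadly can be shown with a hypothetical toy pool (this is not DBCP's code or API; `ToyPool`, `borrow`, and `giveBack` are made-up names). Each connection that never comes back permanently shrinks the pool, and once it is empty every new borrower waits. Here the leak is simulated by skipping the return on an error path. In our case the connections were being closed and DBCP still didn't take them back, so this models the symptom, not DBCP's bug. The sketch also uses a bounded wait, so an empty pool produces an error instead of a silent hang.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LeakDemo {
    // Hypothetical toy pool: each semaphore permit stands for one connection.
    static final class ToyPool {
        private final Semaphore permits;

        ToyPool(int size) {
            permits = new Semaphore(size);
        }

        // Waits up to timeoutMs for a free connection instead of hanging forever.
        boolean borrow(long timeoutMs) {
            try {
                return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }

        void giveBack() {
            permits.release();
        }

        int available() {
            return permits.availablePermits();
        }
    }

    // Returns the connection in a finally block, so an exception cannot leak it.
    static void useCorrectly(ToyPool pool) {
        if (!pool.borrow(100)) throw new IllegalStateException("pool exhausted");
        try {
            // ... run a query here ...
        } finally {
            pool.giveBack();
        }
    }

    // Simulates a connection that never makes it back to the pool.
    static void useLeakily(ToyPool pool) {
        if (!pool.borrow(100)) throw new IllegalStateException("pool exhausted");
        throw new RuntimeException("query failed"); // giveBack() is never called
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2);
        for (int i = 0; i < 10; i++) useCorrectly(pool);
        System.out.println("free after 10 clean uses: " + pool.available());

        for (int i = 0; i < 2; i++) {
            try {
                useLeakily(pool);
            } catch (RuntimeException e) {
                // the caller moves on, but the connection is gone
            }
        }
        System.out.println("free after 2 leaks: " + pool.available());
        System.out.println("next borrow succeeded: " + pool.borrow(100));
    }
}
```

Ten clean uses leave both connections free; two leaks leave none, and the next borrow gives up after 100 ms. Real pools have a comparable knob (c3p0 calls it `checkoutTimeout`), though whether it is enabled by default varies.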

c3p0 to the rescue

Thankfully there is another popular connection pool out there: c3p0. We swapped it in nice and easy (thanks Spring!) and presto, all our troubles went away. I could even overload the app (more concurrent users than connections available) and it wouldn't die. It got slow, but it wouldn't die.
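The "slow but alive" behavior under overload is what you get when a pool queues waiters and every borrower reliably returns its connection. Here is a minimal sketch of that effect, assuming a fair semaphore as a stand-in for the pool and a short sleep as a stand-in for a query (`OverloadDemo` and `simulate` are hypothetical names, not c3p0 internals): 20 users share 5 connections, so requests wait in line, but all of them finish.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OverloadDemo {
    // Runs `users` concurrent tasks against `connections` pooled slots and
    // returns how many tasks completed. Each task holds a slot for workMs.
    static int simulate(int users, int connections, long workMs) {
        Semaphore pool = new Semaphore(connections, true); // fair: waiters served in arrival order
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(users);
        for (int i = 0; i < users; i++) {
            executor.submit(() -> {
                try {
                    pool.acquire();           // queue here when every slot is in use
                    try {
                        Thread.sleep(workMs); // stand-in for a database query
                        completed.incrementAndGet();
                    } finally {
                        pool.release();       // always hand the slot back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = simulate(20, 5, 50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // 20 users over 5 connections means about 4 waves of 50 ms each
        System.out.println(done + " of 20 finished in about " + elapsedMs + " ms");
    }
}
```

Total time grows with the ratio of users to connections, which is the "it got slow" part; nothing hangs because no slot is ever lost.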

Based on this finding, we also replaced DBCP as the pool ActiveMQ was using, which solved a few problems there as well.

Now our bottleneck is the communication between the application and our search engine, SOLR. That is somewhat expected, since each request opens an HTTP connection. We use SOLR for more than search, however: through faceting it is also our navigation engine, so needless to say we make a lot of calls to it. A little caching should solve that problem, though, and should be fairly easy to implement.
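One simple form of that caching is an in-memory LRU cache keyed by the query string. The sketch below assumes responses can be cached as strings and that `fetcher` is a hypothetical function performing the actual HTTP call to SOLR; it does not use any SOLR client API. `LinkedHashMap` in access order plus `removeEldestEntry` is a standard JDK idiom for a size-bounded LRU map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class FacetCache {
    private final Map<String, String> cache;
    private final Function<String, String> fetcher; // hypothetical: does the real HTTP call to SOLR

    FacetCache(int maxEntries, Function<String, String> fetcher) {
        this.fetcher = fetcher;
        // accessOrder=true keeps least-recently-used entries first, so
        // removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.cache = Collections.synchronizedMap(
                new LinkedHashMap<String, String>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                        return size() > maxEntries;
                    }
                });
    }

    // Returns the cached response for this query, fetching it on a miss.
    String get(String query) {
        String hit = cache.get(query);
        if (hit != null) return hit;
        // Fetched outside the lock so slow HTTP calls don't block other readers;
        // two threads missing on the same query at once may both fetch it.
        String fresh = fetcher.apply(query);
        cache.put(query, fresh);
        return fresh;
    }

    int size() {
        return cache.size();
    }
}
```

This cache never expires entries, so it suits facet counts that change slowly; if the index updates often, a time-to-live or a flush on reindex would be needed.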

What are the lessons learned?

Don't assume any piece of the puzzle is immune to being the problem. My initial thought was that since DBCP had been around so long, it couldn't possibly be the culprit. I spent most of my time in iBatis since I was new to it and thought maybe we were just doing something wrong.

In the end we were able to run load tests at several times the traffic our old (current) site sees. A good sign indeed.

Two tools helped more than anything else: JMeter for our load testing, and JConsole, the JMX monitoring tool that ships with the JDK. With JConsole it was easy to see which threads were blocked, what they were waiting on, and how many threads were running. Very, very handy indeed.
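JConsole reads this information from the platform `ThreadMXBean`, and the same data is available in code. Here is a minimal sketch, assuming you want a quick in-process report (for example from a debug servlet) rather than a remote JMX connection; `ThreadReport` and its methods are hypothetical names, while the `java.lang.management` calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadReport {
    // Counts live threads by state, the same numbers JConsole charts.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) continue; // thread exited between the two calls
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Prints each blocked or waiting thread, the lock it waits on, and the owner if known.
    static void printWaiters() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) continue;
            Thread.State s = info.getThreadState();
            if (s == Thread.State.BLOCKED || s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING) {
                String owner = info.getLockOwnerName();
                System.out.println(info.getThreadName() + " " + s + " on " + info.getLockName()
                        + (owner != null ? " held by " + owner : ""));
            }
        }
        long[] deadlocked = mx.findDeadlockedThreads(); // null when no deadlock exists
        System.out.println("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
    }

    public static void main(String[] args) {
        System.out.println(countByState());
        printWaiters();
    }
}
```

In a pool-starved server, request threads would typically pile up in the WAITING or BLOCKED lists, all pointing at the same pool lock.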

Comments

Man, if I had known before that you were using DBCP, I would have told you to replace that first. We found all kinds of nasty stuff (thread locks, pool leaks) in it when we were working through performance issues around the '04 or '06 mid-term election here at CNN. As soon as we found c3p0, we replaced it everywhere.

Oh well, guess I had to learn the hard way! c3p0 has been great and when it finds problems it actually gives you some useful messages, instead of just locking up and puking.

Would you mind sharing your c3p0 connection settings in your Spring config, i.e. max connections, etc.?

@javaguy44 sure: our max connections are 250, maxIdleTime=900, idleConnectionTestPeriod=300, testConnectionOnCheckout=true, preferredTestQuery=SELECT 1
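For anyone wiring this up, here is a minimal sketch of those settings applied through c3p0's `ComboPooledDataSource`, assuming "max connections" maps to c3p0's `maxPoolSize`. The driver class, JDBC URL, and credentials are hypothetical placeholders, and c3p0 must be on the classpath. In a Spring XML config the same names become bean properties (for example `maxPoolSize`, `maxIdleTime`) on a `ComboPooledDataSource` bean.

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;
import java.beans.PropertyVetoException;

public class PoolConfig {
    // Builds a c3p0 pool with the settings listed in the comment above.
    static ComboPooledDataSource buildPool() throws PropertyVetoException {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setDriverClass("com.example.jdbc.Driver");      // hypothetical placeholder driver
        ds.setJdbcUrl("jdbc:example://localhost/appdb");  // hypothetical placeholder URL
        ds.setUser("app");                                // placeholder credentials
        ds.setPassword("secret");
        ds.setMaxPoolSize(250);               // "max connections"
        ds.setMaxIdleTime(900);               // seconds an idle connection may sit before it is dropped
        ds.setIdleConnectionTestPeriod(300);  // seconds between tests of idle connections
        ds.setTestConnectionOnCheckout(true); // validate each connection before handing it out
        ds.setPreferredTestQuery("SELECT 1"); // cheap query used for those tests
        return ds;
    }
}
```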

The only thing we change in dev and QA is the max connections. I've run load tests with as many as 225 concurrent users (no think time, 8 requests per user) with no problems. I've also run smaller tests with max connections at 150 and a user load of 150 for four hours, and c3p0 never complained. Above 200 users, talking to SOLR slows the app down, so we didn't push the DB connection counts much higher.

We have just moved our code to use DBCP :( Can you provide details of what the problems were, and any usage pattern that reproduces them, based on your experience?