Twitter has been on a fast ride lately, getting lots of attention from various bloggers. The interview with Alex Payne from Twitter (which I found via Brandon Werner’s post) has really exposed Ruby on Rails’ current scalability issues.
This isn’t the first time that Ruby on Rails scalability has been called into question; even I asked how scalable Ruby is. The Grails vs. Rails benchmark by Graeme got a lot of attention, but it was a small benchmark and wasn’t a real-world example. Well, you don’t get more real world than Twitter now, do you?
Adding CPUs for scaling is always an option, but what the Rails community isn’t really saying is how many Mongrel instances you can efficiently run per CPU. Mongrel not being multi-threaded is a real hindrance. Compare that with, say, a JEE server such as Tomcat or Jetty, where you don’t have to run as many instances per CPU. Anybody will tell you that threads are much cheaper than processes. The cost saving in development time would easily be offset by the cost of hardware, especially if you are having to buy lots of hardware quickly.
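A minimal sketch of why that matters, written in Python rather than Ruby or Java and with every name made up for illustration (this is not Mongrel, Rails, or Tomcat code). A thread pool hands requests to a handler, once with a global lock around the handler, which I am assuming here as a stand-in for a server or framework that can only process one request at a time per process, and once without. The `time.sleep` call is a placeholder for waiting on the database.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import nullcontext


class InFlightCounter:
    """Context manager that records how many requests overlap in time."""

    def __init__(self):
        self._guard = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._guard:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info):
        with self._guard:
            self.current -= 1
        return False


def serve(requests, workers, global_lock=None):
    """Handle `requests` on `workers` threads and return the peak concurrency.

    `global_lock` is a hypothetical stand-in for a handler that is not
    thread-safe and therefore has to be serialised.
    """
    counter = InFlightCounter()
    lock = global_lock if global_lock is not None else nullcontext()

    def handle(_request):
        with lock, counter:
            time.sleep(0.01)  # placeholder for I/O such as a database call

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(handle, range(requests)))
    return counter.peak


if __name__ == "__main__":
    print("serialised handler:", serve(8, 4, global_lock=threading.Lock()))
    print("thread-safe handler:", serve(8, 4))
```

With the lock, peak concurrency stays at one however many threads the pool has, so the only way to use more CPUs is to start more processes. Without it, a single process can overlap several requests.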
While Alex’s comments were certainly interesting, what really got me was the reply by David Hansson. He made a point about the database becoming a bottleneck at some point, and this can be the case with many high-traffic applications. However, not being able to connect to multiple databases is a problem. I’m not sure about other drivers, but with JDBC you can specify a list of servers to connect to, which goes a long way towards helping with load balancing. “Caching the hell out of everything”, as Alex pointed out, is not always a workable solution. What if you have a lot of dynamic queries? What if you don’t have a largely read-only system? In those cases caching only goes so far.
The reply that was really a crying call, however, was:
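To show what I mean by spreading load over several database servers, here is a rough sketch in Python. It is not the JDBC or Rails API; `ConnectionRouter` and the host names are invented for the example, and plain strings stand in for real connections. Reads rotate round-robin over the replicas, and everything else goes to the primary.

```python
import itertools


class ConnectionRouter:
    """Hypothetical router: SELECTs rotate over replicas, writes hit the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_query(self, sql):
        """Return the connection that should run `sql`."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ConnectionRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for sql in ("SELECT * FROM statuses",
                "SELECT * FROM users",
                "INSERT INTO statuses VALUES (1)"):
        print(router.for_query(sql), "<-", sql)
```

This only helps the read side. A write-heavy application still funnels every write through one primary, which is the same limit that caching runs into.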
Second, when you work with open source and you discover new requirements not met by the software, it’s your shining opportunity to give something back. Rather than just sit around idle waiting for some vendor to fix your problems, you get the unique chance of being a steward of your own destiny. To become a participant in the community rather than a mere spectator. This is especially true with frameworks like Rails that are implemented in high-level languages like Ruby. The barriers to contribution are exceptionally low.
In this case, it seems that Twitter requires more sophisticated ways of talking to many databases at the same time. Alex puts it a little black and white with “…there’s no facility in Rails to talk to more than one database at a time”, which isn’t really true, but it could definitely be done better. Last I spoke with Twitter, we discussed this and they sounded enthusiastic about being able to further this area of Rails. It’s disappointing to hear that they’ve forsaken that opportunity for an arms-crossed alternative.
Hello… They have a business to run as well, you know. Yes, the beauty of open source is that you can contribute, but when you are trying to build a business you don’t have the luxury of digging through a ton of source code, figuring out where the problems are, and making changes.
I may be wrong, but it appears that David’s response was the more closed-arms approach: “You don’t like it, fix it yourself”, which is often the attitude of many in the open source community. I would think that with a high-profile user like Twitter, either the Ruby on Rails team or the Mongrel developers would work towards solving these problems instead of adding new features. Not every Rails application will see the kinds of loads that Twitter is seeing, but it becomes bad PR. Someone who is contemplating using Rails for an app could easily look at those kinds of reports and decide not to use it because it doesn’t scale, even if it would scale to what they would need.
What the Ruby on Rails community needs to do is step up and build a container that can scale effectively as well as support other options, such as the multiple database connectivity issue, before their current and potential future users decide to go elsewhere.

Dude, he (DHH) wrote Rails for you to use, for free! Do you think he owes you a fix for your particular problems? He also runs a business, and he’s also busy improving other aspects of the framework. This is the sort of reason Twitter chose to go with an open source framework in the first place, as opposed to a proprietary solution where the vendor will, indeed, run to fix your problems in exchange for juicy bucks. You have a scalability issue with Rails? Fix it. Write a plugin or something. Share it. DHH did it with an entire framework, and he didn’t really have to.
Ismael: No, he may not personally owe anybody anything. There are enough people working on Rails directly though, or at least with the various containers, that someone should be on this.
Not everyone has the resources to contribute back. Besides, it can take quite a while to actually figure out how a framework really works. In this case it probably isn’t so much the Rails framework itself, but the container(s).
As for vendors, don’t even get me started there. They are the worst with support and they won’t come running to fix your problems. They will give the same excuse that everyone else does: “oh, just throw some more hardware at it”.
I’m in the process of going through a redesign for the web community I’m in charge of. We’ve been looking at Ruby. This whole discussion makes me think staying with PHP is the right way to go for now.
PHP isn’t going to scale any better than Ruby. We’re talking about handling multiple users with processes rather than threads, which is exactly how PHP works (you just have Apache children rather than Mongrels).
Yes, with J2EE you use threads, but the efficiency of that is very dependent on your JVM/OS; it doesn’t magically work better on all platforms.
I believe that ‘caching’ in this context means memcached (i.e. DB caching, not application/page caching), which is easy to use with dynamic content (and built into Rails).
Whether DHH is a pillock or not shouldn’t be part of this discussion. His point that you have the *option* to extend Rails easily stands, however.
If the issue is just about issuing and handling async DB lookups, then Ruby is hardly in trouble; that’s not a language issue (i.e., it’s not that Ruby is so dynamic that it can’t possibly run efficiently and so there are knees that run performance into the ground). So, yes, it’s a framework issue, and if you’d be willing to write a large and complex application, then you might be willing to write async DB classes; the “it’s open source, so you can fix it yourself” attitude seems appropriate in this case, IMO.
Another thing we don’t know is just how well written Twitter is. It may be reasonable to give the developers the benefit of the doubt, but then let’s look at the exposed parts we can see. For instance, when registering, the full name field is restricted to 20 characters. That right there is a big red flag that the development is not all that thoughtful. *Many* full names don’t fit in 20 characters. Surely they could have spared a few more. Even if they had, say, a hundred million registrants, that’s not very much disk space.
And the user name is not allowed to have a space in it. There’s no good technical reason for that. That’s another sign of an “old school” mentality of placing arbitrary restrictions on the users to make it easier on the developers.
Those two things alone lead me to believe there is a lot of room for improvement in the app’s code.
“There are enough people working on Rails directly though, or at least with the various containers, that someone should be on this.”
How about someone like Twitter?!
What did Facebook do when memcached didn’t meet their needs fully? They built out memcached and everyone benefited. Twitter should be doing the same thing if multiple DB connections are their problem; I doubt they would have trouble getting a patch noticed.
– Mongrel not being multi-threaded is a real hindrance. –
Sorry to contradict you there, but Mongrel is multi-threaded. It’s the Rails framework which is not thread-safe, and therefore you need to run a cluster of Mongrels.
Mongrel is pretty darn fast as far as Ruby apps go, mostly because the URL and header parsing are done in C libraries.
You might want to take a look at Merb, which is a very stripped-down framework compared with something like Rails, but it’s thread-safe and can take advantage of Mongrel’s performance.