Thoughts on large scale Java apps

So I just finished reading Nati Shalom’s post about why most large-scale apps are not written in Java. If you read through the comments, you will see a mix of complete ignorance and correct assertions. As a few of the commenters pointed out, Nati missed some of the biggest sites out there, namely eBay, Amazon, Hotwire, etc. Others also pointed out that the sites mentioned in the chart largely started out as small projects that grew. Beyond that, let’s look at some factors that explain why more sites aren’t run on Java.

Hosting

It doesn’t take a genius to know that LAMP hosting for these kinds of projects is cheap and easy to find. A lot of the big public sites running today did not start out as big commercial entities. Several years ago (when most of these sites got started), Java hosting was hard to find and not cheap. To some extent this issue still exists today. A decent Java hosting company is going to charge a minimum of around $25 a month to get started. No, that isn’t expensive, but there are cheaper LAMP hosting companies out there. Let’s face it, though: if you are expecting to build the next Flickr, Twitter, etc. and want to build it in Java, you are probably going to have to buy some hardware to start out with.

On the flip side, you may not have to buy as much hardware in the long run to achieve the same amount of scaling. Manageability comes into play here as well. Last I heard, Twitter had some 180+ Mongrel instances running(?). I’m not an admin guru, but that seems like a management nightmare.

Java/JEE is expensive

If I hear this ignorant excuse one more time I’m gonna puke. JEE is only expensive if you are dumb enough to buy an app server from one of the big vendors. In the early days the options were pretty limited and not well supported. That is no longer true, however. JBoss, Geronimo, Glassfish, Jetty, Tomcat, etc. are all worthy servers, with a variety of options depending on your particular needs. In my opinion, this is the major reason why Java hasn’t seen wider adoption in the ‘public’ arena and has been left to big companies: they could afford those overpriced app servers. On the other hand, if it weren’t for that era of big app servers and corporate buy-in, Java and JEE might not have lasted this long.

If your next argument is going to be “JEE is expensive in development time,” then I laugh at you. Granted, compared to, say, Ruby on Rails or other similar frameworks, it can be more time-consuming and therefore more expensive in the short run. What about, say, PHP? PHP is fast and easy to develop in too, right? Absolutely, and I use PHP on occasion myself and have little bad to say about it. The problem is that too many non-Java developers look at the whole JEE stack and/or the popular big frameworks (Hibernate, Spring, Struts, etc.) and see complications. For the most part, I have to agree with them. However, I can just as quickly build a Java/JSP app in the same manner as a PHP app. Using the JSTL SQL tags, I can easily connect to a database and build an app in much the same style as a PHP or Perl web app. No complicated frameworks to get in the way.

Now, most Java developers would fall out of their chairs in disgust at such an option, but it is an option. While it doesn’t fit into the ‘ideal architecture’ (whatever that is supposed to mean) that most of us Java folks are used to, it is viable.
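In JSP, that approach would put the JSTL SQL tags (`<sql:setDataSource>` and `<sql:query>`) straight into the page. The same no-framework style can be sketched in plain Java with nothing but the JDK. This is a minimal illustration, not a recommended architecture: `PlainPage`, `queryUsers()` and the sample rows are hypothetical, the in-memory list stands in for a real JDBC query, and the server is the JDK’s built-in `com.sun.net.httpserver.HttpServer` (Java 9+ for `List.of`).

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** A "PHP-style" Java page: one class, no framework, data access and markup side by side. */
public class PlainPage {

    // Hypothetical stand-in for "SELECT name, email FROM users" run over JDBC
    static List<String[]> queryUsers() {
        return List.of(
                new String[] {"alice", "alice@example.com"},
                new String[] {"bob", "bob@example.com"});
    }

    // Minimal HTML escaping so row data cannot inject markup
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Render rows directly into an HTML table, much like a PHP loop over a result set
    static String renderPage(List<String[]> rows) {
        StringBuilder html = new StringBuilder("<html><body><table>");
        for (String[] row : rows) {
            html.append("<tr><td>").append(escape(row[0]))
                .append("</td><td>").append(escape(row[1])).append("</td></tr>");
        }
        return html.append("</table></body></html>").toString();
    }

    public static void main(String[] args) throws IOException {
        // JDK's built-in HTTP server: no servlet container or framework involved
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/users", exchange -> {
            byte[] body = renderPage(queryUsers()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        System.out.println("Serving http://localhost:8080/users");
    }
}
```

Running it and browsing to `http://localhost:8080/users` shows the table. Moving to a real database would mostly mean replacing `queryUsers()` with a JDBC query and putting a driver on the classpath.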

Either way, an experienced Java developer with a good framework can develop a comparable CRUD app pretty quickly. Maybe not as quickly as a really good RoR developer, but depending on the complexity of the app, the difference may be negligible. By the same token, depending on the complexity, it may actually be quicker in Java.

Hardware scale vs. Rewrite

A few savvy commenters were correct in saying that it can be far easier to buy more hardware than it is to rebuild an application with scalability in mind. This is doubly true for a company whose application is already live, where completely swapping codebases is not an appealing option. In some of the corporations where I have worked and we rewrote an application (usually from COBOL to Java), we had the luxury of running the old and new apps in parallel for a period of time to make sure all the bugs and kinks were worked out. Those were also internal applications, which made this easier to manage.
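A parallel run like that can be sketched as a small harness: each input goes to both the legacy code and the rewrite, the legacy answer is the one actually used, and any disagreement is collected for review. This is a minimal sketch under assumptions: `ParallelRun` is a hypothetical name, both systems are modeled as plain `Function`s in one JVM, and the 7% charge in `main` is invented purely to produce mismatches (the legacy code truncates, the rewrite rounds). Real systems would compare outputs across separate processes or databases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Parallel-run harness: legacy stays authoritative, the rewrite is only compared. */
public class ParallelRun<I, O> {
    private final Function<I, O> legacy;
    private final Function<I, O> rewrite;
    private final List<String> mismatches = new ArrayList<>();

    public ParallelRun(Function<I, O> legacy, Function<I, O> rewrite) {
        this.legacy = legacy;
        this.rewrite = rewrite;
    }

    /** Returns the legacy answer; the rewrite's answer is recorded only if it differs. */
    public synchronized O handle(I input) {
        O trusted = legacy.apply(input);
        try {
            O candidate = rewrite.apply(input);
            if (!Objects.equals(trusted, candidate)) {
                mismatches.add(input + ": legacy=" + trusted + ", rewrite=" + candidate);
            }
        } catch (RuntimeException e) {
            // A crash in the new code must not affect users during the trial period
            mismatches.add(input + ": rewrite threw " + e);
        }
        return trusted;
    }

    public synchronized List<String> mismatches() {
        return new ArrayList<>(mismatches);
    }

    public static void main(String[] args) {
        // Hypothetical port: legacy truncates a 7% charge in cents, the rewrite rounds it
        ParallelRun<Integer, Integer> run = new ParallelRun<>(
                cents -> cents * 7 / 100,
                cents -> (int) Math.round(cents * 0.07));
        for (int cents : new int[] {100, 250, 999}) {
            run.handle(cents);
        }
        run.mismatches().forEach(System.out::println);
    }
}
```

Because users only ever see the legacy result, the rewrite can be left running until the mismatch list stays empty for long enough to trust it.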

Scalability

So are PHP, Perl, Python, etc. more scalable than Java? No. Nor, for the most part, is Java more scalable than anything else. Most developers with experience on large systems will readily tell you that your real scalability issues will come from the database layer. It is far easier to throw hardware at the web layer and scale it out that way than it is to scale a database.
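The asymmetry between the two tiers can be illustrated with a toy model: stateless web nodes behind a round-robin balancer, all querying one shared database. The sketch is hypothetical (`ScaleOutSketch`, `Database`, `WebNode` and `RoundRobin` are illustrative names, and the database only counts queries). It shows that adding web nodes divides the request-handling work, while every request still lands on the same database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model: stateless web nodes scale out, the single shared database does not. */
public class ScaleOutSketch {

    // Hypothetical stand-in for the one shared database; it only counts queries
    static class Database {
        final AtomicLong queries = new AtomicLong();

        String lookup(String key) {
            queries.incrementAndGet();
            return "value-for-" + key;
        }
    }

    // Stateless node: keeps no per-user state, so any node can serve any request
    static class WebNode {
        final AtomicLong handled = new AtomicLong();
        final Database db;

        WebNode(Database db) {
            this.db = db;
        }

        String handle(String key) {
            handled.incrementAndGet();
            return db.lookup(key); // every request still reaches the shared database
        }
    }

    // Round-robin balancer: the simplest way to spread requests over identical nodes
    static class RoundRobin {
        private final List<WebNode> nodes;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<WebNode> nodes) {
            this.nodes = nodes;
        }

        String route(String key) {
            int i = Math.floorMod(next.getAndIncrement(), nodes.size());
            return nodes.get(i).handle(key);
        }
    }

    public static void main(String[] args) {
        Database db = new Database();
        List<WebNode> nodes = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            nodes.add(new WebNode(db));
        }
        RoundRobin balancer = new RoundRobin(nodes);
        for (int r = 0; r < 1000; r++) {
            balancer.route("user" + (r % 10));
        }
        for (int i = 0; i < nodes.size(); i++) {
            System.out.println("web node " + i + " handled " + nodes.get(i).handled.get());
        }
        System.out.println("database queries: " + db.queries.get());
    }
}
```

With four nodes and 1,000 requests, each node handles 250 requests, but the database still sees all 1,000 queries. Doubling the web tier would halve the per-node load and leave the database load exactly where it was.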

Given that, imagine how a system like, say, Twitter would have run if it had been built on top of a messaging system (JMS anyone??) instead of Rails. Not all online applications can be built on a messaging infrastructure, but I doubt that something like Twitter really needed to be built on a straight CRUD setup like Rails. I’m not necessarily knocking Ruby on Rails, but like any other language or framework, it isn’t a perfect fit for everything.
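The decoupling a messaging system buys can be sketched with an in-process queue: the web tier only enqueues a status update and returns, while a background worker does the expensive fan-out to each follower’s timeline. To be clear, this uses `java.util.concurrent.BlockingQueue` as a stand-in and is not the JMS API; with JMS the queue would live in a broker and the worker would be a separate consumer process. `MessagingSketch`, `post`, `timeline` and the follower data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Queue-based fan-out. A BlockingQueue stands in for a JMS queue (not the JMS API). */
public class MessagingSketch {

    /** One posted status update (hypothetical message type). */
    static final class StatusUpdate {
        final String author;
        final String text;

        StatusUpdate(String author, String text) {
            this.author = author;
            this.text = text;
        }
    }

    // Sentinel telling the worker to stop; compared by reference
    private static final StatusUpdate STOP = new StatusUpdate("", "");

    private final BlockingQueue<StatusUpdate> queue = new LinkedBlockingQueue<>();
    private final Map<String, List<String>> followers; // author -> followers
    private final Map<String, List<String>> timelines = new ConcurrentHashMap<>();
    private final Thread worker = new Thread(this::drain, "fan-out-worker");

    MessagingSketch(Map<String, List<String>> followers) {
        this.followers = followers;
    }

    /** Web tier: enqueue and return at once; no fan-out work on the request path. */
    void post(String author, String text) {
        queue.add(new StatusUpdate(author, text));
    }

    /** Consumer loop: copy each update onto every follower's timeline. */
    private void drain() {
        try {
            while (true) {
                StatusUpdate update = queue.take();
                if (update == STOP) {
                    return;
                }
                for (String follower : followers.getOrDefault(update.author, List.of())) {
                    timelines.computeIfAbsent(follower, k -> new CopyOnWriteArrayList<>())
                             .add(update.author + ": " + update.text);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void start() {
        worker.start();
    }

    /** Lets the worker finish everything queued so far, then waits for it to exit. */
    void stop() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> timeline(String user) {
        return timelines.getOrDefault(user, List.of());
    }

    public static void main(String[] args) {
        MessagingSketch app = new MessagingSketch(Map.of("alice", List.of("bob", "carol")));
        app.start();
        app.post("alice", "hello, world");
        app.stop();
        System.out.println("bob sees: " + app.timeline("bob"));
    }
}
```

The point of the design is that a burst of posts only grows the queue; the request path stays fast, and more workers (consumers) can be added independently of the web tier.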

Back to messaging… While more hardware, lots of caching, etc. can help a lot of applications, and have certainly helped Twitter, I wonder whether they would have hit their scalability issues as soon as they did if their infrastructure had been built on a messaging system. Who knows…
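The caching half of that remedy usually means a read-through cache in front of the database: check the cache first and only query the database on a miss. The sketch below keeps a bounded least-recently-used cache inside a single JVM, using `LinkedHashMap` in access order; memcached plays the same role across many machines. `ReadThroughCache` and the loader function are hypothetical stand-ins for a real data-access layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Single-JVM read-through LRU cache in front of a slow lookup such as a DB query. */
public class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // hypothetical database lookup
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    public ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Returns the cached value, or loads it on a miss (a null result is never cached as a hit). */
    public synchronized V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key); // only misses reach the database
        entries.put(key, value);
        return value;
    }

    public synchronized long hits() {
        return hits;
    }

    public synchronized long misses() {
        return misses;
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(1000, id -> "profile of " + id); // stand-in for a DB query
        cache.get("alice");
        cache.get("alice");
        cache.get("bob");
        System.out.println("hits=" + cache.hits() + ", misses=" + cache.misses());
    }
}
```

Every hit is a query the database never sees, which is why caching buys so much headroom on the tier that is hardest to scale; it does not, however, remove the write load.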

Anyway, back to the original question of why more large apps aren’t built in Java. Timing is one reason: the apps that are large now were built by small teams or individuals who found Perl, PHP, Python, etc. easier to work with OR who already knew those languages. Free app servers were also pretty limited several years ago, further limiting options.

I would also venture to say that there weren’t as many entrepreneurial Java developers back then as there are now. Most Java developers are in a corporate environment where you don’t see a lot of PHP, Python, etc., and that is where they (we) tend to stay. Even now, most of my side work isn’t in Java, primarily because building and hosting a PHP app is so simple. If I had the money for a dedicated box, I might build more Java apps, or even host them with a third party.

In any event, there is nothing really preventing Java from being used in a large-scale application. For all practical purposes, there is nothing preventing most languages from being used in a large-scale application. It just happens that there are not a lot of large public-facing Java apps, but the ones that do exist happen to be pretty damn big, e.g. eBay.




24 Responses to “Thoughts on large scale Java apps”


  1. hrm2100

    Could you please make the text color a little bit darker, as it is a bit hard to read. Thanks!

  2. The Bull

    You’re right, it was a bit light. Changed :)

  3. Nati Shalom

    Interesting analysis.
    Would it be fair to say that sites that are mostly *content centric* would tend to choose LAMP stack and those that have more *business logic centric* like trading application would tend to choose framework like Java or .Net for that matter?

    Another question on this matter is why we see parallel stacks being developed, like Map/Reduce, memcache, etc., where the equivalent already exists in the Java world. Is that driven by the first question, i.e. if I choose LAMP I’d go with one stack, and if I choose Java that stack would be totally different, even though both stacks cover the same functions?

  4. Nati Shalom

    Interesting analysis.

    Would it be fair to say that sites that are *content driven* would most likely choose LAMP stack and *business logic* driven sites would tend to choose Java?

    So far it looks like the stack that I’ll use in each framework depends on the first question, i.e. if I choose LAMP I’ll end up using memcache and Map/Reduce even though equivalent implementations already exist in the Java world.
    Do you expect that we will see any convergence between the two different stacks, or continued parallel development?

  5. Guy Nirpaz

    GigaSpaces is developing a Java platform to overcome many of the challenges you’ve described. It is Java based, deeply integrated with Spring, and has messaging, data access and caching within a single platform - so it’s much easier to scale. Actually, it was built to scale.

    Nati Shalom, is the CTO and founder of GigaSpaces, and in his post, he was trying to better understand why a lot of the current Web 2.0 hype does not flow well in the Java community. I would suggest reading some more on his blog… (he knows what he’s talking about…)

  6. Steven Devijver

    The only thing I can add is this: I don’t know what del.icio.us runs on - Java or not - but it’s currently not working.

  7. Hmmm...

    “Either way, an experienced Java developer with a good framework can developer a comparable CRUD pretty quickly. Maybe not as quick as a really good RoR developer…”

    Ahhh, how about frameworks like Tapestry, Wicket, Grails!? I mean, Grails is at least as (if not more so) productive than RoR!

  8. infonote

    Java is used in large scale applications. 10 websites is not exactly a real sample of what companies use.

  9. Dmitriy Kopylenko

    If I’m not mistaken, the current del.icio.us is written in Perl

  10. Dmitriy Kopylenko

    That’s right. You forgot that we have Grails in the Java world and that makes all the difference!

  11. Affar

    Thank you for this post. It will clarify things for the Java scalability issue.

  12. Hernâni Cerqueira

    There is cheap Java hosting as well. I won’t advertise here, but for about $15 a month you can get a private JBoss, Tomcat or Jetty instance, and for about half the price you get shared hosting on one of those application servers.

  13. The Bull

    I haven’t forgotten Grails, and have dabbled in it myself. I haven’t seen it used in a production public app yet, but there’s no reason why it couldn’t be.

    I know there are some inexpensive Java hosting solutions out there; what I was trying to get across was that there weren’t a lot of choices available, certainly not as many as with a LAMP solution.

    Thanks for the comments guys!

  14. The Bull

    Nati: “Would it be fair to say that sites that are *content driven* would most likely choose LAMP stack and *business logic* driven sites would tend to choose Java?” I would say that at this time that is a pretty fair assessment. As for Map/Reduce, as far as I know, the only Java implementation is Hadoop, which is fairly new, but I could be wrong.

    Guy: I’m aware of GigaSpaces and what they produce. I would venture to say that for most public sites it isn’t a requirement for scaling Java. An option, yes, and I’m sure they will find their niche. However, Java will scale pretty well on a decent app server.

  15. Dennis

    Well,
    As for Java - it’s just one of the languages, and we have a lot of applications and frameworks built around it.

    To my understanding there are 2 different things on the market: virtualization and grid computing. Both allow you to obtain max throughput. But of course the database is usually the bottleneck.

    Yes, Hadoop is pretty good, but we at GridGain also provide grid computing based on Map/Reduce.

  16. Michael Kimsal

    Interesting that you talk about how “Java is expensive” makes you want to puke. Then a few paragraphs later you state “If I had the money for a dedicated box I might build more Java apps”. Perhaps I’m not reading deep enough, but you seem to confirm the ‘expensive’ argument.

    You hit the nail on the head with one - most apps didn’t start out to be ‘large scale’. Someone had an idea, then got started with a prototype/softlaunch/whatever, then the project got larger than imagined. Often it’s easier to refactor what you started with bit by bit rather than rebuild from scratch, especially if the system is gaining popularity. And if the bottleneck is often a db layer, optimizing your db access and schema will often yield much more tangible short term results than changing languages.

    A LAMP system is shared-nothing by default. Not saying Java apps can’t be that way by default, but the more I work in the Java world, the more I’m learning that that’s not the default approach. To some extent you’re swimming against the tide if you want to build a ‘shared-nothing’ Java web app.

    I’m working on some basic Grails stuff right now, and find that it’s rather a memory hog (relative to a PHP app) just to get started. Yes, perhaps it might ‘scale’ a bit better than a PHP app (though probably not now since it’s so early in its life), but for most people wanting to start off (remember, most large scale apps started off small) in dev mode, the requirements are moderately high (gets into cost again!)

    While a couple years old, I suspect the concept hasn’t changed much in that time frame: http://www.killersites.com/blog/2005/java-hosting-is-kicking-my-ass/

  17. The Bull

    @Michael: Having to buy a dedicated box for java apps is not a necessity, just that Java hosting itself _can_ be more expensive, but that doesn’t necessarily mean that Java itself is expensive. If you look at a corporate environment they are buying boxes anyway regardless of what they are going to run. The bigger the site becomes, and hence the more they have to scale, the fewer boxes in general it seems it takes to run the same performance.

    I think comparing Java memory consumption can be a bit misleading as the JVM is going to grab its heap and run inside that, under one process. I may be wrong, but from what I recall PHP isn’t a single process execution model, and unless optimized is interpreted every execution. Can someone confirm that?

    The one beauty of Grails is that you can get the speed of development that Rails provides while having the full power of Java at your disposal should you need it.

  1. A good point on scaling Java at Thinking Outloud
  2. Java Hosting In People’s View: Java Jams - Gregory Page on County.. » Host News . biz
  3. Web Hosting Providers Directory
  4. Classiest Posts about Java (10-12-2007) | spokedweb.com
  5. Web Hosting Reviews, Web Site Hosting
  6. Software Development Guide
  7. Shared Hosting Resources
