The joke that is dependency management

For the second time in this job I’ve had to deal with Maven. So far I’ve avoided it like the plague, but now I have to work on some old code and the company “standard” is to use Maven. So, I download the new version, set up my path and attempt to do a simple build. I add one dependency, c3p0, and Maven in all its glory can’t find it. Why on God’s green earth does anyone subject themselves to this kind of bullshit? Not to mention the 440+ pom files Maven decides it needs to download. WTF?

The first time I had to deal with it was through the stupid Eclipse plugin. Maven kept putting in jars I didn’t need or ask for and wouldn’t supply the ones I did ask for. I gave up and wrote an Ant script, and everything worked just like I wanted it to.

Seriously, all the dependency management zealots need to see a shrink. By the time you set up the stupid pom file, or ivy file if you like, I could have downloaded the jar, dropped it into my lib directory and presto, it works. Worst case, you have two lib directories, one for build-only jars and another for build-and-deploy jars. That’s it.

Dependency management and tools like Maven just underscore the Java community’s desire to overbuild, over-architect and generally make things far more difficult than they need to be. In 12 years of doing Java I’ve never had the thought that putting jars into a directory was a difficult thing.

“What about different versions of jars?”
Are you kidding me? I upgrade needed jars much like I upgrade to a new JVM or a new Tomcat. Evaluate and if needed replace the jar with the new version. Oh boy, that was so hard I should use a tool to make it harder!

Adding a new jar (excuse me, dependency) is just as painful. I write four lines of XML to describe the dependency, spend several minutes figuring out what repo has the damn thing, and if needed write some more XML to point to that repo. Then pray it works.

Compare that to downloading the jar, dropping it into my existing lib directory and not having to configure anything. Yeah, those 6 lines of XML that I wrote at the beginning of my Ant setup, telling it where my classpath is, still work. Imagine that.




On Git and SVN

Naturally if you search for pros and cons of Git vs. SVN you will find all sorts of compelling arguments on both sides. The vast majority are technical such as distributed vs. central and, IMO, completely miss a big point.

As a bit of background, like most non-Microsoft developers I used CVS and SVN for the better part of my 12 years in development. Then a few years ago, when I was working for Zappos and we moved from Perl to Java, we eventually moved from SVN to Git. At the time I was a bit reluctant, but that was more about user friendliness and such. After using Git there and personally via GitHub, I’ve come to the conclusion that most developers are missing a huge opportunity in source control.

See, I’ve come to realize that SVN is a decent versioning system, but a horrible source control system. The typical cycle in an SVN environment is to use the head branch as your working tree and branch/tag when you have a release. Well, anyone who has done development long enough knows the pitfalls of such a setup. To add to that, commits are far fewer and at far longer intervals than when properly using Git. The reason is generally quite simple: branching is rather painful in SVN and a total breeze in Git. It seems that in every SVN environment I’ve worked in, SVN was just a glorified backup. A versioned backup, if you will. A place where you pushed your code and prayed it didn’t break anything. And heaven forbid you have to work on multiple things at once and then go back and fix something.

The way we had it set up at Zappos, which to this day I still think is the best setup I’ve ever used at a company, was that we had a release manager who, amongst other duties, maintained the “central” Git repo. At first glance this isn’t terribly different from using SVN. However, no developer could commit to the master branch. The master branch represented what was currently in production, and as such it was always the baseline. We used Jira for our ticketing/issue system, and everything we did was a ticket. Nothing too out of the ordinary there.

However, what we did was branch our local Git repo for each ticket that we worked on. This is very important. EACH ticket had its own branch. Some people would faint at such a process, but with Git’s seamless branching it was a joy. It also meant that we could work on many tickets at once without affecting the others or, more importantly, without breaking anyone else’s work. So, when we were done with a ticket/branch, we pushed it up to a specific repo that was controlled by the release manager (RM). Notice I didn’t say merge. We pushed our branch up and, using Gitorious (a local install), we made a merge request.

From there, we switched branches and worked on something else. When a set of tickets was ready for testing, the RM merged in all the tickets required for that particular QA release and pushed to the QA servers. QA did their thing, and if there were issues, we would switch to that branch, make changes and push that branch back up, and the cycle repeated until it was cleared. Once it was verified and had been pushed to production, the RM merged those changes back into master and we would pull and chug right along. We would also remove that particular branch from our local repo as it was no longer needed.

Someone coming from the ‘traditional’ way of doing things would read that and think it was a maintenance nightmare, but it really wasn’t. Even with 40+ developers it worked pretty smoothly. It also meant that two devs, usually a back-end dev (like me) and a front-end/HTML dev, could work on a ticket together and push back and forth between the two without anyone else being affected. That is the distributed nature of Git at work. It was not uncommon for any one dev to have half a dozen or more local branches going at any one time. It was just so easy to branch, merge and push as needed.

To me, that is real source control and not just source versioning.




What I want from a CDN

I was looking around last night at different CDN solutions out there. Everything from Amazon’s new CloudFront to CacheFly to Limelight and a couple others.

All of them have different features and levels of performance, but the one thing I really want to be able to do is on-demand invalidation. Regular TTLs are fine for static items like images, CSS, JS, videos and the like. But what about regular HTML files? Most of today’s websites have pages that are dynamic in some way or another. What I would like to be able to do is generate a page, cache it in a CDN and then, if/when needed, tell the CDN to remove that file immediately. Not hours from now. From what I can tell, none of the solutions I looked at had that ability. CloudFront has a minimum TTL of 24 hours. Limelight’s is just over an hour. I couldn’t find the info on CacheFly.
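As a rough sketch of the behavior being asked for, here is a toy in-memory model in Python. It is not any CDN’s actual API: `EdgeCache`, `put`, `get` and `invalidate` are made-up names for illustration only. Entries expire on a TTL as usual, but `invalidate` drops a page right away.

```python
import time


class EdgeCache:
    """Toy stand-in for a CDN edge cache: TTL expiry plus on-demand invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be exercised without sleeping
        self._entries = {}           # path -> (body, expires_at)

    def put(self, path, body, ttl_seconds):
        self._entries[path] = (body, self._clock() + ttl_seconds)

    def get(self, path):
        """Return the cached body, or None on a miss or an expired entry."""
        entry = self._entries.get(path)
        if entry is None:
            return None
        body, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[path]  # regular TTL expiry
            return None
        return body

    def invalidate(self, path):
        """Drop the entry now, whatever TTL remains. True if something was removed."""
        return self._entries.pop(path, None) is not None


cache = EdgeCache()
cache.put("/product/123", "<html>In stock</html>", ttl_seconds=24 * 60 * 60)
# The item sells out: purge the page now instead of waiting out the 24-hour TTL.
cache.invalidate("/product/123")
print(cache.get("/product/123"))  # None, so the next request goes back to the origin
```

The point of the sketch is only the last method: with TTL alone, the stale page would keep being served until `expires_at` passes.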

Where I find this most appropriate is an ecommerce app. Say you have a product page that you want to cache. Makes sense, but what if you want to be proactive and let customers know when that item is no longer in stock, instead of waiting until they add it to the cart? Making them wait isn’t a very good customer experience.

I specifically mention ecommerce because one, I’ve worked for a big one, and two, it has been shown that reducing latency by mere milliseconds increases revenue.

Surely there is a solution out there that can do this?




Interesting hibernate transaction issue

I’m working on a new Grails based project and came across a transaction issue. The application is an ecommerce app and as such the placing of the order has a lot going on that should all reside in a transaction.

Being a Grails app I decided to put the processing inside a transactional service method. So far so good. The flow looks something like this:

  1. Create billing address
  2. Create credit card
  3. Assign billing address to card
  4. Create shipping address
  5. Loop through the cart items and for each item find stock (inventory) to fulfill it
  6. Mark the stock as sold as it is applied
  7. Add each order item to the order
  8. Assign order to customer
  9. Save order and customer
That is a little oversimplified, but you get the idea. So, what was the problem?

The problem came in during the select for the stock. This select would cause a hibernate exception about the address being a transient object. Well that made no sense at all.

At first I started down the road of abandoning the whole transaction thing and doing it manually. Naturally that was ugly and very error prone, so I did some research to find out what the cause might be. Came up empty.

My gut said that a select should not cause a problem with unsaved data. Then I thought that maybe hibernate was enforcing some sort of isolation level mechanism. Most of us never bother with database isolation levels (and fewer even know what they are sadly enough), but perhaps that was causing it. So what I did was move all the save() calls from the end to where they were being used.

In other words I called save on the two addresses and the credit card before the loops on getting and assigning stock.

Worked like a charm. I now have the whole thing in a transaction as it should be.
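The post does not pin down the root cause. One common way to hit this kind of error in Hibernate is that a query triggers an automatic flush of pending changes (the default flush mode flushes before queries when it needs to), and the flush finds an object already attached to the session that points at one that has not been saved. The toy model below is plain Python with made-up names (`ToySession`, `Entity`, `TransientObjectError`) and no real Hibernate or GORM calls. Which object was attached to the session in the real app is an assumption here; the sketch only shows the mechanism and why saving earlier avoids it.

```python
class TransientObjectError(Exception):
    """Raised when a flush reaches an object that was never saved."""


class Entity:
    def __init__(self, name, **refs):
        self.name = name
        self.refs = refs             # associations to other entities


class ToySession:
    def __init__(self):
        self._persistent = []        # objects this session already knows about

    def save(self, entity):
        if entity not in self._persistent:
            self._persistent.append(entity)

    def flush(self):
        # Every association of a saved object must itself be saved.
        for entity in self._persistent:
            for field, target in entity.refs.items():
                if target not in self._persistent:
                    raise TransientObjectError(
                        f"{entity.name}.{field} references unsaved {target.name}"
                    )

    def query(self, description):
        self.flush()                 # assumed auto-flush before the select runs
        return f"results for {description}"


session = ToySession()
address = Entity("billingAddress")
card = Entity("creditCard", billingAddress=address)
session.save(card)                   # hypothetical: card is attached, its address is not

try:
    session.query("stock for cart item")
except TransientObjectError as err:
    print(err)                       # creditCard.billingAddress references unsaved billingAddress

session.save(address)                # saving before the select, as in the fix described above
print(session.query("stock for cart item"))  # results for stock for cart item
```

In this model the select itself is innocent; it is the flush that runs just before it that trips over the unsaved address.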




My new app minus the RDBMS

I’ve been intrigued for some time by the idea of building an application on a datastore that isn’t a relational database. This “movement”, as it were, is being called “NoSQL”.

My application is a search engine service. As such I don’t need much actual data stored in a database, so I decided to forgo trusty old MySQL for this job and try something different.

SimpleDB

So I decided to go with Amazon’s SimpleDB for this project. It has come a long way since I first looked at it and has all the query capabilities that I need.

If you aren’t familiar with SimpleDB, it is a schema-less setup where domains act as tables. The schema-less design means that new columns (attributes) can be added as needed, without a schema change. While not really necessary for my particular application, it is a feature.
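To make “schema-less” concrete, here is a minimal in-memory sketch in Python. It does not call SimpleDB or mirror its API: `ToyDomain` and its methods are hypothetical names, and the item names and URLs are made-up sample data. It only illustrates that items in the same domain can carry different attributes without any migration step.

```python
class ToyDomain:
    """In-memory stand-in for a schema-less domain: item name -> attribute dict."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def put(self, item_name, **attributes):
        # No schema to alter: new attribute names are simply stored on the item.
        self._items.setdefault(item_name, {}).update(attributes)

    def get(self, item_name):
        return dict(self._items.get(item_name, {}))

    def attribute_names(self):
        """All attribute names seen so far, i.e. the 'columns' that exist in practice."""
        return sorted({key for attrs in self._items.values() for key in attrs})


sites = ToyDomain("sites")
sites.put("site-1", url="http://example.com")
sites.put("site-2", url="http://example.org", language="en")  # a new "column", no migration
print(sites.attribute_names())  # ['language', 'url']
print(sites.get("site-1"))      # {'url': 'http://example.com'}
```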

What do I gain with SimpleDB? As traffic increases, I can spool up more servers (can you say Amazon’s EC2?) and since they all communicate with SimpleDB, I don’t have to worry about the load on a central RDBMS.

The App

The app itself is built on Grails. For those familiar with Grails, it uses Hibernate under the covers as part of GORM. Well, obviously I’m not using that for this project. However, since my query requirements are pretty minimal, that is not an issue. I only have a few domain objects and in most cases I get them by id.

Behind the Grails app is Solr, the web app front end to the Lucene indexing engine.

Initial development is about done and the app will be alpha tested behind an existing site that receives ~100k pageviews a month. I’ll give that a month or so and then expand into beta testing.


