Architecting an Amazon S3-based application

Defining the architecture for my social-network-style app without a relational database is intriguing, but not overly difficult. In my last post, “Putting my non-RDBMS idea to the test,” I gave a brief introduction to how I plan to use Amazon’s S3 service instead of a relational database for content storage. Today I will cover the architecture and the tools I’m going to use.

High-Level Architecture

The high-level architecture is pretty simple. The application layer handles user actions, an in-memory cache and a search engine sit alongside it, and S3 serves as the datastore.
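A minimal sketch of how the application layer might coordinate the three back ends when content is saved. The interfaces `ContentCache`, `ContentIndex` and `ContentStore` are hypothetical stand-ins for JCS, Solr and JetS3t/S3, and the write ordering (S3 first, then cache, then index) is an assumption rather than something decided above. The in-memory implementations exist only so the sketch runs without AWS or Solr.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstractions for the three back ends (JCS, Solr, S3 via JetS3t).
interface ContentCache {
    String get(String key);            // returns null on a miss
    void put(String key, String value);
}

interface ContentIndex {
    void index(String id, String text);
    List<String> search(String term);  // returns matching IDs
}

interface ContentStore {
    String read(String key);           // returns null if absent
    void write(String key, String value);
}

// Application-layer coordinator. The write ordering is an assumption.
class ContentService {
    private final ContentCache cache;
    private final ContentIndex index;
    private final ContentStore store;

    ContentService(ContentCache cache, ContentIndex index, ContentStore store) {
        this.cache = cache;
        this.index = index;
        this.store = store;
    }

    void save(String id, String text) {
        store.write(id, text);   // S3 is the system of record
        cache.put(id, text);     // keep the cache warm for the next read
        index.index(id, text);   // make the content searchable
    }

    List<String> search(String term) {
        return index.search(term);
    }
}

// In-memory stand-ins for local testing.
class MapCache implements ContentCache {
    final Map<String, String> data = new HashMap<>();
    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
}

class MapStore implements ContentStore {
    final Map<String, String> data = new HashMap<>();
    public String read(String key) { return data.get(key); }
    public void write(String key, String value) { data.put(key, value); }
}

class NaiveIndex implements ContentIndex {
    private final Map<String, String> docs = new HashMap<>();
    public void index(String id, String text) { docs.put(id, text); }
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey()); // substring match stands in for real full-text search
            }
        }
        return hits;
    }
}
```

Because the service depends only on the interfaces, any one of the back ends could later be moved to its own machine without touching the controller code.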

As much as possible, I will avoid server-side session data and retrieve everything from the cache instead. A simple cookie will identify the user visiting the site, but no user data will be kept in the session.
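A minimal sketch of session-free user tracking, assuming the cookie carries nothing but an opaque user ID and the profile is looked up in the cache on each request. It uses the JDK's `java.net.HttpCookie` so it runs standalone; a real Stripes action would set a `javax.servlet.http.Cookie` on the response instead. The cookie name `uid` and the `UserProfileCache` class are hypothetical.

```java
import java.net.HttpCookie;
import java.util.HashMap;
import java.util.Map;

// The cookie holds only the user ID; no profile data travels with it.
class UserCookies {
    static final String COOKIE_NAME = "uid"; // hypothetical cookie name

    static HttpCookie forUser(String userId) {
        HttpCookie cookie = new HttpCookie(COOKIE_NAME, userId);
        cookie.setPath("/");
        cookie.setHttpOnly(true);             // not readable from page scripts
        cookie.setMaxAge(30L * 24 * 60 * 60); // 30 days, in seconds
        return cookie;
    }
}

// Hypothetical cache of user profiles keyed by user ID.
class UserProfileCache {
    private final Map<String, String> profilesById = new HashMap<>();

    void put(String userId, String profile) {
        profilesById.put(userId, profile);
    }

    // Resolve the current visitor from the request cookie; null if unknown.
    String currentUser(HttpCookie cookie) {
        if (cookie == null || !UserCookies.COOKIE_NAME.equals(cookie.getName())) {
            return null;
        }
        return profilesById.get(cookie.getValue());
    }
}
```

A plain user ID can be edited by the visitor, so a production version would likely sign or encrypt the value before trusting it.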

Technologies

The application is Java-based, using Stripes for the controller layer, the JetS3t library for communicating with the Amazon S3 service, Solr for the search engine, and JCS for the cache.

Search Engine

The Solr search engine, which is built on top of Lucene, will be performing double duty here. It is primarily the search engine, but since it can store the actual content itself, it will also act as a cached database for searches. In other words, instead of sending a search request to Solr that returns only the IDs of the matching items and then going to the cache layer for the actual data, I can have Solr return the full data itself, eliminating the second call.
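A small sketch contrasting the two approaches: returning IDs and then fetching each hit from the cache, versus returning the stored fields straight from the index. Documents are modelled as maps of field name to value, roughly how Solr hands back stored fields; `SearchResultsDemo` and its methods are hypothetical and not Solr's API. The `cacheLookups` counter makes the extra round trips visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchResultsDemo {
    // Stand-in for Solr: every field is stored in the index.
    private final List<Map<String, String>> indexedDocs = new ArrayList<>();
    // Stand-in for the cache layer, keyed by document ID.
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    int cacheLookups = 0;

    void add(Map<String, String> doc) {
        indexedDocs.add(doc);
        cache.put(doc.get("id"), doc);
    }

    // Approach 1: the index returns IDs only; each hit costs a cache lookup.
    List<Map<String, String>> searchIdsThenFetch(String term) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> doc : indexedDocs) {
            if (doc.get("body").contains(term)) {
                ids.add(doc.get("id"));
            }
        }
        List<Map<String, String>> results = new ArrayList<>();
        for (String id : ids) {
            cacheLookups++;            // second call per result
            results.add(cache.get(id));
        }
        return results;
    }

    // Approach 2: the index returns the stored fields directly; no second call.
    List<Map<String, String>> searchStoredFields(String term) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Map<String, String> doc : indexedDocs) {
            if (doc.get("body").contains(term)) {
                results.add(doc);
            }
        }
        return results;
    }
}
```

The trade-off is index size: storing full content in Solr makes the index larger in exchange for one fewer lookup per result.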

There are a few things I really like about Solr over using Lucene directly. First, it is a web app with a REST-like API. This makes it really easy to use and gives me the future option of moving the search engine onto separate hardware from the main application. Second, it supports a schema for the data, which is pretty nice and allows a unique key to be defined for each entry. Third, it extends the query syntax to support result ordering, filtering and some other handy options. Fourth, it has built-in scaling options such as replication.
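Because Solr is driven over HTTP, a query is just a URL. Here is a minimal sketch that builds one from Solr's standard request parameters: `q` (main query), `fq` (filter query), `sort`, `fl` (fields to return), `rows` (page size) and `wt` (response format). The base URL and the field names (`body`, `type`, `created`, `id`, `title`) are hypothetical, and the exact select path varies between Solr versions and setups.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a Solr select URL. A Map holds one value per parameter, so repeated
// fq parameters are not supported in this sketch.
class SolrQueryBuilder {
    private final String baseUrl;
    private final Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order

    SolrQueryBuilder(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    SolrQueryBuilder set(String name, String value) {
        params.put(name, value);
        return this;
    }

    String build() {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Form-encode each value (spaces become '+', ':' becomes %3A, etc.)
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }
}
```

For example, `new SolrQueryBuilder("http://localhost:8983/solr/select").set("q", "body:hiking").set("fq", "type:post").set("sort", "created desc").set("fl", "id,title").set("rows", "10").build()` returns `http://localhost:8983/solr/select?q=body%3Ahiking&fq=type%3Apost&sort=created+desc&fl=id%2Ctitle&rows=10`. Keeping filters in `fq` rather than `q` lets Solr cache them independently of the main query.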

Cache

I’ll be using JCS initially as a simple in-memory cache. As with Solr, going this route leaves me the option of later moving the cache to a separate instance, with plenty of options for replication, cache size, and so on.
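To illustrate what a size-bounded memory cache does, here is a minimal least-recently-used cache built on `java.util.LinkedHashMap`. It is a stand-in to show the "cache size" knob, not JCS's API; JCS is configured through its own properties rather than code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache. Not thread-safe on its own.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration runs least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the LRU entry once over the limit.
        return size() > maxEntries;
    }
}
```

In a web app, where many request threads share the cache, it would need wrapping, for example with `Collections.synchronizedMap(new LruCache<>(1000))`.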

Like most cache-based setups, the application will first check the cache for the requested data, and on a miss it will go to the datastore.
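A minimal sketch of that cache-aside read path. The `datastoreLoader` function stands in for a hypothetical S3 fetch; everything else is plain JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside read: check the cache first and go to the datastore only on a miss.
class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> datastoreLoader; // hypothetical S3 fetch

    CacheAside(Function<K, V> datastoreLoader) {
        this.datastoreLoader = datastoreLoader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;                      // cache hit
        }
        V loaded = datastoreLoader.apply(key);  // cache miss: go to the datastore
        if (loaded != null) {                   // ConcurrentHashMap rejects null values
            cache.put(key, loaded);             // populate for subsequent reads
        }
        return loaded;
    }
}
```

Two threads that miss on the same key at the same time may both call the loader. That is usually acceptable, and it avoids `computeIfAbsent`, which would hold a lock on that part of the map for the duration of a slow remote call.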

Datastore

Obviously, in this scenario the Amazon S3 service will be the primary datastore. This gives me the benefit of Amazon’s reliability for the stored data. My initial tests show that it is pretty fast for a web service, so when the app does have to go to it for data, it won’t get bogged down.
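Without tables, the structure lives in the object keys. One possible layout is sketched below; the `users/{id}/profile` and `users/{id}/posts/{postId}` scheme is purely hypothetical. S3 keys are flat strings, and the `/` separators are only a convention, but listing objects by a prefix such as `users/42/posts/` gives something close to a per-user table scan.

```java
// Hypothetical object-key layout for storing app content in a single S3 bucket.
final class S3Keys {
    private S3Keys() {}

    static String userProfile(String userId) {
        return "users/" + requireSafe(userId) + "/profile";
    }

    static String post(String userId, String postId) {
        return "users/" + requireSafe(userId) + "/posts/" + requireSafe(postId);
    }

    // Prefix under which all of a user's posts live, for prefix listings.
    static String postsPrefix(String userId) {
        return "users/" + requireSafe(userId) + "/posts/";
    }

    // Reject IDs that would break the layout (null, empty, or containing '/').
    private static String requireSafe(String id) {
        if (id == null || id.isEmpty() || id.contains("/")) {
            throw new IllegalArgumentException("invalid id: " + id);
        }
        return id;
    }
}
```

These strings would be passed as the object key to whichever S3 client call stores or fetches the content, and the same key can double as the cache key.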

So that is the basic architecture for the application. Fairly simple, huh? In my next installment I’ll cover more details of the implementation.


