Wednesday, May 2, 2012

"NoSQL" Databases - (MongoDB)

I have recently become a big fan of "NoSQL" databases.  When I first read about them, the entire concept seemed a bit like fantasy land because of its simplicity and elegance; it seemed too good to be true.  However, many large, real-world applications already run on these very real databases.  Companies like Netflix, Facebook, and Twitter (to name a few) make extensive use of NoSQL, and all three use Apache Cassandra in particular.


ThoughtWorks has an excellent introduction to NoSQL databases that I recommend to anyone who wants to learn more about the subject.  The most important question to ask of any new tool or technology is: 'How is this thing going to help me do my job better or faster?'


Taken directly from the ThoughtWorks post:

Why would you want to use a NoSQL database?
One of the fundamental drivers is that you have challenges in your business that are difficult to solve using traditional relational database technology. If you have an excellent relational model running on a mature database that provides all the features you need, then there is probably little need to change your data storage mechanism. Here are some use cases where it is sub-optimal to use a conventional database:
  • Your relational database will not scale to your traffic at an acceptable cost
  • Your data is supplied in small updates spread over time, so the number of tables required to maintain a normal form has grown disproportionately to the data being held. Informally, if you can no longer print your ERD on an A3 piece of paper, you may have hit this problem or you are storing too much in a single database.
  • Your business model generates a lot of temporary data that does not really belong in the main data store. Common examples include shopping carts, retained searches, site personalisation and incomplete user questionnaires.
  • Your relational database has already been denormalised for reasons of performance or for convenience in manipulating the data in your application.
  • Your dataset consists of large quantities of text or images and the column definition is simply a Large Object (CLOB or BLOB).
  • You need to run queries against your data that do not involve simple hierarchical relations; common examples are recommendations or business intelligence questions that involve an absence of data. For the latter, consider "all women in Paris who do have a dog and whose ex-sisters-in-law have not yet purchased a paperback this year" as a contrived example, while "all people in a social network who have not purchased a book this year who are once removed from people who have" is a real one if you want to target advertising on a site that says "Fred bought X".
  • You have local data transactions that do not have to be very durable. For example, "liking" items on websites: creating transactions for this kind of interaction is overkill because if the action fails, the user is likely to just repeat it until it works. AJAX-heavy websites tend to have a lot of these use cases.
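
To make the last bullet concrete, here is a minimal sketch (my own illustration, not part of the ThoughtWorks post) of why a retried "like" can be harmless: if the write is idempotent, repeating it changes nothing.  MongoDB's `$addToSet` update operator has this property.  To keep the sketch runnable without a server, it uses a hypothetical in-memory `LikeStore` class in place of a real MongoDB collection.

```python
from collections import defaultdict


class LikeStore:
    """Hypothetical in-memory stand-in for a MongoDB "likes" collection.

    Each item keeps a set of user ids, mirroring the idempotent behaviour
    of MongoDB's $addToSet operator: recording the same like twice leaves
    the data unchanged, so a client can safely retry a request.
    """

    def __init__(self):
        self._likes = defaultdict(set)

    def like(self, item_id, user_id):
        # Adding an element that is already in a set is a no-op,
        # so a retried "like" cannot double-count.
        self._likes[item_id].add(user_id)

    def unlike(self, item_id, user_id):
        # discard() does not raise if the user never liked the item.
        self._likes[item_id].discard(user_id)

    def count(self, item_id):
        # .get() avoids creating an empty entry for unknown items.
        return len(self._likes.get(item_id, ()))


store = LikeStore()
store.like("story-42", "alice")
store.like("story-42", "alice")  # the user clicks again after a timeout
store.like("story-42", "bob")
print(store.count("story-42"))  # 2
```

Against a real MongoDB server, the equivalent write with PyMongo would be roughly `collection.update_one({"_id": item_id}, {"$addToSet": {"likes": user_id}}, upsert=True)`; treat that as an illustration of the idea, not as ScrumTime code.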


Two of the ThoughtWorks bullet points are especially relevant to ScrumTime 1.0: large binary objects and lightweight, low-durability AJAX interactions.  The next release of ScrumTime will have extensive support for attaching artifacts to nearly all object types.  For example, a Release may have a Statement of Work document uploaded and associated with it, and a Story may be associated with a few photographs of the whiteboard discussion to which it relates.  As a result, the ScrumTime dataset will include a large number of binary objects.  In addition, the collaborative nature of ScrumTime is an excellent fit for AJAX, as release 0.9 showed.  The 1.0 release will use AJAX much as Facebook and Google do to promote team collaboration and communication, for example by indicating when another user is online.
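
One common way to store large attachments in MongoDB is GridFS, a convention that splits a file into numbered chunk documents plus one metadata document, so no single document has to hold the whole binary (MongoDB caps a document at 16 MB).  The sketch below only illustrates that layout in plain Python, without a server; in practice a driver's GridFS support (for example, PyMongo's `gridfs` package) does this for you.  The Story document shape, `split_into_chunks`, and `reassemble` are hypothetical names for this example, and the 255 KiB chunk size is an assumption matching the default of current GridFS drivers.

```python
from datetime import datetime, timezone

# Assumed chunk size for this sketch: 255 KiB.
CHUNK_SIZE = 255 * 1024


def split_into_chunks(file_id, data, filename=None, chunk_size=CHUNK_SIZE):
    """Build GridFS-style documents for one uploaded artifact.

    Returns one metadata document (like an entry in fs.files) and a list of
    numbered chunk documents (like entries in fs.chunks).
    """
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    metadata = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    return metadata, chunks


def reassemble(metadata, chunks):
    """Concatenate chunks in order of their sequence number n."""
    data = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
    if len(data) != metadata["length"]:
        raise ValueError("chunks are missing or truncated")
    return data


# Hypothetical Story document that refers to its attachments by file id.
story = {"_id": "story-42", "title": "Sketch the upload UI", "attachments": []}
photo = b"\xff\xd8" + b"\x00" * 600_000  # stand-in for a whiteboard photo
meta, chunks = split_into_chunks("file-1", photo, filename="whiteboard.jpg")
story["attachments"].append(meta["_id"])
print(len(chunks))  # 3
```

Keeping only the file id in the Story means listing stories never drags megabytes of image data along with it; the binary is fetched only when someone actually opens the attachment.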

Because the purpose of ScrumTime is to provide a usable open source agile project management tool to the masses, a simple setup that gets people running quickly is a high-value requirement.  So, when considering using both a relational database and a NoSQL database to meet all of ScrumTime's data storage needs, I have to think about the impact on the guy or girl who deploys ScrumTime.  If that person has to set up two databases, they are not going to be happy with me.  I could write a setup application to configure both databases, but that effort takes time away from adding value to the ScrumTime product itself.  Therefore, if all of the features of ScrumTime can be implemented in one database, all the better.


Over the past week, I implemented the ScrumTime domain models in a NoSQL database called MongoDB.  I also tried CouchDB and RavenDB.  I like all three, but they do not share a standard query syntax, so there was no common convention to guide the choice for ScrumTime.  But what makes a standard convention?  Usually, it is dictated by the most popular choice.  So I researched the number of open jobs in the United States that mentioned each of the three product names.  MongoDB was the clear winner, which suggests that it is currently the most popular of the three.  I was also able to find numerous language-specific database drivers and lots of documentation for MongoDB.  So, MongoDB is the database choice for ScrumTime 1.0.
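
As an illustration of what implementing domain models in a document database can look like, here is a minimal sketch.  The `Release` and `Story` classes and their fields are hypothetical simplifications, not ScrumTime's real models.  The sketch embeds each release's stories inside the release document, which is one common MongoDB design choice; storing stories in their own collection and referencing them by id is the other.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Story:
    title: str
    points: int
    status: str = "todo"


@dataclass
class Release:
    release_id: str
    name: str
    stories: List[Story] = field(default_factory=list)


def to_document(release: Release) -> dict:
    """Map a Release and its Stories onto a single nested document.

    asdict() recurses into nested dataclasses and lists, so each Story
    becomes an embedded sub-document; release_id is renamed to MongoDB's
    conventional primary-key field, _id.
    """
    fields = asdict(release)
    doc_id = fields.pop("release_id")
    return {"_id": doc_id, **fields}


# In MongoDB a query is itself a document. Dot notation reaches into the
# embedded array, so this filter matches releases containing at least one
# story whose status is "done". With PyMongo it would be passed to find().
DONE_STORY_FILTER = {"stories.status": "done"}

release = Release("r-1", "ScrumTime 1.0", [Story("Attach files to stories", 5)])
print(to_document(release))
```

The filter is only defined here, not executed, since running it needs a MongoDB server.  It also shows the portability problem mentioned above: CouchDB and RavenDB express the same question in their own, different ways.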

If you have not had a chance to look at NoSQL, please make time.  It is not all hype.
