Robert Haas: Why The Clock is Ticking for MongoDB

Wednesday, April 16, 2014

Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But the relational database community is not, as Schireson would have us believe, ignorant of the design paradigms he advocates, nor has it failed to try them; rather, they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Let's look at his example of entering an order into a database. In this example, it is postulated that the order is split between 150 different relational tables, including an order header table, an order line table, an address information table and, apparently, 147 others. Relational databases do encourage users to break up data across multiple tables in this way, a process called normalization. But they do so for good reason. Storing every order in one large document may be ideal if all access will be strictly by order number, but this is rarely the case. When a user wants to run a report on all orders of one particular product, an index on the order line table can be used to efficiently find and retrieve just those order lines. If all order data is lumped together, the user will be forced to retrieve the entirety of each order that contains a relevant order line - or perhaps even to scan the entire database and examine every order to see whether it contains a relevant order line.
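
As a minimal sketch of the per-product report described above, the following Python snippet uses the standard library's sqlite3 module as a stand-in for a relational database. The table, column, and index names (order_header, order_line, order_line_product_idx) are hypothetical and chosen only for illustration; they do not come from the interview or from any real schema.

```python
import sqlite3

# Hypothetical schema: all table, column, and index names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES order_header(order_id),
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL
    );
    -- The index that lets a per-product report skip unrelated order lines.
    CREATE INDEX order_line_product_idx ON order_line (product_id);
""")

conn.executemany("INSERT INTO order_header VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(1, 100, 2), (1, 200, 1), (2, 200, 5), (3, 300, 1)])

# Report on one product: only the matching order lines are retrieved.
report_sql = "SELECT order_id, quantity FROM order_line WHERE product_id = ?"
rows = sorted(conn.execute(report_sql, (200,)).fetchall())

# Ask SQLite how it would run the query; the plan text should name the index.
plan = conn.execute("EXPLAIN QUERY PLAN " + report_sql, (200,)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

print(rows)
print(plan_text)
```

The exact wording of the query plan varies between SQLite versions, so the sketch only inspects it for the index name rather than relying on a fixed string.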

Of course, like any good thing, normalization can be overdone. Few database schemas are so heavily normalized that a simple order entry task touches 150 tables, but if yours is, you may well wish to consider denormalizing. But you need not go so far as to denormalize completely, as Schireson appears to advocate. Instead, you should determine what degree of normalization will best meet your current and future business needs. Seasoned relational database professionals understand the trade-offs between normalization and denormalization, and can help companies make good decisions about when and how to normalize. Schireson appears not to understand this trade-off, or else understands it but advocates for total denormalization anyway because that is the only paradigm his product can support.

Schireson also mentions another advantage of document stores: schema flexibility. Of course, he again ignores the possible advantages, for some users, of a fixed schema, such as better validity checking. But more importantly, he ignores the fact that relational databases such as PostgreSQL have had similar capabilities since before MongoDB existed. PostgreSQL's hstore, which provides the ability to store and index collections of key-value pairs in a fashion similar to what MongoDB provides, was first released in December of 2006, the year before MongoDB development began. True JSON capabilities were added to the PostgreSQL core as part of the 9.2 release, which went GA in September of 2012. The 9.4 release, expected later this year, will greatly expand those capabilities. In today's era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.
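
To illustrate the general idea of mixing fixed columns with free-form key-value attributes in one relational table, here is a rough sketch. It is not PostgreSQL's hstore or JSON support: it uses SQLite through Python's sqlite3 module, stores the attributes as JSON text, and registers a hypothetical helper function, attr_get, to read one key. PostgreSQL's own types do this inside the database and can be indexed, which this sketch does not attempt to reproduce. The product table and its contents are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

def attr_get(doc, key):
    """Return one top-level value from a JSON text document, or None."""
    value = json.loads(doc).get(key)
    # SQLite stores scalars only, so nested values go back as JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value

# Hypothetical SQL function name; registered for this connection only.
conn.create_function("attr_get", 2, attr_get)

conn.execute("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,  -- fixed column, checked by the schema
        attrs      TEXT NOT NULL   -- free-form key-value pairs as JSON text
    )
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "t-shirt", json.dumps({"color": "red", "size": "M"})),
    (2, "laptop", json.dumps({"ram_gb": 16})),
    (3, "mug", json.dumps({"color": "red"})),
])

# Rows need not share the same attribute keys; a missing key yields NULL.
red_products = [name for (name,) in conn.execute(
    "SELECT name FROM product "
    "WHERE attr_get(attrs, 'color') = 'red' ORDER BY product_id")]
print(red_products)
```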

The advantages of relational databases are not so easily emulated. Relational databases offer complex transactions that affect multiple records; synchronous commit, so that each transaction is guaranteed to be durable on disk before the client is notified that the commit has succeeded; support not only for JSON but also for other complex datatypes such as XML and geospatial data types; and mature query optimizers that can combine data not only from multiple indexes (which Schireson mentions as a forthcoming feature; PostgreSQL added that capability in 2005) but also from multiple tables via joins. While some MongoDB users may not require any of these features, many will, and I believe that MongoDB will find that adding these features to a product that natively supports JSON is much harder than adding JSON support to a product that already possesses these - and many other - enterprise features.
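
A small sketch of what a transaction affecting multiple records buys you, again using SQLite via Python's sqlite3 module as a stand-in rather than PostgreSQL. The account table, the transfer function, and the figures are hypothetical. The point is only that two updates to different rows either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account VALUES (1, 'alice', 100), (2, 'bob', 20);
""")

def transfer(conn, src, dst, amount):
    """Move funds between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit made by the
        # first UPDATE has been rolled back along with it.
        return False

ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT owner, balance FROM account"))
print(ok, failed, balances)
```

The credit is deliberately applied before the debit so that the failed transfer leaves a partial change behind for the rollback to undo.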

This is not to deny that MongoDB offers some compelling advantages. Many users have found that they can get up and running on MongoDB very quickly, an area where PostgreSQL and other relational databases have traditionally struggled. And the sharding capabilities of MongoDB are clearly useful to some users, but the process of scaling out is not as transparent as the documentation might imply, and sometimes goes badly wrong. In the end, a large, complex system requiring continuous uptime typically requires that the application developer and DBA work together and have very specific knowledge of which data is stored where. Auto-sharding may succeed in hiding the complexity from the user in some cases, but it does not eliminate it.

In short, I don't expect MongoDB, or any similar product, to spell the end of the relational database. Rather, I think it's likely that PostgreSQL and other database engines will continue to innovate, providing many of the features that have caught the imagination of developers who are now choosing NoSQL engines; and that NoSQL systems will struggle to add features which relational databases have had for years.

66 comments:

Anonymous, April 16, 2014 4:48 PM
I fully agree with your point of view. The whole NoSQL vs. relational debate reminds me very much of the "relational" vs. "object oriented" debate in the late 90s, where every vendor of an object-oriented database claimed that relational databases would be gone in 10 years.

I think every tool has a problem domain where it fits and also has many domains where it does not make sense to use it. Claiming that NoSQL (or document-oriented systems) can cope with every problem domain is just as wrong as claiming that relational databases can cope with every problem domain (but I do think there are more domains that can make use of a relational DB than there are for document DBs).

Anonymous, April 16, 2014 5:44 PM
Hi Robert, a single "order" document, as described in your example, can have multiple secondary indexes (similar to Postgres) in MongoDB. Using an index to search on Order location, type, amount, etc. would all be accomplished with additional secondary indexes. I think you are confusing MongoDB with a key/value store. Postgres is a better option for truly relational data, MongoDB will be a great option for a lot of other use cases.

Lawrence Kesteloot, April 16, 2014 6:09 PM
I used MongoDB for two years on several projects. In each project the schema started out denormalized, and as the requirements grew, the schema evolved to be increasingly normalized, until we ended up with what we would have had in a relational database anyway. The main disadvantage of denormalized data is that ad-hoc and unanticipated queries are hard, which is especially damaging in a start-up situation where the needs of the app are not well understood ahead of time.

In MongoDB's defense, I will say that setting up a replica is trivial. Just launch it and seconds later the data has been fully replicated. It's magical.

Ian Barwick, April 16, 2014 9:30 PM
A case in point:

Internet-of-stuff startup dumps NoSQL for ... SQL?

Tomáš Vondra, April 17, 2014 3:42 AM
An article about how MongoDB is great and everything else sucks, from a MongoDB CEO. Who would expect that ... ?

Anonymous, April 17, 2014 6:13 AM
I have been using Postgres since 7.x.

The really cool feature in MongoDB is how simple it is to get started. This might be just a packaging thing in Ubuntu, but it is a point for Mongo.

MongoDB: just type mongo, insert something, query something
Postgres: log in as postgres to create a user, create database, then log in as user, create table, insert something, query something

The missing security in Mongo is a benefit when trying it out. For production I need to tune (mongo|postgres) anyway.

Anonymous, April 17, 2014 9:54 AM
I personally find the title of this article misleading.

Unknown, April 17, 2014 10:45 AM
I love how the NoSQL fanboys just throw away 30 years of software development knowledge in favour of 'Yeah, but it's *really* easy to set up!'

FlipperPA, April 17, 2014 11:06 AM
The key, the whole key, and nothing but the key, so help me Codd. Aim for level 3, denormalize when performance dictates to do so! :)

Mark Thien, April 17, 2014 11:45 AM
The CEO of MongoDB is just trying to be foolish; forgive him, as in "stay hungry, stay foolish".

Mark Thien, April 17, 2014 11:46 AM
If a woman sells you a flower, will she tell you that the flower stinks?

Adriano Almeida, April 17, 2014 12:42 PM
Robert, when you say: "In today's era of rapid innovation, any database product whose market advantage is based on the format in which it is able to store data will not retain that advantage for very long.", I can see your point, but I don't fully agree with you.

Graph databases, for instance, seem like a very good alternative to relational DBs for working with data that has many levels of relationships.

But I still think that 80% or more of the use cases we have nowadays are easily handled by a relational db. And for some exceptional cases, you must look carefully for which other alternative better fits your need, not relying on "my product works better for every scenario".

Anyway, great post.
Cheers

Unknown, April 17, 2014 1:41 PM
Discrediting NoSQL database systems because they don't compare apples to apples to a relational model is like discrediting Microsoft Word as a valid spreadsheet creation tool.

I can just imagine the review. "We tried to convert all of our worksheets into this trendy new 'word document' and we found that fundamental things were a real problem to do, like referencing a calculated cell. I mean it was great for things that didn't have numbers... and required fancy formatting, but eventually ALL our documents had numbers and then... Finally, after much expense, we decided to get rid of the 'fluff' and go back to what we are comfortable with... Microsoft Excel. At least there we can make a table if we want to and do our basic calculations."

The technologies are not mutually exclusive. This problem was the very first thing I remember about what the MongoDB folks said at the Chicago conference this year. I think it went something like this: "If you need the following things from your database: ACID, transactions, defined entities and relationships, normalized data; you are in the wrong place. We are not here to replace relational database systems, they still have a purpose. We are here to fill a need for an accessible unstructured data store that can scale."

Mizchief, April 17, 2014 2:04 PM
I think the problem with the SQL-clingers' opinions is that they don't look past the database when it comes to application development. They model the data only looking at the entity's structure, not how it will actually be fetched and stored.

NoSQL allows the application to control the schema and organize the storage in the same structure in which it is consumed. This eliminates ORMs and any code generators used to translate the data into a usable format.

The author likes to talk about how you may want to pull ad-hoc reports etc., but with any high-volume system, you can't report off of the transaction system without bringing down your client sites, and you end up having to create complicated ETLs to push the data into separate reporting environments anyway.

Robert Young, April 17, 2014 3:55 PM
If all one wants is ease of setup, and all coding (transactions, etc.) in some client code, then any of the NoSQL candidates will do. They're just slightly glorified files, after all. Anyone remember dBII? As one of the comments said, eventually you end up writing your own, bespoke, CICS if you actually care about the data. And the notion of an extra hour to set up a database engine as material to development of an application is silly beyond belief.

Anonymous, April 19, 2014 1:59 AM
"If you need the following things from your database: ACID, transactions, defined entities and relationships, normalized data; you are in the wrong place..."

Huh, who doesn't want ACID for their database?

Lukas Eder, April 19, 2014 6:42 AM
Well, Mark Madsen's history of databases in "no-tation" kind of hints where the trends might be going. PostgreSQL might indeed be one of the leaders at the end of this decade

Anonymous, April 19, 2014 7:45 AM
The degree of ignorance in this is stunning. When you have equal knowledge and experience of the RDBMS of your choice and of NoSQL, come back and compare them. The developers and creators of MongoDB built and sold a multi-billion-dollar company on RDBMS; MongoDB kernel developers have worked on and for MSSQL, Oracle and MySQL. They may actually have some insight into how to build a new database to meet the challenges relational DBs are suboptimal for.

Unknown, April 19, 2014 8:09 AM
Competition is great; it pushes everyone to innovate, and the result is that the difference between a relational DB and NoSQL can be very thin.

As a PostgreSQL expert, you probably know the JSON(b) store built into PostgreSQL 9.4 (and before that, hstore) that lets you mix relational and non-relational data in a painless way, keeping the full power of relations when it's useful, and combining it with other PostgreSQL extensions like PostGIS.

Unknown, April 19, 2014 2:02 PM
in reply to all the posts that propose nosql "document" system == file system storage, then you don't get it indeed. try tuning your file system to allow the use of a json formatted query to return documents that contain certain elements that exist in petabytes of data located across hundreds of hosts potentially in many different folders and tell me how that goes for you. for that matter, try that in a rdbms system… find all the rows (no matter what structure) in all tables that contain specific elements across a distributed db scaled to the multi-petabyte range. of course then you might say, something like 'I consistently don't get the difference between petabytes and megabytes' at which point I would probably realize I'm wasting my time.

Good post Robert, this for me was not a waste of time tho, and I really appreciate your view on MongoDB.

Scott Blodgett, April 19, 2014 6:24 PM
I am surprised that no one has mentioned Cassandra. Does it not belong in this conversation?

Anonymous, April 19, 2014 7:33 PM
I'd like to replicate from live models into historical documents for transparent reporting of both in a consistent manner without having to hand-roll my app to talk to an ACID MySQL cluster for the live models and a GraphDB for historical data.

Unknown, April 20, 2014 3:47 AM
Give me another async driver option, and please let me worry about which store to use.

Anonymous, April 27, 2014 2:10 PM
A lot of the commenters seem upset at the hyperbole used by Max Schireson. Compared to the various CEOs of Microsoft, or Larry Ellison of Oracle, I think the MongoDB folks are still very much ahead in terms of polite behavior. I could write a thousand articles like this one about the RDBMS world's CEOs saying much less believable drivel than Max. Since everyone seems to be keeping score.

Sameer, May 01, 2014 12:00 PM
I do agree that RDBMSs, and for that matter open source RDBMSs like PostgreSQL, have been innovating and providing a lot of additional features w.r.t. data format. But consider this: Word allows you to store data in tabular format, and still we cannot let go of our Excel sheets.

IMHO Max Schireson was not implying that "we must replace RDBMS with MongoDB" or "we must always consider MongoDB over RDBMS". I think all he wanted to say was that for certain requirements MongoDB would be a better choice.

I strongly believe RDBMSs have a long life to live and the world will always have applications which cannot be designed without an RDBMS.

Unknown, May 13, 2014 11:07 PM
Truthfully it sounds like Max is saying "MongoDB is a much better alternative to doing RDBMS horribly wrong"

..and after fully reading Max's commentary in the ZDNet article, I see that he markets Mongo by weaving bits of truth with utter fallacy; who would put an object in 150 different places with the expectation of reassembling it fast? _yes_, with a RDBMS you must commit/put structure to your objects before storing them.. but *every* time you go to query Mongo you've got to put structure to the data in the query itself

A tool is only as effective as he who wields it, and many experienced people still (and will continue to) use RDBMSs for a reason.