Everything is awesome

My kids watched the new Lego movie today and spent the rest of the day repeating "Everything is awesome". I spent a few hours reading MongoDB documentation to help a friend who uses it. Everything wasn't awesome for all of us. I try to be a pessimist when reading database documentation. If you spend any time near production, then you spend a lot of time debugging things that fail. Being less than optimistic is a good way to predict failure.

One source of pessimism is database limits. MongoDB has a great page describing its limits. It limits index keys to less than 1025 bytes. But this is also a fine example of the value of pessimism. The documentation states that values (MongoDB documents) are not added to the index when the index key is too large. An optimist might assume that an insert or update statement fails when the index key is too large, but that is not the specified behavior.

As far as I can tell, prior to MongoDB 2.5.5 the behavior was to not add the document to the index when the indexed column exceeded 1024 bytes. The insert or update would succeed but the index maintenance would fail. Queries that use the index afterwards can return incorrect results.

A quick search of the interwebs shows that people were aware of the problem in 2011. I can reproduce the problem on my Ubuntu 12.10 VM. Why do we tolerate problems like this? Maybe this isn't a big deal and the real problem is that new risks (from a system I don't know much about) feel worse than risks in software that I have been using for years. But corruption (either permanent or via incorrect query results) has been a stop-the-world bug for me -- as in you do nothing else until the problem has been fixed. Why have MongoDB users tolerated this problem for years?

While it is great that the bug appears to have been fixed, database vendors should understand that FUD takes much more time to go away. See all of the stories about transactions and MySQL that lived on long after InnoDB became viable. And note that some of the MySQL FUD was self-inflicted -- see the section of the MySQL manual on Atomic Operations.

I found this code in 2.4.9 that explains how too-large keys are handled. A careful reader might figure out that index maintenance isn't done. Nice optimization. I spoke to one user who doesn't like the old behavior but doesn't want to break apps with the new behavior that fails inserts/updates with too-large keys. Indexes on a prefix of a key would help in that case.

template< class V >
void BtreeBucket<V>::twoStepInsert(DiskLoc thisLoc,
                                   IndexInsertionContinuationImpl<V> &c,
                                   bool dupsAllowed) const
{
        if ( c.key.dataSize() > this->KeyMax ) {
            problem() << "ERROR: key too large len:" << c.key.dataSize()
                            << " max:" << this->KeyMax << ' '
                            << c.key.dataSize() << ' '
                            << c.idx.indexNamespace() << endl;
            return; // op=Nothing
        }
        insertStepOne(thisLoc, c, dupsAllowed);
}

/** todo: meaning of return code unclear clean up */
template< class V >
int BtreeBucket<V>::bt_insert(const DiskLoc thisLoc, const DiskLoc recordLoc,
                              const BSONObj& _key, const Ordering &order,
                              bool dupsAllowed,
                              IndexDetails& idx, bool toplevel) const
{
        guessIncreasing = _key.firstElementType() == jstOID && idx.isIdIndex();
        KeyOwned key(_key);

        if ( toplevel ) {
            if ( key.dataSize() > this->KeyMax ) {
                problem() << "Btree::insert: key too large to index, skipping "
                << idx.indexNamespace() << ' ' << key.dataSize()
                << ' ' << key.toString() << endl;
                return 3;
            }
        }