Schema-Less is (Usually) a Lie

People are fond of categorizing trendy database engines by what they’re not … “schema-less” is just one example.

Just because a tool is not rigid about the type or amount of detail that must be applied to a component does not make it component-less. Specifically, the fact that MongoDB does not require a predefined schema when a collection is created does not mean that MongoDB (or the class of NoSQL data stores in general) is schema-less.

Any application of significance benefits from a data model that establishes, among other things, what data the application needs and how that data will be used, both by the application at hand and by any future applications.

I spent significant time early in my career designing and developing applications with the PICK data model. PICK, like many of today’s NoSQL stores, is typeless and does not require a rigid, system-level definition of what data is stored. Unless a predecessor was kind enough to document an application’s data model, either formally or through comments in the source code, coming into an unfamiliar environment to fix bugs or enhance an application means spending hours on software archeology, reverse-engineering what is stored where. If the source code is not available, the task becomes that much more difficult.

The bottom line is that there is no such thing as a schema-less data store; every data store has a schema defined somewhere. A primary differentiator between data stores is where (and how formally) the schema for the managed data is defined, and that continuum almost always comes down to which parts of the schema live in application code and which live in the database engine.

SQL databases typically have very strong schema definition requirements and thus provide support for schema definition, management and enforcement in their core engine. Most key-value stores (Riak, Cassandra, etc.) push any schema definition and/or enforcement to the application layer.
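As a rough illustration of what pushing schema enforcement to the application layer looks like, here is a minimal Python sketch in which a plain dict stands in for a key-value store such as Riak (no real client API is used or implied). The schema and the `validate`, `put_customer` and `get_customer` helpers are hypothetical names chosen for illustration; the point is only that the schema exists, but it lives entirely in application code.

```python
import json

# Hypothetical schema for a "customer" value. It exists only in application
# code; the key-value store itself knows nothing about it.
CUSTOMER_SCHEMA = {
    "name": str,
    "email": str,
    "credit_limit": int,
}


def validate(record: dict, schema: dict) -> None:
    """Raise ValueError unless record has exactly the schema's fields and types."""
    missing = schema.keys() - record.keys()
    extra = record.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        # Exact type match, so e.g. True is not accepted as an int.
        if type(record[field]) is not expected:
            raise ValueError(f"{field!r} should be {expected.__name__}")


# A plain dict stands in for the store: opaque keys mapped to opaque bytes.
store: dict[str, bytes] = {}


def put_customer(key: str, record: dict) -> None:
    validate(record, CUSTOMER_SCHEMA)  # enforcement happens here, not in the store
    store[key] = json.dumps(record).encode()  # the store only ever sees bytes


def get_customer(key: str) -> dict:
    record = json.loads(store[key])
    validate(record, CUSTOMER_SCHEMA)  # re-check on read; older data may not conform
    return record


put_customer("cust:1", {"name": "Ada", "email": "ada@example.com", "credit_limit": 500})
try:
    put_customer("cust:2", {"name": "Bob", "credit_limit": "lots"})
except ValueError as err:
    print(err)  # missing=['email'] extra=[]
```

Nothing stops another program from writing to the same keys without calling `validate`; a schema kept this way is only as strong as the discipline of every piece of code that touches the data.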

Document stores (MongoDB, MarkLogic, xDB, etc.) strike a balance between the rigid schema definition and enforcement of SQL databases and the wild west of key-value stores. While they typically offer the ability to define a schema for stored data in the database engine, the core engine’s enforcement of that schema is lackadaisical at best, leaving it largely to application logic. Worse, it is often possible, and all too common, for multiple contradictory schema definitions to exist.
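To make the problem of multiple, contradictory schemas concrete, here is a rough Python sketch that recovers the implicit schemas from a batch of documents, assuming they have already been read out of the store as plain dicts (no particular driver API is assumed). The `orders` data and the helper names are hypothetical.

```python
from collections import defaultdict


def shape(doc: dict) -> tuple:
    """The implicit schema of one document: sorted (field, type name) pairs."""
    return tuple(sorted((field, type(value).__name__) for field, value in doc.items()))


def implicit_schemas(docs: list[dict]) -> dict[tuple, list]:
    """Group document _ids by shape; more than one group means more than one schema."""
    groups = defaultdict(list)
    for doc in docs:
        groups[shape(doc)].append(doc.get("_id"))
    return dict(groups)


def conflicting_fields(docs: list[dict]) -> dict[str, list[str]]:
    """Fields that appear with more than one value type across the documents."""
    types = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            types[field].add(type(value).__name__)
    return {field: sorted(names) for field, names in types.items() if len(names) > 1}


# Hypothetical "orders" documents written by two versions of an application.
orders = [
    {"_id": 1, "customer": "Ada", "total": 19.99},
    {"_id": 2, "customer": "Bob", "total": 5.00},
    {"_id": 3, "customer": {"name": "Cy"}, "total": "7.50", "currency": "USD"},
]

print(len(implicit_schemas(orders)))  # 2
print(conflicting_fields(orders))  # {'customer': ['dict', 'str'], 'total': ['float', 'str']}
```

This is the software archeology described earlier, partly automated: it can show that the schemas disagree, but not which one the application actually intends.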
