For all its popularity and success, SQL is a study in paradox. It can be clunky and verbose, yet for developers, it is often the simplest, most direct way to extract the data we want. It can be lightning quick when a query is written correctly, and slow as molasses when the query misses the mark. It’s decades old, but flush with new, bolted-on features.
These paradoxes don’t matter because the market has spoken: SQL is the first choice for many, even given newer and arguably more powerful options. Developers everywhere—from the smallest websites to the biggest mega corporations—know SQL. They rely on it to keep all their data organized.
SQL’s tabular model is so dominant that many non-SQL projects end up adding an SQL-ish interface because users demand it. This is even true of the NoSQL movement, which was invented to break free from the old paradigm. In the end, it seems, SQL won.
SQL’s limitations may not be enough to drive it into the dustbin of history. Developers may never rise up and migrate all their data away from SQL. But its problems are real enough to generate stress, add delays, and even require re-engineering for some projects.
Here are thirteen reasons we wish we could quit SQL, even though we probably won’t.
13 ways SQL makes things worse
- Tables don’t scale
- SQL isn’t JSON- or XML-native
- Marshalling is a big time-sink
- SQL doesn’t do real-time
- JOINs are a headache
- Columns are a waste of space
- Optimization only helps sometimes
- Denormalization treats tables like trash
- Bolted-on ideas can wreck your database
- SQL syntax is too fragile, yet not fragile enough
- Not everything is a table
- SQL is not so standard
- There are better options
Tables don’t scale
The relational model loves tables, so we just keep building them. This is fine for small or even normal-sized databases. But the model starts to break down at truly large scales.
Some try to solve the problem by bringing together old and new, like integrating sharding into an older open source database. Adding layers might seem to make the data simpler to manage and offer infinite scale, but those added layers can hide landmines. A SELECT or a JOIN can take vastly different amounts of time to process depending on how much data is stored in the shards.
Sharding also forces the DBA to consider the possibility that data may be stored in a different machine, or possibly even a different geographic location. An inexperienced administrator who starts searching across a table may get confused if they don’t realize the data is stored in different locations. The model sometimes abstracts the location away from view.
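The routing logic behind sharding can be sketched in a few lines. This is a minimal, hypothetical example (the shard names and helper functions are invented for illustration) showing why a point lookup on the shard key is cheap while an unkeyed scan fans out to every shard, wherever it lives:

```python
import zlib

# Hypothetical shard names; in practice these would be connection strings
# to machines in different racks or regions.
SHARDS = ["shard-us-east", "shard-eu-west", "shard-ap-south"]

def shard_for(key: str) -> str:
    """Route a key to exactly one shard with a stable hash."""
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

def point_lookup(key: str) -> list[str]:
    # One hop: only the owning shard is queried.
    return [shard_for(key)]

def full_scan() -> list[str]:
    # A SELECT that doesn't include the shard key must visit every shard,
    # so latency is set by the slowest or largest one.
    return list(SHARDS)
```

The abstraction looks uniform to the query writer, which is exactly why the cost difference surprises an inexperienced administrator.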
Some AWS machines come with 24 terabytes of RAM. Why? Because some database users need that much. They have that much data in an SQL database, and it runs much better in a single machine and a single block of RAM.
SQL isn’t JSON- or XML-native
SQL may be evergreen as a language, but it doesn’t play particularly well with newer data exchange formats like JSON, YAML, and XML. All three support a more hierarchical and flexible format than SQL does. The guts of the SQL databases are still stuck in the relational model with tables everywhere.
The market finds ways to paper over this common complaint. It’s relatively easy to add a different data format like JSON with the right glue code, but you’ll pay for it with lost time.
Some SQL databases can encode and decode more modern data formats like JSON, XML, GraphQL, or YAML as native features. But on the inside, the data is usually stored and indexed using the same old tabular model. The JSON formatting is just a facade that may make the developer’s life easier, but could also hide the conversion costs.
How much time is spent converting data in and out of these formats? Wouldn’t it be easier to store our data in a more modern way? Some clever database developers continue to experiment, but the odd thing is, they often end up bolting on some kind of SQL parser. That’s what the developers say they want.
Marshalling is a big time-sink
Databases may store data in tables, but programmers write code that deals with objects. It seems like much of the work of designing data-driven applications is figuring out the best way to extract data from a database and turn it into objects the business logic can utilize. Then, the data fields from the object must be unmarshalled by turning them into an SQL upsert. Isn’t there a way to leave the data in a format that’s just ready to go?
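The round trip looks roughly like this sketch (Python with the built-in sqlite3; the User class and table are hypothetical): the object is flattened into positional parameters on the way in and rebuilt field by field on the way out, and that glue is where the development hours go.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class User:
    id: int
    name: str
    email: str

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")

# Unmarshal: flatten the object into positional SQL parameters.
u = User(1, "Ada", "ada@example.com")
con.execute("INSERT INTO users VALUES (?, ?, ?)", astuple(u))

# Marshal: rebuild an object from the raw tuple the driver hands back.
row = con.execute("SELECT id, name, email FROM users").fetchone()
restored = User(*row)
```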
SQL doesn’t do real-time
The original SQL database was designed for batch analytics and interactive mode. The model of streaming data with long processing pipelines is a relatively new idea, and it doesn’t exactly match.
The major SQL databases were designed decades ago when the model imagined the database sitting off on its own and answering queries like some kind of oracle. Sometimes they respond quickly, sometimes they don’t. That’s just how batch processing works.
Some of the newest applications demand better real-time performance—not only for convenience but because the application requires it. Sitting around like a guru on a mountain doesn’t work so well in the streaming world.
The newest databases designed for these markets put a premium on speed and responsiveness. They don’t offer the kind of elaborate SQL queries that can slow everything to a halt.
JOINs are a headache
The power of relational databases comes from splitting up data into smaller, more concise tables. The headache comes afterward.
Reassembling data on the fly with JOINs is often the most computationally expensive part of a job because the database has to juggle all the data. The headaches begin when the data starts to outgrow the RAM.
JOINs can be incredibly confusing for anyone learning SQL. Figuring out the difference between the inner and outer JOINs is only the beginning. Finding the best way to connect several JOINs is even worse.
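The inner-versus-outer distinction fits in a few lines. In this sketch (Python's built-in sqlite3; hypothetical customers and orders tables), a customer with no orders silently vanishes from the inner JOIN but survives the outer one with a NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 'keyboard');
""")

# INNER JOIN keeps only matched rows: Grace, who has no orders, disappears.
inner = con.execute("""
    SELECT c.name, o.item FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()

# LEFT OUTER JOIN keeps every customer, padding the misses with NULL.
outer = con.execute("""
    SELECT c.name, o.item FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()
```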
Columns are a waste of space
One of the great ideas of NoSQL was giving users freedom from columns. If someone wanted to add a new value to an entry, they could choose whatever tag or name they wanted. There was no need to update the schema to add a new column.
SQL defenders see only chaos in that model. They like the order that comes with tables and don’t want developers adding new fields on the fly. They have a point, but adding new columns can be expensive and time-consuming, especially in big tables. Putting the new data in a separate table and matching it up with JOINs adds even more time and complexity.
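The usual workaround is an entity-attribute-value side table, sketched here with Python's built-in sqlite3 (the item tables are hypothetical): you escape ALTER TABLE, but every read now pays for a JOIN and every value becomes an untyped string.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
-- Entity-attribute-value table: new "columns" without ALTER TABLE...
CREATE TABLE item_attrs (item_id INTEGER, key TEXT, value TEXT);
INSERT INTO items VALUES (1, 'widget');
INSERT INTO item_attrs VALUES (1, 'color', 'red'), (1, 'weight', '3kg');
""")

# ...but reading an entity back now costs a JOIN, and every value is
# stringly typed.
rows = con.execute("""
    SELECT i.name, a.key, a.value
    FROM items i JOIN item_attrs a ON a.item_id = i.id
""").fetchall()
```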
Optimization only helps sometimes
Database companies and researchers have spent a great deal of time developing good optimizers that take apart a query and find the best way to order its operations.
The gains can be significant but there are limits to what an optimizer can do. If the database administrator submits a complicated query, there is only so much the optimizer can do.
Some DBAs only learn this as the application begins to scale. The early optimizations are enough to handle the test data sets during development, but at crunch time, the optimizer hits a wall. There’s only so much juice it can squeeze out of a query.
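You can watch the optimizer's limits directly. In this sketch (Python's built-in sqlite3; the events table is hypothetical), EXPLAIN QUERY PLAN shows that without an index the planner has no option but a full scan, and only a schema change, not cleverness, gives it something better to do:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, kind TEXT)")

# No index on kind: the planner's only option is a full table scan.
plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'click'"
).fetchall()

# Give the optimizer an index to work with and the plan changes.
con.execute("CREATE INDEX idx_kind ON events (kind)")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'click'"
).fetchall()

# The detail column of the first plan mentions a SCAN; the second
# mentions a SEARCH using idx_kind.
print(plan_before[0][-1])
print(plan_after[0][-1])
```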
Denormalization treats tables like trash
Developers often find themselves caught between users who want faster performance and bean counters who don’t want to pay for the hardware. A common solution is to denormalize tables so there’s no need for complex JOINs or cross-tabular anything. All the data is already there in one long rectangle.
This isn’t a bad technical solution, and it often wins because disk space has become cheaper than processing power. But denormalization also tosses aside the cleverest parts of SQL and relational database theory. All that fancy database power is pretty much obliterated when your database becomes one long CSV file.
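The trade shows up clearly in a sketch (Python's built-in sqlite3; the flattened orders table is hypothetical): reads need no JOIN because the customer fields ride along in every order row, but a single change to the customer now means rewriting N rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Denormalized: customer fields copied into every order row.
con.execute("""
    CREATE TABLE orders_flat (
        order_id INTEGER, item TEXT,
        customer_name TEXT, customer_email TEXT
    )
""")
con.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, "keyboard", "Ada", "ada@example.com"),
     (2, "mouse",    "Ada", "ada@example.com")],  # same customer, twice
)

# Reads are JOIN-free...
rows = con.execute(
    "SELECT item FROM orders_flat WHERE customer_name = 'Ada'"
).fetchall()

# ...but one email change must touch every duplicated row.
updated = con.execute(
    "UPDATE orders_flat SET customer_email = 'ada@newmail.com' "
    "WHERE customer_name = 'Ada'"
).rowcount
```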
Bolted-on ideas can wreck your database
Developers have been adding new features to SQL for years, and some are pretty clever. It’s hard to be upset about cool features you don’t have to use. On the other hand, these bells and whistles are often bolted on, which can lead to performance issues. Some developers warn that you should be extra careful with subqueries because they’ll slow everything down. Others say that constructs for selecting subsets, like common table expressions, views, or windows, over-complicate your code. The code’s creator can read it, but everyone else gets a headache trying to keep all the layers and generations of SQL straight. It’s like watching a film by Christopher Nolan, but in code.
Some of these great ideas get in the way of what already works. Window functions were designed to make basic data analytics faster by speeding up the computation of results like averages. But many SQL users will use some bolted-on feature instead. In most cases, they’ll try the new feature and only notice something is wrong when their machine slows to a crawl. Then they’ll need some elderly DBA to explain what happened and how to fix it.
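For reference, here is what the window feature itself looks like, sketched with Python's built-in sqlite3 (window functions need SQLite 3.25 or later, which current Python builds bundle; the sales table is hypothetical). It computes a per-group average alongside each row without collapsing the rows the way GROUP BY would:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 50);
""")

# The window (OVER ... PARTITION BY) attaches the per-region average
# to every row instead of collapsing the rows.
rows = con.execute("""
    SELECT region, amount,
           AVG(amount) OVER (PARTITION BY region) AS region_avg
    FROM sales
""").fetchall()
```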
SQL syntax is too fragile, yet not fragile enough
In the distant past when SQL was born, only humans wrote the queries. Now countless systems stitch queries together automatically, which gives naive or malicious users too much power to do bad things.
DBAs quickly learn to avoid reserved words, but that doesn’t help the casual user who just might want “GROUP” as a column name. And then there are the wonderfully “standard” solutions for escaping reserved words: MySQL uses backticks, while PostgreSQL uses double quotes. Just make sure you use the right one for your version of SQL.
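A quick sketch with Python's built-in sqlite3 (hypothetical tables) shows both halves of the problem: the unquoted reserved word is rejected outright, and the fix is the double-quote style PostgreSQL uses, where MySQL would want backticks instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Unquoted, "group" collides with the reserved word and the parser balks.
try:
    con.execute("CREATE TABLE t1 (group TEXT)")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Double quotes (the ANSI style, shared by PostgreSQL and SQLite) fix it;
# MySQL would need `group` in backticks instead.
con.execute('CREATE TABLE t2 ("group" TEXT)')
con.execute('INSERT INTO t2 ("group") VALUES (?)', ("admins",))
row = con.execute('SELECT "group" FROM t2').fetchone()
```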
To make matters worse, clever attackers can target this weak spot by injecting SQL commands into queries. Instead of just typing their name into a field, the attacker inputs “; DROP TABLE users; DROP TABLE products; DROP TABLE orders;--”. The SQL parser is happy to do what it’s told. After all, it was written in an era when only humans issued the queries.
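The standard defense is to never splice user input into the query string and to bind it as a parameter instead, as in this sketch with Python's built-in sqlite3 (hypothetical table): the hostile string is treated as data, never parsed as SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT)")

hostile = "x'; DROP TABLE users;--"

# Unsafe would be: con.executescript("INSERT ... VALUES ('" + hostile + "')")
# Safe: a bound parameter is data, never interpreted as SQL syntax.
con.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

row = con.execute("SELECT name FROM users").fetchone()
# The hostile string is stored verbatim, and the table is intact.
```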
Not everything is a table
A surprisingly large amount of data fits nicely into tables, but a growing amount of data doesn’t fit neatly. For instance, social networks, hierarchical data, and many scientific phenomena are modeled with graphs. These can be stored in tables but doing anything more than the simplest query becomes complex. And then there’s spatial data in two or three dimensions. At least time series data has only one major axis.
Other data exists in two, three, or even more dimensions. Tables, though, have only one axis for the rows and a subordinate axis for the various columns. This means storing two-dimensional data like latitude and longitude is possible, but multi-dimensional calculations like distance aren’t easy. New geographic extensions can paper over the gap, but the paradigm is still limiting.
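The distance problem illustrates it. Two REAL columns store the coordinates just fine, but a proper great-circle distance needs trigonometry that core SQL handles clumsily, so it typically lands in application code or an extension like PostGIS. A sketch with Python's built-in sqlite3 (hypothetical places table):

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE places (name TEXT, lat REAL, lon REAL);
INSERT INTO places VALUES ('a', 0.0, 0.0), ('b', 0.0, 1.0);
""")

# The table stores the coordinates; the geometry happens outside it.
(lat1, lon1), (lat2, lon2) = [
    r[1:] for r in con.execute("SELECT name, lat, lon FROM places")
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

d = haversine_km(lat1, lon1, lat2, lon2)
# One degree of longitude at the equator is roughly 111 km.
```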
SQL is not so standard
SQL may be an ANSI/ISO standard but that doesn’t mean you can just move it from one standard-supporting implementation to another. You must be thinking of another meaning of the word standard.
DBAs are very familiar with the wide variety of syntactical differences. MySQL uses “CURDATE()”, Oracle uses “SYSDATE”, and PostgreSQL uses “CURRENT_DATE”. SQL Server lets you concatenate strings with a “+” operator. Others want two vertical lines (“||”).
The dozens of syntactic incompatibilities are just the start. There are major philosophical differences between the implementations of stored procedures, triggers, and supported functions. Even the foundational data types have nuances in their precision and range.
There are better options
IT teams must often make do with whatever already exists. The best reason that SQL has to go is that we have better alternatives that are more concise, flexible, and readable. GraphQL, for instance, is often found in web applications, where it’s used to ask for just the right combinations of data with a simple pattern. Hierarchical data is naturally supported.
There are already several good options for searching NoSQL databases. Many of the key-value stores just look for matching nodes. Some, like the MongoDB query language (MQL), imitate the popular JSON format. Developers using document-centric solutions like Solr or Elasticsearch can use complex similarity functions.
All of these can support queries that are both more powerful and easier for humans to read and craft. They create possibilities for storing data that isn’t limited to the tables full of rows and columns. What a narrow vision of the world that is.