Do Databases Rot the Mind?
Do databases rot the mind? Charles Petzold’s recent post about Visual Studio triggered this thought in the back of my mind.
All too often when confronted with a “business” problem I, like many other people I know, wind up reaching for my trusty database. Or, in a more common scenario, someone hands you an existing schema and says “solve my problem” using it. The reality, however, is that relational databases are a lousy solution to many problems that we commonly run across. In a recent e-commerce platform that I created there was a 7-way join that had to be executed just to retrieve a list of products to display to a customer. We did LOTS of performance tuning of that database and our queries to make it go about as fast as it could reasonably be expected to go under SQL Server.
Just this week, I decided to do an experiment. I reimplemented the core of our e-commerce platform by loading the entire database into a custom in-memory data structure. It took me about 1 hour to write the code to load the core tables into my custom data structure, and about an additional 15 hours to tune the data structure and refactor the code to my liking. Oh yeah, I wrote it using Ruby.
My app was 201 lines of Ruby code (including liberal use of blank lines to enhance readability). For sake of comparison, just one of the dynamically generated SQL queries that it replaced was in the order of 150+ lines of SQL. And that, of course, doesn’t count all of the lines of C# code required to generate, execute, and parse the results of the SQL query.
One more thing: it also ran 100x faster.
A bit of history: the original application that I inherited was on the order of 20,000 lines of SQL + C#. This was largely due to an enormous amount of duplication of code due to the (ab)use of stored procedures to handle queries. My rewrite of that application trimmed it down to 4000 lines of SQL + C#. My Ruby application, while not at feature parity with the existing application, solved essentially all of the “hard” problems. I would estimate that it duplicates >80% of the functionality of the real application.
The first version of the code was about 5x faster than the original application. I then spent some time looking for optimizations. When you only have about 100 lines of code to look through, optimizations become pretty obvious. I was able to cache the results of a (relatively) expensive O(M+N) algorithm that sat on the rate-determining-step of the computation. This netted me a 20x speedup. This optimization is simply not possible using SQL Server, but is dead-simple when you get to implement your own query engine.
I love the symmetry of my results: 1/100th the code and 100x the performance. It’s also much easier to maintain 200 lines of code than 20,000 lines of code. Testing was also much more straightforward.
Now the code that I rewrote was tailor-made for this kind of rewrite. It was a read-only database that was relatively (< 1GB) small in size. If you have such an application lying around – try rewriting it using in-memory data structures. If you’re a bit more adventurous, try using a dynamically typed language to do this (might I suggest Ruby?). You just might be surprised by the outcome and come away with a new technique in your toolbox by doing so.
Recent Comments