Skip to main content

Drupals SQL Slowness Problem

Drupal Rants

The problem exists from Drupal reliance on Node functionality. I am going to explore an alternative to using Nodes and Content Types for everything.

Drupal is the greatest Content Management Framework in my opinion due to the following: Content Types, Views, View Modes and Fields. Couple these together with Panels or Display suite and the Hook System you have one the most powerful systems i've worked with. With Drupal you can build powerful websites efficiently and effectively. But once you begin start creating a large amount of Entities Drupal starts to slow down. You can remedy this by integrating Memcache, Varnish, and a PHP caching mechanism (OP or APC) but even these techniques only go so far. For example Memcache and Varnish only work while data is cached so you can still end up with slow and Varnish doesnt play well in some scenarios such as when a user is logged in.

What makes Drupal so slow? There may be several reasons but one I am talking in particular is an issue with the amount of tables Drupal requires for its FIeld API. Each time you create a new field it creates two new tables that hold data. So when you get a large amount of data you are inserting a record in each field table, and the node table. So if you pulled in a million rows of data for a node with 10 fields, you now have 21 or more tables with a million records. Not only does this make the database large but it makes it difficult to query the data. Note I am estimating this data because the fields have a field_data and a field_revision and I am not exactly sure how those records a preserved in the revision table so just know I am simply speculating this for an example.

Joins. Joins are one of the most expensive operations that you can perform in SQL. And when you have a large data set in many rows and tables joins will slow down Drupal significantly. If you want an example go into the views setting and turn on SQL output and take a look at some of the queries that run.

Drupal stores all nodes in the node table. My theory is that since content types are dynamic it would be impossible to create and store a column for each new field in this table. So the alternative was to create a table for each new field. If you look at other frameworks the opposite approach is taken. In Symfony each "Content Type" aka "Entity" is its own encapsulated object with its own table. Each field has its own column in its entity table and only differs with joining tables for multiple associations. This speeds up the process significantly by limiting the amount of Joins needed.

You may be thinking, I am making a website why does this even matter? The beauty about Drupal is it works well and you may never run into a data problem with Drupal. The problem arises when pulling in large data sets and working with data via Drupal's API. I do envision Drupal as a "website only" platform but also as a applications and business applications framework. The tools and contrib modules allow for a powerful business application platform that many do not initially see as something built in Drupal. People that are building reporting applications, business applications, frameworks, APIs should not disregard Drupal as an option.

With all this complaining I want to offer a solution that would require an slight different approach in how Drupal can be used.

Entity Construction kit. One of the first Business Applications I created I found myself manually creating entities in code, properties, and administration. Come to find out there was a tool that could do this for me. This was a game changer. We actually had entities in their own table that was fully capable of integrating within the Entity API and made sense from an Object Oriented prospective. 

This is the module that I currently use to do all of my business applications. I only use nodes whenever there is specific modules that require it to be a node, or the content needs some of the node extras that are built in (promoted to front page, menu etc). This works but doesnt solve our data issue because the entity construction kit still relies on fields for leveraging the Field API and automatic form building. Where it takes a step in the right direction is you can add properties to an entity from inside Drupal which creates a column in the entity table. The problem with this approach is you lack the ability to integrate with forms and form widgets without manually hooking into the Entity form.

A real life example is recently created a data reporting application which integrates with a popular marketing API that pulled in close to a million rows of data from January 1st, 2017 to now. I then developed a page which implements data from Date A to Date B and displays the data accordingly. I leveraged the Entity API and Entity Metadata Wrappers for displaying data but the page was taking over 2 minutes to load and the database had grown to 10+ gigabytes. It faster once I added Memcache but it was still averaging over 15 seconds to load, and 2 minutes whenever the data was not cached. I decided to take a different approach and since all the data was pulled via an API added properies on the entities and removed all the fields. My database shrunk in size and I was then able to cache almost all the records in Memcache with no problems. The load time went down to about 1-2 seconds. This made me think how great it would be to have the Field API readily available for Properties as well. This would be the new game changer in Drupal.

I want to mention this article is not meant to be a knock on Drupal because for most people Drupal works exactly how they need. But in some instances Drupal gets slow, there is no denying that. And I envision Drupal not simply as a platform to build websites, but a platform to build Applications. I also appreciate all the hard work from developers who made the Entity Construction Kit,Content Construction Kit, and Drupal possible and will be looking to contribute more to that effort in the future. I don't have all the answers but just want to give my two cents.