Using Mysql’s slow query log to pinpoint SQL performance bottlenecks.

Its no secret that Drupal relies on Mysql heavily. It can run dozens or even hundreds of queries on a single page. Generally Drupal’s philosophy is a large number of database queries per page load is fine as long as the queries are fast. This usually works fine until your one or more queries are slow. How do you assure you do not have slow queries in your application? During development you can use Drupal’s devel module to output a list of queries that run on a given page. You can set the list to sort by query time. There is no magic number for determined how slow is too slow but perhaps queries taking a half second or more may need to be tuned.

Once your site is in production and is experiencing some amount of traffic and server load you may find some queries that were perfectly acceptable in development, qa and testing are now slow. This is where Mysql’s slow query log is extremely helpful.

To enable the slow query log just add these lines to you my.cnf file:
slow_query_log=1
long_query_time=1

This will enable the query log for queries that take one second or more.

We found that in production we had 4 queries that were consistently slow. Three of these queries come from Views module and the fourth comes from Core Search module. We are solving the slow queries from Search module by switching to Solr search. The slow Views queries needed tuning.

Two of the Views queries were what you may call Related content views/queries. They basically take the current node’s taxonomy terms and find the most recent nodes with the same taxonomy terms. Here is what the query that View’s generates looks like (taken directly from the Mysql slow query log):

All those Left JOINS Yikes! First to be fair there are dozens if not hundreds of Views’ derived queries on this site. Only 3 of these queries are consistently slow. Views generally creates very reasonable SQL, but in this particular case it does not. Furthermore, caching is generally not a good strategy in this case as this query run per individual node (in this case news story nodes).

Now that we have identified the slow query what can we do about it? We can start by using Mysql’s EXPLAIN. This gives us insight as to how Mysql executes the query. You can just prepend the word EXPLAIN to the beginning of the query to see output like this

For a detailed explanation of Mysql’s EXPLAIN output see see this post from Percona. In short this is telling us Mysql has to look at 34829 rows to find the answer to this query. That’s way too many. Mysql takes all the rows from the first table/first line of EXPLAIN and then checks them against the next table/ next line in explain and so on though all the tables. This underscores the problem with this query – too many joins! I did not even know you could use JOIN syntax like

LEFT JOIN taxonomy_index taxonomy_index_value_2
ON node.nid = taxonomy_index_value_2.nid
AND taxonomy_index_value_2.tid = '2734'

The key to solving this puzzle is to reduce the number of joins and move them into the WHERE clause. The query can be re-written like this

In this case we removed the table aliases taxonomy_index_value_1, taxonomy_index_value_2, taxonomy_index_value_3, taxonomy_index_value_4 and changed the WHERE clause to add
AND taxonomy_index_value_0.tid IN (‘6925’, ‘1901’, ‘2734’, ‘2717’, ‘1899’ ) .

Running EXPLAIN on the altered query we get this:

Now we are looking at only 69 rows to answer the query! Addtionally mysql only has to look at 3 tables instead of 7 previously. This query now runs in less than a millisecond whereas it took as much as 2 to 3 seconds previously.

So this is all great but how do we make Views use our version of the query?
The answer: hook__views_query_alter in a custom module! The view stores the join tables for the query in a array at $view->query->table_queue and the WHERE clause conditions in $view->query->where so we can manipulate the query as follows

Note that you can use dpq($view->build_info[‘query’]); in the hook_views_pre_execute hook to dump the sql being executed to the page while you are working on tuning your query.

It can’t be overstated how much of an impact this can have on performance and scalability. This shaved 2 to 3 seconds off each news story page’s execution time. It dropped our Database server load average from as much as 5 to around 0.2. It speeds each Php process greatly as well. Since we are running Php as a Fast CGI this allowed us to increase the number of Php processes available on our Web servers.

Caveats and Alternatives
I should also mention that this solution does create a situation where a site admin could change a view in Views’ UI and find these alteration break their new changes to the view. This technique should only be use when a View is considered stable and unlikely to change.

Alternative options to solve this problem
We may even bypass Views altogether in the future and implement this as a custom Panel Pane or Block using Drupal’s Database API. Drupal’s Solr integration also has some similar functionality that looks up related content via a Solr index bypassing Mysql completely.

  • Keith Walker

    Looks like you have just allowed MySQL to treat the query as using an INNER JOIN rather than LEFT JOINs. If you were using any of the data from the taxonomy_index table then you couldn’t use DISTINCT to exclude it and would have to code around the multiple returned rows.

    The SQL it has now done would possibly be more obvious if it were coded as:-

    SELECT SQL_NO_CACHE DISTINCT node.title AS node_title,
    node.nid AS nid,
    field_data_field_date.field_date_value AS field_data_field_date_field_date_value
    FROM taxonomy_index taxonomy_index_value_0
    INNER JOIN node node
    ON node.nid = taxonomy_index_value_0.nid
    LEFT JOIN field_data_field_date field_data_field_date
    ON node.nid = field_data_field_date.entity_id
    AND field_data_field_date.entity_type = ‘node’
    AND field_data_field_date.deleted = ‘0’
    WHERE taxonomy_index_value_0.tid IN (‘6925’, ‘1901’, ‘2734’, ‘2717’, ‘1899’ )
    AND node.status = ‘1’
    AND (node.nid != ‘58137’ OR node.nid IS NULL )
    AND node.type IN ( ‘story’ )
    ORDER BY field_data_field_date_field_date_value DESC
    LIMIT 5 offset 0