Wordpress is a very popular PHP-based blogging platform, which uses a MySQL database. It has a certain reputation for slowness, but still sees a lot of use.
Last year, someone looked into what queries Wordpress (1.5) uses in preparing pages;
article here. I thought it would be interesting to see how things have changed, so I got the latest version of Wordpress (2.2) and installed it on my desktop, with MySQL set to log all queries.
Now, before I start I should mention that for a couple of years I used Wordpress for this blog. I was never enormously irritated with it, but I did start having considerable speed problems as the blog grew. I was able to deal with some of these by adding appropriate indexes to database tables; however, in the end, it became more trouble than it was worth. I switched to Google Blogger, and have had no trouble since.
Now, on with the query review. Please note that I am by no means a database expert, and that I may have made mistakes.
On displaying a page, Wordpress first finds the site URL in the options table. It then loads all records from the options table which have 'autoload' set to yes; note that the site URL row is one of these. Next, it loads 'widget_text' and 'widget_rss'; neither of these exist on my installation.
It gets the user's details; this is an improvement on 1.5, where, as noted in the above article, it loaded details for
all users. This was particularly concerning as some larger blogs allow their commenters to log in as low privilege users to comment; the amount of data retrieved could, then, be considerable.
Oddly, there is no index on the 'user_login' column; this both allows, potentially, for duplicate users (although I'm sure that this is guarded against in the application itself) and means that if you really have a
lot of users, it may have to do a fairly large table scan. This won't be a problem for most Wordpress blogs, though. It then gets data for that user from 'wp_usermeta'; this could have been done as a join, but is, at least, done on an indexed column.
Interestingly, it initially looked for a rewrite option in the options table at this point. When I enabled URL rewriting, however, this query vanished, presumably because it got it in its first read of the options table. This seems odd; if it didn't get it the first time 'round, why try again?
The next query finds the last ten posts, and downloads all data on them in the 'wp_posts' table. Fine, although there are no indexes on any of the constraints it uses (even the date!) so this really isn't ideal for a big blog. The one after that, though, is an oddity. It looks up a mapping between posts and categories. This is the sort of thing that I would really expect to have been done with a join; I can't imagine that avoiding a join saves them any runtime, and it's an extra query. The query after that gets
all category data, and the query after that all metadata for the posts. All of these except for the category lookup could have been done in one query.
The next three queries retrieve settings for the (default) template. I really don't feel that this is the sort of thing that should be done in the database, but see comments below on configuration data.
Next, pages are retrieved; Wordpress has a separate, non-blog-post type of page. Oddly,
all of these are retrieved, with full text, even though they're never going to be displayed on this page.
The next step is to find what months there are posts for, and how many. Again, un-indexed fields are used as constraints, and counts are done. I'd be inclined to store this data in a separate table and only update it when new posts are posted.
Next, categories are found (again, but in a specific order this time) and posts associated with each discovered; again, insufficient indexes. Following that, categories are retrieved
again. That's the third time, mind you. It would seem to make more sense to just get them
once and perform the ordering in the (PHP) client, but whoever wrote Wordpress doesn't seem to think like that.
Links for the sidebar are retrieved, with a constraint based on the time of day used to find recently added ones. This will, I'm fairly sure, prevent the results of the query being cached, and could be determined in the client code anyway. Oddly, a join
is used here.
Mercifully, that is it. The index page is loaded and all is well in the world. "But wait!" you say. What about comments? What, indeed?
Well, to see comments, we'll have to click on a post. Pop! Queries fly high into the air!
First, we have more options and user stuff. Fine. Next, we get the appropriate post. All seems more or less sane so far. Again, less than ideal separate queries, but it was hardly likely that that'd be different here, links notwithstanding.
A little more weirdness; the previous and next post are found (a query each) and then
retrieved in full (another query each). I can't fathom the thinking behind this.
Finally, we get to comments. Mercifully, except for the order by date clause, indexes are used here. Big blogs can have a
lot of comments, and it would be mad not to have indexes on them.
Similar madness is involved in retrieving posts for RSS feed; a potential concern because popular sites can be hit by thousands of RSS feeds a day.
Interestingly, throughout, there are occasionally '1=1' clauses in the queries. This isn't really a problem in itself, but indicates something weird going on, possibly. I can't face reading the code to find out; I do
so dislike PHP!
So, what is my conclusion? Clearly, things have gotten better since the above article was written, but they don't seem to be all the way there yet. There seems to be a general reluctance to use indexes, and complete willingness you use very large numbers of queries. While using large numbers of queries mightn't
sound like a big problem, each does take time to parse, and if you're on a shared server you're likely to be contending with other people to submit queries; best get the job done in as few as possible. Some shared hosts actually limit customers to a certain number of queries per day. The multiple options queries, in particular (about eight or ten per page) are odd; they could easily be done in a single query per page.
Putting options in the database is itself a strange move; I suppose the reasoning behind it is to expose the user to as little messing with file permissions as possible, but it is certainly non-ideal from a performance point of view.
At this point you may wonder why I don't fix the problems I see. That is, after all, the dreary cry that people who criticise open-source software in any little detail are honoured with; "go and fix it yourself then". Well, first off, as I say, I am no database expert. I am also no PHP expert, and would prefer to keep it that way, if you don't mind too much.
But besides that, and here, I feel, may lie a large part of the roots of the problem, the Wordpress community (
not, I would like to stress, the developers, rather the people who use it) seem to be, largely,
extremely offended by criticism. If you look at the Wordpress forums, you will see that people are very much inclined to blame any speed problems encountered on hosts, configuration, or, in a few cases, the sheer alleged awfulness of MySQL!
I just can't understand how, with such a popular but clearly deficient product, these problems haven't been fixed.
If you are one of these people who is very attached to Wordpress, please refrain from hunting me down and killing me. Thanks.