It seems like at least once a year, I sit down and tackle the problem of slow queries. This year is no different. A query, for those of you without technical knowledge, is basically a word to describe the process of getting data out of a database. If a database is a telephone book, a query would be looking up someone's name to get their phone number.
But sometimes, queries can be really slow. For instance, if I asked you to look up someone's name based on a known telephone number, I bet that would take a lot longer for you to solve with a typical telephone book. The
numbers aren't in any order at all! You might have to look through the whole book to find what you're looking for--or even to find out that it's not there!
The data is there, but it's hard to look up and very slow. So when I optimize queries, I'm trying to figure out ways to make the search for data go a bit quicker. Like... create another phone number, but this one is sorted by phone number rather than by name! =)
Yeah, I know, I'm simplifying things a bit, but slow queries can absolutely kill a website, and Atlas Quest is no exception.
The queries I target first tend to be the slowest ones (that's where the most room for improvement is!) and the ones that run most often (that's where the most room for improvement is!).
This year, there really aren't
any slow queries that run very often. Which is probably the first time I can really say that. Probably 95% of the slow queries (which I define as taking 5 seconds or longer for the database to run) are all in the nightly updates. Calculating your P and F counts, which boxes have blue diamonds, which days of the week boxes are planted and found, the Hall of Fame, etc.
I pushed these queries to run late at night, when few people are on AQ, because I know they're slow queries, and they'll cause Atlas Quest to feel "sluggish" at best. May as well do that late at night when it affects the least number of people and when AQ otherwise has a lot of "downtime" twiddling its thumbs.
And since there aren't any slow queries that run
often, I'm now focusing on optimizing these nocturnal activities on Atlas Quest. First those that run every single night (e.g. the Hall of Fame stats). Then those that run once per week (e.g. Box of the Week selections). Then those that run once per month (e.g. blue diamond calculations).
Many of the slow queries had relatively quick and easy fixes, which I quickly fixed. And you'll
see absolutely no change on AQ as a result. Perhaps the site feels a bit faster and more responsive late at night than before, but that's the only change you might notice.
However, a couple of my query-speed improvements
has affected some of the functionality on AQ. One of the slowest queries of all involved the calculation of blue diamonds, and it took over 30 seconds to run. Anyone using Atlas Quest during those 30 seconds would probably assume the website was hung. In a sense, it was--hung up on a slow query, processing over 600,000 votes cast for letterboxes. And that did not have a quick and easy fix. That would take some work to fix. A related query that fed into it took more than 10 seconds to run. Those two queries, combined, essentially shut down AQ for more than 40 seconds once each month. And as the number of votes increased, so did the time it took the query to run. If the number of votes doubled, so did the query time.
It was finally time to conquer these slow query once and for all. Some of you might have noticed that your box rankings were changing an usual amount recently. That was me, trying to work out the slow queries. The blue diamonds haven't changed (those only chance once per month), but there will be an unusually high turnover of them come June 1st since I had to tweak the algorithm for calculating them a bit to make the queries faster. Overall, the change in the algorithm is pretty minor, but you likely will notice
some changes.
And.... since I was mucking around with that code, it occurred to me that I ought to make a little effort to have it support votes on all types of boxes. I don't know that I'll ever rank other types of boxes like I do with traditional boxes, but the old way the code was structured, it was all but impossible to do even if I wanted it. I'd rather keep my options open, though, so I tweaked the design so the database can support the ranking of
all boxes. You'll notice this change when you record a find on a non-traditional box--you'll be able to cast a 1 through 5 vote for it. Postals, LTCs, hitchhikers... doesn't matter. You can cast a vote for it. AQ doesn't currently DO anything with the vote, but at least the option is there so it can be used later.
But my main goal was just to improve the slow queries, and now the blue diamond queries are running fast. =) The slowest of the queries barely takes over two seconds to run--quite an improvement over the 30+ seconds and 10+ seconds those two queries used to do!
The next slow query on my hit list that has very public repercussions involved the ones that created the nightly Hall of Fame. AQ would have to sort through over 3.5
million finds to count up what your F-count was for each category. It was pretty fast when I originally wrote the code and there were 10,000 finds in the database. The problem grew quite a bit as the number of finds continued to increase. (The same problem happened with plants to a lessor degree, but since finds outnumber plants by more than 10 to 1, it was really the F-counts that were causing the biggest problems.)
My solution to this problem is just to keep a running total of each person's P and F counts for every type of box supported. AQ already kept running counts for traditional boxes--that's how it could display your P and F count in the ribbons and trophies by your trailname and allowed you to restrict boxes based on a person's P and F counts.
Counting that number every time it was needed would have just been to slow--so a running count was established. But it was only established for traditional boxes. It would take a heck of a lot more work to establish it for all boxes types!
But now.... that's exactly what I would have to do. I spent hours making the necessary changes, which I finally uploaded last night.
Now that I had running P and F-counts for all box types, though, that meant I could
use that information in places where I couldn't before. The reason the
Hall of Fame was only updated once each night was because counting all of the plants and finds was so slow. Now AQ didn't have to
count anything--it could just look up the numbers like it was a phone book. So I updated the Hall of Fame page to update in live time. No more time delay. The second you add or remove a plant or find, these numbers update immediately in the Hall of Fame. =)
There were a couple of other places where AQ was counting up plants and finds, such as
your profile. These particular queries didn't merit the "slow" status, though--it didn't have to count up
all of the plants and finds on AQ--just your own which was a much smaller and more manageable number to deal with. =) My profile page--with over 250 plants and over 1,500 finds--took AQ about 1/10th of a second to count. That's pretty fast, and plenty fast for AQ purposes. But since I had this pre-counted data readily at my fingertips now, I switched the page to start using that data instead. Now it takes AQ less than 1/1000th of the second to get the exact same data! It's over
one hundred times faster! From your point of view, it's no big deal, though. The human eye isn't going to notice the difference between a page that takes 1/10th of a second or 1/1000th of the second to run. =) Especially when you consider that the other two dozen queries that are used to create that page are still running at the same speed. This is one of those changes that you won't really notice. There were a few of them like that. The Hall of Fame is noticeable now, however, just because it updates immediately instead of overnight like it used to, but it's allowing even fast queries to run even faster!
But there was one other place I realized that I could use this pre-counted data... box restrictions. Previously, you could only restrict boxes based on a user's traditional P and F-counts because those were the only numbers available that had been pre-counted. Even postals and LTCs could only be restricted by a person's traditional P and F-count, which is highly unintuitive and doesn't really make a much sense. But since all box types are now pre-counted, that unintuitive functionality could be removed. So, starting today, if you have restrictions on non-traditional boxes, they refer to the user's P and F-counts for the same box type.
For example: A postal with a P-count restriction of 1 and an F-count restriction of 10 means that only people with 1
postal plant and 10
postal finds will be able to see it. An LTC with a P-count restriction of 10 and an F-count restriction of 50 will only be visible to people with 10
LTC plants and 50
LTC finds. You get the point....
I was trying to do the same thing with Trackers, but I was having more trouble with those since there's not a one-to-one relationship with box types to tracker types. (There are no Hitchhiker Trackers, for instance, and there's no "Boxing Buddy" box type.) I'm not sure how to go about fixing those, so if you have P and F count restrictions on Trackers, they'll still only restrict trackers based on a person's traditional counts. Someday, I'd like to fix that.... But that will have to wait another day. It's too much work for the payoff at the moment. =)
I'm still tackling slow queries--I've nailed the vast majority of them that used to occur nightly or weekly, and a good number of those that run once each month. But hopefully AQ seems at least a bit more responsive if you're up late at night with insomnia. =) Perhaps still not as responsive as during the day, but an improvement nonetheless! That's my ultimate goal. The other changes you might see on AQ are just "collateral improvements." =)