Monday, February 23, 2015

Improving Searches


I know many of you are waiting with bated breath--what is Ryan going to do to ruin Atlas Quest next?! I've heard those complaints, and I've been listening! =)

My most recent changes are completely invisible. At least, on the surface, they're invisible. I've been working to make word-based searches... better.

The thing to remember with almost all searches--be it for letterboxes with a specific name, message board posts that use a certain phrase or even the help pages--is that when you search for a string of words, AQ has traditionally run the search with an invisible "or" between each word unless you state otherwise. So if you tried to run a search for the letterbox named "Deep Space Nine," the results would return all boxes with the words "deep", "space" OR "nine." The list could be a lot longer and more unwieldy than you might have expected! In a worst case scenario, if more than 1,000 boxes matched any of those words, you might not even see the specific box you were looking for because AQ stops looking after it finds 1,000 matches. If the box you were searching for wasn't within those first 1,000 matches, it wouldn't show up in the search at all.

Ironically, you'd have gotten better results just by searching for a single word. "Deep" would have only returned boxes with the word "deep" in it. Boxes with "space" or "nine" but not with the word "deep" wouldn't have cluttered up the searches.

This system might seem convoluted, but it was rather handy when people ran a letterbox search for a box named "Forest Grove Cemetery" but misspelled a word and searched for "Forest Grove Cematery" instead. Without it, they would have been left puzzled about why the box couldn't be found. Back when there were only a few thousand boxes on AQ, AQ could still find a match, sort it so boxes with the words "forest" and "grove" came before boxes with "forest" OR "grove" and everyone liked the results. The cemetery didn't match the misspelled "cematery" but it didn't matter. It was close enough!

Nowadays, with hundreds of thousands of boxes listed on AQ, a better method was needed. The easy solution was just to put your search in quotes. "Deep Space Nine" (with the quotes) would have found the exact title, but do you know how many people did that while running searches? Not many!

So to make a long story short, AQ now does some fancy manipulation of your search terms automatically if you don't specify the manipulation yourself. The first two words of a search will always be required. If you misspell one of those first two words, AQ will not find the box you were looking for. So now a search for "Deep Space Nine" (without the quotes) will return all boxes with the words "deep" and "space"... and optionally, if a box happens to have the word "nine" in the title, it'll be sorted before the others that don't have the word "nine."

As you add more and more words to the search, more and more of them will be required. Let's take an example: a search for "The Life and Times of Deep Space Nine."

First, stopwords aren't included. Words like "the", "and" and "of" are pretty useless for searches, so those get crossed out completely.

Now we're left with "Life Times Deep Space Nine." In this case, the first two words ("life" and "times") are required. Among the rest of the words, AQ requires the first half. If there's an odd number of words left, the count is rounded down to an even number first. Here we have three words left, so we round down to 2. Half of 2 is 1, so the first remaining word, "deep," becomes required as well.

The words "space" and "nine" will both be optional, but if they're found in the results, they'll be sorted ahead of those without them.
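
For anyone who thinks better in code, the rule described above can be sketched roughly like this in Python. This is only an illustration of the description in this post, not AQ's actual code: the function name, the three-word stopword list (just the words mentioned above) and the way punctuation is stripped are all assumptions made for the example.

```python
# Hypothetical sketch of the "which words are required" rule described above.
# Only the three stopwords named in the post are listed; the real list is
# presumably longer.
STOPWORDS = {"the", "and", "of"}


def split_required_optional(query):
    """Return (required, optional) word lists for a plain search string."""
    # Lowercase each word and strip surrounding punctuation (an assumption).
    words = [w.strip('.,!?"\'').lower() for w in query.split()]
    # Stopwords are dropped completely.
    words = [w for w in words if w and w not in STOPWORDS]

    # The first two words are always required.
    required = words[:2]
    rest = words[2:]

    # Round an odd count down to an even number, then require the first half.
    even_count = len(rest) - (len(rest) % 2)
    half = even_count // 2

    required += rest[:half]
    optional = rest[half:]
    return required, optional


if __name__ == "__main__":
    print(split_required_optional("The Life and Times of Deep Space Nine."))
    print(split_required_optional("Deep Space Nine"))
```

With the examples from this post, "The Life and Times of Deep Space Nine." comes out with "life", "times" and "deep" required and "space" and "nine" optional, while "Deep Space Nine" comes out with "deep" and "space" required and "nine" optional.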

Clear as mud, I know. It's confusing. The exact algorithm for which words are required isn't really that important. The important thing to remember is that, in general, the first half of the words in a search are required and the last half are not. A misspelling in the first half will cause your box NOT to show up in the results. A misspelling in the last half will be pretty forgiving.

I haven't rolled these changes out to all of AQ yet. I've been testing them and tweaking them for the best results. You will find them working if you run a letterbox search, however, which is where most of the problems were cropping up. I've also updated the code for message boards and the help pages to use the new algorithms as well. If you always found the help pages difficult to search, you might consider the searches dramatically improved now! I did a test this morning where the old algorithm returned about 20 results (the vast majority having nothing to do with what I was looking for) while the new one returned 2 results (both of which were immediately relevant to my search).

Eventually I'll be rolling it out to all of the other word-based searches: searching for trackers by name, events by name, etc.

What if you don't want AQ to automatically try to "improve" the search you're running? The easiest way to "opt out" is simply to include a special processing symbol of your own in the search. AQ won't manipulate the search if you're doing it yourself! If you want to search for all boxes with the words deep, space, and nine in them, you could run a search for "deep" space nine. The quotes, in this case, are essential so AQ sees them and knows to leave well enough alone, even though on the surface they don't really appear to actually do anything. (And under the old searching algorithm, they didn't do anything!) But really, I can't think of any reason not to let AQ automatically improve your searches.

At times, you might want to further refine the search beyond what AQ does automatically. Perhaps you want your search to require the words deep, space, and nine. You can still run +deep +space +nine, and now they'd all be required. Or perhaps you want to search for the exact phrase "deep space nine" (all in quotes), which requires all three of the words to be in consecutive order.

Or if you want deep and space to be optional, but nine to be required: deep space +nine.
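
To make the "opt out" idea concrete, here is a small Python sketch of how a search string could be checked for the symbols mentioned above (quotes and a leading +) and broken into terms. It is a guess at the general shape, not AQ's real parser: the function name, the regular expressions and the returned tuple layout are all made up for illustration, and AQ may well recognize other symbols that aren't covered in this post.

```python
import re

# Hypothetical patterns: a quote anywhere, or a "+" at the start of a word,
# counts as the searcher doing their own manipulation.
HAS_OPERATOR = re.compile(r'"|(?:^|\s)\+')
# One term is an optional "+", followed by a quoted phrase or a bare word.
TERM = re.compile(r'(\+?)(?:"([^"]*)"|(\S+))')


def parse_manual_search(query):
    """Return None if the query has no operators (automatic handling applies).

    Otherwise return a list of (term, is_required) tuples, where only terms
    prefixed with "+" are marked as required.
    """
    if not HAS_OPERATOR.search(query):
        return None
    terms = []
    for plus, phrase, word in TERM.findall(query):
        term = (phrase or word).lower()
        if term:  # skip empty quoted strings
            terms.append((term, plus == "+"))
    return terms


if __name__ == "__main__":
    for q in ['deep space nine', '"deep" space nine',
              '+deep +space +nine', 'deep space +nine', '"deep space nine"']:
        print(repr(q), "->", parse_manual_search(q))
```

In this sketch, a plain deep space nine returns None (so the automatic rules would kick in), while "deep" space nine is left alone with all three words optional, which matches the old "or" behavior described earlier.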

So there you have it. =)

Happy trails!