Wednesday, September 17, 2008

FullText Searches

Good news, folks! But before I get to that.... you're probably wondering, "What the heck is a full text search?" Good question!

It involves a special type of index in the database that takes a long string of text, figures out each and every word that's used in it, then creates a massive index. When you search for all posts with the words "deep dark secrets," the database can look up each of those words in the index and figure out--practically instantly--exactly which posts match the search. There's a lot of sophisticated stuff going on there.
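To make that a little more concrete, here's a toy sketch in Python of the kind of word-to-post lookup described above. It's only an illustration: the function names (build_index, search) and the sample posts are made up for this example, and a real database's full text index is far more sophisticated than this.

```python
import re
from collections import defaultdict

def build_index(posts):
    """Map each lowercase word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(post_id)
    return index

def search(index, query):
    """Return the ids of posts that contain every word in the query."""
    words = re.findall(r"[a-z']+", query.lower())
    if not words:
        return set()
    # Start with the posts for the first word, then narrow down word by word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample posts, keyed by post id.
posts = {
    1: "My deep dark secrets about carving stamps",
    2: "A deep box hidden in a dark forest",
    3: "No secrets here, just pie",
}
index = build_index(posts)
print(search(index, "deep dark secrets"))  # {1}
```

The expensive part (reading every post) happens once, when the index is built. After that, each search is just a few lookups and set intersections, which is why it feels practically instant.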

And that's where the good news comes in--I finally broke down and figured out how to personalize the index specifically to Atlas Quest. You see, all this time, I've just been using the defaults. Those defaults include the minimum length a word must be in order to be indexed (four) and the list of stopwords (common words that aren't worth indexing at all such as "which" or "that").

I hadn't been messing around with the defaults since I thought it would require tweaking the code for the database itself and recompiling it--something I so did not want to do, and something that could seriously screw things up in a very big way.

Turns out, that's not the case. (Well, actually, certain settings do require this, but the minimum word length, maximum word length, and stopword list do not.) So this evening, I changed the minimum word length to 3 and the maximum word length to 10. I took the already existing stopword list and added a few commonly searched-for terms that don't seem to be particularly useful but are so common that they can bog the database down.
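Here's a rough Python sketch of what those three settings do to a piece of text. It's not the database's actual code, just an illustration under some assumptions: the function name indexable_words is made up, the tiny stopword set is a stand-in for the real list ("the" and "was" are my additions for the example), and since the old maximum word length isn't something I've mentioned, 100 is only a placeholder for "effectively no limit."

```python
import re

def indexable_words(text, min_len, max_len, stopwords):
    """Return the words from text that would make it into the index."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words
            if min_len <= len(w) <= max_len and w not in stopwords]

# Tiny stand-in for the real stopword list.
STOPWORDS = {"which", "that", "the", "was"}

text = "The pie box that held a toe was extraordinarily heavy"

# Old settings: minimum length 4 (placeholder maximum of 100).
before = indexable_words(text, 4, 100, STOPWORDS)
# New settings: minimum length 3, maximum length 10.
after = indexable_words(text, 3, 10, STOPWORDS)

print(before)  # ['held', 'extraordinarily', 'heavy']
print(after)   # ['pie', 'box', 'held', 'toe', 'heavy']
```

The three-letter words show up under the new settings, while the 15-letter "extraordinarily" drops out because it's longer than the new maximum.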

Then I rebuilt the indexes and presto! Most three-letter words are now searchable. You can search for them in the message boards, in clues, and wherever word-based searches are allowed. I still need to tweak some of the code on Atlas Quest before the change is complete, but already the searches should be much more useful and powerful. You can complete full searches on three-letter words such as pie, box (though I might add that one to the stopword list), toe, etc. Very nice. *nodding*

In other news.... just before I started doing these tweaks, I happened to notice that the database had grown to 1001.3 megabytes. The big One Gig! Must have happened some time this afternoon.

After I changed the full text index settings and rebuilt the indexes, the database size dropped to 986.76 megabytes. Hmm.... I actually expected it to grow even larger, since I'm now indexing three-letter words that weren't being indexed before. I guess dropping the words longer than 10 letters, plus the handful of words I added to the stopword list, more than compensated for the difference. *shrug*

In any case, it looks like I shrank the database by about 14.5 megabytes with my change (1001.3 down to 986.76), so it's under the 1G size now. Give it a week, though, and I bet it'll be above one gigabyte again. The database certainly grows fast enough! I still thought it rather exciting to see the database being over 1G. I've never worked with a 1G database before. =)

So anyhow.... That's the update for tonight. Hope you like it! =)

-- Ryan

3 comments:

Anonymous said...

isn't 1024 MB = 1 GB??

Anonymous said...

ooh. good point from a stickler!

Anonymous said...

Yeah, wait till all the clues from Live and Breathe have been posted! YIPES!

LW