Saturday, July 16, 2011

The Magic of Geocoders....

Wassa might start fires *cough*LbCon*cough*, but Wassa Jr likes
to help Smokey the Bear prevent forest fires. =)
I was going to spend today explaining the ins and outs of listing a location with your box, but I'm still making tweaks to that code and don't really want to put anything in writing that might change. Maybe by tomorrow I'll be ready for that ball of ear wax. =)

So today will be an easy lesson on geocoders. I've mentioned them in past posts, but I haven't really delved into detail about what they can do that they weren't able to do before this update.

A geocoder, as a reminder, is a system that can convert a human-readable address such as "Lincoln Park, Seattle, WA" into latitude and longitude coordinates that Atlas Quest can use to calculate distances and map locations.

In the "old days" (i.e. a week ago), there was one geocoder: the one provided by our friends at Google. It wasn't perfect (no geocoder is perfect), but it was leaps and bounds better than nothing at all, which is what AQ had been using before the Google geocoder came along.

When I started working on the custom location feature, I discovered that the geocoder AQ had been using was deprecated and had been replaced with a new (and presumably better) geocoder. I needed to upgrade, and the code was not particularly well suited for upgrading the geocoder. I imagined the geocoder as something like a Lego piece: I could snap off the old geocoder and snap on a new geocoder that would fit perfectly into the newly opened space where the old geocoder used to rest.

But it wouldn't be that easy. No, it was going to get messy, though it wouldn't have been had I designed it better in the first place. It should work like interchangeable Lego pieces.

So, I needed a replacement for the geocoder, and I started looking around. I took a look at the replacement geocoder from Google, and the results frightened me. It looked mean and ugly. So I took a look at the Yahoo geocoder, and with a little poking and prodding, I thought, "Wow, this is cool!" I started working on a Yahoo geocoder, which also returned the radius of a location--a new piece of data that AQ never had access to before and that I ended up using extensively during the rewrite of the search engine.

The Yahoo geocoder could do some things the old Google geocoder could not--for instance, airport codes. Type in SBP (the airport code for our little local airport in San Luis Obispo), and by golly, it finds that airport and returns all of the relevant entries. Not that it would be used very often, but still, it's nice when you realize there's more functionality than before.

More useful, perhaps, was that it could geocode locations in a huge list of foreign countries--here's a list I've copied from their documentation:

North and South America:
  • BAHAMAS, THE
  • BRAZIL
  • CANADA
  • CAYMAN ISLANDS
  • FRENCH GUIANA
  • GUADELOUPE
  • MARTINIQUE
  • MEXICO
  • SAINT BARTHELEMY
  • USA (including PUERTO RICO and US VIRGIN ISLANDS)
Europe:
  • ALBANIA
  • ANDORRA
  • AUSTRIA
  • BELARUS
  • BELGIUM
  • BOSNIA AND HERZEGOVINA
  • BULGARIA
  • CROATIA
  • CZECH REPUBLIC
  • DENMARK
  • ESTONIA
  • FINLAND
  • FRANCE
  • GERMANY
  • GIBRALTAR
  • GREECE
  • HUNGARY
  • ICELAND
  • IRELAND
  • ITALY
  • LATVIA
  • LIECHTENSTEIN
  • LITHUANIA
  • LUXEMBOURG
  • MACEDONIA
  • MOLDOVA
  • MONACO
  • MONTENEGRO
  • NETHERLANDS
  • NORWAY
  • POLAND
  • PORTUGAL
  • ROMANIA
  • RUSSIA
  • SAN MARINO
  • SERBIA
  • SLOVAKIA
  • SLOVENIA
  • SPAIN
  • SWEDEN
  • SWITZERLAND
  • TURKEY
  • UKRAINE
  • UNITED KINGDOM (including ISLE of MAN and CHANNEL ISLANDS)
  • VATICAN CITY
Asia:
  • HONG KONG-CHINA
  • INDIA
  • INDONESIA
  • MACAU-CHINA
  • SINGAPORE
  • TAIWAN
  • THAILAND
Middle-East and Africa:
  • BAHRAIN
  • BOTSWANA
  • EGYPT
  • JORDAN
  • KENYA
  • KUWAIT
  • LEBANON
  • LESOTHO
  • MOROCCO
  • MOZAMBIQUE
  • NAMIBIA
  • NIGERIA
  • OMAN
  • QATAR
  • REUNION
  • SAUDI ARABIA
  • SOUTH AFRICA
  • SWAZILAND
  • UNITED ARAB EMIRATES
Holy jumping junipers! That's an awfully big list! (Clearly written by a developer, I might add--a professional writer would never have used all caps in creating such a list!) Not that Liechtenstein and Mozambique are letterboxing hotbeds of activity, but still, it's nice to know that the geocoder can figure out such locations!

I developed a little block of code that used the Yahoo geocoder, got it working, and all was well. I really wanted that "lego-like" snap-in kind of functionality, though, and I figured the best way to do that was to create another geocoder and try snapping it in. I went back and implemented Google's new geocoder, finally getting that to work--and with just a single line of code that was changed, I could switch between the two geocoders practically on the fly. And then I had another idea... why not chain them together into another "super" geocoder? 
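To make the Lego idea a bit more concrete, here's a rough sketch in Python. This is not AQ's actual code: the class names, the `GeoResult` fields, and the tiny lookup tables (with approximate coordinates) are all invented for illustration, and the two "geocoders" are stand-ins that never talk to Yahoo or Google. The point is only that both pieces have the same shape, so swapping one for the other is a one-line change.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class GeoResult:
    latitude: float
    longitude: float
    radius_miles: float  # rough size of the matched place (hypothetical field)


class Geocoder(Protocol):
    """The 'Lego stud': anything with this method can be snapped in."""

    def geocode(self, query: str) -> Optional[GeoResult]:
        ...


class YahooStub:
    """Stand-in for a Yahoo-backed geocoder; the table is made up."""

    _known = {"sbp": GeoResult(35.24, -120.64, 1.0)}  # approximate values

    def geocode(self, query: str) -> Optional[GeoResult]:
        return self._known.get(query.strip().lower())


class GoogleStub:
    """Stand-in for a Google-backed geocoder; the table is made up."""

    _known = {"1st ave & pike st, seattle, wa": GeoResult(47.61, -122.34, 0.1)}

    def geocode(self, query: str) -> Optional[GeoResult]:
        return self._known.get(query.strip().lower())


def describe(query: str, geocoder: Geocoder) -> str:
    """Code that uses a geocoder only through the shared interface."""
    result = geocoder.geocode(query)
    if result is None:
        return f"{query!r}: no match"
    return f"{query!r}: {result.latitude}, {result.longitude}"


# The single line to change when snapping in a different piece.
ACTIVE_GEOCODER: Geocoder = YahooStub()

print(describe("SBP", ACTIVE_GEOCODER))
```

Changing `YahooStub()` to `GoogleStub()` on that last assignment is all it takes to switch, because `describe` never cares which piece it was handed.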

In the end, this super-geocoder will attempt to pull in information from three different sources in an attempt to understand your location. Park names that never worked before were suddenly supported. The Yahoo geocoder supported airport codes, but the Google geocoder did not. The Google geocoder could figure out street intersections, but the Yahoo geocoder didn't seem to be able to do that. Yin and yang. For best results, I needed both.


As it stands now, when you type in a location that AQ needs to convert into latitude and longitude coordinates, it runs the information through up to five distinctly different geocoders:


  1. The first geocoder really isn't a "geocoder" in the traditional sense. It takes what you typed in and tries to figure out if you're typing in the coordinates directly, so you can run a search for "34.6, 121.6" and have it return boxes near that point.
  2. AQ gets the first shot at figuring out what you are trying to say. It'll search its own database looking for information matching what you typed in, and if it finds it, that's what gets returned.
  3. If AQ doesn't find a suitable match, Yahoo gets the next crack at figuring out your request. If a suitable match is found, awesome--that's what AQ will use.
  4. When the Yahoo geocoder fails, then the Google geocoder gets a crack at the information. If you ever run a search for a street intersection, it'll almost certainly reach this geocoder. AQ knows only a tiny number of street intersections that people have used for their box locations and Yahoo doesn't seem to handle intersections at all. If a suitable match is found--awesome, AQ runs with it.
  5. And finally, if all else fails, there's another source of data I found with 1.86 million "points of interest" in the United States. It was a little scary how many of the small parks I searched for turned up in this source. Had I bet money against it, I'd have lost. This data is actually hosted by AQ--it came as an enormous file rather than as an API provided by another company. (I basically had to create the API to use this data--a geocoder that was able to search this file.)
Each of the geocoders has its strengths and weaknesses, but collectively, it's a pretty amazing little beast. If you had trouble listing a certain park as a location in the past, it might work just fine now. The new geocoders, I noticed, also had better support for geographical features, so you can even list boxes as being on specific mountains or by a lake. They also often support more "vague" locations such as "Pacific Northwest" and "Central Coast of California"--perhaps useful for listing mystery boxes in those locations, though admittedly less so for running a search.
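The five-step fallback described in the list above can be sketched roughly like this. It's a simplified illustration, not AQ's real code: each geocoder is reduced to a plain function that returns a (latitude, longitude) pair or None, and the lookup tables standing in for the AQ database, Yahoo, Google, and the points-of-interest file are invented, with approximate coordinates.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

LatLon = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLon]]


def coordinates(query: str) -> Optional[LatLon]:
    """Step 1: treat the query as 'lat lon' or 'lat, lon' if it looks like that."""
    try:
        lat, lon = (float(part) for part in query.replace(",", " ").split())
    except ValueError:  # wrong number of parts, or a part is not a number
        return None
    return (lat, lon)


def table(entries: Dict[str, LatLon]) -> Geocoder:
    """Build a stand-in geocoder that answers from a small made-up table."""
    return lambda query: entries.get(query.strip().lower())


def chain(geocoders: Sequence[Geocoder]) -> Geocoder:
    """Return a 'super' geocoder: the first non-None answer wins."""

    def super_geocoder(query: str) -> Optional[LatLon]:
        for geocode in geocoders:
            result = geocode(query)
            if result is not None:
                return result
        return None  # nobody could figure it out

    return super_geocoder


locate = chain([
    coordinates,                                               # 1. raw coordinates
    table({"lincoln park, seattle, wa": (47.53, -122.39)}),    # 2. AQ's own data
    table({"sbp": (35.24, -120.64)}),                          # 3. Yahoo stand-in
    table({"1st ave & pike st, seattle, wa": (47.61, -122.34)}),  # 4. Google stand-in
    table({"some tiny park": (40.0, -100.0)}),                 # 5. points of interest
])

print(locate("34.6, 121.6"))
print(locate("SBP"))
print(locate("somewhere nobody knows"))
```

Because the order of the list is the order of precedence, adding or reordering a source means touching only the list passed to `chain`.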

I also updated the ability to process coordinates. You can type in a location such as "34.6, 121.6" to run a location-based search centered on that point on the Earth, and AQ is now better about processing other formats for the coordinates. It used to require the latitude, a space, then the longitude. And heaven forbid, don't even THINK about putting a comma between the two! AQ, in the past, was a very temperamental beast. =)

The new "coordinate geocoder" will allow commas now. In fact, you can type in a wide variety of formats that will properly be understood including:

  • 50.3 -120.5
  • 50.3, -120.5
  • -120.5 50.3
  • 50.3N 120.5W
  • 120.5 W 50.3 N
  • 50 18 0 -120 30 0
  • 50 18 0N 120 30 0 W
  • 50° 18' 0" N 120° 30' 0" W

Ironically, I copied these examples from the "reverse geocoder" directions for the Yahoo geocoder, which converts coordinates into an address. I don't use Yahoo's reverse geocoder, though--I just want to figure out if a person is trying to type in coordinates and understand what they typed in--I don't need to attach an address to it. But all of these examples work with AQ now. I didn't know where these example coordinates pointed, but I copied them into AQ to test that they really did work, and was pleased to see that they found boxes at Manning Park, Canada! Which is a small park that most people never would have even heard of.... except that it's also the end of the Pacific Crest Trail! Makes me wonder if the person who wrote that documentation on Yahoo's website might have been a former thru-hiker? =)
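For the curious, here's one way a "coordinate geocoder" could make sense of the example formats listed above. This is a sketch under my own assumptions, not the parser AQ actually runs: it assumes plain ASCII quote marks, assumes that a pair without N/S/E/W letters is latitude first unless the first number is too big to be a latitude, and the function names are made up.

```python
import re
from typing import Dict, List, Optional, Tuple

# Only digits, signs, separators, degree/minute/second marks and N/S/E/W.
_ALLOWED = re.compile(r"[\d\s.,+\-°'\"NSEWnsew]+")
_TOKEN = re.compile(r"[-+]?\d+(?:\.\d+)?|[NSEWnsew]")
_HEMISPHERES = {"N", "S", "E", "W"}


def _to_decimal(parts: List[str]) -> float:
    """Combine degrees [minutes [seconds]] into signed decimal degrees."""
    sign = -1.0 if parts[0].startswith("-") else 1.0
    magnitudes = [abs(float(p)) for p in parts]
    return sign * sum(m / 60 ** i for i, m in enumerate(magnitudes))


def parse_coordinates(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), or None if text isn't coordinates."""
    if not _ALLOWED.fullmatch(text):
        return None
    tokens = _TOKEN.findall(text)
    letters = [t.upper() for t in tokens if t.upper() in _HEMISPHERES]
    numbers = [t for t in tokens if t.upper() not in _HEMISPHERES]
    if len(numbers) not in (2, 4, 6) or len(letters) not in (0, 2):
        return None

    half = len(numbers) // 2
    first = _to_decimal(numbers[:half])
    second = _to_decimal(numbers[half:])

    if letters:
        # The letters say which value is which; S and W mean negative.
        values: Dict[str, float] = {}
        for letter, value in zip(letters, (first, second)):
            axis = "lat" if letter in "NS" else "lon"
            values[axis] = -abs(value) if letter in "SW" else abs(value)
        if len(values) != 2:  # e.g. two latitudes and no longitude
            return None
        lat, lon = values["lat"], values["lon"]
    else:
        lat, lon = first, second
        if abs(lat) > 90 >= abs(lon):  # first number can only be a longitude
            lat, lon = lon, lat

    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return (lat, lon)


print(parse_coordinates("50 18 0N 120 30 0 W"))
print(parse_coordinates("Lincoln Park, Seattle, WA"))
```

Returning None for anything that doesn't look like coordinates is what lets a parser like this sit in front of the other geocoders and simply pass along whatever it doesn't understand.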

Before the update went live, I had to run every single location for every single letterbox and event through the new geocoders. Old locations did not have information about the radius of the location, all US locations had no information about the county of the location, and all mystery boxes had no latitude and longitude coordinates associated with them, and for this upgrade, I needed to fill in those blanks. So every single location was run through the geocoder--a process that took the better part of a month--which is why you might have noticed that the location you had listed for some boxes has shifted a bit. Most of the issues I found involved the "free-flowing" nature allowed for mystery boxes, so if there's a problem with how one of your boxes got geocoded, the mystery boxes would be the first ones you should check. I tried to fix problematic "updates" manually, but with over 100,000 boxes listed on AQ, manually checking every single one wasn't possible. I wrote a bunch of code to help me identify "possible problems"--such as a new location that ends up being 6,123 miles away from the coordinates of the old location or a location with a radius of 1,000 miles (not many places span such a wide area, but they do exist!). Subtler problems, however, could have slipped through, so it wouldn't hurt to run through your plants and check that the locations are what you think they should be.
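A check for "possible problems" along those lines might look something like the sketch below. It is only an illustration: the thresholds (50 miles of drift, a 1,000-mile radius) and the function names are my own placeholders rather than the values AQ used, and the distance is the standard haversine great-circle formula with a mean Earth radius of about 3,958.8 miles.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean radius; good enough for a sanity check


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def possible_problems(
    old: Tuple[float, float],
    new: Tuple[float, float],
    new_radius_miles: float,
    max_shift_miles: float = 50.0,     # placeholder threshold
    max_radius_miles: float = 1000.0,  # placeholder threshold
) -> List[str]:
    """List reasons a re-geocoded location deserves a manual look."""
    problems = []
    shift = distance_miles(old[0], old[1], new[0], new[1])
    if shift > max_shift_miles:
        problems.append(f"moved {shift:,.0f} miles from the old coordinates")
    if new_radius_miles >= max_radius_miles:
        problems.append(f"radius of {new_radius_miles:,.0f} miles is suspiciously large")
    return problems


# A box that was near Seattle but got re-geocoded near San Luis Obispo.
print(possible_problems((47.6, -122.3), (35.28, -120.66), new_radius_miles=5))
```

An empty list means nothing looked off by these two measures, which says nothing about the subtler problems that only a human checking the listing would catch.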

So that's the skinny on using the new geocoder. It's bigger, better, and badder (in a good way) than ever! =) However, I'll note, there's no such thing as a perfect geocoder--you'll still find locations that none of the geocoders can figure out. But with this update, I hope you'll have a much more difficult time finding "unsupported" locations. =) (And, yes, you CAN use commas in coordinates now!)


3 comments:

Okie Dog said...

You're doing an amazing job for us, Ryan! Thank you, thank you, thank you!!!
Your explanations are fairly easy to understand, even for someone that knows nothing of the tech stuff. Stumbles may happen some but it can usually be discerned by reading further. So, thank you once again.

Ryan said...

Shoot, I meant to post this message at 5:00PM--not 5:00AM. Argh! I wasn't done tweaking it!

Oh, well.....

-- Ryan

Unknown said...

You are amazing and we really appreciate all your work and your time taken to explain it to us. Thank you.