Wednesday, August 12, 2009

Results of the Great Blue Diamond Experiment

The second poll is now officially closed. The official winner, by a whisker, is the original blue diamond algorithm I had been using all along. Before I started this whole voting process, I actually saved a list of exactly which boxes had a blue diamond, and during this vote, I simply put them back. So despite all these other algorithms I tried, the original blue diamond algorithm is actually still the favorite. =)

A close runner-up was the green algorithm, close enough that I feel the two colors really were a tie from a statistical standpoint. That doesn't surprise me much--the core algorithm for the two is exactly the same. The difference between the two is that the blue algorithm had additional "tweaks" I added after the core algorithm ran. The core ranked boxes based purely on the votes, adjusting for the voter's average vote and the standard deviation of their votes. The green algorithm is the "pure" results. The blue algorithm included a few additional tweaks after the fact by rearranging the "borderline" results.
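For readers who want to see what "adjusting for the voter's average vote and the standard deviation" could look like, here is a minimal sketch. It is not the site's actual code: the function name, the (voter, box, rating) input format, and the handling of voters who always give the same rating are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalized_box_scores(votes):
    """votes: list of (voter, box, rating) tuples. Returns {box: mean adjusted vote}."""
    # Collect each voter's ratings so we can measure their personal voting pattern.
    by_voter = defaultdict(list)
    for voter, _box, rating in votes:
        by_voter[voter].append(rating)

    # Per-voter average and (population) standard deviation.
    stats = {v: (mean(r), pstdev(r)) for v, r in by_voter.items()}

    by_box = defaultdict(list)
    for voter, box, rating in votes:
        avg, sd = stats[voter]
        # Assumption: a voter whose ratings never vary contributes a neutral 0.
        adjusted = (rating - avg) / sd if sd > 0 else 0.0
        by_box[box].append(adjusted)

    # A box's score is the average of its adjusted votes.
    return {box: mean(zs) for box, zs in by_box.items()}

votes = [("a", "x", 5), ("a", "y", 3), ("b", "x", 3), ("b", "y", 1)]
scores = normalized_box_scores(votes)
```

In this toy data, voter "a" rates generously and voter "b" rates harshly, yet both prefer box "x" by the same margin relative to their own habits, so "x" ends up with the higher adjusted score from both of them.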

Boxes that ranked near the cutoff for a diamond usually ended up there more-or-less by chance. From a statistical standpoint, the boxes immediately above and below the cutoff are actually ties. The difference in ranking for #2223 or #2252 might depend on what a voter had for breakfast that morning. So I added a couple of tweaks to make the rankings more consistent and (I hoped) fair. If a box already had a blue diamond the previous month, it would still keep the diamond even if it technically fell below the cutoff (but was still a borderline case). If two new boxes fell close to the cutoff, one on each side of it, I would give a slight edge to the one with a planter's choice listed as an attribute. Basically, in the event of a tie, the planters would cast a tie-breaking vote. (Don't think putting a planter's choice icon next to ALL of your boxes will help either--how discerning one is in applying them to your boxes is also taken into account.) There were about a half-dozen various tweaks I made to those borderline boxes in an attempt to break the statistical ties, and those were applied to the blue algorithm but not the green.
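The "keep the diamond if you already had one" tweak can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the idea of a fixed-size `margin` of borderline positions just below the cutoff, and the input format are all hypothetical, and the planter's choice tie-breaker is not modeled.

```python
def award_diamonds(ranked_boxes, cutoff, margin, previous_holders):
    """ranked_boxes: box ids sorted best first.
    cutoff: how many top boxes earn a diamond outright.
    margin: how many positions just below the cutoff count as borderline (assumed).
    previous_holders: set of box ids that had a diamond last month."""
    # Everything above the cutoff gets a diamond as usual.
    awarded = set(ranked_boxes[:cutoff])

    # Borderline boxes just under the cutoff keep a diamond they already held.
    for box in ranked_boxes[cutoff:cutoff + margin]:
        if box in previous_holders:
            awarded.add(box)
    return awarded

winners = award_diamonds(["a", "b", "c", "d", "e"], cutoff=2, margin=2,
                         previous_holders={"c", "e"})
```

Here box "c" keeps its diamond because it sits inside the borderline zone, while box "e" loses its diamond because it fell too far below the cutoff to count as a tie.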

The tweaks only affected the results of the borderline boxes, and apparently they didn't make a significant difference in the results.

The purple and white diamonds I didn't expect to do well since they didn't do especially well in the last vote. The white diamond used the algorithm where it removed the best and worst vote for a box, then took the average of the remaining votes. The purple diamond took the ratio of high votes (5s and 4s) to the number of low votes (1s and 2s) and sorted accordingly. It actually did surprisingly well in the last vote, but still nowhere close to the original core algorithm that adjusted votes based on the average and standard deviation of an individual's voting patterns. While the first vote had the high-low ratio score nearly double the rate of the straight-average of votes, this time they scored almost identically. I'm a bit puzzled about that, but they both did significantly worse than other options, so it doesn't make much of a difference.
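The white and purple scoring rules described above are simple enough to sketch in a few lines. These are illustrative only: the function names are made up, the three-vote minimum for the trimmed average is an assumption, and the add-one adjustment in the ratio is an assumption introduced here purely to avoid dividing by zero when a box has no low votes.

```python
def trimmed_average(ratings):
    """White-diamond style: drop the single best and worst vote, average the rest."""
    if len(ratings) < 3:
        return None  # Assumption: too few votes to trim anything meaningful.
    trimmed = sorted(ratings)[1:-1]
    return sum(trimmed) / len(trimmed)

def high_low_ratio(ratings):
    """Purple-diamond style: ratio of high votes (4s and 5s) to low votes (1s and 2s)."""
    high = sum(1 for r in ratings if r >= 4)
    low = sum(1 for r in ratings if r <= 2)
    # Assumption: add one to each count so a box with zero low votes still gets a score.
    return (high + 1) / (low + 1)

white_score = trimmed_average([1, 3, 4, 5, 5])   # averages 3, 4, 5
purple_score = high_low_ratio([5, 4, 3, 1])      # two high votes, one low vote
```

Note that neither rule looks at who cast the vote, so a 5 from someone who gives everything a 5 counts the same as a 5 from a tough grader.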

The red and yellow algorithms were the "combined" algorithms, where I ran three different ranking algorithms, then combined the results to generate the red and yellow diamonds. Intuitively, I thought these would do very well--perhaps even beating out the original blue diamond algorithm--and was stunned to see them go down in flames like they did. I guess in my head, I thought a combined algorithm would pick up on the best of all the algorithms. It seems actual results were more skewed towards "the weakest link." The combined algorithms took the results of the green, purple, and white diamonds, and combined them. The red is the "pure" combined algorithm, while the yellow is the "tweaked" version using many of the same tweaks I did for the green/blue variations.

The end result of the combined algorithms, as I see it, is that the most popular core algorithm (the green) was pulled down by the poorer results of the purple and white algorithms. Or you could view it as the green algorithm "pulling up" the results of the purple and white algorithms. The combined algorithms did score better than the two least favorites, but they scored worse than the most popular algorithm. An average of algorithms thus resulted in average results.
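The post doesn't say exactly how the three rankings were combined, so the following is just one plausible way to do it: average each box's position across the individual rankings. The function name and the tie-breaking by box id are assumptions for illustration.

```python
def combine_rankings(rankings):
    """rankings: list of rankings, each a list of the same box ids sorted best first.
    Returns a single ranking ordered by summed position (equivalent to mean position)."""
    total = {}
    for ranking in rankings:
        for position, box in enumerate(ranking):
            total[box] = total.get(box, 0) + position
    # Lower summed position is better; break exact ties by box id (assumption).
    return sorted(total, key=lambda box: (total[box], box))

combined = combine_rankings([
    ["a", "b", "c"],   # e.g. the green ordering
    ["b", "a", "c"],   # e.g. the purple ordering
    ["a", "c", "b"],   # e.g. the white ordering
])
```

With a scheme like this, every input ranking carries equal weight, which fits the observation above: two weaker rankings can outvote the strongest one, so the combined result lands somewhere in the middle.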

And that was the biggest surprise for me. I really expected the combined algorithm to get much better results than that.

The difference between the tweaked and non-tweaked versions of the combined algorithm was 31-29, a statistical tie in my book. Again, there doesn't seem to be much preference one way or another based on the tweaks.
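One way to check the "statistical tie" intuition is an exact two-sided binomial test. This sketch assumes the 31 and 29 are head-to-head vote counts (the post doesn't say so explicitly) and asks how often a fair coin flipped 60 times would land at least this far from an even split. The function name is made up for illustration.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n fair (50/50) trials."""
    observed = abs(k - n / 2)
    # Count every outcome at least as far from an even split as the observed one.
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed)
    return extreme / 2 ** n

p = two_sided_binomial_p(31, 60)
```

Under those assumptions the p-value comes out close to 0.9, far above the usual 0.05 threshold, so a 31-29 split is entirely consistent with voters having no real preference.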

So, the core algorithm using the average and standard deviations of a person's voting patterns is hands down the winner and will continue to be used. The tweaked version shows a *slight* preference, but it may not be outside the range of a statistical tie. I also never broke down the multiple tweaks that could be voted on to see which ones might be preferred--it was an all-or-nothing type of deal.

The two "tweaked" algorithms also didn't all have the same tweaks, so I can't really compare those two very well. I literally applied the blue diamonds on exactly the same boxes that had blue diamonds before the votes were counted, which meant that tweaked version did allow boxes with just two votes to get a diamond, but the yellow diamond was limited to boxes that had a minimum of three votes. The blue diamond included the tweak that gave preference to boxes that already had a blue diamond if it now falls just under the cutoff, but the yellow version had no previous diamonds that it could be compared to and thus did not use that tweak.

So I'm left trying to decide exactly which tweaks to keep and which ones to throw away, but based on the results of the poll, I'm not sure such decisions will make a big impact anyhow. They're little decisions that ultimately have little impact. I'll definitely continue favoring boxes that already have blue diamonds just for the consistency factor--one of the biggest complaints about blue diamonds was their fleeting nature for borderline boxes. It would appear one month, disappear the next, and return the month after that, and so on. Giving a slight edge to those with the blue diamond already got rid of most of that inconsistency (and the subsequent complaints about "losing" diamonds).

But in a nutshell, after all this voting and discussion, pretty much nothing will change. =) Was it a waste of time? I think not. There were several very good things that came out of these proceedings:

1. You no longer have to take my word that I'm using the best algorithms possible.

2. I also don't have to trust that my biases hadn't been playing a role in the selection of algorithms.

3. I hope that anyone who intuitively felt that a simple average of all votes really is NOT the best ranking algorithm available will finally be able to let it go. Yes, there are some people who actually liked that result the best, but there were also nine people who each voted for the "completely random" results as well. The results were pretty overwhelming, however, that a simple average is NOT the best ranking algorithm available, and it's time to simply agree to disagree.

4. And I hope I gave many of you a sense of empowerment. Not the "cram it down your throat whether you like it or not" feeling that some people seemed to have, but a sense that you're in control of how the boxes are ranked. The end results may not have changed, but this time it was you all who chose the algorithm--not me. =)

On another note, I'm seriously considering giving boxes with different status different colored diamonds. Not because it has any significance, but rather because there continues to be that persistent myth that retired boxes are "taking" diamonds away from active boxes. It's not true, and even after I explain mathematically why that's not happening, it's a myth that continues to persist. And maybe a simple change of colors can finally put the nail in that myth once and for all. It's an intriguing idea to me, and it would be pretty easy to implement given the fact I already have lots of colors available now. =)

Thanks to everyone who participated. I'll be putting everything back to normal shortly. I'll leave the original blue diamonds up this month, but I might make a couple of minor tweaks when it comes to next month's ranking of the boxes. For the most part, however, expect the same algorithm.

Happy trails!


Anonymous said...

An erudite and comprehensive explanation of the nuances involved in optimal diamond selection! One quick question, though... what's an "algorithm"?


Unknown said...

Ack I lost. And I never got a chance to campaign for the one I liked best!

ArtGekko said...

Yikes. I'm not sure I could read that explanation enough for it to make sense in my brain right now. I didn't vote (didn't care to), but thanks for all the effort, anyway!

Interestingly, I did look at my logbook during all this, and I had diamonds of many different colors on about 1/3 of my plants. I think the most common was grey/silver. Anyway, now that it's all over and back to normal, I have no diamonds at all (which is how it was before). Sad, but there it is. So I liked any algorithm EXCEPT the blue!

Thanks again, Ryan!

Anonymous said...

Thanks for all of the work you put into this Ryan. It was nice to see all of the gemstones floating around AQ for a while!

Forgive me if this is something that came up in all of the discussions, but would it be possible to create diamonds for boxes based on separate categories?

I read a number of comments about LBers noticing that people voted for different boxes for different reasons. Could you create a couple of other ratings so that people could rank a box based on things like location, quality of the carving, quality of the clue, etc.?

You've got all those extra gems just sitting around, maybe this is a way to use them. Perhaps in the next great experiment!! :-)

Bikers n Hikers

Ryan said...

I want to keep the voting process to a single vote. Having people rate the different aspects of a box requires more work, and people would be less likely to cast a vote. I need all the votes I can get!

Mandy said...

I lost a Blue Diamond (on Georgia Boy)! It's okay, though, since I picked up 5 more. :o)