I’ve been trawling through the leaked AOL data with some Perl scripts and came up with a few statistics.
The data contains search records for about 658 thousand users collected over a three month period from March to May 2006. According to AOL spokesperson Andrew Weinstein, this represents 0.33% of the search traffic conducted through AOL over that period. My own informal test indicates the actual fraction may be higher. The leaked data shows 30 visits to Krazydad over that period, and I actually had about 480 visits from AOL search engines according to my web logs. Even assuming that my web logs only account for half the traffic (so roughly 960 real visits), the 30 visits in the leak would indicate that the leaked AOL data represents about 3.1% of all AOL search traffic during that period, nearly ten times AOL’s figure.
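The arithmetic behind that estimate can be sketched in a few lines of Python. The function and parameter names are made up for illustration, and the 50% log coverage is the same guess used above, not a measured value.

```python
def estimated_sample_fraction(leaked_visits, logged_visits, log_coverage):
    """Estimate what share of real AOL search traffic shows up in the leak.

    log_coverage is the ASSUMED fraction of true visits that the web logs
    capture (0.5 means the logs see only half of the real traffic).
    """
    true_visits = logged_visits / log_coverage
    return leaked_visits / true_visits


# Figures from the post: 30 visits in the leak, about 480 in the web logs.
fraction = estimated_sample_fraction(
    leaked_visits=30, logged_visits=480, log_coverage=0.5
)
print(f"{fraction:.1%}")  # 3.1%
```

With full log coverage (`log_coverage=1.0`) the same inputs give 6.25%, so the halving assumption is the conservative case.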
Percentage of users who searched for, or landed on Google: 17%
Percentage of users who searched for, or landed on Yahoo: 18%
This is interesting because the number of users who actually searched for Google is about three times the number who searched for Yahoo (you can see a list of the top 500 search terms here). A lot of folks are landing on Yahoo-hosted sites without actually searching for Yahoo.
Percentage of users who searched for, or landed on AOL: 13%, MySpace: 11%, eBay: 10%, Amazon: 8.2%, Flickr: 1.2%, YouTube: 0.7%.
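For anyone who wants to reproduce numbers like these, here is a rough Python sketch of the counting. It assumes tab-separated records with the columns `AnonID`, `Query`, `QueryTime`, `ItemRank` and `ClickURL`, and it uses a plain substring match, which is a simplification. The function name and the sample rows are made up, so the output says nothing about the real data.

```python
import csv
import io


def percent_of_users(rows, term, fields=("Query", "ClickURL")):
    """Percent of distinct users with `term` in any of the given fields."""
    term = term.lower()
    all_users, matched = set(), set()
    for row in rows:
        user = row["AnonID"]
        all_users.add(user)
        # Missing or empty fields are treated as empty strings.
        if any(term in (row.get(f) or "").lower() for f in fields):
            matched.add(user)
    return 100.0 * len(matched) / len(all_users) if all_users else 0.0


# Tiny invented sample, for illustration only.
sample = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tgoogle\t2006-03-01 10:00:00\t\t\n"
    "1\tcheap flights\t2006-03-01 10:05:00\t1\thttp://travel.yahoo.com\n"
    "2\tknitting patterns\t2006-03-02 09:00:00\t2\thttp://www.example.com\n"
    "3\tyahoo mail\t2006-03-03 12:00:00\t1\thttp://mail.yahoo.com\n"
    "4\tweather\t2006-03-04 08:00:00\t\t\n"
)
rows = list(
    csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
)

searched_or_landed = percent_of_users(rows, "yahoo")
searched_only = percent_of_users(rows, "yahoo", fields=("Query",))
print(searched_or_landed, searched_only)  # 50.0 25.0
```

Restricting `fields` to `("Query",)` separates users who typed the name from users who merely landed on the site, which is the distinction behind the Google and Yahoo figures.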
Percentage of users who did at least one search for porn, or visited a porn site: 20%.
The real number for this last figure may be a bit higher, since it’s difficult to suss out all the creative ways that users search for porn. I used a Bayesian filtering technique to come up with a list of likely keywords and sites, and then counted the users who searched for at least one of the candidate keywords or visited one of the candidate sites. Interestingly, the site most indicative that a user didn’t search for porn was a savings and loan.
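The post doesn’t spell out which Bayesian technique this was, so the following is only one plausible shape for it: a naive-Bayes-style log-odds score with add-one smoothing, computed from a small set of users already labeled by hand. The function name, the labeling step and the toy tokens are all assumptions for illustration.

```python
import math
from collections import Counter


def token_log_odds(labeled_users, alpha=1.0):
    """Score tokens by smoothed log-odds of appearing in flagged users.

    labeled_users: iterable of (tokens, is_flagged) pairs, one per user.
    Positive scores lean toward flagged users, negative toward unflagged.
    """
    flagged, clean = Counter(), Counter()
    n_flagged = n_clean = 0
    for tokens, is_flagged in labeled_users:
        unique = set(tokens)  # count each token once per user
        if is_flagged:
            flagged.update(unique)
            n_flagged += 1
        else:
            clean.update(unique)
            n_clean += 1

    scores = {}
    for tok in set(flagged) | set(clean):
        # Add-alpha smoothing keeps unseen tokens from producing log(0).
        p_flagged = (flagged[tok] + alpha) / (n_flagged + 2 * alpha)
        p_clean = (clean[tok] + alpha) / (n_clean + 2 * alpha)
        scores[tok] = math.log(p_flagged / p_clean)
    return scores


# Hypothetical hand-labeled seed users (tokens from queries and clicked sites).
seed = [
    (["xxx", "video"], True),
    (["xxx", "pics"], True),
    (["savings", "loan", "video"], False),
    (["savings", "recipe"], False),
]
scores = token_log_odds(seed)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], ranked[-1])  # xxx savings
```

Tokens at the top of the ranking become candidate keywords and sites. The bottom of the ranking holds the strongest signals in the other direction, which is where a result like the savings and loan would show up.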
Percentage of users who searched for “cream pie”: 0.2%
Percentage of users who searched for “cream pie” and actually wanted a recipe: 0.1%