Sunday, November 10, 2002

Today's lesson: abuse of statistics
(1) From a Wired article on European countries outlawing racist Internet sites:
"Many European countries have existing laws outlawing Internet racism, which is generally protected as free speech in the United States. The council cited a report finding that 2,500 out of 4,000 racist sites were created in the United States."
Actually, I'm surprised that the percentage is so low. Consider: 2,500 out of 4,000 is 62.5%. I'd be surprised if the fraction of all Web sites that originated in the US was significantly less than 62.5%. Of course, I could be wrong -- but the point is that this statistic is completely meaningless without a baseline to compare it to, namely the US share of all Web sites.
(2) Len Pasquarelli claims that the fact that only three teams in the NFL rank in the top ten in both offense and defense is evidence that the salary cap creates unbalanced teams. Well, no. Suppose the two rankings are uncorrelated. There are 32 teams in the NFL, so the top 10 represents a fraction 10/32 = 0.3125 of the total number of teams. A given team then lands in both top 10s with probability (0.3125)^2, so we would expect the number of teams in the top 10 of both to be 32*(0.3125)^2 = 3.125, or about 3.13, which is almost exactly the number that we see! Maybe he's just really worried about that missing 0.13.
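For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The function names are mine, and the second function goes one step beyond the text: it assumes the two rankings are independent random orderings, in which case the number of teams in both top 10s follows a hypergeometric distribution (draw 10 teams from 32, of which 10 are "top 10 on offense"). It is an illustration under that assumption, not a model of how NFL teams are actually built.

```python
from math import comb

def expected_overlap(n_teams, top_k):
    """Expected number of teams in both top-k lists if the rankings are uncorrelated."""
    # n_teams * (top_k / n_teams)^2 simplifies to top_k^2 / n_teams.
    return top_k * top_k / n_teams

def overlap_pmf(n_teams, top_k):
    """P(exactly j teams in both top-k lists), for j = 0..top_k,
    assuming the two rankings are independent random orderings."""
    total = comb(n_teams, top_k)
    return [
        comb(top_k, j) * comb(n_teams - top_k, top_k - j) / total
        for j in range(top_k + 1)
    ]

expected = expected_overlap(32, 10)
pmf = overlap_pmf(32, 10)
most_likely = max(range(len(pmf)), key=lambda j: pmf[j])

print(f"expected overlap: {expected}")
print(f"most likely overlap: {most_likely}")
print(f"P(exactly 3 teams in both): {pmf[3]:.3f}")
```

Under these assumptions, three teams in both top 10s is not just close to the average, it is the single most likely outcome.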

I could go on and on, but I have a feeling it's a losing battle.

