Monday, April 26, 2010

Distributions and outliers

John Cook has an old but good post on the issues that even well behaved normal distributiosn can have have in the extremes. I would tend to argue that these extreme outliers (women over 6' 8", for example) probably are due to some process that is rare (i.e. a genetic mutation, an extreme environmental exposure) and so the real height distribution is a mixture of several underlying distributions with latent (or unobserved variables).

But this line of thinking is actually dangerous. After all, with enough latent variables I can model almost any distribution as a sum of normal distributions. And, if I can't observe these variables, how do I know that they exist?

So I guess this is one place where my intuitions are precisely wrong for handling the problem.

No comments:

Post a Comment