[cn: Bayesian math]
Suppose that I create a test to measure suitability for a particular job. I give this test to a bunch of people, and I find that women on average perform more poorly. Does this mean that women are less suitable for the job, or does it mean that my test is biased against women?
Psychologists face this situation all the time. They create new tests to measure new things, then give those tests to a variety of groups and observe average differences. Accordingly, they have a standard statistical procedure for assessing whether a test is biased.
But I recently learned that the standard procedure is mathematically flawed. In fact, rather than producing an unbiased test, the standard procedure practically guarantees a biased one. This issue causes much distress among psychometricians such as Roger Millsap.
Following Millsap, I will describe the standard method for assessing test bias, sketch a proof that it must fail, and discuss some of the consequences.