There is yet another discussion of intelligence raging across the internet just now, sparked by Sam Harris’ interview of Charles Murray and a Vox article critical of that interview. (h/t to PZ) I have been critical of the uses of IQ testing for quite some time now, dating back to 8th grade or so. There is nothing per se wrong with intelligence testing. Nor is it inherently bad to make use of intelligence testing. As part of a job application where one is being asked to perform particular tasks in a particular environment, it’s entirely conceivable that a particular intelligence test or set of such tests might well predict success in that job. However, for many if not the vast majority of public policy purposes, IQ and other intelligence testing will function badly, misleadingly, or both. This is even more true if we make assumptions about how much of a particular test result is due to intraracial genetic factors (factors shared within one race, but not between people of different races).
Rather than making this even more lengthy, I’d like to focus on just one substantive criticism and save other discussions for other posts. One metaphor used by Harris in his published e-mail exchange with Ezra Klein of Vox is that of height:
Height is highly heritable, but you can surely stunt a person’s (or a whole population’s) growth through malnutrition. So, merely seeing a group of short people, one can’t be sure to what degree environment determined their height. And yet it remains a fact that if a person doesn’t have the genes to be 7 feet tall, he won’t be. It is also utterly uncontroversial to say that while there are many ways to prevent a person from reaching his full intellectual height, if he doesn’t have the genes to be the next Alan Turing, he won’t be that either.
People respond sympathetically to this argument by Harris, but it is unfortunately misleading. The argument that Murray’s conclusions are both wrong-headed and harmful comes not from Murray’s assumption that some portion of intelligence is genetic, but rather from his assumption that some portion of the observed difference in mean IQ from group to group is indicative of different racial population genetics.
Whether some person’s IQ test partly reflects a genetic component is a very different question than whether that person’s results can be combined with others of a particular racial grouping to demonstrate difference in racial population genetics related to intelligence.
Imagine, if you will, a hypothetical IQ test administrator who administers tests only in Ojibwe.
- Suppose this test administrator is asked to test the IQ of a monolingual English speaker.
- The administrator returns a result stating that the subject was unable to successfully follow instructions or answer a single question.
- Would it be reasonable to conclude that the subject deserves to be rated as low as the test’s accuracy allows (probably 20-55 for most IQ tests)?
- If the identical twin of the monolingual English speaker also failed to answer any questions, would a genetic assay of the twins allow us to say anything productive about the genetic contributions to intelligence?
- Since mean IQ is purposely defined as 100, and if (as Murray and Harris assert) 50% to 80% of intelligence is genetically heritable, then does that mean that these twins’ scores demonstrate a minimum 23 point genetic disadvantage on IQ tests for their race?
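The arithmetic behind that last question can be sketched in a few lines. This is only an illustration of the naive inference being criticized, and it assumes the 23-point figure comes from pairing the top of the floor range (a score of 55) with the low end of the heritability range (50%); the function name is hypothetical.

```python
# Hypothetical illustration of the naive inference the last question pokes at.
POPULATION_MEAN = 100  # IQ is normed so that the mean is 100


def naive_genetic_deficit(score, heritability):
    """Attribute a fixed share of the gap below the mean to genes (the flawed step)."""
    return (POPULATION_MEAN - score) * heritability


# Floor scores of 55 and 20 bracket the range quoted above.
for floor_score in (55, 20):
    for heritability in (0.5, 0.8):
        deficit = naive_genetic_deficit(floor_score, heritability)
        print(f"floor={floor_score} heritability={heritability:.0%} -> {deficit:.1f} points")
```

The smallest value printed is 22.5 points, which rounds to the 23 in the question. Nothing in the calculation knows that the test was given in a language the twins do not speak, which is exactly the problem.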
In truth, genetically comparing the most successful subjects of our hypothetical administrator to the twins might genuinely appear to show that indigenous/First Nations peoples have the most intelligence-promoting genetic factors. In any group you can find average differences in the presence of some allele or other when compared to the genetics of some other group. Given the vast differences in average IQ we would expect to be found by this administrator, and how the ability to speak Ojibwe is passed down through families in a manner that highly correlates with how genes are passed through families, it is almost inevitable that some genes more common in certain indigenous/First Nations families would correlate with being scored more highly by this administrator.
But the lesson we should take from this is not that the hypothetical administrator has the power to reveal how much of intelligence (or even how much of an IQ score) is determined by racial population genetics. The lesson we should take is that it is possible to create a test that appears culture neutral, but that actually generates scores highly dependent on certain shared cultural traits. Language is only the most obvious of these (and because it is so obvious, testing of someone is not generally considered valid unless performed in the first and/or native language of a subject).
The worst part of this lesson for Murray’s position, however, is that my reductio ad absurdum doesn’t actually communicate the real problem here. With monolingual speakers and total non-responsivity to the administrator’s instructions, it appears that I’m talking about a binary trait of intelligence tests: the test is either invalidated entirely by cultural effects or it is entirely valid despite cultural effects. In truth, most cultural traits with the capacity to impact an IQ test’s results will affect the final score only marginally. Further, while bilingual adults above a certain age may test better on certain IQ tests, it’s not at all certain that we can expect the same effect for, say, US citizen children that speak Spanish, Tagalog or Cantonese at home but English in school. The lesser amount of practice in a single language may put such a child at a disadvantage in either of the child’s native languages until after reaching a certain age where language proficiency is no longer developing as rapidly.
To return to the Harris metaphor, while it is easy to say that a 7′ tall person must have a distinct phenotype that enables growing to 7′, when comparing someone 5’6″ to someone 5’7″, how do we determine what the genetic contribution to the height difference is if one person had the genetic potential to grow to 5’9″ and the other 6’1″, but both suffered a degree of malnourishment during growth? Does it matter whether the 5’6″ tall individual had the genetic potential to grow to 5’9″ or if that shorter individual was the one with the genetic potential to grow to 6’1″? Say, for a moment, that the taller individual was actually more malnourished than the shorter, but the taller individual had the shorter maximum genetic potential for growth (i.e. 5’9″). This could easily be the case if the taller individual had a mutation that did not increase maximum height, but made height growth less sensitive to malnutrition. Now how do we characterize the genetic contribution to height in each of these persons?
To complete the metaphor: If these are racial averages instead of individuals, how do we characterize the racial genetic contribution to the racial average? The answer is that we may not be able to do so, and if we try then we may run up against methodological choices that create conflict between our own choices and the measurement choices of others. That hypothetical gene that makes one group more likely to grow to a higher percentage of maximum height for a given individual’s genetic potential? Is that actually a “height” gene? Does it become a gene for height if malnutrition is ubiquitous? What if malnutrition is only expected for 70% of children? 50%? 20%? 1%?
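A toy model makes those percentages concrete. Everything in it is invented for illustration: the linear stunting rule, the 70-inch potential, the 4-inch maximum loss, and the two sensitivity values are assumptions, not measurements. Both groups have identical genetic potential for height and differ only in how sensitive growth is to malnutrition.

```python
# Toy model, invented for illustration: every number here is hypothetical.
def group_mean_height(potential_in, stunting_in, sensitivity, malnutrition_rate):
    """Mean adult height in inches when a share of children are malnourished.

    potential_in      -- genetic maximum height, the same for everyone in the group
    stunting_in       -- inches lost by a fully sensitive malnourished child
    sensitivity       -- 0..1, how strongly malnutrition stunts growth in this group
    malnutrition_rate -- share of children who are malnourished
    """
    return potential_in - malnutrition_rate * stunting_in * sensitivity


POTENTIAL = 70.0  # both groups share the same genetic potential
STUNTING = 4.0

gaps = {}
for rate in (1.0, 0.7, 0.5, 0.2, 0.01):
    sensitive = group_mean_height(POTENTIAL, STUNTING, 1.0, rate)
    resistant = group_mean_height(POTENTIAL, STUNTING, 0.5, rate)
    gaps[rate] = resistant - sensitive
    print(f"malnutrition {rate:>4.0%}: gap = {gaps[rate]:.2f} in")
```

Under these assumptions the same gene produces a two-inch gap in group averages when malnutrition is universal and a gap of about two hundredths of an inch when it affects 1% of children. Whether it counts as a gene "for height" depends entirely on the environment in which it is measured.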
We can think about cultural differences again in a moment, but for right now let’s compare the height-but-for-malnutrition metaphor used by Harris. We know that Black citizens of the US are exposed to higher environmental lead levels in childhood. We know that such exposure is negatively correlated with IQ. Hypothetically, if Black US populations have much higher rates of a genetic variation that leads to higher sensitivity to lead poisoning, you would see Black identical twins with a higher correlation on IQ tests than fraternal twins even when all other genetic factors for IQ happened to be the same because some fraternal twins would share this vulnerability but all identical twins would share it. Does this IQ-but-for-lead poisoning model tell us more about Black genetic factors for intelligence, or more about Black environmental factors for intelligence?
Back to culture. We know that for an IQ test to accurately measure what we hope we’re measuring when we seek to discover intelligence, we have to eliminate certain cultural factors. However, we can’t eliminate all of them. We don’t want to do so. If we’re trying to rate one’s ability to solve problems, it may be that a cultural emphasis on study and education positively impacts the ability to solve problems. For most purposes (say, hiring for a particular job) we won’t want to actually downgrade one of two people who score the same on the test because one comes from a cultural group that values education. However the person got good at solving the problem we don’t care, right? So then we’re back to arguing about what cultural factors need to be eliminated as a source of bias and what cultural factors are accepted as simply a few of many valid paths to success on the exam.
Poverty, too, is a confound. We know that skipping a meal before an IQ test can lead to a lower result. We know that people living in poverty are much more likely to skip meals. Since poverty is racialized, effects like missing a meal on a test day are also racialized. Interestingly, racial differences in IQ increase from pre-school age to high school age. For young students some schools have subsidized or free breakfasts and lunches, but for older students free breakfasts are generally not available. In addition to increased test-day effects, there is also an impeded ability to learn over the course of an educational career when many breakfasts are missed, even if a student manages to have breakfast on a test day. How much, precisely, do these racialized effects contribute to the gap in racial mean IQs?
The hard work to answer questions about poverty effects and cultural effects (including defining which cultural effects are acceptable to measure and for what purposes) is still ongoing, and as such IQ tests cannot currently credibly assert that they’ve eliminated culturally distinct sources of unacceptable bias or the effects of poverty. Since cultural groups are very frequently intraracial groups, these cultural differences will appear in analysis to be sub-racial differences that can add together with the differences inherent to other racial subgroups to become an apparent racial difference. With poverty racialized, you add another source of environmental effects to the measurements.
When we combine known confounds such as racial differences in lead exposure with unknown but potentially very real confounds such as hypothetical genetic sensitivity to lead (and there are so many of these “unknown but potentially real” genetic factors that some will exist even while most, under scrutiny, will turn out not to exist) and add in a huge number of other cultural confounds and then top all that variability off with methodological and definitional disagreements about bias, we arrive at the frustrating answer that these sources of error sum to larger than the observed racial differences in IQ. Since most of the factors of which we’re aware appear to show undermeasurement of disadvantaged people more than overmeasurement of the disadvantaged or undermeasurement of the advantaged, any apparent racial gap is likely to be in large part eliminated by measures that properly assess underlying intelligence. Combined with the fact that some genetic factors may be of the type we discussed with our hypothetical gene for extra sensitivity to lead poisoning, any remaining difference in observed racial mean IQ cannot be positively asserted to be a difference in genes for intelligence (unless resistance to lead poisoning is truly to be categorized as a “gene for intelligence”).
Finally, imagine that we eventually get truly “environment neutral” IQ results through a combination of better tests, better test administration, uniformly excellent education, and the elimination of the effects of both environmental racism (which leads neighborhoods of color to be more toxic) and racism and sexism generally, so that we don’t see racism- or sexism-determined differences in children’s heroes and intended adult careers (which can have the effect of focusing a child on things other than the types of education that positively affect IQ test results) or racism- or sexism-determined stereotype threat effects. Everything we know tells us that those effects are large enough to fully close the gap between racial and gender averages, but that they will be unlikely to close it precisely. There will likely always be some small difference between racial and gender groups.
At that point, we might be able to say that there is a genetically controlled difference in IQ test results, but if the difference is a single point on a scale where that equals 1/15th of a standard deviation, what is the significance of that result? As a mean, the difference could easily be 1/30th or 1/150th of a standard deviation. Yes, Harris and Murray are correct to say that there will almost certainly be some difference, but will it be meaningful? What will that meaning be?
Murray especially, but apparently also Harris, want to say that since we know that we must eventually find a difference in mean racial IQ, we can go ahead and begin to ask questions about the meaning even before the exact quantity of difference is known. Worse, Murray and, apparently, Harris want to talk about the meaning of the genetic contribution to the differences in mean racial IQ. But at this point, and let me stress this,
given the fact that possible sources of error, methodological differences and deficiencies, and unacceptable cultural bias sum to more than the difference between mean racial IQ for whites and Blacks in the US, we don’t even know which racial group will prove to have the best genes for intelligence.
Going back to the hypothetical twins example above, Murray is asserting that since our current testing shows a multi-point deficit between Black mean IQ and white mean IQ in the US, then there is a multi-point genetic disadvantage for Black folk on IQ tests = 50% to 80% of the total disadvantage. But we cannot actually know this. We do not actually know this.
Right now we desperately need to address poverty and environmental racism, including but not limited to eliminating malnutrition and exposures to toxic levels of environmental lead. We desperately need to improve our schools generally and to create a uniform minimum standard that disproportionately raises outcomes in our least performing schools. In 20 years I’d like to see our worst public schools deliver results that would place them a standard deviation above the mean today. During those 20 years, intelligence researchers should be encouraged to investigate many cultural effects on IQ tests and do the hard work of determining whether those effects constitute confounds in measuring intelligence or whether they constitute positive contributions to the core of what an intelligence test actually should measure.
At that point, we can revisit what we know to see if we’ve actually reached a point where we have a measurement of genetic contribution to differences in racial mean IQs. Until then, we’re ill served by giving attention to Murray and those who support him in prematurely discussing the meaning of a genetic contribution we’re not yet able to measure.
polishsalami says
Let’s take a step back here.
Charles Murray is not a geneticist; neither is Sam Harris. In fact, if you look at who is promoting this stuff, it never seems to be anyone trained in genetics. So why does anyone take these people seriously?
colnago80 says
Post on Panda’s Thumb seems to claim that there is a genetic and heritability component to intelligence as measured by IQ and/or g.
https://pandasthumb.org/archives/2018/03/general-intelligence.html
Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says
@colnago80:
Did you read my post? Because I’ve now read the Panda’s Thumb article and it does literally nothing to refute what I’ve written here.
My argument here is that while g is measured by IQ tests, IQ tests also measure other things. Among these things that affect scores (and thus are indirectly measured by IQ tests) are whether or not you’ve had breakfast on the day of the test (or, indeed, whether or not you regularly miss breakfast).
Murray and Harris like to believe that since IQ tests *do* measure g, we can safely assume that the gaps in mean IQ test performance between racial groups are a measure of actual gaps in g.
But assuming the g theory is correct (and I think some version of the g theory is: though “multiple intelligences” theory has some merit, I don’t believe that there is literally zero connection between types of intelligence), by definition it is resistant to change. Therefore the observed variability over short periods of time of test results for a single subject on multiple instances of the same test (The PT article estimates it at 3 points) is almost certainly not a measure of g. Likewise, interrater reliability is high, but not perfect. The points of variation attributable to interrater reliability are also not measuring g. The effects of Stereotype Threat are well documented and can be induced or (largely) ameliorated in a controlled setting. The points of variation attributable to the effects of stereotype threat are therefore also not measuring g.
Then there are portions of the test that do measure g. If white folks and Black folks have the same genetic contribution to g at birth, but Black persons experience greater exposure to toxic levels of lead, then Black persons will be disproportionately prevented from reaching their genetic potential g. If 50 points of an average, unpoisoned person’s IQ test is, on average, attributable to g but only 40 points of an average, poisoned person’s IQ test is attributable to g, then 50% (50/100) of the first person’s test is measuring g, but only 44.4% (40/90) of the second person’s test is measuring g. If lead poisoning is unevenly distributed by race (and it is) then the accuracy with which a test measures g depends, in part, on an underlying factor that correlates with race, and thus the accuracy of the test (if by accuracy we mean the extent to which it is measuring what it is intended to measure, g, and not other factors) differs between races.
This leads to 2nd order errors, in which even though g is in fact different, the extent of the difference and the appropriate measure of how much of that difference is genetic are both misestimated by the naive analysis of someone like Murray. Ironically, if Murray were consistent in a particular estimate of how much of a score is attributable to g and how much of g is genetic, the case of lead poisoning would lead to overestimating the g of Black test takers (50% of 90 is 45, but the scenario calls for holding other effects equal, save for the 10 point drop in g from lead poisoning, thus the underlying g measurement should be 40). This doesn’t help Black people, though, because by holding steady the genetic contribution at, say, 50%, Murray would estimate a genetic contribution of 22.5 IQ points to g. Instead, the genetic contribution would be the same 25 IQ points, but the environmental contribution to g’s IQ measurement is stunted at 15 points because of the effects of lead toxicity.
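For anyone who wants to check the numbers, here is the arithmetic of the lead scenario written out. The scores and point values are the ones used above; the variable names are mine, and the 50% figures are the "say, 50%" assumptions from the scenario rather than estimates of anything real.

```python
# Numbers from the lead-poisoning scenario above; variable names are illustrative.
unpoisoned_score, unpoisoned_g = 100, 50
poisoned_score, poisoned_g = 90, 40  # lead costs 10 points, all of them from g

# Share of each person's score that is actually measuring g.
share_unpoisoned = unpoisoned_g / unpoisoned_score  # 0.50
share_poisoned = poisoned_g / poisoned_score        # about 0.444

# Naive analysis: assume g is always 50% of the score and genes always 50% of g.
naive_g = 0.5 * poisoned_score   # 45, which overestimates the true 40
naive_genetic = 0.5 * naive_g    # 22.5

# What the scenario actually stipulates.
true_genetic = 0.5 * unpoisoned_g               # 25, the same for both people
true_environmental = poisoned_g - true_genetic  # 15, stunted by lead

print(f"g share of score: {share_unpoisoned:.1%} vs {share_poisoned:.1%}")
print(f"naive: g = {naive_g}, genetic = {naive_genetic}")
print(f"scenario: genetic = {true_genetic}, environmental = {true_environmental}")
```

The naive analysis gets g too high (45 instead of 40) and the genetic contribution too low (22.5 instead of 25) at the same time, which is the 2nd order error described above.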
The point here is not that g does not exist. Nor is the point that g has no genetic contribution.
The point is that by looking only at gaps between the mean performance of different groups – racial or otherwise – even a test that actually measures g may not measure g with the same accuracy for each group (and when it does measure it equally accurately, doesn’t necessarily measure the genetic component with equal accuracy), and thus may produce a gap that does not measure a gap in g, or, when it does, may produce a disparity that despite measuring a gap in g does not measure a gap in the genetic contribution to g.
Murray shows no understanding of this. Harris, too, shows no understanding of this. Reading your comment pointing to the Panda’s Thumb article, I’ve wondered if you understand this.
Do you understand it now, or is there something else I should clarify?
colnago80 says
Just to set the record straight, I am in no way, shape, form, or regard supporting Harris and especially Murray, who has no expertise or training to pontificate on this subject. I think that Harris made a major mistake in not being more confrontational with Murray in that interview. This is rather sad because, IMHO, Harris is a serious person and Murray is a phony.
I would point out something relative to Murray. His co-author of the Bell Curve book, Richard Herrnstein, wrote a number of essays on this subject, some of them in the racist journal “Mankind Quarterly”. My recollection from the now somewhat distant past is that Herrnstein appeared to agree with many of the premises of that disreputable publication. I have always taken the position that one who gets into bed with the pigs cannot complain if he/she emerges with a coating of mud.
As I pointed out in a comment on the Panda’s Thumb article, the issue of heritability of IQ starts out behind the 8 ball because many of the articles written before 1970 cite Cyril Burt’s work on the subject. It is now known that Burt’s “research” is very much dubious at best and an outright fraud at worst (it seems that ole Cyril may well have dry labbed his experiments).
https://en.wikipedia.org/wiki/Mankind_Quarterly
Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says
That’s cool, colnago80.
I honestly didn’t assume you were supporting Murray and/or Harris. Your comment was so short that I wondered if it were possible, but it was only a possibility I entertained, not something I really thought/believed.
Nonetheless, after reading your comment I found it useful not only to write what I did here, but also to write a new post that was oriented differently than this one. I hope I have not done either in a way that misrepresents you…
…and, please, let me know if I have! I’ll correct things as appropriate as quickly as I can.
ETA: Also I realize now that “do you get it now” was ungenerously phrased. I’ve revised it to what I think is a more neutral phrasing.
Pierce R. Butler says
I no longer comment on Panda’s Thumb due to an advanced case of Disqus-phobia, or I’d ask this there.
The authors of the “general intelligence” article there do not directly address the widely-circulated claim that if a given person takes IQ tests several times in a row, they will usually score higher each time due to learning the IQ-test-taking skills that those tests “really” measure.
Has anyone done a reasonably rigorous study to confirm/measure this effect?
Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says
@Pierce R. Butler:
Actually there has been quite a bit of research on learning effects – not just from taking the test multiple times, but also attempting to measure the effects of “brain training” websites, formal education, and similar factors.
In most cases any effect persisting after the training is over is temporary – at best, it’s a medium term effect on the order of weeks-to-a-few-months. More frequently it’s significantly less than a month. If you’re constantly engaging in relevant, problem-solving games or education, there’s not currently a reason to believe that the effect wouldn’t last as long as you continue to pursue that activity/education. The size of the effect appears to depend on the nature of the activity/education, but reaches a fairly hard cap, no matter how good the program is, at around half a standard deviation or just a bit more.
There’s a range of uncertainty that makes it possible that the effects peak at over half a standard deviation, but my best guess is that the effects of learning are actually lower than that for adults. Two very important confounds here are the effects of motivation and anxiety. (Stereotype threat is one example of an anxiety effect, but anxiety from any source can affect test performance.) Persons who have practiced similar problems to those presented on IQ tests probably have different motivation levels in real life (though not necessarily in the lab setting) and lower anxiety. The effect we see from practice or from educational programs or from “brain games” and similar things is probably a combination of learning effects (which, again, are temporary) + lower anxiety effects. There is a significant chance that motivation effects are also being measured. Thus while we can say that you can find a bit more than a half standard deviation in some cases, not all of that will be due to learning. I’m not sure whether you care much about the distinction, but your question assigned responsibility for practice effects
which is only partially true, so I thought I ought to explain.
Additionally, we know that the learning has to be tied to the specific types of problem solving that will be present on the IQ test, but it doesn’t have to be simply taking the test multiple times. Music lessons, for instance, can have some effect on spatial reasoning tests. IIRC the best results were for keyboard/piano lessons. One possible explanation for this is that not all instruments allow you to watch yourself play (violin and guitar, for example). Further, the results achieved for young children were much larger effects than for older children or adults in similar music classes. Also, as I said, the effect was limited to specific kinds of intelligence testing. Music lessons don’t increase scores across all problem-solving tests. This and other factors*1 lead researchers to be very confident that the learning does not actually increase general intelligence (“g”). So we are able to state not only that, you are correct, skill learning can increase IQ scores, but also that the portion of the IQ score represented by the increase is measuring something other than g.
Does that help?
=================================
*1: plus prior knowledge from other research that in humans learning/studying one skill does not make it easier to perform novel, unrelated tasks, nor does it necessarily make learning other skills any easier
Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says
Oh, I should have given you some links. There are many here, but perhaps the most relevant bit of research is that which shows that certain effects – probably motivation effects – can actually account for all of the observed gains associated with practice effects or skill learning.
Here’s the study.
But just because the effect size (here labeled a “placebo” effect, though I think it is likely to be a motivation effect, which is itself a combination of arousal effects and attention effects) is sufficient to explain a 10 point IQ rise (2/3rds of one standard deviation), and just because a 10 point rise is about the most we ever see from educational programs, that doesn’t mean that skill learning has zero effect on IQ testing. The nature of the placebo study lends itself to particularly large motivation effects. It might be that motivation effects are much less with other educational programs, especially where the educational program lasts over weeks such that the arousal effect aspect of motivational effects might be lessened. (Whereas in the placebo study, the methodology seemed almost designed to heighten arousal effects)
Back to making lunch.
Pierce R. Butler says
Crip Dyke… @ #s 7 & 8: Does that help?
Considerably (thanks!), though it also adds to some of the inevitable confusion (f’rinstance, how the word “learn” blurs short(er)-term and long(er)-term effects).
… skill learning can increase IQ scores, but also that the portion of the IQ score represented by the increase is measuring something other than g.
Which makes “g” even fuzzier than before. (As a long-time science fiction reader, I always have to correct my initial reading of that as denoting gravity, leaving me even less grounded within an already-floaty conceptual space.) Just how do we distinguish general-intelligence g from ability – and motivation! – to learn things (short-term or otherwise)?
I basically took for granted that the IQ-test training programs in business are mental snake oil because USAians daily buy enough snake oil to flood the country.
Yr footnote *1 apparently explicitly contradicts the Kane/Willoughby assertion that
I take no position either way on the one/many types of intelligence debate, other than noting how relatively little seems settled in these questions. Human science does worse at the study of humans than at any other subject.
Pls consider revising your link formatting so that it differentiates somehow from the general text of your comments.
Back to doing the work I should’ve dug into hours ago…
Crip Dyke, Right Reverend Feminist FuckToy of Death & Her Handmaiden says
yeah, I need to do that. Thanks.
As for the contradiction with the Kane article, it’s easily possible for a less than 100% correlation (as we see between different tests of cognition or problem-solving skills) to be present even while things that cause a fluctuation of 5-10% on one of those tests do not impact results on other tests. It’s only if we had a correlation of 0.97+ that we would find a flat-out contradiction of results that show movement on one test of 5%-10% without showing any movement on the other(s). The correlations we have are significantly lower than that (I think Kane reported 0.7, but I didn’t follow the link to the original research on that).
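If it helps, here is a rough simulation of that point. It is a sketch on made-up data, not a reanalysis of anything in the Kane article: the sample size, the 7.5-point bump, and the way scores are generated (a shared factor plus test-specific noise, tuned so two tests correlate at about 0.7) are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


rng = random.Random(0)    # fixed seed so the run is repeatable
N = 20_000
SHARED_SD = sqrt(157.5)   # shared factor, sized so the tests correlate at about 0.7
UNIQUE_SD = sqrt(67.5)    # test-specific noise; each test has a total SD of 15

shared = [rng.gauss(0, SHARED_SD) for _ in range(N)]
test_a = [100 + s + rng.gauss(0, UNIQUE_SD) for s in shared]
test_b = [100 + s + rng.gauss(0, UNIQUE_SD) for s in shared]

# Half the subjects get a 7.5-point practice bump on test A only; test B is untouched.
boosted_a = [a + 7.5 if i % 2 == 0 else a for i, a in enumerate(test_a)]

r_before = pearson(test_a, test_b)
r_after = pearson(boosted_a, test_b)
print(f"correlation before bump: {r_before:.3f}")
print(f"correlation after bump:  {r_after:.3f}")
```

In this setup, moving half the sample by 7.5 points on one test while leaving the other test alone lowers the correlation only slightly, so a correlation of around 0.7 between tests sits comfortably alongside an effect that shows up on just one of them.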