I thought I would use the recent resurgence of interest in the issue of intelligence and race to highlight some lesser-known and more technical aspects of this contentious debate.
While everyone has some intuitive sense of what intelligence consists of, those intuitions vary widely from person to person because the concept is so amorphous. Is it verbal fluency? Numerical adeptness? Critical thinking? Logical skill? Depending on one’s preferences, one can come up with many different ways of defining intelligence and testing for it. When it comes to quantifying intelligence and trying to measure it (assuming it can be reduced to a single measure at all, itself a highly problematic thesis), one must recognize that any measure is always a proxy for the quantity being sought, and the real issue is how good a proxy it is.
Any test measures something. The question is what that something is and how we interpret the result. Charles Spearman asserted in the early 1900s that, while individual tests may be context-dependent, if we administer a large number of tests of different kinds, we can statistically extract from them a single number g (now referred to as “Spearman’s g”). This number, he argued, would be context-independent and so would provide a meaningful, unitary measure of “general intelligence.” Thus, just as length and time can be measured, people could be ranked along a single linear scale of intelligence. Similarly, the quality of various I.Q. tests could be evaluated by how much “g-loading” they have: tests that correlate strongly with g are deemed “better” at evaluating a person’s intelligence than those that do not.
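To make the idea concrete, here is a minimal sketch (not a reconstruction of Spearman’s own factor analysis) that simulates correlated subtest scores and uses the first principal component of their correlation matrix as a stand-in for g. The number of subtests, the loadings, and the sample size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 1,000 test-takers on five hypothetical subtests. Each score is a
# shared "general" component plus test-specific noise, so all subtests end up
# positively correlated (the pattern Spearman started from).
n = 1000
general = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6, 0.7, 0.5])   # invented loadings
noise = rng.normal(size=(n, len(loadings)))
scores = general[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

# Extract the first principal component of the correlation matrix and use it
# as a stand-in for g.
corr = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
first = eigvecs[:, -1]
if first.sum() < 0:                              # undo an arbitrary sign flip
    first = -first
g_estimate = scores @ first

# "g-loading" of each subtest: its correlation with the extracted g.
g_loadings = [np.corrcoef(scores[:, i], g_estimate)[0, 1]
              for i in range(len(loadings))]
print("variance share of the first component:", round(eigvals[-1] / eigvals.sum(), 2))
print("estimated g-loadings:", np.round(g_loadings, 2))
```

Note that this extraction always succeeds mechanically whenever the tests are positively correlated; getting a number out does not by itself settle whether the number corresponds to a single underlying property.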
The problem with such a definition of intelligence lies in the assumption that because you can calculate a single number g for an individual, that number necessarily measures a real, existing property (the problem known as reification, where we assign an objective reality to the result of a measurement). In the 1930s, critics such as L. L. Thurstone argued that there could be many different kinds of cognitive abilities and that the same array of I.Q. test results could equally well be analyzed so that they clustered around many different centers, with each center measuring a different cognitive property. To take some overall average of these different measures (Spearman’s g), the critics argued, is to get a meaningless number. Proposals for the number of distinct facets of intelligence that can be measured run to more than 200 – a huge difference from the single measures we use for length or time.
To draw an analogy to grading, we could calculate a set of grade-point averages (GPAs) for clusters of subjects, each dealing with a distinct area of knowledge: the physical sciences, the life sciences, the social sciences, the cognitive sciences, the fine arts, the humanities, athletics, and so forth. Students could score high in one area and low in another, and it could plausibly be argued that this is because the GPA of each cluster measures a different kind of ‘intelligence’. Graduate departments of physics, conservatories of music, medical schools, drama schools, political think tanks, and so on might find one or more of these cluster GPAs a more meaningful measure of the abilities they look for than the overall GPA. The debate over what constitutes intelligence is similar, and there is no consensus on which approach is better. But Spearman’s g created a conviction that lives on in the minds of many: that I.Q. test scores are valid measures of a single, real human property.
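To make the grading analogy concrete, here is a toy calculation (the subject clusters and grades are invented): two students with mirror-image cluster profiles end up with the identical overall GPA, which is exactly the information the single number throws away.

```python
# Two hypothetical students with mirror-image cluster profiles; the subject
# clusters and grades are made up purely for illustration.
grades = {
    "student A": {"physical sciences": [4.0, 3.9], "life sciences": [3.7, 3.8],
                  "humanities": [2.3, 2.5], "fine arts": [2.1, 2.3]},
    "student B": {"physical sciences": [2.3, 2.5], "life sciences": [2.1, 2.3],
                  "humanities": [4.0, 3.9], "fine arts": [3.7, 3.8]},
}

for student, by_cluster in grades.items():
    cluster_gpas = {c: sum(g) / len(g) for c, g in by_cluster.items()}
    all_grades = [g for gs in by_cluster.values() for g in gs]
    overall = sum(all_grades) / len(all_grades)     # both students: 3.08
    print(student, "overall GPA:", round(overall, 2),
          "| cluster GPAs:", {c: round(v, 2) for c, v in cluster_gpas.items()})
```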
But as I argued in a previous post, even if we concede that I.Q. test scores are good proxy measures of intelligence and measure something intrinsic to an individual that is largely due to their genes, and even if that something is heritable and immutable, it still does not follow that differences in the average scores of different groups are due to genes. Those differences could be entirely due to the environment.
When I.Q. scores are tabulated for a cohort (say, any large enough group, like a nation), the scores are normed to give an average of 100 and a standard deviation of 15. This has the advantage that your score immediately tells you your ranking within that cohort. But since the average is reset to 100 each time, what you lose in the process is any longitudinal information about how average scores have changed over time. The work of the intelligence researcher James Flynn plays an important role in this debate. When he looked at the I.Q. scores of nations over time, he found quite dramatic gains in average I.Q.: an increase of 18 points over the 54-year period from 1948 to 2002, or about 0.33 points per year, a very rapid rise indeed. If I.Q. scores were largely genetically based, such a rapid increase could not happen. Note that this increase is larger than the 15-point black-white gap that Charles Murray trumpeted as signifying black genetic inferiority. (Incidentally, the black-white gap is now 10 points, not the 15 points it was at the time The Bell Curve was published, a figure Murray still uses.)
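A short sketch of what norming does, and why it erases the very trend Flynn had to recover by other means. The cohort data below are simulated with an assumed upward drift of 0.33 points per year; they are not real test scores.

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_scores(raw):
    """Rescale one cohort's raw scores to mean 100, standard deviation 15."""
    return 100 + 15 * (raw - raw.mean()) / raw.std()

# Simulate cohorts whose underlying raw performance drifts upward by the
# equivalent of 0.33 I.Q. points per year, the rate Flynn reported for
# 1948-2002 (simulated data, not Flynn's).
cohorts = {year: rng.normal(loc=0.33 * (year - 1948), scale=15, size=5000)
           for year in (1948, 1975, 2002)}

for year, raw in cohorts.items():
    print(year,
          "raw mean:", round(raw.mean(), 1),                  # rises ~0 -> ~9 -> ~18
          "normed mean:", round(norm_scores(raw).mean(), 1))  # always 100
```

Within each cohort the normed scores still rank people correctly; it is only the comparison across cohorts that the re-norming hides.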
How did Flynn figure this out if scores are always normed for each cohort to have an average of 100? What Flynn did was to go back and take I.Q. tests from various times in the past and give them to people now. What he found was that the older the test was, the higher the scores of people now on those tests. In other words, people now seemed to have much higher I.Q. than people just 50 years ago, well beyond the reach of any genetic explanation.
What might be the causes of this increase in averages and the narrowing of the gap? Many things. Education has become more widespread, in that more people are going to school for longer, and the curriculum has become more advanced. Furthermore, life itself has become more complex, even at a technological level, requiring people to develop skills on a daily basis that their parents and grandparents did not need. Even popular culture has changed: TV shows and films now feature multiple interweaving plotlines that viewers must follow and draw inferences from, a far cry from the straightforward narratives of the past. All the skills required to navigate this world are reflected in I.Q. test performance. In other words, I.Q. tests measure what people can do, not what they are, and what they can do depends on what they are taught and what they need to do to live their lives, i.e., their environment.
AndrewD says
Hi Mano,
Mike the Mad Biologist has written many posts on this subject; one from 2012 is linked below.
https://mikethemadbiologist.com/2012/02/14/the-question-charles-murray-needs-to-answer-and-cant-why-are-massachusetts-whites-superior-to-alabamas/
Marcus Ranum says
It seems to me that ‘g’ is used to set up a circular argument. IQ tests measure ‘g’. What is ‘g’? It’s abstract intelligence. Therefore, IQ tests measure intelligence.
But not so fast: we still don’t know what ‘g’ is.
“In other words, people now seemed to have much higher I.Q. than people just 50 years ago, well beyond the reach of any genetic explanation.”
If you don’t already know them, I can teach you some test-taking tricks that will improve your overall scores by around 10%. These are techniques like running through the test and doing similar questions together (we seem not to have to “shift mental gears” between abstract problems of the same type), eliminating the worst option when guessing on things you don’t know, and quickly answering the questions you think you can do well on while you’re fresh. Clearly, IQ tests don’t just measure IQ if a test-taker can upload a few new test-taking algorithms and improve their performance that much, that fast. There’s also a perceptible practice effect in IQ tests: if I take a test twice in a row, I can shift my score by 2%-5% just by doing that. The racists claim there’s a 9% difference between some races, and simple changes in test-taking (let alone practice and study) can easily account for that.
I just had an idea that I don’t think I’m going to pursue (mostly because IQ tests are so stupid they make me want to throw things) -- what if someone measured IQ test performance over time in a group of people who practiced and studied IQ test-taking? We’d expect to see improvements across the board, some of them marked. Presumably that would mean that the subjects were becoming more Jewish.
Acolyte of Sagan says
That is possibly the best summary of the problem with I.Q. testing I’ve seen to date.
Mobius says
I can recall all those standardized tests we took as kids in grade school and high school. In many ways those tests were very similar to the IQ tests I have taken. I had become quite good at taking those tests and also scored well on the IQ tests.
Is there a correlation? Did those standardized tests teach me how to take that type of test and thus help me score high on IQ tests? Has the same happened with other people?
Reginald Selkirk says
Mobius #4: I tend to agree. I was very good at taking those standardized tests, having grown up in a state that required them regularly. I don’t think I have taken an actual IQ test more than once or twice (I recall seeing a score when I was in high school). MENSA accepts standardized tests such as the SATs in place of actual IQ scores.
deepak shetty says
Exactly. When I was in my teens, my older sibling was planning on taking the GRE (a vocabulary, math, and logic test needed to come to the USA, and similar to a lot of IQ tests), and his initial score was 1600 out of 2400 -- which was in line with my expectation of his intelligence, and I had a good laugh at his expense. Three months later, after he read the Barron’s guide and took a few practice tests, he was up to 2200 out of 2400. Having a low opinion of my sibling at the time, and having to choose between “my older sibling is really intelligent” and “these tests suck and don’t really measure intelligence,” I chose “these tests suck and don’t measure anything useful” and haven’t seen any reason to reconsider. Reading Isaac Asimov and his views on Mensa also shaped some of my views.
anat says
Recommended reading: ‘The End of Average’ by Todd Rose. Rose identifies two strands that became common in our culture with respect to the average. One is ‘average is representative and desirable’: systems designed for ‘the average user’ fail because the more traits you average over, the fewer people there are who meet the definition of average. The other strand is ‘average is a standard one must beat’. This produces a narrow competition based on measurements that are often irrelevant. If we give out prizes (or places in college or the workforce, etc.) based on who beats the average on one type of test, we are constantly selecting a narrow subset of the population, regardless of utility, let alone justice, because talent is ‘jagged’: success in one area does not necessarily correlate with success in other areas (unless the system uses one such area as a requirement).
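The first strand has a simple statistical core: the more independent traits you require someone to be near average on, the fewer people qualify. A rough simulation (the ±0.3 standard deviation band and the trait counts are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# 100,000 simulated people, each with several independent standardized traits.
# Count how many land within 0.3 standard deviations of the mean on *every*
# trait; the band and the trait counts are arbitrary illustrative choices.
n_people = 100_000
for n_traits in (1, 3, 5, 10):
    traits = rng.normal(size=(n_people, n_traits))
    near_average = np.all(np.abs(traits) < 0.3, axis=1).mean()
    print(f"{n_traits:2d} traits: {near_average:.2%} near average on all of them")
```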
Heidi Nemeth says
@4 My understanding is that IQ tests must, by definition, be administered by a trained professional in a one-on-one setting. Aptitude tests, which are given in group settings with proctors, are otherwise identical to IQ tests. Often they serve as proxies for IQ tests and will even be referred to as “IQ indicative” tests.
brutus says
I don’t much care about the history or technical descriptions of what intelligence might be. It’s the source of significant taboo and goes to the heart of identity, much like sexual identity, and is commonly used to attack people (or their mothers). There’s little doubt that some people possess greater or lesser intelligence, however it’s measured or observed. Big deal … It neither validates nor invalidates a person. The same is true of any number of other human features or abilities. Of course, that doesn’t stop anyone from discriminating on such bases. I know that I skew high on some spectra, low on others, and average on most. Again, big deal …