Great American Satan says
i know this is true of BMI, but what are some other examples?
Oggie: Mathom says
Perhaps the IQ test?
wzrd1 says
It’s all about merit, because we removed all demerits.
Which imparts job security for the folks at Retraction Watch.
silvrhalide says
The biggest problem with the social sciences is that their founder was a Victorian dude who did enough cocaine to stun a pony and wanted to fuck his daughter.
GIGO
https://www.vox.com/2015/8/27/9216383/irreproducibility-research
https://www.nytimes.com/2016/03/04/science/psychology-replication-reproducibility-project.html#:~:text=The%202015%20report%2C%20called%20the,retested%20by%20an%20independent%20team.
https://www.science.org/doi/10.1126/science.aac4716?amp;keytype=ref&siteid=sci&ijkey=1xgFoCnpLswpk
https://en.wikipedia.org/wiki/Reproducibility_Project
The second biggest problem is that the social science types would have better results if they tried to fit their theories to evidence/people rather than trying to manipulate the evidence/people to fit their theories.
And then there’s this: https://en.wikipedia.org/wiki/John_Money
https://www.jhunewsletter.com/article/2014/05/hopkins-hospital-a-history-of-sex-reassignment-76004/
Maybe the issue with Money’s patients wasn’t that they were transgender but that they were being sexually abused by an adult who was also supposed to be acting as their doctor.
BUT the fact that his two patients committed suicide (one actively, the other by overdose, aka slow suicide) was seized upon by the other intellectual lightweights in the field as “evidence” that transgendered people are all inherently sick and suicidal.
GIGO
Or maybe the field is especially attractive to the dimwitted and/or abusive, in the same way that the Boy Scouts are an attractive opportunity for pedophiles. Both groups are notorious for not weeding out the offenders but instead covering up the problem.
Marcus Ranum says
i know this is true of BMI, but what are some other examples?
MBTI, IQ, MMPI, DSM, the Authoritarian Index… The problem is that the social sciences want to categorize people based on non-measurable or self-reported criteria, which leaves them vulnerable to this critique. I feel it’s a fair critique, generally – there are too many indices and inventories that depend on subjective self-reporting, which is funny because they simultaneously recognize that subjective assessment does not translate into an objective diagnosis, which is the whole point of the exercise. Diagnosis by listicle.
BMI is a good one. Quetelet asked a few of his acquaintances who he thought were particularly well-built what they weighed, and that was the input into his ideal height/weight chart. Later, the chart was captured in a formula tweaked to yield the same results. And BMI was born. Quetelet did not know Bruce Lee, and appears to have taken his friends’ self-reported weight as gospel, because, you know, they were trustworthy people and nobody would ever lie about a few pounds.
jeanmeslier says
@Marcus Ranum #5
IQ is a glaring example; its validity is essentially derived from a circular definition of “intelligence”. Especially amusing since our beloved JayPee is a raging defender of it while moaning about “gender ideology” (gender identity, which has already outlived IQ by millennia).
robert79 says
When I saw the cartoon my first thought was: the 0.05 threshold for null hypothesis significance testing. Especially the last panel.
Pierre Le Fou says
@robert79 That was also my first thought. Ever since I was a student I have wondered: why 5%? Shouldn’t the threshold be adjusted for different experiments, based on how important the experiment is? Well, yes, they do that for some critical physics experiments, at least.
Jim Balter says
@7 @8 Obligatory XKCD’s:
https://xkcd.com/539/
https://xkcd.com/882/
https://xkcd.com/1132/
https://xkcd.com/1478/
https://xkcd.com/2533/
Eric says
Alcoholics Anonymous (AA). At the time, it was seen as the only treatment for alcoholism, it was “good enough” so that no one bothered to see if it actually worked, and now it’s the de-facto treatment option in the US; so much so that it’s oftentimes court-ordered.
This isn’t to say that alcoholism isn’t a problem, or that AA doesn’t work for some people, just that the scant information available indicates that it’s not terribly effective overall.
IMO, because it relies on an “alcoholism is a personal failure” mentality, it places the guilt & fault on the individual. We now look at alcoholism as a disease worthy of medical intervention and counseling (see: naltrexone). I find the comic PZ posted to be true of AA.
chrislawson says
robert79–
To be fair, the p<0.05 standard was always known to be arbitrary. This is how Fisher put it in his discipline-defining Statistical Methods for Research Workers (1925):
‘We shall not often be astray if we draw a conventional line at .05, and consider that higher values of χ2 indicate a real discrepancy.’
This was long before the modern era of big data, back when most studies were small and only tested one or two hypotheses. If Fisher were alive today, he would definitely advocate tougher p values when multiple hypotheses are tested. And good researchers know this. I’ve seen papers that set p<0.01 as their test for significance.
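To put rough numbers on why multiple hypotheses call for tougher thresholds, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is actually true, which real studies rarely guarantee. The function names are made up for illustration, and the Bonferroni correction shown is just one common (and conservative) adjustment, not something Fisher prescribed.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests independent
    tests when every null hypothesis is true (an idealized assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    # Compare one test, a handful, and twenty at the conventional 0.05 line.
    for m in (1, 5, 20):
        print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_threshold(0.05, m))
```

Under those assumptions, twenty tests at 0.05 give well over a coin-flip chance of at least one spurious "significant" result, which is why a single fixed line stops making sense once many hypotheses are tested.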
chrislawson says
(BTW, in case people are wondering, it does look from that quote like Fisher is equating the p value and the chi-square value. This is an artefact from keeping the quote as short as possible.)
Jim Balter says
@11
Go back and read the cartoon above.
chrislawson says
@13–
Umm, what part of me saying p<0.05 is an arbitrary level would be changed by any of those comics?
Jaws says
It’s both important and extremely discouraging to remember that the foundational work in the “original” social science — Smith’s The Wealth of Nations — was written as a work of moral philosophy, arguing both morally and practically (in this sense, somewhat anticipating Kant, Russell, Schopenhauer, and Heidegger) against abuses of power by the politicoeconomic elites of his time. And from that we end up with von Mises and the Austrian and Chicago Schools in orthodox economics, which do the exact opposite, usually by twisting data in ways that make p-hacking look honest…
StevoR says
@ ^ Jaws : Moral philosophy – so ethics? Kinda?
chrislawson says
@15–
Objection! Austrian school economists don’t need to p-hack. They don’t even need data.
simplicio says
Yes. And Galton had the highest IQ ever! How else could he have come up with such definitive techniques for determining IQ? Some people are simply so talented that they know what the results should be, and for people who don’t measure up to what standard is set, that’s their problem. If you’re white, you’ve met the standard.
A good example of knowing what’s best is Noam Chomsky’s determination of what is required for a human language, proving definitively that the Pirahã must be subhuman. Oliver Wendell Holmes realized that stupidity is congenital when he ruled that three generations of people who are uneducated are enough, and that Carrie Buck should be neutered. [Yes, the construction of that sentence was intentional.]
captainblack says
@robert79 #7
All of classical hypothesis testing and confidence intervals belong in the dustbin of statistical history. Bayesian methods make their assumptions explicit (or at least more explicit), and give what the customer actually wants. That is, the probability of something actually being the case, conditioned on explicit assumptions and prior knowledge.
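As a rough illustration of "the probability of something actually being the case", here is a minimal sketch of Bayes' theorem for a single yes/no hypothesis. The function name and the example numbers (a 10% prior, an 80% chance of seeing the data if the hypothesis is true, 5% if it is false) are assumptions chosen for illustration, not values from any study.

```python
def posterior_probability(prior: float,
                          p_data_if_true: float,
                          p_data_if_false: float) -> float:
    """P(hypothesis | data) via Bayes' theorem for a binary hypothesis.

    prior            -- P(hypothesis) before seeing the data (explicit assumption)
    p_data_if_true   -- P(data | hypothesis true)
    p_data_if_false  -- P(data | hypothesis false)
    """
    evidence = p_data_if_true * prior + p_data_if_false * (1.0 - prior)
    return p_data_if_true * prior / evidence


if __name__ == "__main__":
    # Illustrative numbers only: a sceptical prior versus an even one.
    print(posterior_probability(0.1, 0.8, 0.05))
    print(posterior_probability(0.5, 0.8, 0.05))
```

The prior is an input you have to state up front, which is the "explicit assumptions" part: change it and the answer changes with it.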
jo1storm says
@17
Socratic method is enough! Humans are rational 100% of time, act rational 100% of time and have perfect information 100% of time! So there! /s
Ray Ceeya says
If you ever wondered why water freezes at 32F, THIS!
jimf says
The problem with BMI is that people are using it incorrectly. It’s a measure for populations, not individuals. Sure, you can find people who have a high BMI who are perfectly healthy, perhaps because they’re body builders with a tremendous amount of muscle mass. You can also find people with a “healthy” BMI who are not healthy because they have very little muscle mass (e.g., the elderly). But those are exceptions in a given nation, not the norm. If you don’t fall into a special category like that and you have a high BMI, sorry, but that’s associated with health risks. In short, if you happen to be employed in the health sciences and you’re looking at populations, BMI can be useful. For an individual, it doesn’t tell you everything, but don’t ignore your BMI if you honestly don’t fall into one of the extreme categories.
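For reference, the index is just weight in kilograms divided by the square of height in metres, which is why it cannot tell muscle from fat. A minimal sketch, with a made-up function name and made-up example people:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


if __name__ == "__main__":
    # Two hypothetical people of the same height; the formula only sees mass,
    # so a heavily muscled 90 kg and a sedentary 90 kg get the same number.
    print(round(bmi(70, 1.75), 1))
    print(round(bmi(90, 1.75), 1))
```

Only two inputs go in, so everything else about a person (body composition, age, frame) is invisible to it, which fits the point that it works better averaged over a population than applied to one individual.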
jimf says
I should add that another fairly useless metric is GPA. The idea that a four-year college experience can be boiled down to a two-digit number is laughable given all of the variables. I think that you can say in general terms that someone with a 3.7 will likely be a bit better than someone with a 2.2 (same degree, same major), but that’s only true if they have similar backgrounds and support systems. And it also depends on the kind of job they will be doing along with their personality. Anyone who asserts that person A with a 3.2 must be “better” than person B with a 2.9 is not being realistic. I used to tell my students that if an interviewing company only wanted to talk to grads with a 3.5 or higher (or whatever level), it was probably not a company they’d want to work for.
birgerjohansson says
Jimf @ 23
You mean reality has more than one dimension? But that makes things so cooomplicated…(sark).
chrislawson says
@24–
Be careful. You start innocently pointing out that reality is more than one-dimensional and pretty soon you’re advocating string theory.
wzrd1 says
I’m still trying to wrap my head around BMI calculations based upon body builders, who are a decided minority throughout the world and well, within any nation thereof.
Given that BMI is an average of an entire population.
The index, an epidemiological model of healthy vs unhealthy persons, based upon that metric.
Which ends up, I need to lose 10 pounds. Then, hopefully, I’ll continue to evade type 2 diabetes, which is otherwise 100% in my peer group within my family.
And I remain non-diabetic.
Could just still be an outlier.
So, less cookies at night, forget about a cake.
And get off of my fat ass and walk to the store more for more greens, which I love anyway.
And maybe cut back on two servings of lard… ‘)
DanDare says
Economic measures like GDP, and the growth model.