Sometimes, Bugs are Inevitable

Good point:

“Hacking an election is hard, not because of technology — that’s surprisingly easy — but it’s hard to know what’s going to be effective,” said [Bruce] Schneier. “If you look at the last few elections, 2000 was decided in Florida, 2004 in Ohio, the most recent election in a couple counties in Michigan and Pennsylvania, so deciding exactly where to hack is really hard to know.”

But the system’s decentralization is also a vulnerability. There is no strong central government oversight of the election process or the acquisition of voting hardware or software. Likewise, voter registration, maintenance of voter rolls, and vote counting lack any effective national oversight. There is no single authority with the responsibility for safeguarding elections.

You run into this all the time when designing systems. One or more of the requirements pose a dilemma, pitting one need against another: ease-of-use vs. security, authentication vs. anonymity, you know the type. Fixing a bug related to that requirement may cause three more to pop up, and that may not be your fault. The US election system is tough to hack, because it’s a patchwork of incompatible systems; but it’s also easy to hack, because some patches are less secure than others and the borders between patches lack a clear, consistent interface. Solving this sort of problem usually means trashing the system and starting from scratch, after a long and extensive consultation process.

Oh yeah, and an NSA report provides evidence that Russia hacked some distance into US voting systems. The Intercept also outed their source: the reporters somehow forgot that many colour printers embed a unique steganographic tracking code in everything they print. That doesn’t speak highly of them; the practice is decades old, and they should have known about it, given that The Intercept was founded on sharing sensitive documents.

[HJH 2017-06-19: A minor update here.]

Russian Hacking and Bayes’ Theorem, Part 2

I think I did a good job of laying out the core hypotheses last time, save two: the Iranian government or a disgruntled Democrat did it. I think I can pick those up on the fly, so let’s skip ahead to step 2.

The Priors

What are the prior odds of the Kremlin hacking into the DNC and associated groups or people?
I’d say they’re pretty high. Going right back to the Bolshevik revolution, Russian spy agencies have taken an interest in running disinformation campaigns. They even have a word for gathering compromising information to blackmail people into doing their bidding: “kompromat.” Putin himself earned a favourable place in Boris Yeltsin’s government by deploying kompromat against one of Yeltsin’s opponents.
As for hacking elections, European intelligence agencies have also fingered Russia for using kompromat to interfere with elections in Germany, the Netherlands, Hungary, Georgia, and Ukraine.
That’s all well and good, but what about other actors? China also has sophisticated information warfare capabilities, but seems more interested in trade secrets and tends to keep its discoveries under wraps. North Korea is a lot more splashy, but has recently focused on financial crimes. The Iranian government has apparently stepped up its online attack capabilities and holds a grudge against the USA, but seems to focus on infrastructure and disruption.
The DNC convention was rather contentious; some fans of Bernie Sanders were bitter at how it turned out, and a few would have preferred putting Trump in power to voting for Clinton. But a disgruntled insider doesn’t fit the timeline: the DNC suspected an attack in April and documents were leaked in June, yet Sanders still had a chance of winning the nomination until the end of July.
An independent group is the real wild card: it could have any number of motivations and, given its lack of power, every reason to make it look like someone else did the deed.
What about the CIA or NSA? The latter claims to be just a passive listener, and I haven’t heard of anyone claiming otherwise. The CIA has a long history of interfering in other countries’ elections; during Nicaragua’s 1990 election, it even released documents to the media in order to smear a candidate it didn’t like. It’s one thing to muck around with other countries, however, as it’s nearly impossible for them to extradite you for a proper trial. Muck around in your own country’s election, and there’s no shortage of reporters and prosecutors willing to go after you.
Where does all this get us? I’d say to a tiered set of prior likelihoods:
  • “The Kremlin did it” (A) and “Independent hackers did it” (D) have about the same prior.
  • “China,” (B) “North Korea,” (C) “Iran,” (H) and “the CIA” (E) are less likely than the prior two.
  • “the NSA” (F) and “disgruntled insider” (I) are less likely still.
  • And c’mon, I’m not nearly good enough to pull this off. (G)

The Evidence

I haven’t put numbers on the priors, because the evidence side of things is pretty damning. Let’s take a specific example: the Cyrillic character set found in some of the leaked documents. We can both agree that this can be faked: switch around the keyboard layout, plant a few false names, and you’re done. Do it flawlessly and no-one will know otherwise.
But here’s the kicker: is there another hypothesis which is more likely than “the Kremlin did it,” given this bit of evidence? To focus on a specific case, is it more likely that an independent hacking group, rather than Russian hackers, would leave Cyrillic characters and error messages in those documents? This seems silly; an independent group could leave a false trail pointing to anyone, which dilutes the odds of them pointing the finger at any specific someone. Even if the independent group had a bias towards putting the blame on Russia, there’s still a chance they could finger someone else.
Put another way, a die numbered one through six could turn up a one when thrown, but a die with only ones on each face would be more likely to turn up a one. A one is always more likely from the second die. By the same token, even though it’s entirely plausible that an independent hacking group would switch their character sets, the evidence still provides better proof of Russian hacking.
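To make that concrete, here’s a minimal sketch in Python of the odds form of Bayes’ theorem applied to the die analogy; every number in it is purely illustrative, not an estimate for the actual hypotheses.

# Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio.
# All values below are illustrative assumptions for the die analogy only.
prior_odds = 1.0                  # treat "fair die" and "all-ones die" as equally likely up front
p_one_given_loaded = 1.0          # an all-ones die always shows a one
p_one_given_fair = 1.0 / 6.0      # a fair die shows a one a sixth of the time
likelihood_ratio = p_one_given_loaded / p_one_given_fair   # = 6
posterior_odds = prior_odds * likelihood_ratio
print(posterior_odds)             # 6.0: seeing a one shifts belief toward the all-ones die,
                                  # even though a fair die could have produced it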
What does evidence that points away from the Kremlin look like?

President Vladimir Putin says the Russian state has never been involved in hacking.

Speaking at a meeting with senior editors of leading international news agencies Thursday, Putin said that some individual “patriotic” hackers could mount some attacks amid the current cold spell in Russia’s relations with the West.
But he categorically insisted that “we don’t engage in that at the state level.”

Is this great evidence? Hell no: it’s entirely possible Putin is lying, and given the history of the KGB and FSB, it’s probable. But all that does is blunt the magnitude of the likelihoods; it doesn’t change their direction. By the same token, this…
Intelligence agency leaders repeated their determination Thursday that only “the senior most officials” in Russia could have authorized recent hacks into Democratic National Committee and Clinton officials’ emails during the presidential election.
Director of National Intelligence James Clapper affirmed an Oct. 7 joint statement from 17 intelligence agencies that the Russian government directed the election interference…
… counts as evidence in favour of the Kremlin being the culprit, even if you think James Clapper is a dirty rotten liar. Again, we can quibble over how much it shifts the balance, but no other hypothesis is more favoured by it.
We can carry on like this through a lot of the other evidence.
I can’t find anyone who’s suggested North Korea or the NSA did it. The consensus seems to point towards the Kremlin, and while there are scattered bits of evidence pointing elsewhere, there isn’t much credibility or analysis attached to it, and some of it amounts to “anyone but Russia” rather than “group X did it,” which softens the gains made by any other specific hypothesis.
The net result is that the already-strong prior for “the Kremlin did it” combines with the direction the evidence points, favouring that hypothesis even more. How strongly it favours that hypothesis depends on how you weight the evidence, but you have to do some wild contortions to put another hypothesis ahead of it. A qualitative analysis is all we need.
Now, to some people this isn’t good enough. I’ve got two objections to deal with, one from Sam Biddle over at The Intercept, and another from Marcus Ranum at stderr. Part three, anyone?

The Monkey’s Climate Agreement

This must have seemed like an excellent idea to Trump.

I am fighting every day for the great people of this country. Therefore, in order to fulfill my solemn duty to protect America and its citizens, the United States will withdraw from the Paris Climate Accord.

(APPLAUSE)

Thank you. Thank you.

But begin negotiations to re-enter, either the Paris Accord or in, really entirely new transaction on terms that are fair to the United States, its businesses, its workers, its people, its taxpayers. So we’re getting out. But will we start to negotiate and we will see if we can make a deal that’s fair. And if we can, that’s great. And if we can’t, that’s fine.

It distracts from the ongoing Russia scandal, and it’s a move which will earn favour from many Republicans. But there’s also good reason to think it won’t have the effect Trump hopes for.

For one, the USA has been very successful at watering down past climate agreements.

When aggressively lobbying to weaken the Paris accord, U.S. negotiators usually argued that anything stronger would be blocked by the Republican-controlled House and Senate. And that was probably true. But some of the weakening — particularly those measures focused on equity between rich and poor nations — was pursued mainly out of habit, because looking after U.S. corporate interests is what the United States does in international negotiations.

Whatever the reasons, the end result was an agreement that has a decent temperature target, and an excruciatingly weak and half-assed plan for reaching it.

If the US withdraws from climate talks, as seems likely despite Trump’s “renegotiation” line, the US delegation won’t be at the table. And with China now in full support of taking action, India pushing for aggressive targets, and even Canada still willing to stick with the Paris agreement, there’s no one left to step on the brakes. Future climate change agreements will be more aggressive.

They might also carry penalties for non-signing nations. Only two countries stayed out of the Paris agreement entirely: Nicaragua, because the agreement didn’t go far enough, and Syria, which was so diplomatically isolated it wasn’t even invited to the table. The US signed on, but never even submitted the agreement to the Senate for ratification. Yes, the US is a major player in world financial markets, but it’s dwarfed by the output of the rest of the world. If the globe agreed to impose a carbon tax on non-signing nations, the US could do little to push back.

Even if the rest of the world doesn’t have the appetite for that route, there are more creative kinds of penalties.

Calling the President’s decision “a mistake” for the US as well as the planet, [French President] Macron urged climate change scientists, engineers and entrepreneurs to go to France to continue their work. “They will find in France a second homeland,” Mr Macron said. “I call on them,” he added. “Come and work here with us, work together on concrete solutions for our climate, our environment. I can assure you, France will not give up the fight.”

Climate change has become the one thing the international community could reach a consensus on. Pulling out of the Paris agreement was like kicking a puppy: regardless of the intent or circumstances, it’s an action the world can unite against. It makes for a convenient excuse to isolate the US or play hardball, much more so than any boorish behaviour by Trump.

It also won’t stop the US from following the Paris agreement anyway.

Representatives of American cities, states and companies are preparing to submit a plan to the United Nations pledging to meet the United States’ greenhouse gas emissions targets under the Paris climate accord, despite President Trump’s decision to withdraw from the agreement.

The unnamed group — which, so far, includes 30 mayors, three governors, more than 80 university presidents and more than 100 businesses — is negotiating with the United Nations to have its submission accepted alongside contributions to the Paris climate deal by other nations.

“We’re going to do everything America would have done if it had stayed committed,” Michael Bloomberg, the former New York City mayor who is coordinating the effort, said in an interview. […]

“The electric jolt of the last 48 hours is accelerating this process that was already underway,” said Mr. Orr, who is now dean of the School of Public Policy at the University of Maryland. “It’s not just the volume of actors that is increasing, it’s that they are starting to coordinate in a much more integral way.”

Various US states, municipalities, universities, businesses, and even the military have been working towards cutting emissions for years without waiting for the federal government to get its act in order. A national policy would be more effective, but these piecemeal efforts have substantial force behind them and look to be gaining even more.

Finally, the boost this move earns from his supporters may get cancelled out by backlash from everyone else.

It’s also possible that Trump gave a win to his base on an issue they don’t care that much about while angering the opposition on an issue they do care about. Gallup and Pew Research Center polls indicate that global warming and fighting climate change have become higher priorities for Democrats over the past year. … As we wrote earlier, if Trump’s voters view the Paris withdrawal as an economic move, he’ll likely reap some political benefit from it. If, however, it’s viewed as mostly having to do with climate change, perhaps Trump won’t see much gain with his base. Jobs, the economy and health care rate as top issues for Republicans, but climate change and the environment do not, so it’s hard to know how Trump voters would weigh the president doing something they don’t like on an issue they care a lot about (the GOP health care bill) against him doing something they do like on an issue they don’t care much about (withdrawing from Paris).

This may have looked like an easy win for Trump, but the reality could be anything from a weak victory to a solid defeat. Time will tell, as it always does.

Russian Hacking and Bayes’ Theorem, Part 1

I’m a bit of an oddity on this network, as I’m pretty convinced Russia was behind the DNC email hack. I know both Mano Singham and Marcus Ranum suspect someone else is responsible, last I checked, and Myers might lean that way too. Looking around, though, I don’t think anyone’s made the case in favour of Russian hacking. I might as well use it as an excuse to walk everyone through using Bayes’ Theorem in an informal setting.

(Spoiler alert: it’s the exact same method we’d use in a formal setting, but with more approximations and qualitative responses.)

[Read more…]

Journal Club 1: Gender Studies

Last time, I pointed out that within the Boghossian/Lindsay kerfuffle no-one was explaining how you could falsify gender studies. As I’ve read more and more of those criticisms, I’ve noticed another omission: what the heck is in a gender studies journal? The original paper only makes sense if it closely mirrors what you’d find in a relevant journal.

So let’s abuse my academic access to pop open the cover of one such journal.

Gender & Society, the official journal of Sociologists for Women in Society, is a top-ranked journal in sociology and women’s studies and publishes less than 10% of all papers submitted to it. Articles analyze gender and gendered processes in interactions, organizations, societies, and global and transnational spaces. The journal publishes empirical articles, along with reviews of books.

They also happened to be at the top of one list of gender studies journals. I’ll go with their latest issue as of this writing, volume 31, issue 3, dated June 2017.

[Read more…]

About Damn Time

Ask me to name the graph that annoys me the most, and I’ll point to this one.

538’s graph of Trump’s popularity, as of May 25th, 2017.

Yes, Trump entered his presidency as the least-liked president in modern history, but he’s repeatedly interfered with Russia-related investigations and admitted he did it to save his own butt. That’s a Watergate-level scandal, yet his approval numbers have barely changed. He’s also pushed a much-hated healthcare reform bill, been defeated multiple times in court, tried to inch away from his wall pledge, and in general repeatedly angered his base. His approval ratings should have cratered by now, but because the US is so polarized, many conservatives are clinging to him anyway.

A widely held tenet of the current conventional wisdom is that while President Trump might not be popular overall, he has a high floor on his support. Trump’s sizable and enthusiastic base — perhaps 35 to 40 percent of the country — won’t abandon him any time soon, the theory goes, and they don’t necessarily care about some of the controversies that the “mainstream media” treats as game-changing developments. […]

But the theory isn’t supported by the evidence. To the contrary, Trump’s base seems to be eroding. There’s been a considerable decline in the number of Americans who strongly approve of Trump, from a peak of around 30 percent in February to just 21 or 22 percent of the electorate now. (The decline in Trump’s strong approval ratings is larger than the overall decline in his approval ratings, in fact.) Far from having unconditional love from his base, Trump has already lost almost a third of his strong support. And voters who strongly disapprove of Trump outnumber those who strongly approve of him by about a 2-to-1 ratio, which could presage an “enthusiasm gap” that works against Trump at the midterms. The data suggests, in particular, that the GOP’s initial attempt (and failure) in March to pass its unpopular health care bill may have cost Trump with his core supporters.

At long last, Donald Trump’s base appears to be shrinking. This raises the chances of impeachment, and will put tremendous pressure on Republicans to abandon Trump to preserve their midterm majority. I’m pissed the cause appears to be health care, and not the shady Russian ties or bad behavior, but doing the right thing for the wrong reason is still doing the right thing. It also fits in nicely with current events.

According to the forecast released Wednesday by the nonpartisan Congressional Budget Office, 14 million fewer people would have health insurance next year under the Republican bill, increasing to a total of 19 million in 2020. By 2026, a total of 51 million people would be uninsured, roughly 28 million more than under Obamacare. That is roughly equivalent to the loss in coverage under the first version of the bill, which failed to pass the House of Representatives.

Much of the loss in coverage would be due to the Republican plan to shrink the eligibility for Medicaid; for many others—particularly those with preexisting conditions living in certain states—healthcare on the open marketplace would become unaffordable. Some of the loss would be due to individuals choosing not to get coverage.

The Republican bill, dubbed the American Health Care Act, would also raise insurance premiums by an average of 20 percent in 2018 compared with Obamacare, according to the CBO, and an additional 5 percent in 2019, before premiums start to drop.

So keep an eye on Montana’s special election (I’m writing this before results have come in); if the pattern repeats from previous special elections, Republicans will face a huge loss during the 2018 midterms, robbing Trump of much of his power and allowing the various investigations against him to pick up more steam.

Finding Her Voice

Have you ever heard of a cool scientific paper, gone out to find yourself a copy, and been frustrated to find no trace of it? I’d been there for years with one particular paper, until I got lucky.

Cutler, Anne, and Donia R. Scott. “Speaker Sex and Perceived Apportionment of Talk.” Applied Psycholinguistics 11, no. 03 (1990): 253–272.
I’m sure you’ve heard the stereotype that women talk excessively. A number of studies have actually sat down and counted total talking time, only to find that men tend to be the blabbermouths. What gives?
An alternative suggestion is more complex and may rely on a difference in content between men’s and women’s speech. Kramer (1975) and Spender (1980) suggested that women are undervalued in society, and as a consequence women’s speech is undervalued – female contributions to conversation are overestimated because they are held to have gone on “too long” relative to what female speakers are held to deserve. Preisler (1986) similarly argued that evaluation of women’s speech is a function of (under)evaluation of the social roles most usually fulfilled by women.
The former explanation suggests that overestimation of women’s conversational contributions is a perceptual bias effect that should be reproducible in the laboratory simply by asking listeners to judge amount of talk produced by male and female speakers, even if content of the talk is controlled. [pg. 255]
So Anne Cutler and Donia Scott tested that by having the standard reference human listen to excerpts from plays in which both speaking roles said about the same number of words. The sex of the speakers was varied, of course.
In single-sex conversations, female and male first speakers received almost identical ratings (49.5% and 50%, respectively), but in mixed-sex conversations, female speakers were judged to be talking more (55.2%), male speakers to be talking less (47.8%). Although the number of words spoken was identical for each column, listeners believed that in mixed-sex conversations, females spoke more and males spoke less.

In fact, three of these mean ratings are actually underestimates, since the true mean first speaker contribution across all four dialogues was 53.7%. ….

The interaction of speaker sex with whether the dialogue was mixed- or single-sex was significant in both analyses … There was also a main effect of speaker sex, with female speakers’ contributions being overestimated, but male speakers’ contributions being underestimated relative to the actual number of words spoken. [pg. 259-260]
What’s interesting is that when people were asked to guess the sex of each role, handed nothing more than the script, men and women sometimes differed.
When a part was not particularly sex-marked (Dialogue 1), females speaking it were judged to have said more than males speaking it. When a part was marked as female for male and for female subjects alike (Dialogue 2), the same effect was found. When, however, a part was marked as female for male subjects only (Dialogue 3), only male subjects showed the effect; and when a part was marked as female for female subjects only (Dialogue 4), only female subjects showed any effect. [pg. 268]
Unfortunately, this muddied up the conclusions a bit. And I do have other issues with the paper, primarily in their use of p-values, but I think the findings rise above it. They also fit nicely into the existing body of work on sexism and speech.
These behaviors, the interrupting and the over-talking, also happen as the result of difference in status, but gender rules. For example, male doctors invariably interrupt patients when they speak, especially female patients but patients rarely interrupt doctors in return. Unless the doctor is a woman. When that is the case, she interrupts far less and is herself interrupted more. This is also true of senior managers in the workplace. Male bosses are not frequently talked over or stopped by those working for them, especially if they are women; however, female bosses are routinely interrupted by their male subordinates.

What can we do to raise women’s voices? Maybe technology can help.

Gender Timer is the app that measures the talk times between the sexes. It is used to raise awareness and generate discussion about how airtime looks in practice. The aim is to ultimately develop your organization and its meeting culture.

Available on Android and iPhone.

Daryl Bem and the Replication Crisis

I’m disappointed I don’t see more recognition of this.

If one had to choose a single moment that set off the “replication crisis” in psychology—an event that nudged the discipline into its present and anarchic state, where even textbook findings have been cast in doubt—this might be it: the publication, in early 2011, of Daryl Bem’s experiments on second sight.

I’ve actually done a long blog post series on the topic, but in brief: Daryl Bem was convinced that precognition existed. To put these beliefs to the test, he had subjects try to predict an image that was randomly generated by a computer. Over eight experiments, he found that they could indeed do better than chance. You might think that Bem is a kook, and you’d be right.

But Bem is also a scientist.

Now he would return to JPSP [the Journal of Personality and Social Psychology] with the most amazing research he’d ever done—that anyone had ever done, perhaps. It would be the capstone to what had already been a historic 50-year career.

Having served for a time as an associate editor of JPSP, Bem knew his methods would be up to snuff. With about 100 subjects in each experiment, his sample sizes were large. He’d used only the most conventional statistical analyses. He’d double- and triple-checked to make sure there were no glitches in the randomization of his stimuli. Even with all that extra care, Bem would not have dared to send in such a controversial finding had he not been able to replicate the results in his lab, and replicate them again, and then replicate them five more times. His finished paper lists nine separate ministudies of ESP. Eight of those returned the same effect.

One way to attack an argument is to merely follow its logic. If you can find it leads to an absurd conclusion, the argument must have been flawed even if you cannot find the flaw. Bem had inadvertently discovered a “reductio ad absurdum” argument against contemporary scientific practice: if proper scientific procedure can prove ESP exists, proper scientific procedure must be broken.

Meanwhile, at the conference in Berlin, [E.J.] Wagenmakers finally managed to get through Bem’s paper. “I was shocked,” he says. “The paper made it clear that just by doing things the regular way, you could find just about anything.”

On the train back to Amsterdam, Wagenmakers drafted a rebuttal, to be published in JPSP alongside the original research. The problems he saw in Bem’s paper were not particular to paranormal research. “Something is deeply wrong with the way experimental psychologists design their studies and report their statistical results,” Wagenmakers wrote. “We hope the Bem article will become a signpost for change, a writing on the wall: Psychologists must change the way they analyze their data.”

Slate has a long read up on the current replication crisis, and how it links to Bem. It’s aimed at a lay audience and highly readable; I recommend giving it a click.

So You Wanna Falsify Gender Studies?

How would a skeptic determine whether or not an area of study was legit? The obvious route would be to study up on the core premises of that field, recording citations as you go; map out how they are connected to one another and supported by the evidence, looking for weak spots; then write a series of articles sharing those findings.

What they wouldn’t do is generate a fake paper purporting to be from that field while deliberately mangling its terminology, submit it to a low-ranked and obscure journal for peer review, have it rejected from that journal, use the feedback to submit it to a second journal that was semi-shady and even more obscure, have it published there, and then parade that around as if it meant something.

Alas, it seems the Skeptic movement has no idea how basic skepticism works. Self-proclaimed “skeptics” Peter Boghossian and James Lindsay took the second route, and were cheered on by Michael Shermer, Richard Dawkins, Jerry Coyne, Steven Pinker, and other people calling themselves skeptics. A million other people have pointed and laughed at them, so I won’t bother joining in.

But no-one seems to have brought up the first route. Let’s do a sketch of actual skepticism, then, and see how well gender studies holds up.

What’s Claimed?

Right off the bat, we hit a problem: most researchers or advocates in gender studies do not have a consensus sex or gender model.

The Genderbread Person, version 3.3. From http://itspronouncedmetrosexual.com/2015/03/the-genderbread-person-v3/

This is one of the more popular explainers for gender floating around the web. Rather than focus on the details, however, I’d like you to note that this graphic is labeled “version 3.3”. In other words, Sam Killermann has tweaked and revised it three times over. It also conflicts with the Gender Unicorn, which takes a categorical approach to “biological sex,” adds “other genders,” and no longer embraces the idea of a spectrum, thus contradicting a lot of other models. Confront Killermann on this, and I bet they’d shrug their shoulders and start crafting another model.

The specific model isn’t all that important, though. Instead, gender studies has reached a consensus on an axiom and a corollary: the two-sex, two-gender model is an oversimplification, and sex and gender are complicated. Hence why models of sex or gender continually fail: the complexity almost guarantees exceptions to any rule.

There’s a strong parallel here to agnostic atheism’s “lack of belief” posture, as this flips the burden of proof. Critiquing the consensus of gender studies means asserting a positive statement, that the binarist model is correct, while the defense merely needs to swat down those arguments without advancing any of its own.

Nothing Fails Like Binarism

A single counter-example is sufficient to refute a universal rule. To take a classic example, I can show “all swans are white” is a false statement by finding a single black swan. If someone came along and said “well yeah, but most swans are white, so we can still say that all swans are white,” you’d think of them as delusional or in denial.

Well, I can point to four people who do not fit into the two-sex two-gender model. Ergo, that model cannot be true in all cases, and the critique of gender studies fails after a thirty second Google search.

When most people are confronted with this, they invoke a three-sex model (male, female, and “other/defective”) but call it two-sex in order to preserve their delusion. That so few people notice the contradiction is a testament to how hard the binary model is hammered into us.

But Where’s the SCIENCE?!

Another popular dodge is to argue that merely saying you don’t fit into the binary isn’t enough; if it wasn’t in peer-reviewed research, it can’t be true. This is no less silly. Do I need to publish a paper about the continent of Africa to say it exists? Or my computer? If you doubt me, browse Retraction Watch for a spell.

Once you’ve come back, go look at the peer-reviewed research which suggests gender is more complicated than a simple binary.

At times, the prevailing answers were almost as simple as Gray’s suggestion that the sexes come from different planets. At other times, and increasingly so today, the answers concerning the why of men’s and women’s experiences and actions have involved complex multifaceted frameworks.

Ashmore, Richard D., and Andrea D. Sewell. “Sex/Gender and the Individual.” In Advanced Personality, edited by David F. Barone, Michel Hersen, and Vincent B. Van Hasselt, 377–408. The Plenum Series in Social/Clinical Psychology. Springer US, 1998. doi:10.1007/978-1-4419-8580-4_16.

Correlational findings with the three scales (self-ratings) suggest that sex-specific behaviors tend to be mutually exclusive while male- and female-valued behaviors form a dualism and are actually positively rather than negatively correlated. Additional analyses showed that individuals with nontraditional sex role attitudes or personality trait organization (especially cross-sex typing) were somewhat less conventionally sex typed in their behaviors and interests than were those with traditional attitudes or sex-typed personality traits. However, these relationships tended to be small, suggesting a general independence of sex role traits, attitudes, and behaviors.

Orlofsky, Jacob L. “Relationship between Sex Role Attitudes and Personality Traits and the Sex Role Behavior Scale-1: A New Measure of Masculine and Feminine Role Behaviors and Interests.” Journal of Personality 40, no. 5 (May 1981): 927–40.

Women’s scores on the BSRI-M and PAQ-M (masculine) scales have increased steadily over time (r’s = .74 and .43, respectively). Women’s BSRI-F and PAQ-F (feminine) scale  scores do not correlate with year. Men’s BSRI-M scores show a weaker positive relationship with year of administration (r = .47). The effect size for sex differences on the BSRI-M has also changed over time, showing a significant decrease over the twenty-year period. The results suggest that cultural change and environment may affect individual personalities; these changes in BSRI and PAQ means demonstrate women’s increased endorsement of masculine-stereotyped traits and men’s continued nonendorsement of feminine-stereotyped traits.

Twenge, Jean M. “Changes in Masculine and Feminine Traits over Time: A Meta-Analysis.” Sex Roles 36, no. 5–6 (March 1, 1997): 305–25. doi:10.1007/BF02766650.

Male (n = 95) and female (n = 221) college students were given 2 measures of gender-related personality traits, the Bem Sex-Role Inventory (BSRI) and the Personal Attributes Questionnaire, and 3 measures of sex role attitudes. Correlations between the personality and the attitude measures were traced to responses to the pair of negatively correlated BSRI items, masculine and feminine, thus confirming a multifactorial approach to gender, as opposed to a unifactorial gender schema theory.

Spence, Janet T. “Gender-Related Traits and Gender Ideology: Evidence for a Multifactorial Theory.” Journal of Personality and Social Psychology 64, no. 4 (1993): 624.

Oh sorry, you didn’t know that gender studies has been a science for over four decades? You thought it was just an invention of Tumblr, rather than a mad scramble by scientists to catch up with philosophers? Tsk, that’s what you get for pretending to be a skeptic instead of doing your homework.

I Hate Reading

One final objection is that field-specific jargon is hard to understand. Boghossian and Lindsay seem to think it follows that the jargon is therefore meaningless bafflegab. I’d hate to see what they’d think of a modern physics paper; jargon offers precise definitions and less typing to communicate your ideas, and while it can quickly become opaque to lay people, jargon is a necessity for serious science.

But let’s roll with the punch, and look outside of journals for evidence that’s aimed at a lay reader.

In Sexing the Body, Gender Politics and the Construction of Sexuality Fausto-Sterling attempts to answer two questions: How is knowledge about the body gendered? And, how gender and sexuality become somatic facts? In other words, she passionately and with impressive intellectual clarity demonstrates how in regards to human sexuality the social becomes material. She takes a broad, interdisciplinary perspective in examining this process of gender embodiment. Her goal is to demonstrate not only how the categories (men/women) humans use to describe other humans become embodied in those to whom they refer, but also how these categories are not reflected in reality. She argues that labeling someone a man or a woman is solely a social decision. «We may use scientific knowledge to help us make the decision, but only our beliefs about gender – not science – can define our sex» (p. 3) and consistently throughout the book she shows how gender beliefs affect what kinds of knowledge are produced about sex, sexual behaviors, and ultimately gender.

Gober, Greta. “Sexing the Body Gender Politics and the Construction of Sexuality.” Humana.Mente Journal of Philosophical Studies, 2012, Vol. 22, 175–187

Making Sex is an ambitious investigation of Western scientific conceptions of sexual difference. A historian by profession, Laqueur locates the major conceptual divide in the late eighteenth century when, as he puts it, “a biology of cosmic hierarchy gave way to a biology of incommensurability, anchored in the body, in which the relationship of men to women, like that of apples to oranges, was not given as one of equality or inequality but rather of difference” (207). He claims that the ancients and their immediate heirs—unlike us—saw sexual difference as a set of relatively unimportant differences of degree within “the one-sex body.” According to this model, female sexual organs were perfectly homologous to male ones, only inside out; and bodily fluids—semen, blood, milk—were mostly “fungible” and composed of the same basic matter. The model didn’t imply equality; woman was a lesser man, just not a thing wholly different in kind.

Altman, Meryl, and Keith Nightenhelser. “Making Sex (Review).” Postmodern Culture 2, no. 3 (January 5, 1992). doi:10.1353/pmc.1992.0027.

In Delusions of Gender the psychologist Cordelia Fine exposes the bad science, the ridiculous arguments and the persistent biases that blind us to the ways we ourselves enforce the gender stereotypes we think we are trying to overcome. […]

Most studies about people’s ways of thinking and behaving find no differences between men and women, but these fail to spark the interest of publishers and languish in the file drawer. The oversimplified models of gender and genes that then prevail allow gender culture to be passed down from generation to generation, as though it were all in the genes. Gender, however, is in the mind, fixed in place by the way we store information.

Mental schema organise complex experiences into types of things so that we can process data efficiently, allowing us, for example, to recognise something as a chair without having to notice every detail. This efficiency comes at a cost, because when we automatically categorise experience we fail to question our assumptions. Fine draws together research that shows people who pride themselves on their lack of bias persist in making stereotypical associations just below the threshold of consciousness.

Everyone works together to re-inforce social and cultural environments that soft-wire the circuits of the brain as male or female, so that we have no idea what men and women might become if we were truly free from bias.

Apter, Terri. “Delusions of Gender: The Real Science Behind Sex Differences by Cordelia Fine.” The Guardian, October 11, 2010, sec. Books.

Have At ‘r, “Skeptics”

You want to refute the field of gender studies? I’ve just sketched out the challenges you face on a philosophical level, and pointed you to the studies and books you need to refute. Have fun! If you need me I’ll be over here, laughing.

[HJH 2017-05-21: Added more links, minor grammar tweaks.]

[HJH 2017-05-22: Missed Steven Pinker’s Tweet. Also, this Skeptic fail may have gone mainstream:

Boghossian and Lindsay likely did damage to the cultural movements that they have helped to build, namely “new atheism” and the skeptic community. As far as I can tell, neither of them knows much about gender studies, despite their confident and even haughty claims about the deep theoretical flaws of that discipline. As a skeptic myself, I am cautious about the constellation of cognitive biases to which our evolved brains are perpetually susceptible, including motivated reasoning, confirmation bias, disconfirmation bias, overconfidence and belief perseverance. That is partly why, as a general rule, if one wants to criticize a topic X, one should at the very least know enough about X to convince true experts in the relevant field that one is competent about X. This gets at what Brian Caplan calls the “ideological Turing test.” If you can’t pass this test, there’s a good chance you don’t know enough about the topic to offer a serious, one might even say cogent, critique.

Boghossian and Lindsay pretty clearly don’t pass that test. Their main claim to relevant knowledge in gender studies seems to be citations from Wikipedia and mockingly retweeting abstracts that they, as non-experts, find funny — which is rather like Sarah Palin’s mocking of scientists for studying fruit flies or claiming that Obamacare would entail “death panels.” This kind of unscholarly engagement has rather predictably led to a sizable backlash from serious scholars on social media who have noted that the skeptic community can sometimes be anything but skeptical about its own ignorance and ideological commitments.

When the scientists you claim to worship are saying your behavior is unscientific, maaaaybe you should take a hard look at yourself.]

P-hacking is No Big Deal?

Possibly not. simine vazire argued the case over at “sometimes i’m wrong.”

The basic idea is as follows: if we use shady statistical techniques to indirectly adjust the p-value cutoff in Null Hypothesis Significance Testing (NHST), we’ll increase the rate of false positives we get. Just to put some numbers to this: with a p-value cutoff of 0.05, when the null hypothesis is true we’ll draw a misleading sample about 5% of the time and wrongly conclude there’s a real effect. If we use p-hacking to get an effective cutoff of 0.1, however, that number jumps up to 10%.
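As a quick sanity check on those numbers, here’s a small simulation sketch in Python (the post’s own code, further down, is in Octave; this snippet and its parameter choices are mine): when the null hypothesis is true, the fraction of “significant” results simply tracks whatever threshold you use.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
runs, n = 20000, 30                  # 20,000 simulated studies, 30 samples per group (arbitrary)
p_values = np.empty(runs)
for i in range(runs):
    a = rng.standard_normal(n)       # both groups come from the same distribution,
    b = rng.standard_normal(n)       # so every "significant" result is a false positive
    p_values[i] = stats.ttest_ind(a, b).pvalue

print("false positives at p <= 0.05:", np.mean(p_values <= 0.05))   # roughly 0.05
print("false positives at p <= 0.10:", np.mean(p_values <= 0.10))   # roughly 0.10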

However, p-hacking will also raise the number of true positives we get. How much higher they get can be tricky to calculate, but this blog post by Erika Salomon gives some great numbers. During one simulation run, a completely honest test of a false null hypothesis would return a true positive 12% of the time; when p-hacking was introduced, that skyrocketed to 74%.

If the increase in false positives is balanced out by the increase in true positives, then p-hacking makes no difference in the long run. The proportion of false positives in the literature would depend entirely on the power of studies, which is abysmally low, and our focus should be on improving that. Or, if we’re really lucky, the true positives increase faster than the false positives, and we actually get a better scientific record via cheating!

We don’t really know which scenario will play out, however, and vazire calls for someone to code up a simulation.

Allow me.

My methodology will be to divide studies into two categories: null results that are never published, and possibly-true results that are. I’ll use a one-way ANOVA to check whether the averages of two groups drawn from Gaussian distributions differ. I debated switching to a Student’s t-test, but comparing two random draws seems more realistic than comparing one random draw to a fixed mean of zero.

I need a model of effect and sample sizes. This one is pretty tricky; just because a study is unpublished doesn’t mean its effect size is zero, and vice versa. Making inferences about unpublished studies is tough, for obvious reasons. I’ll take the naive route here, and assume unpublished studies have an effect size of zero while published studies have effect sizes on the same order as actual published studies. Both published and unpublished studies will have sample sizes typical of what’s published.

I have a handy cheat for that: the Open Science Collaboration published a giant replication of 100 psychology studies back in 2015, and being Open they shared the raw data online in a spreadsheet. The effect sizes are in correlation coefficients, which are easy to convert to Cohen’s d, and when paired with a standard deviation of one that gives us the mean of the treatment group. The control group’s mean is fixed at zero but shares the same standard deviation. Sample sizes are drawn from said spreadsheet, and represent the total number of samples and not the number of samples per group. In fact, it gives me two datasets in one: the original study effect and sample size, plus the replication’s effect and sample size. Unless I say otherwise, I’ll stick with the originals.
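For reference, the conversion being described (a correlation coefficient to Cohen’s d, then to group means with unit standard deviations) might look like this in Python; the value of r below is a made-up stand-in for one entry of the OSC spreadsheet, and the formula assumes two groups of equal size.

import numpy as np

def r_to_cohens_d(r):
    # Standard conversion from a correlation coefficient to Cohen's d,
    # assuming two groups of equal size.
    return 2.0 * r / np.sqrt(1.0 - r ** 2)

r = 0.3                                  # hypothetical entry standing in for the OSC spreadsheet
d = r_to_cohens_d(r)
control_mean, control_sd = 0.0, 1.0      # control group: mean fixed at zero, SD of one
treatment_mean, treatment_sd = d, 1.0    # treatment group: mean equals d when the SD is one
print(d)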

P-hacking can be accomplished in a number of ways: varying the number of tests in the analysis and iteratively running significance tests as the data comes in are but two of the more common. To simplify things, I’ll just assume the effective p-value threshold is a fixed number, but explore a range of values to get an idea of how a variable p-hacking effect would behave.

For some initial values, let’s say unpublished studies constitute 70% of all studies, and p-hacking can cause a p-value threshold of 0.05 to act like a threshold of 0.08.

Octave shall be my programming language of choice. Let’s have at it!
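The author’s Octave code isn’t reproduced in this post, so here is a rough Python sketch of the recipe described above. The effect and sample sizes are made-up placeholders standing in for the OSC spreadsheet, while the 70% unpublished share and the two thresholds mirror the initial values chosen earlier, so its printout will be in the same spirit as, but not identical to, the author’s figures below.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical stand-ins for the OSC spreadsheet: (correlation effect size, total sample size).
# The real run would draw these pairs from the published OSC 2015 data.
published_effects = [(0.2, 60), (0.35, 40), (0.5, 80), (0.1, 120), (0.4, 50)]

def r_to_cohens_d(r):
    return 2.0 * r / np.sqrt(1.0 - r ** 2)

def simulate(runs=10000, unpublished_rate=0.7, threshold=0.05):
    # Returns (false positives, true positives, false positive proportion among significant results).
    false_pos = true_pos = 0
    for _ in range(runs):
        r, total_n = published_effects[rng.integers(len(published_effects))]
        n = max(total_n // 2, 2)                  # per-group sample size
        if rng.random() < unpublished_rate:
            d, is_null = 0.0, True                # unpublished studies: no real effect
        else:
            d, is_null = r_to_cohens_d(r), False  # published studies: effect from the template
        control = rng.standard_normal(n)
        treatment = rng.standard_normal(n) + d
        p = stats.f_oneway(control, treatment).pvalue    # one-way ANOVA on two groups
        if p <= threshold:
            if is_null:
                false_pos += 1
            else:
                true_pos += 1
    return false_pos, true_pos, false_pos / max(false_pos + true_pos, 1)

for threshold in (0.05, 0.08):
    fp, tp, rate = simulate(threshold=threshold)
    print(f"p <= {threshold}: false positive rate {rate:.2%} ({fp} f.p., {tp} t.p.)")

The output below is from the author’s Octave run on the real OSC data.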

(Template: OSC 2015 originals)
With a 30.00% success rate and a straight p <= 0.050000, the false positive rate is 12.3654% (333 f.p, 2360 t.p)
Whereas if p-hacking lets slip p <= 0.080000, the false positive rate is 18.2911% (548 f.p, 2448 t.p)

(Template: OSC 2015 replications)
With a 30.00% success rate and a straight p <= 0.050000, the false positive rate is 19.2810% (354 f.p, 1482 t.p)
Whereas if p-hacking lets slip p <= 0.080000, the false positive rate is 26.2273% (577 f.p, 1623 t.p)

Ouch, our false positive rate went up. That seems strange, especially as the true positives (“t.p.”) and false positives (“f.p.”) went up by about the same amount. Maybe I got lucky with the parameter values, though; let’s scan a range of unpublished study rates from 0% to 100%, and effective p-values from 0.05 to 0.2. The nominal p-value threshold will remain fixed at 0.05. So that we can fit it all in one chart, I’ll take the proportion of false positives after p-hacking and subtract the vanilla proportion from it, so that areas where the false positive rate goes down after hacking show up as negative.
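Sketching that scan in Python, and assuming the hypothetical simulate() helper from the earlier snippet is already defined, it’s just a nested loop over the two parameters:

import numpy as np

# Assumes simulate() from the earlier sketch is in scope.
unpublished_rates = np.linspace(0.0, 1.0, 11)     # 0% to 100% unpublished/false studies
hacked_thresholds = np.linspace(0.05, 0.20, 7)    # effective p-value threshold after hacking

for rate in unpublished_rates:
    for hacked in hacked_thresholds:
        _, _, vanilla = simulate(runs=2000, unpublished_rate=rate, threshold=0.05)
        _, _, after_hacking = simulate(runs=2000, unpublished_rate=rate, threshold=hacked)
        diff = after_hacking - vanilla            # negative would mean hacking lowered the rate
        print(f"unpublished={rate:.1f}  hacked p<={hacked:.2f}  difference={diff:+.3f}")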

How varying the proportion of unpublished/false studies and the p-hacking amount changes the false positive rate.

There are no values less than zero?! How can that be? The math behind these curves is complex, but I think I can give an intuitive explanation.

Drawing the distribution of p-values when the result is null vs. the results from the OSC originals.

The diagonal is the distribution of p-values when the effect size is zero; the curve is what you get when it’s greater than zero. As there are more or fewer values in each category, the graphs are stretched or squashed horizontally. The p-value threshold is a horizontal line, and everything below that line is statistically significant. The proportion of false to true results is equal to the ratio of the lengths along that horizontal line, measured from the origin to where it crosses each curve.

P-hacking is the equivalent of nudging that line upwards. The proportions change according to the slope of the curve: the steeper it is, the less they change. It follows that if you want to increase the proportion of true results, you need to find a pair of horizontal lines where the horizontal distance along the curve increases, in proportion, as fast as or faster than the increase along the diagonal. Putting this geometrically, imagine drawing a line starting at the origin but at an arbitrary slope. Your job is to find a slope such that the line pierces the non-zero-effect curve twice.

Slight problem: that non-zero-effect curve has negative curvature everywhere. The slope is guaranteed to get steeper as you step up the curve, which means it will curve up and away from wherever the line crosses it. Translating that back into math, it’s guaranteed that the non-zero-effect curve will not increase in proportion with the diagonal. The false positive rate will always increase as you raise the effective p-value threshold.
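Here’s one way to check that numerically, again in Python with arbitrary values for the effect size, sample size, and the 70% unpublished share: the false positive rate grows in step with the threshold, while power grows by a smaller factor, so the false-positive share of significant results can only rise.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d, n, runs = 0.4, 30, 20000          # arbitrary effect size, per-group sample size, repetitions

p_null = np.array([stats.ttest_ind(rng.standard_normal(n),
                                   rng.standard_normal(n)).pvalue for _ in range(runs)])
p_real = np.array([stats.ttest_ind(rng.standard_normal(n),
                                   rng.standard_normal(n) + d).pvalue for _ in range(runs)])

for threshold in (0.05, 0.08):
    alpha = np.mean(p_null <= threshold)    # scales linearly with the threshold
    power = np.mean(p_real <= threshold)    # increases too, but by a smaller factor
    fp_share = 0.7 * alpha / (0.7 * alpha + 0.3 * power)
    print(f"p <= {threshold}: alpha {alpha:.3f}, power {power:.3f}, false positive share {fp_share:.3f}")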

And thus, p-hacking is always a big deal.