The ethics of educational experiments


Rebecca Watson had an interesting article/video about the ethics of A/B testing. A/B testing is a type of experiment that tech companies often perform on their users. The company splits users into two groups, shows each group a different version of its software or website, and measures the results. The problem is that when scientists perform experiments on human subjects, they have to go through a formal ethical review process. Should tech companies have an ethical review process too?
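The mechanics of a basic A/B test fit in a few lines of code. Below is a minimal Python sketch, assuming a hash-based 50/50 split and a single yes/no outcome per user. The function names and the "motivational-message" experiment label are hypothetical, and real systems at companies like Google or Pearson are presumably far more elaborate (a proper analysis would also check whether any difference between groups is statistically significant).

```python
import hashlib


def assign_group(user_id: str, experiment: str) -> str:
    """Assign a user to group "A" or "B" (hypothetical helper).

    Hashing the experiment name together with the user ID gives a stable,
    roughly 50/50 split: the same user always lands in the same group.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def conversion_rates(outcomes):
    """Fraction of users in each group with a positive outcome.

    `outcomes` is an iterable of (group, converted) pairs, where `converted`
    is True if the user did the thing being measured (e.g. finished a lesson).
    """
    totals = {"A": 0, "B": 0}
    successes = {"A": 0, "B": 0}
    for group, converted in outcomes:
        totals[group] += 1
        successes[group] += int(converted)
    # Avoid dividing by zero if a group happens to be empty.
    return {g: (successes[g] / totals[g] if totals[g] else 0.0) for g in totals}


if __name__ == "__main__":
    # Hypothetical example: version B shows a motivational message,
    # and the outcomes here are made-up placeholder data.
    users = [f"user{i}" for i in range(10)]
    outcomes = [
        (assign_group(u, "motivational-message"), i % 3 == 0)
        for i, u in enumerate(users)
    ]
    print(conversion_rates(outcomes))
```

Hashing the user ID, rather than flipping a coin on every visit, keeps each user in the same group across sessions; that is a common design choice, though not the only one.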

Of course, this question comes up because of a specific experiment performed by a specific company. Pearson produces educational software, and it ran an A/B test in which some students were shown motivational messages. Pearson presented the results at a conference, and part of its conclusion was that this was a promising methodology for future research. But is it really, if the experiment didn’t comply with the ethical standards of science? Pearson certainly didn’t get consent from all those human test subjects.

Watson also brought up another case from 2014, when Facebook ran an experiment that changed the number of positive or negative posts people saw in their news feeds. Facebook published the results in a study titled “Experimental evidence of massive-scale emotional contagion through social networks”. Sounds pretty bad, eh?

Watson seems to conclude that A/B tests should require consent, at least in Pearson’s case. But I think this goes too far. The thing is, A/B testing is absolutely ubiquitous. Watson says, “having worked in marketing and seen A/B tests, it’s just a normal thing that companies do,” but I think this understates it. My fiance and I tried to estimate how many A/B tests Google has running at any given time, and we guessed it might be about one per employee, which would mean tens of thousands of experiments. And most of them are about boring things like changing fonts or adding a few pixels of white space. If we judge A/B testing by just the two experiments that made the news, “cherry picking” doesn’t even begin to describe it.


What people dislike about these experiments is the idea of companies toying with their emotions. But companies already do that, just by existing. Every piece of software and every website involves countless design decisions (they are basically made of decisions), and plenty of those decisions are ones the user is unaware of and never consented to. Plenty of them affect the user’s emotional state, and what can you do about that? Do companies need to go through an institutional review board (IRB) whenever they design any product for human use? And again for each update?

What if, instead of embedding motivational messages in some students’ software, Pearson had just embedded them in everyone’s? Maybe some programmer just thought it was a natural thing to do. Would we even think twice about it? I might roll my eyes at the messages, but if I weren’t thinking of it as an experiment, it wouldn’t even occur to me to apply the ethical standards we apply to scientific experiments.

Hell, am I running low-level psychological experiments just by going about my daily life? Every time I talk, write, or go out in public, I’m affecting people’s emotions without asking their consent, and I don’t even have a proper control group, so what even is the point?

Okay, forget the daily-life analogy and take another: suppose I’m a teacher. As a teacher, I might perform experiments on students in order to publish educational research. But even if I didn’t, I’d still be running experiments. Every year, I’d make some small changes in the classroom, conduct evaluations at the end of the year, and compare the results to previous years. That’s just how you teach. So why, when I want to publish research or give a talk at a conference, do I have to meet stricter ethical standards, when I’d really be doing basically the same thing I was already doing, just more systematically?

This isn’t even an analogy. Pearson makes educational software, and it was presenting at an educational research conference. It’s exactly the same issue. I don’t work in education, so I don’t know the score, but when I looked it up, it seems to be a rather knotty question that educators debate among themselves.

Should there be some ethical oversight for A/B testing? Yes, and in some places there probably already is. But I don’t think A/B experiments are special compared to other changes in software or website design. When Facebook showed some users more negative news feeds, that was bad, but it would have been bad even if it hadn’t been part of an experiment. Maybe higher standards apply when people present the results of A/B testing at scientific conferences. But otherwise, I would treat A/B tests the same way I would treat any other change a company makes to its product.


As a physicist, I never did any research on human subjects. But I do volunteer for a community survey. We’d like to have IRB approval, because academics cite us and we share our data with them, and approval would give us more legitimacy. Perhaps we will get IRB approval one day. But it’s very difficult for a team with no budget and only volunteers. In fact, one of our survey’s predecessors made IRB approval a goal, and that project never got off the ground. So we hold ourselves to ethical standards but skip the bureaucratic process.

Comments

  1. says

    Having gone through the IRB approval process recently, I have some personal experience to offer.

    We have cautionary examples like the Stanford prison experiment, the Milgram experiment, and, of course, the infamous Tuskegee syphilis experiment. All bureaucracies exist in response to a problem, and in the case of experimentation on human beings, the problem is real and severe, even in social science experiments. There has to be some sort of bureaucracy so that, whenever an experiment might expose people to real harm, we have a consistent, collective decision-making process for deciding whether to go ahead.

    IRB approval is required only for publication, but that is a matter of jurisdiction, not ethics. It’s not that it’s acceptable to conduct research without approval as long as the researcher doesn’t publish; it’s that IRBs have jurisdiction only over scientific publications, and the only sanction they can apply is to refuse publication.

    I’m not sure how much standards differ between boards, but I can talk about my own experience. It was kind of a pain in the ass to get IRB approval, but it wasn’t that bad (it probably didn’t hurt that two of my advisors were on the board). I had to make a couple of trivial changes to my application, and the whole process took about two months. The real difficulty was making sure my experiment was solid, because it’s hard to make changes after approval. You usually need some sort of faculty sponsorship, and you have to fill out a lot of kind of silly paperwork. If it takes more than a couple of months, someone is definitely fucking up.

  2. says

    For some reason I was under the mistaken impression that IRB review takes six months, not two. The other difficulty, I think, is that it costs money. That isn’t a problem for academic researchers, and probably not for companies like Google, but it’s a serious barrier for a volunteer-run community survey.

    My fiance thinks that ethical standards should be the same regardless of whether the research gets published. And if most A/B tests wouldn’t pass IRB review because of the consent issue, then that’s just a problem with the IRB standards.
