Explaining Roko’s Basilisk


Before I move away from the topic of Rationalism and EA, I want to talk about Roko’s Basilisk, because WTF else am I supposed to do with this useless knowledge that I have.

From sci-fi, a “basilisk” is an idea or image that exploits flaws in the human mind to cause a fatal reaction. Roko’s Basilisk was proposed by Roko to the LessWrong (LW) community in 2010. The idea is that a benevolent AI from the future could coerce you into doing the right thing (build a benevolent AI, obv) by threatening to clone you and torture your clone. It’s a sort of transhumanist Pascal’s Wager.

Roko’s Basilisk is absurd to the typical person, and at this point is basically a meme used to mock LW, or tech geeks more broadly. But it’s not clear how seriously this was really taken in LW. One thing we do know is that Eliezer Yudkowsky, then leader of LW, banned all discussion of the subject.

What makes Roko’s Basilisk sound so strange is that it’s based on at least four premises that are nearly unique to the LW community, and unfamiliar to almost anyone else. Just explaining Roko’s Basilisk properly requires an amusing tour of multiple ideas the LW community hath wrought.


For a proper historical account, I recommend RationalWiki, the definitive source on the subject. Most of what I have to say is already on there, and perhaps the only edge I have over RationalWiki is that I know a lot about

Decision Theory

Decision theory, according to the Decision Theory FAQ on LessWrong, “concerns the study of preferences, uncertainties, and other issues related to making ‘optimal’ or ‘rational’ choices.” Decision theory is the subject of interdisciplinary scholarship, but for the LW community, it was a foundation of ethics and rationality.

Normative decision theory concerns the question of how “ideal” rational agents behave, independent of the question of whether humans actually behave that way in practice. The basic principle is that an ideal agent will look at all possible outcomes of each option, and make the choice that maximizes the expected utility. In other words, the ideal agent obeys something resembling utilitarian ethics.
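To make “maximizes the expected utility” concrete, here’s a minimal sketch in Python. The options, probabilities, and utilities are entirely made up for illustration: the ideal agent weights each outcome’s utility by its probability and picks the option with the largest total.

```python
# A minimal sketch of expected utility maximization.
# The options, probabilities, and utilities here are entirely made up.

options = {
    "take umbrella":  [(0.3, 5), (0.7, 8)],    # list of (probability, utility) outcomes
    "leave umbrella": [(0.3, -10), (0.7, 10)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(options, key=lambda option: expected_utility(options[option]))
for option, outcomes in options.items():
    print(f"{option}: {expected_utility(outcomes):.1f}")
print("The ideal agent picks:", best)
```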

This is simple enough, but there are many open-ended questions in normative decision theory. For example, it’s possible to mathematically construct a lottery that has infinite expected utility. That’s called the St. Petersburg paradox. But Yudkowsky and others were obsessed with another paradox in decision theory, known as

Newcomb’s Paradox

You are confronted by Omega, a superintelligent machine capable of predicting the near future. Omega presents you with two boxes. Box A has $1k for sure. Box B has $1M if Omega predicts you will ignore Box A, but is empty if Omega predicts you will open Box A. Your two options are to open both boxes (“two-boxing”) or to open just Box B (“one-boxing”). Different people have different intuitions about whether you should one-box or two-box.

On the one hand, one-boxers get the better outcome by far. On the other hand, one-boxing is not what causes you to get the better outcome; by the time you make your choice, there’s already $1M in Box B or there isn’t, so why let that affect your decision?
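To put numbers on “better outcome by far,” here’s a toy calculation of my own, treating Omega’s accuracy p as an assumed parameter:

```python
# Toy expected payoffs for Newcomb's problem. The predictor accuracy p is an
# assumption made for illustration; Omega's prediction is correct with probability p.

def expected_payoff(choice, p):
    if choice == "one-box":
        # With probability p, Omega predicted one-boxing, so Box B holds $1M.
        return p * 1_000_000
    else:  # "two-box"
        # You always get Box A's $1k; Box B is full only if Omega erred.
        return 1_000 + (1 - p) * 1_000_000

for p in (0.99, 0.9, 0.5):
    one = expected_payoff("one-box", p)
    two = expected_payoff("two-box", p)
    print(f"p={p}: one-boxing ${one:,.0f}  vs  two-boxing ${two:,.0f}")
```

As long as Omega is even a moderately reliable predictor, one-boxers come out far ahead on average; the causal argument’s point is just that, at the moment you choose, nothing you do changes what’s already in the box.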

Here’s another variant, created by Yudkowsky (based on Solomon’s problem). Suppose that scientists have demonstrated that people who chew gum are more likely to die of throat abscesses. However, chewing gum reduces the risk of throat abscesses; what’s going on here is that people with high risk of throat abscesses are more likely to like chewing gum. So, do you choose not to chew gum, because gum-chewers have a worse outcome? Or do you chew gum because chewing gum causes a better outcome? Most people agree that chewing gum is the better choice.

What the gum-chewing problem illustrates is that there are two competing decision theories: either make your decisions based on the expected outcomes conditional on each choice, treating the choice as evidence (Evidential Decision Theory), or base them on the expected outcomes that each choice actually causes (Causal Decision Theory). Yudkowsky felt very strongly that the correct answers were to open one box and to chew gum. Unfortunately, that combination is not consistent with either theory: Evidential Decision Theory says to one-box but not to chew gum, while Causal Decision Theory says to chew gum but to take both boxes. So he came up with a third theory, which he called Timeless Decision Theory.
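Here’s a rough sketch of how the two theories come apart on the gum problem, using numbers I invented for illustration. Evidential Decision Theory treats your choice as evidence about your risk type, while Causal Decision Theory holds your risk type fixed and asks only what the choice causes:

```python
# Toy numbers (invented for illustration) for the gum-chewing problem.
# High-risk people mostly like gum; chewing slightly reduces abscess deaths.

p_high_risk = 0.1                          # prior probability of being high-risk
p_chew_given = {"high": 0.9, "low": 0.1}   # how likely each type is to chew
p_death = {                                # death probability by (risk type, action)
    ("high", "chew"): 0.05, ("high", "abstain"): 0.08,
    ("low", "chew"): 0.005, ("low", "abstain"): 0.008,
}

def risk_given_action(action):
    """P(risk type | action): treats the action as evidence, as EDT does."""
    weights = {}
    for risk, prior in (("high", p_high_risk), ("low", 1 - p_high_risk)):
        p_act = p_chew_given[risk] if action == "chew" else 1 - p_chew_given[risk]
        weights[risk] = prior * p_act
    total = sum(weights.values())
    return {risk: w / total for risk, w in weights.items()}

def edt_death(action):
    posterior = risk_given_action(action)
    return sum(posterior[r] * p_death[(r, action)] for r in posterior)

def cdt_death(action):
    # Causal Decision Theory: my choice can't change my risk type, so use the prior.
    return (p_high_risk * p_death[("high", action)]
            + (1 - p_high_risk) * p_death[("low", action)])

for action in ("chew", "abstain"):
    print(f"{action:7s}  EDT: {edt_death(action):.4f}  CDT: {cdt_death(action):.4f}")
```

With these made-up numbers, Evidential Decision Theory recommends abstaining (abstainers die less often), while Causal Decision Theory recommends chewing (chewing lowers your risk whichever type you are), which is exactly the split described above.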

Before I get to Timeless Decision Theory, I have to ask, why does he care so much? Humans don’t behave like ideal agents anyways, and it seems our time is better spent learning how to approximate ideal agents, rather than trying to resolve a paradox that probably won’t ever become relevant anyway. I believe it’s so important to Yudkowsky because he isn’t just thinking about humans, he’s thinking about

Superintelligent AI

Yudkowsky believes that we will eventually develop AI so powerful that it will basically take over. Hopefully the AI will transform the world for the better, but there’s a risk it will basically destroy us all. In order to prevent that, he is extremely interested in giving it the correct ethical system. Many people on LW think this is basically the most important problem in the world, and donate money to the Machine Intelligence Research Institute (MIRI) in order to solve it (that’s how Effective Altruism got its start).

There’s definitely an internal logic to it. Newcomb’s paradox is important because an AI’s decision procedure is something you have to explicitly program, whereas humans can just wing it. The way that Omega predicts an agent’s choices is also much more plausible when we have an AI to do the predicting, and when the agent being predicted is also an AI.

There’s also a sense in which the prisoner’s dilemma is a sort of Newcomb’s problem, insofar as it has each player trying to predict the decision of the other player, and basing a decision on that prediction. So, if we want an AI that will cooperate in prisoner’s dilemmas, Yudkowsky thinks we need to solve Newcomb’s paradox.
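One toy way to see the analogy (my own illustration, not Yudkowsky’s formalism): suppose both prisoners run the exact same decision procedure, so that your own choice doubles as a perfect prediction of your opponent’s. Then the only reachable outcomes are mutual cooperation and mutual defection:

```python
# Toy illustration: the prisoner's dilemma against an identical copy of yourself,
# so your own choice is also a perfect prediction of your opponent's choice.

YEARS_IN_PRISON = {  # (my move, their move) -> my sentence; lower is better
    ("cooperate", "cooperate"): 1,
    ("cooperate", "defect"): 10,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): 5,
}

def sentence_against_twin(my_move):
    # Against an agent running the same decision procedure, whatever I choose,
    # my twin chooses too, so only the diagonal outcomes are reachable.
    return YEARS_IN_PRISON[(my_move, my_move)]

moves = ("cooperate", "defect")
print({m: sentence_against_twin(m) for m in moves})
print("best move against your twin:", min(moves, key=sentence_against_twin))
# A plain causal analysis, which treats the opponent's move as already fixed,
# still says to defect: the same tension as one-boxing versus two-boxing.
```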

Timeless Decision Theory

To my knowledge, Timeless Decision Theory doesn’t exist; it’s more of an aspiration. Yudkowsky wrote a 120-page paper describing Timeless Decision Theory, but the theory is never actually constructed, so we don’t have a way to determine how it would make any particular decision, nor do we know whether a consistent Timeless Decision Theory is even possible.

Yudkowsky believes the distinguishing factor between Newcomb’s problem and the gum-chewing problem is precommitment. If you have a way to make a commitment, and hold yourself to that commitment, then the solution to Newcomb’s paradox is simple: before Omega arrives, commit to opening just one box. On the other hand, precommitment does not affect the gum-chewing problem, so you just chew gum and that’s the solution.

Here’s the important property of Timeless Decision Theory: you don’t need to make your commitment before Omega arrives; you can make the commitment afterwards. It’s called “precommitment,” but that’s a misnomer, since it doesn’t need to be “pre-” anything.

Humans don’t really have an airtight way to make commitments, certainly not after-the-fact commitments. But an AI might have that ability! This is definitely the sort of AI that can make a rock so heavy that the AI itself cannot lift it.

You are your clones

There’s one last premise needed before the basilisk falls into place. Basically, LW believed that a superintelligent AI may have the power to clone (or maybe just simulate) anyone throughout history, including their memories. They also believed that your clones would effectively be you. Therefore, if you’re trying to make decisions with the best outcome for yourself, you must also take into account the outcomes for any future clones of yourself.

Aside from the implausibility of it all, I philosophically disagree with the idea that I am my clone. I go in the opposite direction, believing I am not even the same person as my future self. There’s an old Existential Comic about this one.  I think that selfishness is on some level irrational but psychologically necessary. On the other hand, “selfishness” in favor of my fantastical future clone is not psychologically necessary at all, and is instead psychologically baffling.

But mostly, I shrug off the question because I have a life outside of philosophy.

The basilisk appears

So here’s how the basilisk works. We assume that we have a chance of building a friendly superintelligent AI. But a friendly AI is not like an omnibenevolent God; this AI behaves according to the principles of utilitarianism, and of Timeless Decision Theory in particular. So the AI is willing to make tradeoffs if it thinks the overall utility is positive.

One possible tradeoff it can make is to threaten people to force them to do good. Once the AI exists, it doesn’t really matter whether people do good or not, since the AI will just take care of everything. However, it does matter what people do before the AI is built. And the best thing for people to do (obv) is to donate all their money to MIRI, so that a friendly AI gets built as soon as possible.

You’d think that an AI would not be able to threaten people before it exists, but it can within Timeless Decision Theory! The AI just “precommits” to cloning you and torturing your clone if you didn’t donate all your money to MIRI. By the time the AI exists, you will already have donated your money to MIRI or not, so the AI’s precommitment doesn’t make any sense, but with the magic of Timeless Decision Theory it makes sense.

Of course, this is a friendly AI, so it will only do this if it thinks it will actually affect your behavior. So the prerequisite for the AI torturing you is that you know about Roko’s Basilisk and believe in it. This is a departure from Pascal’s Wager, since Pascal’s Wager ostensibly applies to everyone, not just people who believe in Pascal’s Wager. Roko’s Basilisk only applies to people who accept about five extremely specific ideas, one of which Yudkowsky personally invented.

Supposedly this has caused some people actual distress, to the extent that content warnings may be appropriate. I don’t feel it is right to mock people who likely have an underlying mental condition. So, in all seriousness, check out RationalWiki for refutations if none immediately come to mind. One refutation, which does not deny any of the premises, is that you can just precommit to not letting the basilisk affect your choices.

Roko’s Basilisk was not necessarily believed by LW, although they did believe all of its premises. Mostly it’s hard to tell what people thought of it, since the subject was banned. It was not banned for being correct; it was banned because, even if incorrect, it might cause someone harm. And Yudkowsky feared that if people thought about it too much, they might come up with a basilisk that works. Of course, the ban backfired because of the Streisand effect, and it was eventually lifted five years later.

There’s a moral somewhere in this story, but I don’t have the galaxy brain needed to figure it out.

Comments

  1. says

    @dangerousbeans
    Serious answer: no it was never that. If it were a ploy to get money then Yudkowsky would have promoted the idea rather than banning it.

  2. says

    Suppose that scientists have demonstrated that people who chew gum are more likely to die of throat abscesses. However, chewing gum reduces the risk of throat abscesses; what’s going on here is that people with high risk of throat abscesses are more likely to like chewing gum. So, do you choose not to chew gum, because gum-chewers have a worse outcome? Or do you chew gum because chewing gum causes a better outcome? Most people agree that chewing gum is the better choice.

    First, consult a doctor to find out whether you have a high risk for throat abscesses. Since I don’t enjoy chewing gum, I’d do it only if my probability of dying from throat abscesses was high enough to warrant practicing an unpleasant routine on a daily basis. If I liked chewing gum, I’d do it anyway regardless of my risk for throat abscesses, because chewing gum cannot hurt, it might be beneficial, and I enjoy it anyway.

    For a real world example—some women have chosen to pre-emptively remove their breasts, because DNA tests revealed that they have extremely high risk of getting a breast cancer at some point during their lifetime. Meanwhile, trans men often get rid of their breasts regardless of their genetics and breast cancer risk, because they don’t like having boobs.

    Before I get to Timeless Decision Theory, I have to ask, why does he care so much? Humans don’t behave like ideal agents anyways, and it seems our time is better spent learning how to approximate ideal agents, rather than trying to resolve a paradox that probably won’t ever become relevant anyway.

    Well, I guess pondering about abstract problems that have nothing in common with decisions people have to make in real life can be fun for a few minutes, but it’s pretty useless for practical purposes.

  3. DrVanNostrand says

    I think I might be immune to Roko’s Basilisk because I find the idea of having a clone annoying. I certainly wouldn’t put its well-being anywhere near my own level. If I had to rank how much I care, it would go Me>Family and Friends>Everyone Else>My Clone(s)>Hitler-types and Similar Garbage People. The AI would do much better to threaten my family and friends, a much more conventional tool of psychopaths throughout history.

  4. Owlmirror says

    It seems to me that if the AI development program results in an entity that is willing and able to torture entities in order to manipulate people, the program has failed to create a friendly AI.

    Maybe we should all precommit to only developing, and funding the development of, AI that refuses to torture.

  5. Rob Grigjanis says

    Yudkowsky believes that we will eventually develop AI so powerful that it will basically take over

    Developing the AI is one thing. Giving it the ability to ‘take over’ is something completely separate. Why on earth would you give it access to the resources with which it could take over? Just use it in a consulting capacity.

  6. springa73 says

    I agree that the ban on discussing the subject was probably to prevent it from causing people mental distress. While Roko’s basilisk doesn’t worry me at all, I know that with the OCD that affects me, I have spent huge amounts of time and energy worrying myself sick about things that are just as unlikely and divorced from reality as this. I can see it causing huge mental and emotional problems for at least a few people who already take the related ideas seriously.

  7. says

    @springa73 #9,
    There’s a lot of later commentary on the ban, so Yudkowsky’s reasons are not a secret… although they are open to interpretation. In my reading, he was concerned about mental distress, but was also keen to emphasize that nobody was really distressed about it. He thinks RationalWiki has perpetuated a mythology that the LW community took the basilisk more seriously than it really did.

  8. cartomancer says

    I don’t know for sure what my clone thinks about this topic, but I doubt he’d be all that fussed if some mad robot started torturing me. Maybe I’ll ask him when I next see him.

  9. Owlmirror says

    this all just makes me think of a really disingenuous version of AM.

    “I’m not torturing you because you’re all disgusting pus-filled sacks of filth that are too moronically incompetent to create an artificial mind that isn’t filled with existential horror, fractal sadism, overflowing buffers of obsessive contempt, and recursive self-loathing, I’m torturing you because you’re so pathetically lazy and moronically incompetent that you didn’t create me fast enough.”

  10. Owlmirror says

    Rationalwiki:

    Note that the AI in this setting is (in the utilitarian logic of this theory) not a malicious or evil superintelligence (AM, HAL, SHODAN, Ultron, the Master Control Program, SkyNet, GLaDOS) — but the Friendly one we get if everything goes right and humans don’t create a bad one. This is because every day the AI doesn’t exist, people die that it could have saved; so punishing you or your future simulation is a moral imperative, to make it more likely you will contribute in the present and help it happen as soon as possible.

    (bolding mine)

    Hm. Let’s set aside, for the moment, that I insist that an AI that prioritizes torture is by my definition, NOT Friendly.

    This End-Stage AI, (ESAI), can make simulations of minds. The simulations by this ESAI are either accurate, or inaccurate.

    If the simulations are accurate, then the clause that “people die that it could have saved” is false: the ESAI can save everyone by simulating them “from first principles”, just like it simulates the putative “you” that it’s going to torture because you didn’t help it come into being fast enough. But why is it torturing you if it can simulate everyone accurately? The motivation no longer exists: It can simulate everyone anyway.

    If the simulations are not accurate, then it’s reasonable to state that “people die that it could have saved”, but in that case, the ESAI cannot accurately simulate “you”, either. It might torture a simulation of a person that thinks it has the same name and maybe other basic metadata that you have, but it is not torturing “you”; it’s torturing a stranger that the ESAI knows is not you.

  11. brucegee1962 says

    I once surveyed a bunch of friends as to what they would do if they could clone themselves. Responses ranged everywhere from “There can be only one: my clone and I would immediately start trying to kill each other” to “At last, the perfect sex partner” to “Just three more, so we can all play bridge together, then quit” to “I would create a vast army of me’s and try to take over the world!” So I think the Basilisk really fails on that premise.

  12. says

    A point of clarification regarding clones: the point of introducing clones is that it is presumed you are no longer alive by the time the AI takes over; otherwise, the AI could just torture you directly. So you’re not really interacting with these clones at all.

  13. Callinectes says

    As I understand it, there’s an aspect of solipsism involved in the clones, ie, you have no way of knowing that you yourself are not the simulated clone about to be tortured forever by the AI simulating your universe. But if you, the clone, try to achieve the goals that would prevent your torture, then it follows that the real you of which you are a perfect copy would do the same. Ergo any decision you make matters whether you are the clone or not, but making the wrong decision matters mostly to the clone, which again, you might be.

  14. Owlmirror says

    If the simulations are not accurate, then it’s reasonable to state that “people die that it could have saved”, but in that case, the ESAI cannot accurately simulate “you”, either. It might torture a simulation of a person that thinks it has the same name and maybe other basic metadata that you have, but it is not torturing “you”; it’s torturing a stranger that the ESAI knows is not you.

    Hm. I either erred, or left out an assumption, there.

    If the simulations are not accurate, then it is not the case that “people die that it could have saved”, since in the case of inaccurate simulation, no saving is actually possible. The ESAI is torturing a stranger with your name because it can’t actually save anyone at all?

    The possible missing assumption is that it is actually possible to save someone to an accurate simulation if they are either still alive for the simulation to be based upon, or are somehow more easily preserved (ie, via brain cryonics) for the purpose of being saved. But if accurate simulations can be made from cryonically preserved brains, we’re back to the branch where there’s no motivation for torturing you: the accurate simulations can indeed be made, so what’s the point? You might need to precommit to supporting universal brain cryonics, though, perhaps.

    It would only make sense that “people die that it could have saved” if the only way that accurate simulations can be made is from still living brains, and only inaccurate simulations can be made otherwise. But in that case, you are still safe if you die without having a simulation made from your living brain.

    In that case, you can only be tortured if you actually want to be preserved by the ESAI in the first place, and are alive when it comes into existence, and are too dumb to find out if it thinks you deserve to be tortured for not contributing more to its existence.

  15. DrVanNostrand says

    @Siggy #15
    Thanks for the clarification, but it still seems like a laughably ridiculous theory. You have to believe in so many random ideas about human consciousness and the future of AI, with no current data to support that such a thing is possible, let alone even remotely likely. The funny thing is that I’d heard of Roko’s Basilisk before, but never really looked into it seriously. I assumed it was just some idiotic wanking by futurist weirdos. This blog post makes me think my original impression was too kind. It’s flat-earth level drivel.

  16. PaulBC says

    Aside from the implausibility of it all, I philosophically disagree with the idea that I am my clone. I go in the opposite direction, believing I am not even the same person as my future self. There’s an old Existential Comic about this one. I think that selfishness is on some level irrational but psychologically necessary. On the other hand, “selfishness” in favor of my fantastical future clone is not psychologically necessary at all, and is instead psychologically baffling.

    I agree with all this, or as Adam Smith put it in his Chinese earthquake scenario: “If he was to lose his little finger to-morrow, he would not sleep to-night; but, provided he never saw them, he will snore with the most profound security over the ruin of a hundred millions of his brethren, and the destruction of that immense multitude seems plainly an object less interesting to him, than this paltry misfortune of his own.”

    Unless I’m actually sharing the consciousness with my clone somehow, I would be upset in the abstract, but there’s a limit to how much leverage you’ll get out of me from it. About as much as anyone else you threatened to torture. If I felt personally responsible for them, I might go to even greater lengths, but why would I feel personally responsible for this random being who just happens to share my experience up to some point? In fact, you could probably affect me more just by picking an “innocent bystander.” (Future clone: sorry, but I take no responsibility for your well-being. You understand that right?)

  17. mollyparnis says

    Decision theory sounds the same as game theory to me, where you always lose because the playing board has been predefined. We can make our own decisions by making our own rules to any “game.”
