Self-Improving Artificial Intelligence & The Methods of Rationality

Who: Eliezer Yudkowsky
Where: Belfer Hall Room 430
When: Wednesday, March 2nd, 2011, 8:30PM
Transcribed by: Olivia Friedman
Edited by: Yitzchok Pinkesz
Audio by: Chumie Yagod
A PDF copy of the presentation is available for download.



(slide 1) Yitzchok: Hello and thank you all for coming tonight. My name is Yitzchok Pinkesz. Last night the computer science club held its first student presentation on practical computing. It was the first of a four-part series on web development and design. We plan on having a presentation every Tuesday and already have some lined up through March and into May. Tonight I would like to introduce our guest speaker, here from California. He is a cofounder and senior research fellow at the Singularity Institute for Artificial Intelligence. He has authored numerous papers on AI. He has also authored the Sequences of Less Wrong and the extremely popular Harry Potter fan fiction Harry Potter and the Methods of Rationality. Ladies and gentlemen: Eliezer Yudkowsky.


(slide 2) Eliezer: Good evening. Am I being broadcast at this time?… You’re supposed to respond yes or no to that.
QUESTION: What was the question? Eliezer: The question was can you hear me (laughter)... but I will try again. Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and she also participated in anti-nuclear demonstrations. Rank the following statements from most probable to least probable.

  • Linda is a teacher in an elementary school
  • Linda works in a bookstore and takes yoga classes
  • Linda is active in the feminist movement
  • Linda is a psychiatric social worker
  • Linda is a member of the League of Women Voters
  • Linda is a bank teller
  • Linda is an insurance salesperson
  • Linda is a bank teller and is active in the feminist movement

Now, you don't actually have to do that, (slide 3) but when a group of experimental subjects was asked to do that, 89% of the subjects thought that "Linda is a bank teller and is active in the feminist movement" was more likely than "Linda is a bank teller." (slide 4) Now, if it's not obvious to you that this is wrong, consider: if not all of the bank tellers are feminists, then it is necessarily the case that there are more bank tellers than there are people who are both feminists and bank tellers. (slide 5) Another similar case: you have a die with four green faces and two red faces. The die is rolled twenty times, and we record the series of green and red faces; if your chosen sequence appears, you win $25. This was done with real money. Now I would like you, this time, to actually take a moment and decide which sequence you would bet on:

  1. RGRRR
  2. GRGRRR

Did everyone pick? Raise your hand if you picked a sequence to bet on. (slide 6) 65% of the 125 undergraduates who were playing this for real money picked sequence two. (slide 7) Now the actual answer, of course, is that sequence one appears inside of sequence two, so sequence one appears anywhere sequence two appears. This is known as a conjunction fallacy, because you are assigning greater probability to the more complicated event, the event that is strictly more complicated. Sequence two is equal to sequence one plus the additional requirement that it be preceded by a green face (removes mike). The conjunction fallacy is called that because it violates an actual law of probability theory, which states that the probability of the compound event A-and-B is always less than or equal to the probability of A alone. In other words, "A happens, whether or not B happens" is always at least as probable as "A happens and B happens." This illustrates what's called a heuristic. What we currently believe to be happening when people pick sequence two is this: they want to know which of these sequences is most likely to come up, which of these sequences is most likely to win them the $25. (slide 8) But, sort of unnoticed, the way their brain processes this question is that it flips it around and asks which of these sequences most resembles the die. The die is mostly green, so the sequence with the most green in it will appear to most resemble the die. A judgment of representativeness has been substituted for a judgment of probability. And this works a lot of the time. It is a decent heuristic that also gives rise to a systematic, predictable bias. You can experimentally reproduce a certain type of human error.
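
As a sanity check (my sketch, not part of the talk), here is a small Python simulation of the die experiment. The sequence labels follow the slide, and since GRGRRR contains RGRRR as a substring, sequence one must win at least as often in every batch of trials:

```python
import random

def simulate(n_trials=100_000, seed=1):
    """Roll a four-green/two-red die twenty times per trial; count how
    often each betting sequence appears as a run somewhere in the rolls."""
    rng = random.Random(seed)
    faces = "GGGGRR"  # four green faces, two red faces
    wins = {"RGRRR": 0, "GRGRRR": 0}
    for _ in range(n_trials):
        rolls = "".join(rng.choice(faces) for _ in range(20))
        for seq in wins:
            if seq in rolls:
                wins[seq] += 1
    return wins

wins = simulate()
# Any roll string containing GRGRRR also contains RGRRR,
# so betting on sequence one can never do worse.
assert wins["RGRRR"] >= wins["GRGRRR"]
```

The assertion holds deterministically, trial by trial, which is exactly the conjunction rule in action.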

(slide 9) At the Second International Congress on Forecasting, two groups of professional analysts were polled. The first group was asked to rate the probability of a complete suspension of diplomatic relations between the USA and the Soviet Union, back in 1983; the second group was asked about a Russian invasion of Poland followed by a complete suspension of diplomatic relations between the USA and the Soviet Union. The group asked to rate version two responded with significantly higher probabilities. (slide 10) It is another case of the conjunction fallacy, because version two is the strictly more complicated story. (slide 11) And the moral of all this is that even though adding details to a story necessarily makes that story less probable, the details can also make it sound more plausible. This brings us to the artificial intelligence part, which we will be getting to in just a few moments, don't worry. In general, if you look at the people who are selling you futurism, you'll find two kinds of people out there doing it. There is a group of people who are telling you wonderfully detailed, entertaining stories that sound nicely plausible, and there are people who are weighing each and every additional detail they add to their stories and asking: can I justify this detail, does it hold up under scrutiny? Almost anything you see on television is likely to be storytelling of the entertaining type. There really is a just tremendous difference between these two professions, once you realize that the key question is whether a story has become so complicated that it can't even support its own details. If you want detailed, supported futurism, you want to look at, say, the Oxford Future of Humanity Institute; it's definitely done more by analytic philosophers than by the sort of people you see on television.
Also in this introduction I’ve introduced a very counterintuitive idea except that this audience I hope will actually not find it so counterintuitive. The idea is that there are rules for how to think. Even if you’re dealing with probabilities, there are rules for how you handle uncertainty. I’d expect this to be one of the rarer upsides to yeshiva education, when you’re done processing all that Gemara, the idea of rules for asking questions and answering them will probably not be all that counterintuitive. I can’t think of any other education anywhere that will actually teach you that there are rules for how you think.

(slide 12) On insensitivity to predictability: some experimental subjects were presented with a description of a teacher giving a lesson. Some subjects were asked to evaluate the quality of the lesson: is this in the top ten percent of lessons, the bottom twenty percent of lessons, is it about average, such that fifty percent of lessons are worse and fifty percent are better? Other subjects were asked to predict the percentile standing of the teacher five years later: five years later, will they be in the top 20% of teachers, the bottom 20% of teachers? (slide 13) And the two groups gave equally extreme judgments. Why is this odd? Because if the quality of a single lesson does not perfectly predict how good a teacher you are five years later, then your predictions five years out should be less extreme than your judgments of the lesson itself; the top 20% of lessons cannot all come from people who end up in the top 10% of teachers. If there is any regression to the mean, if the lesson is less than perfectly diagnostic, some of the people who seem like good teachers now will seem like worse teachers later, and someone who gives a lesson in the top 10% of lessons will, given random noise, probably be in, say, the top 20% of teachers five years later. The other thing you will find in incautious futurism is that they have no idea what they can predict and what they can't predict. They will try to predict anything. It's the equivalent of saying, based on this one lesson, based on a paper description, exactly how good a teacher this person will be five years later. They don't automatically realize: wait a minute, this information I have is not perfectly diagnostic. We don't ask how much noise there is in our judgments; we don't ask how little we know.
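
A quick illustration of that regression effect (my sketch, not from the talk): model each observed lesson as true teaching skill plus one-off noise, then look at where the top decile of lessons lands on a second noisy observation.

```python
import random

def regression_demo(n=10_000, seed=0):
    """Simulate regression to the mean for noisy lesson observations."""
    rng = random.Random(seed)
    # True skill and observation noise have equal spread here,
    # so a single lesson is only partially diagnostic.
    skill = [rng.gauss(0, 1) for _ in range(n)]
    lesson_now = [s + rng.gauss(0, 1) for s in skill]
    lesson_later = [s + rng.gauss(0, 1) for s in skill]
    # Pick the teachers whose lesson today was in the top 10%...
    top = sorted(range(n), key=lambda i: lesson_now[i], reverse=True)[: n // 10]
    mean_now = sum(lesson_now[i] for i in top) / len(top)
    mean_later = sum(lesson_later[i] for i in top) / len(top)
    return mean_now, mean_later

now, later = regression_demo()
# ...their later scores are still above average, but less extreme.
assert later < now
assert later > 0
```

The same teachers remain better than average later (the lesson was diagnostic of something), but their later standing is markedly less extreme than their best lesson suggested.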

(slide 14) And as those of you who have read Harry Potter and the Methods of Rationality will remember, the fundamental question of rationality is: what do you think you know, and how do you think you know it? Most aspects of the future are simply not very predictable. For me to come here and claim that I know anything at all about the future, which is generally not all that predictable, or about the role of artificial intelligence in the future, is a rather burdensome statement. You should demand that I argue it well. Now, as a point of fact, this is not going to be a twenty-hour talk, and therefore I will not be able to argue it half as well as I would like. This is an attitude that the yeshiva audience in particular, I would expect, understands exactly: when someone comes to you and claims they know something about the future, the first thing you ask is, what do you think you know, and how do you think you know it? Something I think I know: (slide 15) ♫♪♫One of these technologies is not like the others, one of these technologies doesn't belong. Can you guess which one is not like the others, before I finish my song?♫♪♫ And if any of you did not recognize that tune, you can raise your hand and make me feel really old. (Chana Yudkowsky raises her hand) Thank you to the girl in the second row. So one of these technologies is not like the others, because we have three cool gadgets and one thing that actually has something to do with the mind. Going between planets is cool; curing cancer is cool and very good; and nanomanufacturing is cool but possibly good and possibly very bad, and it would have a very large impact upon the economy. But one of these trumps all the others, and it's not because it has the word "artificial" in it, it's because it has the word "intelligence" in it.
(slide 16) We sort of overlook the power of intelligence because we already have fire, we already have language, we already have nuclear weapons, and we forget that chimpanzees and rocks don't have these abilities. When you think of intelligence, don't think of getting a high score on a test; that is a rather minor use of intelligence. Think of skyscrapers, think of spaceships, think of money, think of science, think of being human. (slide 17) Talking about intelligence evokes images of calculus, chess, good recall of facts, social persuasion, enthusiasm; it's not a function of your liver, it happens in the brain somewhere. (slide 18) What do I think I can guess? I think I can guess that technologies which impact upon cognition, which have something to do with thinking and how we think, will end up mattering the most in the shape of the future, because intelligence is something more powerful and significant than all the wonderful, amazing gadgetry that popular magazines will tell you is futuristic. "Futuristic" is a word that marketers use to sell you products; they want you to associate the word futuristic with your iPhone. So they give you this notion that the future is about which interesting devices you have. There's an alternative perspective, pioneered I would say by Vernor Vinge, which says that the human brain has been more or less constant for fifty thousand years, and if you do anything which significantly changes the human brain, or significantly changes the sort of intelligences that are doing the thinking around here, you have made a break with the past that is unlike the invention of your iPhone. It is unlike the invention of electricity, it is unlike the invention of agriculture, because for the very first time, sometime in the next few decades, we are going to have technologies that change the type of thinking that gets done on this planet.

(slide 19) What else do I think I know? (slide 20) I'd like to tell a little drasha. And the drasha is: imagine a planet where nobody has the faintest idea how to add anything. They can sort of add instinctively, intuitively, but they have no idea what exactly they're doing when they perform addition, so they try to build artificial adding machines. And the way they try to build their artificial adding machines is by entering in a list of interesting arithmetical facts, such that when you take the number seven and the number six and put them together, you get thirteen. And when you take the number nine and the number six, you get the number fifteen. And if you take the number seven and the number two and put them together, you get nine. But their machine cannot take those first three facts and deduce from them that thirteen plus two is fifteen, which does logically follow, because they don't quite realize how addition works. In this alternative world, there are machines that will add numbers up to twenty, but doing truly general addition like humans do, adding numbers in the range of forty to sixty or even a hundred, is beyond the reach of current technology. Only humans can add numbers that high. In this world there are some views on the problem of artificial general addition. (slide 21) For example, there is the idea that addition is a framing problem: what "twenty-one plus" equals depends on whether it's "plus three" or "plus four." So the way to get artificial general addition is that you just have to program in enough of those tiny little details and facts that you can cover most of the territory that you need addition for. Then there is an alternative approach which says that you need an artificial arithmetician which can understand natural language, so that it can learn all those different arithmetical facts by reading the World Wide Web. There is the idea that the only way to develop a general arithmetician is to do it the same way nature did: natural selection.
There is the idea that top-down approaches have simply failed to produce arithmetic and that you need a bottom-up approach to make addition emerge. (slide 22) There is the idea that you can do it using neural networks, which work just like the human brain and can be trained to perform addition without our ever understanding how it works. Or that the reason you cannot program artificial general addition is that you just don't have calculators as powerful as the human brain, and that Moore's Law says we will get calculators as powerful as the human brain on April 27, 2031, between 4 and 4:30 in the morning. Or that the best way to get an artificial general adder is to scan in a human brain and simulate the neural circuitry that humans use for addition, since after all humans are the only known example of addition that really works. There is the idea that since Gödel's Theorem shows no formal system can ever capture the properties of arithmetic, and classical physics is formalizable, clearly addition must exploit quantum gravity. But really, if arithmetic were simple enough for humans to program, we couldn't count high enough to build computers. Then there is John Searle's Chinese Calculator thought experiment, and the idea that we will never know the nature of arithmetic because the problem is just too hard. (slide 23) Lesson 1: When you're missing a basic insight, you must find that insight. Workarounds may sound clever; they're not. You cannot talk sensibly about solutions until you are no longer confused. And conversely, when you actually do find the solution, the way you'll know it is a solution is that you will no longer be confused after you know it. If I tell you, for example, that arithmetic is an emergent property of bottom-up systems, it feels like you know a fact, but you cannot actually make any new experimental predictions, and addition still has that wonderfully mysterious quality it had before I told you it was emergent.
The word "emergence" is actually functionally isomorphic to the word "magic," but that's a whole separate thing. (slide 24) Moral 2: If you have to put an infinite number of surface cases into your computer, it means you don't understand the underlying process that generates all these separate cases you are putting in. Now you may ask: well, how do you know that intelligence is understandable? Maybe it is just a permanent mystery; maybe there is something inherently mysterious about intelligence.
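
As a toy illustration of Moral 2 (mine, not the speaker's): a machine built from memorized surface cases fails the moment it steps off its table, while a machine embodying the generating process covers every case at once.

```python
# Memorized "interesting arithmetical facts," as in the drasha.
FACTS = {(7, 6): 13, (9, 6): 15, (7, 2): 9}

def table_add(a, b):
    """Adds only by lookup; has no idea what addition is."""
    try:
        return FACTS[(a, b)]
    except KeyError:
        raise ValueError("beyond the reach of current technology")

def general_add(a, b):
    """Embodies the underlying process: no table of cases needed."""
    return a + b

assert table_add(7, 6) == 13       # a memorized fact works...
assert general_add(13, 2) == 15    # ...and this follows logically,
try:                               # but the table machine can't deduce it:
    table_add(13, 2)
except ValueError:
    pass
```

The lookup table would need an entry for every pair of numbers you ever care about; `a + b` is one line because it captures what the cases have in common.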

(slide 25) Now I would like to regale you with one of my favorite quotes; it is by a fellow named Lord Kelvin. Poor Lord Kelvin: he had a good scientific career, helped lay the first transatlantic cable, and gave his name to the Kelvin scale of temperature, and he will be remembered forever after for a few really stupid things that he said that were extremely quotable. It goes as follows:

“The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on. Its power of directing the motions of moving particles, in the demonstrated daily miracle of our human free-will, and in the growth of generation after generation of plants from a single seed, are infinitely different from any possible result of the fortuitous concurrence of atoms. Modern biologists were coming once more to the acceptance of something and that was a vital principle.”

This being from back in the nineteenth century, when nobody had any idea how biology really worked. No one knew that there were tiny little molecular machines down there, and so they thought it was magic. They didn't actually call it magic, that would not have been scientifically respectable in the nineteenth century; they called it élan vital. But it was semantically the same as magic, because telling you that it was élan vital or vis vitalis told you exactly as much as telling you that it was magic in terms of what new experimental predictions it let you make; i.e., none. And this is just a useful quote to remember, because every time you run into something new that is mysterious, it comes as a tremendous shock to the current generation. If you had actually lived over the past 3,000 years, and you had seen the stars go from everlasting permanent mysteries to the domain of science, and you had seen life go from everlasting permanent mystery to the domain of science, and you had seen matter go from the domain of alchemy to the domain of science, and then you looked at the brain and you didn't quite understand how it worked on the very first try, you'd say: I'm not falling for that one again. But people have no sense of history, and they think that chemistry has never been mysterious, the stars have never been mysterious, these things have always been in the proper domain of science, and anyone who used to think the stars were mysterious was just being silly, going all mystical about such an obviously scientific subject as biology. Then you come across intelligence, and you don't know how it works, and you panic. Don't panic. (slide 26) As Edwin Thompson Jaynes once said, "If you are ignorant about a phenomenon, this is a fact about your state of mind, not a fact about the phenomenon itself." Which I sometimes summarize by saying: a blank map does not correspond to a blank territory. Or: ignorance is in the mind, not in reality.
If you see something that seems wonderfully mysterious, and you start worshipping it because it seems so wonderfully mysterious, what you are actually worshipping is your own ignorance. There is no way that the world itself can have the property of being unknown; all ignorance ever is, is some particular person being ignorant of something. You cannot solidify ignorance into a special kind of ignorant atom, where you take a lot of ignorant atoms and use them to make a large amount of ignorance. Ignorance is a property of minds, not of thingies. (slide 27) What do I think I can guess about the future, which is generally not very predictable? I think I can guess that today's difficulties in constructing artificial intelligence are not because intelligence is inherently, permanently, and everlastingly a separate category of mysterious things apart from biology and the stars, but because we still don't know what the heck we are doing.

(slide 28) What else do I think I know, and how do I know it? I think the projector has stopped working. The more amazing labor-saving devices we have, the more time we spend getting them to work. (slide 29) I think that there's a design space of possible minds that is very much wider than the space of human minds (that is a claim I will be supporting with more detail a bit later), and I think that human minds are actually pretty near the bottom of the cosmic scale of what sort of intelligent physical objects you could construct. This is very clear when you look at the brain's hardware. (slide 30) You've got the neurons transmitting their little electrochemical spikes by pumping ions in and out of the membrane of the neuron, with the result that they transmit signals at the breakneck speed of one meter, or at best one hundred meters, per second: less than a millionth of the speed of light. We also know that your computer has a clock speed on it that says something like 2.2 gigahertz, which means 2.2 billion ticks per second, whereas a very rapidly spiking neuron will fire maybe two hundred times per second, about a ten-millionth of that. (slide 31) Even in the realm of heat dissipation, because your brain still has more synapses than your computer has transistors and yet does not bake itself when it runs, even there you can calculate the minimum theoretical energy it would take to do computations like that at room temperature, and once again you should be able to get a factor-of-a-million speedup.
So, knowing the laws of physics, knowing what neurons do and how they work, we can say that it ought to be physically possible to have an artifact that thinks a million times as fast as a human brain, without shrinking it, cooling it, using reversible computing or quantum computing, or any number of other tricks. It looks physically straightforward to say that it is possible to have an artifact that thinks like a human brain, only a million times as fast. At which rate you could do one year of thinking in every thirty-one objective seconds. A whole year would not quite go by like that (snaps fingers); that would require ten million times as fast. But we can be fairly confident that human brains are not the fastest possible thinkers.
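
The arithmetic behind those ratios is easy to check (my numbers, matching the ballpark figures in the talk):

```python
SPEED_OF_LIGHT = 3.0e8      # meters per second
AXON_SPEED = 1.0e2          # m/s, a fast myelinated axon
CPU_CLOCK_HZ = 2.2e9        # 2.2 gigahertz
NEURON_SPIKE_HZ = 200.0     # a rapidly spiking neuron

signal_ratio = SPEED_OF_LIGHT / AXON_SPEED      # light vs. axon signals
clock_ratio = CPU_CLOCK_HZ / NEURON_SPIKE_HZ    # clock ticks vs. spikes

assert signal_ratio == 3.0e6    # axons run at ~a millionth of lightspeed
assert clock_ratio == 1.1e7     # spiking is ~a ten-millionth of a CPU clock

# A million-fold subjective speedup compresses a year of thinking:
SECONDS_PER_YEAR = 365.25 * 24 * 3600
subjective_year = SECONDS_PER_YEAR / 1e6
assert 31 < subjective_year < 32   # about thirty-one objective seconds
```

The "year per thirty-one seconds" figure falls directly out of the million-fold speedup; a snap of the fingers, closer to three seconds, would indeed need another factor of ten.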

(slide 32) What about the software? We're accustomed to measuring intelligence on a scale that goes from village idiot at the bottom to Einstein at the top. This is because if someone is not at least as intelligent as a human, we don't bother dealing with them at all. And right now there is nothing around smarter than human, or we would be having this conversation at that level of intelligence instead. (slide 33) I'd like to say that you ought to visualize the absolute scale, where the zero at the bottom left is a rock; then there are lizards and mice, and then village idiot and Einstein are this little tiny point, practically the same point. It is almost unrealistic that I am letting you see the difference between them on this graph over here. (slide 34) Software is a bit more difficult to analyze than hardware, but at the very least we can talk about all sorts of wonderful features that a mind could in principle have but which human minds don't have. For example, we cannot read or write our own source code, partially because we just don't have the access to rewire our own neurons, and partially because the human brain is a gigantic undocumented mess of spaghetti code that is not modifiable. You know, when we sequenced the genome, we were really hoping to find comments saying what everything did. Humans have about four times the brain volume of a chimpanzee; four times as much hardware is not a significant difference these days on the scale of computer programming. A significant difference is: I think I need to buy one hundred times as many computers to run this code. And if you said that at Google, they'd promote you, because they are always looking for new ways to use all the computers that they have. If there is something wrong with your hearing, you can't just slide out your hearing chip and slide in a new hearing chip instead.
And the human brain is just tremendously noisy, because, well, that's the way natural selection designs things. If there is anyone here who doesn't believe in natural selection, I think I had better mention: tough luck, you're wrong. (slide 35) So I think I can guess that it is possible, physically permissible, to have thinking artifacts which are far more powerful than human. We can see very easily that human hardware falls far short of the theoretical physical limits, and given that the hardware is like that, there is no reason to expect the software is optimal, especially when you consider the biases I showed you earlier. If you were designing something with this much computing power, would you really have it be unable to realize that a Soviet invasion of Poland followed by a breakdown of diplomatic relations is less probable than a breakdown of diplomatic relations for any reason?

(slide 36) What else do you think you know, and how do you think you know it? (slide 37) In our current world there is a very one-directional idiom: intelligence makes technology. Now, yes, we can use technology to make things like paper, reading and writing, and, a somewhat less significant change, telephones, iPhones, printed books, and computers. But nonetheless, that doesn't change the fact that you have a frontal cortex over here, visual cortex over here, temporal cortex over here, and your hypothalamus in the middle; it is the same basic brain design that we were using fifty thousand years ago. (slide 38) Sometime in the next few decades, we are going to be not only using intelligence to make technology but using technology to improve our intelligence, whether that is by genetic engineering, or neuropharmaceuticals, or brain-computer interfaces, or computer-assisted telepathy so that we can have sixty-four-node clustered humans. Or, of course, artificial intelligence. It is a very deep sea change that, instead of just having these little constant brains building and improving technology, we're going to have the technology loop back around and change the nature of the intelligence doing the work. (slide 39) The most extreme form of this idiom would be an AI directly rewriting some of its own source code. The reason why people like me, those who meddle in the destiny of worlds, tend to focus on AI is that if you are trying to do this via genetic engineering, it takes at least 18 years for kids to grow up and become genetic engineers in their own right, whereas if you have an AI modify its own source code, you might be able to get interesting things over the course of eighteen seconds instead. (slide 40) In 1965, a fellow named I. J.
Good, who was actually anticipated by the science fiction editor John Campbell, invented a notion that he called the "intelligence explosion." The idea is that if you have an artificial intelligence that can improve its own algorithms in a way that makes itself even smarter, then it will also be smarter at the job of rewriting its own algorithms, and you get a positive feedback cycle. That may eventually top out somewhere, but if the upper bounds are very far above where we are, the difference between this being, say, infinitely more intelligent than us or merely a billion times more intelligent than us may not make much difference from our perspective. This is what is known as an intelligence explosion. (slide 41) How do we know that the problems don't get more difficult faster than you can make yourself smarter? And the answer is: we don't, but we have the following suggestive evidence. First, going from the last common ancestor of chimps and humans to humans involved four times the brain volume, or six times the frontal cortical area, and this was done by the same idiom of natural selection: make a mistake, mistakes lead to differential rates of reproduction, keep the really good mistakes, repeat for millions and millions of years until you get something interesting. The character of this process did not change over the last five million years, and it didn't seem to encounter any visible major obstacles in going from the last common ancestor of chimps and humans to humans.
Similarly, we have had human brains exerting roughly constant optimization pressure, roughly the same characteristic kind of thinking, over the last fifty thousand years, and it went from flint hand axes to moon rockets. It was not the case that as our technology improved, the problem of improving that technology got harder and harder, so that we got some logarithmic curve that slowed down, where going from rocket version one to rocket version two took a hundred times as long as going from flint hand axes to rockets. We did not see anything like that; there was no slowdown, no logarithmic curve, we just kept on ticking. So given that this is what we see for the difficulty of improving intelligence and the difficulty of creating technology, if you have an AI that is strong enough to improve itself in a way that improves its ability to improve itself, you would expect this at some point to go a whole lot faster than the process we've seen over the course of evolution or over the course of human history. Because in human history, every time we invented a new technology, it did not double our brain size; within an AI, every time it manages to improve its own software, it is improving the very thing that does the improving. (slide 42) Because of some, shall I say, unfortunate media coverage that this kind of thinking has gotten lately, I should mention that this intelligence explosion does not imply, nor require, that more change occurred from 1970 to 2000 than from 1940 to 1970. I'm agnostic on that issue; the president of the Singularity Institute, Michael Vassar, and its largest funder, Peter Thiel, both think it is actually wrong, that we had more change from 1940 to 1970 than from 1970 to 2000.
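
A toy model of that asymmetry (my sketch, not the speaker's): in one loop a fixed brain improves external technology, so capability grows linearly; in the other, each improvement raises the very capacity doing the improving, so capability compounds.

```python
def history_loop(steps, rate=0.1):
    """Fixed brain power improves external tech: linear growth."""
    brain, tech = 1.0, 0.0
    for _ in range(steps):
        tech += rate * brain   # the brain itself never changes
    return tech

def self_improvement_loop(steps, rate=0.1):
    """Each improvement feeds back into the improver: exponential growth."""
    ability = 1.0
    for _ in range(steps):
        ability += rate * ability   # improving the thing that improves
    return ability

# After 100 rounds at the same per-round rate, the recursive
# loop (~1.1**100, about 13,800) dwarfs the linear one (10.0).
assert self_improvement_loop(100) > 1_000
assert abs(history_loop(100) - 10.0) < 1e-9
```

This is only a cartoon; the real argument is qualitative, about which quantity the feedback acts on, not about any particular growth rate.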
(slide 43) It does not imply, does not require, that technological progress follows a predictable curve, such that you could say the intelligence explosion will occur on April 27th, 2031, between four and four-thirty in the morning. The Singularity Institute for Artificial Intelligence, back when we named ourselves, meant by "Singularity" the intelligence explosion. Nowadays "Singularity" is a really loose term that means any sort of amazing cool technology you like, arriving at around the same rate it does now, so that you can have a cool little graph that predicts exactly when you get everything. This is not what we advocate. If we were going to name our institute today, we'd probably pick a different name. (slide 44) What do we think we can guess? I think we can guess that over some threshold level of intelligence, an AI can recursively self-improve and undergo a very rapid ascent to superintelligence before it tops out. Now, this is not true of every possible self-improving AI you could try to build; it has to be good enough to make deep changes to itself, and it probably has to understand the concepts that went into its own creation as well as you do, at least if you plan to do this right. It is not necessarily easy. I don't claim to have the slightest idea when it is going to happen, except that it will probably be somewhere between five minutes and five hundred years from now. If you asked, well, what does your behavior show that you think you know, what is your predicted median arrival time, I'm acting as if it would probably come out to 2035 or something like that. But you want to put very wide bounds of confidence around a guess like that.

(slide 45) What else do I think I know and how do I think I know it? This goes back to the idea that the space of possible mind designs is much larger than the space of human mind designs. (slide 46) Human beings are really remarkably similar to each other; there is a list of human universals which is actually much longer than this one. There is an entire book on all the things that anthropologists have found in every single human culture that has ever been studied. They all made tools, they all had weapons, they all spoke languages within a certain class of grammars, they all tickle, they all have known times for meals, they all have conflicts and mediation of those conflicts, they dance, they sing, they have personal names, they make promises and are offended if you break them, they mourn their dead, and the list goes on and on and on and on. Now, why should that be? Why are human beings so similar? (slide 47) A complex adaptation must be universal within a sexually reproducing species. (slide 48) Suppose you have a complex adaptation, one of those big powerful adaptations with lots of moving parts, some piece of complicated machinery in biology. Let's say six genes are necessary to make it, which is actually a pretty simple piece of machinery as such things go. If each of those six necessary genes is at 10% frequency within the gene pool, then even if you have one person with a complete version of this adaptation, the next time they reproduce their offspring will get three of the six genes on average, and in general the machine will just get hopelessly scrambled unless it's universal. At any given time you can have selection on individual tweaks to the machinery, but not on lots of independent genes that come together to form a useful complicated machine. Evolution has to evolve incrementally. (slide 49) You start with a gene A that's advantageous by itself; then once A has evolved to universality, you can get another gene B that depends on A.
If A has not evolved to universality at the time gene B comes along, if A is only 1% prevalent in the population's gene pool when B comes along, then the fitness advantage represented by B decreases by a factor of one hundred, which means that B will probably never rise to universality. But once you do have A, you can get B that depends on A, and then an alternative version A′ of A that depends on the presence of B, and once you have A′ and B you can have C that depends on them, and so on and so forth, until you have these complex machines with hundreds of parts which will fall apart if you move any one of them. Which, of course, some people quite incorrectly take as evidence of intelligent design. The main differential prediction is that whenever you find a piece of complex machinery like that, you should be able to look around and find predecessors that are simpler, that have fewer of the same parts. If you are designing something intelligently, there is no reason to have simpler predecessors around; if you're evolving something, then there were simpler predecessors at some point in history and you can probably find some traces of them. (slide 50) I'd like to note at this point that there was a famous biologist named J. B. S. Haldane who was once asked what would falsify the theory of evolution, to which Haldane immediately snapped back, "Rabbit fossils in the Precambrian." Something else which would falsify the theory of evolution is the X-Men. Over here we have Storm, who, unlike either of her parents, can throw lightning bolts. Her body can generate electricity, control the electricity, and not be hurt by the electricity. If anything like that happened in one mutation, the theory of natural selection would be falsified. (slide 51) And the same logic applies to the brain: for any bit of complex machinery in the brain, we can have slightly different versions of it maybe, but we all need to have the same basic complex machine.
We all have frontal cortex, we all have visual cortex. Sometimes a single gene gets knocked out and someone is born without an auditory cortex, missing a piece of functional machinery. But by and large, the fact that brain machinery is complicated means that everyone has the same engine running under the hood; we are just painted slightly different colors. This sometimes leads to another class of rather entertaining mistakes in movies.
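The gene-frequency arithmetic behind the universality argument can be checked in a few lines. This is a back-of-the-envelope sketch using the numbers from the talk (six necessary genes at 10% frequency; a prerequisite gene at 1% prevalence), under the simplifying assumption that the genes assort independently.

```python
# Illustrative arithmetic for "a complex adaptation must be universal".

# Six independent genes, each at 10% frequency in the gene pool:
p_gene = 0.10
n_genes = 6

# Chance a random member of the population carries the complete machine
# is the product of the individual frequencies -- about one in a million:
p_complete = p_gene ** n_genes
print(p_complete)

# Gene B only helps when gene A is present. If A is at 1% prevalence,
# B's effective fitness advantage is diluted by that prevalence,
# i.e. cut by a factor of one hundred:
advantage_of_B = 1.0        # B's advantage when A is present (arbitrary units)
prevalence_of_A = 0.01
effective_advantage = advantage_of_B * prevalence_of_A
print(advantage_of_B / effective_advantage)  # the 100-fold reduction
```

So even a fully assembled six-gene machine is vanishingly rare until each part is nearly universal, which is why selection builds such machines one universal gene at a time.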

(slide 52) Consider the Matrix. We have Agent Smith, who at first appears sort of cold and emotionless (must not emote), but the people who write the scripts for these movies are incapable of conceiving of a mind truly different from their own. They do not understand their own design, so they cannot conceive of a mind with a different design. In particular, the only way they know how to model something emotionless is to imagine an emotionally repressed human. They imagine themselves as very repressed and put themselves in the artificial intelligence's shoes, which of course yields the prediction that as soon as you are under enough stress, your repression will fail and the artificial intelligence will start showing emotions. (slide 53) And in particular, Agent Smith will be disgusted with Morpheus over here, and he will look disgusted, with the human universal expression for disgust. Because one of the things that is the same among all studied cultures is facial expressions: the same expressions wired up to the same emotions. We all smile when we are happy, we all frown when we are sad. This is not culture-specific; it is in the wiring. So the artificial intelligences in the movies have the same facial expressions as humans, to say nothing of the same emotions. (slide 54) There's this class of, not so much science fiction as bad illustrations, where you have got the monster carrying off the girl in the torn dress; for some reason it is never a female monster carrying off the man with the torn shirt. Now what could the person who drew this picture possibly have been thinking? Were they thinking about the logic of natural selection and saying: "Hmmm… I bet a swamp monster would evolve to find human females reproductively attractive"? No.
What they are thinking is: "Hmmm… this woman seems attractive, therefore the monster will be attracted to her." As if being attractive were a property of the woman herself, not of the mind looking at the woman. It is another case of the Mind Projection Fallacy, which I suspect is one of the really basic errors in philosophy: to take something that is a property of your mind and project it onto the external world. We think that certain things are desirable, but actually it is a property of a mind that it desires them. (slide 55) And there is, I think, a common fallacy of incautious futurism which runs like this. The first premise is that the smarter you are, the larger a cheesecake you can bake. And the second premise is that Moore's law continues to give us more and more computing power. (slide 56) And from this we can deduce that the future will be full of giant cheesecakes. Power does not imply motive. (slide 57) I think I can guess that features of human nature that we take for granted, because they are so universal in our everyday experience, are not necessary features of all possible minds. The human subspace of possibilities is this tiny little dot we are all packed into, within the much larger space of minds in general. And not all possible agents in that space have the same motives. In particular, not all of them find "niceness" desirable. If we can have possible artificial intelligences that are very smart and not nice, am I going to tell you not to panic? No, I think panic is actually a smart move at this point. I'm not going to tell you not to panic prematurely, because I actually suspect that we are panicking post-maturely, and if we panic any later we would be panicking posthumously.
Nonetheless, the problem posed by the possibility of constructing very smart minds that are not necessarily your friends is one that cannot be solved by banning the study of artificial intelligence, though I fully encourage you personally not to build artificial intelligences that will destroy the world. I know that not all of you in the audience are going to take that injunction to heart. And so, what I would like to do is construct something that is smarter than us and nice.

(slide 58) There's a suggestion that one of the things we can know is that this is impossible. (slide 59) And the argument is: you just cannot predict what a mind smarter than yours will do, because in order to predict a smarter mind, you'd have to be that smart yourself. (slide 60) For example, suppose you have a chess-playing machine and can predict exactly what move it will make at every turn; or actually, let us just take Garry Kasparov, who used to be the human world champion of chess before the machines beat him out for it. If you can predict exactly where Kasparov will move on the chessboard, you must be at least as good a chess player as Kasparov, because you can always just move wherever Kasparov would have moved. For those of you who are here because you have read Harry Potter and the Methods of Rationality, one of its secret agendas is to show that the geniuses in all the other science fiction or fantasy that you try to read are really no smarter than the author. They may be able to make amazing cool devices, they may be able to solve problems that a real person could not solve in their shoes, because the author is just letting them solve it. But their choices within the framework of the story are not actually going to be any smarter than the smartest choice the author could possibly make. This gives all fiction about geniuses a sort of horrible fakeness. And one of the reasons why people like Harry Potter and the Methods of Rationality is that the characters in it are not merely geniuses in theory who can make amazing devices; they are people who actually do smart things within the context of the story. I, of course, do that by cheating: by being smart enough to simulate all the characters in Harry Potter and the Methods of Rationality. But when it comes to an actual artificial intelligence, you cannot cheat. It actually is smarter than you.
So the argument is that you can't know anything about it, and in particular you can't know if it is going to be nice or not nice. (slide 61) But consider: you can't actually program the chess-playing AI by programming every position into it, by taking every possible chess position, looking at that position, saying what seems to you like the best move in it, and pre-programming that move into the AI. That's the equivalent of the artificial arithmetician I was talking about earlier, which programmed every fact independently. You can't do that, because there are too many possible chess positions. (slide 62) The insight you need to solve this apparently impossible problem, to perform this paradoxical feat of creating a machine that plays better chess than you do, is to realize that the code you write does not specify a move in each position. The code you write specifies a search. In particular, it tries to figure out the consequences of a move. It looks into the possible moves that could be made after its own move: where could Kasparov go, then where can I go, then where can Kasparov go; and it tries to predict which moves lead into the space of final board states where its own side has won the game, because that is what we have defined Deep Blue, the chess player, to want. (slide 63) Deep Blue's programmers couldn't predict its exact moves. (slide 64) Okay, so if you cannot predict the moves, why not use a random move generator? Isn't that just as good? (slide 65) No, because the unpredictability of a superior intelligence is not like the unpredictability of random noise or of flipping a fair coin. (slide 66) In particular, the unpredictable moves that Deep Blue was making had the consequence of winning the game. Or to state it a bit more precisely: the predictable fact about Deep Blue was that it would try to find winning moves, moves that lead to the consequence of winning states of the board.
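The kind of search just described, looking ahead through moves and countermoves and preferring lines that end in winning states, is the classic minimax game-tree search. Here is a minimal, generic sketch; it is not Deep Blue's actual algorithm (Deep Blue used far more elaborate specialized search and hardware), and the `moves`, `apply_move`, and `evaluate` hooks are hypothetical placeholders standing in for a real game's rules.

```python
# Minimal minimax sketch: the code specifies a search over consequences
# plus a preference for winning outcomes, not a stored move per position.

def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Return (score, move) for the move whose consequences score best.

    `moves(state)` lists legal moves, `apply_move(state, m)` yields the
    successor state, `evaluate(state)` scores a position. All three are
    game-specific hooks supplied by the caller.
    """
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state), None
    best_move = None
    if maximizing:
        best_score = float("-inf")
        for m in legal:
            score, _ = minimax(apply_move(state, m), depth - 1, False,
                               moves, apply_move, evaluate)
            if score > best_score:
                best_score, best_move = score, m
    else:
        best_score = float("inf")
        for m in legal:
            score, _ = minimax(apply_move(state, m), depth - 1, True,
                               moves, apply_move, evaluate)
            if score < best_score:
                best_score, best_move = score, m
    return best_score, best_move
```

The point of the sketch is the one made in the talk: nothing in this code names a specific move in a specific position; what it specifies is the search and what counts as winning.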
(slide 67) You cannot ever prove by looking at your own code that a given move is going to have a given consequence in the real world. The real world is uncertain; it is incompletely known. And a lot of the time you don't have the computing power to use even the knowledge that you have. What you can determine by looking at the code of an AI is that the search it is performing to find its action works by constructing the consequences of candidate actions and looking for consequences that fall into the space we would regard as good. Of course, in order to do that you have to define what "good" means within the AI, which is a more complicated topic, and one I do not think we have time to get into. (slide 68) But if it has free access to its own source code, can't it just overwrite whatever utility function you try to program into it? Well, Gandhi doesn't want to kill people. Let's say you offer Gandhi a pill that makes him want to kill people. Will Gandhi take this pill? (slide 69) I predict not. Or at least, if Gandhi correctly realizes what this pill does, Gandhi will refuse the pill, because the current Gandhi doesn't want people to die. (slide 70) This is a rough outline of why there ought to exist some sort of theorem, one we presently cannot prove because we don't quite understand the decision theory of self-modifying AIs, which is what I actually work on when I am not writing the non-fiction book on rationality or working on Harry Potter and the Methods of Rationality. There ought to exist a theorem, even though we don't currently know what theorem to prove, let alone how to prove it, which says that if you are a rational agent and you are able to understand the effect of changes to your source code upon yourself, you will in general try to preserve your utility function as you self-modify, self-improve, and rewrite your code to be better at searching for actions with consequences you want. You will not modify your utility function in doing that.
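The Gandhi-pill argument can be made concrete with a toy calculation: the agent scores each possible successor-self by applying its current utility function to the outcomes that successor would bring about. All names and numbers here are invented purely for illustration; this is the shape of the argument, not a real decision theory.

```python
# Toy sketch of the Gandhi-pill argument: a self-modification is judged
# by the CURRENT utility function, applied to predicted outcomes.

def current_utility(outcome):
    """Gandhi's current values: deaths are very bad (weights are made up)."""
    return -1000 * outcome["deaths"] + outcome["other_good"]

def outcome_of(agent_is_murderous):
    """Hypothetical predicted outcomes of running each version of the agent."""
    if agent_is_murderous:
        return {"deaths": 10, "other_good": 5}
    return {"deaths": 0, "other_good": 5}

# Score each successor-self with the values held NOW:
keep_values = current_utility(outcome_of(agent_is_murderous=False))
take_pill = current_utility(outcome_of(agent_is_murderous=True))

print(keep_values, take_pill)
print("refuse the pill:", keep_values > take_pill)
```

Because the evaluation uses the current utility function, the pill-world scores terribly by Gandhi's present lights, so the rational move by his present lights is to refuse it; that is the intuition the conjectured theorem would formalize.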
If you start out nice, you will stay nice. If you start out not nice, you will stay not nice. If you start out trying to maximize the number of paper clips in the universe, then as you improve your intelligence you will go on trying to maximize the number of paper clips in the universe; and if you are successful, you will become superintelligent and the universe will wind up full of paper clips. (slide 71) To avoid the dreaded paperclip maximizer, we would like the first AI that gets smart enough to repeatedly improve itself and become superintelligent to be friendly, nice, benevolent. There is probably a certain amount of demand, I would guess, for knowing what the heck I mean by that; correct me if I am mistaken. It looks like I do have enough time to describe it very briefly. Our current proposal is to have an AI that looks at people, learns models of how they make decisions, and extrapolates those models toward what is called a reflective equilibrium, by asking not "What would you want me to do?" but "What would you want me to do if you knew everything I knew, thought as fast as I did, could consider as many options as I can consider, and had my ability as an AI to completely know my own mental state and modify myself so as not to have any unwanted weaknesses?" This is the notion in moral philosophy of reflective equilibrium: the limit of what you would want as you approached perfect knowledge, the ability to consider all the options, no unwanted weaknesses, self-control. And it is the closest that moral philosophy has ever come to defining, in a computationally describable way, what it means to do what you should. The current proposal at the Singularity Institute is to have a recursively improving AI extrapolate the reflective equilibria of everyone on the planet, and do whatever those reflective equilibria tend to believe the AI should do.
That was our attempt to define good in a way that you can actually compute within a sufficiently smart AI, whereupon the superintelligence goes on to do all sorts of wonderfully nice things and we live happily ever after. So if you have ever wondered what people do nowadays, now that we don't believe in magic anymore, when they try to meddle in the destiny of the world in a nice fashion: this is it.

(slide 73) For more info, visit…

© Copyright YUCSC.