Stuart Russell, Human Compatible - CSPAN, December 28, 2019, 8:00am-9:11am EST
created by cable in 1979, c-span is brought to you by your cable or satellite provider. c-span, your unfiltered view of government. >> this weekend on our author interview program "after words," new york magazine contributor thomas chatterton williams talks race and ethnicity. also today and tomorrow, michael car insurance and aaron ross. and doug wead. check your program guide or booktv.org for a complete schedule of all the programs airing this weekend. now, uc-berkeley computer science professor stuart russell weighs in on the potential threats artificial intelligence may pose to humans and what can be done to protect us. [inaudible conversations] >> all right. hi, everyone. thank you so much for coming to the beautiful faculty club here at uc-berkeley for this event today. we're really excited for the discussion. the event is being filmed by c-span to be broadcast at a later date, so we'll make that available to you as soon as we have it. this event is co-hosted by the a.i. security initiative -- a new hub for interdisciplinary research on the global security and politics of artificial intelligence, housed at the center for long-term cybersecurity -- and by the center for human-compatible a.i. we work to understand the effects of misuse and unintended consequences, how a.i. is changing global power dynamics, and what governance models are
meaningful. our work is oriented with a view over the horizon, and our goal is to help decision makers identify the steps they can take today that will have an outsized impact on the future trajectory of a.i. around the world. this work helps support the broader mission of the center for long-term cybersecurity, which is to help individuals and organizations address tomorrow's information security challenges to amplify the upside of the digital revolution. the center for human-compatible a.i. is a research lab based at uc-berkeley aiming to reorient the field towards provably beneficial a.i. through research. the faculty, researchers and ph.d. students are doing pioneering technical research on topics that include cooperative inverse reinforcement learning, objective functions, human-robot cooperation, value and preference alignment, and multiagent systems. researchers use insights from computer science, machine learning, decision theory, game theory and statistics, as well as the social sciences. we are thrilled to have the founder and director, professor stuart russell, here with us this afternoon to talk about his new book, "human compatible: artificial intelligence and the problem of control." this book has been called the most important book on a.i. so far, "the most important book i've read in quite some time" by daniel kahneman, and "a must read" and "the book we've all been waiting for" by sam harris. stuart russell is known to many of you. he has been a faculty member at berkeley for 33 years. he is also an honorary fellow at oxford. he is the co-author of "artificial intelligence: a
modern approach," which is the standard textbook on a.i., used in over 1400 universities in 128 countries. right now he holds a senior andrew carnegie fellowship, one of the most prestigious awards in the social sciences. and last but not least, he -- [inaudible] also joining us today for this discussion is richard waters, the financial times' west coast editor. he's based in san francisco, and he leads a team of writers focused on technology and silicon valley. he also writes widely about the tech industry and the uses and impacts of technology. current areas of interest include artificial intelligence and the growing power of the u.s. tech platforms. his previous positions at the financial times include various finance beats in london, new york bureau chief, and telecoms editor, also based in new york. professor russell and mr. waters will discuss the implications of a.i., including the expectation that a.i. capabilities will eventually exceed those of humans across a range of real-world decision-making scenarios. we will hear about steps we can take to ensure this is not the dystopian future of science fiction, but a new world that will benefit us all. we will hear from them for about half an hour and then open it up for questions from the audience. after that we will break for a reception out on the terrace here. the book "human compatible" will also be available for purchase, and professor russell has kindly agreed to sign copies for those interested. so with that, i will turn it over to professor stuart russell and west coast editor richard waters. thank you. [applause] >> thank you very much. welcome. thank you for joining us.
stuart, great to see you. if you don't rush out and buy the book after that introduction, i don't know what we'll do. we'll dig into it as much as we can. maybe we'll hold back some secrets so people actually have to pay for this thing, i don't know. [laughter] as a journalist, one of the things that i have found absolutely fascinating about the a.i. debate is this complete schism amongst people who allegedly know what they're talking about. so on the one hand, we have people saying we're never going to get to superhuman intelligence, and even if we did, you know, these machines are perfectly safe. and on the other hand, we have what i think of as the elon musk tendency. and it's a shame that, as much as we all admire him, he has run away with the sci-fi end of this debate, and i think it needs to be anchored in something a little more serious. stuart, what you've done is, you know, both make us aware of the potential and the risks while anchoring this in a kind of real, solid understanding of the science and where we're starting from. i think, you know, this is a really good place to start the debate rather than this schism that we have right now. so, as a journalist, i love a schism, and i'm going to dive straight in. [laughter] so we're here at berkeley, particularly -- [inaudible] and i know, stuart, you did your own doctorate down in a sunnier place, down on the peninsula -- >> the other place. >> the other place. so stanford university runs the 100-year study of a.i., which is this kind of landmark attempt they've made to really map what's happening in a.i., to anchor this debate in some kind of reality going forward. and you quote them saying, you
know, that unlike in the movies, there is no superhuman robot on the horizon or probably even possible. basically denying, you know, that agi or whatever you want to call it is even coming. so how did you come to that? >> is this actually working? you can hear -- okay. [inaudible conversations] okay. ah. [laughter] they could hear me, but i don't think i could keep my voice at a high level long enough. so interestingly, right, over the 70-year history of a.i., a.i. researchers have been the ones saying a.i. is possible, and usually philosophers have been the ones saying it's impossible -- for whatever reason, you know? we don't have the right kind of whatever-it-might-be in our a.i. systems. and usually those claims of impossibility have just fallen by the wayside one after the other. but as far as i know, a.i. researchers themselves have never said a.i. is impossible until now. what could have prompted them? i mean, imagine -- it's a 100-year study, right? 20 distinguished a.i. researchers giving their considered consensus opinion on what's happening and what's going to happen in a.i. so imagine if 20 biologists did
a summary of the field of cancer research and they said, you know, a cure for cancer is not on the horizon and probably isn't even possible. right? you would think what on earth would make them say that, right? we've given them $500 billion of taxpayer money over the last few decades, and now they're telling us actually the whole thing -- i don't understand what justification there could possibly be for a.i. researchers saying a.i.'s not possible except a kind of denialism which is just saying i don't want to think about the consequences of success. it's too scary. and so i'm going to find any argument i can to avoid having to think about it. and i have a long list.
i used to give a talk where i would talk about a.i. and about the risks and then, you know, go through all the arguments for why we should ignore the risks. and i got to about 28 arguments -- kind of like the impeachment, right? the republicans' 28 reasons why you can't impeach donald trump. and i just gave up, because it was taking up too much time in the talk, and i don't want it to take up too much time today. >> [inaudible] [laughter] >> so, you know, you get the usual: well, there's no reason to worry, right? we can always just switch it off. one of my favorites, right? that's the last one: a.i. is never going to happen, and we can always just switch it off. and there are other ones that i won't even mention because they're too embarrassing. >> you know, before we get to what the machines might do to us if we get there, let's kind of focus on the are-we-going-to-get-there question. so, i mean, you make this amazing point that you've lived through three decades of a.i. researchers promising us the world and nothing happening, and now suddenly we're in a period of amazing progress, and they want to tell us that it's not going to happen. >> yes. >> nonetheless, it's the point where we know where we are now. we all know, you know, the massive limitations of deep learning and these models, and we can all see this kind of potential. there's a huge gulf from here to there, and you say, you know, it's going to take big conceptual breakthroughs still to do that, and we don't even know what they are. so what gives you the confidence to think -- what are the breakthroughs that you see, and why do you think they're going to happen? >> so i can tell you the conceptual breakthroughs that i think we need. i mean, you're right that after
we make all those breakthroughs, we might find it's still not intelligent, and we may not even be sure why. but there are clear places where we can say, look, we don't know how to do this, but if we did, that would be a big step forward. and there have already been arguably dozens of breakthroughs over the history of a.i., and, actually, even going back much further. i mean, you could say aristotle was doing a.i. -- he just didn't have a computer or any electricity to do a.i. with -- but he was thinking about the mechanical process of human thought, decision making, planning and so on. on the front of my textbook, actually, we have a little greek text of his which describes a simple planning algorithm: this is how you can reach a decision about what to do. so the idea has been there, and steps have been taken, including the development of logic, which, again, started in ancient greece and ancient india and was revived in the mid-19th century. you know, logic is overlooked these days by the deep learning community, but it is the mathematics of things. right? and the world has things in it. so if you want to have systems that are intelligent in a world that contains things, you have to have a mathematics that incorporates things as sort of first-class citizens, and logic is that mathematics. so whatever shape a superintelligent system eventually takes, it's going to incorporate in some form logical reasoning and the kind of
expressive formal languages that go along with it. let me give a couple of examples of things that clearly need breakthroughs. one is the ability to extract complex content from natural language text. right? so imagine being able to read a physics book and then being able to use that knowledge to design a better radio telescope, right? that, at the moment, is not even close to being feasible. but there are people working on being able to read physics books and certainly on being able to pass exams. the sad thing is it turns out that most exams that we give students, especially multiple-choice exams, can be passed with no understanding whatsoever -- [laughter] of the content. so my friend, a japanese researcher, has been building software to pass the university of tokyo entrance exam, which is kind of like getting into harvard, mit or maybe even getting into berkeley. and her program is now, you know, up there, around the passing mark to get into the university of tokyo. and it still doesn't understand anything about anything. right? it's just learned a whole lot of tricks for doing well on the exam questions. so this is, i think, a perennial problem that the media often overlook. they have the big headline, you know, a.i. system gets into the university of tokyo or whatever, but they don't underline that it still doesn't understand stuff. so being able to understand a book and extract complex content from it and then do reasoning and design and invention with that content would be a big step forward. and i think there's a problem of imagination failure when we think about a.i. systems
because we think, okay, maybe if we try really hard, it can be as smart as us. but if a machine can read a physics book and do that, then that same morning it will read everything the human race has ever written. and to do that, it doesn't even need more computer processing power than we already have. right? so they're not going to be like humans in any way, shape or form. and this is, i think, an important thing to understand. you know, obviously, machines already far exceed human capabilities in arithmetic and in chess and video games and so on, but those are narrow corridors of capability. the corridors are about to get much broader: when we reach human-level text understanding, machines immediately blow by human beings in their ability to absorb knowledge. that gives them access to everything we know, in every language, at any time in history. another really important thing is the ability to make plans successfully in the real world. so let me expound on that a little bit. if you look at alphago, which is a very impressive achievement -- it's the program that beat the human world champion at go -- sometimes when it's thinking about what move to make, it's looking 50, maybe 100 moves into the future, which is superhuman, right? human beings don't even have the memory capacity to remember that many moves. but if you took that same program and applied it to a real embodied physical robot that actually has to get around in the world -- you know, pick up the kids from school, lay the table
for dinner, perhaps, you know, relandscape the garden -- 50 to 100 moves gets you about one-tenth of a second into the future in the physical world with a physical robot. so it simply doesn't help at all. right? so you might think of alphago as being superhuman in its ability to look into the future, but it's completely useless when you take it off the go board and try to put it into a real robot. humans manage to make plans at the millisecond time scale. your brain generates, preloads and downloads into your muscles enormously complex motor control plans that allow you to, for example, speak, right? there are thousands of motor control commands sent to your tongue and your lips and your vocal cords and your mouth and everything. the brain has special structures to store these instructions and spit them out at high speed so that your body can function. so we operate on the millisecond time scale, but we also -- as i was just talking to rich about his daughter's decision to do her ph.d. in molecular biology, which took six years, right? -- we also make decisions on that time scale. six years is a trillion motor control commands, right? so we operate at every scale, from the decade down to the millisecond, and we do it completely seamlessly. somehow we always have motor control commands ready to go, right? we don't usually freeze in the middle of doing something and, you know, wait for 72 minutes for the next motor control commands to be computed and then resume moving. so we already have commands ready to go, but we have the
minute, the hour, the day, the week, the month, the year, and it's all seamless. a lot of that capacity comes from our civilization, which over the millennia has accumulated higher and higher level abstract actions that we learn about through our language and our culture, and that allows us to make these plans. but that ability, both to construct these levels of abstraction and to manage our activities over a long time scale, is not something we really know how to do in a.i. that, to me, would be the one big breakthrough that would allow machines to start functioning effectively in the real world. and there are dozens of groups working on it, and there is actual progress towards a solution. some of the results we've seen recently in games like starcraft illustrate this, because whereas go is sort of a 200-move game, these are 20,000 or 100,000-move games. and yet the a.i. -- [inaudible] >> well, let's kind of leap ahead here for a moment. let's assume that, you know, these problems are being tackled. let's say we get to that point of superhuman intelligence. then surely, i mean, this is heaven, right? as you say at one point in your book, you know, it took 190 years for gdp per capita in the world to go up tenfold. and with the technology we would have at that point, we could do this in one generation, or however long it would take to roll it out. so nonetheless, you know, what could go wrong? and i think when you think about what could go wrong, the very interesting point is that it's not the technology, it's how we designed it at a very fundamental level that you seem most concerned about. talk more about that.
>> yeah. so, i mean, making a new -- i think the economist put it this way: like introducing a second intelligent species onto the earth, what could possibly go wrong? [laughter] in fact, if you put it that way, clearly, intelligence is what gives us power over the world. so if we make things that are more intelligent and, therefore, more powerful than us, how are we going to have power over more powerful entities forever? right? when you put it like that, you say, ah, yes. good point. [laughter] perhaps we should think about that. and so that's what i tried to do. and the first thing to think about is why things go wrong. people know this is a problem. alan turing said basically we would have to expect the machines to take control, right? he was completely matter-of-fact and resigned to this future. so it's not a new thing that elon musk just invented. and i don't think anyone would say alan turing isn't sufficiently expert to have an opinion about a.i. or computer science. and the same with marvin minsky, one of the cofounders of the field itself, and various other people. but turing basically doesn't give you a choice, right? if the answer is we lose -- you know, machines are going to take control and that's the end of the human era -- then there's only one choice, which is to say, okay, we had better stop doing a.i. and for that choice, he actually referred to samuel butler's novel from 1863. and that's the choice. so it's this little science fiction novel about a society
that's developing very sophisticated machines and then decides that it doesn't want to be taken over, doesn't want control of its world taken by the machines. and so they just ban machines -- they destroy all the machines in a terrible war between the pro-machinists and the anti-machinists. but the anti-machinists win the war, and now machines only exist in museums. i think that's completely infeasible, for exactly the reason that rich mentioned, right? if we have superintelligent a.i. and we can do this well, that tenfold increase in gdp is very conservative. it just means giving everyone on earth access to the same level of technology and quality of life that we have here at berkeley. right? it's not sci-fi, we're not talking about eternal life or faster-than-light travel, right? that tenfold increase in gdp, just bringing everyone up to a decent standard of living, is worth somewhere between 10 and 20 quadrillion dollars. right? so that's the size of the prize. that's what's creating the momentum, and saying, oh, we're just going to ban a.i., right, is completely infeasible. not to mention the fact that a.i., unlike, you know, nuclear energy or even, you know, crispr babies, proceeds by people writing formulas on whiteboards. you can't ban the writing of formulas on whiteboards. so it's really hard to do much about it. so why is making better a.i. a bad thing? right? and the reason is because the way we've designed our a.i. technology from the beginning has the property that the smarter you make the a.i.
system, the worse it is for humanity. why? because the way we build a.i. systems, and always have, is essentially a copy of how we thought about human intelligence. human intelligence is the capability to take actions that you can expect will achieve your objective, right? this is the economic, philosophical notion of the rational agent. and that's how we've always built a.i.: build machines that receive an objective from us and then take actions that they can expect will achieve that objective. the problem is, as we've known for thousands of years, we are unable to specify objectives completely and correctly. and this is the fundamental problem. this is the legend of king midas; this is why the third wish that you give to the genie is always "please undo the first two wishes, because i completely ruined everything." right? but we may not get a third wish. if you create a system that's more intelligent, more powerful than human beings and you give it an incorrectly specified objective, then it will achieve that objective, and you're basically creating a chess match between us and that machine. and we lose that chess match, and the downside of losing that chess match is arbitrarily bad. this is the fundamental design error that we made very early on in the field. and, actually, not just a.i.: control theory, economics, operations research -- these fields all operate on this principle that we specify an objective and then some machinery is going to optimize it. so corporations optimizing quarterly profits are already destroying the world. right? we don't need to wait to see how
superintelligent a.i. messes things up. you can see it happening already. corporations are, for all intents and purposes, algorithmic machines, and they are making a mess of the world, and we are powerless to stop them, right? they've outthought us. and they've been doing this for 50 years, and that's why we're unable to fix our climate problem despite the fact that we even know what the solutions are. so to sum up, right, we have to design a.i. systems a different way if we are going to be able to live with our own creation successfully. >> [inaudible] to all our other organizations in many ways, because we simply haven't had anything this powerful that can take -- which is the last thing we want. pretty scary. >> yeah. corporations took us at our word, right? we set them up to maximize shareholder returns, and that's what they did. and that's the problem, right? because, you know -- >> [inaudible] >> economists call it externalities, and sometimes you can fix it by taxes or fines or regulations. but sometimes, as with social media messing up our democracy and our society, you can't. there's no way to tax the number of neofascists that you create on your social media platform. and that's an early example. those social media platform algorithms are very simple learning algorithms that manipulate human beings to make them more predictable sources of revenue. that's all they care about, right? but because they're operating on platforms with billions of users, interacting with everyone for hours every day, they're already a super-powerful force, and they have
misspecified objectives. maximizing click-throughs is another one of those misspecified objectives that keeps messing things up. >> well, we're going to leave plenty of time for questions today, so please start thinking of what you want to ask. but before we do, we shouldn't hold back the punchline from your book, which is that there is an answer, right? i hope so. >> yes. >> i guess we can do this without spoiling it, can't we? >> i think the answer's actually in the first chapter, right? the first chapter sort of presages the narrative arc of the rest of the book. so i don't want to leave everyone with the impression that i'm just one of these doomsayers predicting the end of the world. we have enough of those books already. i can't help being an optimist, because i always think every problem has a solution. if it doesn't have a solution, then it's a fact of life and not a problem. [laughter] i am proposing one way of thinking about a.i. that's different in the following way: if we are unable to specify objectives completely and correctly for what we want our machines to do, then it follows that the machine should not assume that it knows what the objective is. all our a.i. systems -- every chapter of the a.i. textbook -- are based on the assumption that the machine has the correct objective. but that cannot be the case in real life. so we need machines that know they don't know what the true objective is. the true objective is the satisfaction of human preferences about the future, right? what each of us wants the future to be like and what we don't want it to be like. that's what the machine should be trying to help us with. but it knows that it doesn't
know what our preferences are. and this is a kind of machine that in some ways we're already quite familiar with, right? how many people have been to a restaurant? right? when you go to a restaurant, does the restaurant already know what you want to eat? no, not usually -- unless you go there a lot, right? the japanese place across the road, they just bring me my lunch. generally speaking, they have a menu. why? so that they can learn what you want. they know that they don't know what you want, and they have a process, a protocol, to find out more about what you want. now, they're not actually finding out in complete detail exactly how many grains of rice you want on your plate and exactly where you want the little grill marks on your burger or any of that stuff, right? they're getting a very, very rough sense -- you know, if they have 16 items on the menu, that's only 4 bits of information about your preference for your main course, right? but that's the protocol, where the restaurant is like the a.i. system, and it knows that it doesn't know what you want. so it has a protocol to learn enough that it can make you happy. and that's the general idea, except this is going to be much more radical. this will be not just what you want for dinner, but what you want for the whole future, and what everyone on earth wants for the whole future. and we can show two important properties of these systems. number one, they will not mess with parts of the world whose value they don't know about. so in the book, and i've often used this example in talks, right, suppose you have a domestic robot that is supposed to be looking after your kids because you're
late home from work, and it's supposed to be cooking dinner, and there's nothing in the fridge. what does it do? well, it looks around the house and it spots the cat, and it calculates the nutritional value of the cat, and then it cooks the cat for dinner, right? because it doesn't know about the sentimental value of the cat, right? whereas with systems that know they don't know the value of everything, it would say, well, the cat may have some value to you alive that i don't know about, and so cooking the cat wouldn't be an option -- at least, it would ask permission. it would call me up on my cell phone: is it okay if we cook the cat for dinner? and i'd say, nope. [laughter] you know, is it okay if we turn the oceans into sulfuric acid in order to reduce the level of carbon dioxide in the atmosphere? no, don't do that. so point number one, you get minimally invasive behavior.
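to put rough numbers on that point, here is a toy sketch -- not from the book, all values invented -- of the comparison such a robot is implicitly making: it is unsure whether the cat is a worthless stray or a beloved pet, and it weighs cooking the cat, asking permission first, or leaving the cat alone.

```python
# toy sketch of "minimally invasive behavior": the robot is uncertain about the
# value the human places on the cat, so it compares acting, asking, and doing
# nothing. all numbers are invented for illustration.

# robot's belief about the value (to the human) of keeping the cat alive
cat_value_beliefs = {0.0: 0.5, 100.0: 0.5}   # maybe a stray, maybe a beloved pet

dinner_value = 10.0    # value of getting dinner on the table tonight
asking_cost = 0.1      # minor annoyance of a phone call

def expected(payoff):
    return sum(p * payoff(v) for v, p in cat_value_beliefs.items())

# option 1: cook the cat without asking
cook = expected(lambda v: dinner_value - v)

# option 2: ask first; the human only says yes when the cat is worth less than dinner
ask = expected(lambda v: max(dinner_value - v, 0.0)) - asking_cost

# option 3: leave the cat alone and skip dinner
do_nothing = 0.0

options = {"cook the cat": cook, "ask permission": ask, "do nothing": do_nothing}
print(options, "->", max(options, key=options.get))
# with these numbers: cook = -40.0, ask = 4.9, do nothing = 0.0, so it asks first.
```

if the robot were certain the cat was worthless, cooking would win; it is the residual uncertainty about what the human values that makes asking, or leaving things alone, the better option.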
so it can still do things, as long as it understands your preferences in a particular direction. i would like a cup of coffee: if it can get me a cup of coffee without messing with the rest of the world, then it's quite happy to do that. the second point is that it allows itself to be switched off, and this is sort of like the one-plus-one-equals-two of safe a.i., right? if you can't switch it off, we're toast. so why will it allow itself to be switched off? because it doesn't want to do whatever it is that would cause us to want to switch it off, right? so by allowing itself to be switched off, it avoids the negative consequences, whatever those are. it doesn't know why i'm angry with it, doesn't know why i want to switch it off, but it wants to prevent whatever that is from happening, so it lets me switch it off. and this is a mathematical theory, right? we can prove that as long as the machine is uncertain about human preferences, it will always allow itself to be switched off. and as that uncertainty goes away, our margin of safety also goes away. machines that believe they have complete knowledge of the objective will not allow themselves to be switched off, because that would prevent them from achieving the objective. so that's the core of the solution.
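the shape of that switch-off argument can be sketched in a compressed form (this is a simplified illustration, not the book's full theorem): let U be the net value to the human of the robot's proposed action, known to the human but not to the robot, which only has a belief over U. acting immediately is worth E[U]; switching itself off is worth 0; deferring -- proposing the action and leaving the off switch in the human's hands -- is worth E[max(U, 0)] if the human decides correctly.

```latex
% simplified off-switch sketch: deferring to the human is never worse,
% and is strictly better whenever the robot assigns positive probability
% to both U > 0 and U < 0.
\[
  \mathbb{E}\bigl[\max(U,\,0)\bigr] \;\ge\; \max\bigl(\mathbb{E}[U],\,0\bigr)
\]
```

so a robot that is genuinely uncertain about U prefers to let the human switch it off, while a robot that is certain U > 0 gains nothing from deferring -- which is exactly the "margin of safety goes away" case.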
so it's a very different kind of a.i. system, and it requires actually rebuilding all of the a.i. technology that we have, because, as i said, all of that technology is based on incorrect assumptions. we haven't noticed, because our a.i. systems have been stupid and constrained to the lab, right? the constrained-to-the-lab part is now going away: they're out there in the real world messing things up. and the stupid part is also going away. so we have to solve this problem; we have to rebuild all that technology from the foundations up before the systems get too powerful and too intelligent. >> [inaudible] i've got one more thing that i wanted to raise. let's say the machines don't kill us. let's say they give us what we want. then we have to work out what we want -- and not just individually, and i have no idea what i want, but in the aggregate. i mean, this is going to be a phenomenal problem for humanity, and you come to the image of wall-e, which i'm sure everybody's seen, kind of sitting back in an easy chair being fed by robots -- it's a kind of not-with-a-bang-but-a-whimper end to the world. but, i mean, how on earth are we going to look forward to a future where the machines give us what we want? >> so i think this is a
problem that i don't have a good solution for, and it's not really technological. this is really a social, cultural problem of how we maintain the vitality of our civilization when, in fact, we no longer need to do the things that constitute a vital civilization. for example, let's just think about education. why do we educate? well, as a very practical matter, if we didn't, our civilization would collapse, because the next generation wouldn't be able to run it, right? so human cultures and even animal species have figured this out, right? they have to pass on knowledge to the next generation, otherwise kablooey. and if you add it up over history, about one trillion person-years of effort have gone into just passing our civilization on to the next generation. because we have no choice. we can put it all down on paper, but paper's not going to run the world, right? it has to get into the brains of the next generation. but what happens when that's not true? what happens when, instead of going through all that long, painful process of educating all those humans, we can just put all the knowledge in the machines and they take care of it for us? and this is a story that e.m. forster actually wrote. if you want one takeaway, if you can't bring yourself to buy the book, you can download it -- because it's no longer in copyright -- e.m. forster's short story called "the machine stops." it was written in 1909, but in the story everyone is looked after by the machine 24/7. we spend most of our time on the internet doing videoconferencing
with ipads and listening to lectures or giving lectures to each other. and we're all a little bit obese, and we don't like face-to-face contact anymore. so it sounds a lot like today -- and it was written in 1909. and, of course, the problem is that no one knows how to run the machine anymore. right? we've turned over the management of our own civilization to the machines and become enfeebled. and that's sort of a modern version of that story. so what do we need to do? i'm reminded of the culture of the spartans. sparta, you know, for all its faults, took a very serious cultural attitude to the survival of their city-state. you know? the typical life in those days seemed to be that, you know, every couple of years you'd be invaded by a neighboring civilization, city-state, whatever, and they would haul off the women and kill all the men, and life was pretty miserable. so sparta decided that it needed to have a very serious civil defense capability. so education for spartans was -- i was just reading another book, which is soon to be out, called "a world without work," and he describes it as 20 years of p.e. classes, right? [laughter] in order to prepare the citizens, both male and female, to fight. so it was a military boot camp that went on, you know, from before you could walk until you were old enough to carry weapons, and that's how they learned to fight. it was a cultural decision, and that was what was valued in the culture, right? so i'm not recommending that we
do that exactly, but some notion of agency and knowledge and capability has to become not just an economic necessity, as it is now, but actually a cultural necessity -- that you're not a valuable human being, i don't want to date you, unless you know a lot, unless you're capable of, you know, skinning a rabbit and catching your own fish and fixing a car and doing this, that and the other, right? so it's a cultural change. and i think it's also then a matter of your own self-esteem, that you don't feel like a whole human unless you're capable of doing all these things and not being dependent on the machines to help you. i can't see any other kind of solution for this problem. the machines are going to tell us, basically, you know, as you may have done with your children: it's time for you to tie your own shoelaces, right? but your children always say, no, no, no, no, you know? we have to leave for school in five minutes and i can't do it. i'll do it tomorrow. right? so that's what the human race is going to do. we're going to say, oh, we'll get around to this agency stuff tomorrow, but for now, you know, the machines have to just help us do everything. right? we're going to be myopic, and that's a slippery slope. we have to work against that slope. >> i think this is a great point to leave this -- at one of the world's great educational institutions, where maybe you won't need to learn anymore. but there's lots of questions now, so we're going to pass around these microphones. i think we have another one here. and since you're not constrained to the book, do ask anything that's on your mind. >> thanks for the talk -- [inaudible] the question of how to think
about this idea that, like, the objective should be unknown. because another way of thinking about what you said is kind of like, well, the objective is satisfying human preferences. human preferences are unknown, so we should satisfy, or respect, the expected value of human preferences. and that's standard decision theory, where, when you think about it -- i guess companies also don't know what -- [inaudible] so they just do what will bring the most expected profit. and so i want to hear what the difference is between your model, where preferences are sort of generally unknown, and the model where the preference is something like maximizing human welfare, and since we don't know that, we use expected value. >> yeah. so that's a great question. in fact, it brings up the point: how come we never noticed this before? so the answer, actually, is that what you say is correct in the standard formulations of decision making: uncertainty about the objective can simply be eliminated. you just replace the objective with its expected value and everything's fine, because everything is linear, blah, blah, blah. but it's false. and the reason it's false is because the environment is a source of additional information about preferences. and the most obvious source is that there are humans in the environment, and what they do provides more information. in fact, you can't just make decisions on the basis of expected value. what the system will do is, for example, ask questions: is it okay if i cook the cat for dinner? right? as opposed to just saying, well, on average it seems like the cat has high nutritional value and,
you know, perhaps they hate the cat, so we may as well just cook it. no, that's not the right answer. the right answer is to ask permission. and so i think there are two reasons we didn't notice this for 70 years. one is that we copied this notion of intelligence from human rationality, and for the most part, in thinking about humans, we just assume that we have objectives -- sort of, of course we know what our objectives are. which turns out not to be true either. we have real uncertainty about our own preferences for the future. and there are, from time to time over the last 50 years, in decision analysis, in philosophy and economics, a few papers that talk about that, but not very many. but the other reason, i think, is that this notion that the environment is an additional source of preference information wouldn't make so much sense if you were thinking about a single human decision maker. but here we're talking about a coupled system, right? a machine that is trying to be beneficial to a human. and in economics there's something called a principal-agent game, where the principal would be an employer and the agent would be an employee -- you know, so the employee, in order to get a raise, tries to find out more about what the employer wants. so you get something like these games. but from the point of view of a.i., it's a different and actually technically inconsistent way of doing things compared with what came before.
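one way to write down the kind of coupled system being described -- a sketch of the formulation sometimes called an assistance game or cooperative inverse reinforcement learning, with notation chosen here purely for illustration -- is as a two-player game in which both players are scored by the same human reward, but only the human knows the preference parameter:

```latex
% sketch of the coupled human-robot formulation (notation illustrative):
% the human knows the preference parameter theta; the robot has only a prior.
\begin{align*}
  &\text{shared payoff:} \quad \sum_{t} R\bigl(s_t,\, a^{H}_{t},\, a^{R}_{t};\ \theta\bigr), \\
  &\text{robot's belief:} \quad P\bigl(\theta \mid a^{H}_{1:t}\bigr) \;\propto\; P\bigl(a^{H}_{1:t} \mid \theta\bigr)\, P(\theta).
\end{align*}
```

because the human's actions depend on theta, they carry information about it; that is why collapsing the objective to its expected value throws information away, and why the robot's best policy conditions on what the human does rather than optimizing a fixed guess.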
>> [inaudible] ready to be on -- [inaudible] >> thank you very much for -- [inaudible] it seems like the model you're proposing is that you have some notion of, like, what the possible variance is in consequences, some way -- [inaudible] a person designing an a.i. has a way of saying, could this be consequential enough that we ought to have a human in the loop. i guess i'm wrestling with this. it feels like some of this might be contrary -- [inaudible] and i guess, maybe more fundamentally, is there any way of not having some human evaluation ourselves of when those systems are really consequential -- of having the a.i. itself be able to tell, yes,
this is really something i ought to go to humans about, as opposed to humans more fundamentally knowing that? >> yeah. so the decision to get help from the humans depends on how expensive it is to get help. so, you know, if someone is busy doing surgery, you don't want to interrupt them to ask how much they want to pay for their coffee. so the more expensive it is to ask the human, the less often the a.i. system ends up asking -- and the more often the human puts up with minor inconveniences. there's no perfect solution; everything works out kind of the way you'd expect. there are some important and difficult technical questions. the most difficult one, i think, is the fact that all the theory we have so far assumes humans are rational, and, of course, we are far from rational. if you're observing the behavior of a human, you want to infer something about their underlying preferences for the future, and that really means reverse-engineering human cognitive architecture, right? so to give you a simple example: lee sedol, in the famous match against alphago -- in order to lose, he had to play some losing moves. if he were perfectly rational, the only conclusion you could draw would be that he wanted to lose the match. that would not be the correct conclusion. instead, you need to understand that he had limited computational ability, limited lookahead, limited decision capacity, and by far the most likely explanation is that he
wanted to win, but his limited abilities prevented him from choosing the right move. and that's a case where, you know, you've got a highly trained human working on a relatively solvable, tiny piece of the world. so, of course, in real life our actions are not even close to rational. just think about what rational means, right? your life consists of about 20 trillion motor control commands. so to be rational, you'd have to pick the first one such that the expected value of the remaining 19 trillion, blah, blah, blah, is maximized. it's completely and utterly, totally bonkers infeasible -- that's a new technical term i just invented, right? so we have to start looking at what are the major ways in which human preferences are realized in the form of behavior, and what are the major ways in which we deviate from pure rationality. the other point is that it's not as if the a.i. system's only source of information is its owner, and it's going to be constantly bugging the owner. everything the human race has ever written is evidence about our preferences, because it describes human beings doing things, and other human beings getting upset about it in many cases. even babylonian clay tablets, which, you know, you hope contain the secrets of the universe -- in fact, they contain boring accounts of joe buying 22 camels and getting 17 bushels of corn and 2 slaves in return. right? but that tells you something about human preferences, right? it tells you the exchange rate between camels and bushels of corn at that time.
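as a concrete toy version of that point, here is a sketch -- the goods, quantities, and the simplifying assumption that each barter was roughly an even exchange are all invented -- of how a pile of trade records could be turned into estimates of relative value:

```python
# toy sketch: recorded barters treated as noisy evidence about relative values.
# assume, purely for illustration, each trade was roughly an even exchange, and
# fit per-good values (in units of bushels of corn) by least squares.

import numpy as np

# each record: (goods given, goods received)
trades = [
    ({"camels": 22}, {"corn_bushels": 17, "slaves": 2}),   # the clay-tablet example
    ({"camels": 5},  {"corn_bushels": 10}),
    ({"slaves": 1},  {"corn_bushels": 13}),
]

goods = ["camels", "corn_bushels", "slaves"]

# one equation per trade: value(given) - value(received) ~ 0
rows = [[given.get(g, 0) - received.get(g, 0) for g in goods]
        for given, received in trades]
A = np.array(rows, dtype=float)

# pin the scale by fixing one bushel of corn at value 1
corn = goods.index("corn_bushels")
b = -A[:, corn]
A = np.delete(A, corn, axis=1)
values, *_ = np.linalg.lstsq(A, b, rcond=None)

implied = dict(zip([g for g in goods if g != "corn_bushels"], values))
implied["corn_bushels"] = 1.0
print(implied)   # rough implied exchange rates, e.g. a camel is worth about 2 bushels
```

this is obviously crude -- real inference has to allow for uneven trades, context, and human irrationality -- but it is the same move at a tiny scale: behavior recorded in text treated as evidence about preferences.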
another interesting thing is that you can actually infer something about human preferences without seeing any humans at all. imagine that we were all on holiday and the berkeley campus were empty -- no humans here at all, right? just seeing the way we have left the world tells us an awful lot about human preferences, because it's the result of us humans pursuing our preferences for decades and decades and decades. right? and we actually published a paper about this. so you can think of the state of the world as a sample of what happens when quasi-rational entities pursue their ends over time, and then you can work back from that to figure out what the ends of the entities who made the world this way must be. right? so we called it the non-naturalistic non-fallacy -- for those of you who know what the naturalistic fallacy is, which says you can't perceive preferences in the world. but in a sense, you can. >> hi. yeah -- [inaudible] so i had a question about -- [inaudible] that doesn't have an objective. so maybe -- [inaudible] because when you were describing -- [inaudible] you know, the a.i. that is considering cooking the cat, you know, it would assess -- [inaudible] does have an objective. so maybe the objective is -- [inaudible] or something like that. so it would have an objective, but the objective was -- [inaudible] but then i kind of wonder if the kind of unfortunate results
could come up as a result of trying to set -- [inaudible] in, like, weird ways. [laughter] yeah. so, yeah. i mean, the kind of thing that would happen in the story that you mentioned. so i kind of wonder whether we can escape this objective structure for a.i., because it kind of sounds like it only depends on how general this objective is, and if we think it has an objective, maybe it's still open to weird kinds of -- [inaudible] that, you know, would lead to -- [inaudible] >> so, yeah, it's a great question. and, in fact, that's sort of what we try to do, right? we're constantly playing, you know, devil's advocate with ourselves, saying, what if -- you know, is there some loophole in this scheme we haven't thought of? so it's true, right? in the design we're proposing, the goal of the a.i. assistant is to satisfy human preferences, but crucially it knows it doesn't know what they are, right? so that's what gives us this margin of safety. that's what makes the machine willing to be switched off, if that's what we want to do, and so on. so one loophole is that human preferences are not fixed. there are actions that the machine can take, you know, as every politician knows, to manipulate human preferences, to modify them to be easier to satisfy. so that seems like a failure mode. that seems like a loophole. and, you know, the obvious answer, you might say, is, oh, well, okay, we have to set things up so that human preferences are sacrosanct, that the machine isn't allowed to modify human preferences. but that's not feasible because,
you know, obviously having a highly capable domestic robot in your household is going to change your preferences, right? you're probably going to become a little bit more spoiled as a result. you know, lots of things change our preferences -- otherwise we'd all have the preferences of newborn babies, whatever those are, right? and we don't. so the question is what are acceptable preference modifications and what are unacceptable preference modifications. and that, i think, is an open question that we're working hard to understand right now. but that, i think, is the main loophole we've identified so far. but this is what we do, right? it's an engineering process, so we try small-scale experiments with a little simulated world and make sure everything's behaving according to our theorems -- that our interpretation of our theorems is, in fact, right: yes, they do behave the way we want. and you get very interesting behavior. it's also interesting to look at the human side of this equation. so if the machine is solving its problem better and better, right, the nice thing is that the smarter your machine, the more beneficial it is to humans under this model, right? and so that's good, because under the old model the smarter the machine, the worse it gets for humans. but there's also an incentive for the human, right, if you formulate this in game theory. the human half of this game actually involves teaching the robot, because the human will benefit by the robot learning more quickly what human preferences are. and so when we make the little simulated world where we have a toy human and a robot, you know,
the toy human in some sense sort of leads the robot around by the hand to show it where to go and where not to go. so you have these human behaviors falling out as a solution to this form of problem. >> we have time for -- [inaudible] >> one last question: [inaudible] -- how do we steer society towards that? but today in technology, with this regeneration of the luddites, people saying technology is bad, and researchers and corporations saying, we will develop the most advanced version of ai that we can, how do we decide how to reengineer these institutions? >> that is a great question. in my experience, scientists never say, i am wrong, you are right. in the best case, in ten years they will all say, of course, we always thought this, we've always done things this way. that would be the ideal outcome. probably if you want to get, for example, google to change the way they do machine learning, the first thing is to get them to see where it is hurting them. if you look at what happened with the photo which classified a
person as a gorilla, that was a huge public relations disaster for google. why did it happen? it happened because they trained their machine learning algorithm with a fixed objective, which i am willing to bet was: minimize the number of errors on the training set. that is what we all do in our machine learning competitions, but the errors are not all created equal. misclassifying a norfolk terrier as a norwich terrier -- these are two categories in the imagenet competition, and 50 years ago they weren't even two different categories of dogs, they were the same kind of dog, very hard to tell apart -- i'm pretty sure the terriers are not going to go on twitter and say, i was misclassified, i'm a norfolk terrier, how dare they, so offensive. apples don't care if they are misclassified as pears, but obviously people care a lot. so google used an incorrectly specified objective function. that loss matrix, with 20,000 categories in the imagenet database, has 400 million entries for the cost of misclassifying an object of type a as an object of type b. no one knows what those 400 million entries are, so why were they using an algorithm that assumed it did know? you can see immediately from this perspective what went wrong and what you need to do: you need machine learning algorithms that operate with uncertainty over the loss function, and those algorithms look quite different. every so often they will refuse
to classify an object. every so often they will go back to the human expert and ask: how much do apples get upset about being called pears, and vice versa? so you get a different type of algorithm.
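as a toy illustration of the flavor -- invented labels, probabilities and loss hypotheses, nothing like a production system -- such a classifier averages the expected cost of each label over several plausible loss matrices and hands the image to a human when no label is clearly safe:

```python
# toy sketch of classification with uncertainty over the loss function: if every
# label has high expected loss under the plausible loss matrices, defer to a human.
# all numbers are invented for illustration.

import numpy as np

labels = ["norfolk_terrier", "norwich_terrier", "person", "gorilla"]
p = np.array([0.02, 0.03, 0.55, 0.40])        # model's class probabilities for one image

# hypotheses about the true loss matrix L[true_class, predicted_class]
mild = np.ones((4, 4)) - np.eye(4)            # every mistake costs 1
harsh = mild.copy()
harsh[2, :] = 500.0; harsh[2, 2] = 0.0        # mislabeling a person as anything is very costly
harsh[3, 2] = 500.0                           # and labeling a gorilla as a person is too
loss_hypotheses = [(0.5, mild), (0.5, harsh)]

ask_cost = 5.0                                # cost of routing the image to a human

# expected loss of each possible prediction, averaged over the loss hypotheses
expected_loss = sum(w * (p @ L) for w, L in loss_hypotheses)

best = int(np.argmin(expected_loss))
if expected_loss[best] > ask_cost:
    print("refuse / ask a human")             # with these numbers, this branch runs
else:
    print("predict:", labels[best])
```

a fixed-loss classifier with the same probabilities would simply output "person" and hope; the uncertainty over the loss matrix is what creates the option of refusing or asking.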
the second point is that those algorithms don't yet exist. so we, the ai researchers who are proposing that everyone should follow this way of doing things -- it is on us to develop those algorithms, to develop this core technology, the learning algorithms and demonstration systems. so one thing we are doing, we are currently figuring out what is the right demonstration system. i'm in favor of a digital personal assistant, because i think a personal assistant needs to understand human preferences quite well in order to be useful, but the preferences of the user vary enormously. a digital personal assistant for donald trump had better be different from the one working for me in terms of the preferences it is pursuing. so that kind of work is important. it's not enough to just keep saying doom, doom, doom, doom, doom. you've got to say, here is another way, the other road we can take, and here is the proof that when you take this road you get better ai systems. and this is absolutely crucial: you can't just talk about ai safety as if i am the ai safety advocate wagging my finger at you and telling you you're a bad person. ai ethics is an even worse term, because then i'm saying you are unethical, stop doing that. show them the right way, and it is not like this is an extra safety add-on. this is like a nuclear power station that doesn't blow up -- would you rather have one of those? i was watching chernobyl on the plane coming back from toronto yesterday. people have seen chernobyl? fantastic. if you haven't seen it, watch it. it shows you how difficult it is to convince people that the technology isn't perfect and that you have to pay attention to risk, because if you don't -- well, what happened to the nuclear industry? we didn't get the benefits of nuclear power because of chernobyl. so all this stuff about, well, why don't you talk about the benefits, why are we talking about the
risks? you won't get any benefits if you don't pay attention to the risks and build nuclear power stations that don't blow up, and that is what we're trying to do: ai that doesn't blow up. >> thank you very much indeed. >> here's a look at some of the most notable books of 2019, according to the new york times. in midnight in chernobyl, adam higginbotham examines the world's worst nuclear power plant disaster. rachel louise snyder reports on domestic violence in no visible bruises. patrick radden keefe recounts the decades-long conflict in northern ireland in say nothing. the club chronicles the regular gatherings of british philosophers, artists and economists in london in the late 18th century. and in the yellow house, sarah broom explores race and class through the story of her childhood home in new orleans. she was this year's winner of the national book award for nonfiction. >> in this room tonight, my mother, who is a poet in her own right. as a child i watched her every move, seeing her eyes fall upon every word anywhere -- encountered in the grocery store, on the bus, on package labels, in my high school textbooks. she was always wolfing down words, insatiable, which is how i learned the ways in which words were a kind of sustenance, could be a beautiful relief or the greatest assault. how i learned that words were the best map. make me know, my mother was always saying, in between raising 12 humans. i'm in this room tonight.
so is my mother. [applause] >> in this room, my big sister lynette, who left the yellow house for fashion school in new york city when she was only 19, which felt like a lurching mission to planets unknown. in this room tonight, a fellow artist, the most inspired accompaniment of my life. and the chorus, my siblings, not here but whose voices exist in mine: carl, michael, karen, byron, troy, eddie, debra, valeria. thank you for telling me the stories in the first place and for trusting me to make something of them, and for allowing me to call your names, because it is no small thing to recover the names. there are other names of my family who told me the history of myself, some of whom died before this book was finished, these absent presences: my auntie lane, my mother's only sister; my uncle joe, in january of this year; and the swiftest blow, my oldest brother simon junior, who died the day after this book appeared in the world. >> most of these authors have appeared on booktv, and you can find their programs on booktv.org. type the author's name in the search bar at the top of the page. [inaudible conversations] >> hello, everyone. can everyone hear me? awesome. thank you for coming out tonight