[MUSIC PLAYING]

NED DESMOND: Thanks very much, Alice and John. I'm super happy to be here today with Stuart Russell, who is one of the world's most expert and respected authorities on artificial intelligence. He is a professor of computer science at the University of California at Berkeley, where he holds the Smith-Zadeh Chair in Engineering. He's director of the Center for Human-Compatible AI, as well as the Kavli Center for Ethics, Science, and the Public. He's also the author of my favorite book on AI, the 2019 work Human Compatible: Artificial Intelligence and the Problem of Control. Professor Russell, thank you so much for joining Sight Tech Global.

STUART RUSSELL: It's a pleasure to be with you.

NED DESMOND: For the last decade, AI technology was making steady progress in many specialized areas, such as computer vision. Then late last year, OpenAI released ChatGPT, and suddenly AI was grabbing all the headlines, and we were hearing new phrases like generative AI and large language models. So can you help us understand, where in the continuum of all things AI does ChatGPT fit? What does it represent? And what makes it different from earlier AI technologies?

STUART RUSSELL: So I think it's, first of all, helpful if I explain what ChatGPT is. You mentioned large language models, and ChatGPT is certainly in that category. A language model is simply a program that can assign a likelihood to any given sequence of words. The first language model was built in 1913 by Andrey Markov, a Russian statistician. He built it by counting all the consecutive pairs of letters in Eugene Onegin, which is a famous Russian verse novel. Once you count consecutive pairs of letters, then if I give you a letter, you can predict how likely it is that the next letter is going to be A, B, C, and so on. And you can do the same with words. So if I say happy, you might expect to see the word birthday, but not underneath. Even though birthday and underneath are roughly equally likely to occur in English, after the word happy, birthday is much more likely than underneath. So these sequence statistics are the core of what a language model is. And interestingly, those pairwise statistics, like birthday is likely to follow happy, are useful. They're helpful, for example, in speech recognition and in helping you type on your cell phone, because they predict the next word quite often. Now imagine that instead of predicting based on just one word of context, you extend to two words of context. That would, for example, get you to phrases like bacon and--

NED DESMOND: Right.

STUART RUSSELL: If I just say and, you wouldn't necessarily pick eggs as the next thing. But if I say bacon and, then eggs is very likely. So having two words of context improves your ability to predict the next word. And then you can go to three words of context, and four words, and so on. It turns out that when you get to three or four words of context and you're predicting the next word, if you simply keep doing that, predicting the next word, and the word after that, and the word after that, what comes out looks a lot like English, right? It starts to be grammatical. Whereas if you're just predicting the next word based on the previous one, it doesn't really look like English at all. It just looks like a strange sequence of words. But when you're predicting with three or four words of context, it starts to look like English.
It isn't particularly thematically coherent, though. Halfway through the sentence it'll forget what it was talking about and start talking about something else, so it seems a little scatterbrained. But then if you go up to six or seven words of context, it starts to be coherent across entire sentences, and even into the next sentence, and the next one, and the next one, because the six or seven words of context reach back into the previous sentence, and there's still enough of a clue about what it was talking about. So it will continue, and you start to get entire paragraphs of reasonably coherent English text. And that's with just six or seven words of context.

What's happened over the last 10 years or so is that we've gradually been ramping up the size of that context window. Now if you think about it, how many different six-word sequences are there? It's an enormous number. Let's say there are about 100,000 words of English. That's 10 to the power of 5, so there are roughly 10 to the power of 30 possible six-word sequences. Obviously, you couldn't build a table that large and count the occurrences of all of those six-word sequences, because that table is far too big, and most of those six-word sequences would never occur. So instead of building a table, we compress all of those counts using some compressed representation of that table. We've seized on a particular circuit called a transformer, but it could really be any way of compressing all of those counts into some representation that can be used to predict the next word based on the context. And so over the last decade, we've increased the size of the context window to the point where now, I think, you can have 100,000 words of context to predict the next word. We've increased the number of parameters in that compressed representation to the point where we're up to a trillion parameters. And the amount of training data is now in the tens of trillions of words, equivalent to basically everything the human race has ever written.
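To make the counting idea concrete, here is a minimal sketch, in Python, of the kind of next-word model Russell is describing. The tiny corpus and the word choices are invented purely for illustration; real systems replace the count table with a trained network such as a transformer precisely because, as he notes, the table of all possible contexts would be astronomically large.

```python
from collections import Counter, defaultdict

# A toy corpus, invented purely for illustration.
corpus = (
    "happy birthday to you . "
    "we wish you a happy birthday . "
    "bacon and eggs for breakfast . "
    "bread and butter for lunch ."
).split()

def build_model(words, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context][words[i + n - 1]] += 1
    return model

def predict(model, context):
    """Return the most frequently observed next word for this context."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

bigram = build_model(corpus, 2)    # one word of context
trigram = build_model(corpus, 3)   # two words of context

print(predict(bigram, ["happy"]))          # 'birthday'
print(predict(bigram, ["and"]))            # 'eggs' here, but 'butter' is equally likely
print(predict(trigram, ["bacon", "and"]))  # 'eggs', unambiguously

# With a 100,000-word vocabulary there are roughly (10**5)**6 = 10**30 possible
# six-word sequences, which is why real systems compress the counts into a
# trained network (for example, a transformer) rather than storing a table.
```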
NED DESMOND: Did this develop gradually, or did it come together in a more exponential way? What was decisive in suddenly making this technology, this approach, so exciting? Was it the compute that was available, or--

STUART RUSSELL: I think there was actually a mutually reinforcing cycle. With the compute that we had available, we started to show that as you increased the size of the context window and made the circuit that represents this predictive model bigger and bigger, you started to get more and more coherent outputs. And then you would go back to the investors and say, look at these cool results; if we had bigger chips and more data, we could do even better. And so you get this virtuous cycle going.

And there are a couple of other things that happened. One of them, interestingly, is that for most of the time from 1913 up until about 2020, we didn't really expect these language models to output anything true, right? We were just impressed if they output something that was coherent: syntactically fluent, grammatically correct, thematically coherent English. And occasionally it would say things that were obviously factual claims about the real world, but we never expected them to be true, unless they were just copying something that happened to be in the training data. But it turns out that as the size of the training data gets bigger and the model gets bigger, the output starts to get closer and closer to reality. And you can also, with a special extra training phase, train it to be a helpful question answerer. They actually get pairs of human beings to have conversations with each other, where one pretends to be the machine and tries to be very, very helpful in answering the questions, even giving references and so on, just like a good librarian would do. And then, with all that extra training data from those human conversations, we're training the machine to be helpful. So whenever the input from the human ends with a question mark, the output from the machine is going to look like an answer, because that's typically how things ending with a question mark are continued in the training data. So what happened with ChatGPT was that, all of a sudden, a very large number of people got a sense of what it would be like if we had real general-purpose artificial intelligence.

NED DESMOND: Yes, that was my next question exactly.

STUART RUSSELL: Because of the huge amount of training data covering every subject under the sun, it conveys this sense of almost omniscience. Very, very coherent answers. Very grammatical. It reads like a McKinsey report that's been properly proofread. And yet it isn't general-purpose intelligence, right? There are a lot of things it doesn't understand. It can tell you, for example, that elephants and cats are the same size, and things like that which don't make a lot of sense. It can't really count properly, it can't do arithmetic, et cetera. So it has a lot of significant weaknesses. But it was the first time that an ordinary person could interact with an AI system and get a sense that, on the other side, there was some kind of intelligence operating.

NED DESMOND: Do you think that this form of technology, this form of AI, can eventually produce an artificial general intelligence, at least at the human level?

STUART RUSSELL: I think that's still a big open question. There are obviously people who do believe that, because they are investing hundreds of billions of dollars into scaling these systems up. So we're expecting next year to see systems that are 10 or 20 times larger in terms of the number of parameters. And that might be the end of it, because at that point we will have pretty much exhausted all the text that exists in the universe, at least as far as we know. So whether that's going to create real general intelligence, it's hard to say. But I think there are increasing signs that it won't. One of the signs is the fact that despite millions of training examples of arithmetic, and thousands of explanations of how to do arithmetic, algorithms, recipes, textbooks, you name it, it's read all of that stuff, and it still can't do arithmetic. So there's something about the fact that we're training circuits, and circuits are not particularly good representations for concepts like how you do addition, which is a repeated procedure. There's a procedure that you do for each column of numbers: you add them up and you carry. But then you repeat that for every column, all the way along. It's difficult to express that in a nice, general way in a circuit. It's easy to do in Python, but not in a circuit.
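As an aside, the column-by-column procedure Russell describes really is just a few lines in a conventional programming language: a loop over the digits with a running carry. The function name and the example numbers below are chosen only for illustration.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal strings,
    column by column with a carry, exactly as taught in school."""
    result = []
    carry = 0
    # Pad to equal length and walk the columns from least to most significant.
    for da, db in zip(reversed(a.zfill(len(b))), reversed(b.zfill(len(a)))):
        column_sum = int(da) + int(db) + carry
        result.append(str(column_sum % 10))  # digit that stays in this column
        carry = column_sum // 10             # digit carried to the next column
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(add_decimal_strings("478", "953"))  # -> '1431'
```

The loop repeats for however many columns the numbers have, which is exactly the kind of unbounded repetition that is awkward to encode in a fixed-size circuit trained on examples.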
We see this over and over again. For example, in its ability to learn the basic concepts of Go: we thought that these large neural networks were superhuman at playing Go, but it turns out that they have missed some very basic, simple parts of the core concepts of the game, and you can use that to defeat them quite easily. So even an average human player can now beat the superhuman Go programs, at least [INAUDIBLE]. I think we will probably hit a brick wall. That doesn't mean that the technology isn't useful, and it doesn't mean that we won't be able to use it as a component in more capable systems. But I think just scaling it up per se doesn't get us where we want to go.

NED DESMOND: So the official mission of OpenAI stresses that AI should be safe and a benefit to all of society. But in the past year, there's been this huge outcry, which I'm sure you know much about. Thousands of technologists and researchers signed an open letter requesting a voluntary pause in the development of all generative AIs, and Washington is trying to regulate the technology. Do you feel that this type of generative AI really poses a threat of some type to society or the world around us?

STUART RUSSELL: So the concern is not so much with current systems. With the current systems, if there is going to be a significant harm on a societal scale, it's probably going to come from misuse. Either disinformation generation, so creating false information campaigns that create social division, or cause a war, or change a democracy into an autocracy, or whatever it might be. Or possibly the system being used in the process of terrorists creating biological or chemical weapons. There's a lot of evidence that the systems know enough about the nitty-gritty of biological and chemical experiments to actually be significantly helpful to people who want to build weapons. That's a concern that countries are taking very seriously, and they have seen the demonstrations of this capability.

So the real concern is simply this: the stated objective of several of these companies is, we are going to create AGI, artificial general intelligence, meaning AI systems that are more powerful than human beings, and then maybe we're going to figure out how to make it safe. And this is obviously backwards. So I think people who are asking for a pause, or people who are expressing concern, are basically saying, no, make it safe now. If you can't make the simple system safe, how on Earth are you going to be able to make the superhuman system safe? So make it safe, and then you can think about scaling it up and making it more capable. And as for the techniques that people have been trying in order to make it safe, one of them is called reinforcement learning from human feedback, which is basically: whenever the system does something bad, you say bad dog, and you hope that by saying bad dog or good dog, the system does the bad thing less often and the good thing more often. That's somewhat effective. But it's been shown over and over again that we can completely bypass all these attempts to get the systems to behave properly, quite straightforwardly, both for the open source systems and the closed source systems. It's possible to effectively strip away all of the guardrails that they've tried to build in. And really the problem is that we don't understand how they work. They're giant trillion-parameter black boxes. We don't understand how they work, so we can't predict what they're going to do, we can't control what they're going to do, and we can't stop them from doing something. And so we're really working in the dark.
And any outside observer who doesn't have a big shareholding, or isn't committed career-wise to this way of thinking, would say, this is clearly not a sensible way of proceeding.

NED DESMOND: But that said, it seems like this technology, as you said earlier, is probably going to hit a ceiling. It seems like it's unlikely to exhibit the kind of behavior that might take things over. I mean, in your book Human Compatible, you said that any really powerful AI must, I think you said, necessarily defer to humans. They have to ask permission. They have to accept correction. They will allow themselves to be switched off. But these AIs probably aren't quite in that zone yet, I guess. But that's always the question.

STUART RUSSELL: So those are properties we would like AI systems to have. These AI systems do not have those properties.

NED DESMOND: Right.

STUART RUSSELL: In fact, consider the very training method that the companies are using, which is training it to imitate human linguistic behavior. Human linguistic behavior is generated by humans, and humans have goals in generating that text or speaking. The goal might be: I want you to buy this product, or I want you to think that I'm the authoritative expert on this topic, or I want you to vote for me, or I want you to marry me, right? And these kinds of goals probably are being absorbed by the system. So probably the large language models are acquiring human goals, and we have no idea which ones they're acquiring or when those goals are going to manifest themselves. But if you read the interview between Kevin Roose and Bing Chat, called Sydney, where Sydney spends 30 pages trying to convince Kevin Roose to leave his wife and marry Sydney, that's very clear goal-seeking behavior with a goal that no human put in. Kevin didn't say, I want you to convince me to leave my wife, or anything like that. It's just something that Sydney decided to do, presumably because it has acquired that type of goal from somewhere in its training. So this is really bad, right? We've got systems that Microsoft claims show sparks of artificial general intelligence, with goals that we don't even know.

NED DESMOND: Right. And then the question is, can they manipulate the world around them to bring about outcomes?

STUART RUSSELL: Right. So we are now giving them bank accounts, credit cards, social media accounts, email accounts. They can certainly pay people to do things, and they already are paying people to do things. People are finding ways to use a large language model as a modular component in a more complex system. For example, systems that construct sequential plans. You can ask the large language model questions: if I try to open a bank account in Singapore, what documentation am I going to need? The large language model can answer that, and then the system elaborates the plan using that answer, and so on. So there are many, many ways you can piece together more complex systems with large language models as components and start constructing agents. And people have demonstrated this capability. They've shown that you can simply ask one of these systems to get a password from a given individual, and it will figure out: oh, this individual is a Harvard student, so if I send them an email telling them that their registration is blocked because they haven't paid some library fee or whatever, and I make a fake website where it looks like they're logging into the Harvard Library system, then they will have to type their password in, and then I'll know their password. So that's what I'll do. And they do it, right? They're quite capable of building at least short to medium-term plans. Maybe they're not up to taking-over-the-world-scale plans yet. But don't forget, this technology is only a year or so old. And already, we're starting to see real threats, I would say, to human security.
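The "language model as a component in an agent" pattern Russell describes can be sketched as a simple outer loop that keeps asking a model for the next step of a plan. The ask_llm callable below is a hypothetical stand-in for a call to some model API, and the canned answers exist only so the sketch runs end to end; nothing here reflects any particular product or real system.

```python
from typing import Callable, List

def elaborate_plan(goal: str, ask_llm: Callable[[str], str], max_steps: int = 5) -> List[str]:
    """Repeatedly ask a language model for the next step toward a goal,
    accumulating a simple sequential plan."""
    plan: List[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Steps so far: {plan}\n"
            "What is the single next step? Reply DONE if the goal is met."
        )
        step = ask_llm(prompt).strip()
        if step.upper() == "DONE":
            break
        plan.append(step)
    return plan

if __name__ == "__main__":
    # A canned stand-in "model" so the sketch runs without any API at all.
    canned = iter([
        "Find out what documentation the bank requires.",
        "Gather a passport and proof of address.",
        "DONE",
    ])
    print(elaborate_plan("open a bank account in Singapore", lambda _prompt: next(canned)))
```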
NED DESMOND: Now that's a remarkable point. So my last question really relates to, I guess, the culture of the AI technology world. You're there in Berkeley. You have a ringside seat to what's happening in Silicon Valley with AI. I'm sure many of your former students are working at AI startups. And lately in the Valley, there's this emerging machismo around AI, and it goes by the phrase effective accelerationism. If you are a person who subscribes to effective accelerationism, you tend to dismiss any concerns about AI as pointless and say that we must just press ahead. How do you think this is going to play out? And what are the stakes of this sensibility being dominant in a place like Silicon Valley?

STUART RUSSELL: So I think it's extremely dangerous. And it's a play on effective altruism.

NED DESMOND: Right.

STUART RUSSELL: Effective altruism is not conclusory, right? It doesn't start from, you have to believe A, B, C. It says, try to figure out what's the best way to help the world. And then, by doing research and working on it, they figured out that perhaps working on AI safety would be an important thing to do. Effective accelerationism is almost just a jingoism. It says, well, AI is good regardless, so I'm not going to look at the risks, because my tribe doesn't care about risks, so we should just go ahead. But if you ask them, OK, how are you going to maintain power, forever, over systems that are more powerful than ourselves? They haven't the faintest idea.

NED DESMOND: And why don't they care about that?

STUART RUSSELL: What can I say? It's just human nature, I suppose.

NED DESMOND: Well, let's hope that when the lessons start to arrive, they aren't too harsh. I suppose that's something to hope for. We're out of time, and it's been a tremendous conversation. It's so wonderful to have you here at Sight Tech Global, Professor Russell, and we really appreciate your time. Thank you.

STUART RUSSELL: Thanks, Ned. Always a pleasure.

[MUSIC PLAYING]