DESCRIPTION
At Meta, Mike Shebanek has a ringside view of the emerging AI universe. Not only is Meta one of the top contenders developing the most powerful generative AI models, it is also a player in hardware, with the rollout later this year of the Meta Quest 3 AR/VR headset and the Ray-Ban Meta smart glasses. That, combined with his leadership on the evolution of VoiceOver at Apple earlier in his career, gives Shebanek an almost unique perspective on where accessibility and assistive tech are headed. Are we nearing a time when critical technologies like GPUs, sensors, and generative, multimodal AI might yield remarkable agents that were once the realm of sci-fi? Will we think of those technologies as purpose-built for people with disabilities, or will they be facets of something much bigger: a vision of universal design, the realization that all tech is assistive technology, to quote the artist and designer Sara Hendren?
Speakers
Mike Shebanek, Head of Accessibility, Meta
Ned Desmond, Sight Tech Global
SESSION TRANSCRIPT
[MUSIC PLAYING]
NED DESMOND: Well, thank you, Alice. I love your introductions. It’s so great to have you as the host of Sight Tech Global. So as everyone just heard, I am Ned Desmond and I’m here today at Sight Tech Global with Mike Shebanek, who’s the Head of Accessibility at Meta, and prior to that had a similar role at Yahoo, and prior to that was the product leader on Apple’s VoiceOver, among other roles at Apple. And VoiceOver, of course, is one of the most consequential bits of accessibility technology ever. When you ask Mike’s colleagues at Meta, Yahoo, and Apple what Mike’s like, they all say pretty much the same thing– to get along with Mike, you got to go for the home runs. He wants big ideas. So we’re going to do that today. We’re going to try to swing for some fences. So Mike, let’s get started by talking about what Meta just rolled out, some very impressive hardware, the Meta Quest 3 AR/VR headset and the Ray-Ban Meta smart glasses. What do they represent in the evolution of AR/VR experience for consumers? And what are the important advances that they represent?
MIKE SHEBANEK: Yeah. Well, thanks, Ned. Great to be here with you, and looking forward to our conversation today. I think they represent some really impactful, and I’ll call it accessible in the sense that they’re now widely publicly available, things that are relatively affordable for a lot of people to gain access to in traditional retail stores and locations and online. That’s a huge part of being accessible as well as the usage of the product. But I think it just represents this sort of really interesting moment in the technology landscape, where we’ve sort of become very focused on a certain sort of user experience or interaction with our technology. Traditionally, like, it’s been like a GUI or a Graphical User Interface system. We’re using a mouse. We’re pointing to things. Or maybe we’re using a keyboard. And what these represent is sort of the first of this next wave of some pretty massive changes, where we’re going to be doing more talking, we’re going to be doing more gestures, we’re going to be doing like– and not particularly in these cases– but eye gaze. We’re going to be doing things with sensors. And so the way we’re going to be interacting with technology is, going forward, going to be really radically different. And maybe it’s good to just call out for a second just because we got smartphones and we could do gestures didn’t mean we stopped using personal computers, right? But if you think back and even just take a consideration for a moment, most of what we’re doing now is on those smartphones most of the time because they’re personal and we carry them with us. And these types of products are also very personal, very adaptable, but are going to use these really different sort of interaction models. And that’s really exciting because that gives us a new opportunity to address the needs of people with disabilities in a really new way.
NED DESMOND: Can you give us a concrete example that would be of particular relevance to blind users?
MIKE SHEBANEK: Yeah. So for example, with the Ray-Ban Meta glasses, those have– those are standard Ray-Bans. They look exactly like a Ray-Ban, the famous model that’s been out for decades.
NED DESMOND: They’re beautiful. They’re really [INAUDIBLE].
MIKE SHEBANEK: They’re gorgeous, right? And they have multiple frame shapes, and sizes, and colors, and the whole thing. They can even take a prescription lens if you want them to. But they have a camera and microphone and speaker built in that are almost invisible. Now we make sure that there’s actually an indicator that allows people around them to know if you’re taking a picture or using it for live video. But you can stream video from them. You can capture a photo, you can upload it. And what makes that great is you can decide whether you want to use a gesture to do that on the frame, or whether you just want to use your voice and say, take a picture, you know, and post it. I won’t use the wake word because I don’t want to trigger a bunch of products out there, but you can imagine. So that’s one where it’s literally in the moment. It doesn’t require you to hold something or grip something. It’s sort of automatically focused. It’s when you need it in the moment. And it has this really sort of transparent interactive model where you can just talk to it and ask it to take something or perform an action for you and then post it up on social media. So I think that really is a great representation of this more invisible interaction model we have with technology as we go forward.
NED DESMOND: And there’s a built-in AI assistant as well, right? I believe this might be the first time Meta has rolled something like that out.
MIKE SHEBANEK: Yeah, we’re going to be rolling out the new Meta AI– AI assistant. And it’s going to be coming across various products. Meta Ray-Ban glasses are going to be one of those that allow you to actually start to ask questions and get information. As I mentioned, there’s a speaker in those as well. So you can hear responses coming back to you.
NED DESMOND: And is this– I mean, sometimes people refer to these AIs as next– as generative AIs or multimodal AIs because they can do a lot more than, say, what your Alexa or your Siri does. They can actually come up with answers to questions that may not have been, sort of, pre-programmed, you could say.
MIKE SHEBANEK: Yeah, yeah.
NED DESMOND: How does– how does that work? And what does that really mean, the shift to generative AI in consumer-facing products like this?
MIKE SHEBANEK: Yeah, and I think it’s important to establish what we mean. So there’s sort of general purpose AI and then there’s generative AI. And the difference really for generative AI is it’s the idea of creating content. So for example, you type in or you ask a question with your voice and prompt it, and then it returns output in either a text output, where you say, hey, write me a letter, an introductory letter for someone I want to meet, or something like that, or create an image for me. And so for example, like, on some of our social media products, you can create stickers, things that you would use in a chat, and say, hey, I want a sticker, but there’s not one that I want in the list of pre-configured stickers. So I can prompt the AI and say, could you create me one of those? You know, like, a dinosaur riding a pony or something? And it will generate it instantly and allow you to then use it in your conversation. And– and so these are sort of the– I guess I’ll call them the second wave of AI. I think the first ones are sort of canned response, pretty straightforward, fundamental things. These are going to be more creative, far more interactive, and have far more– I guess I’ll call it a wider domain of things you can interact with or ask them about. But I think it touches on something I wanted to share, which is I think one of the really critical changes here is the ability to generate content. We talk a lot about accessibility. And honestly, most of the focus and effort I see in the industry is around consumption, and rightfully so, because we’re all consumers of this kind of media and landscape. But the ability for someone to create something in a new way– for example, imagine this. Today, if someone wanted to create an image, if you were a professional or commercial producer, even like somebody at home, you probably are using something like Adobe Photoshop. And the level of difficulty of trying to make that interface accessible to someone who’s blind is so staggering. But if you could simply type something or say something and generate a high-quality photographic image, that’s game-changing. And that’s what we’re talking about when we talk about AI and its ability to be generative and to create content for you. And I think that is going to open the door for people with vision loss in a way that has really not been possible before. Most of the effort, most of the difficulty, most of the frustration has been around making the control systems, the menus work. And here, we have a completely– I’ll call it a sort of– I won’t call it a back door, but a side door to the same process that is so easy and so compelling. Yeah, I’m super excited about that. I think a lot of people are starting to understand what that’s going to mean.
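Shebanek’s sticker example boils down to a single text-to-image call: a prompt goes in, a finished image comes out, with no image-editing interface to navigate. Here is a minimal sketch of that pattern, assuming the open-source Hugging Face diffusers library and a publicly hosted Stable Diffusion checkpoint rather than Meta’s own models; the model name and output path are illustrative only.

    # Text-to-image in one call -- an illustration of "prompt in, content out,"
    # not Meta's API. Requires the diffusers, transformers, and torch packages,
    # and assumes a CUDA GPU (drop the dtype and .to("cuda") to run on CPU).
    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public checkpoint; the model id here is an example, not an endorsement.
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a cartoon sticker of a dinosaur riding a pony, bold outline, white background"
    image = pipe(prompt, num_inference_steps=30).images[0]  # returns a PIL image
    image.save("dinosaur_pony_sticker.png")

The point of the sketch is the interface: the entire “menu system” is one sentence, which is the side door to content creation Shebanek describes.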
NED DESMOND: Yeah, the implications are very far-ranging. And I don’t think we’ve even begun to explore all the possibilities. But just to clarify one point though, the Meta Quest 3 and the Ray-Ban Meta smart glasses aren’t really accessibility tools on their own, right? I mean, they have some accessibility built in, but they’re not really designed that way. But what you’re saying is some of the technology built into them is giving us a kind of view into the future of things that may be arising in other types of hardware deploying AI in new ways that could present some very interesting possibilities. Could you develop that a little bit? You know, when we’ve talked in the past, you mentioned, for example, that interactions with hardware could change dramatically, and partly along the lines of what you’ve just described, interacting with voice, but in other ways as well. I mean, you’ve cited the example of creating imagery. But then there’s also that idea of interacting without the menus, the side door, as you mentioned. So how does this start to spill over into the world of accessibility and the technology that people have come to really love and rely upon?
MIKE SHEBANEK: Yeah, it’s a really great question. And you’ve opened a couple of avenues here, so I’ll see if I can cover some of these. I may ask you to prompt me on some of those if I forget. I think the first thing to mention is that Quest 3 and Ray-Ban– on Quest 3, there are specific features for people with vision loss. So we have things like high contrast settings, and even for people who are color sensitive, being able to shift the color space to make it easier to see. We have larger fonts so you can make it easier to read in those things. And so there are specific features that we’re already building in. We have an accessibility settings area that has a variety of different accessibility settings. So we’re starting to see the traditional model of, hey, let’s have some adaptations that will welcome people and make those products more useful for a broader audience. And we’re going to be expanding those over time, for sure. In the Meta glasses, the Ray-Ban glasses, we talked about voice control and response. There’s actually an app that goes with these products. So there’s even another surface and another potential interaction model. And I think this points to what you were just talking about. These products, we interact with them in a very different way. We use them in different situations and places, right? So Ray-Ban is really designed to walk around the world. Like, you can go outside and go do things and be wherever you want. Quest 3 is really designed to be indoors, be in a room where you are. And so we think of like productivity solutions. And if you are trying to track where these are heading, while games are awesome in Quest 3, absolutely, we’re also working really hard because we recognize this actually has potential for productivity in the way that today we use a desktop computer or a laptop computer. So for example, let’s say that you’re limited by space in your physical location. You’re in a small apartment, you’re in a dorm, at a small desk, what have you. You can’t afford a large screen with large fonts and big pictures because you don’t physically have the space. In a Quest 3, you could have a huge large screen display. You could literally have a movie theater-sized display in this virtual reality. And you can change the contrast and you can change the font size. And so– and you can bring it closer. You can walk farther away. There’s so many more possibilities now that sort of escape the limits of the physical world to give you access in ways that just aren’t even considered in terms of the traditional desktop or laptop form factors, right? Being able to take like a Ray-Ban with you and have access to that AI agent, or be able to use gestures instead of just talking, like, having the ability to use the right mechanism in the moment, again, typically, you have one or the other. And that’s how the products are designed. These, in a sense, are multimodal, where all of a sudden, it’s like, in some cases, I’ll use a gesture. In other cases, I’ll use my voice. And in other cases, I might do something else. And so I think this is the new wave of things. This is going to be really, really different.
NED DESMOND: I was looking– the evolution of accessibility technology has been very much one of purpose-built products, you know? And maybe the best example of all is a screen reader. So the computer world raced ahead with keyboards and mouses, and then eventually figured out how to circle back, and in an imperfect way, developed this remarkable technology called screen readers. What does technology like screen readers look like in this new emerging world? You know, how could it potentially be different?
MIKE SHEBANEK: Yeah. Boy, that’s something I think about a lot, having worked on VoiceOver for so many years and really dug in deep on what the screen reader experience should be. And it was really predicated on what the experience was we were trying to translate for people. And a great story about this, when we first invented the VoiceOver screen reader and people started using it, the comment we got back at the time was, something’s wrong with this screen reader. And we’re like, well, what’s wrong with it? They said, well, because the menus are at the top of the screen all the time. They’re not in the window I’m in. And we said, well, because that’s how a Mac works. That’s literally the experience we’re trying to translate for you. And they said, no, that’s not how a computer works, because they’d only ever had access to Windows. And that’s how Windows works. And they assumed that’s how every computer must work. And of course, the Mac’s very different. It was the first time they realized it wasn’t the screen reader, it was the device they were using that had a different experience attached to it. And that opened up a completely new sort of mindset about, oh, not every product is the same. And my screen reader isn’t the product, it’s translating the product for me. So when we get to experiences like VR, we’re having to think really hard about, how do you translate an experience that’s all around you, that can change, that you can walk through, that feels more like being outdoors or in different environments than sort of a two-dimensional screen that sort of has top to bottom, left to right text or windows or things like that? And so I think this is where the context is so interesting now, because now we have devices, you know, like, headsets and AR glasses and things. We have the ability to do gestures, and voice, and eye gaze, and all kinds of new sort of interaction models. And now you have these sort of new virtual spaces that are all encompassing. How do you bring all these together? And what’s so fascinating is that we can. There’s a moment where all of these things suddenly get to a point of maturity where there’s going to be a breakthrough. And we don’t always know when that’s going to happen, but we know it’s going to happen, right? And this sort of goes back to having a smartphone, but you have to have internet that’s wireless, and that could be cellular or Wi-Fi. And you have to have small enough chips and all those things. And then suddenly, boom, you’ve got smartphones. And they’re super amazing. And they’ve been amazing ever since they’ve been created. And that changed the market, changed how we think about technology, changed our everyday lives. And I think that that moment is coming soon, where all of these things like AI, and these headsets, and these multimodal interactions are going to come together, and something huge is going to happen. So I’m excited by that. I hope everybody who’s listening and watching is excited by that. And there’s just people working on every edge of this to figure out what it’s going to be. So when we get to screen readers, it’s like the way I’m thinking about this, and I think Meta is thinking hard about this as well, we want to be able to bring the 2D experiences you’re familiar with into these virtual worlds. So for example, if you’re sitting at a virtual desk, using a virtual personal computer– maybe you have two screens, maybe you have one screen, you know? Who knows?
In this virtual world, we want to replicate that experience you’re used to with a traditional screen reader. But what about the environment you’re sitting in? What if I leave that environment and go into another space? It could be a home. It could be a game environment. It could be whatever. You’re going to need something that can describe to you what’s going on, but you’re not going to want to move a cursor 360 degrees up, down, left, and right and ask what’s in the cursor. That just doesn’t really make sense. So our thinking, really, is that it’s going to feel more like if you took a friend with you to go somewhere. So you step out of your house. You go outside. And you kind of get in a car, or you get in a bus, or they drive you, and you go to a place. And you kind of say, well, are we near the store? Or I want to go here. Can you take me there? And it’s more of an interaction. It’s a dynamic conversation. And you ask different questions based on the input. So hey, I hear a fountain. Are we near water? Yeah. Oh, that’s a fountain. That’s a piece of art. Oh, that’s really cool. Or hey, I hear– I smell– I smell the popcorn. Are we near a theater, right? These moments will happen to us. And I think in the virtual world with a screen reader, it’s not going to be a screen reader, it feels, to me, more like an AI agent who is with you at all times that you can ask questions of or that can prompt you and say, hey, this is happening. Do you want to go take care of that? So it could be a notification or a message. It could be, hey, I noticed someone else just came online, and you’ve been talking with them the last three days. Do you want to send them a message and let them know you’re here? So it’s going to be a combination, I think, a hybrid model between bridging the old world in so you still have that capability, but introducing you to this completely new way of interacting in this space. And that’s going to be a little scary, but it’s also going to be incredibly exciting. And it’s going to unlock a lot of really interesting opportunities to create new experiences for people that are far more accessible.
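To make that “friend who describes the scene” model concrete, here is a deliberately naive sketch. Every name in it is hypothetical; a real system would run a multimodal AI model over the live environment rather than matching keywords against a hand-written object list, but the shape of the interaction Shebanek describes — ask a question, get a spatial answer — is the same.

    # Toy companion-agent interaction: purely illustrative, hypothetical names.
    from dataclasses import dataclass

    @dataclass
    class SceneObject:
        name: str          # e.g., "fountain"
        category: str      # e.g., "public art"
        direction: str     # rough bearing relative to the user
        distance_m: float  # approximate distance in meters

    scene = [
        SceneObject("fountain", "public art", "ahead and to your left", 8.0),
        SceneObject("movie theater entrance", "building", "to your right", 20.0),
    ]

    def answer(question: str, objects: list[SceneObject]) -> str:
        """Keyword matching standing in for a multimodal AI agent."""
        q = question.lower()
        for obj in objects:
            if any(word in q for word in obj.name.split()):
                return (f"Yes -- there's a {obj.name} about {obj.distance_m:.0f} "
                        f"meters {obj.direction}. It's a {obj.category}.")
        return "I don't notice that nearby. Want me to describe what's around you?"

    print(answer("I hear water -- are we near a fountain?", scene))
    print(answer("I smell popcorn -- are we near a theater?", scene))

The conversational loop is the design point: targeted questions and spatial answers replace the “move a cursor 360 degrees and ask what’s under it” model.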
NED DESMOND: Now will that– the world you’re describing is in the virtual world, I think. Is that right?
MIKE SHEBANEK: That’s right. And then there’ll be a sort of a different one. The augmented reality is I’m going to live in the real world and take my glasses, or my sensors, or whatever those are with me, and be prompted as I move through the street for directions. You know, today, we think about directions on a mobile phone. We can hear them in our earbuds or something. But there’ll be sort of that same sort of agent I think will be available to us, like I said, the Meta AI agent or others. And I think one of the other interesting things we didn’t get to yet, but you’ll be able to have your own personalized AIs. And I think that’s something that Meta’s been talking about and introduced. We’ve got about 28 different character AIs in some of our products. We’re going to be rolling out more. But also, the AI Studio will enable people to create their own personal AIs as well. So it’s not going to be something– again, the model is kind of getting broken. It’s going to be really unleashed, where we sort of assume that someone makes something for me and that’s what I get to use. You’re going to have a lot more creative control over these things. It’s going to be a pretty amazing world.
NED DESMOND: Well, creative control and also the capacity of the AI to adapt to you– I mean, at the Connect event, which is the big tech rollout Meta has every year, there was an interesting conversation among a couple of your colleagues along the lines of imagining an AI that knows you’re just– even the peculiarities of how you type, so that it knows that you’re always going to miss that key so you don’t miss that key anymore because it just knows.
MIKE SHEBANEK: Right. It’s a great point, Ned. Most people don’t realize when you’re typing on a smartphone, it actually isn’t necessarily typing where you put your finger. Like, it’s actually– because we’ve turned this off in the past, where you turn it off and see how well you type. You can’t type at all. But like, it knows if you type the first two letters, T-H, the next letter is pretty likely going to be E. And so you just have to get close. Your brain thinks, oh, I touched the E, but you really didn’t. But it kind of knows that’s what you meant. And that’s where we get sort of the word prediction and the word correction. And we know that some of those have been awesome and some of those have been terrible. But we also know that without it, you can’t even really use the product. And so the trick, of course, is to improve those over time, and make them more accurate, and learn how you type. And of course, now there’s multiple different ways to skim across the keyboard on a phone. But I think that’s a good example of how AI’s been with us a long time. It’s been helping us in various ways. Sort of the breakout moment was ChatGPT, where it was called AI, there was a thing you could go do, and it was pretty broadly available to everybody. And so it’s exciting to see that kind of get some spotlight and allow us to really talk through what this is going to mean.
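The T-H-to-E intuition is essentially next-letter prediction from context. Below is a toy sketch of the idea using simple letter-pair counts over a tiny sample sentence; production keyboards combine touch-point geometry with far richer language models, so this only illustrates the statistical “snapping” Shebanek describes.

    # Toy next-letter predictor: after "t","h", which letter usually follows?
    from collections import Counter, defaultdict

    sample = "the theory is that the keyboard thinks ahead of the thumb"

    # Count which letter tends to follow each two-letter context.
    follow = defaultdict(Counter)
    for i in range(len(sample) - 2):
        context, nxt = sample[i:i + 2], sample[i + 2]
        if context.isalpha() and nxt.isalpha():
            follow[context][nxt] += 1

    def predict_next(context: str):
        """Most likely next letter given the last two typed letters, or None."""
        counts = follow.get(context.lower())
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("th"))  # 'e' -- so a tap landing near E snaps to E

On a real keyboard, statistics like these are combined with where the finger actually landed, so an ambiguous touch between W, E, and R resolves to E after T-H.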
NED DESMOND: Right. Well, this idea that we might have AI agents in a real world that have a strong sense of a lot of things, where we are, what we’re trying to accomplish, the world around us. You know, sometimes these are referred to as multimodal AIs. They’re taking in lots of data from different places. You know, they even appear to be sensate sometimes, at least if you watch the sci-fi movies like Her, or Blade Runner 2049, or something like that. And this really has Silicon Valley in its grip at the moment. You know as well as I do there are several hardware companies coming out that have a device that’s really just meant to embody an AI. It’s a pin, or it’s a pendant, or something you wear, and you communicate with your AI through that.
MIKE SHEBANEK: Yeah.
NED DESMOND: Do you think these are ahead of their time? You’ve been around Silicon Valley a long time. You think we’re there yet?
MIKE SHEBANEK: That’s a good question. In a couple of years, I’ll be able to say, yeah, it was of its time. Of course, that happened. But right now, it hasn’t really happened yet, I think. But I think, again, like things like the Meta– the Ray-Ban Meta glasses, with an AI agent in them, it sort of becomes– it’s just with you all the time. You forget– I mean, I don’t know, I wear glasses, you wear glasses. We’re wearing them now. And we sort of forget, as glasses users, that we even have them on.
NED DESMOND: Right.
MIKE SHEBANEK: I mean, I’ve gone out of the house and been like, where’s my glasses? Mike, you’re wearing them. Oh, that’s right. And so I think that these AI agents, over time, like, not right today, but I think over time, will start to be embedded and be where we need them. And we will carry them in different ways, whether it’s through something like the glasses, whether it’s through something that we wear on our wrist, which we’re also working on some technology there. It might be something you carry in your pocket on your phone. It’s going to be in places where you need it, I guess. And I think part of the exciting part is, where do you need it? And when do you need it? And what’s the most convenient way to have it? And you’ll choose your interaction. You’ll choose the way that you bring that to life. And you sort of alluded to this. Like, in the VR space, you could ultimately create avatars that are actually embodied in the virtual world that act and serve as that agent. You can have multiple of them. They could be ones you create or ones that are provided for you or somebody else uses.
NED DESMOND: That’s crazy. So take your avatar out of your VR world and into the real world.
MIKE SHEBANEK: Right. And this is– by the way, this connects the dots on what we’ve been talking about with the metaverse, right? We’ve said very clearly that it’s not going to be just Meta creating the metaverse, it’s going to be a collaborative industry effort with lots and lots of different companies and individuals contributing to this. But this idea of being able to carry it with you and move from place to place and not have the silos that have been traditional in the technology landscape previously, we’re really working hard to see if we can maybe make that less siloed, so these things can carry over from the virtual world to the real world and have some continuity there. So that’s another area, I think, that hasn’t really gotten a lot of attention, but I hope connects the dots for a lot of people about why this is such a big deal.
NED DESMOND: Yeah, it’s remarkable. So one last question. In the realm of design, so there’s always this tension between purpose-built technology for accessibility and the idea that any good technology really shouldn’t exclude people. And it’s this notion of universal design. And we see the tension in different ways, for instance, the Waymo vehicles ripping around San Francisco right now that are so popular with people, especially people who have sight challenges, because it’s just a better experience overall than, say, taking an Uber. And you know, Waymo is not perfect, not even close yet. But that might be an example of a universal design because so much of what they did to accommodate people on the accessibility front was actually great for the overall experience of a Waymo. And then of course, you’ve got a lot of purpose-built technology out there, which is also great. But do you– do you see AI altering that tension at all? Do you– do you think that the world is going to be different as a result of the incredible sort of flexibility that these advanced AIs bring to a lot of equations?
MIKE SHEBANEK: Yeah. You know, if we think about it on a continuum, I think that it’s going to slide very far very fast. So you talk like– it’s pretty well known Sara Hendren said all technology is assistive technology, right? It enables anyone to do more than they could do otherwise. That’s why we use the technology, right? I’m either faster, or smarter, or better informed, or communicate farther than I could otherwise. I think what’s happening, at least the way I envision this going forward, is the number of products that are more universal is going to grow dramatically. And they’re going to be far more inclusive. And we can already sense this with some of our conversation about where AI is assisting people in generating new content, or providing information or answers or direction or what have you, and always sort of being available to us. These are things that are universally beneficial. And so I think we’re going to have to start getting more comfortable as we go forward with the idea that while assistive technology has its place, so much more of the generally designed products are going to be so much more inclusive over time that it might actually become a little uncomfortable. Because it might start feeling like they’re not designing this as an assistive thing, they’re just designing a good tool. And I can remember this vividly when we came out with the iPhone. And I was at Apple at the time, and we were introducing this. And people were giving me the argument pretty vociferously that this wasn’t an assistive device. And I said, no, it’s so universally designed that you’re missing it. Like, it’s just got one button. You don’t have to think too hard. It’s not complicated. It has a grid of icons. You don’t need a book to learn how to use it. It’s always on. It’s always connected. Like, it– and of course, we brought in assistive technology features to improve it over time, but there was so much– and that was a precursor to this. There was so much that was sort of universally valuable. And I think when VoiceOver came out, particularly for the blind, everybody who uses VoiceOver on their smartphone, and of course in Android with TalkBack, started to understand the universal benefit of a smartphone. And the assistive technology was actually, even though there were some breakthroughs there, in a sense, a smaller part of the story. Because it’s the phone, the smartphone itself, that’s so valuable, not necessarily the assistive tech. That was just the enabler to get to it. And we’ve lived now through a 10-year cycle of, wow, this is amazing. And for the first time, the blind and vision loss community got to live in the moment as it was happening with a brand new technology platform. And so we’re working really hard to make that the same as we move into these new technologies, whether it’s AI or VR or augmented glasses or smart glasses, so that they can experience that same sort of benefit at the same time, and not have to wait or come later. So that’s the work that’s going on in industry. That’s what’s going on behind the scenes. We’re working really hard to make that happen. And it’s really exciting. I mean, things I’ve struggled to try and bring to the market because of just technology limits are suddenly now available to me and to many other companies. So it’s going to be– it’s going to be a pretty wild and fun ride going forward.
NED DESMOND: Yeah. Well, the future is nothing if not exciting. So we’re out of time, I’m sorry to say, Mike. And thank you very, very much for joining us. And I’ll throw it back to Alice.
MIKE SHEBANEK: My pleasure. Thanks, Ned.
NED DESMOND: Take care. Bye-bye.
[MUSIC PLAYING]