DESCRIPTION
Explore how Google is using AI to transform accessibility for people with disabilities. This session showcases innovative vision-focused tools including TalkBack with Gemini for detailed image descriptions, Lookout's Image Q&A feature, Pixel Magnifier with voice search, Guided Frame for photo composition, and the StreetReaderAI prototype. Learn how machine learning is leveling the playing field and making the visual world more accessible. Features a live Gemini demo and insights into Google's accessibility mission.
Speakers
Christopher Patnoe, Lead for EMEA Accessibility and Disability Innovation, Google
SESSION TRANSCRIPT
[MUSIC PLAYING]
VOICEOVER: AI That Sees, Hears, and Understands: Google’s Accessible Technology. Speaker: Christopher Patnoe, Lead for EMEA Accessibility & Disability Innovation, Google.
CHRISTOPHER PATNOE: Hi. This is Christopher Patnoe out in London. I'm really happy to be here at Sight Tech Global. I'm here to share some thoughts and ideas about AI and accessibility, and to teach you about some of our vision tools. Just a quick personal story, though. Even though I'm here in London, I'm not from here, as you can probably tell. I was actually born and raised in California, near San Francisco. I'm a failed opera singer; in my mid-20s, I realized I wasn't very good, and I got involved in tech. Within a year of joining tech, I was at Apple, where I worked on the first iPod, the first iTunes, and the G4 towers, and founded their studio. I spent a couple of years at Sony Ericsson making mobile phones and even a year at Disney making video games. But it was at Google where I learned about accessibility, and I learned about it because my product wasn't accessible.
At the time, I was the lead technical program manager for Google Play Music. We had a test engineer come in. She turned on VoiceOver and I heard, "Button. Button. Button. Button." I said, "What's that?" And she said, "Well, this is Google Play Music for someone who's blind." I said, "Well, gosh, that's stupid. How do they use it?" She said, "Well, they don't. That's why I'm here." And I couldn't look away. So within a couple of months, I volunteered to take on accessibility for the Google Play suite of products, and a couple of years after that, I started taking it on for all of Google, helping build accessibility teams around the company. Then, back in 2021, I moved to the UK to get Google ready for the European Accessibility Act, some really meaningful legislation that just came into force this year, in 2025.
But I work at Google, and I get asked this question a lot: why does Google care about accessibility? I'm happy to be able to say that it's actually part of our mission statement, which is to make the world's information universally accessible and useful. And we take that mission seriously, even to the point of looking at the real world, not just your digital world. So it's not just about Gmail or YouTube; it's also about having information about the real world. Say you're in a wheelchair: wouldn't you want to know if the restaurant you're going to has an accessible bathroom, or if there are stairs you have to climb? You wouldn't want to go there, right? Because we have this information about the world, we have features in Google Maps that can tell you whether a place has accessible parking, an accessible toilet, or a hearing loop, and we share this with you because it matters. Having an understanding of your world matters, and we take this mission very seriously, because disability affects everyone.
I know I'm preaching to the choir here, but according to the WHO, there are 1.3 billion people with a significant disability, 16% of the world. For those of us in that group, these tools can be profoundly meaningful. But realistically, there are a lot of people who have a disability but don't identify with it, because of stigma or because they think, "I just need a little help. I don't have a disability." That was me. I have dyslexia, and I didn't realize it until my 40s. It only happened because I started doing this work in disability, but it ticked the boxes on all of these things I wasn't good at, and explained why. I now have the tools to make my life and my job easier.
Then there are our friends and family, the allies of someone with a disability. Again, think of a family with someone in a wheelchair. You're not gonna go to a restaurant with your friends or family if your friend or family member can't go, right? And looking toward the digital world, what we're starting to learn, especially with Millennials and Gen Z, is that they care a lot about identifying with brands and participating with companies that mean something to them. So if a brand is more accessible, it has a much better chance of winning more business; there's even a business impact in making sure your products and your culture are accessible. But in reality, every one of us, if we're lucky enough to get old, is going to have some kind of disability, whether it's a temporary one, a situational one, or an age-related one. If someone builds a product that is accessible, it's gonna be useful for you no matter who you are, where you are, or when you are.
So, what I really want to do here is share with you a bit about AI, because a lot of people are talking about AI, and it means different things to different people. This shift is as big as they come, and that's why it's so important that we try to make it helpful for everyone. We're approaching it boldly, with a sense of real excitement, and some of the things we'll show you today really lean into that. And no one company can do this alone. Feedback from our users and partnership with the disabled community and the developer communities will be key to unlocking this enormous potential. We look forward to working together to build this next generation of AI.
And with the Gemini ecosystem, some of which I'll be sharing with you, we're working hard to make this future exciting for everyone, and it's really, really cool. At Google, we've had a lot of practice with this. Go back to 2009, when we had our first experiment with accessibility and AI: we added automatic captions to YouTube in English. Honestly, it wasn't great, but it was a whole lot better than nothing, and that was the feedback we got from the Deaf community. We continued to invest, and now we're at a point where you get free captions on YouTube in over 60 languages, because we kept working at it.
Back in 2018, we launched Lookout, which is something I'll share with you later. From 2019 through 2021, we kept adding more and more features to Android, the Chrome browser, and ChromeOS. And this year, we announced something called SignGemma, which is really exciting: the first general-purpose model that translates sign language into text.
So, going from 2009, when we had not-great captions, to the point now where we can do real-time sign language, the technology evolves faster and faster, and it's getting more and more exciting. AI can level this playing field, especially when you think about the context of platforms. ChromeOS and Android both have AI features that generate alt text for pictures that don't have it, and captions for the people who need captions. The Chrome browser can even provide real-time captions that pipe into a refreshable braille display, letting someone who is deafblind participate in videos where no one bothered to create a transcript. These technologies can enrich the lives of billions of people, and machine learning and AI features like these continue to unlock major benefits, making the technology and the digital world more useful.
So, with this intention of being helpful, AI has the power to help level the playing field for people with disabilities, and the way we do this leans into that mission statement of making the world universally accessible and useful. Our commitment is to improve the lives of as many people as possible, and we want to do this in a way that maximizes the benefits to humanity, guided by our AI principles. These principles guide our innovation and are the North Star for accessibility teams around the company.
So, now that I've been talking big and abstract, I wanna bring this down to something that might make more sense to you. We want to use AI to help people with visual impairments understand the real world. There's a lovely application called Lookout, which uses AI to give people who are blind or low vision information about the world around them. In image mode, we can use Gemini to tell you what's actually happening in an image, including objects in the foreground and background. We can read labels. We can even use optical character recognition. So we can use AI to describe the things on your phone, and you can use it to look at the world around you and have things read to you. It was designed hand-in-hand with people who are blind and low vision. Lookout's a really great application.
And TalkBack. TalkBack is a screen reader, and this is one where I'm really excited about the changes we've made in the past handful of years. Recently, we've added Google Gemini in offline and online modes to give you different kinds of information. Offline, with no internet, if a button doesn't have a label, we can describe it and read it out for you, so we can use the platform to help compensate for applications that were not designed to be as inclusive as they could be. In online mode, you can even take a picture, ask it to describe what the picture is, and then ask questions. For example, I'll take a selfie with a friend, ask TalkBack to describe the picture to me, and then ask it, "How old do you think I look?" Sometimes it says I'm middle-aged. Sometimes it says, "I can't tell you how old you are, but perhaps you're in your 40s or 50s." It's usually pretty good about that.
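To make the offline idea concrete, here is a minimal sketch of that fallback pattern, assuming a hypothetical describeIcon stand-in for an on-device captioning model and a speak stand-in for the screen reader's speech output. This illustrates the general technique, not TalkBack's actual implementation.

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Sketch of a screen reader's label fallback: prefer the developer-supplied
// label; if there is none, ask a local model instead of saying just "Button".
fun announce(
    node: AccessibilityNodeInfo,
    describeIcon: (AccessibilityNodeInfo) -> String, // hypothetical on-device model
    speak: (String) -> Unit,                         // hypothetical TTS hand-off
) {
    val label = node.contentDescription ?: node.text
    if (!label.isNullOrBlank()) {
        // The app was built accessibly: announce its own label.
        speak(label.toString())
    } else {
        // Unlabeled control: compensate at the platform level with a
        // generated description of the on-screen element.
        speak(describeIcon(node))
    }
}
```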
So we use these AI technologies to help us understand the device in our hand, but also the world in which we live, wherever we have this kind of digital information. What I wanna do now is share with you three different features we've built, with a video of each, to give you a sense of the kinds of tools they are.
The first one is called the Pixel Magnifier, and it was co-designed with the low vision community; in fact, it was co-designed out of the Accessibility Discovery Center we have here in London. The way it came about is that Robin Spinks, the head of Inclusive Design at RNIB, said he was a big fan of the Pixel camera, but that it would be really helpful if it could work as a magnifier, and we'd have to do a couple of things to understand what that would take. So we sat down over a couple of days and built out a bunch of the features that matter. Magnifiers aren't particularly special on their own, but in this collaborative space we added features that make this one really meaningful. You can search for words in the world and ask it, "Is there gluten in this menu?" You can zoom in really, really deeply: macro for close-ups, wide angle for the big picture. You can turn on selfie illumination for a mirror kind of experience. You can use this device as a bridge between the real world and what you can do, and it was co-designed with the low vision community.
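At its core, the word-search feature is camera-frame OCR plus a text match. Here is a minimal sketch of that idea using ML Kit's on-device text recognizer; the function and callback names are assumptions for illustration, not the Magnifier's actual code.

```kotlin
import android.graphics.Rect
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Scan one camera frame for a word such as "gluten" and report where it
// appears (null = not found in this frame).
fun findWord(frame: InputImage, query: String, onResult: (Rect?) -> Unit) {
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(frame)
        .addOnSuccessListener { text ->
            // Search every recognized line for the query, case-insensitively.
            val hit = text.textBlocks
                .flatMap { it.lines }
                .firstOrNull { it.text.contains(query, ignoreCase = true) }
            onResult(hit?.boundingBox)
        }
        .addOnFailureListener { onResult(null) }
}
```

In a real magnifier you would run this on a stream of frames and highlight or announce the hit, but the matching step itself is this simple.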
So, I’m gonna play an audio-described video that we created about this when we launched the application.
[Video plays]
VOICEOVER: Audio-described video about Brendan, a 12-year-old boy with visual impairment using Google Pixel Magnifier to view his coin and rock collections, read cereal boxes, and check expiration dates. The video demonstrates various filters including grayscale and contrast options that help him see details clearly despite his light sensitivity and depth perception issues.
CHRISTOPHER PATNOE: I think this is a good place to stop; I wanna make sure we have more things to share and time for them. But again, this application was co-designed with the blind and low vision community, and it's one of our most popular applications, with over a million downloads.
The next thing I wanna show you is a really neat feature, and this one actually came from a hackathon inside Google. The name of the feature is Guided Frame. It's a feature of the camera on Pixel phones; it's not a new application, it's the same one everyone uses, but there are features that get turned on when you use TalkBack. Guided Frame is a tool that helps someone who's blind or low vision take a picture, starting off with selfies.
And the way it came about is that the hackathon project came from a program manager on the Pixel team whose son is blind. His son was one of these sports kids and wanted to be able to take a selfie of himself like all of his friends, but, by virtue of being blind, he had a hard time keeping his face in the frame. So he challenged us to try to solve this problem, and what we did was use a different kind of AI: machine learning to recognize what a head is, and computer vision to recognize where the head is inside the frame. The hardest part was actually coming up with a really good UX, a really good user experience. We used haptics, spoken text, shapes, and colors, all at the same time, with a timer, and in real time we can tell you where your head is; if it gets out of the frame, we stop the timer.
This is a really neat tool, built a couple of years ago, and we've continued to invest in it. Not only do we now do selfies; we can also describe pets, food, and documents. We do intelligent face filtering, low light reminders, and scene understanding, and starting with the Pixel 10, we've even integrated Gemini, so you can have a picture described to you before you take it. If you don't have the Pixel 10 or this new feature when it rolls out, you can always throw the photo into TalkBack and ask it to describe things for you.
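The core loop is easy to picture: detect the face, compare its bounding box against the camera viewport, and turn any offset into spoken guidance. A minimal sketch, assuming ML Kit face detection and illustrative names (guidanceFor, viewport); the shipping feature layers haptics, shapes, colors, and the timer on top of this.

```kotlin
import android.graphics.Rect
import com.google.mlkit.vision.face.Face

// Map "where is the face relative to the frame?" to a spoken hint, in the
// style of the commercial's "One face cropped. Move your phone down."
fun guidanceFor(faces: List<Face>, viewport: Rect): String {
    val box = faces.firstOrNull()?.boundingBox ?: return "No face in frame"
    return when {
        // Moving the phone moves the viewport toward the cropped face.
        box.top < viewport.top -> "One face cropped. Move your phone up"
        box.bottom > viewport.bottom -> "One face cropped. Move your phone down"
        box.left < viewport.left -> "One face cropped. Move your phone left"
        box.right > viewport.right -> "One face cropped. Move your phone right"
        else -> "One face in frame. Hold for photo"
    }
}
```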
So what I’d like to do now is play for you this quick commercial.
[Video plays]
VOICEOVER: Commercial showing the Guided Frame feature through the perspective of someone with low vision, capturing daily life moments: selfies with a dog, birthday parties, sports events, camping, moving into a new apartment, running a marathon, and finally a newborn baby photo in the hospital. The video demonstrates real-time audio guidance: "One face cropped. Move your phone down. One face in frame. Hold for photo."
CHRISTOPHER PATNOE: So this is a great example of what happens when you take someone's need and apply technology to it, creating a solution you didn't know how to build. That's why this is so much fun. We're at a point where AI is going to create new opportunities, and by collaborating with the community of people with disabilities, we can find new solutions to these new problems.
And these technologies continue to evolve, so what I wanna share with you now is a research paper. This isn't even a product yet; it's a prototype that is so exciting I wanted to share it with you here. It's called StreetReaderAI. For someone who is using Street View, which is sort of a 360-degree view of a specific place, it can provide a real-time, AI-generated description of roads and intersections. You can have conversations with it. You can pan around and ask questions. So I'm gonna share this with you, and I'm warning you now: this is being used and tested with real blind users, so the screen reader is gonna go fast. Some of you are gonna do great; some of you might have a problem with it, and I apologize. But this is just too cool not to share.
[Demo of StreetReaderAI plays]
VOICEOVER: Showing real-time AI descriptions of a park with playground, answering questions about slide height, play equipment suitability for toddlers, crosswalks, and directional navigation. The system provides detailed environmental descriptions and answers natural language questions about the surroundings.
CHRISTOPHER PATNOE: So this is something that was designed hand-in-hand with blind users, and it drew on previous work from audio games and apps like Shades of Doom, BlindSquare, and Soundscape. This is really exciting, and it's just the beginning of these kinds of integrated experiences. This AI is so powerful and can be so meaningful, and we're just scratching the surface.
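Under the hood, the pattern these experiences share is multimodal question answering: pair an image of the scene with a natural-language question and let the model answer. Here is a hedged sketch of that pattern using the Google AI client SDK for Android; the model name, prompt wording, and function name are assumptions for illustration, not details from the StreetReaderAI paper.

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Send the current scene plus a question to a multimodal model and
// return its answer as text. Model name and prompt are illustrative.
suspend fun askAboutScene(view: Bitmap, question: String, apiKey: String): String {
    val model = GenerativeModel(modelName = "gemini-1.5-flash", apiKey = apiKey)
    val response = model.generateContent(
        content {
            image(view) // e.g., the portion of the panorama currently faced
            text("You are assisting a blind pedestrian. $question")
        }
    )
    return response.text ?: "No description available."
}
```

A conversational tool like the prototype would also keep the dialogue history and the user's heading as context, but each turn reduces to a call of this shape.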
So what I’d like to do now, gosh help me, is I’m gonna try and do a live demo in my room. So I’m gonna turn on my own personal phone, and then I’m gonna turn on Gemini Live and I’m gonna turn the volume on loud so hopefully you all can hear. So I, um, on my phone, this is a, this is a Pixel phone. Gemini also runs on iOS so those of you who have iPhones you can do this yourself as well. But I’ll turn it on, and turn on Gemini Live, and share my screen.
[Live demo with Gemini Live]
CHRISTOPHER PATNOE: Can you tell me what you see right here?
GEMINI: I can’t actually see your screen yet. It seems you haven’t shared it with me. Would you like to try again?
CHRISTOPHER PATNOE: Yeah. Can you see… what do you see through the camera?
GEMINI: Yes. I can see a fireplace with several items on it, including a framed picture of a person in red robes. Is there anything specific you’d like me to focus on?
CHRISTOPHER PATNOE: Is there a poster on the wall?
GEMINI: Yes. I see a poster on the wall. It has a circular design in the center. Is there anything specific you’d like to know about it?
CHRISTOPHER PATNOE: What do you think it is about?
GEMINI: It looks like a poster about Satyagraha featuring a mandala design. What makes you ask about it?
CHRISTOPHER PATNOE: Well, it’s on the wall. What is Satyagraha in this context?
GEMINI: On the wall, above the fireplace, there’s a poster with a mandala-like design that says, “Satyagraha”. Satyagraha is a principle developed by Mahatma Gandhi meaning truth force or non-violent resistance. Is there anything specific you’d like to know about it?
CHRISTOPHER PATNOE: Do you think this is about a theater piece or an opera or anything like that?
GEMINI: That’s an interesting question. While Satyagraha is indeed a powerful concept associated with Mahatma Gandhi, it’s also the name of an opera by Philip Glass. Given the artistic nature of the display on the mantelpiece, it’s possible that the poster is related to the opera. Would you like to know more about the opera itself?
CHRISTOPHER PATNOE: No. This is good. Thank you.
[End of live demo]
CHRISTOPHER PATNOE: So, even when I make a mistake, it guides me on how to do what I want to do. And that is in fact a poster from the opera Satyagraha by Philip Glass. But I'm a sighted person, and I want you to have the firsthand experience of someone who's blind. So this is a buddy of mine named Mark. He told me this really funny story, and I asked him to record a video that I could share with you. This is Mark in his own words.
[Video testimony from Mark, a blind user]
MARK: Hey, everyone. My name’s Mark. So as a blind guy, I rely on technology every day, whether it’s in my job or whether it’s simply getting me from A to B. Tech has totally transformed my life. So over the past few weeks, I’ve been having a really good play with Gemini and Gemini Live specifically, to see what it can do for me in my everyday life.
Now, about two days after I started using it, my other half said to me, “We need to get a card for my mom. It’s her birthday and we haven’t been on Moonpig.” Now, my other half and I are both blind, so going into a shop and selecting a card is a little bit tricky, although she has a bit more vision than me. But she had to go into London and I was at home and I was left with the task.
So I thought I'd take full advantage of that, and I decided to use Gemini Live to select a card within the shop. Now, as I was left to it, I decided to go down the route of finding a card that was a little bit rude, I have to say. I used Gemini Live in the store, and it told me that the card section, which I think was called Humor, was over to the left. So I went over there, and Gemini gave me a bit of a flavor of what cards were around. I selected a couple of cards and Gemini read them to me. It actually made me laugh out loud in the shop a couple of times, so everybody must have thought I was a bit weird. I did select a card, which I will include at the end of this video. And I have to say, my other half's mom was laughing, but my other half definitely wasn't impressed with me. But I took full advantage.
Now, it sounds like such a simple task, but I've never been able to do that. I felt like a superhero in many ways, because going into a shop like that and being presented with so many visuals, I've just not done that. I've never been able to see that, and I've certainly never had the experience of laughing at cards in a shop. But Gemini gave me that ability, and I felt incredibly empowered, and I absolutely loved it. And that's just one way it's been integrated into my life.
[End of video]
CHRISTOPHER PATNOE: So this is Gemini Live. And, oh, by the way, I'm not gonna show you the second half of that video, because the card is kind of rude and I don't want to offend anyone. But this is a mainstream application; it's the same one everybody uses, not something dedicated specifically to people with disabilities or visual impairments. Another session here is about Project Astra, which is a dedicated visual interpreter co-designed with people who are blind; that's really exciting, and you'll learn more about it in that session, so I won't go into it here. But again, this app exists today on Android and iOS, and it's really cool.
So with that, thank you very much, and thank you to Sight Tech for giving me the opportunity to join you. There is a QR code here where you can subscribe to our newsletter and get updates on a quarterly basis, or you can reach out to us at g.co/a11y/news. Again, g.co/a11y/news. Thanks to Sight Tech. Have a great day.
[MUSIC PLAYING]
