If the Jetsons had screen readers, would they be using keyboard commands?

DESCRIPTION

The screen reader is arguably the most consequential digital technology ever for people who are blind or visually impaired. At the same time, screen readers depend on a dizzying array of keyboard commands, and — when it comes to reading websites in a browser — they struggle with the ugly reality of poor website accessibility. New technologies may lead the way to better outcomes.

Speakers
- Glen Gordon, Software Fellow, Vispero
- James Teh, Technical Lead for Firefox Accessibility, Mozilla
- Léonie Watson, Director, TetraLogical
- Moderator: Matt King, Accessibility Technical Program Manager, Meta
SESSION TRANSCRIPT


[MUSIC PLAYING]

MATT KING: All right. Thank you, Will, very much, and thank you everybody for attending this conversation about screen readers, where they are now and where they might go in the future with the help of advanced technologies and with the help of more people like you, all of you who are attending Sight Tech Global, and you’re striving to learn how to change technology and society so that we can fully include people who are blind, all people who are blind.

As Will mentioned, the title of our session promises that we’re going to answer the question, if the Jetsons had screen readers, would they be using keyboard commands? But before we dive into this discussion of the future and what advanced tech can do, let’s just spend a couple of minutes setting the scene and talk about where we are now, and a little bit about where we’ve come from.

We’ve had screen reading technology for about 40 years, give or take, depending on what you consider to be the very first screen reader. And I don’t think very many of us would argue with saying that that is probably one of the most, if not the most important developments in the field of technology for people who are blind. It has certainly opened countless doors of opportunity.

But back in the beginning, there wasn’t much more than just words on the screen. I first learned a talking word processor called Word Talk back in 1984 when I was a freshman in college, and it really didn’t have to do very much more than just read the words on the screen. And I was productive. I would even argue that I was competitively productive with my sighted peers with just that level of functionality.

Leonie, I would like to start with you. I can imagine that if the only thing that your screen reader did was just read the words on the screen, that you wouldn’t be very happy with your ability to get work done. So can you explain to people, what do you say is the purpose of a modern screen reader?

LEONIE WATSON: Answering that from the position of the consumer, I think the short answer is that it’s a software package that enables me to be able to use an interface that has a lot of visual characteristics about it. The slightly longer answer is that an interface that has different visual characteristics can mean lots of different things depending on what you’re doing, what device you’re doing it with.

So just in my immediate day to day, I’ve used a laptop with a Windows operating system on. I’ve used a touch screen phone with a very different mode of interaction and a very different operating system. I’ve also used the command line to get things done. So right back to that very text-oriented mode of interface. And in every single one of those environments for every single one of those tasks, of course, it was a screen reader that was there enabling me to interact and get whatever the task or the activity was done.

MATT KING: So Glen, Leonie is explaining that from her perspective as a consumer, part of your job as a screen reader developer is to help her understand and be able to use these graphical user interfaces drawn on the screen. That sounds pretty daunting because we’re talking about, essentially, visual imagery that represents interactions.

And what’s going on under the hood? How can a screen reader interpret graphics so that people who cannot see them can actually use them?

GLEN GORDON: Well, it’s really funny that we have the term screen reader because in this day and age, that’s pretty much a misnomer. But back in the early days, there was actually text in a video buffer. And so screen readers would go in and read the actual text. The reason that people got so concerned when graphical interfaces became available was that the techniques for getting the text at that last minute were no longer available. You mentioned that things are graphical, and yet fortunately most apps, if they’re not intended to draw things, have a lot of text.

And so the role of a screen reader is to take that text and use semantic information that originally was sort of gleaned from the operating system despite the fact that we were a screen reader rather than because we were one, and over time has evolved to a situation where apps and browsers make information that’s semantically useful when representing this information to screen reader users. They actually provide it, and then it’s the role of the screen reader to aggregate the information in useful ways.

So take a spreadsheet, for example. On the surface, you care about whatever the current cell is, and you can arrow left and right and up and down and read the cells. But you probably really want more than that. You may want to know if there’s a formula there. You may want to know the totals at the bottom or the right. And so it’s the goal, and I think the role of a screen reader, to aggregate information that doesn’t necessarily appear right next to each other and present it in ways that a user would find palatable and more efficient than having to go find each and every thing each time they wanted to hear it.

MATT KING: So when you say semantic information, you’re talking about the information that has to be provided by the person who developed that spreadsheet, and it’s information like this is a cell or this is a button, correct?

GLEN GORDON: Correct. Now in some cases, that information is sort of germane to the operating system, and so we can piggyback on what the operating system provides. And in other cases, if an app is doing something special, we need to ask that app. Similarly, it’s either provided by the browser because of standard HTML controls, or by the person who designed something special.

MATT KING: So Jamie, let’s just assume that authors or developers of an app did a great job, and they put all the accessibility semantics that you need in order to do a great job developing a great screen reader. Let’s say all of that is there in a particular product or web page. What do you see as the greatest challenges? I mean, it sounds like– so Leonie’s saying, OK, you’re making this graphical information available to the user, or you’ve got to make her productive.

Glen’s saying, well, all the information that you need is available in this– we’ll call it engineered accessibility interface. Let’s say it’s all there. Is making a screen reader simple? Just regurgitate that information? Or what would you say are the greatest challenges there?

JAMES TEH: I suppose the analogy that I use to describe a screen reader is it’s like looking through a tiny straw. And so as Glen started to explain, basic screen reader usage is like moving that straw tiny little bit by tiny little bit trying to find what you want. It’s painstaking.

And even for really accessible content, the user interface tends to be designed visual first so that even if a sighted user can easily find what they want, it’s not so easy for a blind user. Glen described a spreadsheet as a great example where you’re moving around. And a sighted user can easily glance at the total, but a screen reader user needs something else.

And so a major function of a screen reader, then, isn’t just to take that information and regurgitate it. It’s actually help the user move the straw to places of interest as quickly as possible. And that makes the user more efficient and faster.

And so giving the user a lot of these tools, unfortunately, means that there’s more commands, which means that screen readers get this steeper learning curve that they’re famous for. So ease of use and efficiency are a really, really tricky balance. And that’s sometimes one of the biggest challenges.

We talked about how screen readers rely on information provided by authors and apps. And that’s how we get the greatest accuracy. But there are cases where, if an author doesn’t provide that info, a screen reader might try to fill in some of those gaps where it can. There’s always a risk that it will be inaccurate or unreliable in some way. And the problem with inaccuracy is that not only does it decrease the user’s trust, it can also be really dangerous. Imagine you’re doing your banking, and you accidentally end up transferring money to the wrong person, or goodness knows what you end up doing.

So there’s a really fine balance, then, between providing the best possible access, so filling gaps where you need to, but also ensuring the greatest accuracy so that the user can trust it.

MATT KING: So you have challenges on two major fronts. One is even in the ideal scenario, where you have all the information you need to build a great accessible interface, you still have to kind of sort things out for the user, in a sense. You have to make it easy for them to find what’s important and be able to access the information they need when they need it. But part of that is sort of understanding where that information is. So that’s a job the screen reader has to do.

But then there’s this entire separate front, which is the whole world of missing or incorrect accessibility information in the products that your customers are trying to access.

Leonie, I can imagine as an accessibility consultant, this is your bread and butter. You see this every day, missing or incorrect information in the real world that people are trying to access?

LEONIE WATSON: Very much so. Much as it keeps me in work, I really wish it were otherwise. But it’s true. We see a lot of apps, websites, other interfaces that are developed with little or no thought towards accessibility. And I think there are three main reasons why that’s most often the case– education, complication, and abstraction.

The accessibility profession, I think, is fighting a rear guard action, and it’s going to keep doing that until we fix the education problem, which is to say until we reach the point where young designers, developers are going through school, going through college reading online courses, tutorials, blog posts, and they are all just teaching them how to do things accessibly by default. Not this bolt-on, not this afterthought, just the code examples that you see as you’re learning a new language, new markup language, are just accessible and that’s how it is.

We’re always just going to be chasing after that. It’s hard to re-educate. It’s hard to learn once you’re out in the workplace. I think complication is a big factor in the sense that I’ve been using the web since it was pretty much new. And websites, particularly, have got a lot more complicated in the intervening 20, 25 years. There are a lot more moving parts, a lot more things to think about, a lot more complexity.

And coupled with the last thing, which is abstraction. Rapid iteration is really useful. Being able to put an interface together very, very quickly has all sorts of benefits. But one of the drawbacks it has is that people know less about what it is they’re actually doing. They understand the tools of their trade much less, so it’s very quick and easy to use a JavaScript framework to pull components together into an interface. But the developers that I meet less and less seem to understand the importance of things like HTML. And as Glen mentioned just a little bit earlier, HTML is one of the main sources of that semantic information that screen readers find so important.

And those three things, when you put them together, sadly, more often than not, cause more trouble than success.

GLEN GORDON: I want to jump in here just for a second because to me, things are so much better than they were 20 years ago that this is not a world where blind people, for the most part, can’t get stuff done. The major players clearly have gotten this and understand it. And in many, many cases, people are able to do what they need to do and couldn’t do without the combination of somewhat thoughtfully designed apps and screen readers.

But the other part really has to do with the level of user sophistication. The more sophisticated a user, the more likely a user is going to be able to actually make do even if the site is not 100% correct. And the less experience they have, the more and more they may struggle.

MATT KING: Yeah. So that’s a good point. Where the level of sophistication varies broadly– this is something I see in my everyday work– we have to address a large audience where the level of sophistication is extremely varied, from very beginner or even complete newcomer to very advanced.

And one of the challenges that we face when developing products is exactly how to deal with that. And one of the ways that a lot of people, I think, have this notion of where advanced technologies could help– Leonie, you were talking about the problem of education. And some people are wondering, well, what if we had a super smart screen reader so we didn’t need all this fancy accessibility information built into the product?

Jamie, I’m kind of wondering, imagine really smart AI. I don’t know if you’ve had the experience of actually trying to use a human as a screen reader. Personally, that’s the very last thing that I will ever try. It’s my last resort. But imagine us thinking about the smartest screen reader possible. What do you think of this idea that you could have one so smart that could we obviate the need for a lot of this engineered accessibility that’s really complicated for app developers to develop?

JAMES TEH: Well, I think we’re already seeing the beginnings of this happening. We’re already seeing AI being used to fill gaps in really creative ways. So as one example, VoiceOver on iOS has a feature called Screen Recognition, which uses AI to recognize and describe elements on the screen for apps that aren’t accessible. So if there’s an app that just draws a button and doesn’t provide information, or even if it just draws text graphically, you can actually get some access to that and interact with it.

It’s far from perfect right now, but it can be really useful in some cases, and I can see other really amazing use cases for that as well. So the one example that’s popped into my mind many times is what could we do with an AI that could help us fill in inaccessible PDF forms, make the form fields available to us?

Relying on it entirely, though, is a completely different matter, I think. So as most people would know, AI requires a lot of training. And it can be downright useless for cases where it hasn’t been trained. It can even be just completely inaccurate. And so what about those cases? What about those cases it hasn’t been trained for? Even if it’s been trained for the majority, it’s all too common for minority use cases to just be ignored. I think for AI, that’s even more dangerous than what we have now.

You also have to consider the social implications here. So not just the technical. But if something like this is created, it’s really easy, then, for product teams who are designing products to just say, oh, well, blind users have the AI. We don’t have to worry about them at all. And where do we get to when that happens? Those users, they’re just not considered at all in the design process. I think that that could be really dangerous. And so I think that there’s still a world where there needs to be some deliberate intention when it comes to–

MATT KING: Yeah. Yeah. So Leonie, Jamie mentioned that VoiceOver is already doing this on the iPhone with some user interface elements. And it may occasionally close some gaps in the accessibility of a particular app. So as an accessibility consultant, would you actually ask people, your clients, when they’re testing, do you want them to turn that off or leave it on? What is your approach to this as a consultant?

LEONIE WATSON: We don’t tend to go as far as advising them to turn it off. But we do caution against depending on it as information because there’s a huge difference between VoiceOver telling the user, telling me that maybe it’s a back button, and it actually being a back button.

And Jamie’s absolutely right. The data that any of these systems get fed is absolutely critical. We see it all the time. Map technology. People who use GPS systems in their car haven’t updated them, and suddenly discover they’re looking for a road that doesn’t exist or another feature that surprises them because it isn’t on the map, and their GPS isn’t aware of it.

If you extend that out to the screen reading technology, the repercussions are even more entertaining.

MATT KING: Or it may be critical, right, in the case of somebody losing their job for doing the wrong thing.

LEONIE WATSON: Absolutely, yes. So having dependable data is always going to be a better user experience than potential data. If you haven’t got any data, then sure, AI coming in and saying maybe this is the Submit button is better than nothing. But I do agree with Jamie.

MATT KING: Glen, are you on the same page here looking at it as a gap filler? I know you’ve put Picture Smart into JAWS, for example, so people can get descriptions of pictures. But what are the guardrails that you are thinking about when it comes to this kind of technology?

GLEN GORDON: I find myself very often thinking and getting very excited about a particular technology, mostly because it’s sexy at the moment, and it seems manageable. I can get my head around it. And then after a while, I realize I’m missing some of the key components. So yes, AI would be great to fill in some of the gaps.

But that’s not one of the largest problems that people have with complicated apps. The problems they have, typically, with screen readers that are not designed keyboard first, or even voice input first, is how do you get from A to B to C quickly? And how do you accomplish tasks quickly?

It’s not entirely clear to me that AI, at least the way we’re thinking about it in terms of recognizing and describing things, really helps with that interactivity model. And you’re getting input to the app, if it’s not thought out to be efficient, if you’re not clicking with a mouse or tapping, that’s the difference between something being quote, barely accessible and something being really efficient to use.

MATT KING: OK. So it seems like we’re all on the same page here that people who are blind deserve to have user interfaces that are designed to meet their needs just as much as anybody else. And we see significant danger in saying, well, let’s let AI fill this gap in screen reading. It might be fine to fill some gaps where people are not doing their proper due diligence to provide accessible apps because they haven’t really designed intentionally. But if you’re being a responsible app developer and designing inclusively, don’t rely on the AI is what I’m hearing here.

So let’s tackle for just a brief moment one of these other problems that you were talking about, Leonie. We could take another approach to this education problem. I’ve long thought that the industry as a whole, and I’m not talking about the screen reader industry but the entire IT industry, tech industry, hasn’t done enough to help people who can see design intentionally for people who can’t see. Like we don’t have very good tools in this space. What is your outlook on that, and what we can do with advanced tech?

LEONIE WATSON: I agree. Considering that we’ve been building websites as an industry now for a quarter of a century, we should have got better at thinking about designing for different modes of interaction. Thankfully, we’ve moved past the time when people thought that a text only website was the best idea of accessible interface design. But having said that, we think about visual design.

Increasingly, we are starting to think about voice interface design. But we haven’t really given screen reader output interface design much thought. I’m encouraged to discover that the BBC is actually starting to think along these lines. They are looking at the audio experience of the page. And by that, they mean the screen reader experience as one of the aspects of good design as part of their new global experience language.

And I think it’s really important. If we’re going to think about color schemes, we should also think about what something sounds like when you’re navigating and consuming the content, listening to it through a screen reader. And I hope we’ll see more of that across the industry in years to come.

MATT KING: Yeah. I’ve personally kind of envied architects. So when you think about what a CAD program does, it takes a two-dimensional design and makes a visualization so that people can see what that design reveals, what it will look like in the real world. And it’s very helpful to people who can’t make that leap from 2D to 3D in their head.

I kind of wonder if we had a way of helping people who are developing apps and are sighted, if they had the ability to have a visual experience, but is built only from the information available to screen readers through the engineered accessibility interfaces. And so they could think and act visually but have all the constraints of a screen reader somehow. I don’t know how this is totally possible. But does anybody have any thoughts on whether or not that would help move us forward?

GLEN GORDON: This reminds me a little bit of those sensitivity exercises that people have. Put on a blindfold and act as if you’re blind for a day. And I think what that generally does is cause people to say, oh my god, where’s the razor blade? I need to end it all because they can’t possibly imagine what it would be like to be blind.

And I think there is the danger of saying to someone, this is all a blind person needs, or all a blind person gets. I think the tendency– and we’ve seen this over and over again– is people over describe and over speak where it may be useful, the very first time you interact with a control and a web page, but after that you want a terse output, and that’s not what’s being provided.

So yes, I absolutely think it would help. But it needs to be in collaboration in combination with some kind of training and real understanding.

MATT KING: Yes. Properly impose the constraints, but still have them work in the visual world in a visual way. I sometimes cringe at what you were talking about, these empathy experiences, getting somebody to work without the screen on for a day is not always the most helpful thing to do.

So if we think more about what we were just talking about a minute ago, Glen, getting the job done, getting tasks done. You’ve done some other things in JAWS recently, and VoiceOver has done some similar things in terms of adding voice control. And we use the keyboard in the title of this session rather provocatively. It’s a symbol of the complexity of the hundreds of keyboard commands that Jamie referenced earlier.

So with voice control in JAWS, do you see this as a sea change? What kind of change is this in terms of employing natural language commands in combination with screen reading?

GLEN GORDON: We think, and we’re betting on the fact, that mainstream is going to continue to move in ways that people can do more and more complex things with voice. And screen reader users are less keyboard savvy as a broad group than they were 20 years ago because a lot of people are coming from a touch device, where you essentially swipe and use pretty simple gestures to get things done, but not necessarily the most efficiently.

So we’re trying to come up with ways that people can minimize the number of keyboard commands that they use for really motor fine tasks and do some of these uber tasks that you don’t do as often, but would be simpler to do by voice, continuing to base things on speech commands.

MATT KING: So hopefully a way of reducing some complexity here, possibly a way of speeding up some tasks. Jamie, how do you look at this from the perspective of a screen reader developer? Where do you see the greatest opportunity here?

JAMES TEH: I think there are some really, really compelling use cases for voice. And even aside from tasks, I’m thinking about situations like you’re cooking. You don’t want to stick your grubby fat hands all over your computer or take your hands off something that’s cooking. Or if you’re doing something and you know a site well, you might just want to say, transfer between my accounts, or find the current balance or something like that.

And I still think, though, there are cases where, if you don’t know something well, as Glen mentioned, you’re going to be wanting to move really fast. And there the keyboard is still really useful for that hyper efficiency stuff. And I guess one thing I would say is imagine you didn’t have a keyboard or a screen, and you had to use Siri or Alexa all day to get everything done, including your job, and I mean all day.

And these assistants can be really amazing for those one off queries, and I think that’s true for screen readers, too. But no one can work with just that interaction all day long. I just don’t think we’ve come up with the right interfaces yet to make that happen. And so I think that there’s still always going to be a place for multiple ways of doing things for different situations. I don’t think we should shy away from that.

MATT KING: Yeah, that’s really good insight there, thinking about trying to use something like that all day long, just conversing and getting the things done that you need to get done. I think I’d pull my hair out.

What about you, Leonie? How do you think about how we could move product design forward in collaboration with screen reader developers, and maybe find more efficient ways of tackling some of the larger scope problems that screen reader developers face?

LEONIE WATSON: I do think voice is an avenue worth exploring. Jamie is absolutely right. If I had to do my whole every day just using voice, my productivity would disappear through the floor. I dictate a lot of text messages. And you can always tell when I do it because they’re full of spelling mistakes and malapropisms because I’ve got the wrong word. You know the drill. Yeah, absolutely, because it’s a hell of a lot easier than touch typing on a touch device, which drives me bonkers. I can’t do it. It’s too slow. Nothing will ever replace a keyboard for me for typing text entry. Nothing will ever come close. I can’t really ever see that happening.

But one of the things I do like doing is wandering around the house when I’m on a conference call. If I could extend that so in the morning, instead of getting myself a cup of tea and then sitting down and reading my email, I could double up.

So I could walk around the house with my headset on, tell my screen reader, OK, open my email, read the next email while I’m making my cup of tea, and maybe sitting on the couch and drinking my tea, and still reading my email instead of being stuck at my desk all the time, my productivity is quite likely to increase. So I think being able to talk for certain things in certain contexts is definitely going to change things.

And as Jamie mentioned, conversational commands, so that they’re much simpler. And Glen pointed out that screen reader users, by and large, are less familiar with keyboards. They’ve got less experience of memorizing lots of keyboard shortcuts. So if it makes it easier for non-professional IT technology users to use screen readers and get what they need to do done, I think there’s a huge win to be had there, too.

MATT KING: So that sounds like just more reason for people who are developing products to be collaborating more directly with people who are developing assistive technologies.

LEONIE WATSON: And perhaps even this is another way AI could be introduced to it. I have this wish in my head somewhere that I’ll end up with Jarvis from “Iron Man”. But if AI could say, actually, you’ve got some new emails. Would you like to read them? Yes, I would, thank you. That’d be great. Are there any marked important? Yes, there are. Would you like to read those? Sure. You know, that kind of interaction with a bit of intelligent prompting for things might be worth exploring as well. It would need some usability testing. But I’d be curious to find out.

MATT KING: Yeah. That kind of vision sounds like a good place for us to wrap up, where we see lots of opportunity for collaboration, more collaboration across the industry as we’re all thinking more about how to design our products to be fully inclusive for everybody.

I have one last question. We promised that we would answer the question: if the Jetsons had screen readers, would they be using keyboard commands? And I think we’ve heard from all of you that we don’t see the demise of the keyboard in the near future. But imagine something tragic happens in some episode of The Jetsons and George loses his vision. Is he going to be using keyboard commands? We know his computer RUDI had some keys on it. I don’t know how many. But Jamie, what would you expect here? What do you speculate?

JAMES TEH: Well, I think George would still need his screen reader. I think it would still need to have keyboard commands as well as pure voice. But for cases where speed and efficiency and silence are important, I think that’s necessary, at least until they developed some kind of direct neural interface, which would obviously be faster and more intuitive and quieter, but also opens up a whole other can of ethically complicated worms.

MATT KING: Yeah. Leonie, what do you think George would be doing? Would he be using keyboard commands?

LEONIE WATSON: I’ve got a sneaking suspicion he won’t. If I talk to my friends’ children, who are variously in their tens and teens, they tend not to use their laptops unless they’re doing their schoolwork. They’re very much attached to their touch screen interfaces, and increasingly to home assistants.

So for me, if I were around in the Jetsons’ age, yes, I’d still be using a keyboard. But I suspect people who are younger than me, maybe not so much.

MATT KING: Glen, I’ll give you the final word here. What’s your thought about how you would write that episode of the Jetsons?

GLEN GORDON: I think if George was interested in just surviving, he would not use a keyboard. But if he was interested in continued self-actualization after going blind, a keyboard would come into his life to help augment what he could do by voice.

MATT KING: Let’s let you write that script. That is awesome.

[LAUGHTER]

I thank each one of you for joining in this conversation. It was great. Thank you, everybody, for watching, and I hope you enjoy the rest of Sight Tech Global 2020. Back to you, Will.

[MUSIC PLAYING]
