Alexa, what is your future?
DESCRIPTION: When Alexa launched six years ago, no one imagined that the voice assistant would reach into millions of daily lives and become a huge convenience for people who are blind or visually impaired. This fall, Alexa introduced personalization and conversational capabilities that are a step-change toward more human-like home companionship. Amazon’s Josh Miele and Anne Toth will discuss the impact on accessibility as Alexa becomes more capable.
DEVIN COLDEWEY: Well, thank you, Josh and Anne, for joining us here today. I think to start out with I’d like to hear a little bit about how– because Alexa is such a big part of Amazon and Amazon is such a big company that does so many things, I’d like to hear a little bit about how you consider your roles at Alexa before we start talking about them. Maybe we can start with you, Josh?
JOSH MIELE: Well, that’s a funny place to start. Actually, I’m technically not in the Alexa org. Amazon is a complicated place. I am part of the devices organization. And devices, of course, includes lots of things that Alexa runs on.
We’re responsible for things like the accessibility of Fire tablets, and Fire TVs, and the multimodal devices, which are the Echo Shows and the things that Alexa uses that do have screens. And then, of course, there are lots of other devices. But Alexa is a whole area of its own because Alexa is not tied to a particular device.
So I’m doing a lot of work with screen readers and device accessibility in general because I’m blind, because I’m experienced in designing interfaces and tools for people with disabilities, especially visual disabilities. I’ve become a great resource for folks that are designing voice forward experiences for Alexa. But I’m not actually technically in the Alexa org, but Anne is.
ANNE TOTH: Thank you for that segue.
Yeah. So I’m Anne. Thank you for having me. I am relatively new to the Alexa organization. I’m at a three-month mark.
And I work on the Alexa Trust org, which is the part of the team that works on issues related primarily to privacy, and accessibility, and designing Alexa for all, for everyone, which includes aging populations. I think the unifying theme on the team is really thinking about Alexa in the most inclusive way and thinking about how we can strengthen our relationship of trust that we have with users, with our customers. Because when you’re bringing a device into your home, into your family life, that is really an incredible privilege to be granted that level of trust at the outset.
And we have to re-earn that every single day. So I’m on the team that is really thinking about how we can be continually trustworthy with our customers.
DEVIN COLDEWEY: Right. Go ahead.
JOSH MIELE: Just to jump back in, I was being a little pithy I think. I mean, I, of course, do a lot– I mean, Alexa is everywhere. And I do a lot of stuff with Alexa, so, for example, things like we want on a multimodal– sorry. On an Echo Show, you can do things like tell her to turn on or off the screen reader because the thing has a screen. So you want to be able to interact with it sometimes, although it’s always, of course, better to be able to just talk to Alexa.
And so I often am working with the Alexa org very, very deeply to figure out what are the things that Alexa can do that people and customers with disabilities are really going to value, and use, and benefit from. So that’s sort of my involvement with Alexa.
DEVIN COLDEWEY: Right. And you and Anne– also, you just mentioned this I think is important. It’s this idea of the device being in the home and meeting that trust. Voice controls and voice interfaces have been kind of proliferating all over the place for the last few years. But I think they sort of most famously and most effectively have been found in devices like the Echo and Echo Show.
They’re homebound devices. I’m trying to figure out– and maybe you can help me understand. Why do you think that a stationary stuck at home device has become the center of this stuff? What is the value that comes specific to a homebound device? Like why don’t we all just use our phones?
ANNE TOTH: Well, I’d love to take a shot at that question. But I’d love to sort of take my Amazon employee hat off and speak as an Alexa and Echo customer. So I’ve had one of these devices in my home for the past six years, which is as old as Alexa is.
It’s a household device. It started that way. And I don’t know if it was some stroke of product genius or if it was just lucky happenstance, but I think in terms of building a relationship with a service like Alexa, starting from the home is a natural place to start when you’re talking about a voice-enabled service.
We’re now the week of Thanksgiving. I guess I can say that. We’re filming early. And so I have a house full of teenage boys who come home from school. And so it’s all about yelling upstairs and yelling across the house. And Alexa is effectively another member of the household.
So starting that relationship that way made perfect sense I think in terms of establishing that relationship so that when we take Alexa out of the home, which is really where we’re going with this in all places, you have that foundation that you built from all of those daily interactions. And I also just want to mention that I have a 150-year-old house that is probably the smartest 150-year-old house in all of the state of Virginia, right? I have smart thermostats. And I have Ring cameras. And I have an Alexa-enabled washer and dryer.
And all of these things are super convenient. They’re affordable, and they’re super convenient for me. But for many, many, many people, this is not just about a little bit of extra convenience.
It’s about actually making the place where you live work for you and be accessible for you without someone else’s assistance. About it being voice-enabled in the home, I think makes a lot of sense. But it also actually is probably the best testament to the value of the services that Alexa provides.
DEVIN COLDEWEY: Josh, do you think that there’s something sort of special about a device being a housebound device versus being the phone that we carry with us anywhere?
JOSH MIELE: Well, I mean, I think that it’s not the fact that it is stationary that makes it attractive. It started out that way. I think the thing that’s really amazing about Alexa is just the true hands-free nature of it, the fact that you can be wherever you want to be and talk and ask her to do the thing that she’s supposed to do. And you don’t need to take your phone out of your pocket.
You can hear her well because the speakers are well-designed and intended to be heard through your house. They integrate well. So it’s really a very freeing experience to be able to just be wherever you need to be and say what you need. And you’ll get the response.
I do a lot of reading. I get a lot of my material from Audible and from Kindle. It’s nice to be able to just tell her to read those. And I don’t need to– if I get an alert on my phone, I can do that sort of simultaneously with it. But I don’t have to have my hands busy in order to ask for information, or place an order, or read my book.
I think that that’s one of the most valuable things about it. And as we move out of the home, I hope that we’re able to maintain that kind of really hands-free experience where we can say what we need and get the information or the experience that we’re after.
ANNE TOTH: And I think it’s worth noting– I mean, I think for many, many people Alexa and the Echo devices are one and the same. But they’re not, right? Alexa comes in a lot of different form factors and with a lot of other hardware that’s not made by Amazon. I think increasingly you’re going to see more and more the case that Alexa is present well outside of the specific pieces of hardware that you’re used to thinking about.
DEVIN COLDEWEY: Yeah, certainly, Alexa, and Alexa-enabled devices, and so on. You find them all over the world, many different languages, many different situations. And partly that brings up my next question, which is that Amazon and Alexa are at the frontiers of natural language processing for sure.
Whenever I’ve read about the latest advances, it feels like all the landmarks are being arrived at mainly in normative American English. And I know that there’s research going on in other areas, and it’s being translated to other languages as well. But we’re mainly staying within this sort of limited linguistic space.
There are so many dialects out there, so many accents, so many speech impediments, and things like that. How do we make sure that a device like an Echo is accessible to all people, regardless of how they speak, what language, or what region they’re from?
ANNE TOTH: Yeah, I think that’s a really good question. And looking at the lineup for the events, for the panels at this particular event, there are a lot of people here who are going to be talking about AI, and machine learning, and how data plays into this. So I’ll start first by saying that, historically, we’ve tied the notion of big data to machine learning and artificial intelligence in part because it’s not necessarily that we need the biggest data. But you need the right data.
And if you don’t exactly know where the right data is– if you have big data, it’s probably going to be in there somewhere. And I think that’s the challenge with trying to train systems where the populations that you’re trying to train for are maybe not that large. And so you can come at it from a couple of different angles.
Certainly, there’s a lot of usability research and testing that goes on with different populations. But being able to identify this is the data that’s coming from these groups that we need to focus on, that’s not a trivial problem to solve. We’re actively working on it. I can tell you there’s a lot of work that’s happening behind the scenes to be able to get better, to be able to introduce Alexa to new languages and learn faster, the ability to be able to measure performance against different groups, and be able to say that we are, in fact, building Alexa to be inclusive and to be fair.
I think fairness in AI is an area that there’s a lot of research going on in the space. Amazon has partnered with the National Science Foundation, specifically to help fund greater research in the area of fairness in AI. And I think it’s an area that’s not a simple– it’s not a simple solution to it. I just think that it’s an area that we’re focused on getting better at all the time. Hopefully, we can develop this with the intention of getting it right at the beginning.
But in order to know that we’re doing it well, it’s an iterative process that involves a lot of testing and a lot of feedback before we’re going to get it right. And I will be honest. I think it’s true that it doesn’t work perfectly well for everyone right now. But our intention is that it should and it will.
JOSH MIELE: And to just kind of build on what Anne is talking about, I mean, we have customers all over the world. They speak all different languages, and they speak in all different ways. That’s what you’ve sort of laid out for us in your question. But Amazon likes to believe that we are the most customer-centric company in the world. And believe me, we want to serve all of those customers. And we want them to be able to– we want everyone to be able to speak and be equally well understood by Alexa. And so it’s absolutely a goal. We want to be better.
For customers with disabilities, to be able to speak to and get responses from– I’m trying not to say her name because I realize that my son left his on in the other room. And I turned all mine [INAUDIBLE]
To be able to do that, it is an incredible thing for people with limited dexterity, for people who can’t see, like I mentioned earlier reading books and so on.
It’s so powerful and has so much potential with smart home and all of the possibilities. And yet if we can’t understand someone with a disability because they don’t speak normatively, we’re clearly not delighting them. I can promise you that we’re all thinking about that pretty hard.
DEVIN COLDEWEY: Gotcha.
ANNE TOTH: And I’ll give you one other example, which is kind of interesting is that even the AARP representing older Americans is seeking to make sure that artificial intelligence and machine learning actually takes into account the elderly population because they’re not usually considered early adopters of technology. So they might not be as well represented in the data. And they’re also eager to ensure that that population’s needs are met. So it’s everyone really. We’re trying to be useful to as many people as possible.
DEVIN COLDEWEY: Absolutely. And I think sort of the flip side of this part of the technology– and, Josh, maybe you have because you’re both a user and a designer of this kind of experience– is the synthetic voice side of things. I know we’ve had really major advances in recent years that have improved the quality, and cadence, and adaptability of synthetic voices.
But it feels like we don’t– like what I see in papers and the samples that I hear when I look at conferences and stuff like that, it’s not trickling down as fast as I’d like. It’s not showing up in the audiobook reading, or article reading, and stuff. Like it’s good, but it’s not quite there. How do we get that tech like in our homes right now?
JOSH MIELE: Well, I mean that is the place where it’s so good now that if you just hear a few words of text to speech, very often, unless you know the voice, you can’t really tell. Like we were listening to something the other day. And my wife was like is this text to speech? I was like, no, that’s a person.
And so the gains are really, I think, mostly to be made now in sort of really smart stylistic approaches. So we, for example, have announced at the end of this year where we’ve got a thing called speaking style adaptation that’s going to really– there’s a lot of machine learning behind the scenes there. But, basically, it just means that the way speech is presented.
Like I just emphasized the word “way.” And that’s the kind of thing that we really have a tough time doing to make speech sound naturalistic. And so we’ve got a whole bunch of improvements lined up in that regard based on several different technologies.
Maybe, Anne, actually– I don’t know if you want to talk about the details there or if we just– I mean, maybe it’s not where we need to go. But, I mean, I think that for the common use cases, for people who want to understand and feel like they’re interacting with a natural speaker, it’s the subtleties that we have to work on. And that’s where machine learning is really actually driving most of the improvements. And I think that very soon those are the kinds of improvements you’re going to start seeing.
DEVIN COLDEWEY: Absolutely. And I’m glad you mentioned that there’s– we need more data. We need more feedback, more involvement from the people who are using this to give that kind of feedback. What is it that needs to happen in terms of partnerships and cooperating with advocacy organizations and things like that to get the kind of data and feedback that you need to make this product how you want it to be?
ANNE TOTH: Well, I think absolutely that there’s a role to be played. There are lots of groups that we partner with already and many others that we are talking with about how to make these experiences better. You can look at simply performance within the audiences that we have. So you can look at the data you have to say we perform well with these cohorts and not as well with these clusters over there. Or you can actually have the conversation around what are the anecdotal experiences that people are having.
And, also, if you’re looking only at the data you have, you’re not necessarily going to know much about the groups that are not represented in that data, right? So in order to build a product that is better for all of our customers, talking to the groups that actually represent those audiences and have better insight into that is pretty important. And so those are some of the relationships I’ve been building and working on in my time here.
And I think for sure– I’m sure Josh and others in the organization have examples they can call upon as well. But, yeah, it’s a group effort. And I think we’re all sort of faced with some of the same challenges. So it doesn’t make sense for us to sort of go down this avenue by ourselves when there are a lot of other people that we can learn from.
DEVIN COLDEWEY: Sorry, Josh. Were you going to say something?
JOSH MIELE: No, no, that’s OK.
DEVIN COLDEWEY: But along those lines though, I feel like a lot of the conversation, the sort of cultural conversation, that is had about devices like Echo, and HomePod, and voice assistants, and things like that– a lot of it focuses on sort of trivial conveniences for people that have no disabilities or no disadvantages even. Like do we need to change the conversation around this technology to make people more aware of the possibilities that it has and the major changes that it can produce in many people’s lives?
ANNE TOTH: Well, I think people are gravitating towards that. I mean, we’re certainly looking at different ways not just to make it fun, and magical, and all of those things that people like, but I mean we recently announced the Alexa Care Hub, which is really, again, focusing on the aging community, which if we’re fortunate enough to live long enough, we will all at some point in our lives be faced with a disability simply as a result of hopefully being 150 years old. But with that comes a variety of challenges, whether they’re physical or cognitive.
And so we are looking at features that we can develop that help keep elderly people sort of safer in their homes, that give families a greater sense of security and assurance about loved ones wherever they may be. I mean, and that’s a use case that isn’t just sort of fun and convenient but one that could be lifesaving, especially when we’re living through an era like we are right now where people are unfortunately not able to be near one another. And so that’s one example of sort of expansion into an area and raising awareness of how the functionality that is just sort of native to these devices can be really helpful to people who might not otherwise be thinking about including them in their homes.
JOSH MIELE: I don’t really like to break things down into sort of trivial, or convenient, or lifesaving because it really depends on who you are and what your needs are. So I feel like I’ve built a career on adapting like cool off-the-shelf technology into really meaningful stuff for people with disabilities. And that continues– that’s what I do at Amazon. And that’s what Amazon is really– I think that’s one of the things that we’re really great at is being able to read a– you take something like e-books where e-books are sort of a convenience for a lot of people, right?
It’s nice to be able to download thousands of books into something that’s just a couple of ounces and be able to carry it around and read them wherever you want. But that’s a convenience for many people. But for me, it means that I can actually read those books because they’re electronic, right? And the same thing is true of many of the things that Alexa does.
So in terms of, oh, shopping, so if I want to shop online, I’m a sophisticated screen reader user. I can use my screen reader and go to the Amazon website. And I can type in my search string, and click on the Search button, and then jump around by headings. Not everybody has been blind as long as I have. Not everybody knows how to use a screen reader in that same way. Not everybody is comfortable.
Not everybody has a computer. But a lot of people can afford an Echo device. And it doesn’t take a lot of computer sophistication to be able to use it. And so voice control, voice command, and the ability to shop, or read, or watch a movie online just using your voice is extraordinarily powerful for a lot of people with disabilities of all kinds.
They weren’t designed for us. That’s not the only group that was intended to use those services. But for us, it’s transformative. For many people, it’s convenient.
And I think Amazon’s consciousness of the fact that it is so essential for many of us and convenient for the rest of us is part of why we’re making so much progress. We’re really leaning into those experiences that we realize are fun for many people and essential for some.
DEVIN COLDEWEY: Yes. You’re quite right to say that what’s trivial for some people like myself may be life-changing for others. I think I’ve fallen victim to my own narrative. So thank you for correcting me there. I’m glad you brought up shopping though because we have something like the Echo Show, which can now recognize UPC codes and stuff that’s super useful, obviously practical.
But when there’s this overlap between that kind of technology, assistive technology, and Amazon’s retail side, I feel like you have to walk a narrow path so that you avoid the perception of Amazon using assistive technology as a sort of lead generation device. How do you maintain a relationship of trust and normalcy with people when there is the Amazon that is the multi-global retail giant and Amazon, the one that’s making this life-changing technology available to you?
ANNE TOTH: Yeah. I think that when you think about it like even in the context of knowledge, of a fact-based question, especially in the last year, different people seem to want to have different facts. But when we’re returning results based on a question that’s being asked, we’re really working hard to try to provide transparency as to where we’re sourcing that to give people a better sense because it is voice based. You can’t spend 50 minutes answering a simple question with all of the possible results that you could get.
You have to make a decision about what you’re returning and returning something that is relevant but also that is relatively transparent as to where it comes from I think is something that we’re working really hard to do not just in answering factual questions but giving people greater context as to how that response, how that came to be in the first place. And that’s going to matter– that’s going to depend a lot on the form factors that we’re looking at, whether there’s a screen, whether it’s simply a speaker-based device.
So I think that’s an area that I know there is a great deal of commitment to precisely because the question of trust and trusting why you’re getting, or what it is, or how that decision’s being made. Greater visibility into that, I think, is going to help engender more trust from customers on the receiving end of that.
JOSH MIELE: It’s a great question. I mean, it really is. It’s an important balance to strike, right? You want to be able to ask questions and get information without the idea that you’re going to be always upsold or whatever. And so I think that’s not something that I have a lot to say about.
But, obviously, we want to make our customers happy. And nobody’s going to be happy if they’re always sort of annoyed with extraneous or additional information. There’s this multi-skill experience that we’re working on now. I think it’s available in English in the US. It’s a way of inferring what kind of information and what kinds of things the customer wants.
So, for example, if I say how long should I boil an egg, and she says 3 and 1/2 minutes. The next thing she might say is do you want me to start a timer? But there are definitely places where we don’t want her to follow up with some offer. And that’s another area where big data and machine learning is really helping us. We’re learning from our customers every day what kinds of things they want follow-up on and what kinds of things they don’t.
And we have a term internally, trust busters. Customers are very vocal, and they let us know when we haven’t quite gotten it right. And you only get so many of those before somebody decides that they’re not going to be using your product anymore.
And when we get a report about that, it goes into sort of the trust buster file. And we take this very seriously. We follow up on them, and we try to make sure that we minimize those interactions or eliminate them entirely one would hope.
DEVIN COLDEWEY: Yeah, that’s a hard one because you rarely hear about it when things go right. But you always hear about it when they go wrong. So I guess my last sort of major question here is where is this all going? Where is this all tending towards?
You’ve both been part of the trends that we’ve seen towards voice communication, towards ubiquitous computing, towards all of these different things enabled by improved hardware, improved software. Where are we going with this? Is Alexa going to go straight into our brains?
Is it going to be in our ear pieces all the time? Is it less weird than that? And is it more normal? I’m just curious where you see in the long term what all this is for. And like why are you doing it?
ANNE TOTH: That’s not an easy question.
DEVIN COLDEWEY: Yeah, just wrap it up please. Just two sentences each will do.
ANNE TOTH: Well, I mean, in the near term, I think we touched on some of the things about making Alexa more conversational, more natural, easier to use, smarter, and ultimately more tailored to individuals, which is I think the way we become the most inclusive is that we react individually to everybody differently and not the same way to everybody because not everybody is the same. That’s demographics, and geography, and culture, and all of those things. So, ideally, we’re coming to a world where your Alexa is not going to be my Alexa, is not going to be Josh’s Alexa. It’s going to be an entirely different experience.
I don’t think that’s very far off. In terms of where we’re going in the bigger picture, I think we’re talking about making people’s lives easier all the way around. And I think it’s easy for us to skip over the fact that the internet’s only really been around in our lifetimes for about 20 years and these things that we take for granted, being able to just simply say, hey, reorder more Cascade tabs because I’m out of them now.
I mean, that’s not trivial or that advancement, the amount of effort that it took us to get to the place where I could just sort of automagically reorder a dishwashing detergent when I run out. I mean, there’s a lot that went into it. So if you think about what we’re going to be doing 20 years from now, like I can’t even wrap my head around it because I’m not nearly that smart. So I’m going to let Josh take that one.
DEVIN COLDEWEY: [INAUDIBLE] please.
JOSH MIELE: Speaking of not being that smart, I don’t think I am either. I don’t think– I’m going to say she is either. We want her to be smarter. We want her to be better at helping us, making inferences about what we need. We want her to be available to us in more situations, not just when we’re in our homes but when we’re walking around.
We want to be able to deliver improvements to people’s lives no matter where they are. And we want to be able to do it well. And I think ultimately, tying it back to disability, we want to do it for everybody. And we want to make it meaningful and really make improvements to people’s lives.
We want it to be delightful for everyone. And so we’re looking at– we want to be able to expand the ways in which we’re able to help people. And we want to expand the situations where we’re able to help people. And I think making her smarter and more able to understand what we need and how to help us with those things is where I hope we’re going.
ANNE TOTH: And I think it’s true that you get the world that you make. And so to the extent that you shape the world by what you put out into the universe, I’m going to just say that I’m working really hard to go with gender neutral pronouns for Alexa. I’m sticking with they/them because my future Alexa is not gendered. So that’s my little plug for the world I’d like to see, which I mean no offense at all there. It’s my own personal plug for gender neutrality for [INAUDIBLE].
DEVIN COLDEWEY: That’s awesome. Like you said, a different Alexa for everybody.
ANNE TOTH: Everybody.
DEVIN COLDEWEY: I think that that’s exactly right. And thank you very much for both of your input on that last question and for illuminating the world of Alexa, and voice-operated interfaces, and all this kind of stuff. It’s been fascinating. Thank you for joining me.
JOSH MIELE: Thank you.
ANNE TOTH: Thanks very much.