Augmented Reality and Perception: What’s the best way to get the message across?
DESCRIPTION: It's one thing for an AI-based system to "know" when it's time to turn left, who came through the door or how far away the couch is: it's quite another to convey that information in a timely fashion with minimal distraction. Researchers are making use of haptics, visual augmented reality (AR), sound and language to figure out the right solutions.
[MUSIC PLAYING] NICK GIUDICE: Thank you, Will. And hello, everybody. My name is Nicholas Giudice. I’m from the University of Maine, and I’ll be moderating this panel with Amos, Ashley, and Sile. We have really a truly excellent panel of innovators. I’m really excited.
So our topic deals with the relation of human perception and augmented reality, and really interactive technologies more broadly. And our goal is to kind of think about some of the big issues in these areas that developers need to think about, researchers should be thinking about. And ultimately, that will benefit users and what users may want to think about when using these technologies as they develop, and as we move forward into the future.
So our panel is our experts that have all been working on different solutions using these technologies. And those are some of the things that we’ll talk about. I’ll start by throwing out some questions to you folks. And I warn you, I may jump in and interrupt if I think we’re getting off track or to keep us going. But I think these will be good starting points.
So as a first question, kind of a general question– and I’m coming from a perspective as someone that’s congenitally blind and worked in this field for a long time– I argue that most challenges faced by people who are blind or visually impaired are really due to insufficient access to critical information and not necessarily due to vision loss. And that’s something we can debate. But I think what isn’t debatable is that there’s a clear link between having greater access to information and people’s increased sense of independence and sense of well-being.
And the reality is that augmented reality and interactive systems have huge potential for providing this information. So that’s certainly true. But then there’s also this situation where when we use this technology, it often feels that we’re kind of dumbing down the ways that we communicate. We’re limiting the skills and the way that we actually interact with these technologies. So my question is, how can AI and related tech best support humans in ways that actually help us learn and help us grow as this stuff develops into the future? And I’ll throw it to Amos to get us started.
AMOS MILLER: Oh, wow, that’s a big question, Nick. My name is Amos Miller. I’m from Microsoft Research, and I lead our work on Soundscape, and we play around with that question a lot.
Like you say, AI has the ability to really see for us, hear for us, detect things for us, inform us, or a whole range of abilities of what’s going on around us that would be very empowering. But it also has the power to infer from that what we should be doing, what steps we should take, how far to walk, when to turn, when to lift our arm, when to put it down.
And I think that there is a boundary there where you overstep and take away the agency from the individual. One way, for example, that’s in the world of Soundscape that we deal with that question is that we avoid the technology from giving the user instructions. Stay always on the right-hand side of that continuum and say, it is about providing information that empowers the user to then think about what options they have, what inputs they get from their senses, and take their best decision, whether they’re navigating, whether they’re looking for something, whether they’re trying to make a decision– do I turn left, or do I go straight at this next intersection?
NICK GIUDICE: That makes sense– and anyone can jump in here– what happens, Amos if you– some people, I feel, want to be given that information. They want the system to be kind of guiding them. And so how do we deal with that dichotomy of your guiding logic of letting the human be human and not be driven by the AI and the technology versus people that really want that?
AMOS MILLER: Well, I don’t think we should– I don’t think it’s about hiding information away. I don’t think that technology should be clever and say, well, I’m not going to tell you the answer.
NICK GIUDICE: [LAUGHS]
AMOS MILLER: But if we think about how could it help us have that information, it’s about– a great example I heard is a heart rate pacer, or a heart rate monitor. For runners, it’s very important for them to know and to be able to monitor their own heart rate. If you have a heart rate monitor on your wrist, then you just look at it and you don’t develop the skill to listen to your own body.
But you could also use that by having the heart rate monitors ask you, what do you think your heart rate is at? And you give it the answer and you say, well, it’s just a bit above that. So with the same piece of information, you help the person and quickly build their own skill in listening to their body.
NICK GIUDICE: Let me jump to another question. Most of us on this panel, including myself, have some interest in navigation of blind people in different ways, but how people move and interact with their environment. However, I would argue that most of the research that’s out there and most of the systems that are out there deal with– focus on totally blind folks and focus on conveying nonvisual information. And the reality is that the majority of legally blind people have some usable residual vision, like 90% or more.
And so my question is, moving forward, what are the unmet navigation needs for people with usable vision that should be addressed in this type of technology? And what technology is best used to provide this information, and to assist these users? Ashley, why don’t I throw that to you to start.
ASHLEY TUAN: Sure. For the people that still have some remaining eyesight, what’s the current trend, at least the area they’re working on, is to use augmented reality to help them navigate their surrounding.
The way we’re doing it is we enhance the visual information for them. You’re usually involved with– you need to have an outward-facing imager. You’re taking information, and then you enhance the information by enhancing the contrast or the edge detection so it can help the person that use their remaining vision to see those extra enhanced information by actually patterns showing you where the curb is, the contrast showing you where’s the [INAUDIBLE]. That can help them navigate through the world. And AI definitely has a place for that.
What we find out is the images, the enhancing algorithm, will need some kind of auto contrast function. And the auto contrast function is highly related to what are you trying to achieve. So let’s for example, if you’re trying to see some gross information on the road, the scenery, you wanted to see the street. You want to see the cars coming, where’s the sidewalk. The threshold that you need to set for the contrast and edge detection needs to be relatively high, or else you will pick up a lot of noise.
But if you’re looking at a face, you wanted to see that if you recognize this person, or even more, if you wanted to see what’s this person’s facial expression, we can do image enhancement too. But you have to have a different threshold then if you wanted to look at a face and understand a person’s facial expression.
So AI machine learning, it can discern where you’re looking at, what is your intent, then it can help the algorithm to figure out where’s the threshold to set, and help you carry out your work. And with that kind of ability, you will give a visually impaired person a lot more confidence, because then they can get around their surrounding, and keep a social life in a most natural way.
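The threshold trade-off Ashley describes can be sketched in a few lines. This is a minimal illustration, not the panelists' algorithm: the intent labels (`"street"`, `"face"`), the threshold values, and the function names are all hypothetical, and a simple central-difference gradient stands in for whatever edge detector a real device would use.

```python
# Hypothetical per-task thresholds on a 0-255 gradient scale (illustrative values only).
# A high threshold keeps only strong edges (curbs, cars); a low one keeps fine detail (faces).
EDGE_THRESHOLDS = {"street": 80, "face": 10}


def gradient_magnitude(image):
    """Approximate per-pixel gradient magnitude with central differences.

    `image` is a list of rows of grayscale values; borders are clamped.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            gx = (right - left) / 2.0
            gy = (down - up) / 2.0
            magnitude[y][x] = (gx * gx + gy * gy) ** 0.5
    return magnitude


def edge_mask(image, intent):
    """Return a boolean mask of edges, using the threshold for the assumed intent."""
    threshold = EDGE_THRESHOLDS[intent]
    return [[m >= threshold for m in row] for row in gradient_magnitude(image)]


# One strong step (0 -> 200) and one faint step (200 -> 230) in every row.
row = [0, 0, 200, 200, 230, 230]
image = [row, row, row]
street_edges = edge_mask(image, "street")  # keeps only the strong step
face_edges = edge_mask(image, "face")      # also keeps the faint step
```

With the high "street" threshold the faint step is discarded as noise; with the low "face" threshold it is kept. Inferring which intent applies is the part Ashley suggests machine learning could supply.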
NICK GIUDICE: Making perceptually important things more salient, I mean, it seems like a very practical [CHUCKLES] solution that so few people are doing. And your work is excellent in that area. What about other folks? Anyone else have ideas on this?
SILE O’MODHRAIN: Yeah, I was– this is Sile– going to follow up on Ashley’s comment about perceptual salience because I think this is actually a really key route into providing access to virtual reality for visually impaired people. If we think about the types of cues that we use as– I’m a visually impaired person myself. I have a little bit of light perception but not much.
And so I have a particular set of cues that allow me to function in a real world environment. And I’m convinced that if I can provide similar cues to people in a virtual environment based on what they are able to perceive or what cues they find most important in the real world, then perhaps we can have them transfer some skill to be able to navigate objects, to find things, to interact with things in a virtual environment as well. And this is to sort of pre-empt the amount of things that are going to start happening in virtual environments, from parts of education to training to even route training, perhaps. So trying to think about not just a one-size-fits-all to the design of perceptual cues for virtual environments, I think, is going to be really important.
NICK GIUDICE: But building on that, I think this hits something that all three of you have talked about. I mean, when we talk about augmented reality, virtual reality, we can just call it XR, collapsing all these different types of realities.
Generally, when we talk about this– and most of the people here that are listening to this and watching this panel think of this generally in terms of visual technologies, visual illusions, visual perceptions. We have a lot of traditional ideas of immersion. Traditional ideas of how sighted people interact with virtual reality are generally synonymous with visual environments and visual interactions.
And so it kind of brings up the question, the things that you just talked about, Sile, and the things that we know as blind folks, we interact with the world– obviously, we do it not using visual cues, and it works very well. But if we are going to extend these experiences, what are the technologies that we can use to support immersion, to allow this skill transfer that you’re talking about for blind people using nonvisual virtual environments?
AMOS MILLER: I’m going to jump in. It’s Amos here. I think the first point is understanding that reality is constructed in the mind, in the brain, based on sensory input. So even if we can’t see the environment around us, we have a sense for that reality. And the more input we have, we can refine that sense and make it more accurate, and so building on that and delivering the information to the person in a way that leverages their ability to sense their reality.
So for example, in the world of Soundscape, we use audio to compensate for the lack of vision. But we still spatialize that audio and place audio cues and labels in 3D space so that it complements your sense of that space, if it’s congruent with the other senses. And when multiple senses agree with each other, that really enhances that sense of reality.
And one of the things that we’ve worked on more recently is what we call Soundscape Street Preview, which allows basically a user to walk around, walk, and explore an area with a Street View experience but with sound only. And we’ve really learned, even in that situation, that you can walk down the street. You can hear what buildings are on either side of you. You can hear when you walk up to an intersection, which road goes in which direction.
And you can– even in that scenario, you are building on the multisensory experience. Like, for example, we asked people to stand up and move and turn the body in the direction that they want to go, because the combination of the proprioception and the auditory experience really work together to give you that virtual sense of reality.
SILE O’MODHRAIN: This is Sile again. I wanted to follow because Microsoft also did this lovely project– I don’t know if you were involved, Amos– which involved also rendering some tactile or haptic cues using a virtual cane, which I think also has some very interesting possibilities, because for cane users, that means, again, that you can reproduce those cues that are familiar from the real world in helping you navigate your surroundings. So being able to carry those into the virtual environment in the same way as sighted people carry their memory for the way that scenes change when they turn their head, we also can carry this memory of the way that we can find things using an extended, if you like, hand or finger probing the virtual world.
AMOS MILLER: Yeah, I experienced that. It’s a very surreal experience, I would say. But it really does play on that multisensory, the haptic sense of what you’re feeling, what you’re hearing, and your movement. I think we must never forget that your physical movement is part of your experience of an environment. When you walk three steps, when you move a certain distance, and everything– your reality around you needs to change accordingly. And it’s the combination of all of these inputs that really help you shape that reality in the mind’s eye.
NICK GIUDICE: What I’m hearing is the importance of multimodal information. Since we’re trying to tie this perception, I think that’s so important. If we think about the brain as probably the best– obviously, the best multisensory processor we have, and if we have more of these biologically inspired interfaces and systems moving forward doing this, combining haptics, which shares a lot of spatial information with touch, combining spatialized audio, like Amos was saying– and that’s a big deal.
Just so if people aren’t familiar with this– Amos, jump in. You’re the expert here. But most systems use language because it’s easy, just spatial speech, saying something is at– the bank is at 10 o’clock and 30 feet. You have to interpret what that means. That’s cognitively processed. And if you don’t know what 10 o’clock is or 30 feet, you don’t know where that bank is.
But when you hear it localized in space, as Soundscape is doing, and as other projects are doing, that makes it a perceptual interface. That makes it much more cognitive– much more intuitive, much more usable. And when you combine that with movement and haptics, I think that’s what we’re kind of synthesizing to say that’s important.
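To make the contrast between the two interfaces concrete, here is a minimal sketch that turns the same relative bearing into either a clock-face phrase (the language route) or a pair of left/right channel gains (a crude perceptual route). It is not how Soundscape works: real spatial audio typically relies on head-related transfer functions, and the constant-power stereo pan below is only a simplified stand-in. All function names are hypothetical.

```python
import math


def relative_bearing(heading_deg, target_bearing_deg):
    """Angle of the target relative to where the user faces, in [-180, 180).

    Negative values are to the left, positive to the right.
    """
    return ((target_bearing_deg - heading_deg + 180) % 360) - 180


def clock_position(relative_deg):
    """Map a relative bearing to a clock-face hour (12 is straight ahead)."""
    hour = round(relative_deg / 30) % 12
    return 12 if hour == 0 else hour


def pan_gains(relative_deg):
    """Constant-power stereo pan: returns (left_gain, right_gain).

    Bearings beyond +/-90 degrees are clamped, so this sketch cannot
    distinguish front from back.
    """
    clamped = max(-90.0, min(90.0, relative_deg))
    angle = (clamped / 90.0 + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def spoken_cue(name, relative_deg, distance_ft):
    """The language-based version: the listener must decode hour and distance."""
    return f"{name} is at {clock_position(relative_deg)} o'clock and {distance_ft} feet"
```

For a user facing east (heading 90) with a bank at bearing 30, `relative_bearing(90, 30)` is -60, `spoken_cue("The bank", -60, 30)` yields the "10 o'clock and 30 feet" phrase from the discussion, and `pan_gains(-60)` weights the left channel more heavily, so the direction is heard rather than decoded.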
AMOS MILLER: Yeah, I think the language space is a really interesting one. I think it is a high processing. And when you process a language like a sentence– this is 20 feet to your right– you actually block a lot of the other sensory input while you focus and process that information.
On the other hand, language could be extremely empowering. And I always think about how using more poetic or metaphoric language, you can, with a very short sentence, deliver a whole image into the mind that can create a whole– when you have really great audio describers that describe a scene or a church, in half a sentence, you can convey the whole thing. And that’s an area that I think is also part of perception that I have not yet seen explored in AI to a large extent.
ASHLEY TUAN: We also believe multisensory input is important for navigation. So the input that we had was from visually impaired people. Some of them have told us that when they are on the street, when they’re stressed out, when there’s so much going on, they can get sensory overload. So they appreciate that there’s more than one source telling them where things are.
NICK GIUDICE: So building on this a little bit, the “reality”– may be a bad word– the potentiality, what is actually happening, not virtual, [CHUCKLES] is that XR and these types of technologies, AI-based technologies, virtual technologies, are just becoming more mainstream. And there’s all types of applications and fields that they’re being used in and being relied on, from education to engineering, from medical training to social media, and pretty much everything in between.
And so my question here is, what do researchers and designers, such as yourself, and what advice would you give to other people moving forward, what do we do to ensure that blind folks and visually impaired folks don’t get left behind as this technology becomes more pervasive moving forward? Sile, let’s go with you.
SILE O’MODHRAIN: [CHUCKLES] Gosh. Well, I think it’s we’re always going to lag behind. But I think we do have some things in place, like the work that Amos is doing which really demonstrate that it’s possible to build very rich virtual environments that just don’t have any visual component. I think one key element here is going to be providing accessible content creation tools that blind people themselves can use because I think the more we can do to demonstrate what’s useful for us, the better– the quicker we can get something, the ideas that make sense for us out into the rest of the world, because as is always the case, it’s harder for somebody who doesn’t share our experience to design something that’s going to be really meaningful for us. So I think accessible tools and getting blind and low vision people involved early in the design process is going to be really important.
NICK GIUDICE: Brilliant. You can’t just have an end result. You need to have people involved in doing it.
SILE O’MODHRAIN: Yeah.
AMOS MILLER: I was going to say exactly the same thing, I think. There is such an incredible talent out there, including people with a range of abilities at the outset. It is just mindblowingly useful.
We’ve recently explored the contribution to a panel that’s looking at disability for autonomous vehicles. How do you ensure that autonomous vehicles are inclusive? And often people think about, OK, well, it’s got to have some kind of a voice interface so that we can communicate.
But when you ask people with disabilities what are the things that concern them, it’s a completely different plane. It’s all about trust. It’s about knowing what’s going on and feeling that they have some of it– some form of control about what’s going on. It’s so, so important to involve people at the outset. You will just discover a whole new dimension to the experience you’re trying to create.
ASHLEY TUAN: Yeah, I agree. So I was a clinician, an optometrist. Low vision was my specialty for many years. And with this project, we worked with business center, and we talked to many visually impaired subjects. And we talk about what is their concerns.
So to address– to make sure that how can we use the mixed reality to help people in terms of learning, I feel that there’s different aspects that mixed reality can contribute to that. So when it comes to new work for students when they come to learning, what we’ve heard a lot is people with visual field restriction, they have problem reading normal textbook because as they scan over line by line, they tend to lose places. So how do we help them to make sure that they are not going to lose those places easily, and therefore, they can read as fast as they could and comprehend that?
For people then with scotomas, again, similar, it’s easy for them to lose places. So if we have a solution that can help them to keep their places, a lot of people told us that they can have a much better learning experience that way.
In terms of distance learning, so if you sit in a large lecture hall, you need to see the whiteboard or blackboard, people tend to need to use a telescope or a reverse telescope, depending on their needs. And their alignment takes time. So they need to do frequent alignment because they need to look at the board, and then they need to look down to take notes, et cetera. That is an issue for them.
So our module lens that what we are trying to do is we can pre-align the projector and the outward-facing imager so that they don’t have this frequent alignment issue. Once aligned, then they can just focus on what they need to do, what they need to learn.
And on top of that, we feel that in school, another aspect really important in education is social-emotional. So we aim to provide them that near to zero social cost of solution, that invisible computing. So they can hang out with their normally-sighted friends. They can see their body language or their facial expression easily. So they can have a normal social-emotional development just like rest of the population so that, hopefully, they won’t have missing information to communicate with normally-sighted population.
NICK GIUDICE: Good point. So we’ve talked about a lot of new technology here. We’ve talked about a bunch of different ideas and solutions for augmenting human perception. I guess my question is, how do we avoid, first, what I call the engineering trap, which is essentially– we’ve all seen this– years of development of people that have designed stuff, mostly because they don’t have blind people involved, that is often a solution looking for a problem, or doesn’t actually meet needs, and it isn’t very usable. And so we have a lot of stuff that ends up just not being very good.
So when we want to do this right, what are the best ways that we have to measure benefits and to figure out the efficacy of these solutions, to know if they really are– what’s the most important thing moving forward. And I guess I’m asking, how do we do this both using qualitative approaches, which is kind of common– does it make sense, and do you like it– and also quantitative approaches? I’ll start with Sile on that one.
SILE O’MODHRAIN: It’s OK.
NICK GIUDICE: I know. I called you [INAUDIBLE].
SILE O’MODHRAIN: So I think that’s a very good question. And I think what sometimes bothers me is, I wonder, if we could talk to some of the people at large organizations, like NSF, and NIH, and places like that, and figure out the amount of money that has been spent on things which never really– if they’d asked us, we would have said that’s probably not– it’s an interesting technical challenge, but it’s not going to solve a problem.
So I think part of it has to be about us being proactive. And maybe it’s time for us to have a kind of Dragons’ Den type thing where we vet a lot of these ideas and try and do some mythbusting before the money gets spent in both the public and the private domain, and also get a lot of people involved, not just people like us, but teenagers, people who have lost their sight, just a broad range of people who can be almost like a panel that could say, on a scale of 1 to 10, this idea is probably a 3 or a 9 or a 4. And this sort of happens informally anyway. But it might be a really interesting idea.
And I think, yes, just so just getting the broadest possible input at the earliest possible stage, but also being mindful of the fact that, as Steve Jobs said, sometimes people don’t know what they want until they see it. So there’s a kind of a trade-off here.
AMOS MILLER: I think it’s also important to be clear on what you are measuring. I think that one of the things that we look at a lot is mental mapping and mental modeling– sorry, memory of a space. And we have done a series of research studies that demonstrates that the technique that we used, that we talked about earlier, with empowering the user with more information rather than giving them turn-by-turn instructions, for example, which makes it really easy, but the latter really is not effective when it comes to mental mapping. Where if you do use information-based guidance, you have much better results for mental mapping.
However, it requires a lot more effort, as you alluded to, Nick. Does that mean that it’s bad or good or neutral? I don’t think it’s any of those. I think it means that it’s better for mental mapping.
And then you have to ask yourself, is mental mapping important, the engagement of the hippocampus and parts of the brain that are so critical for so many of our other faculties? And if you reach that conclusion that mental mapping is actually a core ability that we want to maintain and not dumb down in humanity, then yes, it is important, then we have to figure out what it is– how do you motivate the people to actually engage in tasks that develop their mental mapping.
So choosing what to measure, being clear on what you’re trying to achieve, evaluating whether it’s the right metrics, I think, is as important as whether it’s successful in the market in the first instance or not.
ASHLEY TUAN: I agree. For us, we are trying to focus on aiding the mobility aspect of visually impaired. So for us, we focus on try to quantify the impact. One of the things is we focus on using contrast sensitivity measurement. A lot of people are used to hearing, oh, my vision is 20/400, or I have a visual field of 20 degrees; those are ways to quantify the visual impairment.
But another less heard measure, that is the contrast sensitivity. That is to measure the threshold of a person can distinguish the shade of gray over a white background. And we find that has certain pretty good correlation with daily activities.
For example, a visual threshold of 2.0 is normally-sighted population. And then 1.8 is early visual impairment. So once the number from 1.8 drops to 1.5, then we find that it’s reporting of three times more fear of falling. That is significant clinically. So our goal is to look at, how do we help a person with 1.5 threshold contrast sensitivity, and can see that 1.8 target through our image enhancement? If we can do that, that is a way that we quantify the improvement of the device.
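The arithmetic behind those numbers can be worked through, assuming the figures are log contrast sensitivity scores of the kind read off a Pelli-Robson-style chart, where the score is the base-10 logarithm of one over the faintest contrast a person can detect. That reading of the figures, and the function names, are assumptions for illustration; this is not the device's actual evaluation procedure.

```python
def threshold_contrast(log_cs):
    """Faintest detectable contrast (as a fraction) for a log contrast sensitivity score."""
    return 10 ** (-log_cs)


def required_contrast_gain(user_log_cs, target_log_cs):
    """Factor by which a target's contrast must be boosted so that a user with
    `user_log_cs` can just detect a target that sits at the `target_log_cs` threshold."""
    return 10 ** (target_log_cs - user_log_cs)


normal = threshold_contrast(2.0)          # about 1% contrast
early_loss = threshold_contrast(1.8)      # about 1.6% contrast
reduced = threshold_contrast(1.5)         # about 3.2% contrast
gain = required_contrast_gain(1.5, 1.8)   # about 2x
```

Under that assumption, the drop from 1.8 to 1.5 means the faintest detectable contrast roughly doubles, so an enhancement that lets a person at 1.5 see a 1.8-level target needs to about double the effective contrast of that target.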
Another way that we’re thinking to do is a daily activity questionnaire. There is an NIH, or National Eye Institute, National Institutes of Health, that they have [INAUDIBLE] questionnaire. So they ask daily activities, and then you can report before and after the device, whether the user feels they can perform their activity better.
And one last thing that we’ll look at was looking at setting of different obstacle course. So if the device can help people move around faster than without the device, then we know that– [AUDIO OUT]
NICK GIUDICE: So just to wrap up, it sounds like there’s multiple ways that we need to be able to think about measuring and trying to be as inclusive as possible. I think the bottom line is that we need to make sure that we are including blind folks not only in the studies that we’re doing, but in the designs that we’re doing. People out there in the audience, I think this is a really important message, use multimodal information. And then users get their input to make your stuff better, [CHUCKLES] to put it scientifically. Anyway, thank you very much. You guys have been a great panel, and I’ll throw it back to Will.