Be My AI: What happens when an accessibility favorite makes the jump to AI?
DESCRIPTIONFounded in 2015 by Hans Jørgen Wiberg, Be My Eyes quickly established itself as a wildly helpful mobile phone app for people with no or limited vision. Today, more than 500,000 blind users rely on 6.8 million sighted volunteers (covering 180 languages) to take their call and, by looking through the camera on the blind user’s phone, describe what they see. The huge leaps in AI capabilities in the past year, however, have opened incredible possibilities. Can AI do better than all those human volunteers? In September, Be My Eyes launched its chatGPT4 AI-based beta, “Be My AI” in an exclusive collaboration with the leader in generative AI, Sam Altman’s Open AI. We’ll hear from the Be My Eyes team about how they integrated AI, what they are hearing from thousands of users in the beta, and how humans are still in the loop – for now – and how they handle chatGPT’s tendency to “hallucinate.” Immediately after this session, the speakers will be available for live questions in a breakout session listed in the agenda.
GREG STILSON: All right. Thanks, Alice. It is awesome to be here with Sight Tech Global. And I have the privilege of chatting with Mike and Jesper from Be My Eyes. And Mike, Jesper, can you first introduce yourselves and tell folks who you are related to Be My Eyes?
MIKE BUCKLEY: Jesper?
JESPER HVIRRING HENRIKSEN: Sure. I’m Jesper. And I’m the CTO at Be My Eyes. That means I run product development.
MIKE BUCKLEY: I’m Mike. I’m the chairman and CEO of Be My Eyes. And that means I frustrate Jesper daily.
GREG STILSON: You’re the one that makes promises that Jesper’s like, hold on. Just wait.
MIKE BUCKLEY: That’s generally how it goes. Yes.
GREG STILSON: I love it. Well, Be My Eyes has been– within the blind and low vision community, one of the hottest topics this year because of the work you guys have been doing with OpenAI and the Be My AI feature and things like that. Mike, can you give just a brief summary of how Be My Eyes initially got started?
MIKE BUCKLEY: The company as a whole was started by a Danish furniture craftsman named Hans Jorgen Wiberg who got tired of calling his family all the time when he had site assistance needs. And so he launched this beautiful app into the world that I think about as the merging of technology and human kindness. And it got some attention early. And by the end of the first week, there were 10,000 people using it. And so for those that don’t know, it enables a one-way video call between someone who’s blind or low vision and– pardon me. Of course my phone’s ringing. –and a volunteer. And the interactions are pretty seamless. More than 90% of the calls are successful, and we operate in 150 countries and 180 languages. As you know Greg, as we’ve evolved as an organization and as a company, we’ve looked for technological tools to better assist our community. And we have now moved forward while maintaining that volunteer service to what we think is a state-of-the-art AI visual assistance tool.
GREG STILSON: Yeah. And see, this is why you lead in to the next part of the question, which is, when did you guys start thinking about AI as the next frontier, and how did that connection with OpenAI happen?
MIKE BUCKLEY: I would say that I was probably a little late to this. I know Jesper has done work in artificial intelligence and machine learning for a long time. And so he can talk about that. But the truth is, we cold called OpenAI in January this year. And for whatever reason, we got through, and they called us back and said, hey, Be My Eyes, can you keep a secret? And we said, sure. And they told us that they were going to launch this model and GPT-4 was coming. And they asked us if we wanted to be a launch partner. And we said, wow. Absolutely. There’s only one thing you need to know. We keep our tools and services free for the members of the blind and low vision community. Is that OK with you? And they said, let’s go to work. And then literally, Jesper and his small but mighty team within four weeks had built a product that was– well, I can let Jesper talk about it. But that’s the Genesis of the relationship.
GREG STILSON: Yeah. Jesper can you share some of the– I mean, this happened very, very quickly it sounds like. How did your team and you personally handle that? I startups are fast and furious, but how did this all come to fruition from the technical side?
JESPER HVIRRING HENRIKSEN: Well, for the logic, we got a lot of help from OpenAI. And I think what they built has really enabled a very small company like Be My Eyes to build like a very, very cool feature like Be My AI in a short period of time, without having the dedicated data scientists or ML engineers. My team, they are normal software engineers. And they were able to build this in the four weeks. I think OpenAI pushed the deadline by two weeks. And we were happy about that. There were no complaints. So I think we ended up spending six weeks getting it out. But that was it. And honestly, we spent– the majority of our time was spent on building the UI for our use case for the blind and low vision users.
GREG STILSON: Let’s talk about that UI. Because I think– one of the points of confusion I think is how Be My Eyes differs from the ChatGPT piece of this. And is it the UI that you guys attribute the success here to? Is it just such an easy onboarding access for blind and low vision people? Or how does your implementation differ from, say, if you’re just on ChatGPT?
MIKE BUCKLEY: I’ll let Jesper talk a little bit more about this, of course, but I think the UI is one, Greg. I think the second is, obviously, the fact that it’s free, right? It’s not a kind of paid subscription. And the third that I won’t go into a lot of detail is there are a couple of Be My Eyes only tweaks in the model and the permissions that we have that aren’t in the experience that’s in just the off-the-shelf GPT-4. And so we’ve done a lot of fine-tuning. Jesper, I don’t know if you’d want to add some things. We learned a lot from the beta testers on the UI, in particular.
JESPER HVIRRING HENRIKSEN: Yeah. And well, I think that is our speciality. Is to make sure that it is accessible and it works in the blind and low vision use cases. So we really spent a lot of time. And I think that’s part of what we learned. Is how important it is with the super small things. You can make little tweaks to the UI then that makes a huge difference for the users daily. Just making it a little bit easier to get to the composer box or add another image to the session. So I think that is what we add and what we worked on.
GREG STILSON: Yeah. I was an early beta tester of this and saw the UI changes that you guys went– I mean, from what people see today to what there was at the beginning, it was vastly different. The flow is extremely refined. And it seems like you guys took a lot from the standard messaging experience that people have on their smartphones today. You guys tried to mimic that. Is that correct?
JESPER HVIRRING HENRIKSEN: Yeah. So definitely. I think there is this both in web design and app UI design saying about your users will spend more time on everybody else’s website or on everybody else’s app. So the best you can do is try to lean on what others are doing. So it’s a familiar experience. And I think the early versions we had of the chat experience was a little bit different from other– like what you’re used to from other messaging apps. And so we treat that along the way and made it easier. So hopefully, it is very familiar for users who are used to using an iPhone or an Android device.
GREG STILSON: Yeah. It’s, I mean, even down to– you guys made some even small refinements for even braille screen input people that now a three-finger swipe up can actually send the message just like in all the other things. So just to see the way that you all are listening to the community and then kind of putting in these quality of life changes that makes a big difference. So Mike, kind of going back to that OpenAI relationship and you kind of mentioned some of the tweaks that you all make to the model, one of the things that kind of was an early hot button issue was face identification and blurring out the faces and things like that. Can you talk through how that change happened and what it took for you guys to work with OpenAI to stop the faces from being blurred out or at least stop commenting on that?
MIKE BUCKLEY: Yeah. I mean, and you know about this, Greg. There are lessons and then there are painful lessons. And this one was probably the latter. The short story is that there are laws in certain jurisdictions that are pretty stringent on the use, processing, storage, you name it, of any biometric information. And even though it’s arguable whether or not an AI chat session constitutes biometric information because it’s not facial recognition, it’s facial description. It’s not like I have Greg, your biometric markers just from a picture.
GREG STILSON: Right.
MIKE BUCKLEY: That said, there was about a 48-hour, maybe a little longer period, one weekend where OpenAI effectively turned off the facial descriptions because of, I think, legal regulatory concerns and just wanting to have a really defined process for thinking through that. But when that was turned off, people were angry. I think I got lit up and we got lit up on social media and everything else. And it taught us a couple of things. One is, even when you bring in people who are blind or low vision to be your core beta tester and designers, you can’t stop listening every day. You have to really get to the fine print about design experience, needs on a constant basis. And I think you know Greg, we had this brilliant, robust every day Slack group with feedback that was instrumental in helping us get through that. But one is, listen every day. The second is, we went back and lobbied very hard OpenAI to turn that back on. And we talked about it as really a matter of equity and fairness to put this tool in the hands of people. And to their credit, Greg, they listened. I’m disappointed to say that there’s still some carve outs in Illinois where there’s a very stringent biometric law on the books. I think we’ve got some path through that and we are doing some lobbying on this front. But it was an important lesson for us.
GREG STILSON: Well, kudos to them for understanding. I mean, a lot of times when you work with larger organizations and things like that, the legal red tape can become a roadblock. And so good for them for not letting that get in the way. You talk about keeping it free for the community of blind and low vision users. Can you talk a little bit about how you guys are doing that? With OpenAI, there is a cost to using their services. So can you talk a little bit on how you guys are working to keep this free for the community?
MIKE BUCKLEY: Yeah. Absolutely. We’re still talking about, with OpenAI about long term ways to make sure that we keep this working together. Functional, fast, and free. The big thing as you know for Be My Eyes, though, is we have corporate customers. Essentially, our core products are about enabling better customer service for large enterprises for their blind and low vision consumers. So since 2018, for example, Microsoft and Google have been customers of ours. Thank you, Microsoft and Google. Where we, through our app, enable a simple video call, one-way video call to assist with customer service needs. We’re expanding those company and corporate relations she ships every day. We just put out an announcement with Microsoft about having deployed the Be My AI at the front end of a Microsoft call center experience where the AI is successfully answering 9 out of 10 questions for consumers in about 1/3 of the time. And so literally, we pay our bills and it’s everybody from AWS to Twilio to everybody else in between, through these corporate relationships. And look, it’s an important part of our ethos. You and I have talked a lot. We’ve all seen the stats about the number of people who are blind or have low vision who are either underemployed or unemployed. And so part of what we– our mission and part of what we do and how we think is that we have to put power of these tools in the hands of people for free.
GREG STILSON: Great. Jesper, can we talk a little bit? So you guys had a very lengthy beta. Was it seven months or something like that? Six months? Is that how long you guys kept it in beta before you did the public release?
JESPER HVIRRING HENRIKSEN: Yes. And we’re still calling it a beta.
GREG STILSON: Yeah. Yep. That’s good to clarify. If you could pick out one thing that you remember– one of the unexpected learnings that you had during the beta, what would that be?
JESPER HVIRRING HENRIKSEN: I think there were many, but one was in the beginning. I’m used to– I’m an engineer. Like work within the software engineering for 25 years. And we’re used to computers being very good at the hard facts. One of the things that– in this case, in the early days, the model wasn’t very good at text recognition, OCR, which got more or less a solved problem– has been for many years. Luckily, we work with OpenAI, they were working on the model also. It wasn’t just all based on our feedback, but we gave them a lot of feedback initially that OCR just did not work initially. The model would hallucinate, it would get the first line or the first paragraph fine, and then it would start hallucinating. And that is just much, much better in the current version. So that was definitely a learning and something we fed back. There were also other use cases around devices with buttons like remote controls and thermostats and things like that where it wasn’t great at those use cases. And we fed that back to them and worked with them. And so I think this had to be a lengthy beta because we effectively onboarded an alpha API in February. And it was in beta up until last week when it was announced at the OpenAI DevDay. But it’s still a preview API it’s what they call it. So they have several phases that they take it through. So we don’t think we could launch anything on top of a beta API in production. So we have to keep it there. So this is still very, very new. There’s also been some capacity issues with OpenAI because the announcement they had last week was extremely successful and gathered a lot of interest. And we’re running on the same data center and on the same computer clusters as everyone else. So we’re seeing the pains of that. So it is not at the point where we think we can– we made it available to everyone, but it’s not at the point where we want to call this– like take the beta label off. Our focus has been on getting it out to as many as possible as fast as we could. But we’re keeping it a beta to set expectations at the right level.
GREG STILSON: That really helps. Mike, I think this is sort of a catch-22, right? When you understand very quickly how many people you’re impacting when things don’t work and you’ve had a couple outages where OpenAI has gone down or things like that. And there’s a lot of noise that’s made when that happens. You want to talk a little bit about how that goes down and what you guys have kind of done in response to that?
MIKE BUCKLEY: Yeah. I mean, there was one day where it was down for six hours just in case anybody doesn’t know. And people were really upset. And they are right to be. I get it. It was hard to read, it was hard to see it, was hard to feel, it hurt. I don’t know, Greg. What it underscored for me is the responsibility that we have. And like literally, at this point, there is a large group of people who consider this a utility, right? That’s very different than just a toy or something to have fun with or a game. Literally, it has become a utility and a piece of people’s lives. And so I think two months ago, we had 3,000 monthly active users. Last month it was 25,000 monthly actives and we’re going to be higher than that now. So you see the product market fit and the power of this, for lack of a better word. And so what we’ve done is kind of talked with OpenAI and we’re working through this. We just literally– we’re switching to another version of the model right now and that promises more stability and less downtime. And I think that we’re going to keep iterating on that. I don’t want to claim perfection here, it’s still new, it’s still beta for both of us in a lot of ways. And I anticipate that there will be some problems going forward. But I’m cautiously optimistic that we’ll mitigate them and lessen them and make sure that this is available. And we got to get to 5 nines uptime, right? And I think we can get there.
GREG STILSON: And you talked about your monthly actives and daily actives. But just out of curiosity, do you have metrics on how many– I don’t know how you phrase them. But how many pictures are being taken or how many queries are being posed?
MIKE BUCKLEY: The average person is doing between five and six AI chat sessions a day. And so that’s a big deal. If you use something sporadically once a month, once a week, it might be a quality product. When you see 5 to 6 use cases on a daily basis for the average user, which means some people like me use it 15 times a day, it’s a good sign of the value of the product.
GREG STILSON: I always tell people, it takes a lot for an app or a company to get onto the first screen of my home screen. And so I can tell you right now, Be My Eyes is in the bottom right corner on my home screen. So you guys have done that. So there you go.
MIKE BUCKLEY: Well, you helped design it. So–
GREG STILSON: Couple more questions before we wrap up here. When we look at– Jesper, you mentioned hallucination and errors. Could you take a second to define what those are? Because I know that there’s been a lot of talk about AI hallucinating and sort of seeing things that really aren’t there. Is that still something you guys are seeing? Is this something that people should still be concerned about?
JESPER HVIRRING HENRIKSEN: I think we’re still talking about computers. So I would always be a little bit concerned. But the model that– we’ve seen it evolve over the last six, seven months. And it is much better than where we started. And I think the hallucinations are much fewer. It’s at the point now where I don’t think I’m seeing what I would call just straight out errors in what’s coming back as image descriptions. I’m seeing little details that I’m like, I really have to look closer at the photo and to figure out where did it see that? Or like a few times where it’s mixing up whether something is on the left or on the right. But it’s at a very good level. But we do think like in all other cases that we’re currently hearing a lot about generative AI, right? And all the use cases we’re hearing about. Whether it’s someone making images with DALL-E or using ChatGPT to write a job description. There is a human in the loop, right? And that’s the person who’s asking for the image or the job description. And they can verify it. But in the use case of Be My AI, most of our users will have a very hard time validating what’s coming back. So we still think there is a need for a human in the loop. And that’s why we’re working on a feature where we will be able to reach out to our volunteers and get this– double check is what we’re going to call it of the answer that comes back from the model.
MIKE BUCKLEY: Yeah. It’s interesting, Greg because– and I’d like to hear about your experience as well. But the way we think about a hallucination is– as opposed to an error is a hallucination is when the model makes something up out of thin air. Right? That’s either not in the photo or is just surreal sometimes in terms of what it said. Whereas an error is, imagine a complex graph and it gets one of the numbers wrong on the x-axis or the y-axis. I have not been able to produce a hallucination in the last month. And I try hard every day. Can still produce errors, the model is still weak on recognizing complex button layouts. Think about a hotel phone, think about a television remote. So it’s got to improve there. But as far as sort of making stuff up out of thin air, if it’s happened to you Greg, please let me know. But I haven’t been able to get it to do something like that in a while.
GREG STILSON: No. Same here. But I feel you on the button layout and that kind of stuff. It’s still got work to do on those type of things. So I think these are things that the community may need to just be aware of that. These type of tasks the AI is still learning. Jesper, you mentioned a feature you guys are working on. This double check feature. Mike, what’s next for Be My Eyes?
MIKE BUCKLEY: Well, we’ve got a lot in the hopper. We’re going to launch on Android and I’m really excited about that. We talked about human verification of the AI results. Specifically, what that means is the user of our AI product at their discretion and choice will be able to check and verify the results of the AI. How that will work mechanically is we’ll send it out to a large group of our volunteers in the hope of getting 10 thumbs ups for accuracy or thumbs downs in the case of a mistake within a very short period of time so that the usability of that and the check is coming. The other thing that we’re launching is– at Be My Eyes is a friends and family product where it will enable you as a user to create a closed network of close confidants. Maybe you don’t want to go out to a volunteer or maybe you don’t want to go to the AI because it’s a particularly high stakes use case or something you just don’t feel right about putting in the hands of a machine. You’ll be able to curate a group of people where your queries can go out to that, again, is personally curated for people who have trust. And so if you’ll let me for a minute, I want to thank Jesper and his team for the lack of sleep on building all these things. But we’re excited. There’s a lot in the pipeline.
GREG STILSON: That’s exciting. The double check feature is something I’m particularly excited about because you do get these incorrect things. And so being able to– one question I would have is, with the double check feature, is it going to allow those volunteers to write corrective messages back to you so that you get a sense of where the machine made a mistake?
JESPER HVIRRING HENRIKSEN: I think we’re still iterating on it and we’re doing the first very small test right now with how it’s supposed to work. So I think we will know more when– by the time that this video is out– we will know a bit more about how the final version of the functionality will look. But it is definitely like an option. And so I wouldn’t rule it out.
GREG STILSON: Very cool. Very, very cool. Awesome. I want to thank you both for the time here. This is exciting. And when we look back on 2023, what Be My Eyes has done is probably one of the biggest things to happen in our community in a long, long time. So you both should be incredibly proud of that. I commend you for cultivating such a relationship on a cold call with OpenAI. That’s a really cool story. Mike, from an OpenAI perspective, do you foresee additional changes or tweaks that they’re going to have to make for you all or for the community to really customize this further or are you guys pretty much happy with where the model is today and what you’re using at this point?
MIKE BUCKLEY: I think there will always be certain tweaks, Greg. Right? These models are going to get better, smarter, faster all the time. And if you think about what Jesper was talking about with human in the loop, in an ideal scenario, if we took a picture of– as you and I were talking about, that remote control and we wanted to verify the accuracy, in theory, we could get the 10 volunteers filtering back highly specific information to us that says, no. The Room Service button isn’t on the right, it’s actually on the left. And in theory then, you might be able to put that information back into the model for correction to make it, again, better, smarter, faster all the time. And so I don’t know about you Jesper, but I don’t think the iterations are ever going to stop on this, Greg. You and I have talked about there is a very real difference of opinion in terms of the verbosity of the results that people want. Some people want a paragraph of description that’s really long. Other people just want to say, can’t the model just answer yes, right? I don’t need to know that, oh, the sweater looks like it’s made of wool. It happens to be lying on a bed and it’s– no. I just want to know if it’s green.
GREG STILSON: Right.
MIKE BUCKLEY: Right. And so you Greg told that story about, can I please add a prompt that where the model doesn’t tell me my feet are in the picture.
GREG STILSON: Yes. Exactly.
MIKE BUCKLEY: By the way, you can. And Jesper and I can– we are going to Institute some things where you can control some of your prompts in a setting, right? Which would be kind of cool. But as fast as it’s moving and as great as the models are, I think we’re always going to want to make them better and have tweaks to them.
GREG STILSON: Yeah. I love the idea of building in these pre-prompts that customize the feedback that you’re getting. And really, that’s where you guys are putting the community secret sauce on this. It’s customized for our community. And I think that that’s really one of the things. And we’re going to start to see this more. Because there’s going to be, hopefully, more and additional tools that are going to rely on these large language models. And I think you hit it on the head, Jesper. It’s the UI and the secret sauce that you guys are putting in there that keeps what Be My Eyes is doing. A step above that. And so kudos to you all for a really exciting 2023. We’re looking forward to hearing more about what’s coming up with Be My Eyes. And congratulations on getting Be My Eyes on my first home screen. So there you go.
MIKE BUCKLEY: Thank you, Greg. And thank you to the 19,000 blind and low vision beta testers who helped us. It was significant. And it continues to be significant. And I say this all the time. If you have feedback– positive, negative, in between, firstname.lastname@example.org is my email. I read every one. Sometimes it takes a while, but we’re really committed to constant iteration and doing a lot of listening.
GREG STILSON: Thanks so much. Thanks for all the work, Jesper and taking those phone calls from Mike probably way too late in the evening for you. So keep up the great work you guys. And thanks so much for taking the time to be here on Sight Tech Global. Back to you, Alice.