Envision: What happens when smart glasses really get smart?

DESCRIPTION

Envision is a pioneer in the effort to connect computer vision with everyday life in the form of tech-enabled glasses that can tell a blind user about their surroundings. Using the Google Glass platform, Envision found a market with blind users who value hands-free interaction, and the experience only got better with the launch of scene description AIs in the past two years. But what’s really changed the game for Envision is generative AI, and the tantalizing possibility of a multimodal AI that’s more like an all-around personal assistant. Immediately following this session, Karthik Mahadevan will be available to take questions live in a breakout session.

Speakers
- Karthik Mahadevan, Co-founder & CEO, Envision
- Moderator: Joe Devon, Entrepreneur and Accessibility Advocate, GAAD / A11y Audits

SESSION TRANSCRIPT

[MUSIC PLAYING]

JOE DEVON: Thank you, Alex. It’s my great privilege today to introduce Karthik Mahadevan, who is the CEO and founder behind Envision and its groundbreaking work in AI and accessible technology. Karthik, can you just tell us a bit about yourself and your company?

KARTHIK MAHADEVAN: Yes, sure. Hi, Joe. I’m Karthik. I’m the co-founder of Envision. At Envision, we build AI-based software to help people who are blind or have low vision independently access the world around them. It started with a smartphone app. But over the years, we’ve evolved to a smart glasses-based solution. So with just a tap of a button, people can access all information that would otherwise be inaccessible to them.

JOE DEVON: Right, and can you share a bit about the traction that you have with the company? And are things going the way that you envisioned, or have you had to pivot a little bit based on seeing how the reception was?

KARTHIK MAHADEVAN: Yeah, I think each year, there’s a whole bunch of new insights and learnings to catch up with in this whole thing. It started as a student project. So when we started, the intention was not really to build the kind of company that we have today. So each year has been a learning for us. On the app front, we are doing incredibly well. We have over 100,000 monthly active users of the app around the world, who use this app for all kinds of incredible use cases in their everyday life. The smart glasses thing is still a new experiment, like a venture for us. We have sold over 2,000 of them all over the world. So we are getting a lot of very useful insights on that as well. And the big hope is that as these glasses become a lot more accessible to people around the world, the application and the applicability of it also increases.

JOE DEVON: Wow. Those are some impressive numbers. I’m just curious how many people do you have in the company to support all of these users.

KARTHIK MAHADEVAN: So we are 15 people in the company. It’s mostly based in the Netherlands, but we also have folks all over the world who work for us as well.

JOE DEVON: Cool. So before we dive deeper into Envision, let’s just chat a bit about AI safety because it’s been a big concern of a lot of the AI luminaries out there. There’s been some talk of legislating in different ways. For example, making it a little bit harder for open source models to just push out this powerful technology. And this recently culminated in some serious boardroom drama, as most people know, where OpenAI’s CEO was hired– sorry, was fired, then, six days later, was rehired. I’m just curious, where do you see things? Do you think that we should regulate AI, and how do you see the open source versus the big tech companies providing the foundational technology? Where do you think the safety concerns lie, and how should we handle it?

KARTHIK MAHADEVAN: Yeah. I think there’s always a balance in things like this. Of course, there is a need for regulation and a safety aspect, but it shouldn’t step into the realm of regulations where you just squash out any scope of innovation, right? So I think it is a fine balance that has to be maintained. And I think a way to achieve that is through education and awareness. I think a lot of people who are doing the regulations are probably not aware of exactly what they’re regulating, right? Because this whole landscape is evolving so fast that even people like us who are in the business don’t really understand a lot of the things that are going on. So I can imagine how it could be a bit scarier or daunting for the regulators. So I think there is a need for people who are building these tools to do a better job at educating and spreading awareness of exactly what they’re building so that it can be regulated, but appropriately. So that is the aspect of regulations. And when it comes to open source and private companies building stuff, I think, over there, also, it’s always going to be a balance, like it has always been when it has come to the development of any kind of technology, be it operating systems or the internet or anything. I think they will always go hand in hand. The big companies are the ones who have the data at the moment. They are the ones who have, over the years, been able to mine all this data that’s available. And a model is just as good as the data, and I don’t think open source can compete on that front. But there are other fronts where open source is a lot more useful in the way it’s working. So I think it’s always going to be a combination. So an ideal outcome is that they both operate in a complementary way instead of operating in a competing way, I would say.

JOE DEVON: That’s interesting what you’re bringing up here about the data. The data is really the moat, isn’t it? And so when you are building technology, if you’re an enterprise company, you can take that combination of your data. And then do you build on top of open source, or do you build on top of one of the big tech companies’ foundational models? What is the better path to take?

KARTHIK MAHADEVAN: Yeah. So I would say for a company like us, right, who are just starting out and are building stuff, we pick the best that’s out there from the perspective of a customer, right? Because that’s the people who we are serving. That’s the people who are going to be paying us. That’s where our revenue comes from. So when there is a decision-making process within Envision as to which of the models out there we need to pick and use, we do a test. And we pick the one that actually is the best of the lot. And sometimes, it happens to be an open source one that actually is doing a much better job, as compared to something from big tech. And sometimes, it’s the other way around. So I think this kind of growth is actually super helpful for folks like us who actually are building applications on top of these tools. Because as these tools start to exist, we actually have the freedom to pick the best and offer the best to the customers at the end of the day.

JOE DEVON: Yeah, it makes sense. Now one of the big problems that we have, and part of what people are complaining about on the safety front, is the hallucinations. It introduces liability. It can lead people down the wrong path. How do you think that they should be handled, and do you think that it’s ever going to be a solved problem? And what I mean by that part of the question is, sometimes, there are facts that maybe you can tell, and then, sometimes, there are opinions where you have different perspectives. And are we trying to go too far if we say eliminate all hallucinations when there might just be conflicting opinions? So how do you think it should be handled?

KARTHIK MAHADEVAN: Yeah. Yeah. No, I think, really, hallucinations are a part of the technology. So the technology, it comes with both the good and the bad. And one of the bads is the hallucinations aspect of it. And it really stems from the fact that we don’t really understand how these neural networks work, right? We don’t really understand exactly how they operate, other than the fact that they’re just predicting the next word in a sentence. So I think we will probably not be too good at understanding how they work in the short term. And as long as we don’t understand how they work, there will always be issues, like hallucinations. But that does not mean that it cannot still be a helpful tool for us, right? I think a good example is ChatGPT itself, right? So ChatGPT hallucinates a lot. Depending on the prompt you give it and stuff that you ask it, there are always hallucinations that are a part of it, especially if you’re trying to ask it for something that is factual. ChatGPT is not a good tool to go to for factual stuff. You might be better off going to Wikipedia or something like that. But there are so many other very incredible use cases of ChatGPT, where you’re OK with the hallucinations part, where you’re OK with it not being 100% accurate, but still actually having an incredible, immense use. And that’s exactly how we see it when we’re trying to incorporate these tools into our applications for increasing accessibility, right? It doesn’t always have to get things 100% right. Sometimes, even 80% accurate information is a better thing for a user who has 0% information at the moment. So I think what’s important is that people should be aware of it, that, hey, there is a possibility that these models will sometimes hallucinate, sometimes talk about stuff that is not actually out there in front of you. And as long as you have that awareness and as long as you take every output from this with a pinch of salt, I think it’s good because then you can actually make the decision as to how much you want to actually put your faith in an output from an AI or not.

JOE DEVON: Yeah. And what’s your experience been with the drift of LLMs there? Like, lately, I found it very frustrating dealing with the requests to ChatGPT or DALL-E 3. The quality is all over the place. There’s been a lot of talk in the last couple of days that it’s gotten lazy. I even joked that ChatGPT has quiet quit, didn’t like what was going on with the boardroom drama. And now, it just literally says, I don’t want to code. I don’t want to answer your question. Look it up yourself. There’s been some interesting stuff. How hard is it to build when you see this drift where you don’t have a consistent type of response, and you constantly have to update the prompting behind it?

KARTHIK MAHADEVAN: Yeah. I think it’s exciting, right? I think if you plot the development of LLMs or generative AI on a chart, there will be ups and downs. It will have spikes. But when you zoom out, it’s still all going up, right? In terms of the capabilities of this AI, it is still going up, even though there are spikes up and down in the short term. So I think that picture is what we are very excited about. If you really zoom out, the amount of progress and growth achieved in generative AI just this year is immensely more than all of the past decade put together, right? Even the fact that we have something like GPT Vision that can actually give you such a description on the basis of an image, while two years back Envision was struggling with an AI which would only give a single sentence, and even that would not be accurate, right? So much progress has happened. So we are OK with there being a bit of fluctuation as we tinker with this technology and optimize it. But I think we should be so happy about the progress that has been made in such a short period of time. I think there will be short-term spikes up and down, but I think over a long enough timeline, this will be a period of immense progress in terms of what we have been able to achieve.

JOE DEVON: Yeah. And speaking about that progress, Ray Kurzweil is looking really good right now. He kept telling everybody that you’ve got this exponential growth, and our brains can only think about things as a linear growth. And the past year has certainly proven out that we’re just seeing incredible improvements. And you’ve got the hardware and the software on all these different layers where the growth is rapid. Are we getting to the point where AI assistants, like the glasses that Envision has, that we’re going to have an angel on our shoulder that’s going to be personalized to our own needs and wants and desires and just sit on our shoulder and guide us through life?

KARTHIK MAHADEVAN: Yeah, exactly. I think that’s exactly what Envision is building at this moment. If you were to ask me the same question in the beginning of this year, my answer would be, yes, in about three to five years, I think that is what we’ll be able to achieve. But as of now, my answer is that this is achievable now. This is achievable with the technology that’s available as of today. And that’s what Envision is building: a personalized visual assistant that understands you as a personality. It understands you. It understands the stuff that you need. It understands how you like your information. It understands a lot about you. So it’s not just a general AI that you interact with. It’s something that is really, really personal to you. And it has a broad enough understanding of the world and the visuals ahead of you to be able to filter that information and offer that to you in a way that is actually digestible to you. So I used this as a metaphor earlier, that, at some point, it will be like having a bird on your shoulder you can talk with. But it feels like a reality today. And it has happened in such a fast sequence of time that having that kind of a bird on your shoulder that you can interact with and ask for assistance is actually something that will happen right now instead of in five years.

JOE DEVON: Yeah. And what I liked in your website– so transformers were one of the big changes, leaps forward. And I like on your website how you describe that Envision is taking the environment and transforming or translating it into a spoken format. You’re taking the visual and just translating it sort of to another language. And Darryl Adams, the head of accessibility for Intel, he really described that that’s what our future is going to be– personalized artificial intelligence that gives you the information you need. Because if you’re blind, you have a very verbal life. If you are deaf, your life is very much visual. And if you’re deafblind, maybe even haptics. Do you see– yeah, haptics, AR, VR, or mixed reality, whatever is the hot way of referring to it now, are you seeing that being a part of it, too, robotics as well? What other technologies are going to be layered on top of AI and all of these transformers? What’s the form factor?

KARTHIK MAHADEVAN: Yeah, so it’s basically the two big things that Envision is putting bets on, right? First thing is AI, especially generative AI. The way it has grown in the past two years, we’re putting a bet that it’s going to continue that trajectory. It is going to still have that explosive growth over the next year. I think we are still just looking at the tip of the iceberg as to what’s possible with the AI that we’re building today. So I think there is a big growth happening in the AI world. But there’s another technological progress that’s actually starting to take off. Next year is going to be a big year for that. It’s actually going to be wearables, right? Especially wearable cameras are actually going to make a big splash in the next couple of years. We already are seeing early hints of it with Meta partnering with Ray-Ban. And they have these Ray-Ban smart glasses. We have this Humane pin, which has the form factor of a pin that you actually have on you. And all of these are very much betting on actually having AI as a part of these smart glasses. Apple has put its hat in the ring with the Vision Pro, which, as of now, is a big and bulky headset. But if you do extrapolate that, OK, now every year, Apple is going to iterate on that and improve. You can actually extrapolate how in the next three years Apple will probably have a lightweight headset that they’re going to go into, right? So there is a resurgence. And there is growth again happening in the wearable hardware space. And I think when you combine wearable hardware with generative AI, the combination of these two is exactly what’s going to unlock that parrot-on-the-shoulder kind of feeling for you. It’s not going to be on your phone. It’s not going to be something you have to take out of your pocket and fiddle with before you actually have access to the information. The access to information is going to be instantaneous. And that is only possible if there is a wearable that’s always on you, that you can just peek and engage with on a constant basis.

JOE DEVON: Yeah, and the form factor is super interesting and challenging. I always thought in my own head, the benchmark for me, for when we’ve really reached that AI assistant level, is when a system will know every bit of food that you’ve eaten and be able to tell you how many calories you have ingested because it really understands your environment to the tune that it knows every bite. But if you don’t– I think glasses are probably a good form factor. But as someone who got the surgery to get rid of glasses, I don’t want to be stuck with glasses for life. Is this going to be a brain implant one day? The founder of Tab, which is similar to the Humane pin, he felt that their mistake was making it a pin that’s easy to pull off. People know that’s a $700 item. They could pull it off because it’s a magnet, so maybe a necklace is better. What do you think is the best form factor for a wearable?

KARTHIK MAHADEVAN: Yeah. I think it’s going to be a spectrum to begin with. I think everybody will have a personal preference for this. I think there will be people who will prefer a smart glasses-like form factor. Because when something is on your head, it is so much easier to orient with your head, if you want to look at something specific and look for that information. It’s a lot easier to look around with your head. It’s a lot closer to your mouth, if you do want to have a mic in it. It’s just a form factor that makes sense. But at the same time, there will be people who will be like, hey, I don’t want something on my face. Because when I talk to people, I don’t want them to be looking at a camera on my face. I want them to be looking at me, right? So there will be people who will be like, hey, I am OK if it’s a pin on my chest. Or there’ll be people who’ll be like, hey, I want it as a necklace. There might be people who will be like, I want it on my watch, something that I can actually have on my wrist. So I do believe that there will be a spectrum of different pieces of hardware. And I think different people will gravitate towards different form factors, which actually is a future I’m super excited about. Because I’m getting bored of everybody having a phone that looks the same. Now, everybody has a glass rectangle in their pockets. So I’m actually excited about us exploring the industrial design of having all of these different kinds of wearables that are out there. But I think there will probably be a spectrum of these to begin with. Eventually, I think after a decade or so, we might settle on a few that everybody agrees are the most comfortable option. But I think we are entering into a decade where every year will have a new form factor that’ll be interesting to see.

JOE DEVON: Yeah, really excited for it. So now, let’s focus a bit on accessibility. What does the future look like for accessibility when you have a device that’s going to be able to read a web page or an app and interact with it? Do we need accessibility subject matter experts anymore? Is it going to matter how a website is built, what an ARIA label is, what the code looks like under the hood, when the AI is just figuring it all out? What is it going to look like?

KARTHIK MAHADEVAN: Yeah, I think there could be a trend that we start to see, which is that instead of people optimizing their tools, their websites, and everything for all of these different screen readers or magnifiers and things like that, there might be a trend that could start where people are optimizing it for AI, right? They’re optimizing their websites for an AI to be able to access it in the easiest way. Very similar to how now every website has a robots.txt file so that a search engine can crawl it easily and take the information or look for it. There are probably going to be similar embeds, maybe a visual embed in every website so that it’s easier for an AI to be able to access the website and offer it in a personalized, accessible way to a user. So if I am a user who wants everything on a web page spoken out to me and if my AI understands that, it is going to look for that in the website and just pick out everything for me. But if I’m a person who only wants to have the minimal information, then I can also use that as a verbosity setting for me. So I think that might be a trend we might start to see, where a lot of these websites, instead of trying to make it accessible to every kind of software, might just make it accessible to AI. And then the AI can be a middleman to translate a website for the person in the end.

JOE DEVON: Yeah, it sounds like almost a different kind of API for AI, which would be pretty easy to do because of how it works.

KARTHIK MAHADEVAN: Yeah. I think it’ll be like an AI-centered design. Instead of a human-centered design, we’re going to design websites or stuff that is optimized for an AI to use instead of for a human to use.

JOE DEVON: Yeah. Yeah. Interesting times ahead for the good and the bad. So we have about one minute. I could talk to you for hours, but we have only about a minute left. So I’m going to have you leave us with some final thoughts and also how to contact you or pick up the Envision app or glasses.

KARTHIK MAHADEVAN: Sure. You can reach out to us from our website, letsenvision.com. That’s L-E-T-S-E-N-V-I-S-I-O-N.com. You can download our app on the App Store or the Play Store. The app is available for free for everyone. We do have these glasses that you can also just purchase from the website. Or if you have a distributor available locally, they can also actually offer you these glasses. There’s a lot of exciting developments under the hood that we’ll be launching in the new year. So if you are somebody who wants to keep up with all of these AI developments, you should definitely stay in touch with us.

JOE DEVON: Awesome. Thank you so much. This was great.

KARTHIK MAHADEVAN: Thanks. Thanks, everyone.
