Benetech: Using Artificial Intelligence to unlock STE(A)M Education
DESCRIPTION: Artificial Intelligence is a term that has been around for decades, and AI applications and techniques are already being used in everything from HR to healthcare to ecommerce. But what is the future of AI in supporting accessibility and inclusive education? This session will provide a basic understanding of various AI techniques, including Machine Learning and Computer Vision, and how Benetech is applying these techniques to transform complex books. For accessible formats, text is easy, but equations, images, and other non-text content are not straightforward. Join us to hear more about the future of Assistive Technology and how it is opening new worlds for the blind and visually impaired.
- Brad Turner, VP and GM, Global Education and Literacy, Benetech
BRAD TURNER: As Lisa mentioned, I am the Vice President and General Manager of Global Education and Literacy at Benetech. Our goal in life is to scale solutions for underserved populations. In global education and literacy, we believe access to information is fundamental to what we need as human beings. We use technology to drive lasting social change so everyone can learn, work, and pursue their dreams.
Regardless of ability or disability, we want to allow people to reach their potential. Many of you have heard of Bookshare; some of you may not have. We're the world's largest library of ebooks for people with reading barriers – blindness, low vision, dyslexia or another learning disability, or a mobility impairment that keeps you from using a printed book.
So if you can't hold the book, can't see the book, or can't decode the book, you qualify for Bookshare. If you don't have a disability, you cannot qualify; it's a very special library that serves people with disabilities only. In doing that, we operate under US copyright law, and now an international treaty, which allows us to convert books – with publisher permission, or even without it – into multiple formats. So we have books in audio; we have books in what I like to call ebook karaoke, synchronized text and audio; electronic braille; the industry standard publishing format; and even a Microsoft Word format of all of our books, which is a great format for a screen reader, software that reads what's on your screen. We can convert any book that is published in the United States without any permission.
The flip side of that is we much prefer working with publishers, and you'll see a little bit of those statistics later, but most of our books come directly from publishers as donations – hey, this is a great thing, help us reach your audience with our books. So they give us the books, we convert them into multiple formats, and then we allow people to access them in those formats. Another thing we do is make sure people can read on any commonly used device: a phone, whether Android or iOS, a computer, a tablet, assistive technology devices.
I mentioned e-braille. People ask, what's electronic braille? It runs on a little device, anywhere from the size of three or four cell phones stacked up to the size of a standard keyboard, but it's basically a reverse typewriter for braille. The pins pop up underneath your fingers, and then you hit next line, so you can just read a book. It stores hundreds of books on the device and allows people to read in braille.
So again, we support any kind of device, online and offline. We want people to read their way, and that's super important.
Personalized learning is super critical. Part of our funding comes from the United States Department of Education, the Office of Special Education Programs, and certain students read in different ways. Some read with their fingers, some read with their eyes, some with their ears. We want to allow each student to personalize their learning experience.
So our goal is any book, anytime, anywhere, in the format they want, on the device of their choice, and that's how Bookshare has grown. In fact, over the last 15 to 20 years we've delivered over 17 million accessible ebook downloads to people. We have over 800,000 Bookshare users in 95 different countries. I mentioned our publisher partners: we work with over 900 publishers who donate content to us.
We also purchase books and scan them, but the vast majority of our titles now come from publisher partners, and we add something like 10 thousand titles a month to the collection. We have books in 47 different languages. So if a publisher gives us The Little Prince in French, we'll put it in. If they give it to us in English, we'll put it in the collection. If they give it to us in Tamil or Marathi or Bengali or German – we have books in a bunch of different languages. Not only do we have the largest collection of accessible books in the world, but we power 15 national libraries around the world: some of the larger libraries in Canada, in Australia, in the UK, in the Middle East, in Africa, in southeast Asia.
We run national libraries around the world with the same back end technology that powers Bookshare. There's what you might call a new normal – or a now normal, because it might be different tomorrow – of distance learning due to the pandemic, and we've had record growth through it, because people have been trying to figure out how best to get books to their students. Bookshare has been a fantastic solution for them, whether they're in the classroom or at home trying to get that same education.
Bookshare allows you to download books that way, and through both Bookshare and our partners we manage over 1.5 million books, and remember, most of those are in those five different formats, so it's really 5 million different reading options for people with disabilities. So that's a little bit about Bookshare, but let's get into the meat of the presentation: why AI is important for what we do. Remember I said up front that we scale solutions using technology for underserved populations, and folks with reading disabilities and reading barriers are very much an underserved population.
The World Blind Union states that 95% or more of all content is locked in printed form. People who can go to the bookstore, browse the shelves, pull a book off the shelf and buy it, or walk into the public library and find a book – those books are locked away from people with visual disabilities. So it's very much an underserved population. As we convert books for that population, there are multiple stages to it.
The text conversion is largely solved. Optical character recognition – scanning a picture of a page and turning the words back into readable text – dates to the 1920s. In the 20s it converted text into telegraph code, and then in the 70s, and again in the 80s, improvements were made: Kurzweil in the 70s, and then a company called Calera in the 1980s. Interestingly, Jim Fruchterman was at Calera; he founded Benetech and Bookshare in 2000.
Text in multiple languages is largely solved too. There are hundreds of languages that you can scan and convert back into readable format: script-based languages, Roman character sets, languages using a bunch of different diacritics like Arabic. OCR uses a rudimentary form of AI, in that it can learn and improve.
So text is largely solved.
Scale is largely solved.
Google Drive is a good example: Google's OCR engine is pretty good, and it can scan every one of your documents as it comes into Google Drive.
As of May 2017, Google Drive was storing over two trillion files. I would submit scale is solved. We import about 10,000 books every single month and convert them into multiple formats. So scale's not as much of an issue when it comes to text.
But what if it's not text? What if it's STEM or STEAM (science, technology, engineering, art, and math)? That's a lot more challenging. On this slide I have some math equations; I have a picture of the Mona Lisa; I have the chemical formula for caffeine, which is near and dear to my heart.
I have the chloroplast stroma energy release and the cell reproduction cycle. That's not scannable. Those can't be turned into words very easily. So all of a sudden you need some different solutions. You could put alt text on it.
To give a quick example: "a picture of a woman." Well, "a picture of a woman" isn't very descriptive when you're talking about one of the most famous paintings ever, so it probably deserves a long description. And how do you describe that math equation that's sitting up there? A chemistry formula, charts and graphs, drawings, art, engineering schematics? That STEAM stuff is hard, and what that means is that it becomes very manual, very expensive, and very slow.
And because of that, especially in the global south – in a lot of lower-income countries – STEM topics are not taught to persons with a disability after their primary school education, because they don't have the materials. You cannot study math unless you have special dispensation, because you cannot get access to those materials.
But what if we could automate it like we did text? Remember, when Bookshare started in 2000, about 20 years ago, it started by using the ebook as a core format. That's an important element, because instead of taking a book and reading it into a recording – a human narrating it – or manually transcribing the braille,
we took the ebook format and automatically converted it: text-to-speech for an audio version, electronic braille, and so on, and we kept adding more formats to support more types of disabilities. What if we could automate STEM like that? That brings us to using AI to do it. So, a quick primer – and I apologize, this will be super basic for some people, and I hope it's informative for others. AI is an all-encompassing phrase that covers a bunch of different technologies.
You can think of AI as the self-driving car, because it uses machine learning, computer vision, natural language processing, neural networks, and a bunch of other technologies.
But let me focus on these. Machine learning is really computer algorithms that improve automatically through experience, and an example of that is a classification engine: what category does something belong in? If I can train my model to say this is a dog and this is not a dog – so I show it a thousand pictures of dogs and then I show it an elephant – it should be able to say, oh, that doesn't look like those other things; which of these doesn't belong? All of a sudden you can use this as a classification engine, and you don't have to show it a picture it's already seen. It knows that a dog is furry and an elephant is not, so that's not a dog. You can also use machine learning in regression analysis – predict the probability that this internet purchase is going to be fraudulent – decision trees, things like that.
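The dog-versus-elephant idea above can be sketched in a few lines. This is a minimal nearest-centroid classifier over made-up feature vectors – an illustration of the classification idea only, not Benetech's actual model or features.

```python
# Minimal sketch of a classification engine: a nearest-centroid
# classifier over toy feature vectors. Illustrative only.

def centroid(vectors):
    """Average each feature across the training examples."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(examples_by_label):
    """{label: [feature vectors]} -> {label: centroid of that label}."""
    return {label: centroid(vs) for label, vs in examples_by_label.items()}

def classify(model, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return min(model, key=lambda label: dist(model[label], x))

# Train on (furriness 0-1, trunk length in m, shoulder height in m).
model = train({
    "dog":      [[0.9, 0.0, 0.5], [0.8, 0.0, 0.6], [0.95, 0.0, 0.4]],
    "elephant": [[0.1, 1.8, 3.0], [0.2, 2.0, 3.2]],
})

# A new animal the model has never seen: furry, no trunk, small.
print(classify(model, [0.85, 0.0, 0.55]))  # → dog
```

The point of the sketch is the last line: the model has never seen this exact animal, but because it learned what "dog-like" features look like, it classifies it correctly anyway.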
Next up is computer vision: the automatic extraction, analysis, and understanding of useful information from an image or sequence of images. A question actually came into chat before the presentation – and I love this, thank you – that said: hey, can we use computer vision to describe graphs? We'll talk a little bit about that, but spoiler alert: yes, you can. The way it works is it examines the data in very, very small blocks to determine pixel density – in graphs, which areas of the image contain lines and which do not.
And all of a sudden, when you understand in very small blocks which areas have black pixel density and which do not, you can start to recreate the graph and describe it that way.
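The block-by-block pixel-density idea can be sketched like this: a toy 1-bit "image" divided into 2×2 blocks, reporting the fraction of dark pixels in each block. A real computer-vision pipeline works on far larger images with learned features, but the principle is the same.

```python
# Sketch of block-based pixel-density analysis. Illustrative only:
# divide a tiny binary "image" (1 = dark pixel) into blocks and
# compute the ink density of each block.

def block_density(image, block=2):
    """Return a grid of ink densities (fraction of dark pixels) per block."""
    rows, cols = len(image), len(image[0])
    grid = []
    for r in range(0, rows, block):
        row = []
        for c in range(0, cols, block):
            cells = [image[r + dr][c + dc]
                     for dr in range(block) for dc in range(block)]
            row.append(sum(cells) / len(cells))
        grid.append(row)
    return grid

# A 4x4 "image" of a diagonal line.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(block_density(image))  # → [[0.5, 0.0], [0.0, 0.5]]
```

Reading the output, the ink sits only in the top-left and bottom-right blocks – which is exactly the information you need to start saying "this line runs diagonally."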
So we'll talk a little more about that description later. Natural Language Processing: reading and understanding human language. This is probably the most widely used of these right now, because of Amazon Alexa and Google Assistant. It assigns probabilities to a given sequence of words, and then you get pattern recognition and machine learning. So if I say, really fast, "elephant" or "teleplan," a very rudimentary engine hearing the "te" at the beginning of "teleplan" might not be able to tell the difference between the two. But it'll learn over time, and so we use it in data mining, machine translation, context-sensitive descriptions, and of course speech recognition.
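As a rough illustration of why "elephant" versus "teleplan" is hard for a rudimentary engine: the two words are only a few edits apart, so a recognizer needs statistical context to pick between them. Here is a plain Levenshtein edit distance, as a sketch of that similarity, not of how a real speech engine works.

```python
# Sketch: Levenshtein edit distance between two words, computed with
# the classic dynamic-programming recurrence. Similar-sounding words
# tend to be a small number of edits apart.

def edit_distance(a, b):
    """Minimum number of insertions, deletions, substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("elephant", "teleplan"))     # small: the words are close
print(edit_distance("elephant", "spreadsheet"))  # much larger
```

A confusable pair sits close together in edit distance; the engine has to lean on word-sequence probabilities to break the tie.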
So, the last piece is neural networks: a type of machine learning that attempts to recreate the way a human brain processes information. What it does is link together a bunch of different nodes, or even a bunch of different other neural networks. One way to describe it is that you go through a decision tree and come out with an answer, and that determines whether or not it starts another action.
One way to think about it – I don't know why I'm hung up on elephants today – is if you ask a man to describe an elephant, but he can only feel the leg. Close your eyes and tell me about this elephant, and all you can feel is the leg. He won't be able to describe the whole elephant. But if you put multiple people around that elephant, each describing the part they can feel, together they can piece together what it is. That begins to get towards the neural network.
Even more important is what happens when that elephant starts to move. If each of those people can communicate about how the elephant is walking, you can start to put that process together.
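The elephant analogy can be sketched as linked nodes: each "feeler" observes one feature and emits a partial signal, and a combining node aggregates those signals into a decision. A real neural network learns its weights from data; here everything is hand-set purely for illustration.

```python
# Sketch of the elephant analogy: simple nodes that each see one
# feature, plus a combining node that votes on the partial signals.
# Hand-set thresholds, for illustration only.

def feeler(value, threshold):
    """One node: fires 1.0 if its single observed feature passes a threshold."""
    return 1.0 if value >= threshold else 0.0

def is_elephant(animal):
    """Combining node: aggregate three partial observations into one decision."""
    signals = [
        feeler(animal["leg_girth_m"], 0.4),   # thick legs
        feeler(animal["trunk_len_m"], 1.0),   # long trunk
        feeler(animal["height_m"],    2.5),   # tall at the shoulder
    ]
    # Majority vote over the partial signals.
    return sum(signals) / len(signals) >= 0.5

elephant = {"leg_girth_m": 0.6, "trunk_len_m": 1.8, "height_m": 3.0}
dog = {"leg_girth_m": 0.08, "trunk_len_m": 0.0, "height_m": 0.5}

print(is_elephant(elephant))  # → True
print(is_elephant(dog))       # → False
```

No single feeler can identify the elephant, but the combination can – that is the node-linking idea in miniature.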
So: linking different nodes and networks. Each of those technologies within artificial intelligence is used pretty extensively in some of the things we're doing within Bookshare. So what? Blah blah blah machine learning, blah blah blah train the model, blah blah blah competing standards. How about this: what if we could use some of these techniques to address these challenges? Imagine a book that comes to us from a publisher. Because it's coming from a publisher, we get the text – they send it as an ebook – so we don't have to convert the text into readable format; it's already there. But the math equations come in as images. They do that because they don't know whether the book will be on a small screen, a big screen, or a large monitor. The text will scale pretty easily, but the math does not scale as easily, so they turn the math into an image, and when it's a scalable vector graphic, by definition it scales.
So you get text and SVGs. But what happens then is, when it gets read, it says "solve the following equations: image, image, image, end of list" – and that is what I call the mathless math book. By the way, that also happens when we scan a book. If we scan a math book, we can use OCR to turn the text into words, but we still need to deal with the pictures of the math equations.
All right, so we use a classification engine built on neural networks and computer vision. The first thing we do is go through and find all the math in the book: we run each image through a classification engine to determine whether it's math or not math. What is not math in a math book, you might ask? Say, a picture of a guy standing on a diving board with "t = 0" at his feet and a dotted line to the water labeled "t = ?", and it says solve the equation.
Right – that's not a scannable equation for us; we have to do an image description for that. But the quadratic equation might come back as: hey, here's the quadratic equation, we know this is a math equation. So we send each image to our classification engine and get a confidence rating on whether it's math or not. If it's image-based math, we send it to a very specific math scanning engine –
an OCR tool for math. We pre-process the image to make sure it is scannable and then send it to the OCR engine. It comes back with the math equation and a confidence rating, and then we either send it to a manual approval cycle or, if it's confident the equation matches, we re-inject it into the book.
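The routing just described – classify each image, OCR it if it's math, then re-inject or queue for human review based on confidence – can be sketched like this. `classify_image` and `math_ocr` are stand-in stubs, and the thresholds are made up for illustration; the real system calls trained models and a commercial math OCR engine at those points.

```python
# Sketch of the image-routing pipeline. The two model calls are stubs;
# thresholds are illustrative, not Benetech's real values.

MATH_CONFIDENCE = 0.90
OCR_CONFIDENCE = 0.85

def classify_image(img):
    """Stub classifier: returns (label, confidence)."""
    return img["true_label"], img["classifier_confidence"]

def math_ocr(img):
    """Stub math OCR engine: returns (latex, confidence)."""
    return img["latex"], img["ocr_confidence"]

def process(img):
    label, conf = classify_image(img)
    if label != "math" or conf < MATH_CONFIDENCE:
        return ("describe_image", None)      # a picture: needs alt text instead
    latex, ocr_conf = math_ocr(img)
    if ocr_conf >= OCR_CONFIDENCE:
        return ("reinject", latex)           # confident: back into the book
    return ("manual_review", latex)          # a human checks the transcription

quadratic = {"true_label": "math", "classifier_confidence": 0.98,
             "latex": r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",
             "ocr_confidence": 0.95}
diver = {"true_label": "picture", "classifier_confidence": 0.97,
         "latex": None, "ocr_confidence": 0.0}

print(process(quadratic)[0])  # → reinject
print(process(diver)[0])      # → describe_image
```

The two confidence gates are the whole trick: high-confidence math flows straight back into the book, everything else falls out to a human, which is what turns a months-long manual job into an automated pipeline with a small review queue.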
All of a sudden, what used to take literally months – converting a math book that has on average 5,000, and sometimes 8, 10, or 12 thousand, equations in it, hand-transcribing every single one – we're able to do with this classification engine and the special math OCR engine. We can turn that math book around in two or three days.
So, what are the challenges there? Well, certainly having the resources to train the engines. It can take weeks and weeks to train a complex model, and interestingly, as we get deeper into this and go beyond math, we have to determine whether something is a math formula or a chemistry formula or a physics formula. They're different, so all of a sudden you're building multiple models and training them in multiple different ways.
It takes a bunch of computing power. We're a big Amazon shop, so we're able to scale up using AWS, but it's certainly not cheap. More data means more accuracy – these models are data junkies; the more data you give them, the happier they are. And then of course there are tons and tons of different content types.
I wish it were "hey, Brad gets to set the rules," but there are standards in play, because this is a global problem. So once the STEM items are described, how you display them becomes a challenge as well. I'll use this example: up in the top left of the screen you see 12/02/2020. That would be today's date, but if you're in Europe that would be the 12th of February, because they go day-month-year and we go month-day-year in the United States. And if a screen reader is reading that, it's not sure whether that's a date or the math expression 12 divided by 2 divided by 2020. So that's challenging, right?
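A small sketch makes the 12/02/2020 ambiguity concrete: the same token parses as a valid US date, a valid European date, and a chain of divisions, so a naive reader has no way to pick a reading without context.

```python
# Sketch: three legitimate readings of the same token. Illustrative only.
from datetime import datetime

def readings(tok):
    out = {}
    try:  # US convention: month/day/year
        out["us_date"] = datetime.strptime(tok, "%m/%d/%Y").strftime("%B %d, %Y")
    except ValueError:
        pass
    try:  # European convention: day/month/year
        out["eu_date"] = datetime.strptime(tok, "%d/%m/%Y").strftime("%B %d, %Y")
    except ValueError:
        pass
    # Read as math: 12 divided by 2 divided by 2020.
    a, b, c = (float(p) for p in tok.split("/"))
    out["as_math"] = a / b / c
    return out

r = readings("12/02/2020")
print(r["us_date"])  # → December 02, 2020
print(r["eu_date"])  # → February 12, 2020
print(r["as_math"])  # a perfectly valid, and perfectly useless, quotient
```

All three branches succeed, which is exactly the screen reader's problem: the token alone does not determine the reading.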
There's some context there that needs to come in. In the top right of the screen I have T(E) = S(T). That might be a math or a chemistry or a physics equation, depending on what your variables are. But a screen reader might just read it as "test," because it doesn't voice the parentheses or the equals sign if it doesn't know it's math. So again, multiple challenges there. I also have – let me make sure I can get this to play.
SCREEN READER: heading level two determine the degree of each of the following polynomials list two items image, image
BRAD TURNER: That was straight out of a math book. It said "determine the degree of each of the following polynomials," and then there were two math equations, and it said "image, image." When it's the mathless math book problem, that entire page says image, image; even the word problems read incorrectly, because they don't read as math.
But using the techniques we just talked about, the classification engine will pull those images – they're grayed out on the slide, and I apologize – and turn them into math, and it turns into…
SCREEN READER: heading level two, determine the degree of each of the following polynomials list two items f left parenthesis x right parenthesis equals fraction start x to the four over x squared end of fraction minus 3.5 x to the 1.5 plus 0.85 math 1 of 2. f left parenthesis x right parenthesis equals x to the 6 plus 2.5 x to the 4 minus fraction start 1 over 2 end of fraction x math 2 of 2 end of list
BRAD TURNER: All of a sudden you have usable math. You need to be able to go back and forth with that, because it's pretty quick, but when you get practiced at reading with your ears, you can transcribe and visualize that equation and start to solve it.
That's where technology leverages AI, without a human having to go identify whether each image is math or not. That's where technology gives us the ability to scale a solution. We just went through and processed about 20 million images from math books in Bookshare; 8 million of them came back as math, so we've just added eight million math equations to Bookshare for people to read.
I mean, it's a complete game changer across the industry.
SCREEN READER: heading level two determine the degree of each of the following polynomials adding by two determine the degree in each of the following polynomials list two items f left parenthesis x right parenthesis equals fraction start x two
BRAD TURNER: There we go – sorry, I started it again. So that's one place we use AI. Book image analysis for alt text and descriptions is another. Rather than math, what if our classification engine sees that it's not math? How about we then send it to another classification engine to ask: is this a picture or not? If it's a picture, like a photograph, why don't we send it to a photograph identification tool?
Google has a photo AI description tool; Amazon has one; Microsoft has one. They're commercially available, and they cost a bunch of money, so we're still working through that, but the ability to do image analysis to add alt text is there. "Here's a coffee mug" – well, is it a coffee mug to show you that coffee mugs come in green, or to show you there's steam coming out the top, or to show you that it's a different kind of vessel? So alt text and descriptions are super important. Alt text is the easier part; long descriptions often need a ton of context, often from the author.
A recommendation engine: we're all familiar with the Netflix or Amazon type of recommendation engine – if you liked that movie, you'll like this movie – based on how you ranked it and what other people who watched that same movie also liked. Lots of different machine learning and decision analysis there. We are launching a smart speaker client, and of course that uses voice recognition. And I mentioned OCR, optical character recognition. So, lots of different places, and we will continue to use more and more AI within Benetech. Two quick closing slides and we'll open it up to questions. For my audience – people with learning disabilities, people with visual impairments, people with other reading barriers – accessibility becomes a great equalizer.
Screen readers – something that will read what's on your screen. I mentioned ebook karaoke: synchronized text that's highlighted in sync with the audio, so you can see the words as they are read to you. If you have severe dyslexia, that is a brand new world, because you stop focusing on what each individual word is, and all of a sudden you get to follow along and understand the meaning of the page or the paragraph. Accessibility from our recommendation engines; accessibility from the math transformation we just talked about. Inclusive learning is really more important than ever, and we like to think our technology makes information accessible for all. So personalized learning is the opportunity.
Let me stop there. We have just a few minutes for questions; let me close my PowerPoint so I can get back to where these questions are coming in. I mentioned that somebody just before the program asked: what are the prospects for using machine learning to describe images or graphs?
Marcus, thank you for this question. In theory, human volunteers could label large training sets of graphs according to preset guidelines specifying key elements to be described and how to describe them. A system like this may provide the near-instant type of feedback sighted people get from eyeballing graphs – which is to say, it would add a visual interpretation layer on top of the data, instead of just reading data without capturing the essence of what is gained by the graphical representation.
So Marc, again, great question, thank you very much. About four years ago we actually embarked on a project to describe the images in the top 100 children's books, and it's really challenging to describe the images in The Cat in the Hat. If you're a blind child reading The Cat in the Hat, you don't know what the Cat in the Hat looks like. Is it a cat with a baseball cap on? Who would think it's a tall cat with a top hat on, right? So describing those images becomes kind of a challenge. Graphs and some of the STEM topics are, interestingly, a little bit easier than that, and we started working on a project that describes some basic graphs, so you could tell whether a parabola was upside down or right side up, whether it was narrow or wide, whether it started in the first, second, third, or fourth quadrant.
All of a sudden you get some information about what that graph might look like. You don't want to give every single point on the graph – you could be there for an infinite amount of time, since there are an infinite number of points – but if you can give some basic information, if you can say here's a parabola, it is right side up, starting in the first quadrant, you would know that it's a second-degree equation that's positive.
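That kind of basic description can be sketched directly from a quadratic's coefficients. The actual project worked from images rather than equations, but the vocabulary of the description – orientation, width, vertex quadrant – is the same; the width threshold here is made up for illustration.

```python
# Sketch: describe y = a*x^2 + b*x + c in the vocabulary just mentioned.
# Illustrative only; the real project derived this from graph images.

def describe_parabola(a, b, c):
    orientation = "right side up" if a > 0 else "upside down"
    width = "narrow" if abs(a) > 1 else "wide"   # arbitrary cutoff
    vx = -b / (2 * a)                  # vertex x-coordinate
    vy = a * vx * vx + b * vx + c      # vertex y-coordinate
    if vx > 0:
        quadrant = "first" if vy > 0 else "fourth"
    else:
        quadrant = "second" if vy > 0 else "third"
    return (f"a {width}, {orientation} parabola "
            f"with its vertex in the {quadrant} quadrant")

# y = 2x^2 - 8x + 9 has its vertex at (2, 1).
print(describe_parabola(2, -8, 9))
```

A few words like these convey the shape of the curve without enumerating points, which is exactly the trade-off the description project was after.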
And that starts to give you some information about it. So, great question; yes, we are looking at doing some of that, and there are actually some other companies out there doing interesting work as well. Another question came in: I imagine parsing diagrams is very different than equations – could you speak a little more about how you parse and represent diagrams specifically? We just talked a little bit about that. We can use computer vision to determine where there are pixels and where there aren't. Of course, you have axes, which become a bit of a challenge: you say, oh, there's pixel density here – is that part of the graph or part of the axes? Scale becomes a bit of a challenge too: if the question is where is the apex of this graph, you have to know what your scale is. So it's not easy, but you can certainly describe what the graph looks like, and arguably you could do it in pretty good detail. Interestingly, of course, the math books don't always have a perfectly accurate image.
So it says, where is this apex? They might label it as (1, 1), but the drawing might be shifted off, so if we go too detailed we might find that in the drawing we're describing, the computer places it at (1.25, 1.25), because the drawing doesn't quite match the label. So there's a little bit of accuracy you have to play with as well.
A question came in – and I can take one more question; Lisa says thank you very much – what special math OCR engine do you use to render sophisticated mathematical formulas and equations? There are a couple of them out there. There's InftyReader, i-n-f-t-y reader, and there's also a group we work with called Mathpix, m-a-t-h-p-i-x.
And I think right now in our models we're using Mathpix, because it also gives us a confidence rating on how accurate its OCR is, based on the image it gets. So, check out Mathpix.