AI, Voice-Based Assistants, and the Future of Document Management
What does it mean to ask your documents to do things? And how could AI change the way we interact with our work?
In the early 1960s, IBM introduced a revolutionary device called Shoebox. Spoken to through a microphone, it could recognize just 16 words and perform simple arithmetic in response. Although it was an early forerunner of AI and voice recognition, the device was soon set aside in favor of graphical technology. To date, most of our interaction with technology has been through graphics: things we can see and read. When we open a document on our computer, the technology enables us to view it, but so far we have been doing only that: viewing.
Over the last 50-plus years, technological advancements and AI research have moved voice user interfaces to the forefront of the technology we interact with daily. Since the introduction of voice-based assistants such as Siri and Alexa, over half of us now use one for tasks every day. So how can AI such as voice-based technology enhance the way we interact with objects like documents, which are normally experienced visually?
The more our voice-based assistants are capable of, the more we use them. As speech recognition continues to improve, it could mean incredible things for documents. Imagine being able to fill out a form, or better yet create one, by using verbal commands instead of typing it all out. Onboarding could be completed with ease using your phone, your watch, or even your Amazon Echo. And this is just one example of the direction voice-based assistants are heading.
Mobility, complexity, and accessibility
David Parmenter has been working on speech recognition applications since the 1990s and was a lead engineer on one of the first commercially successful general-purpose speech products, Dragon NaturallySpeaking. He is now the director of engineering and data for Adobe Document Cloud, where his team is exploring AI and voice-based assistant technology to address document mobility, complexity, and accessibility in a way that graphics alone cannot. “My laser-focus is building features that use machine learning to make documents better,” he says.
At the forefront of voice-based assistant development is the need to make things more mobile. Smartphones and smartwatches have created interfaces we carry with us at all times. “At Document Cloud, our goal is to work where the customer works,” David says. “That goes from a more existential customer third-party integration to literally where customers are physically in the world. Are they in their inbox? Are they using the cloud? Are they in Manhattan? And how do we accommodate that location in a simple, accessible way?”
Although we may not have Acrobat open on a desktop, we have our watch at the gym or our tablet on the plane. Mobile applications make it possible to receive a document wherever you are. Now imagine being able to tell your phone or tablet, “Give me a summary of the document Ben just sent me,” and being able to receive a succinct, accurate summary.
Effective summarization would let you quickly assess whether a document is urgent. Or, if you’re searching for a specific document, asking for a summary can quickly tell you whether you’ve found what you’re looking for.
But summarization can be tricky. David explains it this way: “If I summarize a document and I’m just a computer, it’s pretty subjective. Even if a human summarized it for you, they might leave out things that you would find really valuable.” The solution David suggests for this problem comes in two parts. The first is creating AI technology that doesn’t just summarize, but also highlights. “If you do that,” David says, “then the eye will be naturally drawn to the highlights. It can look and see everything that we marked for summary in context.” The second part of building effective summarization is, essentially, crowdsourcing: using the information gathered from those who look at the document. When other people highlight a document and make notes, that information could be compiled by AI to create a summary.
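The crowdsourcing idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a description of how Document Cloud works: the function name `crowd_summary`, the `min_votes` threshold, and the idea of representing each reader's highlights as a list of sentence positions are all hypothetical.

```python
from collections import Counter

def crowd_summary(sentences, reader_highlights, min_votes=2):
    """Return the sentences highlighted by at least `min_votes` readers,
    kept in their original document order."""
    votes = Counter()
    for highlights in reader_highlights:
        # Count each reader once per sentence, even if they marked it twice.
        votes.update(set(highlights))
    return [s for i, s in enumerate(sentences) if votes[i] >= min_votes]

# Hypothetical document and three readers' highlights (sentence positions).
sentences = [
    "The contract renews on March 1.",
    "Either party may cancel with 30 days' notice.",
    "Fees are listed in Appendix B.",
    "Signatures are required from both parties.",
]
reader_highlights = [[0, 1], [1, 3], [0, 1, 1]]

print(crowd_summary(sentences, reader_highlights))
# prints the first two sentences, the only ones marked by two or more readers
```

Because the selected sentences stay in document order, they could just as easily be shown as highlights in context rather than as a separate summary.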
Complexity is an important consideration in using voice-based assistants. Not only is the technology itself complex, but documents are complex, too. They are organized into sections with headings and subheadings, many paragraphs, and footnotes spread across multiple pages, not to mention the added complexity of something like a legal document or an annotated scholarly paper. That makes it difficult to create technology that can sort through and prioritize these documents properly.
For a voice-based assistant to search your existing documents for you, it needs to interpret the meaning of your question, then search across multiple documents and workflows for the answer, and finally relay that answer back to you. The kind of formal language fed to a computer doesn’t always account for slang, or for phrasing that is common in how we speak but technically incorrect. AI speech interfaces need to understand those nuances in order to interact effectively with users.
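Those three steps (interpret, search, relay) can be sketched as a tiny Python pipeline. This is a deliberately naive illustration that assumes the speech has already been transcribed to text and that matching shared words is good enough; a real assistant would need far richer language understanding. Every name here (`interpret`, `search`, `relay`, `answer`, `STOP_WORDS`, and the sample file names) is hypothetical.

```python
import re

# Small, hypothetical list of filler words to ignore when interpreting a question.
STOP_WORDS = {"a", "an", "the", "is", "are", "what", "which", "where",
              "my", "me", "of", "in", "for", "to", "find", "show"}

def words_of(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def interpret(question):
    """Step 1: reduce a transcribed question to its content words."""
    return {w for w in words_of(question) if w not in STOP_WORDS}

def search(keywords, documents):
    """Step 2: pick the document that shares the most words with the question."""
    best_name, best_score = None, 0
    for name, text in documents.items():
        score = len(keywords & set(words_of(text)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

def relay(name):
    """Step 3: phrase the result as a reply that could be spoken back."""
    if name is None:
        return "I couldn't find a matching document."
    return f"The best match is {name}."

def answer(question, documents):
    return relay(search(interpret(question), documents))

documents = {
    "lease.pdf": "Lease agreement for the Manhattan office, rent due monthly.",
    "nda.pdf": "Mutual non-disclosure agreement between the parties.",
}
print(answer("Where's the lease for the Manhattan office?", documents))
# prints: The best match is lease.pdf.
```

Word overlap is exactly the kind of formal matching that misses slang and loose phrasing, which is why the interpretation step is the hard part in practice.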
Effective, nuanced voice-based assistants can also greatly cut down on the complexity of tasks. In document management, this means easing the need to learn everything about the software you’re using. Working out how to customize workflows to fit your needs, or finding the right tool or the right solution, can be daunting. “If you look at our enterprise features, what can be done in terms of customization of the workflows and discovering those workflows is really quite complicated,” David says. “And finding the right workflow and the right tool or the right solution is complex enough that a speech interface would really be helpful.”
In the future, being able to have a dialog with a voice interface could create interactions that improve the user experience through their simplicity. Imagine being able to say, “Merge these two documents and erase pages 8-10,” and have the technology do it for you. AI will reduce the time spent on mundane or repetitive tasks and free up users’ valuable time.
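To make that concrete, here is a minimal Python sketch of how a transcribed command might be turned into structured steps that software could carry out. It assumes the speech is already text and recognizes only a few fixed phrasings; the function name `parse_command` and the action labels `merge` and `delete_pages` are hypothetical, and nothing here reflects an actual Acrobat feature.

```python
import re

# Matches "erase pages 8-10", "delete page 3", "remove pages 2 to 4", and so on.
PAGE_PATTERN = re.compile(
    r"(?:erase|delete|remove) pages? (\d+)(?:\s*(?:-|to|through)\s*(\d+))?"
)

def parse_command(utterance):
    """Turn a transcribed command into a list of (action, argument) steps."""
    steps = []
    # Split on "and"/"then" so each clause is handled separately.
    for clause in re.split(r"\b(?:and then|and|then)\b", utterance.lower()):
        clause = clause.strip(" .,")
        if clause.startswith("merge"):
            steps.append(("merge", None))
            continue
        match = PAGE_PATTERN.match(clause)
        if match:
            first = int(match.group(1))
            # A single page has no end of range, so reuse the first number.
            last = int(match.group(2) or first)
            steps.append(("delete_pages", list(range(first, last + 1))))
    return steps

print(parse_command("Merge these two documents and erase pages 8-10."))
# prints: [('merge', None), ('delete_pages', [8, 9, 10])]
```

Clauses the sketch does not recognize are silently skipped; a real assistant would more likely ask the user to rephrase.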
One of the most exciting improvements that voice-based assistants could make in document management is accessibility. “Accessibility is about granting as many people as possible access to the information so that it’s not hidden from them,” David says. Part of accessibility is serving someone who might not have the time and energy to read a whole document, by allowing them to ask for a summary or pull out certain elements with a question. This enables more interactions with documents than ever before by widening the range of people who can use the information they contain.
Speech interfaces also have the potential to open access to those with any sort of disability that precludes them from interacting with a document visually.
These improvements look well beyond just reading a PDF or document as if it were an e-book. They are creating a more efficient workplace. When the documents that cross a team’s desks for review arrive with effective summaries or highlights from fellow team members, the team can work with ease at a new, higher level.
What the future holds
Voice-based assistants have come a long way and are rapidly evolving to meet the demands of people in and out of the workplace. As AI continues to advance, voice-based assistants will improve mobility and accessibility and tame complexity across the board, creating a more effective, enjoyable workplace.
“We would like our customers to spend their time doing interesting things,” David says. “If something is mundane or repetitive, we would like to automate it and hopefully make it fun. So, get ready for big and small changes that will make the way you work with documents better than ever.”