Beyond Stereo — Audio Makes Immersive Experiences Complete

Acclaimed director George Lucas is well known for his obsession with sound in film, so much so that he developed the THX certification to ensure Return of the Jedi could be enjoyed in theaters with the complete, intended audio effects. “I feel that sound is half the experience… It’s where you get the most bang for your buck,” he has explained in multiple interviews.

Audio is even more important for virtual reality (VR) and other immersive experiences. “It’s more critical because sound needs to do more than enhance the mood or fill out the visual experience,” says Michael Cragg, senior experience designer at Adobe. “In VR, sound cues actually have to help orient the viewer, so they have a better sense of where they are in space and where they should be looking.”

Michael knows from first-hand experience. For several years, he has been working with Yaniv de Ridder, senior experience developer at Adobe, to improve audio editing tools and techniques for immersive video. Although they both began their careers focusing on visual effects and animation, each credits a long-term love of music and musicianship as pivotal to that work.

“I studied programming in school in Belgium, and I found myself developing a lot of content using software like Flash and After Effects, and that eventually brought me to Adobe,” says Yaniv. “But I’ve always been very passionate about music, too. I played some instruments when I was young, and I’ve been involved with three different record labels over the years, the most recent focused on hybrid audio-visual experiences and VR music videos.”

As for Michael? “Both my parents were musicians. It’s always been important to me. I was even a music major for a year in college, but eventually shifted my focus to another love — motion graphics,” he says. “As part of that, I’ve always had a huge interest in how music combines with visuals to impact emotion and mood.”

The challenge of 2D tools in a 3D world.

One of the biggest challenges facing immersive audio is the traditional toolset for editing stereo audio. Stereo audio works well for images on a two-dimensional surface because the left and right channels of the audio match the left and right sides of the screen. Surround sound creates the illusion of depth by placing audio effects around the viewer, but it can’t respond to the viewer’s movement or head turns.

“Audio and video need to be correlated together, along with the position of the viewer, in order to have immersion. Otherwise you can easily step out of that experience if something doesn’t feel right,” Yaniv explains. “If someone talks to you and they are on your right, you want to hear them coming from the right. If you turn your head and face them, now you want to feel that sound coming from in front of you.”
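Yaniv's example can be written as a small calculation. The sketch below is illustrative only: it assumes angles in degrees measured counterclockwise from straight ahead (so a source on the listener's right sits at -90), and the function name `relative_azimuth` is hypothetical rather than taken from any Adobe tool.

```python
def relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Return the source direction relative to where the listener is facing.

    Assumed convention: degrees, counterclockwise from straight ahead,
    so +90 is the listener's left and -90 is the listener's right.
    The result is wrapped into the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# A person speaking on the listener's right.
speaker = -90.0

# Facing forward, the voice is heard from the right.
print(relative_azimuth(speaker, head_yaw_deg=0.0))    # -90.0

# After turning the head to face the speaker, the voice is in front.
print(relative_azimuth(speaker, head_yaw_deg=-90.0))  # 0.0
```

A fixed stereo or surround mix has no equivalent of `head_yaw_deg`, which is why it cannot follow the viewer's head turns.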

Another challenge is established workflows in filmmaking. Yaniv adds, “When you look at a film or broadcasting workflow, you have separate audio engineering and video editing tools. They’re typically detached from one another. That’s not ideal for immersive audio because we have this requirement to maintain strong relationships between the visual and the audio components. There’s a lot to reinvent there.”

The past as prologue.

Unexpectedly, many of the ideas and technologies that are most useful for solving the challenges of immersive audio are decades, if not thousands of years, old.

“You can trace the concept of immersive sound all the way back to Pythagoras,” says Yaniv. “We know him as a mathematician, but he was also a musician and he would teach music to his students by putting them in a dark room. He’d play them different sounds from different places around the room and ask them to feel the sounds and understand the characteristics of different frequencies. He was trying to make his students feel the sound and where they were in space.”

And more recently, the ambisonics audio technology developed in the 1970s is proving useful for capturing immersive audio. “They had these microphones that actually had four heads on them. It’s called a first-order ambisonics microphone, and it records four different channels — giving you 360-degree surround audio. The technology has been around for a long time, it’s just been waiting for the right application,” Michael says.
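To make the four channels more concrete, here is a minimal sketch of how a single mono sound arriving from a known direction can be represented in first-order ambisonics. It assumes the traditional B-format layout (W, X, Y, Z) with W scaled by 1/sqrt(2), X pointing forward, Y pointing left, and Z pointing up; other channel orders and normalizations exist. A four-capsule microphone records raw capsule signals that are typically converted to a format like this, and that conversion is not shown. The function name `encode_first_order` is hypothetical.

```python
import math


def encode_first_order(sample: float, azimuth_deg: float, elevation_deg: float = 0.0):
    """Encode one mono sample into four first-order ambisonic channels.

    Assumed convention: azimuth counterclockwise from straight ahead,
    elevation upward from the horizontal plane, both in degrees.
    Returns a tuple (W, X, Y, Z).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front/back component
    y = sample * math.sin(az) * math.cos(el)   # left/right component
    z = sample * math.sin(el)                  # up/down component
    return (w, x, y, z)


# A sound directly to the listener's left, at ear height.
w, x, y, z = encode_first_order(1.0, azimuth_deg=90.0)
print(round(w, 3), round(x, 3), round(y, 3), round(z, 3))
```

Because direction is stored in the balance between channels rather than in fixed speaker positions, the same recording can be re-oriented after the fact.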

Of course, more cutting-edge solutions are required as well. Michael explains the challenge: “One of the things we found when we recorded ambisonics audio, is that it’s really hard to get it aligned correctly to your video. A lot of it has to do with the way you hold the mic in the real world, where it’s difficult to identify the front of the microphone from the back. You didn’t have to worry about that with stereo. You just dropped it in. But now, the viewer can be facing any direction, so we need a way to be able to align that.”
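The alignment Michael describes amounts to rotating the recorded sound field until its front matches the front of the video. The sketch below shows that rotation about the vertical axis for one frame of first-order ambisonic audio. It assumes channels ordered (W, X, Y, Z) with X forward, Y left, and Z up, and a rotation angle in degrees counterclockwise; `rotate_yaw` is a hypothetical name, and this is not a description of how Adobe's tools implement it.

```python
import math


def rotate_yaw(frame, yaw_deg: float):
    """Rotate one (W, X, Y, Z) frame counterclockwise about the vertical axis.

    W (omnidirectional) and Z (up/down) do not change under a yaw rotation;
    only the horizontal components X and Y are mixed.
    """
    w, x, y, z = frame
    theta = math.radians(yaw_deg)
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return (w, x_rot, y_rot, z)


# The mic was held backwards, so a sound in front of the camera
# was recorded as if it came from behind (negative X).
recorded = (0.707, -1.0, 0.0, 0.0)

# Rotating the sound field by 180 degrees puts the sound back in front.
aligned = rotate_yaw(recorded, 180.0)
print([round(c, 3) for c in aligned])
```

In practice the correct angle is not known in advance, which is where a visual aid for finding it becomes useful.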

The solution was inspired by an Adobe MAX Sneaks presentation called #Syncmaster that Michael and Yaniv worked on last year. It used colors to add information about bass, mid, and treble sounds to the traditional audio editing waveform, making it easier for editors to understand and sync their audio.

Michael says, “Being able to visualize the audio in new ways was really cool. So what we’ve done with immersive editing is add these color particles that represent where the audio is coming from spatially, and then the color represents the pitch, or frequency, of the audio. So, once you lay that on top of the video, you can more easily align it to the visuals of the video so that they line up.”
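One way to picture the particles is as points drawn over the video frame, placed by direction and tinted by frequency. The sketch below is a guess at the general idea, not the actual implementation: it assumes an equirectangular 360-degree frame, azimuth measured counterclockwise from the frame's center, and an arbitrary color scheme running from red for low frequencies to violet for high ones. Both function names are hypothetical.

```python
import colorsys
import math


def particle_position(azimuth_deg, elevation_deg, width, height):
    """Map a sound direction to pixel coordinates on an equirectangular frame.

    Assumed convention: azimuth 0 is the horizontal center of the frame,
    positive azimuth is to the left, positive elevation is up.
    """
    u = (0.5 - azimuth_deg / 360.0) * width
    v = (0.5 - elevation_deg / 180.0) * height
    return (u, v)


def frequency_to_rgb(freq_hz, low_hz=20.0, high_hz=20000.0):
    """Map a frequency to an RGB color on a logarithmic scale.

    Assumed scheme: low_hz maps to red (hue 0.0), high_hz to violet (hue 0.75).
    """
    clamped = min(max(freq_hz, low_hz), high_hz)
    t = math.log(clamped / low_hz) / math.log(high_hz / low_hz)
    return colorsys.hsv_to_rgb(0.75 * t, 1.0, 1.0)


# A 440 Hz tone coming from 90 degrees to the left, at ear height,
# drawn on a 3840x1920 frame.
print(particle_position(90.0, 0.0, 3840, 1920))
print(frequency_to_rgb(440.0))
```

If the particles for a voice cluster away from the person speaking on screen, the offset between the two suggests how far the audio needs to be rotated.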

“We’re having to invent a whole new narrative language and a lot of new techniques for creating immersive experiences,” says Yaniv. “We are now at this moment in time where those things come together. It’s a big epiphany. I can use all my own knowledge and experience from the different parts of my life and make them one. To me, that’s incredibly exciting.”

This year, Yaniv was the presenter at MAX Sneaks. Check out #SonicScape to see the technology in action.

Read more about the future of immersive experiences in our Beyond the Screen collection.

