Patrick Flanagan - Open Your Ears
Welcome to "AigoraCast", conversations with industry experts on how new technologies are transforming sensory and consumer science!
Patrick Flanagan is an Expert Audio Experience Scientist at Logitech. His primary research areas are in the perception of spatial audio processing, psychoacoustics, acoustics, and signal processing. Patrick has co-authored a series of papers on the Evaluation of Binaural Renderers, and has been a member of the Audio Engineering Society since 2004. Before joining Logitech, Patrick was VP of Audio Architecture at THX.
Transcript (Semi-automated, forgive typos!)
John: Patrick, thanks for being on the show.
Patrick: Thanks, John. Thanks for having me.
John: It's really a pleasure. I mean, part of why I've asked you to be on the show is that I think there needs to be a lot more attention on audio in sensory. I think that's been a little bit of a forgotten modality. But something that you and I were talking about before the show is kind of your backstory, which I think is really interesting. It's fascinating to me how many different walks of life end up in sensory so I just like to start the show with you kind of talking about the journey you've taken to end up in sensory.
Patrick: Yeah. So when I started out, I wanted to be a musician and work in the recording studios and produce bands and make them sound good. And I've always had a passion for, like, the behind-the-scenes of audio. I like making sounds. I have synths around me in my lab here, and I like making noise and really playing around with sound. So that was always my passion, sound, from a very early age. And I studied engineering and wanted to work in recording studios. And then I found myself, you know, trying to find something else within audio. And I reached out to this company in Chicago, and they were a manufacturer and distributor of high-end electronics, stereos that cost more than people's cars, like, over-engineered. Every component is hand-matched to the other component. You know, going to the nth degree in engineering to design a circuit, and a lot of that was about listening. Like, we would design an amplifier and a speaker and listen to it. And then you would change a component and then you would listen to it. This is heavily rooted in psychology. This is, like, subjective evaluation. You listen to things and you try to point out what's wrong. Like the bass is wrong, or there's distortion, or there are limiters, and there's always other ancillary audio processing going on in the electronics that affects quality. And you have to tweak all of those when you're designing an audio system to make it sound correct. Like, if you're hitting a limiter too hard, you're going to hear it. You're trained to hear it. So that's a whole other aspect of audio: the design of the audio system, and listening, and being a trained listener to listen for distortions, for pumping, for when the EQ's off by a couple of dB. You know, it comes with repetition and training. And that kind of just sparked me into doing more things within system design and acoustics design. And, you know, here I am now doing even deeper investigations into the psychology of why we hear the way we hear.
Yeah, it's a really fascinating area to be in. For me it's the intersection of audio and music and even film, and why do we like it? Like, why does this speaker sound different than that speaker? Why does this headphone sound different than the cell phone? And a lot of what that is, is the human. You can put a microphone on it. But microphones don't buy equipment, humans do.
Patrick: That's what it all comes down to. You have to have a panel of people that you trust, and even consumer insights. You know, naive assessors need to go through it and evaluate what they feel, give some sort of opinion on the product.
John: That's fascinating. I never really thought about that from the standpoint of sound, because, you know, I mean, we were talking about this before the call, how with music and sound there's a lot of mathematics involved. And so I guess I thought of it in a very kind of sterile way, that the sound wave is coming, the waveform's coming in. But actually, there's, as you said, this interaction with humans. I mean, it's very analogous. Curtis Luckett was on the show talking about texture, and he was saying that food doesn't have texture. Food has physical properties. It only has texture when somebody eats it, right? If you take a hard candy and you swallow it, it's not crunchy. So, you know, it's interesting. So can you talk a little bit about the individual differences then in sound perception and how that, I mean, how does it get measured, quantified, how do you see it in your research?
Patrick: So sound is our first sense. When we're a baby, at I think twenty-eight weeks or something around there, we've developed the process of hearing. It's through the stomach, so, you know, it's kind of a big filter across it, but we're getting used to hearing the mother's voice in the womb. So we're always hearing. Even when we're sleeping, we're hearing, and we develop skills in hearing. And there's a whole other subset of neuroscience on how we grow to hear more and more throughout our life. Basically, we are our own non-linear computer. So we have two microphones, one on each side of our head, and they do things independently. And then we have a face, right, and we have shoulders, and the way that we're made up interacts with sound. So say the sound is traveling from in front of you. It travels through space. It travels through the medium of air. And it interacts with your face, your ears and your torso, and all of those individual items affect the sound wave differently, depending on the angle of incidence. So directly in front of you, the time of arrival at each ear is roughly the same. But if the sound source is off to the left, the time of arrival at the left ear is faster than the time of arrival at the right ear. They don't arrive at the same time because there is a path around the head from left ear to right ear. So that's an interaural time delay. And there's also a level difference, and this is frequency dependent. So there's some absorption and there's some amplification based on your pinna, and your shoulders do some low-frequency absorption, and those all affect how you perceive sound. But you don't think about it; it's an ingrained thing in our evolution to be able to localize things behind you and things around you, from when you were hunting back in the caveman era. But we're all different. And the thing about audio is, how do you measure something that, you know, it is just like sensory science. It's just like taste.
Everyone has a different taste profile. Everyone has different ears, different physical devices for sound, and we all have different ideas about what we like. And that's all, you know, you grew up listening to classical music, so you don't like rock music; those are different types of things that affect preference. I'm talking about the mathematical things that are affected, which is transfer functions. Exactly, like, what is the mathematical formula of you versus me? We're completely different. We have differences. We have timing differences. We have level differences that affect the way that sound enters the brain and is processed. And then there's hearing acuity, which is your tympanic membrane and just your normal audiometric hearing function. Can you hear this frequency? And that's about hearing aids and things like that. But that's a different area altogether.
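The left/right arrival-time difference Patrick describes (usually called the interaural time difference) can be approximated with a simple spherical-head model. The sketch below uses Woodworth's formula; the head radius of 8.75 cm, the speed of sound of 343 m/s, and the function name are illustrative assumptions, not values from the conversation, and a real head differs from person to person exactly as discussed above.

```python
import math

# Assumed values for illustration only (not from the interview).
HEAD_RADIUS_M = 0.0875        # average adult head radius, spherical approximation
SPEED_OF_SOUND_M_S = 343.0    # air at roughly 20 degrees Celsius


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate interaural time difference in seconds.

    azimuth_deg: source angle in the horizontal plane,
    0 = straight ahead, +90 = fully to one side, -90 = fully to the other.
    The sign of the result follows the sign of the angle.
    """
    if abs(azimuth_deg) > 90.0:
        raise ValueError("this simple model only covers -90..+90 degrees")
    theta = math.radians(azimuth_deg)
    # Woodworth: extra path = radius * (theta + sin(theta)) around a rigid sphere
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:>3} deg -> {woodworth_itd(angle) * 1e6:.0f} microseconds")
```

Under these assumptions the delay is zero for a source straight ahead and grows to well under a millisecond for a source fully to one side, which is why small timing errors in rendered audio can matter.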
John: And the thing that strikes me as most interesting here is how sensitive we are to these little changes, because I know an area that we're both interested in is augmented reality or these kind of extended realities where, I mean, at Logitech, for example, you are working on gaming systems. And I'm sure that a big part of that is trying to make sure that sound is presented in some sort of realistic or believable way. But is it the case that even small mistakes will really stand out? I mean, what happens to people when they're in an extended reality environment and these kinds of details aren't being taken into account by the sound engineers?
Patrick: Well, then you're disillusioned and you're not being immersed. If there is a massive delay between the sound and your lips moving, that's the ventriloquist effect, and it doesn't look real, because if there is a delay, your brain is like, this isn't working, because I hear the sound and your lips aren't in time. So those types of things happen. An AR environment and a VR environment, those are different modalities. In the VR environment, you're completely excluded from the outside world. So it's a little easier to get immersed because you don't have any outside influence. If you have headphones on, they're sound isolating, you can't really hear what's going on in the outside world, and your eyes are completely included in the virtual environment. But the sounds need to sound like they're coming from where they are. So if you have a virtual avatar inside this VR experience, the sound needs to seem like it's coming from in front of you. If it sounds like it's coming from inside your head, then it's not as believable. So those are the HRTFs (head-related transfer functions), the impulse responses that tell the audio where it is coming from in space. So if you want to produce something at zero degrees elevation and zero degrees azimuth, which is right in front of you, you need to have the proper impulse responses to convolve the audio with, to make it seem like it's coming from there.
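The rendering step Patrick mentions, processing audio with a pair of head-related impulse responses so it seems to come from a point in space, can be sketched as two convolutions, one per ear. The example below uses NumPy; the function names and the "toy" impulse responses (a pure delay plus a gain) are hypothetical stand-ins for illustration. Real HRTFs are measured per direction, and ideally per listener, and are far richer than this.

```python
import numpy as np


def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Convolve a mono signal with left/right impulse responses.

    Returns an array of shape (2, len(mono) + len(hrir) - 1):
    row 0 is the left-ear signal, row 1 the right-ear signal.
    Both impulse responses are assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)


def toy_hrir(delay_samples: int, gain: float, length: int = 64) -> np.ndarray:
    """Hypothetical placeholder 'HRIR': a single delayed, scaled impulse."""
    h = np.zeros(length)
    h[delay_samples] = gain
    return h


# A single click as the mono source.
click = np.zeros(256)
click[0] = 1.0

# Pretend the source is off to the left: the right ear gets the sound
# later (30 samples is about 0.6 ms at 48 kHz) and quieter.
hrir_l = toy_hrir(delay_samples=0, gain=1.0)
hrir_r = toy_hrir(delay_samples=30, gain=0.6)

stereo = render_binaural(click, hrir_l, hrir_r)
print(stereo.shape)  # (2, 319)
```

For a source directly in front, the two impulse responses would be close to identical, which matches the "time of arrival at each ear is roughly the same" case described earlier in the conversation.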
John: That's fascinating. How much does that interact with our other senses? I mean, I suppose that, like you're saying, in augmented reality as opposed to virtual reality, do we have expectations about how sounds are going to reflect things that we see around us?
Patrick: So when you walk into a grand church, your brain and your eyes are saying, oh, this is really big, and I'm used to a reverb that is about this big. And then when you speak, you're going to hear, like, okay, I'm expecting an echo. So if you're in a VR environment and you're presented in, you know, the Notre Dame Cathedral and the reverb doesn't sound like it, but you've been there and you remember what it should sound like, then you have another mismatch. Then your brain is like, I'm not believing it. So those are the other things that need to come, because we are a very visual species. But there's always a constant back and forth between our eyes and our ears on who wants to take priority.
John: It sounds impossibly hard, Patrick. I'm not sure if you're making progress on this area. So what are some of the problems that you're working on? What are the things that are kind of captivating your attention right now?
Patrick: Right now, I'm focused on gaming audio. So, gaming is a major business vertical in the world. And we want to understand, you know, competitive gaming. So these people play for hours and hours, and how are they so good? Because it's very auditory. They're moving around, but they're listening to their teammates. They're listening to the gunshots and the footsteps in these games. How do they do that? There's like a training phase, and they've understood how to pick out footsteps and gunshots from the audio to do it better than someone else. But why and how? You know, are they doing it in a stereo world, where there are no spatial cues and they're just listening to a stereo mix, or are they doing it in a more multi-channel environment over headphones where they're trying to render out the space to be bigger than it is? So there are different ways to go about rendering the audio to present it to the user. So which one's better? We don't know. Is there a better EQ? Like, different headphones are designed differently to sound different. Is there an ideal headphone? Like for gaming, is there an ideal transfer function for the way that your ear interacts with the transducer on your head? Is there an ideal target function that should be achieved? In the loudspeaker world, we know through a lot of research that a loudspeaker should be completely flat in an environment. But we don't know that about headphones, because headphones are coupled directly to your head and they don't have the room to interact with. When you listen to loudspeakers, you're listening to a loudspeaker and the room. When you're listening to headphones, you're listening to the headphones.
John: So, do pro gamers then use different headphones? Are they like golfers where they have their preferred clubs, so they have preferred headphones?
Patrick: Yeah, most of them are on teams, like, they're sponsored by Logitech or Astro or Razer or all these other gaming audio companies. They have sponsors and they wear certain headphones, and then they get a certain design that goes with it, you know, the way they like it to sound. And they get to customize the EQs and dials on it to tune it to the way they like it. But that's one of the investigations: why do you hear better than me? Why do you play this better than me? And then if I manipulate something, if I manipulate the signal processing, you know, do you have to retrain yourself? So there's a whole neuroplasticity phase about, you know, retraining your brain to hear a new type of cue that helps you play the game better. And what is that neuroplasticity phase? How long does it take to train to a new sound? That's a fascinating area of research.
John: It seems exactly analogous to like the basketball shoes or the...
Patrick: Yeah like sports players, you know, they play better because they have better shorts or better shoes or better socks. Okay, they're at such a level of their craft that minor differences in materials make them play incrementally better. There has to be something for that, for audio. Otherwise, I won't have a job.
John: Okay, so there's the gaming side, and so what other things are you thinking about?
Patrick: AR and VR are always very interesting. AR is a mixture of the real world and the augmented world. And audio is a major factor in that, because if you're here, like, I'm sitting in my lab here in California and you're virtually here over AR glasses. But you're in your state.
Patrick: Yeah. So you're in Virginia in your office and you're speaking into your AR device, and it's capturing your voice and your room. So it's capturing your voice's interaction with the room into the microphone. And I'm in my space, but you're virtually in my space now. So we're virtually sitting across a table from each other. But your voice sounds like you're in a different space. What happens? So there's a lot of signal processing that's going to have to happen to extract your voice from the room that you're in and then reprocess your voice to make it sound like you're in the room with me, and vice versa. I need to sound like I'm in your space. So how do you do that? It's a lot of signal processing: blind source separation, principal component analysis. All of that stuff needs to happen to extract the cues that are in the sound, to make it sound like it's coming from where it should be. So that's a major Mount Everest for AR. It's a lot more complicated. If you're walking down the street, you know, and you're in an augmented, you know, navigation, and you have headphones or some sort of off-ear device that's transporting the sound to you, you need to be able to hear around you as well. So you can't be liable for someone getting hit by a car if they're in an uncanny valley AR experience where they're not noticing that that car is coming. Because we still have to be able to hear in the real world, right? There are a lot of things to climb in that world.
John: Right. So the sound has to really come, you're walking down the street, there's some sound that is appearing in the AR environment. It has to sound as if it was coming from the real world?
Patrick: Exactly. So, yeah, a major, you know, if you're in New York City, it's got to sound like it's in New York City, versus if you're in the woods. So the sound is going to have to be dynamic. So 5G and edge computing are going to be a lot about, like, okay, this person is in this environment, so we need to modify the sound that we're sending to this user at the edge so we can do it. We can't just have a pre-built experience. Otherwise it's not going to be as immersive. You're not going to have the uncanny valley experience, and that's what they're looking for in AR.
John: Right. And for our listeners who may not be familiar with the term uncanny valley, maybe you can explain the idea.
Patrick: You're so immersed that you don't know where you are, basically. You're completely in the moment. I can't remember what book it was from, the expression uncanny. Yeah, you're just completely into whatever the experience is, and you have no idea that you're in a different world, like you're in The Matrix.
John: Interesting. Okay, so Patrick, time is flying by here. I definitely want to get to how you see what you're doing interacting with the wider sensory world, because I know that you're connected to a small number of sensory researchers who are interested in sound and audition. And I'm kind of curious to what extent you're able to interact with that world, or do you pretty much have to, I mean, it seems to me that there should be more connections between these different fields. So how do you see the work that you do connecting to the kind of wider sensory world? Do you see yourself bringing a lot of tools back from the sensory world?
Patrick: Definitely. Most of the audio tools that we have today are all derived from the food world. The tests, the color wheel. We use that for audio, which is reassigns. And a lot of that work came out of Delta SenseLab in Denmark. There should be a lot more interaction. I think we talked about this a while ago. There was a study asking, does beer taste different in VR versus a lab? I think it was out of Greece or Italy.
John: Yeah, Italy. I believe.
Patrick: So they put the subjects in a VR experience in a pub and had them taste the beer. And then they put subjects in a lab, white room, white tables, to taste beer, and they got different answers. A lot of that has to do with the visual, but it also has to do with the audio. When you hear the chatter and you hear, you know, the football game on, that's an experience that's eliciting emotion. Emotion is triggering a different response, as opposed to going into a lab and getting 20 bucks to taste beer. Yeah. Definitely, there needs to be a lot more, I guess, insight and collaboration between what we do in audio and what's going on in the other sensory sciences.
John: Right. Yeah, especially as sensory scientists start to use more of these immersive environments. I was talking recently to Christer Volk in Denmark, who I believe you know. He's going to be on the show actually in a few weeks. And he was saying that it's surprising to him that when you read papers in sensory that are using immersive environments, sometimes the audio levels aren't even reported in the methods. That the audio is just completely taken for granted without really any awareness of the fact that there are variables involved.
Patrick: They might be doing it in stereo. Not producing the audio in a sense that is relative to the VR experience. They might just be like just put some music in the background and then it's not fitting the environment that the subject is in.
John: So before we have to wrap it up here, what are the things you would recommend to our sensory listeners who are interested in audition, but don't have a lot of experience with audio? What are the things that they should be learning about, the things that are most important when you're learning about audio?
Patrick: It's a good question. I think just understanding that music and film impact emotion. Just really, like, if you're designing an experiment, think about what it is when you go to the movie theater. It's not all visual. It's visual and sound. And sometimes the sound is what makes you get goosebumps, when you listen to a particular piece of music. Music drives emotion and emotion drives answers. So just think about, when you're designing experiments, do you want music to influence the answer? Or do you want audio to influence the answer? Or do you not? The problem with audio is it's taken a step back; it's influenced by a lot of different modalities, because you can always see. So if you're trying to get someone to listen to a pair of speakers right next to another pair, they look different. One might have gold on it and one is black. So I'm pretty biased to think gold is better. But actually they're both the same speaker, one just has gold on it. Those are other things that happen. So I think, don't forget that we're a computer with microphones on our head, and if you want to, you know, talk more, send me a message on LinkedIn. I can point you to some papers. There's a lot of stuff going on in neuroscience and psychology. Psychoacoustics is the study of audio within psychology. It's a vast world of understanding how people hear, and I think we still don't know what we're doing. And I think that's with every science, like, there's always room for discovery. There's always room to figure out why things happen. So I guess, be a skeptic and always, you know, don't forget you have ears.
John: Yeah, that's right. I mean, it's interesting to hear you talk about this because it really is true. I've been in the field a lot in sensory or consumer research, and I pay a lot of attention to how things look. And I have really not paid attention to how things sound. You bring people in for an experiment and you really obsess over the appearance of everything. But then it's easy to forget that the sound is going to be a really important factor influencing people's decisions as well.
Patrick: If you're sitting in the lobby listening to Metallica and then another subject group is sitting in the lobby listening to jazz, what happens? It's not just what they see around them, but it's what they hear.
John: Yeah, that's really, really interesting. Okay, this has been great, Patrick, so here's kind of the last question I always like to ask. If you were going to give advice to someone just starting out in their career in sensory, I mean, we've talked about paying attention to sound, is there any additional advice you would offer that person? Maybe the recent college graduate, something like that?
Patrick: I mean, there's a massive amount of research going on in the AES, which is the Audio Engineering Society, about audio sensory science. It's heavily rooted in spatial audio, which is how we hear in the real world, with impulse responses and VR and AR. There's a lot of new work going on across the world at different universities. They are doing a lot of work on that. I would suggest joining the AES if you're interested in audio. It's like one hundred dollars a year to get access to all the academic papers.
John: AES is the Audio Engineering Society?
Patrick: Yeah, the Audio Engineering Society. It's a worldwide organization. We have a massive event all of October. It's going to be entirely online, obviously, but it's the entire month, and it's about architectural acoustics and spatial audio and VR and gaming audio and broadcast audio. Audio is a major part of our daily lives. You know, you listen to Spotify or you listen to YouTube or Apple Music or you watch Comcast, like, there's audio, and there are standards bodies that say the audio needs to be at a certain level for consumers. So there's a lot going on that people aren't aware of just in audio. Reach out to the AES. There's a lot of really good information in there. Auditory neuroscience books are interesting. There are several books on the sensory evaluation of sound, and also Zacharov and Søren Bech, those are some guys from Denmark. They've done a lot of work in the space. Middlebrooks is also a great resource. He's an ear, nose and throat professor at Irvine, I believe, and does a lot of sensory science stuff.
John: Fascinating. Alright. Well, this has been great, Patrick. You really opened my ears, I guess is a way to put it. Yeah, and you know, we had Steve Keller on recently, so if someone is listening to this, I would recommend also Steve's episode. We're going to try to be talking more about sound. So really appreciate you being on the show, Patrick, and helping to raise awareness on sound.
Patrick: Happy to be here.
John: Great. Thanks a lot.
Patrick: Thank you.
John: Okay, that's it. Hope you enjoyed this conversation. If you did, please help us grow our audience by telling your friends about AigoraCast and leaving us a positive review on iTunes. Thanks.