Nick Zacharov - Sounds of Progress
Welcome to "AigoraCast", conversations with industry experts on how new technologies are transforming sensory and consumer science!
Dr. Nick Zacharov is a lead technologist at FORCE Technology, SenseLab, Denmark. With an academic background in electroacoustics, acoustics, and signal processing, Nick has broad industrial experience in the audio profession. Having held engineering and managerial posts at Nokia and Genelec, Nick co-founded FORCE Technology's SenseLab in 2007, providing world-class listening tests and sensory evaluation services to the audio community.
He holds several audio patents and has more than 90 major publications to his name. Nick is the co-author of “Perceptual Audio Evaluation – theory, method and application”, and editor of the book “Sensory Evaluation of Sound”, both acknowledged textbooks in the field.
Transcript (Semi-automated, forgive typos!)
John: So, Nick, thanks a lot for being on the show.
Nick: Thanks a lot, John for having me on.
John: Yeah, it's really a pleasure. I mean, I have to say, I'm very excited to have you on the show because I think that a whole bunch of future things are coming together right now, thanks to technology. And I think that one of the things that is happening is we're moving into a kind of extended reality where you've got the virtual and physical worlds coming together. And as part of that, I think we're going to have a merger of the sort of chemical sense-based sensory and some of the more UX-focused senses, such as sight and sound, that are going to be coming together and interacting. And I see you as someone who's really been a pioneer in the intersection of those two areas. You have been, I think, embracing and practicing a lot of the sensory evaluation techniques that have come out of the more chemical-sense-oriented side of sensory, but you're also interacting with the more electronic, technology-focused UX applications. So I think it would be really great to hear your background, you know, how you got to be where you are, because I think you're really kind of a linchpin of something that's going to be very important in the next few years.
Nick: Right. Well, I think you're right that there is this, let's say, joining of technologies going forward, and a lot of multisensory stuff is going to be coming to consumers and professionals. So a little bit of background about me. Well, I'm international. I'm European. So I have a mixture of nationalities and multiple passports. There's a British side, there's a French side, there's a Russian side, and now I live in Finland and work for a Danish company. My background is in electroacoustics, and that is a lot of engineering about sound, essentially how you make loudspeakers, microphones, stuff like that, and that's where I did all my core studies. Over the years, I started to get more and more into how we listen to sound and sound quality. This was a passion for me as a musician many years ago, and slowly but surely I started to get drawn into, well, what is good audio quality? And then I started to get inspired by what was happening in the food industry, sort of learning about wine and what the people at Davis were doing. In about '99, I bought "Sensory Evaluation of Food" and I met Ann Noble and had a long conversation about what I might do in a PhD and how we might apply sensory evaluation techniques to audio. It was a very interesting conversation. She basically just said, well, what's stopping you? Just go and do it. And that's what I've been doing since '99. So I think it's quite valuable to do these things, which is to bridge technology and methods from one field and, let's say, bootstrap another field. And we've had a lot of good traction in the last bit over 20 years, let's say, learning from traditional sensory evaluation techniques from the food industry and other related industries and sort of mapping them into our domain, which is sort of the same same but different.
Sound and auditory perception are similar to taste and smell, but the products that we have all have their own challenges, both in how we measure and how we model and stuff like that.
John: Yeah. Okay, well let's just kind of really hear you talk about what's the same, what's different? You know, what are the similarities when you're studying sound versus the chemical senses, and then what are the important differences? If you could give us a little summary of that.
Nick: Yeah, I think it's pretty straightforward at the outset. I mean, in a way, I think all of the senses are very similar. We hear the conversations among experts: "Okay, well, my domain is more complex than yours." I think that's a bit of an arm wrestle, which is amusing, but at the end of the day, one can really apply a lot of the sensory techniques to, I believe, any of the senses and even multimodal cases, where we have a little bit of experience. But what is particularly different is that when we talk about sound, we're talking about the sound that originates from, let's say, an instrument, or from a bird or something like that. But the audio industry is more about reproducing sound. So it is sound which is going across the Internet, across radio waves, and then being reproduced by a loudspeaker, a headphone, a hearing aid, something like that. And the multi-gazillion-dollar business of audio is reproduced sound. Now, how that differs from the food industry is that we have a container and we are transmitting sound and, let's say, unpacking it. So this interaction between the medium, the container, and the actual original sound is vital. So we are always measuring, let's say, the sound system and the sound sample. It could be a speech sample. It could be a music track. It could be birdsong or noise. But what we're really interested in is what the technology is doing to that and how it's modifying it. And so the analogy in the food industry is that actually we're not just evaluating the wine, we're evaluating the wine in the glass or in the bottle. And that's our starting point. So we always have multiple interactions when we study things.
John: Yeah, okay, well, it's hard for me to interview you because I have so many questions, because I think that what you've been doing is so interesting to me. So let's talk a little bit about individual differences. You were talking to me before the show about individual differences in sound perception. So maybe just talk on the topic and educate our listeners a little bit on the differences in how people perceive sound?
Nick: Yeah, I mean, there are several things. I mean, we're all individuals and we all have opinions and we will experience things somewhat differently. But also, if we look at the field of audio, a lot of the assumptions for many years have been that, okay, we've got a fundamentally similar set of ears and we've got a similar brain, and so we're going to perceive things in a similar manner. And a lot of the work is to say, well, what if we come up with an average model of Mr. Consumer and Mrs. Consumer? And we'll use that to base a lot of audio algorithm development on. If we think about things like audio coding, like MP3, AAC, and all of these other things, the idea is that we have a Mr. Average consumer with normal hearing, and that's a known characteristic. And that is fair enough for those people who have normal hearing.
Nick: Now in some areas, like audiology, so if we talk about hearing impairment, there is a greater understanding that we're not all equal. So particularly if you, like me, have been listening for too many years to too much loud music or have worked in industry, you may have some hearing impairment. It may not be symmetrical in both of your ears; one may be worse than the other. So we're now getting into the need for having very personally tailored technology. So a hearing aid is actually tailored to an individual, and not only to an individual. It's tailored to your left ear and to your right ear specifically, and that will evolve with time and your needs and, let's say, your hearing loss characteristics. So I think what we're starting to see is, first of all, an appreciation in the field about individual differences, I think, from the audiology world. And the question will be how this evolves. Differences are on many levels. Hearing loss is one, and that's not such a nice one. But let's say people are different in terms of their hearing acuity and their interest. So people who tend to be musically oriented, who like going to classical concerts, or any concert for that matter, and paying attention to music, tend to be a little bit more acute to audio or interested. And you have other people for whom, well, it's not so important, so maybe they have different expectations. Then you have audiophiles. You have actually qualified experts in the field. And so we have different dimensions of our expectations and our individuality. So it is going to be interesting to see how we use that to benefit the audio technology. Do we start tuning technology in a particular way for different groups of consumers?
Nick: Yeah. So one thing which we've been working on in the last couple of years is looking at the benefits of machine learning and how it could complement our traditional sensory evaluation toolset. Using machine learning or regression models to predict audio quality that would normally be measured through a listening test has been going on for quite a long time. But one thing which we're seeing that is quite interesting is that we can't necessarily do listening tests for all of these different groups of people. It's just fundamentally too time-consuming, too costly. It doesn't matter how clever we are now with experimental designs; it's just something that we don't do in the industry. So what's interesting is that we can use machine learning to train for different assessor types, for different hearing types. And we wanted to see whether we could actually succeed in that area. And we've had a research project for a couple of years looking at stuff like that. And it turns out, yes, you can actually predict quite well the expectations of different hearing loss types, of course, stereotyped into certain groups of people. But using that technique, we have the possibility to be able to estimate quite well, I wouldn't say accurately predict, but estimate quite well how a sound file will be perceived by a normal hearing group or a mild or moderate hearing loss group, and this is not trivial for sure. The advantage of machine learning in this regard is going to be quite interesting because we're going to be able to do stuff with machine learning that we just couldn't access through traditional sensory evaluation, listening test type of practices, and I think that's a great opportunity. So these are things which 10 years ago we wouldn't have even thought about.
John: Okay, so what are some examples of things that you can now do with machine learning?
Nick: Well, the first thing is being able to predict lots of different groups, and I think what we've shown with that study is that we are going to be able to predict also the acceptance levels for a consumer versus an audiophile versus an expert. But also the machine learning stuff in audio is getting to the point where we're starting to see vast steps in algorithmic development, where we're able to better optimize certain algorithms so that they can function with very high performance in a mobile device, in hearables, and in devices that have low power and limited computational power. We're getting to a point, I mean, in the industry, where we're starting to see the trend that machine learning is going to be able to crack some problems that previously weren't feasible through traditional signal processing techniques.
John: So there's a few things about that. I mean, one is, actually, a lot of what you're talking about reminds me of what's happening in the kind of food industry where the product development lifecycle sped up because we can basically do simulations. You're talking about simple things like simulated testing. Right? However, it seems like there's kind of more here that you're talking about, maybe analyzing the models, trying to figure out what are the important variables, the key signals, this kind of thing. So are you starting to get into that as well now?
Nick: Actually that's quite interesting what you're saying, because you're talking about what we would like to do next, basically. I mean, going back to this experience with machine learning and different hearing impairments, what we're starting to see is that there are non-trivial differences in the patterns between these different groups of assessors, to the point where we can start to explore what those differences are using the models. So in a way, you're sort of seeing some evidence of something interesting and particular happening between the different data sets. And I think the future is to actually use machine learning to try and explore that a bit further, to delve into the data in a way that you didn't really consider before. Because in your computer, you're going to be able to actually poke around and sort of try and pull it apart and try different metrics and try and explain it. So I think that the exploratory option is also extremely powerful because you've got a whole gamut of data that you can dig into. So I think the opportunities are not endless, but there are some very interesting new things coming along.
John: And how do you see this interacting with devices? I mean, you're way more expert on this than I am, so I'd just like to get your opinion. But it seems to me there is a kind of merger that's happening between headphones and hearing aids. You have these devices like the Apple AirPods, right? They're just going to keep getting smaller and at some point, they will be hearing aids. They'll have a cooler name and they'll probably look better.
Nick: So basically, there is the category which we've now termed hearables, which is the middle ground. So it's between your true wireless AirPods and hearing aids, and in Europe we're having some deregulation. So you can now get over-the-counter devices, which are meant to assist you with, let's say, some hearing deficits, and so on. They're not medical devices, they're non-medical devices, but they are potentially going to be able to bridge that gap. And you're totally right, I mean, it is just a matter of time before we start to see that our normal headphones are able to help us in lots of use cases with our speech communication or maybe denoising things. There's a lot of tech here. I mean, a lot of this comes from the hearing aid technologies and from the mobile phone technologies, with, let's say, thousands of man-years of signal processing background to feed into this field. So, yep, I think it actually looks quite rosy for the audio experience that we're going to have in the next decade, if it's not already good.
John: Yes. Well, I mean, it's fascinating because you can imagine something like real-time translation where you're talking to somebody, right? And they're talking to you in a different language in your ear.
Nick: It is essentially feasible. I'm not sure whether there are too many solutions out there yet, but it is coming. It's a matter of the power that's needed to do that. But, yes, it's a matter of time, really, before we all have it.
John: Right. So that actually brings up two topics that I want to make sure we get to, which are extended reality and the role of audio in extended reality, and then the kind of related question of human-computer interfaces. Because what you're talking about, if you've got these hearables and you're getting information in real time, maybe it's translation, maybe just directions, maybe you ask a question to the ear, I don't know. I think audio is very exciting. Well, you know that we're fans of smart speaker surveys. We do surveys on smart speakers. They can be done. I've got some Echo Frames here. We can do them on frames. Soon we'll be doing them on hearables, right? So I'd just like to hear your thoughts on your research, and of course, I think machine learning is a key part of that. But just in general, how is the audio world going to interact with this movement into extended reality, and what role will it play in human-computer interfaces?
Nick: Yeah, I think extended reality is the next thing. And I think there's going to be a very big challenge for all of us in the field of sensory and also all of the UX people to try and make sure that this technology succeeds. We've had several different waves of VR glasses in the last 30 years and they haven't all been very successful. So I think it's about getting to a point where something like this with a little headphone is going to be sufficiently transparent that we see consumer acceptance, and there's a lot of technological challenge in terms of power consumption, computing power, and things like that, which have to be overcome. But I think the other side of it is making sure that these are things that we want to use. That they're not big and clumsy and sort of embarrassing and so on. So I think that's going to be a very interesting task for everybody, really. And it's non-trivial for sure. I mean, to make it transparent, but giving added value to us, giving us the right information in the right doses. There are a lot of open questions, but I think it is going to be an important thing, and one could speculate about what it might be. I mean, we talk about hearing aids today and hearing impairment. Just the idea of having a transcription, so that you can actually read some of the things that are being said to you in an environment, could easily augment, let's say, the hearing aid and the traditional auditory playback. So there's going to be some stuff there. But let's not underestimate the challenge of the audio capture. So we've been talking about reproduction, that's what I've been talking about mainly, but there's also the audio capture. Here I am talking to you in a quiet office with a great microphone and everything's good.
But when we're out on the underground or jogging and we want to talk to our frames and our glasses, that needs to be perfectly captured and denoised. There's a whole load of audio processing to be done there, whether it's traditional signal processing or more of a machine learning type of approach. That's a matter of tool selection and development. I think it's going to happen, the XR stuff, and I'm sure that it will be extremely important for the professional domains. But I think the question is whether we will succeed in seeing widespread consumer adoption, and that's the big question mark.
John: Right, yeah, and we've talked about that before, it being a design challenge. Right? I mean, that's where I think Apple has always really excelled, right? It is on the design, and it's why they've seen widespread adoption and led the way. So, okay, well, we're actually going to wrap up here in a minute. But I'd like to know what are some of the applications you're most excited about? So you talked about the machine learning work that you've been doing. What are some of the projects you've been involved in, categories you've been working on? You know, where have you found that the technology you've been working on has been delivering the most in real-world applications, the things that you found most exciting?
Nick: I think, well, that's a difficult one. We're very broadly spread. So we see everything from a wind turbine to a hearing aid to Bluetooth technology and so on. I think what's nice to say, I mean, we were one of the major test labs for the recent Bluetooth coding technology, and that's a low complexity, low latency codec. And its performance is just amazing. And that technology will be used across the entire audio and consumer industry, whether it's going to be a call with your hearables, with your hearing aid, with your mobile phone, everything. So when you see the performance getting to the point where it's what we would call transparent, and that means it's very close to the original as we would have heard it, and our reference point is roughly CD quality. And when you can say that that's now something which is just going to be everywhere across every single device, that's just brilliant. It's just wonderful. Compared to where we were with the beginnings of audio coding, it's just come such a long way in just over two decades. So I think that's very exciting, and nowadays I think also the hearable field is very exciting. So many of you, the listeners, will have tried some AirPods or something like that where there is a lot of clever technology going on, whether it's active noise control or whether it is voice recognition, stuff like that. And I think the trends there are just going to be getting better and better and more exciting. And all of the time, what we're trying to do is handle more and more complex use cases and make the audio as good as possible and as transparent as possible so you don't notice that it's there. So things like AirPods and these hearable devices are getting smaller and smaller. They're not going to disappear completely, but they are getting more and more transparent.
And in the industry, people are starting to understand what people need in different situations, whether they're jogging, whether they're in their office, in the car, or on the plane. And we're seeing that there's a sensitivity in the product development teams to create features and functions that make these things even more transparent and elegant in those situations. So I think these things are all very exciting. And I'm involved in some very exciting new trends, which I can't talk about, sadly. There's some cool kit coming along for sure. And it's pretty exciting to be involved with it nowadays.
John: No, it's amazing. I mean, you think about how it's all just happened in our lifetimes. You know, we're almost living in science fiction, what I would have thought was science fiction. You know, my son just talks to the air. He expects that Alexa is wherever he goes, so he talks to the air. It's a very strange world.
Nick: It's a long way off from the walkman and the wired headphones. Yeah.
John: Okay, well, two final questions. So, you know, increasingly you see multimodal research happening in sensory. I actually had Charles Spence on the podcast, I think Charles was our latest guest. And of course, there are a lot of people looking at this kind of multimodal interaction. What advice would you give to sensory researchers who are maybe not as experienced with audio if they're going to start to bring sound into their research? What would be some advice that you would give to them to help them to maybe not go down some wrong paths or make mistakes or whatever? What would you say to them?
Nick: Yeah, okay, so there are a few things. I think what is very important to understand is that what you measure with a microphone is not the same as what we hear with our ears. This is a highly amazing capture system we have bolted to the side of our head. And you need to appreciate that the filter, the perception filter, is very important, very elegant, very complex, very non-linear, and understanding and appreciating that will be useful. I think that's the key thing. I think the multimodal thing is extremely challenging and, let's say, I don't know whether it's true in the food industry, in the traditional sensory field, but certainly we see that even in telecoms, where we have the audio and the visual stuff going on in parallel, there are audio teams and visual teams and there's not much bridging going on. So being a bridge person is an amazingly powerful position. If you can bridge several topics, whether it's food sensory and audio sensory, as I've done for a couple of decades, or going and really trying to tackle the audio-visual stuff or another cross-modal thing, those are difficult problems, and they need to bridge organizational boundaries, but there may be some very interesting stuff to achieve there as well.
John: Yes, I totally agree. Innovation often happens in the overlap between fields and new combinations, yes. Okay, so that brings us to the last question, which is advice for young researchers, someone who either maybe they're going to go to graduate school, what should they be studying or they're just starting their career. You know, what kind of general advice would you have for them?
Nick: I think there are some really amazing opportunities in machine learning and then beyond that in AI in audio. I think we're just scratching the surface now. And I don't just mean on the analysis side. I think algorithmically we've just opened up a massive new field and the opportunities are endless. So if you're hungry for a challenge, that's a good way to go. Every student today can do machine learning, but you need to appreciate the data. So domain knowledge is key. Lots of people can do machine learning and, say, turn the crank, but you need domain knowledge to be able to get the real value out. So I would say have machine learning as one element, whether it's towards algorithms or towards evaluation, but then have domain knowledge as well; don't forget about that. Appreciate the data and understand the data, and then model it well for whatever purpose.
John: So you would encourage, it sounds like a sensory scientist to learn some basic data science? Would you consider that to be just a requisite skill going forward?
Nick: Absolutely. I think that, you know, just being a coder or just being a machine learning expert isn't enough. You need to understand the data. You need to understand how to do the basics of the data. And then you can use it well. But one skill without the other, I think, is like walking with a limp.
John: That's an interesting analogy. I think I can appreciate what you're saying. That's great. Alright, Nick, well, this has been a real pleasure. So how can people get in touch with you? Suppose someone wants to connect with you?
Nick: You can find Nick Zacharov on LinkedIn. There are not too many Nick Zacharovs in the world, so just Google me or check me up on LinkedIn. You can get my contact information there, and I'm happy to answer people's questions.
John: Okay, well, that sounds great and we'll put the link in the show notes. I would say if I was just starting my career out, I would love to work for you or study under you. I think you're doing really interesting work and it would be yeah, I would recommend people to connect with you. Alright, Nick, thank you so much for being on the show.
Nick: Thanks a lot, John. It's been really nice.
John: Okay, that's it. Hope you enjoyed this conversation. If you did, please help us grow our audience by telling your friends about AigoraCast and leaving us a positive review on iTunes. Thanks.