Michael Meyners - Work in Progress
Welcome to "AigoraCast", conversations with industry experts on how new technologies are transforming sensory and consumer science!
AigoraCast is available on Apple Podcasts, Stitcher, Google Podcasts, Spotify, PodCast Republic, Pandora, and Amazon Music. Remember to subscribe, and please leave a positive review if you like what you hear!
Michael Meyners is a Principal Scientist at Procter & Gamble. With a PhD in Statistics, Michael also regularly teaches at the Technical University of Dortmund, Germany. He is a past Chair of the Sensometric Society and serves as Associate Editor of Food Quality and Preference. Michael has numerous publications related to the statistical design and analysis of sensory and consumer data, and provides a wide range of statistical support in his day-to-day work at P&G.
Transcript (Semi-automated, forgive typos!)
John: So, Michael, welcome to the show.
Michael: Thanks, John and thanks for the invitation.
John: My pleasure. So I think it would be good for our listeners, I mean, in our field I think it'd be hard for someone not to have heard of you, but I think it would be good still for people to hear your kind of story, your background, you know, how you got to be in sensory. So when you were training as a statistician, were you thinking that you would specifically go into sensory and consumer science or were you generally training? I mean, what was your interest? You know, at the beginning when you started?
Michael: Really, my interest was in statistics when I started. I felt like statistics is an area that you can apply to many applications, and this breadth of application is something that attracted me, along with the interest in mathematics, of course. Moving into sensory was really more by chance, when I attended a seminar by my later PhD supervisor, Joachim Kunert, and that was a seminar on sensometric methods. That was the first time I ever heard about Procrustes analysis and methods like STATIS, and I got quite interested in that. So I ended up doing my diploma thesis on a similar topic, and as of then I got addicted to sensory. So I did my PhD and still worked in that area, and I got involved with the Sensometrics conferences and the Sensometric Society. But really, when I started, this was about statistics, and moving into the sensory and consumer world was more by chance.
John: But that was still when you were in graduate school that you were aware of sensory science by the time you were doing a PhD?
Michael: I was aware of sensory science by the end of my graduate studies, so before I got my diploma, which is similar to a master's today. That's when I became aware of sensometrics and sensory in general and started working a little bit on that, and then there was a deliberate decision to follow up with a PhD with a sensometric theme. It's still a statistics PhD, but it had a sensometric theme.
John: I see. And was that focused on multivariate analysis, that kind of thing?
Michael: It was indeed. It was actually a comparison between generalized Procrustes analysis and STATIS. So I developed some methodological properties of these two methods and compared them based on that. And that was in collaboration also with Mostafa Qannari from France. So it was quite an interesting time for me, because I was learning a lot about these sensory methods, not knowing a lot about sensory itself, but really more on the stats side, and then kind of bridging between the different countries, the different research groups in Dortmund, Germany and northern France. That was quite fun and interesting and great learning for me.
John: I see. And then did you go straight from your PhD to Procter & Gamble, or what was your journey? You know, after you got your PhD, what were the next steps for you?
Michael: Yeah, no, that would have been too easy, I guess. When I actually finished my PhD, I was trying to find a job in sensory, but I failed. There wasn't a lot available, certainly not in Germany and not even in Europe. So I ended up going into pharmaceuticals. I was doing biostatistics, non-clinical biostatistics, for a few years at Boehringer Ingelheim in Germany, and that was quite fun. I learned a lot there, but it had nothing to do with sensory at all. And then after a while, I really felt like maybe there's something else that I could do, and I wanted to see whether there's a way back into sensory because I was still interested in that. And then there was a job opening in the Netherlands with Quest International. And I got hired by them with Monette Tupas, who was pretty strong in sensory. And I thought that's a great opportunity for me to collaborate with her. So I spent a few years in the Netherlands working with that group until Quest International actually got acquired by Givaudan. I didn't want to move along with them to Cincinnati, so I ended up applying for a new job, and that was then in Switzerland with Nestlé. So I worked with one of the major food companies for two years before I finally ended up, in 2010 I think, with Procter and Gamble.
John: Okay, that's interesting. You know, I actually have no idea how old you are or even when you graduated with your PhD. You're one of those people who's sufficiently fit that you could be anywhere from, I don't know, 35 to 55. So when is it that you were in Holland?
Michael: That was 2005 to early 2008, I think, and I can tell you, I just turned 47. So I was exactly in the middle of your range, pretty much, I guess.
John: Okay. What's funny is I'm really good, I can tell the difference between a 4 year old and a 5 year old really well. You know, you have kids. Those ages are very clear, but for adults it gets to be hard. Okay, great. Well, I thought you gave a great keynote at the Sensometrics conference this year. Actually, one of the best ones I've ever seen. It was, I thought, excellent in terms of providing direction to the field. So I think it would be good if people could hear a little bit from you about the big messages that you delivered in that keynote about what needs to happen for sensometrics to be relevant. What is the role of, you know, these various statistical analyses in our field? I thought that the forward look that you gave was really powerful. So maybe you could summarize some of those thoughts for our listeners. We would really value that.
Michael: Yeah, thanks for the kind words on that. It really was a broad overview and a collection of thoughts that I had over a couple of years, which nicely came together in this opportunity to present at the Sensometrics conference. There are different things where I believe that we as a sensometrics organization or sensometrics community need to be well on top of things and need to be careful that we actually have a good vision of where we want to go. And certainly this includes how to remain relevant to the field. We can talk a lot about very sophisticated statistical methods, and none of the sensory practitioners will ever use them because they have no clue what they are about and we are not able to communicate it well. But that does not mean that we should not do this research, because it's important to understand what a method does and whether it has properties that are really superior in practice. But we always have to make sure that we also compare it to existing methods and decide whether there is a real gain or whether this is more an academic gain that might not be worth the extra effort for the practitioner. So this is one of the aspects where I do believe it is important that we are more rigorous in investigating these, because otherwise we will end up mostly being ignored, because people will feel like, oh, well, what the sensometricians and the statisticians suggest all the time is of no good use. It doesn't make a difference and it's just too complicated, so why would we bother? We've never seen the advantages of that. And the issue with that is not that people may not use methods that aren't that important. The issue is that if we have a new method that really makes a difference, it will be ignored because people don't trust us anymore. And another important aspect to me is that we have to be very careful in how we conduct our research and how we can make it reproducible in a sense.
And one of the examples that I had was that I had students looking at different data sets from the literature. So authors were kind enough to provide me with the original data, and for about half of the papers, based on the raw data that we received, we were not able to reproduce the results from the papers. And the bad thing about that is, when we went back to the authors, well, the authors couldn't reproduce them either, so it was entirely unclear what happened. And the speculation goes from wrong interpretation of models to, you know, doing some statistics in Excel and forgetting a column or a row in your selection when you compute a mean value or whatever people do. And these are things that are not reproducible if you don't store them well. So the question is, how can we improve on that in the sense that we make our research perfectly reproducible? Well, if there is a bug in the code, there is a bug in the code, but at least we would be able to detect it later on and understand what it is and how it would impact the results. And I do believe that it's really important that we are on top of that, that we have reproducible research, which also includes keeping data available. If we don't make it available with the papers, we should at least make sure that we have the data available for quite some time. I'm not talking about a year or two, but we might go back to papers that are 5 or 10 years old and the data is no longer accessible even for the authors, and that's a pity, because we can't be sure if we doubt the outcomes of the paper and the authors can't prove that what they did was right, that the data analysis was correct and the data was not corrupted in any way. At the end of the day, it's difficult for the authors to defend the analysis and the results that they published at some point. And that's certainly a situation that you don't want to be in.
And in that talk, I made clear that this is a situation that you do not want to be in as a person, as an individual. No way I would want to be in that situation. But then think about the next step: the next level is your company. So do you want your company to be associated with such possibly poor research? You'd rather not, because you could compromise the credibility of the entire company, and depending on what it is, this might have huge implications for your business.
John: Yeah, I mean, I'd like to continue to talk about your talk, but I do want to respond a little bit to this. I really agree with you. Now, reproducible research can be a very general term, which could even mean that if you did the experiment again, you would get the same result. That's not what we mean here. What we need, starting from the data, is what is sometimes called really reproducible research. I think Roger Henning came up with that term for saying that if you have the data, the raw data, you should be able to then, through a series of scripts, reproduce exactly the final report. Right? And I completely agree with that. In fact, when I'm reviewing papers, increasingly I'm saying, look, I really need to see the scripts here. Especially if it's computational. It's one thing if you're talking about routine statistical analysis, it's another if you have, I don't know, some Bayesian approach where the details really might matter. You need to get the code, I think, and ideally, I mean, I know this might seem like a strange thing to ask of everybody, but if there was a GitHub repository for every paper that had the data and the scripts, it would actually be better for everybody. It would be easier to write a paper. If we could get to this widespread adoption of these data science tools, we, I think, would be in a much better place in the field. So, yeah, I'd just like to put my two cents in on that.
Michael: I fully agree with you on that, and actually I can disclose that a couple of years ago already, in an editor meeting for Food Quality and Preference, I suggested that we should make it a must for any submission that the data is published along with it, unless there are really, really good reasons against that. It could be a GitHub repository. It could be along with the paper on the web page. It doesn't matter at the end of the day, just make it available, ideally along with the code. Well, we couldn't get it in place at that point in time. But the minimum that we achieved is that authors are more encouraged these days to submit their data, sometimes also by reviewers. I recently got involved in a discussion where authors were reluctant to share the data and the analysis details, even the code. Not only the data, but even sharing the code without the data with the reviewers. They said, well, we're going to share that after the review, after the paper is published, because we can't disclose this to the reviewers. And I feel like that's completely the wrong attitude here, because, I mean, it's a review, it was actually the reviewers who requested it, and if you have reviewers requesting it, that means they have doubts about the analysis or they want to understand the details. And if you cannot prove that or provide the details of your analysis, there's no way you will get your paper accepted. And I do believe we have to make it a habit, and I understand that people are reluctant to share too much of their raw data. On the other hand, I do believe that in most cases you share the key information already. So there might not be that much extra information in the raw data, because you summarized it already for the paper. If not, you haven't done a good job in writing the paper, to be honest. So what is the extra confidentiality that is needed, or that prevents you from publishing the data along with it? We don't ask to have the full data set.
I mean, we are happy with having the data that actually has been used in a certain paper, in a certain piece of research. We don't need all the extra variables that you may want to analyze at a later point in time. But frankly, I'm afraid that people primarily fear that they are proven wrong by someone who's been playing around with the data and trying to analyze it again. And they might find some issues. And of course, it may put you in a bad position, but then, okay, well, if nothing else, you may have to write a corrigendum. You would hope that other people are fair enough to tell you upfront and give you a chance to react rather than accusing you of bad research or anything. So it's a give and take there. And I hope that we move into that more and more. I mean, other disciplines do that. If you go into some journals like PLOS or others, they require you to publish the data or to share it along with your publication. Otherwise, there's no way you can get into those journals.
John: Right. Yeah. And maybe if it's a methodological paper, you know, even the name of the product category may not be relevant. You don't have to disclose very much. We just need to see the numbers. Right? So we can verify the analysis. So, yeah, I definitely agree with that.
Michael: Yeah, that's a fair point, and I do believe that, for me, I might make it a little bit too easy for myself, because most of what I do is methodological. So I'm always in the situation where I could even use artificial data, but what I would use is masked data, so I can provide the data. It's not of relevance, because I will not tell you what products they are or what attributes they are. I will just give you the raw data and then you can play around, and it's variables 1, 2, 3 and products A, B, C. And, well, you make your mind up about those. It's always more intriguing to read, I know, if you talk about real products, and we try to do that whenever we can. But if not, on a methodological paper, that's okay. You can deal with that. For applied papers, it's a little bit more difficult. There the application is the core content of the paper, and you disclose a lot from that already, and in that sense, I'm not sure there's a lot of extra information that is worth protecting at the risk of, you know, not sharing and maybe running into issues later.
John: I totally agree. And I would also add to this that when a department moves over to reproducible research, there's a huge benefit to the whole group. Right? Because any time someone has a project that resembles a previous project, sometimes 90% of the work is done. Maybe it's all done, right? If the scripts are written well enough or you have a code base that you can just recycle, you know. I mean, it opens up a new way of working. So I definitely agree with you that we should be thinking about reproducible research.
Michael: And that's an interesting thought. And maybe one more add-on to that: I have experienced that internally with people doing analyses in Excel or whatever they use. And then they ended up with exactly the same thing that I mentioned before, ignoring some variables or some observations or whatever it is. And then they sent me results, and I tried to reproduce them in R, and there was no way with the data that they provided me. And then you realize, okay, they've probably deselected some things manually, but none of that is actually traceable. So you try to redo the entire analysis, and it's pretty frustrating if you cannot. If you put all of that into a script, you can look into the script. If you can read it, at least you can look into the script and understand what exactly they did, so which observations were excluded. And ideally, you have a rationale along with that as well, because, I mean, you wouldn't just exclude some observations, but sometimes you say, well, I only need these observations because I only want to look at the last five weeks instead of the entire data set, for example. But that's something that you would document in your script as well. So you have both what has happened and why it has happened, and by that you have two benefits. The first benefit: everybody else can see what you've been doing. You can always show what you've been doing. Everybody else can reproduce it if they want to, in a different software. And then there's the extra benefit that you mentioned: if you have that work done once, you can reapply it next time.
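To make the idea of recording both what was excluded and why a little more concrete, here is a minimal Python sketch. It is not the workflow used by either speaker; the data, the field names, and the `RULES` list are made up for illustration, and the "last five weeks" rule simply mirrors the example mentioned above.

```python
# Hypothetical raw data: 3 panelists rated a product once a week for 10 weeks.
raw_rows = [
    {"panelist": p, "week": w, "score": (p + w) % 10}
    for p in range(1, 4)
    for w in range(1, 11)
]
raw_rows[0]["score"] = None  # one missing value, added for illustration

# Each rule pairs a written rationale with the condition a row must meet to be kept,
# so the script documents both what happened and why.
RULES = [
    ("score is missing, cannot be analysed", lambda r: r["score"] is not None),
    ("analysis is restricted to the last five weeks", lambda r: r["week"] >= 6),
]


def apply_rules(rows, rules):
    """Apply each keep-rule in order and log how many rows it removed."""
    log = []
    for rationale, keep in rules:
        before = len(rows)
        rows = [r for r in rows if keep(r)]
        log.append((rationale, before - len(rows)))
    return rows, log


kept_rows, exclusion_log = apply_rules(raw_rows, RULES)
for rationale, n_removed in exclusion_log:
    print(f"removed {n_removed} rows: {rationale}")
print(f"{len(kept_rows)} of {len(raw_rows)} rows kept")
```

Because the exclusions live in code rather than in a manual selection, anyone rerunning the script on the same raw data gets the same subset, and the printed log can go straight into a report.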
John: Yeah. I think we just have to move to this way of working, and we're looking at other tools to help our clients. KNIME is something, I'm not sure if you use KNIME very much. But KNIME is sort of in between GUIs and coding, where you can have R code that runs on these nodes within KNIME. KNIME is a way of managing workflows. So there are some tools, I think, that can help people who are maybe a little bit less code oriented, you know, but I do think we have to move to a world where in the future we can always look back and understand what we did and why we did it, exactly like you said. So, yeah. Okay, so as far as your talk then, are there any other major points that you feel like we should cover before we move on? I think I should ask about computational statistics.
Michael: Yeah. Not much really. I do believe that it's important for us to have a more open discussion about ideas. And that was one of the things that I had towards the end of my talk: daring to disagree, daring to put ideas out there for discussion. It's not that I have a paper and I believe this is the last and most famous truth forever. It's a suggestion. And two years from now, a good PhD student may come around and prove me wrong. Ideally, he won't prove me wrong, he just comes up with something better. And then I can appreciate that. I would also plead for people to accept that they might have been wrong and possibly even state that. I know it's difficult, but I can tell for myself that I made a statement on CATA analysis a couple of years ago, in probably my most referenced paper so far, which I have to take back in the generality that I put it, and I'm about to do that. Actually, we have a publication coming up. Well, I hope we will get that out soon. It's in a minor revision state now, so it shouldn't be too difficult. There I revoke this statement and clarify what it means and why it wasn't the best statement at that point. And, well, I learned something, and I hope readers will learn from it, too. But I'm happy to admit that what I said before wasn't correct, or was maybe not perfect. So that's something that I believe we have to embrace, in particular for the younger people as well. We, as the more established ones, have to embrace the new ideas and them coming up and telling us, well, we could have done better. And then appreciate that they invest time and that they improve the methods. And that's fantastic. I mean, that's the future of our science anyway.
John: Yeah. There are some good points there, definitely. I mean, this realization that scientific inquiry is never finished and that all of these ideas are works in progress, that what you believed last year may not be what you believe next year. You know, it's interesting. I think social media has actually made it harder for people to learn these lessons, because you can be on social media and you can look at things people wrote two or three years ago, 10 years ago, and in your brain, it seems like they're saying it today. Right? And so there's been this collapsing of time where we have this very static view of the world that, I think, social media has kind of pushed on us. So I think that's a really healthy point that you're making about understanding that these things are actually evolving over time. Views are changing, and we have to have the discussion. Nothing is ever settled, you know, we are just continuing these conversations. And so that's how I think real science happens, as opposed to capital-S Science, which is like "the science said". Well, science is always evolving. So, yeah, it's interesting. Okay, well, Michael, amazingly, we are running out of time, so let me just ask you about computational statistics. I just gave a talk at EuroSense. I talked about the importance of computational thinking. I'm talking about the rise of computational sensory science. What are some of the other computational principles or computational tools that you think sensory scientists should be aware of right now or should be thinking about in the near future?
Michael: Well, it has different aspects. I think you can cover a lot under computational statistics or computational thinking. There are mostly statistical aspects and then there are more sensory aspects. I will start with the first, if that's okay, and we can move on to the others a little bit later. In terms of statistics, I do believe computational methods are important. If you look back at the literature, even from the 1930s, you had Fisher and Pitman. All their work was really based on randomization procedures. They didn't first come up with a normal distribution and Student's t-test and ANOVA and fancy stuff like that. It was more like, okay, we have a problem here. We can solve the problem with a randomization test. But hold on, we don't have computers yet. So what are we going to do? Well, either we invent computers, but, well, it wasn't the time for that yet. So instead we have to work with approximations, and that's what they did. In particular, Pitman's work is important in showing that the normal distribution actually approximates what you would get from a randomization procedure very, very well. So there's no point in doing all the randomization, which is cumbersome if you exceed a sample size of, say, five or six, where you can still do it manually. Anything beyond that is almost impossible. So that's where these normal distribution based methods come from originally, I think, and we've lost sight of that for quite some time, because people are focusing on t-tests and ANOVAs and all these methods that are available. And then we do nonparametric methods because we don't believe in normality anymore. And we do the same thing there. We approximate something that is not normal anymore, but is still a randomization distribution at the end of the day. We approximate that by a different distribution. And that's okay. It's a good shortcut at times.
But then you have situations where either the assumptions are clearly not met, or where clearly there is no parametric or even nonparametric method available, and that's the starting point where you start to embrace these randomization or permutation methods, or you can also think about bootstrapping and Monte Carlo methods in that package. Right? I don't want to focus on randomization, I just use that as an example throughout. And that's where these methods really come into play, because they allow you to design and execute valid statistical tests without making any further assumptions. You don't need any assumptions about the distribution. You can choose your test statistic in a way that suits you well. Not in a way that gives you the smallest p-value, by the way, that's not the intent, but something that is sensitive to the alternatives that you have in mind. So something that is most likely to detect what you hope to detect. And so you can twist and design and tailor the statistical methods in a way that is most suitable to your project and then go back and see, okay, how do these methods work in situations where we have normally distributed data? Well, we don't really need them in those situations, because a quick click in Excel, using the t-test or whatever function in Excel or in some other software, will do the trick for you. But what happens if normality assumptions are not fulfilled? I started working on this already during my PhD studies, where we had a sensory data set with a lot of zeros. So most of the observations were zeros, and then we had a few numbers, which were 1, 2, 3, 4, and that's pretty much it on a ten-point scale. And the question that we tried to address was, can we use ANOVA? Right? Can we use the F-test for this data? And, to some extent surprisingly, it showed that, yes, you can. The data is clearly not normal, but still you can in that situation.
And another step further is something that we presented at Sensometrics, and that's actually the paper that I was talking about earlier, too: you can even use that for CATA data, provided you have enough data. It's not working on very small data sets, but we did it. And that's kind of the statement that I made. I felt like, well, you can't use ANOVA for binary data, which is what CATA data is at the end of the day. You can't be any further from normality than with CATA data. Still, if you have a decent number of elicitations for an attribute, you can use ANOVA methods and it works just fine. And that amazed me myself. There are violations of the assumptions that we typically do not pay a lot of attention to that are more serious. Those are non-independence of observations, and major differences in variances might also be an issue for the F-test or the t-test. But the normality of the data is not really an issue in most applications. And that's surprising. And that's something that we need to communicate, in the sense that practitioners need to learn it. So in some cases, you can do it, and we have to communicate that as statisticians. When can you use it, and when is the time to call an expert to do something more sophisticated to make sure that you're not breaking the rules of science and research?
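As an illustration of the randomization idea described above, here is a minimal Python sketch of a two-sample permutation test with a freely chosen test statistic. It is not the analysis from Michael's work. The ratings are invented zero-heavy scores, the function names are hypothetical, and the sketch assumes two independent groups of assessors; a panel in which the same assessors rate both products would call for permuting within assessor instead.

```python
import random


def mean(values):
    return sum(values) / len(values)


def mean_difference(a, b):
    """Test statistic chosen by the analyst; any function of (a, b) could be used."""
    return mean(a) - mean(b)


def permutation_test(a, b, statistic, n_perm=2000, seed=1):
    """Two-sided Monte Carlo randomization test for two independent groups."""
    rng = random.Random(seed)
    observed = abs(statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the observations
        if abs(statistic(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # The observed labelling counts as one of the relabellings,
    # so the p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up zero-heavy ratings on a ten-point scale.
product_a = [0] * 14 + [1, 1, 2, 2, 3, 4]
product_b = [0] * 17 + [1, 1, 2]

p_value = permutation_test(product_a, product_b, mean_difference)
print(f"permutation p-value: {p_value:.3f}")
```

No distributional assumption enters anywhere; swapping `mean_difference` for, say, a difference in the proportion of non-zero scores changes what the test is sensitive to without changing the procedure.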
John: Fascinating. And to check the validity of these applications, you're using computational approaches? You would take the data, do a permutation test or some other appropriate computational test, and check that against the ANOVA test?
Michael: Yeah, exactly. You actually even take the entire distribution to see whether the distributions pretty much overlap. So you're not going to just say, okay, the p-values are the same, because that's not really informative, but you can check the entire distribution, and you do, I don't know, maybe ten thousand replications. And then you will find that the distribution that you get from the randomization or the permutations completely coincides with the theoretical distribution. And then, hey, guess what? It doesn't matter which one you use, you will end up with the same critical value and hence the same p-value and test decision.
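A rough Python sketch of that kind of check follows, under stated assumptions. The ratings are invented (60 zero-heavy scores split into three groups of 20), and, to stay within the standard library, the theoretical F distribution is replaced by a stand-in: F statistics simulated from normally distributed data, for which the F distribution holds exactly. Only 2,000 replications are used here instead of ten thousand, so the quantiles carry some Monte Carlo noise.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def split(values, size):
    """Cut a flat list into consecutive groups of equal size."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def quantile(values, q):
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


rng = random.Random(2024)
n_rep, group_size, n_groups = 2000, 20, 3

# Made-up zero-heavy ratings on a ten-point scale (60 values in total).
ratings = [0] * 42 + [1] * 8 + [2] * 5 + [3] * 3 + [4] * 2

# Permutation distribution of F: shuffle the ratings across the three groups.
perm_f = []
for _ in range(n_rep):
    rng.shuffle(ratings)
    perm_f.append(f_statistic(split(ratings, group_size)))

# Reference distribution: F statistics from normal data with no group effect.
normal_f = []
for _ in range(n_rep):
    draws = [rng.gauss(0, 1) for _ in range(group_size * n_groups)]
    normal_f.append(f_statistic(split(draws, group_size)))

for q in (0.50, 0.90, 0.95):
    print(q, round(quantile(perm_f, q), 2), round(quantile(normal_f, q), 2))
```

If the two columns of quantiles agree closely, the usual F critical value leads to the same test decision as the permutation test for data of this shape; comparing several quantiles rather than a single p-value is a small-scale version of looking at the whole distribution.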
John: Yeah, it's a different way of working, I think. When you embrace these tools, suddenly you can do a lot more. You can do these investigations. Another thing, actually, and I'm going to run out of time, I could talk to you for hours, Michael. For example, something that is commonly done in our field is to cluster liking and then somehow use those clusters as part of a model to predict liking, where, of course, you've got this information leakage problem: you're using information from the thing you're trying to predict, so you're going to get overoptimistic estimates. A question is how bad is that? And you can investigate that computationally as well, through simulations?
Michael: Yeah. And that's where I always have a little bit of a headache if you do that, unless we do it properly and do some bootstrapping around it, which allows me at least to quantify the uncertainty or the kinds of variation that I get from that. That's certainly something that is helpful, and then you can do it to some extent. But yeah, we certainly have to be careful in first using the same data to select the best model. And that's a more general theme, really, when people try to select the best model and then use that for prediction or for statistical testing. Well, yeah, but the model is already tweaked towards what you see. And then surprise, surprise, you see differences, because, well, you've already put it in a way that you will see differences. That's the way it was done. So you have to be very careful there. For description and for learning, sometimes it might be useful. But beyond that, you have to be very careful.
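The size of this effect can be explored with a small simulation. The following Python sketch is a deliberately simplified stand-in for the clustering case discussed above, not a reproduction of any published analysis: 200 hypothetical consumers give two independent, purely random liking scores for the same product, so there are no real segments. The "clusters" are formed by a median split on the first score, and the gap between clusters is then measured both on that same score and on the second, held-out score.

```python
import random

rng = random.Random(7)
n_consumers = 200

# Two independent noisy liking scores per consumer; no true segments exist.
rep1 = [rng.gauss(0, 1) for _ in range(n_consumers)]
rep2 = [rng.gauss(0, 1) for _ in range(n_consumers)]

# "Cluster" the consumers using the first score only (median split).
cutoff = sorted(rep1)[n_consumers // 2]
high = [i for i in range(n_consumers) if rep1[i] >= cutoff]
low = [i for i in range(n_consumers) if rep1[i] < cutoff]


def cluster_gap(scores):
    """Difference in mean liking between the 'high' and 'low' clusters."""
    mean_high = sum(scores[i] for i in high) / len(high)
    mean_low = sum(scores[i] for i in low) / len(low)
    return mean_high - mean_low


in_sample_gap = cluster_gap(rep1)  # same data used to form the clusters
held_out_gap = cluster_gap(rep2)   # data the clustering never saw

print(f"gap on the clustering data: {in_sample_gap:.2f}")
print(f"gap on held-out data:       {held_out_gap:.2f}")
```

On pure noise, the gap measured on the clustering data is large by construction, while the held-out gap hovers around zero. Repeating the simulation with real cluster structure added would show how much of an observed difference is genuine and how much comes from reusing the data.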
John: I totally agree, and I do think that computational methods give us a way to assess just how big some of these problems are. How much does it actually matter when you do things that maybe theoretically are not correct? So, yeah, that's its own kind of form. So we actually do have to wrap it up. But very quickly, how can people get in touch with you when they have some questions? I find you to be a great resource, someone who's really approachable and easy to talk to. How could people reach out to you?
Michael: Yeah, I'm on LinkedIn, so you can find me there easily. You will also find an email address there, and email typically works faster, but LinkedIn is a good first step to get in touch.
John: Okay and any last bits of advice for our audience before we wrap up here?
Michael: Well, I do believe, and that's something that I learned in my time in industry while still trying to connect with academia and scientific research: try to find your niche, try to set aside some time on a relatively regular basis, and try to connect with people, because that makes you actually do that and reserve time for it, because people will poke you and say, well, I have a question for you. And then it takes you out of your routine and you look into something new instead. Find those times, find those niches, try to be effective in your routine work, and try to automate that as much as you can, which is something that I did quite a lot in my career. That allowed me to free up some time, which I then dedicated to scientific research, which I could then use again in the business. So that's how I got the balance between industry and research. It worked for me, and I do believe that's a model that could work for others. So you have to be intentional about that and not be pulled into another project because you freed up a lot of time on the first projects that you were assigned to.
John: Right. That's a good point. Okay, well, thank you very much for being on the show. It's a pleasure talking to you.
Michael: Thanks for having me, John.
John: Okay, that's it. Hope you enjoyed this conversation. If you did, please help us grow our audience by telling your friends about AigoraCast and leaving us a positive review on iTunes. Thanks.