Welcome to "AigoraCast", conversations with industry experts on how new technologies are transforming sensory and consumer science!
AigoraCast is available on Apple Podcasts, Stitcher, Google Podcasts, Spotify, PodCast Republic, Pandora, and Amazon Music. Remember to subscribe, and please leave a positive review if you like what you hear!
Dr. Kli Pappas is a Manager of Data Science at the Colgate-Palmolive company. In his current role, he founded a group called Predictive Innovation which is focused on applying data science and software engineering techniques to accelerate innovation processes. His team focuses on building and deploying algorithms to empower scientists and marketers to build better products faster. Kli received his Ph.D in Chemistry from Princeton University in 2016. His thesis work focused on bridging the gap between first-principles thermochemistry, computational chemistry, and experimental science.
Transcript (Semi-automated, forgive typos!)
John: So, Kli welcome to the show.
Kli: Hey, John, great to be here.
John: Okay, great. So, Kli, we were talking a little bit before the show about how there's kind of a commonality, I think, between data science and sensory science in that there are multiple paths into both fields. I think it'd be good to kind of hear your story of how you ended up in your current role. Maybe even starting as far back, you know, as your undergraduate work and the internship that started your journey at Palmolive.
Kli: Yeah, sure. So I think I've had a kind of winding journey to my current position. I did my undergrad degree at Rutgers University in New Jersey and I started, funny enough, majoring in cognitive science. So cognitive science was really where my interest was. And as part of the CogSci program at Rutgers, you had to take CS 111, which at that time was Java, and CS 111 at Rutgers was hours and hours of writing sorting algorithms in Java, which was the worst possible thing that I could imagine when I was a sophomore undergrad. So I ran for the hills and left that program altogether and moved to chemistry, which really tells you how little I thought of writing sorting algorithms in Java. So I ended up studying chemistry. And during my junior and senior years at Rutgers, I was an intern at Colgate, doing bench chemistry there. Rutgers and Colgate were actually right next to each other. So I interned at Colgate for two years, graduated from Rutgers with a degree in chemistry and got a full-time job at Colgate doing product development. So, you know, working at the bench, I was working on mouth rinses. So new mouth rinse products. Developing products that could get launched in different parts of the world and trying all sorts of fun things. And two years into working at Colgate, you know, I had always wanted to get my PhD in something and I realized that it was now or never. So I went to grad school and did my PhD in Chemistry at Princeton, which is actually right up the road from Rutgers, down Route 1. And, you know, during my time there, I really had a lot of interest in trying to merge some things that were going on in the department around computational chemistry with what my group was doing, which was really a synthetic inorganic chemistry group. And my thesis work kind of broadly focused on how to bring these two areas together. So I finished my degree in 2016 and decided Colgate was a great place to be and went back.
And almost immediately, you know, when I went back to Colgate, took a right hand turn into the data science area. I had sort of rediscovered a love for programming and got into Python and really started to understand that there was so much information at such a large organization that was being collected and stored in a library and no one was reading the books. And just a lot of latent frustration from everyone around who realized, wow, we've done a lot in the past and we have a lot of people generating a lot of different data. Shouldn't we be doing something with this? And the answer was no one is doing anything with it. And that sort of started me down the journey to where I am today in the Predictive Innovation Group.
John: That's a group that you started, is that correct? Was that your own idea? Is that something that there was already an energy for and that you were kind of picked to lead? Or was it something you pushed for yourself? How did that kind of get started?
Kli: Yeah. So I'm really the founder of the group. It really started with me tinkering a little bit. Specifically, you know, in the CPG space, one of the things that you have to do when you're launching new products is ensure that they're stable on the shelf. So a lot of what we sell has an active ingredient in it, like fluoride in toothpaste, for example. And also it needs to have a certain consistency to be consumer acceptable. And the flavor and fragrance can't go bad. And so you have to prove shelf stability, which is a pretty arduous process, because typically that means you actually make the product, you put it in an oven, and you wait six months. And so I started tinkering a little bit with grabbing all of the historical data I could get my hands on about what ingredients went into a particular product and what the stability attributes of it were. How did the active ingredient track over time? How did flavor and fragrance track over time? And combining that just notionally with what I knew about chemistry basics. So when I say basics, I mean like 1800s chemistry, like pH coming from pKa and just basic chemical interactions, and playing a little bit with building some models. You know, how much data would it take to be able to take an arbitrary list of ingredients and predict for you whether the fragrance would go bad or whether the viscosity would drop off? And I got a lot of traction doing that. And I brought in two postdocs to work with me on that and started to get a little bit of senior leadership buy-in around this project and, you know, effectively make the argument to them, which they had all understood: there's something there. You know, everyone knew there was something there. There just wasn't anyone who had picked it up before. And after about a year of doing that, I made a, you know, sort of broader pitch to say data science is really a gap for the organization.
And this was about two years ago, when Harvard Business Review had front-page stories about data science and everyone was saying it's the sexiest career to be in right now. So I think the timing was really right for that. And from there, the senior leadership team at Colgate decided to form a new function altogether called Predictive Innovation. And that function broadly does what I described, which is working with scientists on the bench, and it also works with marketers and insights teams to try to understand, more external to Colgate, where trends are going.
John: Yeah, that really speaks to me. Actually, the timing of that is interesting, because it was about two years ago that I founded Aigora. And there's a lot of the same energy, really, you know, in terms of the needs in the field. So I think it would be interesting to hear, you know, how you built your group. You know, did you start off with a kind of well-formed plan of all the pieces you were going to need, or was it more responding to particular business cases? I mean, I think what you said about finding a business application for data science with the shelf life predictions, you know, that really speaks to me because I find it much more effective to get some wins and get some momentum from successes. Were you kind of led by successes, or were you at that point in a position to plan out something larger and start to fill out your team?
Kli: Yeah, it's a good question. I mean, you definitely need, and this would be, you know, the typical advice, to have some wins on the board and some clear things of monetary value that you can bring back to the business and say this is what we're going to do. And oftentimes maybe you don't even end up doing those things, but you need to show that there's thought behind it to get people to be able to trust you. You know, I think one of the things I was able to effectively do is this. It's the standard response, if you're in leadership, to tell someone who's asking for money to start a new group or a new project: well, first I want you to do a proof of concept, and then I want you to come back and show what the value is, and then based on how that works out we'll move forward. And I think I effectively made the argument that the only reason you do that whole proof of concept and then stage two and stage three is if you're not certain that the idea is a good idea. Right? And then I effectively made the argument that the writing is on the wall with this stuff. Like, you should not be putting together a data science team because this guy in the organization is telling you it's important, and maybe he's right or wrong. Like, pick up a newspaper, pick up any business publication. They're all telling it to you. And so let's move past this "pick a small project and show me what it's going to do" and just go directly for it. And don't take my opinion for it. You know, read The Economist, read The Wall Street Journal. They're all telling you the same thing. So I think I was able to make that argument, which was important to start with, because if you go down this route of small projects, it takes a long time and you fall down these rabbit holes.
And longer term, I think it really does a disservice to what can be done with data science, especially organizationally, because you need to be able to reach wide at first when you're putting together data infrastructure and data engineering pipelines. And if you focus on these small problems, it sets you up so that years down the road, you run into a case where all your team is doing is this sort of transactional small project, and I think a lot more can be done than that.
John: Right. Well, that's great. I'm going to use some of that because it is true. Like, the idea that, yeah, you don't have to prove it. It's not a question: is data science going to be important? You know, it's just obvious that any team or any company that can leverage their historical data is going to have an advantage over their competition. And any company that doesn't do that isn't going to be able to compete for very long. So I think that's good. I think when you're inside a company, maybe the experience is a little bit different, because for me as a consultant, I have to sell not just the idea of data science, but my particular ability to do it, right?
Kli: Yeah right.
John: So, yeah. That is interesting. I'm going to use that. I feel that way about smart speakers actually. Obviously in five years surveys will be happening on smart speakers and so we might as well get started. That's my view on that.
Kli: Yeah, I guess that's the challenge when you're really passionate about something, you always want to say, well, obviously this is the case. And then you run up against the less passionate people and they want to know why it is obvious? So there's a little bit of a challenge there. But in data science, I mean, you really would have to be short sighted to have someone tell you that, hey, we need to be in data science and we're doing nothing and have them tell you, well, first you need to, you know, prove to me that there's a future for this field. That seems crazy to me.
John: That's funny. Why is it obvious? There's a math joke in there somewhere.
Kli: Yeah, right.
John: All right. So let's talk a little bit about the process in building your team. So which were the hires that you thought were most important to make first to get started on this project? What was the kind of order in which you built out your team?
Kli: Yes, we had an interesting challenge, which was we built a team of just about 15 people and it went from one, myself, up to fifteen, which is a unique situation to be in. I mean, a lot of times teams don't come together on that big of a scale all at once. And the other unique situation that we were in is that within my organization at Colgate, we didn't have titles for data scientists or data analysts or data engineers. There weren't job descriptions written for these people. We did have a rich statistics group, which I should say upfront. We made the decision that the statistics group and this newly formed group should be one group together. So I'll talk more about that later. But so there was a statistics team, but not a data science team proper. So we had to figure out what are the roles and, you know, who are the people that we're going to bring in? About half of those people actually came from internal hires. And what we found was that although we did not have, you know, data scientists and data analysts within global technology at Colgate, there were a lot of, you know, closet coders or people who were really passionate about this area who already knew what they were doing. Maybe they had an undergraduate degree in math or had done a master's degree in a field that caused them to learn a little bit about statistics or programming. And those people are incredibly powerful because one thing that you really need on a data science team, and this has been formalized by others, but there's a notion of having translators, so people who can be in-betweens and who deeply, you know, know the organization and know what the goals are. Know what makes people tick, what makes the company tick, what makes the company money. And so about half of the team came from, you know, internal people who had done work in this area before and who really needed some, you know, basic upskilling, and we have some other programs around that.
So the other half of the group came from outside new hires. And, you know, these new hires we were hiring at sort of entry-level positions at Colgate. And we really wanted to make this cohort of people as well-rounded as possible. And we realized that, you know, you can't go out there and find someone who has 10 years of experience in data science. A, it just doesn't exist. The field hasn't been around for that long. And B, if someone does have a data science background and they've been doing it for 10 years, they're not going to work at a CPG company. Right? They're going to go work at, you know, Amazon or Google and they're probably already there. Right? It's just a really nice and new field. So we really took to this idea of, you know, what we would call converts. I mean, I am a convert, right? My degree is in chemistry. And I moved over to the dark side of data analytics. And so, you know, we were really interested in this idea of finding people who had done a degree in a related field. So neuroscience, for example, or economics or mathematics. And then who had either, you know, done one of these boot camps or worked in a job doing statistics afterwards. You know, the reason for that is it really takes someone special who has a degree in a certain area or specialty in a certain area, and who makes the conscious decision to change their field, which is not easy to do, and, you know, enroll in a program or work in a job in a totally different area. And so I think that was really important to us, to really focus on motivation and hire people who have the ability to learn. You know, one thing in this field is it changes so quickly that it doesn't really matter what you learn, because six months later, there's new research out there and new best practices for doing things. So trying to figure out how to hire for that skill, someone who has the initiative to learn how to do something, that was important. And so we brought in, you know, people from diverse backgrounds.
That's not to say we didn't, you know, we did hire statisticians and people who specifically have training in the area. But my boss has a saying that he likes to use, that diversity and inclusion is the catalyst to innovation, and that so far has really panned out great for us. Just getting people who like to learn, who are willing to be wrong and find out new things and who, you know, came to what they're doing from a different area, so they have a unique perspective on problems.
John: Right. Now, that really speaks to me. I mean, definitely the ability to manage information and the enthusiasm for learning are extremely important. You know, because like you said, I mean, I study every night and it's true that whenever I finish a book on machine learning, it's out of date. And you just have to be okay with the process. I mean, of course, you learn things, but by the time you finish the book, there's a new book, right? So it's never going to be finished in any kind of sense.
Kli: Yeah. So one thing that I didn't mention that I think is important to note is that we did, you know, specifically bring in two people with a background in software engineering. So people who actually have computer science degrees, who can build applications, because there was a recognition early on that the thing that's really going to get this whole program off the ground is not just building great models, but being able to share them with people in a visually intuitive way and build applications. And I think sometimes in data science groups, that part gets missed because everyone likes to think that their problems are the most complicated problems and need the most complicated solutions. But, you know, anyone in statistics will tell you this, that simple models are often the best or the most understandable. And the gap is how you can share it. And, you know, having people who can do software development on the team has really been a huge advantage for us because we can build beautiful things that scale and get them into people's hands. And they've been able to share that information with our data scientists, so our data scientists are better programmers and our programmers know more about statistics. And that has really brought a lot of richness to the team, just, you know, bringing in software developers. There's this notion of a full-stack data scientist, which people call a unicorn. But I think you can build a full-stack data science team, which means you have statisticians and you have actual software developers and full-stack developers and you have, you know, people with a formal background in data science. So that's just one thing, you know, I think has been really powerful for us, having a software engineering capability.
John: Well, it's definitely true. There are so many lessons here from software engineering that you don't need to relearn. I mean, for example, the importance of unit testing. I mean, there are all sorts of topics here where if you're trying to build things that are stable, that are in production, you don't need to reinvent the wheel. I mean, software engineers have figured out how to solve these problems.
Kli: Yeah, it's funny. I went to a conference, it was PyData. And they're the guys who run and develop pandas and NumPy. And there was a talk there about software engineering principles for data scientists, and the speaker, so there's a concept of SOLID in software engineering, which are best practices. And he was standing up there just pleading with a group of data scientists. You know, people have spent 40 years figuring out best practices in software engineering and they're not very complicated. Just please understand what these best practices are for how you build things, make sure that they don't break and make sure that other people can understand what you're doing. And I really took that to heart. He's right. There's a lot out there. It's easy to ignore, but it really gets you later if you do.
Kli: Yeah. So you're right. We have, you know, in our team, we have data analysts, we have data scientists, and we have data engineers. Those are the kind of different levels. The data engineers, you know, our senior data engineer likes to call himself a plumber. No one tips their plumber, you know, which is a problem because the plumber does really important stuff. And like, when things break, you know, he's the guy that gets the call in the middle of the night. So, you know, the data engineering contingent is obviously the key part. You know, there's a saying that 80 percent of the job is, you know, the piping and getting the data together. So we have a data engineering contingent who manage databases and make sure that the data is in the cleanest format possible so the data scientists don't have to worry about any low-level managing. I've found that data scientists generally don't like that. They like to think about models and validating. And, you know, what we're going to build on top of it. So there is always some level of managing, but the real dirty work is better handled, you know, in flow as the data is getting moved over. Then we've got a group of data analysts and software engineers also, and they're really the ones who are working directly with clients. So, you know, you get a lot of one-off requests or, you know, people around the organization have a specific question, a specific way that they like to see the information. And there's all sorts of intricacies in there. And so, you know, we have a group of people who work with them and make sure that the front ends meet whatever requirements are there. And we separately have some design people on the IT team at Colgate who they work with. So they'll do a lot of the front-end programming or one-off dashboards that need to be built. And then kind of in the middle are our core data scientists and statisticians.
And, you know, the way we kind of have things structured from an architecture standpoint is we try to be an "everything is an API" kind of group, which means if you are a data scientist and you're building a model, your ultimate end product is an API that someone working on a front-end application can send data to and query the model for information. And it gets sent back to the front-end application. Which means you can really focus on doing the modeling, whether it's like an NLP model or something that's doing some sort of, you know, kind of metric forecast. And then the people working directly with the clients around the organization who are building the front end, they don't have to try to also build the model into it. All they need to know is: I'm going to send a request to you, data scientist, and your API is going to take the model information and pass it back. Data scientists can update the model as much as they want, and there's an agreement on what the transfer looks like. So those are really the, you know, the kind of different levels. We've got the data scientists focusing on the modeling, data engineers who are, you know, making sure that the infrastructure is there, and the front-end developers and data analysts who are kind of stitching it together and putting together what the final end business user would see.
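A minimal Python sketch of the "everything is an API" arrangement described above may help make it concrete. Everything named here is hypothetical and not from the conversation: the `ModelAPI` class, the request and response shapes, and the two toy models are illustrative assumptions, and a plain function call stands in for a real HTTP request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical contract agreed between data scientist and front-end developer:
#   request  -> {"features": [number, ...]}
#   response -> {"prediction": float, "model_version": str}


@dataclass
class ModelAPI:
    """Wraps any model behind the agreed request/response shape."""
    predict_fn: Callable[[List[float]], float]
    version: str

    def handle(self, request: Dict) -> Dict:
        features = request["features"]
        if not features or not all(isinstance(x, (int, float)) for x in features):
            raise ValueError("features must be a non-empty list of numbers")
        return {
            "prediction": float(self.predict_fn(features)),
            "model_version": self.version,
        }


# Two stand-in "models"; a real one might be an NLP model or a forecast.
def mean_model(features: List[float]) -> float:
    return sum(features) / len(features)


def max_model(features: List[float]) -> float:
    return max(features)


def front_end_render(api: ModelAPI, values: List[float]) -> str:
    """Front-end code depends only on the contract, never on model internals."""
    response = api.handle({"features": values})
    return f"Predicted value: {response['prediction']:.2f} (model {response['model_version']})"


api_v1 = ModelAPI(mean_model, "v1")
api_v2 = ModelAPI(max_model, "v2")  # model swapped, contract unchanged

print(front_end_render(api_v1, [1.0, 2.0, 3.0]))
print(front_end_render(api_v2, [1.0, 2.0, 3.0]))
```

Because `front_end_render` only touches the agreed fields, swapping `mean_model` for `max_model` needs no front-end change, which is the point of agreeing on the transfer format up front.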
John: Yeah, now that's pretty consistent with my experience of what works well. You know, our dashboard is the same way, where we've got essentially different modules within the dashboard and we have a general framework for how we send information off and how it comes back. So the modular approach definitely speaks to me. So what about machine learning engineering? That's something that, you know, we have a machine learning engineer that we work with. Is that something you consider to be kind of in your data scientist or your data analyst roles?
Kli: Yes. So we have a data scientist who is a PhD mathematician by training, and he's someone who, you know, I would consider a convert. Right? He didn't study data science, but he has a PhD in mathematics. Obviously very capable, and he was an assistant professor before joining the company and sort of made a changeover to, you know, data science and machine learning in particular. So he is sort of, you know, we don't have a specific title for machine learning engineer, just, you know, because of the way we have it structured. He's a data scientist, but his core role is really being the subject matter expert when it comes to building machine learning models. You know, in some cases, off-the-shelf models are great. You know, you can import scikit-learn and try, you know, other types of models. But in some cases, you really need someone who's going to go deeper into it and take what's in the open source world and customize it. Especially when you're starting to play in areas where what's in the open source world doesn't exactly fit, or maybe does fit but needs to be combined. So we do have one person on the team who is our machine learning engineering expert. I would say for sure, for people building data science teams, that would not be the first person to bring on. Like, in our group he has an important role. But if you have a pyramid of people, that's the cream on top. You can still do a lot of amazing things if you have smaller resources to put together. It's kind of like, you know, the first knife that you buy in the kitchen is not a paring knife. It's the chef's knife. Right? Because you can do just about everything with it. And I think that's the same way with data scientists. Just someone who knows how to do most things is going to get a lot of traction. And someone who is a machine learning expert will get you into areas that are more niche, you know, outside of what you can get off the shelf.
John: Right. What about database engineering? That's something we've talked a lot about. You know, I suppose you have an IT group that's supporting you on the database side. So how much of that do you touch versus how much of that is kind of handled for you by the IT group? How is that?
Kli: Yeah, it's a good question. We try to touch as much as possible. You know, we do have great support for our team. But what I found is that a lot of decisions that you make about how to architect the database have carry-on implications for how you actually approach the data analysis. You know, whether you plan it to be that way or not, when you have a data scientist looking at the data, if it's structured in one way or another, they're going to look at the problem slightly differently. And the realm of things that they think are possible is going to be a little bit different. So we try to be as close as possible to that. And what we found is that, especially if you're using cloud platforms, so Google Cloud Platform or Amazon, database engineering is very learnable and approachable now. There are great easy-to-use tools. You can go on YouTube and watch one or two hours of videos and you can set up your own pipelines. So we like to use Airflow for this and move data into SQL databases or into Google BigQuery. So we try to be as close as possible to that, really because we found that it changes the way that people think about problems. So we want the data scientists involved in the discussion about how to structure the database because they're the ones that are going to be doing analytics on it. Our goal with building databases is not to build the most efficient database. Like, you know, in computer science classes, in schools, you know, the idea is you want to normalize things as much as possible. But memory is cheap now. And when you're dealing with data science problems, it's not that many, many different people will be making lots of different changes to the database at the same time. You need data that's structured for a small number of people to see lots of information. So that's why things tend towards flat. And so short answer to your question, we try to be as involved as possible.
Obviously, there are some things on the networking and security side that we don't want to be involved in because we're in over our heads and there can be larger implications. But when it comes to deciding how databases get structured and how we're going to move data around, we've really tried to upskill the team to be able to do that.
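The trade-off described above, between normalizing as much as possible and structuring data so a few people can read a lot of it, can be sketched with Python's built-in `sqlite3` module. The table names, column names, and values below are made-up toy data for illustration only; they are not Colgate's schema, and SQLite stands in for whatever SQL database or warehouse a team actually uses.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Normalized layout: product attributes live in one place,
-- stability measurements in another (toy data, purely illustrative).
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT
);
CREATE TABLE measurements (
    product_id INTEGER REFERENCES products(product_id),
    week       INTEGER,
    viscosity  REAL
);
INSERT INTO products VALUES (1, 'Rinse A', 'mouth rinse');
INSERT INTO products VALUES (2, 'Paste B', 'toothpaste');
INSERT INTO measurements VALUES (1, 0, 10.0);
INSERT INTO measurements VALUES (1, 12, 9.5);
INSERT INTO measurements VALUES (2, 0, 50.0);

-- Denormalized copy for analysis: product attributes are repeated on
-- every row, costing storage but sparing analysts the join.
CREATE TABLE flat_measurements AS
SELECT p.product_id, p.name, p.category, m.week, m.viscosity
FROM measurements AS m
JOIN products AS p ON p.product_id = m.product_id;
""")

# An analyst reads one wide table directly.
rows = conn.execute(
    "SELECT name, category, week, viscosity "
    "FROM flat_measurements ORDER BY product_id, week"
).fetchall()

for row in rows:
    print(row)
```

The wide table duplicates `name` and `category` on each measurement row. That is wasteful when many writers update records concurrently, but it suits the read-heavy, mostly static workload described in the conversation.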
John: Right. Now, I can relate to that, too. And, you know, it is true that our data, at least in sensory and consumer science, is pretty static. It's not the case that we have to worry about that. I mean, the data aren't being updated all the time with values changing, where you have to worry about keeping track: if you change something here, does it need to change in other places? I mean, I think from a database standpoint, we have maybe a little more freedom on what we can do, given the nature of our data, as opposed to some of the other more complicated kinds of database problems.
Kli: Yeah, and I found it's not much to ask, if you have the right people on your team, to ask your data scientists to, you know, do some online courses, just the basics of putting together databases and how to organize them. It's interesting stuff, conceptually not very difficult. And there are tools that make it easy to do without a lot of overhead. So I think it's about empowering data scientists, just saying this is part of your job. And what I found is usually they love to do it because it has a big impact on the next stages of whatever project they're working on.
John: Right. I definitely agree with that. Alright, Kli, amazingly, we have blown through half an hour here. So I could talk to you easily for many more hours, but we have to wrap it up. So let's talk a little bit about advice. Normally, we get advice for sensory scientists, but I think it would be interesting to hear your advice to a young data scientist, someone who is in or is interested in data science. Maybe they're a student, maybe they're, you know, an early professional. What would be your advice for someone who wants to get started in data science?
John: Yeah, I really agree with you. One of my life sayings is no productive work is ever wasted. So if you're doing something that's interesting to you and it's productive and you're, like you're saying, developing some little code base and putting your project on the web, then I think even if there's not an obvious reason to do it other than that you're interested in it, there will be a benefit to it.
Kli: Yeah, my advisor in grad school, you know, would always say that if you don't publish it, it didn't happen. You know, maybe that's because he was trying to get papers out the door. But I think there's something in the process of knowing that what you're going to do is going to go out there in the world that forces you to take a second critical look. And, you know, going through that, I think, is really powerful for people who are trying to get into any field, really.
John: Alright. Definitely agree with that. Okay Kli, so one last thing, how do people get in touch with you? If they want to follow up, maybe they want to apply for a job at Colgate-Palmolive, or they're just interested and would like to ask you some questions, what would be the best way for them to reach you?
Kli: Yes, the best way to reach me is through my LinkedIn page, which I think there'll be a link to on the podcast. And I'm happy to talk to anyone about topics of interest or if you're interested in the field or open positions at Colgate. So feel free to message me.
John: Okay, awesome. Alright, thank you so much. This has been great.
Kli: Great. Thanks, John. Great to be here. Thanks for the invitation.
John: Of course. Okay, that's it. Hope you enjoyed this conversation. If you did, please help us grow our audience by telling your friends about AigoraCast and leaving us a positive review on iTunes. Thanks.