Summary
Deep learning is gaining immense popularity due to the incredible results that it is able to offer with comparatively little effort. Because of this, a number of engineers are trying their hand at building machine learning models with the wealth of frameworks that are available. Andrew Ferlitsch wrote a book to capture the useful patterns and best practices for building models with deep learning to make it more approachable for newcomers to the field. In this episode he shares his deep expertise and extensive experience in building and teaching machine learning across many companies and industries. This is an entertaining and educational conversation about how to build maintainable models across a variety of applications.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial.
- Scaling your data infrastructure is hard. Maintaining data quality standards as you scale is harder. Databand solves this. Their Unified Data Observability platform gives data engineers visibility over their stack without changing existing pipeline code. Get end-to-end visibility on your pipelines, and identify the root cause of issues before bad data is delivered. Seamlessly integrate with over 20 tools like Apache Airflow, Spark, Snowflake, and more. Use customizable dashboards to see where pipelines are broken and how that impacts delivery downstream. Get alerts on leading indicators of pipeline failure. Open up your pipeline and see exactly which code strings are broken – so you can fix the issue immediately. Create more reliable data products. Go to pythonpodcast.com/databand today to start your free trial!
- Your host as usual is Tobias Macey and today I’m interviewing Andrew Ferlitsch about the patterns and practices for deep learning applications
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing the major elements of a model architecture?
- What is the relationship between the specific learning task being addressed and the architecture of the learning network?
- In your experience, what is the level of awareness of a typical ML engineer or data scientist with respect to the most current design patterns in deep learning?
- You're currently working on a book about deep learning patterns and practices. What was your motivation for starting that project?
- What are your goals for the book?
- How have advancements in the operability of machine learning influenced the ways that the models are designed and trained?
- How do recent approaches such as transfer learning impact the needs of the supporting tools and infrastructure?
- Can you describe the different design patterns that you cover in your book and the selection process for when and how to apply them?
- What are the aspects of bringing deep learning to production that continue to be a challenge?
- What are some of the emerging practices that you are optimistic about?
- What are some of the industry trends or areas of current research that you are most excited about?
- What are the most interesting, innovative, or unexpected patterns that you have encountered?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the book?
- What are some of the other resources that you recommend for listeners to learn more about how to build production ready models?
Keep In Touch
- @AndrewFerlitsch on Twitter
- andrewferlitsch on GitHub
Picks
- Tobias
- Designing Data Intensive Applications (affiliate link)
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Google Cloud AI
- Sharp Corporation
- Deep Learning Patterns and Practices (affiliate link) use the code podinit21 at checkout for 35% off all books at Manning!
- CID Bioscience
- Latent Space
- AI Winter
- Numerical Stability
- Surrogate Model
- GAN == Generative Adversarial Network
- Gradient Descent
- The Gang of Four – Design Patterns: Elements of Reusable Object-Oriented Software (affiliate link)
- The Lottery Ticket Hypothesis
- Manning Publications (affiliate link)
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Scaling your data infrastructure is hard. Maintaining data quality standards as you scale is harder. Databand solves this. Their unified data observability platform gives data engineers visibility over their stack without changing existing pipeline code. Get end-to-end visibility into your pipelines and identify the root cause of issues before bad data is delivered. Seamlessly integrate with over 20 tools like Apache Airflow, Spark, Snowflake, and more.
Use customizable dashboards to see where pipelines are broken and how that impacts delivery downstream. Get alerts on leading indicators of pipeline failure. Open up your pipeline and see exactly which code strings are broken so you can fix the issue immediately. Create more reliable data products. Go to pythonpodcast.com/databand today to start your free trial. Your host as usual is Tobias Macey. And today, I'm interviewing Andrew Ferlitsch about the patterns and practices for deep learning applications and the book that he's writing to help people learn about it. So, Andrew, can you start by introducing yourself?
[00:01:58] Unknown:
Thank you for inviting me to your podcast. I'm Andrew Ferlitsch. I work for Google Cloud AI. I'm in developer relations. Prior to joining Google, I was actually in Japanese IT. I worked for Sharp Corporation in Japan for over 20 years as a principal research scientist, and my specialty was in imaging. You could say I was an imaging scientist; I did a lot of computer vision. And, you know, what's really interesting I found about deep learning is everything we do today in deep learning with computer vision, we used to also do before deep learning and machine learning. But it took, like, a massive number of people with PhDs and millions and millions of dollars. And what I find just so fascinating about machine learning and deep learning is how that has really scaled way down the cost and the speed, and has really brought it into the realm where, you know, your software engineer can now build and deploy applications that used to take this massive amount of cost and resources.
[00:03:06] Unknown:
And do you remember how you first got introduced to Python?
[00:03:10] Unknown:
It was probably more accidental. You know, the first time I used it was 2016. And I know before that, I thought it was just some kind of toy programming language I didn't take seriously. You know? I was doing this consulting job with an agricultural technology company, CID Bioscience. They were developing an autonomous farm vehicle that would use computer vision to count crops and, you know, detect infestations in the crops. Okay? And so all the experimental code that they were putting together was in some Python. Mhmm. Of course, I had to jump in and participate, so I had to learn Python.
[00:03:52] Unknown:
And so now in terms of the work that you're doing with the deep learning patterns and practices book and your work at Google and your previous work with Sharp, you know, definitely gives you a good background in terms of the machine learning and deep learning space. And as far as the actual, you know, deep learning aspects before we get too far into, you know, some of the patterns and practices, I'm wondering if you can just give a bit of an overview about what are the major elements of a model architecture that people need to be aware of as they're starting down the path of building something with deep learning and thinking about how they want to approach the problem space and approach the actual building of the project?
[00:04:31] Unknown:
Well, what we find in models today, instead of being a sort of wild-west graph of nodes, is that there are now, you know, formal sections to models. Okay? Call them components, and sort of a formalization, or designs, on how to put those components together and how to connect them. And really, at the top level, whether you're doing computer vision, natural language understanding, or structured data, you're gonna have what's called a stem, then a learner. And then the last part is the task component, and that's the task you actually wanna learn. And the learner itself gets broken down into groups. Those groups get broken down into blocks.
And how they're assembled will really say whether you're doing, say, representational learning or transformational learning, etcetera. I hope that helped address that question.
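To make the stem/learner/task decomposition described above concrete, here is a minimal sketch using plain NumPy stand-ins rather than a real deep learning framework. The layer sizes, the ReLU nonlinearity, and the function names are illustrative assumptions, not anything specified in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

def stem(x):
    # Stem: coarse feature extraction from the raw input (here, a linear map + ReLU).
    return np.maximum(x @ rng.standard_normal((8, 16)), 0)

def learner(h):
    # Learner: groups of blocks that refine the representation.
    return np.maximum(h @ rng.standard_normal((16, 4)), 0)

def task(z):
    # Task component: the head that produces the final prediction.
    return z @ rng.standard_normal((4, 1))

x = rng.standard_normal((2, 8))   # a batch of 2 inputs with 8 features each
y = task(learner(stem(x)))
print(y.shape)                    # (2, 1)
```

The point of the sketch is only the composition: data flows stem, then learner, then task, and each stage can be designed and swapped independently.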
[00:05:25] Unknown:
Yeah. That definitely is useful because I know that, you know, for people who are new to this field, you hear the terms, you know, neural networks or deep learning architectures, and it's just this, you know, black box, and you don't really understand what are the different pieces to it. But being able to break it down into these are the different subcomponents of the network helps to at least make it an approachable problem of, okay. Well, now I know I need these different components to be able to put it together so that you can then direct your research to figuring out how to approach that problem more specifically.
And in terms of the specifics of a given task, you mentioned that a lot of the detail for, you know, for given projects that you're trying to work on is in the learner or the task layer of the network. And I'm wondering if you could just dig a bit more into the relationship between the specifics of the task that you're trying to build for, whether it's natural language understanding or computer vision or image recognition and the specific architecture and the components that you're pulling off of the shelf for being able to build that full network?
[00:06:29] Unknown:
Well, one, we could probably spend a whole hour just talking on that, so I'll just kinda briefly summarize. You know, over the last few years in the research, we came up with this phrase you see used a lot, called essential features. Okay? So if you look at a model and you start at the top of the model and you sort of work deeper and deeper into the model, you get near the end, before you do the task learning. Okay? You have what's called a latent space. It's a representation of the input in a low dimensionality. And what we really want in there are what's called the essential features.
Okay? And if we can keep that constraint, we'll prevent things like memorization, memorizing examples, and allow the model to better generalize to examples it's never seen, particularly if those examples are outside the distribution that the model was trained on. So a lot of effort in the last few years has gone into all the different ways to represent that latent space and how to train the model so that the latent space has just the essential features. But once you've got that, the task learning is simple. And it's actually plug and play. I can take that same bottom part of the model and put a regressor on it, you know, to, say, predict the selling price of a house.
I could take that off and put on a classifier and predict who the target demographic is that would be most interested in buying the house. So I don't even have to have two separate models. It really just comes down to the latent space, how it's trained, and what I can accomplish on the other end.
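The "plug and play" idea above can be sketched as a shared bottom of the model producing a latent representation, with interchangeable heads (a regressor or a classifier) attached to it. The dimensions, weights, and data below are made-up assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
W_backbone = rng.standard_normal((10, 4))   # shared latent-space encoder
W_reg = rng.standard_normal((4, 1))         # regression head (e.g. house price)
W_cls = rng.standard_normal((4, 3))         # classification head (e.g. 3 demographics)

def latent(x):
    # One shared bottom: every task reuses this latent representation.
    return np.maximum(x @ W_backbone, 0)

def predict_price(x):
    return latent(x) @ W_reg                # swap in the regressor head

def predict_demographic(x):
    logits = latent(x) @ W_cls              # swap in the classifier head
    return logits.argmax(axis=1)

x = rng.standard_normal((5, 10))
print(predict_price(x).shape)        # (5, 1)
print(predict_demographic(x).shape)  # (5,)
```

Because `latent` is shared, only the small head changes between tasks, which is the sense in which one trained bottom serves multiple models.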
[00:08:13] Unknown:
That puts me in mind of the, you know, recent developments in transfer learning. Is that pretty much the exact same thing of what you're talking about where you're using the regressor versus the classifier?
[00:08:24] Unknown:
Before I even jump into it, there's a little humorous part here. I don't mind dating myself, but, you know, I got my degree in artificial intelligence, a graduate degree, in the late 1980s. And when I got out of college, you know, it was the AI winter. So, you know, nothing happened, and I went in a different direction. And one of my specialty things was a really obscure statistical area called distributions. And it turns out that deep learning is all about distributions, whether it's distributions in the data or distributions in the weights inside of the model. So suddenly, here, 30-some-odd years later, what I actually learned in college became important.
Okay. So transfer learning really goes in several different directions. But essentially, you're transferring one of two things. Either you're trying to transfer the essential features in the latent space, saying those essential features are similar enough for another domain. Okay? For example, let's say you have a model that's learned to tell the makes and models of cars. Okay? It's probable that the essential features in that latent space are pretty much almost identical to what will be needed for trucks. Okay? So I'm just gonna transfer all of that up to the latent space, take my truck dataset, and then just fine-tune it. In the other case, the transfer may not be related to the latent space, particularly if the datasets are sort of far apart in domain.
And the other important thing we find in a model is the distribution of the weights in the model. I'll try to keep it simple. You know, when you first train, you have all those nodes, and each one simply has a weight on it. Okay? You have to start with some initial value. But if you started with the same value, let's say everyone was a one or a zero, their updates would be identical and effectively symmetric. And this is the same as having a model with just one node. So you have to start off with this sort of random, you know, distribution of weights when you train it.
And the thing is, not all random distributions are good. Okay? Some are better than others. And the idea is, how do I find a good distribution that gives me what's called numerical stability in my model? So that when I train it, I get convergence on that global or best optimal outcome I can have for that model. And that can take a lot of, you know, training and experimenting. But if you have an existing architecture that has highly numerically stabilized weights from previous training, sometimes you just say, that's my initial weight distribution, instead of doing a random draw. So that's a different form of transfer
[00:11:28] Unknown:
learning. In terms of the numerical stability of a model and trying to optimize that, that sounds like the hyperparameter search and hyperparameter optimization problem that I've heard a lot about. Actually, that comes afterwards. Okay. One of the mistakes that people were making in hyperparameter
[00:11:44] Unknown:
search is you take a model, right, and you go straight to this generic tuning. The problem is, why would you do your hyperparameter tuning on weights that are not even stable yet? So first, you're gonna do all these pretext tasks to make sure you have stabilized weights, numerically stabilized weights, and then you do your hyperparameter tuning.
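The symmetry problem described above (identical initial weights making a layer behave like a single node) can be demonstrated in a few lines of NumPy. The toy dimensions, the constant 0.5, and the 0.1 scale on the random draw are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((6, 3))             # a batch of 6 inputs, 3 features

W_same = np.ones((3, 4)) * 0.5              # every weight identical (symmetric)
W_rand = rng.standard_normal((3, 4)) * 0.1  # a random draw breaks the symmetry

h_same = np.maximum(x @ W_same, 0)          # ReLU activations
h_rand = np.maximum(x @ W_rand, 0)

# With identical weights, all 4 hidden units compute the same activation,
# so the layer carries no more information than a single node:
print(np.allclose(h_same[:, 0:1], h_same))  # True
# With random weights, the units differ:
print(np.allclose(h_rand[:, 0:1], h_rand))  # False
```

In real training the same symmetry persists through gradient updates, which is why initialization schemes always draw from some random distribution, and why a well-chosen (or pretrained) distribution matters for stability.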
[00:12:13] Unknown:
As far as your experience of working in the space and working with other people on building these deep learning models and applying them to different problem spaces. I'm wondering what you have seen as the sort of overall level of awareness and understanding of some of the different considerations and challenges in the overall life cycle of building and developing and deploying a model among sort of the general community of machine learning engineers or data scientists and sort of the emerging best practices in the deep learning space?
[00:12:48] Unknown:
Yeah. I guess I could cover another hour, but I'll keep it brief. You know, there's always challenges. You know, one area that never ceases to amaze me is on the data side. The more and more data we have, it continues to be a challenge. Particularly now that, you know, we're on such a large scale, we have to find solutions using, at least to some degree, unsupervised training, so that the data doesn't have to be labeled. What are the right ways of doing it? How will they fit into a model that is then later fully trained on labeled data? And then we get to the other end. You know, I gotta deploy these models, and I gotta deploy them, you know, at a large scale.
When you talk about the real world, you know, today, models go into production. It's a business. It's an enterprise business. You know, if it's a social media site, you may have, you know, tens of millions of people a day using the site. Right? So we're on a very, very large scale. And no matter how you train the model, or what data you've got, it's pretty much guaranteed that the distribution of your training data won't be exactly the same as the distribution that the model sees in the real world. And so from a statistical point of view, what it sees in the real world is a population. Okay? What you're training on is some subpopulation of that population. And on top of that, you don't have every example in that subpopulation.
So how can I train it in a way that, given this subpopulation I'm training on, it will still generalize across that differential between the subpopulation and the population?
[00:14:43] Unknown:
And that gets into the problem of trying to counteract any potential bias in that, you know, randomized subsample of the population?
[00:14:51] Unknown:
Yeah. You know, a few years ago, we saw all kinds of techniques tried that would fail at detecting it. First, I'll give you a classic example of how it happens. When they started to first make X-ray classification models, they inadvertently were biased on what's called the view. Okay? Here is the scenario. You know, most of this data was coming from big hospital groups. And big hospital groups are very cost- and billing-conscious. Okay? So at any one of these big hospitals, you're gonna have more than one X-ray department. You're gonna have one department that's low cost, with, you know, more basic X-ray equipment. And you have another department that has, say, very costly X-ray equipment; that's high cost. Now, you're a doctor, and a patient comes in. Right?
And as a doctor, you try to make a prediction of the likelihood that this person has pneumonia. If you think it's low likelihood, you would send them to the low-cost department. And if you thought that there was high likelihood, you'd send them to the high-cost one. Okay? The problem is, these departments use different, you know, models of the X-ray machines. So when you get your data, it turns out that almost every instance from the low-cost one is a negative, not pneumonia. And almost every instance from the high-cost one is pneumonia. So it's totally skewed. And of course, how they take the images is not identical.
That sort of framing, that's your view. And what happened in the early days is the models were learning the view, not the X-ray. Okay? That's known as an unseen covariate. Okay? You know, we have all kinds of ways of trying to deal with it. First, you have to detect the existence of it. So we might use a surrogate model. Okay? So say we have a suspicion that, let's say, feature X is a bias or an unseen covariate. Okay? So you have your regular model, you're training, and you're predicting. And what I might do is, it's not that I remove the feature, it's that, at the prediction level, I will intentionally invert that feature.
And what I wanna do is look at the outcome of the prediction. If it's not a bias, then the change in the outcome should be random. That is, it should not have any effect. But if, on the other hand, it's not random, then I know, oh, it's injecting some bias. So now I gotta look at a wide variety of techniques to remove the bias. A big one I see being used is GANs, for example, and other means of generating synthetic data. You know, in previous years, people tried to use boosting. The problem with boosting is you're adding so many identical examples that the model tends to be overparameterized, and you're really increasing the likelihood of memorization in the model.
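A toy version of the probing technique just described: invert a suspected feature at prediction time and see whether the outputs shift systematically. The biased model, the fair model, and the data below are all fabricated for illustration; column 0 is a stand-in for the "view"/machine-type covariate.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 5))
X[:, 0] = np.where(X[:, 0] > 0, 1.0, 0.0)   # binary suspect feature ("view")

def biased_model(X):
    # A model whose output leans heavily on the suspect feature.
    return 3.0 * X[:, 0] + 0.1 * X[:, 1]

def fair_model(X):
    # A model that ignores the suspect feature entirely.
    return 0.1 * X[:, 1]

X_flipped = X.copy()
X_flipped[:, 0] = 1.0 - X_flipped[:, 0]     # invert the suspect feature

shift = np.abs(biased_model(X) - biased_model(X_flipped)).mean()
shift_fair = np.abs(fair_model(X) - fair_model(X_flipped)).max()

print(shift > 1.0)        # True: predictions move a lot, the feature injects bias
print(shift_fair == 0.0)  # True: flipping the feature has no effect on a fair model
```

In practice the "model" would be the trained network (or a surrogate trained to mimic it), and the shift would be tested statistically rather than eyeballed, but the detection logic is the same.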
[00:18:00] Unknown:
That brings us to the work that you're doing right now of the book that you're writing to try and encapsulate some of the lessons that you've learned and the knowledge that you're sharing right now of some of the useful patterns and practices for how to go about working with deep learning and building these models. And I'm wondering if you can just give a bit of an overview about what your goals are for the book and how it is that you decided to go down the path of writing the book and starting the project.
[00:18:27] Unknown:
Kind of an interesting road. I would say I started to teach what I'd call more data science, you know, so more classical machine learning, probably back in 2016. And most of the people I was teaching, you know, came from the math and statistical world. And then, I'd say about mid-to-late 2017, you know, deep learning was a big hype. Okay? And I really saw a demographic change in the students. A lot of them were software engineers. They weren't statisticians and math people: "We don't do stats. We program." And so, you know, you're trying to teach them this stuff, and it just goes over their heads. And so, somewhere in this process of stumbling through how to teach to that target audience, you know, I realized that I had to reframe everything in deep learning in the context of a software engineer. What are their terminologies?
What are their methodologies? What are their best practices? And how can I fit deep learning into that? So I did that for a number of years, and my courses got really popular. Today, I teach vast numbers of software engineers, not data scientists, who do machine learning. And this book, you could say, is sort of my compilation of all those experiences of how to map what was once a very statistical, PhD type of world
[00:19:59] Unknown:
into your rank and file software engineering world. And in your experience of making that translation from your experience with your mathematical and statistical background into the parlance of software engineers and the problems and, you know, what their priorities are, I'm curious what were some of the more interesting ways that you had to reframe things or some of the interesting sort of misconceptions or biases that the software engineers had coming into this space of, you know, a very mathematically oriented field.
[00:20:30] Unknown:
Well, they all thought you had to know what gradient descent is, and I never teach gradient descent. I'll start with that. You know, early in my career, I was a software engineer in the 1990s, and there was a sort of groundbreaking book, Design Patterns, in C++, by the Gang of Four. It's really what started the whole concept of design patterns in software engineering. And so that concept was already, you know, programmed into me. And I did early on come to the realization that, to teach this to software engineers,
you're gonna have to define the design patterns. Now, that doesn't mean they weren't there. The problem was, it was all sort of scattered across seminal research papers. Okay? And each researcher invented their own words. Many times, two different papers could be talking about practically the exact same thing, and their terminology is totally different. And their diagrams, how they draw them, are totally different. So a lot of it was first identifying where all those bits and pieces are in the research papers, finding what's the most common terminology, you know, then solidifying it so you're always talking the same way even if the paper says it differently.
And then really coming down to mapping all those architectures and DAGs into a more common set of diagrams that could be uniformly applied to any of these models in research papers. And you see that in the book. I use a sort of overall, you know, what you'd say DAG framework description that we call idiomatic. And so one of the nice things about the book is, no matter what model architecture you're looking at or what DAG deployment process, it's in a consistent style of diagram. You don't have to learn one style versus another.
[00:22:24] Unknown:
And as far as the sort of identifying those common patterns and best practices across the different research and throughout your own experience of working in the space, I'm curious how broadly those lessons have been learned in the industry or if it's something that you still see as being very nascent and something that everybody has to kind of
[00:22:47] Unknown:
discover and learn about on their own. I would say in the last couple of years, at least in the research field, there's been a lot of consolidation of terminology and representation. It's not the wild west anymore.
[00:23:01] Unknown:
And then as far as the actual tooling and practices and infrastructure around deep learning and machine learning and bringing it into production contexts. I'm curious what you see as some of the most useful advancements and some of the areas that are still under supported or underrepresented as far as the overall life cycle of going from idea to delivery.
[00:23:25] Unknown:
I think a big contribution is really the growth of how to augment the training using unlabeled data or noisy data that's crowdsourced. There are a variety of what we call pretext tasks. You're not learning a task like a classification or predicting the value of the house. You're learning a transformation. So I have an input x, and I define some transformation function, f of x. Okay? And, you know, that's a static function, so I can actually put the data into it, get what the transformation was, and use the transformation as the label. Okay? And so I'm trying to force the model to learn essential features in that latent space before I actually do the training on labeled data, which is expensive.
Okay? And so, you know, there is the opportunity. I've seen a lot of growth there, that we can start creating models to solve more and more problems where it would be cost-challenging to get the data. Another area where I see improvements on data is, again, data costs money. So, you know, let's say you go into fintech. Okay? Maybe you're trying to do a model for fraud detection. Okay? Well, you might actually have not 10 or 20 fields in your structured data. You might have hundreds of fields. Okay? You don't know which ones are the good ones. Right? But every one of them has a cost, a data acquisition cost.
Okay? And so before, you would try to figure out which ones actually contribute to the outcome, and how well, by doing PCA analysis. But, you know, on that scale, that's highly expensive. So, you know, with the growth of explainability, being able to instrument your model and directly attribute a prediction back to the feature list has really shown great promise to substantially reduce the cost of identifying how my data contributes to the outcome. And that lets, you know, a business manager make cost-effective decisions on data acquisition costs.
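The pretext-task idea above (apply a known transformation f(x) to unlabeled data and use the transformation itself as a free label) can be sketched in a few lines. Rotation by multiples of 90 degrees is one common choice in the self-supervised literature; the tiny 8x8 "images" and function names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
images = rng.standard_normal((4, 8, 8))       # unlabeled 8x8 "images"

def make_pretext_pairs(images):
    # For each unlabeled image, pick a random rotation k in {0, 1, 2, 3}
    # (i.e. 0/90/180/270 degrees) and use k as the training label, for free.
    pairs = []
    for img in images:
        k = int(rng.integers(0, 4))
        pairs.append((np.rot90(img, k), k))
    return pairs

pairs = make_pretext_pairs(images)
print(len(pairs))                          # 4
print(all(0 <= k < 4 for _, k in pairs))   # True
```

A model trained to predict k from the rotated image is forced to learn structural (essential) features of the input, after which the cheap pretext head is discarded and the expensive labeled training starts from that latent space.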
[00:25:42] Unknown:
And in terms of the explainability advancements, I'm curious what your thoughts are on how that's going to influence the overall sort of state of the art of being able to build these models and identify useful data at the outset to be able to build more effective models downstream. And if there are any sort of potential optimizations or performance improvements in terms of the actual time to build and deliver these models because of the fact that there is more explainability that then contributes to a better understanding of how the model is actually coming to these different conclusions.
[00:26:16] Unknown:
Just the whole idea of explainability implies that there will be development and production improvements. I mean, let's say we have no explainability in the model. It's just a black box. Right? We throw it out there, and we have some outside observation. Okay, is that quite what we want? And so we make a guess of what's happening inside the black box, just one notch above random, right? It's an educated guess, you know. And then we throw another one out and we make an observation, we see how far apart those observations are, and we make another educated guess. Compare that to: it's not a black box anymore.
I can look inside and understand why it's doing that. And now I can do something better than an educated guess. So the number of iterations, just the nature of it, you know, would dramatically reduce.
[00:27:07] Unknown:
We've all been asked to help with an ad hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV file via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud data warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial and make your life a lot easier.
And then another interesting challenge that I've come across myself is just understanding the cases where machine learning or deep learning is even applicable to a particular problem, or, you know, the types of solutions that you can build with machine learning and deep learning, and being able to identify those opportunities, particularly if you have some sort of data source and you don't necessarily know what you can build on top of it. And I'm wondering what your experience has shown as far as the level of imagination necessary, or the ability to identify useful opportunities for applying machine learning in a business or, you know, educational context.
[00:28:29] Unknown:
Well, I know that when I'm in meetings, or when our clients, you know, business people, are in there, they're throwing everything at the whiteboard. They wanna apply deep learning and AI to every aspect they can. They see cost savings and, you know, production increases and so forth. And, of course, that's the challenge. And, you know, to me, and this is my opinionated view, the new frontier is model amalgamation. Okay? To explain what that is, just go back a little bit in time: we deployed single models that did a single task.
Okay? But that by itself is not an application. And so you would still have that big application on a server, like written in Java. And here and there it's making a call out to a model, like the model is assisting this much bigger application. In model amalgamation, you take essentially every aspect of that application and convert it into a model paradigm, and then connect all those models together in their own graph, and the models become the whole application. And so we're definitely seeing attempts in large companies at how to rearchitect and redesign these applications so they are entirely composed of models in this amalgamation.
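To make the model amalgamation idea concrete, here is a minimal Python sketch of what an application might look like when it is nothing but a graph of model-like components. Every function name here is a hypothetical stand-in for a trained model or a graph node, not code from the book.

```python
# Minimal sketch of "model amalgamation": instead of one big application
# that occasionally calls out to a model, every stage is a model-like
# callable, and the application is just the graph connecting them.
# All component names are hypothetical illustrations.

def detect_objects(image):
    # stand-in for an object-detection model
    return [{"label": "person", "box": (0, 0, 10, 10)}]

def crop(image, box):
    # stand-in for a preprocessing step, itself treated as a graph node
    return image  # simplified: a real node would crop to the box

def classify(patch):
    # stand-in for a downstream classifier model
    return "customer"

def amalgamated_app(image):
    # the "application" is nothing but the wiring between models
    results = []
    for det in detect_objects(image):
        patch = crop(image, det["box"])
        results.append((det["label"], classify(patch)))
    return results

print(amalgamated_app("raw-image-bytes"))  # [('person', 'customer')]
```

The point of the sketch is the shape, not the stubs: each stage has the same "tensor in, tensor out" contract, so the whole pipeline can be re-architected, retrained, or swapped piecewise.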
[00:29:48] Unknown:
Digging more into some of the design patterns and best practices that you're covering in the book, I'm wondering if you can give a bit of an overview of what types of problem domains they're applicable to, some of the decision factors or signals that go into identifying when to use which patterns, or, you know, how you decided to break them down and which patterns to include for their broadest applicability for people who are trying to get into the space? Much of the book actually covers the seminal
[00:30:18] Unknown:
models. You know, the vast majority of those models were, you know, designed, you could say, before there was any kind of design pattern. So not only do I explain what's happening inside the model, I show how to reverse map that existing seminal model into a modern design pattern. Okay? And then, you know, show the pros and cons and how it improved the science over the last one. It's hard to really answer your question, because there's all kinds of model architectures out there. Okay? They're continuously improving.
Okay? My book mainly focuses on computer vision. Okay? And one could make the argument that if you're doing a regressor or a classification, just use an EfficientNet. Sometimes it comes down to that. But the reality is we're doing a lot of other things, like object tracking in video, pose detection, getting landmark points on the body, maybe doing instance segmentation or just semantic segmentation. And for these, different types of model architectures work better because of what ends up in the latent space. And it generally involves the concept of feature reuse, or if you're talking about the natural language world, what to pay attention to. So, you know, the transformer has this concept of an attention head that tells later parts of the model that this part of the text is more important than the rest. Same thing in imaging.
Okay. I always find it kind of humorous. You know, the seminal paper on the transformer, which some of you might know through models like BERT, was called "Attention Is All You Need." Okay? Which really started a whole new era. We didn't need recurrent neural networks anymore to solve the problem. So, yeah, an analogy I would make, again using natural language processing, for what I really mean by essential features: let's say I had a document several pages long, and then I had a summary of that document.
If my model really was correctly trained on essential features, I could take both the long form and the short form and still get the same prediction. Okay? And that just means that my model has to learn what's essential, what to pay attention to, and just as equally, what to throw away.
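The "what to pay attention to" idea can be sketched numerically. Below is a bare-bones scaled dot-product attention in plain Python, a framework-free simplification of the mechanism, not a production implementation:

```python
import math

def softmax(xs):
    # numerically stable softmax: turns raw scores into weights summing to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(query, keys, values):
    # scores: how relevant each key (token) is to the query
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # "what to pay attention to"
    # output: attention-weighted mix of the values
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# one query attending over three tokens; tokens 1 and 3 match the query
out = scaled_dot_product_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[10.0], [20.0], [30.0]],
)
print(out)  # approximately [20.0]: the matching tokens dominate the mix
```

The weights are exactly the "this part is more important than the rest" signal the discussion describes; later parts of a real model consume the weighted mix instead of the raw inputs.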
[00:32:53] Unknown:
As you were figuring out what to include in the book, because deep learning and machine learning is a space that is seeing such rapid evolution and so much attention, I'm curious what your challenges were in figuring out what to include and what to optimize for, so that the book remains useful for a longer period of time rather than having to be rewritten every 6 months because of, you know, new research or changes in the space.
[00:33:20] Unknown:
You hit the nail on the head there, I have to admit. I had to make that decision: how far do I go in describing seminal papers before making the switch to how to apply this in production? And you will find that about two thirds of the way through the book is where that switch happens. And I said, okay, I'm not going to try to teach you any more seminal stuff. I'm just gonna show you now how it's actually applied in production.
[00:33:47] Unknown:
Yeah. It's definitely one of the perennial problems with any sort of technical resource. You know, how long is it going to remain valid? And then for people who read the book, what are your overall goals as far as what they're going to be able to take away from it, and, you know, what background knowledge they're going to want to have coming into it, or what capabilities they'll have after having read the full thing?
[00:34:14] Unknown:
You know, our reviews have been really good. A lot of data scientists have read it, even though they're not the target audience. You know, a lot of them have mentioned that they had a gap, and the way the book explained things was really helpful for filling in that gap. But the primary audience is software engineers. Our goal is to sort of demystify the whole process for the software engineer, teach them how to reframe it in their world as design patterns, and have them be able to take what they've learned here, practice it, and start applying it in the workplace.
[00:34:51] Unknown:
And then as far as your own work of being in the space, building deep learning models, and teaching other people how to move into this particular area of the industry, what are some of the aspects of deep learning and machine learning and data management that are, you know, requisite for building these models that you see as continuing challenges you would like to see addressed more directly?
[00:35:16] Unknown:
Yeah. Well, again, let's roll back the clock a little bit to 2016, 2017. You kinda really thought of this as a serial process. You know, get the data right, train the model, make a prediction, evaluate it, and if it looks good enough, you're done, kinda like a Kaggle competition. The real world doesn't look anything like that. It's very dynamic. We're continuously training. We're continuously evaluating. We're scaling these models. And so to me, all the challenges are substantially shifting into production. This is where you get the phrase ML operations.
If anything, that area, job-wise, is growing much faster than, say, research and development. Okay. So just for the audience, for those of you who are thinking about entering into the space and wondering where, there are vast employment opportunities in machine learning operations. But the thing is, it has a similarity to DevOps in that you have lots of moving parts. It's complex, and there isn't a perfect way for all these parts to come together. And so you have to use your ingenuity for how to put the parts together, how to monitor it, how to debug, and whatever issue comes up, how to address it.
And it's not a boring job. You're on the go every moment.
[00:36:43] Unknown:
And in your own experience and as you continue to work in the space and help educate others, I'm curious what are some of the areas of research or, you know, areas of focus either in the industry
[00:36:56] Unknown:
or in sort of upcoming architectures or capabilities, that you're particularly interested in or keeping an eye on? I see some things that have been going on for a while, and then a shift in another area. So let's talk about what's been going on for a while. This is really just understanding the distribution of weights in your model and how it affects not only how good the model is, but how big your model has to be. Okay? Historically, weight initialization was random, and some parts of the model learned better than other parts. You needed a lot of redundancy. And so you have what we call overcapacity in your model, in parameters. And, of course, the problem with overcapacity is it opens up the door to parts of the model just memorizing data.
And so we throw in all this regularization, all this noise, trying to stop that from happening. But as we find better and better ways to get the right initialization, and there's actually a well-known paper on this called the lottery ticket hypothesis, the idea being to get the winning ticket on the weight initialization, we can start training what we call compact models. These are models that are a lot smaller. They don't have the overcapacity for memorization. We don't need regularization techniques as advanced on them. Okay? And when deployed, they'll be more compute and time efficient. So that's one area. Okay?
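As a rough illustration of the capacity side of that discussion, here is a toy magnitude-pruning sketch: keep only the largest-magnitude weights and zero the rest. The lottery ticket work additionally rewinds the surviving weights to their original initialization and retrains, which this sketch deliberately omits.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights.

    Illustrative only: the lottery ticket procedure additionally
    rewinds surviving weights to their initial values and retrains,
    iterating the prune-rewind-retrain cycle. That is omitted here.
    """
    k = max(1, int(len(weights) * keep_fraction))
    # magnitude of the k-th largest weight is the survival threshold
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

pruned = magnitude_prune([0.01, -0.9, 0.3, -0.02, 0.7, 0.05],
                         keep_fraction=0.5)
print(pruned)  # [0.0, -0.9, 0.3, 0.0, 0.7, 0.0]
```

The surviving weights are the "compact model" intuition in miniature: most of the parameter count was overcapacity, and the few large-magnitude weights carry the signal.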
In the area of automatic learning, you know, historically, whatever system we used would propose one model, train it, then propose another model and train it. You were comparing model to model and making a decision about what the next proposed model should be. And so it was one-on-one. About a year ago, there was some pretty good work on what's called distribution spaces for models. So, again, here's where distribution comes in. Instead of taking one model, you take a framework for a model architecture, make a whole bunch of variants of that model architecture, do low-epoch training, and then map out the results across that entire set, maybe 500 models, and that's a distribution.
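That variant-sampling step might look something like the following toy sketch. The scoring function is a hypothetical stand-in; real code would build each variant and actually train it for a few epochs.

```python
import random
import statistics

def low_epoch_score(variant):
    # Hypothetical stand-in for briefly training a variant and
    # measuring validation accuracy; real code would construct the
    # model and train it for a handful of epochs.
    width, depth = variant
    return 0.5 + 0.01 * depth + 0.001 * width + random.gauss(0, 0.02)

def family_distribution(base_width, base_depth, n=500, seed=0):
    """Sample n random variants of one architecture template and
    return the distribution of their low-epoch scores."""
    random.seed(seed)
    scores = []
    for _ in range(n):
        # random variant drawn from the family's architecture template
        variant = (base_width + random.randint(-8, 8),
                   base_depth + random.randint(-2, 2))
        scores.append(low_epoch_score(variant))
    return scores

scores = family_distribution(base_width=64, base_depth=10)
print(statistics.mean(scores), statistics.stdev(scores))
```

Repeating this for a second architecture family and comparing the two score distributions, rather than any two individual models, is the distribution-to-distribution comparison being described.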
Okay? Then do that with another totally different model architecture. You don't do model-to-model comparison. You do the same thing: make about 500 variants of it, random variants if you want to, map out their distribution, compare the distributions, and use that to eventually define what the best search space will be to find a good model. So it's an evolving area. I just found it interesting because, once again, distributions. Right? An obscure field I just happened to learn
[00:39:49] Unknown:
a long, long time ago. Yep. There's the perennial problem of, you know, I never used what I learned in school. Well, at least in your case, eventually, you did. Maybe not immediately, but it's come back around. And as far as working in industry, building models, in education, and in your current role, I'm wondering what you have found to be some of the most interesting or unexpected or challenging lessons that you've learned while working in the field and writing the book, and also if there are any patterns or practices that you have come across in your own research that you think should be more widely adopted but have so far been, you know, either overlooked or underutilized?
[00:40:31] Unknown:
Well, I think this goes back to my current role at Google. You know, I interface with what we call enterprise-class customers. So they're very large companies, corporations, multinationals. And a lot of the struggle really continues to be what the model sees versus what the model was trained for. Okay? And how to, not necessarily come up with new model architectures, but how to get those models to better generalize to a population that you really can't train them for.
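A tiny sketch of one way to catch that "what the model sees versus what it was trained for" gap in production: compare a feature's serving distribution against its training distribution. The statistic and data here are illustrative stand-ins for proper drift tests such as a KS test or population stability index.

```python
import statistics

def drift_score(train_values, serving_values):
    """Crude train/serve skew check on one numeric feature: how many
    training standard deviations the serving mean has moved.
    A stand-in for proper drift statistics (KS test, PSI, etc.)."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(serving_values) - mu) / sigma

# hypothetical feature values
train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
serving_ok = [10.1, 9.9, 10.4]
serving_skewed = [14.0, 15.2, 14.7]

print(drift_score(train, serving_ok))      # small: serving matches training
print(drift_score(train, serving_skewed))  # large: the model is seeing a
                                           # population it wasn't trained for
```

In practice this kind of check runs continuously against serving traffic and triggers retraining or an alert once the score crosses a tuned threshold.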
[00:41:10] Unknown:
As far as people who are looking to get into this space and wanna be able to learn more after they've finished your book, what are some of the other resources that you recommend they dig into to help them be more effective at building and running production-ready models, or to be able to dig deeper into the existing literature?
[00:41:30] Unknown:
This is one area where I may be a little bad. It's been years since I've really read books or blogs on deep learning. I just read research papers. I'm a scientist, you know. And, yeah, if you're a software engineer, unless you came up as a scientist, you're not gonna be able to parse these papers. But, you know, I do look at both O'Reilly and Manning's recent publications. There definitely has been a significant shift among other authors in framing deep learning away from a statistical description toward a software engineering description. And what I would do is just start looking at these books and their reviews and see how well, you know, software engineers really feel that a particular book is instrumental
[00:42:14] Unknown:
in educating them. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this is one that I believe I've picked before, but it's worth calling out again, particularly given the context of this conversation, and it's the book Designing Data-Intensive Applications by Martin Kleppmann. It's, you know, sort of adjacent to the machine learning space in that it talks a lot about the data management aspect and being able to build the systems that provide access to all the data necessary to train these models, but it's just a very well written and well structured book for learning more about some of the principles that go into those types of systems. And so with that, I'll pass it to you, Andrew. Do you have any picks this week?

The weather has just been great around here. Last weekend, I was up in the mountains, and I plan to go to the mountains again this weekend. So my pick is enjoy the great weather and the outdoors.

Definitely always a good recommendation and something that we all need to be reminded of occasionally. Thank you again for taking the time today to join me and for the work that you're doing on the book. It's definitely an interesting problem domain, and I appreciate you helping to condense some of your knowledge into a form that software engineers are able to take advantage of, and for taking the time today to help me learn more about the problem space. So thank you again for all of your time and effort, and I hope you enjoy the rest of your day.

Thank you for inviting me.

Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Episode Overview
Guest Introduction: Andrew Ferlitsch
Deep Learning and Model Architecture
Transfer Learning and Numerical Stability
Challenges in Data and Model Deployment
Writing the Book: Goals and Audience
Tooling and Practices in Deep Learning
Identifying Opportunities for Machine Learning
Design Patterns and Best Practices
Production Challenges and ML Operations
Future Research and Industry Trends
Lessons Learned and Recommendations