Summary
Deep learning is a phrase used increasingly often as the technique continues to transform the standard approach to artificial intelligence and machine learning projects. Despite its ubiquity, it is often difficult to get a firm understanding of how it works and how it can be applied to a particular problem. In this episode Jon Krohn, author of Deep Learning Illustrated, shares the general concepts and useful applications of the technique, along with some of his practical experience in using it for his work. This is definitely a helpful episode for getting a better comprehension of the field of deep learning and knowing when to reach for it in your own projects.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host as usual is Tobias Macey and today I’m interviewing Jon Krohn about his recent book, Deep Learning Illustrated.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving a brief description of what we’re talking about when we say deep learning and how you got involved with the field?
- How does your background in neuroscience factor into your work on designing and building deep learning models?
- What are some of the ways that you leverage deep learning techniques in your work?
- What was your motivation for writing a book on the subject?
- How did the idea of including illustrations come about and what benefit do they provide as compared to other books on this topic?
- While planning the contents of the book what was your thought process for determining the appropriate level of depth to cover?
- How would you characterize the target audience and what level of familiarity and proficiency in employing deep learning do you wish them to have at the end of the book?
- How did you determine what to include and what to leave out of the book?
- The sequencing of the book follows a useful progression from general background to specific uses and problem domains. What were some of the biggest challenges in determining which domains to highlight and how deep in each subtopic to go?
- Because of the continually evolving nature of the field of deep learning and the associated tools, how have you guarded against obsolescence in the content and structure of the book?
- Which libraries did you focus on for your examples and what was your selection process?
- Now that it is published, is there anything that you would have done differently?
- One of the critiques of deep learning is that the models are generally single purpose. How much flexibility and code reuse is possible when trying to repurpose one model pipeline for a slightly different dataset or use case?
- I understand that deployment and maintenance of models in production environments is also difficult. What has been your experience in that regard, and what recommendations do you have for practitioners to reduce their complexity?
- What is involved in actually creating and using a deep learning model?
- Can you go over the different types of neurons and the decision making that is required when selecting the network topology?
- In terms of the actual development process, what are some useful practices for organizing the code and data that goes into a model, given the need for iterative experimentation to achieve desired levels of accuracy?
- What is your personal workflow when building and testing a new model for a new use case?
- What are some of the limitations of deep learning and cases where you would recommend against using it?
- What are you most excited for in the field of deep learning and its applications?
- What are you most concerned by?
- Do you have any parting words or closing advice for listeners and potential readers?
Keep In Touch
- Website
- @jonkrohnlearns on Twitter
- jonkrohn on GitHub
Picks
- Tobias
- Jon
- Data Elixir Newsletter
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Untapt
- Deep Learning Illustrated
- Pearson
- Columbia University
- New York City Data Science Academy
- NIH (National Institutes of Health)
- Oxford University
- Matlab
- R Language
- Neuroscience
- Artificial Neural Network
- Deep Learning
- Natural Language Processing
- Computer Vision
- Generative Adversarial Networks
- Deep Learning by Ian Goodfellow, et al.
- Hands On Machine Learning by Aurélien Géron
- O’Reilly Online Learning
- Transfer Learning
- Keras
- TensorFlow
- PyTorch
- Gary Marcus
- Judea Pearl
- Artificial General Intelligence
- Explainable AI
- Yuval Noah Harari
- Wait But Why?
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or you want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show.
And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in New York City. Go to pythonpodcast.com/conferences today to learn more about these and other events and take advantage of our partner discounts to save money when you register.
Your host as usual is Tobias Macey. And today, I'm interviewing Jon Krohn about his recent book, Deep Learning Illustrated.
[00:01:37] Unknown:
So, Jon, can you start by introducing yourself? Thank you very much, Tobias. It's an honor to be on the show. I am the chief data scientist at a machine learning company called Untapt, and our focus is automating aspects of business operations, particularly related to human resources and recruiting. So it's that kind of human side of things where we need to be building algorithms that carefully remove bias that might exist in the training data, for example. So that's my day job, but on the side I have been writing a book that just came out a couple of weeks ago called Deep Learning Illustrated, which was published by Pearson.
And that book is a product of me teaching deep learning in a number of different ways. So I've been running a deep learning study group community in New York for a number of years. I teach graduate electrical engineers at Columbia University every once in a while. And I also have my own curriculum, a 30-hour deep learning curriculum that I offer at a professional academy here in New York called the New York City Data Science Academy. So I'm wearing a lot of different hats, and I recently added one final thing to that, which is that, as of September 2019, I have a National Institutes of Health grant to work with medical researchers at Columbia University to automate aspects of diagnosing infant brain scans. So there are lots of different things going on, but generally the thread that ties them all together is deep learning. And do you remember how you first got introduced to Python?
Yes. I do remember being first introduced to Python. I was doing my PhD at Oxford University at the time, and I was working in MATLAB and R. And somebody who I respected a lot, a postdoc in my lab, came up to me and said, you know, there's really not any point in working in R anymore. Everything's moving over to Python, and
[00:03:51] Unknown:
that began my quest. And I noticed too that you have a background in neuroscience. So I'm curious how that has played into your overall work in deep learning.
[00:04:06] Unknown:
So while the data that I was working with were neuroscience data, so brain imaging data and genome data, genetic data, I was learning how to apply machine learning techniques as the primary focus of that PhD. It has been in the years since the PhD that these artificial neural networks, which form the basis of deep learning, started to become useful enough in a lot of applications, and that's related to compute becoming a lot cheaper in recent years and data storage becoming a lot cheaper in recent years. And so this deep neural network approach, which is inspired by the way that biological brain cells work, by the way that biological neural systems work, has started to become useful.
And so, because of that neuroscience background, I really took to learning about deep neural networks after my PhD. And wherever I can, I draw threads between the biological inspirations that are behind many of the innovations in neural network and deep learning research.
[00:05:38] Unknown:
And before we go too much further, can you just give your description
[00:05:42] Unknown:
of how you would define the term deep learning for somebody who's not familiar with it? That is a great question, Tobias. I'm glad you're asking it at this point. So deep learning is a very specific technique. While it gets used in the popular press as kind of a synonym for artificial intelligence, and while artificial intelligence is an almost impossible-to-define term, deep learning is a very specific term that can be defined quite concretely. Since the 1950s, computer scientists have been creating computer simulations, simple algorithms, inspired by the way that the biological brain and biological brain cells work. And so we call those algorithms artificial neurons.
Those artificial neurons can be linked together so that the output from one artificial neuron can form the input to several other artificial neurons, and in that way we can have a network of artificial neurons. In these artificial neural networks, you have an input layer that contains whatever the input to your model is, and you have an output layer that represents whatever prediction you're trying to make with your model. And then in between that input and that output, you have as many of what we call hidden layers of artificial neurons as you like. If you layer the network in this way and you have at least three hidden layers, so a total of five layers when you include the input and the output layer, you can call this a deep neural network or a deep learning network.
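To make that definition concrete, here's a minimal sketch of such a network using the Keras API that comes up later in the episode. The layer sizes and the 784-value input are illustrative assumptions, not an example taken from the book:

```python
from tensorflow import keras

# An input layer (here a flattened 28x28 image, so 784 values), three
# hidden layers of artificial neurons, and an output layer: five layers
# in total, which is what qualifies this network as "deep".
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(784,)),  # hidden layer 1
    keras.layers.Dense(64, activation='relu'),                      # hidden layer 2
    keras.layers.Dense(64, activation='relu'),                      # hidden layer 3
    keras.layers.Dense(10, activation='softmax'),                   # output layer: 10 class probabilities
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```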
[00:07:33] Unknown:
And you mentioned that the majority of the work that you're doing in your day job is around machine learning applications for these various use cases. And I'm wondering how you are leveraging deep learning techniques in your own work. Yeah. Great question.
[00:07:51] Unknown:
So that structure that I just described, having many layers of these artificial neurons, allows these deep learning networks to automatically extract the most important aspects of the data that you're inputting into your model for predicting whatever outcome you're trying to predict. To use a biological visual system analogy: if you build a machine vision algorithm with a deep learning network, then your input layer will have the pixels of an image as the input, and the output layer of your neural network might then be the class that that image corresponds to. So let's say you're building an image-classifying machine vision system that is designed to distinguish cats from dogs.
In that case, you might have a hundred images of dogs that you input and a hundred images of cats that you input, and you label all of those images as being either cats or dogs. So we're setting up our deep learning model so that it can learn to associate pixels that represent a cat with the label cat and pixels that represent a dog with the label dog. And so the hidden layers of this machine vision network automatically learn how to extract the most important information about those pixels in order to represent, or more specifically to distinguish, a cat from a dog. The first layer of artificial neurons in this many-layered artificial neural network will come to represent very, very simple aspects of the pixels: essentially just straight lines at particular orientations. So some of the artificial neurons in that first layer will represent vertical lines, some of them horizontal lines, and 45-degree angles and so on. And then the second layer of artificial neurons in this deep learning network can take in that information about straight-line detection, and those straight lines can be nonlinearly recombined so that the second layer of artificial neurons can detect curves and corners.
And then you can have a third layer after that that does even more complex abstraction on the curves and corners, and so on and so on. You can have many, many such layers of artificial neurons in your deep learning network, and each one, as you move deeper, can handle more complex, more abstract representations of the input data. And the really, really cool thing about deep learning models is that they are able to figure out what these important high-level abstract representations are fully automatically, from the training data alone. So you don't need to program any of that specifically.
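A hypothetical cat-versus-dog classifier along the lines Jon describes might look like the following Keras sketch, where successive convolutional layers learn the increasingly abstract features he mentions (the specific layer counts and sizes are assumptions for illustration):

```python
from tensorflow import keras

model = keras.Sequential([
    # Early layers learn simple features: edges at various orientations.
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(128, 128, 3)),
    keras.layers.MaxPooling2D(),
    # Deeper layers nonlinearly recombine those into curves and corners.
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    # Still deeper layers represent more abstract shapes and object parts.
    keras.layers.Conv2D(128, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),  # output: P(dog) vs. cat
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```

None of these feature detectors are programmed by hand; they emerge during training on the labeled cat and dog images.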
And so that's what's made deep learning models so popular suddenly: as we've had the compute power and the availability of data in the last few years to be training these relatively beefy models, they can then, on their own, extract all of these features from the raw data and solve all kinds of complex problems. From that visual analogy, you can imagine how this applies in my line of work, in my day job at Untapt, where we're concerned with various models related to human resources. A really common model is predicting the fit of a given job applicant for a particular job. We have clients, big corporate clients or recruitment agencies, that handle millions of applications a year to thousands of different roles. And instead of sifting through all of those applicants with, say, a Boolean keyword search, our model can rank all of the applicants, the million applicants that you had over the last year, for any one of the roles that you are hiring for. It does that based on the natural language of the job descriptions and the natural language of the applicants' resumes, and we've trained this up on hundreds of millions of decision data points, where a client, be that a hiring manager or a recruiter, has said: okay, based on this candidate profile and this job description, yes, I would like to speak to this candidate, or no, this candidate is not appropriate for this role. So by having this huge dataset and then a deep learning model that's taking in the natural language from the job descriptions and the resumes at one end, and this outcome that we're trying to predict (is this person a good fit or not a good fit for the role?) at the other, we have this deep learning architecture in the middle where the earliest levels can look for very simple aspects of the natural language, and as you move deeper and deeper into the network, we can model increasingly complex, increasingly abstract aspects of the natural language that is being used in the resumes and the job descriptions. And because of the way that that works, you could end up in a situation where two candidates who have no overlapping words whatsoever on their resumes could be the top two candidates for a given job description, because this deep learning hierarchy is able to distill, from individual words, the contextual, holistic meaning of an entire candidate profile.
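Untapt's production model is proprietary, but as a rough, generic sketch of the kind of two-input architecture Jon is describing (a resume and a job description in, a fit probability out), something along these lines is plausible; every layer choice and dimension here is hypothetical:

```python
from tensorflow import keras

vocab_size, seq_len = 20_000, 500  # assumed vocabulary and document lengths

resume_in = keras.Input(shape=(seq_len,), name='resume_tokens')
job_in = keras.Input(shape=(seq_len,), name='job_tokens')

# Shared word embeddings, with each document collapsed to a single vector.
embed = keras.layers.Embedding(vocab_size, 64)
pool = keras.layers.GlobalAveragePooling1D()
resume_vec = pool(embed(resume_in))
job_vec = pool(embed(job_in))

# Deeper layers model increasingly abstract aspects of the language,
# ending in the probability that the candidate fits the role.
merged = keras.layers.Concatenate()([resume_vec, job_vec])
hidden = keras.layers.Dense(64, activation='relu')(merged)
hidden = keras.layers.Dense(64, activation='relu')(hidden)
fit_prob = keras.layers.Dense(1, activation='sigmoid')(hidden)

model = keras.Model(inputs=[resume_in, job_in], outputs=fit_prob)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

Because the matching happens in this learned embedding space rather than on literal keywords, two resumes with no overlapping words can still score similarly for the same role.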
[00:13:41] Unknown:
So with your background in neuroscience, your practical applications of deep learning techniques, and your engagement in the education space helping to upskill people who are trying to understand how to use these same technologies for their own purposes, it seems like a natural progression to then write a book about it. But I'm wondering if you can talk a bit more about your motivations for doing that, some of the decision making that went into figuring out how to approach it, and how the idea of including illustrations came about.
[00:14:16] Unknown:
So through the deep learning study group in New York that I run and by teaching, say, at the New York City Data Science Academy, I developed a pretty good understanding of what topics needed to be covered in order to give somebody a wide-ranging education in deep learning. So this is covering the fundamentals of how deep learning works as well as the applications that people are most interested in, which are machine vision, natural language processing, this technique called generative adversarial networks that can create what appears to be artwork, and then these game-playing algorithms: deep reinforcement learning algorithms.
So I gradually became more and more familiar with this body of knowledge, and by teaching it to students, I started to understand where they were most easily able to understand the content and where things were tricky. And what I found, and this was actually something I've always done because I love teaching on whiteboards, is that drawing figures that represent concepts really helps. For a lot of people, an equation can be a lot easier to understand if I can draw it visually: how the matrices of data are being used and transformed, how these operations are happening, in a visual way. So that's always been kind of a natural thing to me, and it became clear to me through teaching that this is something that works for a large number of students. It's a way that they really take to learning this relatively complex content.
So at brunch one day, on a Sunday in New York, I was out with one of my best friends, who has been at Alphabet, working at Google and YouTube, for about 12 years. And his girlfriend at the time, now his wife, Aglaé Bassens, is a professional artist. And I pitched her this idea over brunch: you know, I think if we made this as a book, if we had an illustrated approach to learning about deep learning, this is something that a lot of people would really benefit from. What do you think about that? And perhaps because, through her now-husband, she had been exposed so much to machine learning techniques at Alphabet, she was immediately very interested. And she was an absolute joy to work with over the entire process.
So,
[00:16:58] Unknown:
yeah, that's how it all came about. And I'm curious if you can give a bit of a comparison to some of the other books that you've encountered on the subject of machine learning or deep learning, some of the benefits that you see your book providing comparatively, and some of the ways that the target audience that you're focusing on would gain better understanding or better value than from some of the other books they might pick up. Perfect. So that's not a question I've been asked before, and it's an interesting one, because
[00:17:31] Unknown:
all of the books that, I guess, my book, quote, unquote, competes with have some benefit relative to mine. There's always some kind of trade-off among all of the great books in deep learning. The seminal academic text in deep learning is called Deep Learning, and it's by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. These are academics between the University of Montreal and the Google Brain team, and they developed an academic textbook that covers the mathematical theory of deep learning very thoroughly.
However, it doesn't have any hands-on examples. So that book is all about learning the theory, and our book is completely distinguished from it by being focused on application. And while we do cover the essential parts of the theory, we do that in a way that is quite different from that MIT Press book. Where we have to use equations, they are in full color: any variables that are used in the book have consistent coloring throughout the entire book, and you see those colors replicated in the body of the text, in the equations, and in the illustrations.
And that kind of continuity in the color can make it easier to understand the underlying theory. And then, of course, having lots of hands-on applications means the book can have a lot more value to somebody who's a hands-on practitioner. Those hands-on examples can also give you a great sense of how these things work in practice in a way that just looking at the equations and understanding them might not. So that's one of our primary, quote, unquote, competitors in the space. The most popular book today in terms of hands-on applications is probably Aurélien Géron's Hands-On Machine Learning book. And that book, while introducing machine learning in general, also talks a lot about deep learning, which is a type of machine learning approach in particular.
And it is a really, really great book. His second edition is coming out shortly, and I was one of the technical reviewers of that second edition. It's no surprise that it's the most popular book in machine learning today, because it offers such a wide-ranging look at all of the kinds of machine learning approaches that are out there and is replete with hands-on examples. So that book is really great. Where we distinguish ourselves from it is, again, we are focused specifically on deep learning. So while that book is a general machine learning introduction, our book is very specifically about deep learning, and so we can go into it in more depth than Aurélien Géron had room for. And then again, of course, we have the colorful illustrations and the way that we tie together all of the variables in full color throughout the figures, the equations, and the body text. So it is a different kind of book.
To my knowledge, there is no book on the market that makes use of color for explaining any kind of mathematical or statistical theory in the way that we have. And that's kudos to my coauthor Grant Beyleveld, who had that insight. He suggested early on that wherever we have to include equations, it would be beneficial to the reader to have those be in full color. Yeah. It definitely helps to pick apart the
[00:21:24] Unknown:
most notable components of the equation rather than just seeing everything in plain black and white, because then it all just blends together, and it requires a lot more effort to parse it and understand how the different pieces interact with each other.
[00:21:25] Unknown:
Yeah. Exactly. That's the idea, Tobias. I'm glad that you see it that way as well. When you see an equation in the book and then, below it, an explanation of what the equation is, you can very quickly say: ah, there's the purple part of the equation, and here's that purple part in the text, and very quickly make that connection. And in some cases, on top of that, we include a figure that explains how these pieces fit together visually, so you can again see at a glance, across the figure, the equation, and the body of the text: this is the purple part. I'm glad that you see the value in that too.
[00:22:17] Unknown:
And for the target audience of the book, I'm curious how much background understanding of programming or statistics or machine learning is necessary, and to what level of facility you expect readers to get by the end of the book?
[00:22:37] Unknown:
So I deliberately designed the book so that the first four chapters, of fourteen, have no code and no equations. So the first four chapters of the book are intended for any kind of interested learner: anybody with an interest in how deep learning or artificial intelligence works who is interested in getting exposed to the range of applications that it has. So whether it's machine vision, natural language processing, creativity, or complex decision making, regardless of which of those you're interested in, or all of them, and just seeing what it means to have artificial intelligence today, where the field is going, and what's possible in your own field: anybody can get that from reading the first four chapters. In chapter 5, we begin introducing Python code, and then through the rest of the book, all the way through to chapter 14, there are examples in Python.
They are, especially in the earlier chapters, fairly straightforward. So if you have experience with any object-oriented programming language, not necessarily Python, then it should still be quite straightforward to see what's happening in these code examples. And I went to great lengths to make sure that I explained in detail every single line of code in the body of the text. So even if you're not already familiar with Python, or even if this is your first exposure to object-oriented programming, those thorough explanations should make it possible, though maybe not as easy as for somebody who does have Python experience, to follow along and see what's happening in these examples. So some Python experience, or at least experience with an object-oriented programming language, would definitely make taking in chapters 5 and onward easier. And it's the same kind of thing for machine learning or statistics experience.
So if you happen to have experience with statistics or with some other machine learning approaches, like regression modeling or support vector machines or random forests, or just the scikit-learn library in general, then the book will definitely be easier. But, again, I went to great lengths to make sure that I was explaining everything as clearly as I could, so that even if you didn't have experience in machine learning or statistics, you should be able to follow along at a high level. And then I provide lots of resources in footnotes, so that if there's something that you need to dive deeper on, you can do that on your own time. And one of the challenges
[00:25:15] Unknown:
that exists anytime somebody is trying to encapsulate a technical topic in printed form is timeliness: how do you guard against the information becoming obsolete as new techniques evolve, new libraries come about, and the libraries themselves evolve? And so I'm curious how you approached that particular problem, and what your selection process was for the technologies and techniques that you ultimately decided to incorporate.
[00:25:42] Unknown:
I love that question, Tobias. So that is a tricky one. Things move very quickly in the machine learning field, and I expect that there will be a second edition of this book coming in the next few years that will be updated to the latest libraries: the latest TensorFlow, PyTorch, or Keras, or whatever is the in-vogue deep learning library a couple of years from now. So the specifics of the particular packages that get used will definitely change. The nice thing about deep learning, though, is that the vast majority of the theory is quite old already.
So the theory around the artificial neurons that make up a deep learning network has been around since the 1950s and hasn't changed very much. And in terms of actually networking those artificial neurons together into a deep learning network, most of that theory was figured out in the eighties, a little bit in the nineties, and then in the early 2010s we had a few key breakthroughs. But those breakthroughs through the nineties and more recent years tack on to the earlier theory. And so in deep learning, at least, we're not seeing old theory wiped away entirely and replaced with a completely new approach to some theoretical concept.
Instead, what we've been seeing from the 1950s through to today is that we build upon existing theory. And so in that sense, I think the vast majority of the content in this book is future-proof, at least for a decade or so. It's possible that some completely different kind of approach will make deep learning obsolete in the coming years, but there are no signs of that yet. And so when I sit down to write the second edition a couple of years from now, I'm not going to need to rewrite all of the theory. Instead, I'll just be tacking on more of the new techniques and approaches that have come about in the intervening couple of years. And now that it has been published, I'm curious if there are any elements
[00:28:00] Unknown:
of the topics that you covered or the specifics of the code examples that you think you would have done differently or that you think might need updating in the near future? There isn't
[00:28:12] Unknown:
anything that I look at now that I feel would need a complete overhaul or that I wish had been done completely differently. The main thing that I look forward to being able to do as I sit down to write a second edition is add more. This book is already a fair bit more dense than the publisher, Pearson, was looking for: they were hoping for at least 250 pages, and the book came in at 416 pages. So it does have a ton of detailed content, but there's so much more that I would like to add. So there isn't really anything that I would like to do differently.
I just look forward to having the time to add in even more information, and that's also the kind of thing that we saw with Aurélien Géron's book, which I mentioned earlier: that first edition was already so comprehensive as an introduction to machine learning.
[00:29:12] Unknown:
But with his second edition, he was able to add in even more detail on so many different topics and make a much thicker book. So I look forward to being able to do that with my second edition as well. And you covered a few different problem domains where deep learning can be applied, such as natural language processing and computer vision, which you mentioned, and generative adversarial networks. So I'm wondering what your selection process was for the specific problem domains, and how you approached determining the sufficient level of depth to cover each one appropriately so that the reader could get a good understanding of it without getting overwhelmed.
[00:29:55] Unknown:
So the initial seed for what content went into the book was the content that we were covering in the deep learning study group that I run. So at the end of every study group session, our final agenda item was always: all right, now let's talk about what else we should be learning. What should we be learning for next time, or what should we be putting on the list for learning at some point? And these particular applications, computer vision, natural language processing, generative adversarial networks, and deep reinforcement learning for complex sequential decision making, stood out as clearly the most important areas. So that's how I came up with the initial list of high-level topics to cover in the book. And then, for every one of those topics, there is at least one very deep and detailed hands-on code example. For some of these techniques, like generative adversarial networks or deep reinforcement learning, which are the two most complex topics covered in the book, having one thorough code notebook example and covering that from beginning to end was more than enough material for the reader, in my opinion. The other topics, machine vision and natural language processing, are topics with a lot of different things that we can be doing in them. So with machine vision, we can be classifying images as being in a particular category, or we can be segmenting images, pixel by pixel, into the different elements of the image.
In natural language processing, there's a huge variety of tasks that could be handled: classifying documents, auto-generating content, translation between languages, chatbots. And some of those start to get way too complex to cover in this kind of overview book. Things like machine translation or chatbots would need whole chapters to cover properly, although everything that we cover in the book serves as a great foundation for those applications. And so with the machine vision and natural language processing topics, what I did was include several complete, thorough examples of intermediate-complexity topics, and then say: hey, if you're interested in these even more complex topics, here are a few paragraphs that summarize what's possible today, and here are links to the key papers and GitHub repositories so that you can go off and learn about those things on your own. And the natural language processing topic in particular, because that is what I do at my day job at Untapt, is of particular interest to me. I've also noted that it's of particular interest to readers, because I teach online on O'Reilly Safari twice a month, doing a three-hour tutorial, and I give lectures around New York at various meetups and conferences. At the end of each of those, and some of these venues have hundreds of people in the audience, I ask: okay, what are you most interested in learning about next? Is it machine vision? And some hands go up. Is it generative adversarial networks? Some hands go up. Is it deep reinforcement learning? Some hands go up. But when I ask, is it natural language processing, or is it handling time series information (because natural language is just an example of time series data: whether it's words on a page or audio of speech, it flows in one dimension over time), it's that topic where you see a huge number of hands go up. So I know that that's of huge interest. And so my next book, actually, and I have a verbal agreement with Pearson on this already; they're awaiting my full proposal, is going to be focused
[00:33:47] Unknown:
entirely on natural language processing. And so that will give me the opportunity to expand more on that particular topic. And one of the other things that I liked in the way that you structured the book is that at the tail end of it, you have some examples of what else you can do, ways that you can continue your learning, and some different project ideas or categories that the reader can engage with. And I especially liked the fact that you were encouraging people to do things that will have a beneficial social impact, with some resources for them to be able to find ideas for that and engage with different organizations that would benefit from that technical acumen. I'm really glad that you enjoyed that part of it. For me, that was a really important
[00:34:29] Unknown:
chapter to write. And the ideas behind that final chapter were spurred largely by my experience teaching this content at the New York City Data Science Academy. It's this 30-hour curriculum that I do over five Saturdays, and this textbook really is the accompanying content to those lectures and exercises that I do over those 30 hours at the academy. I knew from doing that teaching that what students want is not just to be able to go through the examples that you've done in class. People want to be able to devise their own projects. They want to be creative with deep learning. They want to be able to apply deep learning to their particular field of interest. And so that final chapter comes out of my experience mentoring students on developing their own deep learning projects. A big part of that 30-hour course, from the very first week, is that I say: you know, you don't have to do your own self-guided project, but it will really help you cement the ideas that we're covering in this course, so I highly recommend that you do. And from the very first week, I have a framework for initially ideating and then later concretizing a particular project and executing on it over the duration of the course. So that final chapter outlines that process: okay, if you don't have a particularly creative idea of something that you'd like to do, here are some relatively easy ideas and off-the-shelf datasets you can use. If you want to be doing something with your own data, here are some tips for doing that. If you want to be exploring more complex datasets or scraping your own datasets off the web, here are some resources for doing that. And then there's the final piece about the kind of social impact that you can have. I didn't have to include that, and I'm not aware of many other textbooks that make that kind of social impact summary or recommendation at the end. But for me, as a relatively young person at this time in the history of our planet: we have terrific opportunity in so many quantitative ways. Life has never been better on this planet, for humans at least, in terms of lifespan and quality of life. We live today in a way that kings a century ago couldn't have imagined. So on the one hand, we should definitely be happy and positive about where we are in the world, but there's also a lot of uncertainty about where we are in the world.
There are far more people on this planet than there ever have been in history, and each one of those people is constantly demanding more and more energy and resources. And so the burden that we are placing on the ecosystem of our planet is tremendous, and it looks like it's going to become more and more so. Machine learning combined with the Internet of Things, the prevalence and cheapness of sensors being everywhere, has, in my view, the potential to allow us to continue the wonderful trend that we've had over the last 150 years toward prolonging human life and making human life more satisfying than ever before, while at the same time allowing us to coexist peacefully and indefinitely on this planet. So that's a bit of the inspiration behind suggesting that people tackle social impact projects, and I include resources in there: if you're looking for something to do with your time or your machine learning skills, here are some serious problems that we're facing today that could be worth focusing your attention on. And
[00:38:32] Unknown:
one of the things that I'm curious about, in terms of people coming out of this book or your courseware with the fundamentals of being able to build these neural networks and having built out some sample models, is the ability to repurpose some of that same code or some of the model pipelines for different applications or different datasets. Because my understanding is that one of the critiques of deep learning is that it is largely single-purpose: once you build a model, it is great at that particular use case, but it is generally difficult to repurpose it to a slightly different context. And I'm wondering what your experience has been with that, and what recommendations you have for practitioners and engineers to make it easier to componentize the model pipeline and make it more reusable and more flexible. Outstanding question, Tobias. So neural networks are interesting
[00:39:31] Unknown:
in that they are actually highly flexible and can be retrained for particular tasks. So while any given deep learning network at any given point in time might be very highly specialized to a particular task, you can use what we call transfer learning to take that existing network and repurpose it to some related task quite effectively. In chapter 10 of the book, we go over a machine vision example where we take a very deep neural network that was trained on a huge dataset of millions of images called ImageNet. It would be very expensive computationally, taking weeks on a high-end deep learning server with GPUs, to train up such a deep machine vision model on such a large dataset. But with modern deep learning frameworks, including the Keras API in TensorFlow, you can trivially, in a line of code, load in that deep, very nuanced machine vision model and then adapt it to your own particular use case. In Deep Learning Illustrated, we use transfer learning to distinguish images of hot dogs from images that are not hot dogs, so other types of fast food. That's a funny idea inspired by the HBO Silicon Valley series, where one of the characters on that show builds a hot dog / not hot dog detector. So in that sense, deep learning models are quite flexible. Now, of course, you can't take a model that was built for a machine vision task, one that reads in pixels and outputs whether something is a hot dog or not, and have it read in resumes and job descriptions and predict whether a given person is a good fit for a given role. So there's a limit to what this transfer learning can accomplish. But overall I don't agree with the point that deep learning models are fixed or not usable for other purposes; they actually are quite easily repurposed to related kinds of tasks, and this kind of transfer learning can be a very powerful thing to do. And in fact, to give one final example from the work that we do here: the only model that we've applied for a patent on is this job-to-candidate matching algorithm, which we've trained on hundreds of millions of data points. But many of our clients are interested in other human resources or recruiting related models. And so often what we do is take that starting point, this beefy model trained on a huge dataset, and repurpose parts of it for other human resources related tasks, like matching a candidate to a pool of other candidates, for example.
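To make that concrete, here's a minimal transfer learning sketch in the spirit of the example Jon describes. It uses the Keras API, but the specific base network (VGG19), input size, and head layers are illustrative assumptions, not the book's exact code:

```python
from tensorflow import keras

# Load a convolutional network pre-trained on ImageNet, minus its
# classification head (the book's chapter 10 example may use a
# different base model or input size).
base = keras.applications.VGG19(weights='imagenet', include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # freeze the expensive-to-train feature extractor

# Attach a small new head for the binary hot dog / not hot dog task.
model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),  # P(hot dog)
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
# Calling model.fit(...) now only trains the new head on the small
# hot dog dataset, reusing the ImageNet-learned visual features.
```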
[00:42:34] Unknown:
And another issue that I've seen identified with machine learning in general, but particularly with deep learning, is the fact that it is excellent at identifying correlations, but that it is difficult to do causal inference about the results that it produces. And I'm wondering what your thoughts are on that, and whether there are any useful references that you might be able to point people to to dig deeper on that particular problem space. It is definitely
[00:43:00] Unknown:
a shortcoming of deep learning models today, and indeed of most statistical and machine learning science: the vast majority of techniques that exist for modeling data, whether in statistics, deep learning, or some other machine learning process, are great at identifying correlations but have little, usually no, capacity to say anything about causal direction. That isn't the case with everything, and actually a big part of my PhD was using particular machine learning models and Bayesian statistical models to infer causality, which in the case of my PhD research was in some cases relatively straightforward. For example, if you find a correlation between a gene and some behavior, we know that genomes are fixed over a person's lifespan, except for random mutations. So there's no way the causal direction could run from somebody being anxious to their genetic sequence changing to match the profile of somebody who's anxious. So there are some problem spaces where you can define causal direction based on your knowledge of the data.
But, of course, the model itself is not aware of that underlying understanding. So, Gary Marcus is a researcher at New York University who has called out a lot of shortcomings of deep learning models today, and one of the big ones is this inability to infer causality. I don't actually have specific resources on how to resolve that. Gary Marcus might, and I cite a Gary Marcus paper in chapter 14 of my textbook that could probably point you in the direction of resources on pursuing models that incorporate more causality. One resource I can point people to is an author named Judea Pearl, j-u-d-e-a Pearl, like pearls from an oyster. Judea Pearl has written extensively on causality, and he has several books on it, including one called Causality, and that might provide people with some techniques for identifying causal direction in the data that they're working with. But yes, deep learning models as they typically stand have no capacity to infer causal direction, and that is one of the shortcomings that we'll have to overcome, as Gary Marcus himself points out, in order to bridge the gap from the narrowly defined artificial intelligence systems that we have today, which are, say, able to identify what's in an image accurately, to a general intelligence that is more like the broad intellectual capacities that you and I have as human beings. So there's a huge amount of work required in that space, and I imagine there are going to be tens of thousands of deep learning engineers
[00:46:19] Unknown:
over the coming decades tackling that problem. And what are some of the other limitations of deep learning as a particular practice and some of the cases where you would recommend against using it? Another really great question, Tobias. So in the vein of
[00:46:35] Unknown:
causality being a difficult thing for deep learning algorithms to identify, in the same way it is often difficult to explain why a deep learning model has made a particular decision. If you use a linear regression model to solve a particular problem, your regression model might have a dozen inputs, and each of those dozen inputs is associated with a very specific weight. For any prediction that the regression model makes, we can look at those weights and say: the reason why this linear regression model made this prediction is because of factors x, y, and z, and those factors are weighted by these very specific amounts that the linear regression model has identified. In deep learning, because we can have millions or even billions of parameters in our networks, it can be difficult to put your finger on exactly why it produced some particular output. And so some people talk about deep learning models as being a black box because of that. Now, there is quite a bit of research being done on explainable AI, quote, unquote, which gives us some insight into what is happening in the black box. And because of my work at Untapt, building these human resources models, this is something that all of us here are very familiar with. When you're building a model that can recommend a particular individual for a particular role, it's absolutely imperative to our clients that they know that that isn't happening based on some demographic factor like gender, age, or race. And so it is possible to begin to distill the important parts.
So if you're building a model to predict the applicability of a given candidate to a given role, for example, one thing that we've done is say: okay, we have this big pool of female applicants for a role and a big pool of male applicants for a role. What are their relative scores? How did they score on average for this role? And we see that, with our model and the modeling process that we followed, regardless of gender, the distribution of probabilities looks identical. So although we might not understand how every single one of the millions of neurons in our artificial neural networks behaves in order to produce the outcome, we can be mindful about what training data we're using to train the network and remove bias or problematic data from the inputs. And then, after we've done training, we can do these kinds of tests, like the one I just described, to make sure that males and females are getting the same scores for a given role, to ensure that the precautions we took have been effective in preventing bias. So even though there's a black box, we understand its behavior sufficiently that we're comfortable using it.
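As a minimal sketch of the kind of post-training check Jon describes, comparing score distributions across groups, here's one way it could look. The data here is simulated; in practice the scores would come from the trained model's predictions for real applicant pools:

```python
import numpy as np
from scipy import stats

# Simulated model scores for two applicant pools for the same role; in
# practice these would be model.predict() outputs for each group.
rng = np.random.default_rng(42)
scores_female = rng.beta(2, 5, size=5_000)
scores_male = rng.beta(2, 5, size=5_000)

# Compare the two score distributions: group means plus a two-sample
# Kolmogorov-Smirnov test of whether the distributions differ.
print(f"mean score (female applicants): {scores_female.mean():.3f}")
print(f"mean score (male applicants):   {scores_male.mean():.3f}")
ks_stat, p_value = stats.ks_2samp(scores_female, scores_male)
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")

# Near-identical distributions (small KS statistic, large p-value) are
# consistent with the check described above; a clear difference would
# flag potential demographic bias for further investigation.
```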
[00:49:51] Unknown:
And in terms of the ongoing research and applications of deep learning and the direction that the field is going, I'm wondering what you're most excited by and what you're personally concerned by.
[00:49:53] Unknown:
So it's easy for me to answer the most-excited-by question, because I kind of already answered that earlier on in this podcast: the thing that most excites me about deep learning is its capacity to allow us to continue to make exponential strides in human quality of life, and maybe even quality of life for other animals on the planet, while simultaneously avoiding an ecosystem catastrophe. So that potential for machine learning techniques, and particularly deep learning techniques, is what I'm most excited about.
Beyond those applications, which I can foresee being possible over the coming decades, there is this possibility, and it's only a theoretical possibility, that we can engineer an intelligent system that is as intelligent as or more intelligent than a person or any group of people. And that's potentially exciting. People call that the singularity; Ray Kurzweil, I think, coined that term to refer to the moment when we build a machine that is more intelligent than humans. And that's kind of exciting in a way too, although it brings about a huge amount of fear as well, because we have no idea how such a system would treat humankind or how we would interact with it. It's impossible for us to even imagine. There are some great resources if you're interested in thinking about that problem. Yuval Noah Harari, who is most famous for his book Sapiens, also has a great book called Homo Deus, a Latin term he coined: if Homo sapiens is thinking man, then Homo Deus is god man. A big part of that book is about what could happen and how we might be treated by an intelligent life form on this Earth, created by us, that is much, much more intelligent than we are. That's kind of a dense read, though. If you're interested in a relatively quick introduction to that topic, Tim Urban, who writes the blog Wait But Why, and who really kindly gave us an endorsement for Deep Learning Illustrated that appears on the back of the book and inside the front cover.
He does a great long-form series of blog posts, two posts that cover what artificial intelligence is today and what could happen as we approach or go past that singularity. So in the short term, over the coming decades, the thing that I'm most excited about is the capacity for machine learning, deep learning, and the Internet of Things to make life on this planet more wonderful and peaceful than ever before, continuing the trends that we've seen over the past century. And then in the longer term, I'm in equal measures excited and afraid of what could happen
[00:53:06] Unknown:
if the singularity happens. Are there any other aspects of the field of deep learning or your work on the book that we didn't discuss yet that you'd like to cover before we close out the show or any other parting words that you'd like to,
[00:53:18] Unknown:
give to the listeners and potential readers? That's a great opening, Tobias. Nothing really comes to mind, actually. I got to talk about so many of the things that excite me most about deep learning in our conversation today, and I even got to talk about the social impact concepts a couple of times. So I'm really satisfied, Tobias. I really enjoyed this podcast today, and I hope your listeners
[00:53:42] Unknown:
take away some interesting tidbits from it. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to choose a website called Spurious Correlations. It's just a list of different datasets that correlate but are obviously not causally related. One of the examples is that the divorce rate in Maine correlates with the per capita consumption of margarine. So it's just a list of charts with hilarious correlations that are obviously not causally related, as a sort of warning not to read too much into the fact that two datasets happen to relate to each other, and to give you a second to think twice about that. And so with that, I'll pass it to you, Jon. Do you have any picks this week? That is a really fun website. I've come across that before.
[00:54:32] Unknown:
A recommendation I have, that I use primarily to keep up with innovation in data science and deep learning in particular, is a great newsletter called Data Elixir. Data Elixir is a one-man newsletter that has between half a dozen and a dozen articles in it each week. And I have never come across anything that so
[00:55:00] Unknown:
succinctly captures all of the major events that you need to keep an eye on in the world of data. Well, thank you very much for taking the time today to join me and discuss your work on the book and your experience both working in and teaching deep learning. It's definitely a fascinating field, and I've enjoyed the time I have spent with the book, and I definitely plan to read it in its entirety. So thank you for all of your efforts on that front, and I hope you enjoy the rest of your day. Awesome, Tobias. It's great to hear that, and it's been an absolute pleasure being on your show. I have never come across such a thoughtful
[00:55:34] Unknown:
and thorough list of questions. So thank you very much for the time.
[00:55:39] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it: email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and
[00:56:10] Unknown:
coworkers.
Introduction and Episode Overview
Guest Introduction: Jon Krohn
Jon's Background and Career in Deep Learning
Defining Deep Learning
Applications of Deep Learning in Business
Motivations for Writing 'Deep Learning Illustrated'
Comparison with Other Deep Learning Books
Target Audience and Learning Path
Timeliness and Future-Proofing the Book
Reflections on the Book and Future Editions
Selecting Problem Domains for the Book
Encouraging Social Impact Projects
Repurposing Deep Learning Models
Challenges in Causal Inference
Limitations and Explainability in Deep Learning
Future Directions and Concerns in Deep Learning
Closing Thoughts and Contact Information