Summary
Deep learning is a phrase used increasingly often as the technique continues to transform the standard approach to artificial intelligence and machine learning projects. Despite its ubiquity, it is often difficult to get a firm understanding of how it works and how it can be applied to a particular problem. In this episode Jon Krohn, author of Deep Learning Illustrated, shares the general concepts and useful applications of the technique, along with some of his practical experience in using it for his work. This is definitely a helpful episode for getting a better comprehension of the field of deep learning and knowing when to reach for it in your own projects.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host as usual is Tobias Macey and today I’m interviewing Jon Krohn about his recent book, Deep Learning Illustrated.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving a brief description of what we’re talking about when we say deep learning and how you got involved with the field?
- How does your background in neuroscience factor into your work on designing and building deep learning models?
- What are some of the ways that you leverage deep learning techniques in your work?
- What was your motivation for writing a book on the subject?
- How did the idea of including illustrations come about and what benefit do they provide as compared to other books on this topic?
- While planning the contents of the book what was your thought process for determining the appropriate level of depth to cover?
- How would you characterize the target audience and what level of familiarity and proficiency in employing deep learning do you wish them to have at the end of the book?
- How did you determine what to include and what to leave out of the book?
- The sequencing of the book follows a useful progression from general background to specific uses and problem domains. What were some of the biggest challenges in determining which domains to highlight and how deep in each subtopic to go?
- Because of the continually evolving nature of the field of deep learning and the associated tools, how have you guarded against obsolescence in the content and structure of the book?
- Which libraries did you focus on for your examples and what was your selection process?
- Now that it is published, is there anything that you would have done differently?
- One of the critiques of deep learning is that the models are generally single purpose. How much flexibility and code reuse is possible when trying to repurpose one model pipeline for a slightly different dataset or use case?
- I understand that deployment and maintenance of models in production environments is also difficult. What has been your experience in that regard, and what recommendations do you have for practitioners to reduce their complexity?
- What is involved in actually creating and using a deep learning model?
- Can you go over the different types of neurons and the decision making that is required when selecting the network topology?
- In terms of the actual development process, what are some useful practices for organizing the code and data that goes into a model, given the need for iterative experimentation to achieve desired levels of accuracy?
- What is your personal workflow when building and testing a new model for a new use case?
- What are some of the limitations of deep learning and cases where you would recommend against using it?
- What are you most excited for in the field of deep learning and its applications?
- What are you most concerned by?
- Do you have any parting words or closing advice for listeners and potential readers?
Keep In Touch
- Website
- @jonkrohnlearns on Twitter
- jonkrohn on GitHub
Picks
- Tobias
- Jon
- Data Elixir Newsletter
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Untapt
- Deep Learning Illustrated
- Pearson
- Columbia University
- New York City Data Science Academy
- NIH (National Institutes of Health)
- Oxford University
- Matlab
- R Language
- Neuroscience
- Artificial Neural Network
- Deep Learning
- Natural Language Processing
- Computer Vision
- Generative Adversarial Networks
- Deep Learning by Ian Goodfellow, et al.
- Hands On Machine Learning by Aurélien Géron
- O’Reilly Online Learning
- Transfer Learning
- Keras
- TensorFlow
- PyTorch
- Gary Marcus
- Judea Pearl
- Artificial General Intelligence
- Explainable AI
- Yuval Noah Harari
- Wait But Why?
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or you want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show.
And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in New York City. Go to pythonpodcast.com/conferences today to learn more about these and other events and take advantage of our partner discounts to save money when you register.
Your host as usual is Tobias Macey. And today, I'm interviewing Jon Krohn about his recent book, Deep Learning Illustrated.
[00:01:37] Unknown:
So, Jon, can you start by introducing yourself? Thank you very much, Tobias. It's an honor to be on the show. I am the chief data scientist at a machine learning company called Untapt, and our focus is automating aspects of business operations, particularly related to human resources and recruiting. So it's that kind of human side of things where we need to be building algorithms that carefully remove bias that might exist in the training data, for example. So that's my day job, but on the side I have been writing a book that just came out a couple of weeks ago called Deep Learning Illustrated, which was published by Pearson.
And that book is a product of me teaching deep learning in a number of different ways. So I've been running a deep learning study group community in New York for a number of years. I teach graduate electrical engineers at Columbia University every once in a while. And I also have my own curriculum, a 30-hour deep learning curriculum that I offer at a professional academy here in New York called the New York City Data Science Academy. So I'm wearing a lot of different hats, and I recently added one final thing to that, which is that, as of September 2019, I have a National Institutes of Health grant to work with medical researchers at Columbia University to automate aspects of diagnosing infant brain scans. So there are lots of different things going on, but generally the thread that ties them all together is deep learning. And do you remember how you first got introduced to Python?
Yes. I do remember being first introduced to Python. I was doing my PhD at Oxford University at the time, and I was working in MATLAB and R. And somebody who I respected a lot, a postdoc in my lab, came up to me and said, you know, there's really not any point in working in R anymore. Everything's moving over to Python, and
[00:03:51] Unknown:
that began my quest. And I noticed too that you have a background in neuroscience. So I'm curious how that has played into your overall work in deep learning.
[00:04:06] Unknown:
So while the data that I was working with were neuroscience data, so brain imaging data and genome data, genetic data, I was learning how to apply machine learning techniques as the primary focus of that PhD. It has been in the years since the PhD that these artificial neural networks, which form the basis of deep learning, started to become useful enough in a lot of applications, and that's related to compute becoming a lot cheaper in recent years and data storage becoming a lot cheaper in recent years. And so this deep neural network approach, which is inspired by the way that biological brain cells work, by the way that biological neural systems work, has started to become useful.
And so, because of that neuroscience background, I really took to learning about deep neural networks after my PhD. And wherever I can, I draw threads between the biological inspirations that are behind many of the innovations in neural network and deep learning research.
[00:05:38] Unknown:
And before we go too much further, can you just give your description
[00:05:42] Unknown:
of how you would define the term deep learning for somebody who's not familiar with it? That is a great question, Tobias. I'm glad you're asking it at this point. So deep learning is a very specific technique. While it gets used in the popular press as kind of a synonym for artificial intelligence, and while artificial intelligence is an almost impossible-to-define term, deep learning is a very specific term that can be defined quite concretely. Since the 1950s, computer scientists have been creating computer simulations, simple algorithms, inspired by the way that the biological brain and biological brain cells work. And so we call those algorithms artificial neurons.
Those artificial neurons can be linked together so that the output from one artificial neuron can form the input to several other artificial neurons, and in that way we can have a network of artificial neurons. In these artificial neural networks, you have an input layer that contains whatever the input to your model is, and you have an output layer that represents whatever prediction you're trying to make with your model. And then in between that input and that output, you have as many of what we call hidden layers of artificial neurons as you like. If you layer the network in this way and you have at least three hidden layers, so a total of five layers when you include the input and the output layer, you can call this a deep neural network or a deep learning network.
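To make that definition concrete, here's a minimal sketch of such a network using the Keras API that comes up later in the episode. The layer sizes and the 784-value input are illustrative assumptions, not an example taken from the book:

```python
from tensorflow import keras

# An input layer (here a flattened 28x28 image, so 784 values), three
# hidden layers of artificial neurons, and an output layer: five layers
# in total, which is what qualifies this network as "deep".
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(784,)),  # hidden layer 1
    keras.layers.Dense(64, activation='relu'),                      # hidden layer 2
    keras.layers.Dense(64, activation='relu'),                      # hidden layer 3
    keras.layers.Dense(10, activation='softmax'),                   # output layer: 10 class probabilities
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```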
[00:07:33] Unknown:
And you mentioned that the majority of the work that you're doing in your day job is around machine learning applications for these various use cases. And I'm wondering how you are leveraging deep learning techniques in your own work. Yeah. Great question.
[00:07:51] Unknown:
So that structure that I just described, having many layers of these artificial neurons, allows these deep learning networks to automatically extract the most important aspects of the data that you're inputting into your model for predicting whatever outcome you're trying to predict. To use a biological visual system analogy: if you build a machine vision algorithm with a deep learning network, then your input layer will have the pixels of an image as the input, and the output layer of your neural network might then be the class that that image corresponds to. So let's say you're building an image-classifying machine vision system that is designed to distinguish cats from dogs.
In that case, you might have a hundred images of dogs that you input and a hundred images of cats that you input, and you label all of those images as being either cats or dogs. So we're setting up our deep learning model so that it can learn to associate pixels that represent a cat with the label cat and pixels that represent a dog with the label dog. And so the hidden layers of this machine vision network automatically learn how to extract the most important information about those pixels in order to represent, or more specifically to distinguish, a cat from a dog. The first layer of artificial neurons in this many-layered artificial neural network will come to represent very, very simple aspects of the pixels: essentially just straight lines at particular orientations. So some of the artificial neurons in that first layer will represent vertical lines, some of them horizontal lines, and 45-degree angles and so on. And then the second layer of artificial neurons in this deep learning network can take in that information about straight-line detection, and those straight lines can be nonlinearly recombined so that the second layer of artificial neurons can detect curves and corners.
And then you can have a third layer after that that does even more complex abstraction on the curves and corners, and so on and so on. You can have many, many such layers of artificial neurons in your deep learning network, and each one, as you move deeper, can handle more complex, more abstract representations of the input data. And the really, really cool thing about deep learning models is that they are able to figure out what these important high-level abstract representations are fully automatically, from the training data alone. So you don't need to program any of that specifically.
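A hypothetical cat-versus-dog classifier along the lines Jon describes might look like the following Keras sketch, where successive convolutional layers learn the increasingly abstract features he mentions (the specific layer counts and sizes are assumptions for illustration):

```python
from tensorflow import keras

model = keras.Sequential([
    # Early layers learn simple features: edges at various orientations.
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(128, 128, 3)),
    keras.layers.MaxPooling2D(),
    # Deeper layers nonlinearly recombine those into curves and corners.
    keras.layers.Conv2D(64, 3, activation='relu'),
    keras.layers.MaxPooling2D(),
    # Still deeper layers represent more abstract shapes and object parts.
    keras.layers.Conv2D(128, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),  # output: P(dog) vs. cat
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```

None of these feature detectors are programmed by hand; they emerge during training on the labeled cat and dog images.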
And so that's what's made deep learning models so popular suddenly: as we've had the compute power and the availability of data in the last few years to be training these relatively beefy models, they can then, on their own, extract all of these features from the raw data and solve all kinds of complex problems. From that visual analogy, you can imagine how this applies in my line of work, in my day job at Untapt, where we're concerned with various models related to human resources. A really common model is predicting the fit of a given job applicant for a particular job. We have clients, big corporate clients or recruitment agencies, that handle millions of applications a year to thousands of different roles. And instead of sifting through all of those applicants with, say, a Boolean keyword search, our model can rank all of the applicants, the million applicants that you had over the last year, for any one of the roles that you are hiring for. It does that based on the natural language of the job descriptions and the natural language of the applicants' resumes, and we've trained this up on hundreds of millions of decision data points, where a client, be that a hiring manager or a recruiter, has said: okay, based on this candidate profile and this job description, yes, I would like to speak to this candidate, or no, this candidate is not appropriate for this role. So by having this huge dataset and then a deep learning model that's taking in the natural language from the job descriptions and the resumes at one end, and this outcome that we're trying to predict (is this person a good fit or not a good fit for the role?) at the other, we have this deep learning architecture in the middle where the earliest levels can look for very simple aspects of the natural language, and as you move deeper and deeper into the network, we can model increasingly complex, increasingly abstract aspects of the natural language that is being used in the resumes and the job descriptions. And because of the way that that works, you could end up in a situation where two candidates who have no overlapping words whatsoever on their resumes could be the top two candidates for a given job description, because this deep learning hierarchy is able to distill, from individual words, the contextual, holistic meaning of an entire candidate profile.
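Untapt's production model is proprietary, but as a rough, generic sketch of the kind of two-input architecture Jon is describing (a resume and a job description in, a fit probability out), something along these lines is plausible; every layer choice and dimension here is hypothetical:

```python
from tensorflow import keras

vocab_size, seq_len = 20_000, 500  # assumed vocabulary and document lengths

resume_in = keras.Input(shape=(seq_len,), name='resume_tokens')
job_in = keras.Input(shape=(seq_len,), name='job_tokens')

# Shared word embeddings, with each document collapsed to a single vector.
embed = keras.layers.Embedding(vocab_size, 64)
pool = keras.layers.GlobalAveragePooling1D()
resume_vec = pool(embed(resume_in))
job_vec = pool(embed(job_in))

# Deeper layers model increasingly abstract aspects of the language,
# ending in the probability that the candidate fits the role.
merged = keras.layers.Concatenate()([resume_vec, job_vec])
hidden = keras.layers.Dense(64, activation='relu')(merged)
hidden = keras.layers.Dense(64, activation='relu')(hidden)
fit_prob = keras.layers.Dense(1, activation='sigmoid')(hidden)

model = keras.Model(inputs=[resume_in, job_in], outputs=fit_prob)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

Because the matching happens in this learned embedding space rather than on literal keywords, two resumes with no overlapping words can still score similarly for the same role.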
[00:13:41] Unknown:
So with your background in neuroscience, your practical applications of deep learning techniques, and your engagement in the education space helping to upskill people who are trying to understand how to use these same technologies for their own purposes, it seems like a natural progression to then write a book about it. But I'm wondering if you can talk a bit more about your motivations for doing that, some of the decision making that went into figuring out how to approach it, and how the idea of including illustrations came about.
[00:14:16] Unknown:
So through the deep learning study group in New York that I run and by teaching, say, at the New York City Data Science Academy, I developed a pretty good understanding of what topics needed to be covered in order to give somebody a wide-ranging education in deep learning. So this is covering the fundamentals of how deep learning works as well as the applications that people are most interested in, which are machine vision, natural language processing, this technique called generative adversarial networks that can create what appears to be artwork, and then these game-playing algorithms: deep reinforcement learning algorithms.
So I gradually became more and more familiar with this body of knowledge, and by teaching it to students, I started to understand where they were most easily able to understand the content and where things were tricky. And what I found, and this was actually something I've always done because I love teaching on whiteboards, is that drawing figures that represent concepts really helps. For a lot of people, an equation can be a lot easier to understand if I can draw it visually: how the matrices of data are being used and transformed, how these operations are happening, in a visual way. So that's always been kind of a natural thing to me, and it became clear to me through teaching that this is something that works for a large number of students. It's a way that they really take to learning this relatively complex content.
So at brunch one day, on a Sunday in New York, I was out with one of my best friends, who has been at Alphabet, working at Google and YouTube, for about 12 years. And his girlfriend at the time, now his wife, Aglaé Bassens, is a professional artist. And I pitched her this idea over brunch: you know, I think if we made this as a book, if we had an illustrated approach to learning about deep learning, this is something that a lot of people would really benefit from. What do you think about that? And perhaps because, through her now-husband, she had been exposed so much to machine learning techniques at Alphabet, she was immediately very interested. And she was an absolute joy to work with over the entire process.
So,
[00:16:58] Unknown:
yeah, that's how it all came about. And I'm curious if you can give a bit of a comparison to some of the other books that you've encountered on the subject of machine learning or deep learning, some of the benefits that you see your book providing comparatively, and some of the ways that the target audience that you're focusing on would gain better understanding or better value than from some of the other books they might pick up. Perfect. So that's not a question I've been asked before, and it's an interesting one, because
[00:17:31] Unknown:
all of the books that, I guess, my book, quote, unquote, competes with have some benefit relative to mine. There's always some kind of trade-off among all of the great books in deep learning. The seminal academic text in deep learning is called Deep Learning, and it's by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. These are academics between the University of Montreal and the Google Brain team, and they developed an academic textbook that covers the mathematical theory of deep learning very thoroughly.
However, it doesn't have any hands-on examples. So that book is all about learning the theory, and our book is completely distinguished from it by being focused on application. And while we do cover the essential parts of the theory, we do that in a way that is quite different from that MIT Press book. Where we have to use equations, they are in full color: any variables that are used in the book have consistent coloring throughout the entire book, and you see those colors replicated in the body of the text, in the equations, and in the illustrations.
And that kind of continuity in the color can make it easier to understand the underlying theory. And then, of course, having lots of hands-on applications means the book can have a lot more value to somebody who's a hands-on practitioner. Those hands-on examples can also give you a great sense of how these things work in practice in a way that just looking at the equations and understanding them might not. So that's one of our primary, quote, unquote, competitors in the space. The most popular book today in terms of hands-on applications is probably Aurélien Géron's Hands-On Machine Learning book. And that book, while introducing machine learning in general, also talks a lot about deep learning, which is a type of machine learning approach in particular.
And it is a really, really great book. His second edition is coming out shortly, and I was one of the technical reviewers of that second edition. It's no surprise that it's the most popular book in machine learning today, because it offers such a wide-ranging look at all of the kinds of machine learning approaches that are out there and is replete with hands-on examples. So that book is really great. Where we distinguish ourselves from it is, again, we are focused specifically on deep learning. So while that book is a general machine learning introduction, our book is very specifically about deep learning, and so we can go into it in more depth than Aurélien Géron had room for. And then again, of course, we have the colorful illustrations and the way that we tie together all of the variables in full color throughout the figures, the equations, and the body text. So it is a different kind of book.
To my knowledge, there is no book on the market that makes use of color for explaining any kind of mathematical or statistical theory in the way that we have. And that's kudos to my coauthor Grant Beyleveld, who had that insight. He suggested early on that wherever we have to include equations, it would be beneficial to the reader to have those be in full color. Yeah. It definitely helps to pick apart the
[00:21:24] Unknown:
most notable components of the equation rather than just seeing everything in plain black and white, because then it all just blends together, and it requires a lot more effort to parse it and understand how the different pieces interact with each other.
[00:21:25] Unknown:
Yeah. Exactly. That's the idea, Tobias. I'm glad that you see it that way as well. When you see an equation in the book and then, below it, an explanation of what the equation is, you can very quickly say: ah, there's the purple part of the equation, and here's that purple part in the text, and very quickly make that connection. And in some cases, on top of that, we include a figure that explains how these pieces fit together visually, so you can again see at a glance, across the figure, the equation, and the body of the text: this is the purple part. I'm glad that you see the value in that too.
[00:22:17] Unknown:
And for the target audience of the book, I'm curious how much background understanding of programming or statistics or machine learning is necessary, and to what level of facility you expect readers to get by the end of the book?
[00:22:37] Unknown:
So I deliberately designed the book so that the first four chapters, of fourteen, have no code and no equations. So the first four chapters of the book are intended for any kind of interested learner: anybody with an interest in how deep learning or artificial intelligence works who is interested in getting exposed to the range of applications that it has. So whether it's machine vision, natural language processing, creativity, or complex decision making, regardless of which of those you're interested in, or all of them, and just seeing what it means to have artificial intelligence today, where the field is going, and what's possible in your own field: anybody can get that from reading the first four chapters. In chapter 5, we begin introducing Python code, and then through the rest of the book, all the way through to chapter 14, there are examples in Python.
They are, especially in the earlier chapters, fairly straightforward. So if you have experience with any object-oriented programming language, not necessarily Python, then it should still be quite straightforward to see what's happening in these code examples. And I went to great lengths to make sure that I explained in detail every single line of code in the body of the text. So even if you're not already familiar with Python, or even if this is your first exposure to object-oriented programming, those thorough explanations should make it possible, though maybe not as easy as for somebody who does have Python experience, to follow along and see what's happening in these examples. So some Python experience, or at least experience with an object-oriented programming language, would definitely make taking in chapters 5 and onward easier. And it's the same kind of thing for machine learning or statistics experience.
So if you happen to have experience with statistics or with some other machine learning approaches, like regression modeling or support vector machines or random forests, or just the scikit-learn library in general, then the book will definitely be easier. But, again, I went to great lengths to make sure that I was explaining everything as clearly as I could, so that even if you didn't have experience in machine learning or statistics, you should be able to follow along at a high level. And then I provide lots of resources in footnotes, so that if there's something that you need to dive deeper on, you can do that on your own time. And one of the challenges
[00:25:15] Unknown:
that exists anytime somebody is trying to encapsulate a technical topic in printed form is timeliness: how do you guard against the information becoming obsolete as new techniques evolve, new libraries come about, and the libraries themselves evolve? And so I'm curious how you approached that particular problem, and what your selection process was for the technologies and techniques that you ultimately decided to incorporate.
[00:25:42] Unknown:
I love that question, Tobias. So that is a tricky one. Things move very quickly in the machine learning field, and I expect that there will be a second edition of this book coming in the next few years that will be updated to the latest libraries: the latest TensorFlow, PyTorch, or Keras, or whatever is the in-vogue deep learning library a couple of years from now. So the specifics of the particular packages that get used will definitely change. The nice thing about deep learning, though, is that the vast majority of the theory is quite old already.
So the theory around the artificial neurons that make up a deep learning network has been around since the 1950s and hasn't changed very much. And in terms of actually networking those artificial neurons together into a deep learning network, most of that theory was figured out in the eighties, a little bit in the nineties, and then in the early 2010s we had a few key breakthroughs. But those breakthroughs through the nineties and more recent years tack on to the earlier theory. And so in deep learning, at least, we're not seeing old theory wiped away entirely and replaced with a completely new approach to some theoretical concept.
Instead, what we've been seeing from the 1950s through to today is that we build upon existing theory. And so in that sense, I think the vast majority of the content in this book is future-proof, at least for a decade or so. It's possible that some completely different kind of approach will make deep learning obsolete in the coming years, but there are no signs of that yet. And so when I sit down to write the second edition a couple of years from now, I'm not going to need to rewrite all of the theory. Instead, I'll just be tacking on more of the new techniques and approaches that have come about in the intervening couple of years. And now that it has been published, I'm curious if there are any elements
[00:28:00] Unknown:
of the topics that you covered or the specifics of the code examples that you think you would have done differently or that you think might need updating in the near future? There isn't
[00:28:12] Unknown:
anything that I look at now that I feel would need a complete overhaul or that I wish had been done completely differently. The main thing that I look forward to being able to do as I sit down to write a second edition is add more. This book is already a fair bit more dense than the publisher, Pearson, was looking for: they were hoping for at least 250 pages, and the book came in at 416 pages. So it does have a ton of detailed content, but there's so much more that I would like to add. So there isn't really anything that I would like to do differently.
I just look forward to having the time to add in even more information, and that's also the kind of thing that we saw with Aurélien Géron's book, which I mentioned earlier: that first edition was already so comprehensive as an introduction to machine learning.
[00:29:12] Unknown:
But with his second edition, he was able to add in even more detail on so many different topics and make a much thicker book. So I look forward to being able to do that with my second edition as well. And you covered a few different problem domains where deep learning can be applied, such as natural language processing and computer vision, which you mentioned, and generative adversarial networks. So I'm wondering what your selection process was for the specific problem domains, and how you approached determining the sufficient level of depth to cover each one appropriately so that the reader could get a good understanding of it without getting overwhelmed.
[00:29:55] Unknown:
So the initial seed for what content went into the book was the content that we were covering in the deep learning study group that I run. So at the end of every study group session, our final agenda item was always: all right, now let's talk about what else we should be learning. What should we be learning for next time, or what should we be putting on the list for learning at some point? And these particular applications, computer vision, natural language processing, generative adversarial networks, and deep reinforcement learning for complex sequential decision making, stood out as clearly the most important areas. So that's how I came up with the initial list of high-level topics to cover in the book. And then, for every one of those topics, there is at least one very deep and detailed hands-on code example. For some of these techniques, like generative adversarial networks or deep reinforcement learning, which are the two most complex topics covered in the book, having one thorough code notebook example and covering that from beginning to end was more than enough material for the reader, in my opinion. The other topics, machine vision and natural language processing, are topics with a lot of different things that we can be doing in them. So with machine vision, we can be classifying images as being in a particular category, or we can be segmenting images, pixel by pixel, into the different elements of the image.
In natural language processing, there's a huge variety of tasks that could be handled: classifying documents, auto-generating content, translation between languages, chatbots. And some of those start to get way too complex to cover in this kind of overview book. Things like machine translation or chatbots would need whole chapters to cover properly, although everything that we cover in the book serves as a great foundation for those applications. And so with the machine vision and natural language processing topics, what I did was include several complete, thorough examples of intermediate-complexity topics, and then say: hey, if you're interested in these even more complex topics, here are a few paragraphs that summarize what's possible today, and here are links to the key papers and GitHub repositories so that you can go off and learn about those things on your own. And the natural language processing topic in particular, because that is what I do at my day job at Untapt, is of particular interest to me. I've also noted that it's of particular interest to readers, because I teach online on O'Reilly Safari twice a month, doing a three-hour tutorial, and I give lectures around New York at various meetups and conferences. At the end of each of those, and some of these venues have hundreds of people in the audience, I ask: okay, what are you most interested in learning about next? Is it machine vision? And some hands go up. Is it generative adversarial networks? Some hands go up. Is it deep reinforcement learning? Some hands go up. But when I ask, is it natural language processing, or is it handling time series information (because natural language is just an example of time series data: whether it's words on a page or audio of speech, it flows in one dimension over time), it's that topic where you see a huge number of hands go up. So I know that that's of huge interest. And so my next book, actually, and I have a verbal agreement with Pearson on this already; they're awaiting my full proposal, is going to be focused
[00:33:47] Unknown:
entirely on natural language processing. And so that will give me the opportunity to expand more on that particular topic. And one of the other things that I liked in the way that you structured the book is that at the tail end of it, you have some examples of what else you can do, ways that you can continue your learning, and some different project ideas or categories that the reader can engage with. And I especially liked the fact that you were encouraging people to do things that will have a beneficial social impact, with some resources for them to be able to find ideas for that and engage with different organizations that would benefit from that technical acumen. I'm really glad that you enjoyed that part of it. For me, that was a really important
[00:34:29] Unknown:
chapter to write. And the ideas behind that final chapter were spurred largely by my experience teaching this content at the New York City Data Science Academy. It's this 30-hour curriculum that I do over five Saturdays, and this textbook really is the accompanying content to those lectures and exercises that I do over those 30 hours at the academy. I knew from doing that teaching that what students want is not just to be able to go through the examples that you've done in class. People want to be able to devise their own projects. They want to be creative with deep learning. They want to be able to apply deep learning to their particular field of interest. And so that final chapter comes out of my experience mentoring students on developing their own deep learning projects. A big part of that 30-hour course, from the very first week, is that I say: you know, you don't have to do your own self-guided project, but it will really help you cement the ideas that we're covering in this course, so I highly recommend that you do. And from the very first week, I have a framework for initially ideating and then later concretizing a particular project and executing on it over the duration of the course. So that final chapter outlines that process: okay, if you don't have a particularly creative idea of something that you'd like to do, here are some relatively easy ideas and off-the-shelf datasets you can use. If you want to be doing something with your own data, here are some tips for doing that. If you want to be exploring more complex datasets or scraping your own datasets off the web, here are some resources for doing that. And then there's the final piece about the kind of social impact that you can have. I didn't have to include that, and I'm not aware of many other textbooks that make that kind of social impact summary or recommendation at the end. But for me, as a relatively young person at this time in the history of our planet: we have terrific opportunity in so many quantitative ways. Life has never been better on this planet, for humans at least, in terms of lifespan and quality of life. We live today in a way that kings a century ago couldn't have imagined. So on the one hand, we should definitely be happy and positive about where we are in the world, but there's also a lot of uncertainty about where we are in the world.
There are far more people on this planet than there ever have been in history, and each one of those people is constantly demanding more and more energy and resources. And so the burden that we are placing on the ecosystem of our planet is tremendous, and it looks like it's going to become more and more so. Machine learning combined with the Internet of Things, the prevalence and cheapness of sensors being everywhere, has, in my view, the potential to allow us to continue the wonderful trend that we've had over the last 150 years toward prolonging human life and making human life more satisfying than ever before, while at the same time allowing us to coexist peacefully and indefinitely on this planet. So that's a bit of the inspiration behind suggesting that people tackle social impact projects, and I include resources in there: if you're looking for something to do with your time or your machine learning skills, here are some serious problems that we're facing today that could be worth focusing your attention on. And
[00:38:32] Unknown:
one of the things that I'm curious about, in terms of people coming out of this book or your courseware with the fundamentals of being able to build these neural networks and having built out some sample models, is the ability to repurpose some of that same code or some of the model pipelines for different applications or different datasets. Because my understanding is that one of the critiques of deep learning is that it is largely single-purpose: once you build a model, it is great at that particular use case, but it is generally difficult to repurpose it to a slightly different context. And I'm wondering what your experience has been with that, and what recommendations you have for practitioners and engineers to make it easier to componentize the model pipeline and make it more reusable and more flexible. Outstanding question, Tobias. So neural networks are interesting
[00:39:31] Unknown:
in that they are actually highly flexible and can be retrained for particular tasks. So while any given deep learning network at any given point in time might be very highly specialized to a particular task, you can use what we call transfer learning to take that existing network and repurpose it to some related task quite effectively. In chapter 10 of the book, we go over a machine vision example where we take a very deep neural network that was trained on a huge dataset of millions of images called ImageNet. It would be very expensive computationally, taking weeks on a high-end deep learning server with GPUs, to train up such a deep machine vision model on such a large dataset. But with modern deep learning frameworks, including the Keras API in TensorFlow, you can trivially, in a line of code, load in that deep, very nuanced machine vision model and then adapt it to your own particular use case. In Deep Learning Illustrated, we use transfer learning to distinguish images of hot dogs from images that are not hot dogs, so other types of fast food. That's a funny idea inspired by the HBO Silicon Valley series, where one of the characters on that show builds a hot dog / not hot dog detector. So in that sense, deep learning models are quite flexible. Now, of course, you can't take a model that was built for a machine vision task, one that reads in pixels and outputs whether something is a hot dog or not, and have it read in resumes and job descriptions and predict whether a given person is a good fit for a given role. So there's a limit to what this transfer learning can accomplish. But overall I don't agree with the point that deep learning models are fixed or not usable for other purposes; they actually are quite easily repurposed to related kinds of tasks, and this kind of transfer learning can be a very powerful thing to do. And in fact, to give one final example from the work that we do here: the only model that we've applied for a patent on is this job-to-candidate matching algorithm, which we've trained on hundreds of millions of data points. But many of our clients are interested in other human resources or recruiting related models. And so often what we do is take that starting point, this beefy model trained on a huge dataset, and repurpose parts of it for other human resources related tasks, like matching a candidate to a pool of other candidates, for example.
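To make that concrete, here's a minimal transfer learning sketch in the spirit of the example Jon describes. It uses the Keras API, but the specific base network (VGG19), input size, and head layers are illustrative assumptions, not the book's exact code:

```python
from tensorflow import keras

# Load a convolutional network pre-trained on ImageNet, minus its
# classification head (the book's chapter 10 example may use a
# different base model or input size).
base = keras.applications.VGG19(weights='imagenet', include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # freeze the expensive-to-train feature extractor

# Attach a small new head for the binary hot dog / not hot dog task.
model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),  # P(hot dog)
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
# Calling model.fit(...) now only trains the new head on the small
# hot dog dataset, reusing the ImageNet-learned visual features.
```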
[00:42:34] Unknown:
And another issue that I've seen identified with machine learning in general, but particularly with deep learning, is the fact that it is excellent at identifying correlations, but that it is difficult to do causal inference about the results that it produces. And I'm wondering what your thoughts are on that, and whether there are any useful references that you might be able to point people to to dig deeper on that particular problem space. It is definitely
[00:43:00] Unknown:
a shortcoming of deep learning models today, and indeed of most statistical and machine learning science: the vast majority of techniques that exist for modeling data, whether in statistics, deep learning, or some other machine learning process, are great at identifying correlations but have little, usually no, capacity to say anything about causal direction. That isn't the case with everything, and actually a big part of my PhD was using particular machine learning models and Bayesian statistical models to infer causality, which in the case of my PhD research was in some cases relatively straightforward. For example, if you find a correlation between a gene and some behavior, we know that genomes are fixed over a person's lifespan, except for random mutations. So there's no way the causal direction could run from somebody being anxious to their genetic sequence changing to match the profile of somebody who's anxious. So there are some problem spaces where you can define causal direction based on your knowledge of the data.
But, of course, the model itself is not aware of that underlying understanding. So, Gary Marcus is a researcher at New York University who has called out a lot of shortcomings of deep learning models today, and one of the big ones is this inability to infer causality. I don't actually have specific resources on how to resolve that. Gary Marcus might, and I cite a Gary Marcus paper in chapter 14 of my textbook that could probably point you in the direction of resources on pursuing models that incorporate more causality. One resource I can point people to is an author named Judea Pearl, j-u-d-e-a Pearl, like pearls from an oyster. Judea Pearl has written extensively on causality, and he has several books on it, including one called Causality, and that might provide people with some techniques for identifying causal direction in the data that they're working with. But yes, deep learning models as they typically stand have no capacity to infer causal direction, and that is one of the shortcomings that we'll have to overcome, as Gary Marcus himself points out, in order to bridge the gap from the narrowly defined artificial intelligence systems that we have today, which are, say, able to identify what's in an image accurately, to a general intelligence that is more like the broad intellectual capacities that you and I have as human beings. So there's a huge amount of work required in that space, and I imagine there are going to be tens of thousands of deep learning engineers
[00:46:19] Unknown:
over the coming decades tackling that problem. And what are some of the other limitations of deep learning as a particular practice and some of the cases where you would recommend against using it? Another really great question, Tobias. So in the vein of
[00:46:35] Unknown:
causality being a difficult thing for deep learning algorithms to identify, in the same way it is often difficult to explain why a deep learning model has made a particular decision. If you use a linear regression model to solve a particular problem, your regression model might have a dozen inputs, and each of those dozen inputs is associated with a very specific weight. For any prediction that the regression model makes, we can look at those weights and say: the reason why this linear regression model made this prediction is because of factors x, y, and z, and those factors are weighted by these very specific amounts that the linear regression model has identified. In deep learning, because we can have millions or even billions of parameters in our networks, it can be difficult to put your finger on exactly why it produced some particular output. And so some people talk about deep learning models as being a black box because of that. Now, there is quite a bit of research being done on explainable AI, quote, unquote, which gives us some insight into what is happening in the black box. And because of my work at Untapt, building these human resources models, this is something that all of us here are very familiar with. When you're building a model that can recommend a particular individual for a particular role, it's absolutely imperative to our clients that they know that that isn't happening based on some demographic factor like gender, age, or race. And so it is possible to begin to distill the important parts.
So if you're building a model to predict the applicability of a given candidate to a given role, for example, one thing that we've done is say: okay, we have this big pool of female applicants for a role and a big pool of male applicants for a role. What are their relative scores? How did they score on average for this role? And we see that, with our model and the modeling process that we followed, regardless of gender, the distribution of probabilities looks identical. So although we might not understand how every single one of the millions of neurons in our artificial neural networks behaves in order to produce the outcome, we can be mindful about what training data we're using to train the network and remove bias or problematic data from the inputs. And then, after we've done training, we can do these kinds of tests, like the one I just described, to make sure that males and females are getting the same scores for a given role, to ensure that the precautions we took have been effective in preventing bias. So even though there's a black box, we understand its behavior sufficiently that we're comfortable using it.
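As a minimal sketch of the kind of post-training check Jon describes, comparing score distributions across groups, here's one way it could look. The data here is simulated; in practice the scores would come from the trained model's predictions for real applicant pools:

```python
import numpy as np
from scipy import stats

# Simulated model scores for two applicant pools for the same role; in
# practice these would be model.predict() outputs for each group.
rng = np.random.default_rng(42)
scores_female = rng.beta(2, 5, size=5_000)
scores_male = rng.beta(2, 5, size=5_000)

# Compare the two score distributions: group means plus a two-sample
# Kolmogorov-Smirnov test of whether the distributions differ.
print(f"mean score (female applicants): {scores_female.mean():.3f}")
print(f"mean score (male applicants):   {scores_male.mean():.3f}")
ks_stat, p_value = stats.ks_2samp(scores_female, scores_male)
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")

# Near-identical distributions (small KS statistic, large p-value) are
# consistent with the check described above; a clear difference would
# flag potential demographic bias for further investigation.
```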
[00:49:51] Unknown:
And in terms of the ongoing research and applications of deep learning and the direction that the field is going, I'm wondering what you're most excited by and what you're personally concerned by.
[00:49:53] Unknown:
So it's easy for me to answer the most-excited-by question, because I kind of already answered that earlier on in this podcast: the thing that most excites me about deep learning is its capacity to allow us to continue to make exponential strides in human quality of life, and maybe even quality of life for other animals on the planet, while simultaneously avoiding an ecosystem catastrophe. So that potential for machine learning techniques, and particularly deep learning techniques, is what I'm most excited about.
Beyond those applications, which I can foresee being possible over the coming decades, there is this possibility, and it's only a theoretical possibility, that we can engineer an intelligent system that is as intelligent as or more intelligent than a person or any group of people. And that's potentially exciting. People call that the singularity; Ray Kurzweil, I think, coined that term to refer to the moment when we build a machine that is more intelligent than humans. And that's kind of exciting in a way too, although it brings about a huge amount of fear as well, because we have no idea how such a system would treat humankind or how we would interact with it. It's impossible for us to even imagine. There are some great resources if you're interested in thinking about that problem. Yuval Noah Harari, who is most famous for his book Sapiens, also has a great book called Homo Deus, a Latin term he coined: if Homo sapiens is thinking man, then Homo Deus is god man. A big part of that book is about what could happen and how we might be treated by an intelligent life form on this Earth, created by us, that is much, much more intelligent than we are. That's kind of a dense read, though. If you're interested in a relatively quick introduction to that topic, Tim Urban, who writes the blog Wait But Why, and who really kindly gave us an endorsement for Deep Learning Illustrated that appears on the back of the book and inside the front cover.
He does a great long-form series of blog posts, two posts that cover what artificial intelligence is today and what could happen as we approach or go past that singularity. So in the short term, over the coming decades, the thing that I'm most excited about is the capacity for machine learning, deep learning, and the Internet of Things to make life on this planet more wonderful and peaceful than ever before, continuing the trends that we've seen over the past century. And then in the longer term, I'm in equal measures excited and afraid of what could happen
[00:53:06] Unknown:
if the singularity happens. Are there any other aspects of the field of deep learning or your work on the book that we didn't discuss yet that you'd like to cover before we close out the show or any other parting words that you'd like to,
[00:53:18] Unknown:
give to the listeners and potential readers? That's a great opening, Tobias. Nothing really comes to mind, actually. I got to talk about so many of the things that excite me most about deep learning in our conversation today, and I even got to talk about the social impact concepts a couple of times. So I'm really satisfied, Tobias. I really enjoyed this podcast today, and I hope your listeners
[00:53:42] Unknown:
take away some interesting tidbits from it. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to choose a website called Spurious Correlations. It's just a list of different datasets that correlate but are obviously not causally related. One of the examples is that the divorce rate in Maine correlates with the per capita consumption of margarine. So it's just a list of charts with hilarious correlations that are obviously not causally related, as a sort of warning not to read too much into the fact that two datasets happen to relate to each other, and to give you a second to think twice about that. And so with that, I'll pass it to you, Jon. Do you have any picks this week? That is a really fun website. I've come across that before.
[00:54:32] Unknown:
A recommendation I have, that I use primarily to keep up with innovation in data science and deep learning in particular, is a great newsletter called Data Elixir. Data Elixir is a one-man newsletter that has between half a dozen and a dozen articles in it each week. And I have never come across anything that so
[00:55:00] Unknown:
succinctly captures all of the major events that you need to keep an eye on in the world of data. Well, thank you very much for taking the time today to join me and discuss your work on the book and your experience both working in and teaching deep learning. It's definitely a fascinating field, and I've enjoyed the time I have spent with the book, and I definitely plan to read it in its entirety. So thank you for all of your efforts on that front, and I hope you enjoy the rest of your day. Awesome, Tobias. It's great to hear that, and it's been an absolute pleasure being on your show. I have never come across such a thoughtful
[00:55:34] Unknown:
and thorough list of questions. So thank you very much for the time.
[00:55:39] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it: email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and
[00:56:10] Unknown:
coworkers.
Introduction and Episode Overview
Guest Introduction: Jon Krohn
Jon's Background and Career in Deep Learning
Defining Deep Learning
Applications of Deep Learning in Business
Motivations for Writing 'Deep Learning Illustrated'
Comparison with Other Deep Learning Books
Target Audience and Learning Path
Timeliness and Future-Proofing the Book
Reflections on the Book and Future Editions
Selecting Problem Domains for the Book
Encouraging Social Impact Projects
Repurposing Deep Learning Models
Challenges in Causal Inference
Limitations and Explainability in Deep Learning
Future Directions and Concerns in Deep Learning
Closing Thoughts and Contact Information