Summary
Natural language processing is a powerful tool for extracting insights from large volumes of text. With the growth of the internet and social platforms, and the increasing number of people and communities conducting their professional and personal activities online, the opportunities for NLP to create amazing insights and experiences are endless. In order to work with such a large and growing corpus it has become necessary to move beyond purely statistical methods and embrace the capabilities of deep learning, and transfer learning in particular. In this episode Paul Azunre shares his journey into the application and implementation of transfer learning for natural language processing. This is a fascinating look at the possibilities of emerging machine learning techniques for transforming the ways that we interact with technology.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial.
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch.
- Your host as usual is Tobias Macey and today I’m interviewing Paul Azunre about using transfer learning for natural language processing
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what transfer learning is?
- How is transfer learning being applied to natural language processing?
- What motivated you to write a book about the application of transfer learning to NLP?
- What are some of the applications of NLP that are impractical or intractable without transfer learning?
- At a high level, what are the steps for building a new language model via transfer learning?
- There have been a number of base models created recently, such as BERT and ERNIE, ELMo, GPT-3, etc. What are the factors that need to be considered when selecting which model to build from?
- If there are multiple models that contain the seeds for different aspects of the end goal that you are trying to obtain, what is the feasibility of extracting the relevant capabilities from each of them and combining them in the final model?
- What are some of the tools or frameworks that you have found most useful while working with NLP and transfer learning?
- How would you characterize the current state of the ecosystem for transfer learning and deep learning techniques applied to NLP problems?
- What are the most interesting, innovative, or unexpected applications of transfer learning with NLP that you have seen?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the book?
- When is transfer learning the wrong choice for an NLP project?
- What are the trends or techniques that you are most excited for?
Keep In Touch
Picks
- Tobias
- Paul
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Transfer Learning for Natural Language Processing by Paul Azunre (affiliate link). Use the code podinit21 at checkout for 35% off all books at Manning!
- Low Resource Languages
- Fortran
- C++
- MATLAB
- MIT 6.003
- Transfer Learning
- Computer Vision
- Deep Neural Network
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
- GLUE == General Language Understanding Evaluation
- NLP SuperGLUE
- NLP Encoder
- Named Entity Recognition
- ImageNet
- Mathematical Optimization
- Gradient Descent
- Yonder AI
- ELMo language model from Allen NLP
- Ghana
- ArXiv
- BERT language model
- TF-IDF == Term Frequency – Inverse Document Frequency
- Word2Vec
- GPT-3
- Ghana NLP
- Automatic Speech Recognition
- ULMFiT
- Keras
- TensorFlow
- Hugging Face Transformers
- Multi-Task Learning
- Fast.ai
- OpenAI
- AWS SageMaker
- Kaggle Kernels
- Colab Notebooks
- Azure ML Studio
- BLEU Score
- Khaya application
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to podcast dot in it, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to python podcast.com/linode, that's l I n o d e, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for reverse ETL today. Get started for free at pythonpodcast.com/hightouch.
Your host as usual is Tobias Macey. And today, I'm interviewing Paul Azunre about using transfer learning for natural language processing. Everyone. My
[00:01:42] Unknown:
name is Paul Azunre, and I'm here to talk to you today about natural language processing. I work in the field, in the areas of NLP for low resource languages, and this is where my interest in the problem comes from. And do you remember how you first got introduced to Python? Yeah. That's a funny story. I actually didn't use Python until very recently. My dissertation was written mostly in Fortran and some C++, and I used a lot of MATLAB in my undergraduate studies. But during grad school, I was slowly introduced to Python because I was teaching a class, 6.003, at MIT. I was a TA, and the professor wanted to use Python.
So this is how I first got introduced to it. And then recently, of course, you start working on machine learning. You can't avoid it. So
[00:02:33] Unknown:
And so now that brings us a bit to what you're working on now in terms of transfer learning and natural language processing. Before we get too much into that, can you start by giving a bit of an overview about what transfer learning even is?
[00:02:46] Unknown:
So transfer learning is the idea that you do not have to start learning things from scratch anytime you need to solve an NLP or computer vision problem, or any machine learning problem for that matter. If you are starting from scratch, you are likely to need a lot of data and a lot of resources to achieve a certain level of performance. However, with transfer learning, you try to leverage existing knowledge that you might have in order to simplify a task, which is very similar to how human beings learn. And you could say human learning is an inspiration for this field, because a human being never learns from scratch; they try to make associations with other things they know, and this helps them to tackle new challenges with very little training comparatively.
[00:03:34] Unknown:
And so in terms of its application to natural language processing, I know that it started off a little bit in terms of computer vision research and particularly with the introduction of deep neural nets and deep learning, you know, a few years ago. And I'm wondering how it's being applied to natural language processing and what the sort of translation process looks like from the computer vision transfer learning approaches to how it's being applied in natural language use cases?
[00:04:03] Unknown:
Many people would argue that computer vision was the inspiration for transfer learning in NLP. So this method has matured, you know, over the past couple decades and was being used heavily in computer vision, not so much in NLP. In NLP, the way it manifests itself is a little different in terms of the kind of deep learning architectures that are used heavily. So in computer vision, it is convolutional neural networks, and so the way you train it is very specific to that. Right? Because the layers change from very general to less general as you go through the stack. In natural language processing, we have a lot more use of recurrent neural networks and now transformer architectures.
It looks a little different, but a lot of the key ideas are the same. So you start with a pre trained model that was trained on a very general data set that somehow captures the kind of knowledge the model needs to have in order to solve problems in this, you know, domain. So in the case of computer vision, this was the ImageNet dataset. In the case of natural language processing, there are a few options recently, but the most prominent 1 is probably GLUE or SuperGLUE. It includes tasks such as the common sense reasoning tasks. Like, is this grammatically correct? Is this question similar to this other question?
You know, a pronoun in a sentence, which 1 of these possible options does it refer to? So these are very general skills that the model learns to do based on training on this general big data set. And then this is followed by a fine tuning step where this general knowledge is adapted. So in computer vision, you may download the model that was trained somewhere, a very large model, and only fine tune the last couple layers in it. The same way in the case of NLP, you may download this model and fine tune the last few layers of the encoder in addition to any specific new layers you have for a new task you are trying to solve, a very specific task like named entity recognition or so on. So just to summarize, the idea is that transfer learning for NLP is referred to as the ImageNet moment of NLP, to draw the similarity. So there are a lot of similarities, but the implementation details, when you get really in the weeds, are different, but the high level is very similar. So in terms of your actual
[00:06:38] Unknown:
exposure to transfer learning and natural language processing, as I was researching this interview, I noticed that for your PhD, you were focusing on things like optimization problems and how that applies to different areas of optics. And I'm wondering how you went from there to your current focus of working with natural language processing and transfer learning applications and also a book that you're writing on the topic and just sort of what was your motivation for going down this path and, you know, acquiring the level of expertise that you're currently at? My journey started,
[00:07:10] Unknown:
as you correctly pointed out, in mathematical optimization, which is a field that is related to machine learning, but has a very different focus. So I would say, at least, there is a lot more theory that's been done that's very mathematically robust and well understood. So there are spaces where this problem is set in a very theoretical way, and there's a lot of activity around that. There's also an applied side to it, which I would say is not nearly as widespread and applied in industry as machine learning is. And so in that sense, there's more stress on theory, at least the way I was introduced to optimization, than machine learning.
However, machine learning is an optimization problem. Right? You are trying to fit a model to some data, which you formulate as an optimization, you know, problem. Stochastic gradient descent is just 1 example of an optimization algorithm. So there is a similarity between these fields that helps you move, or if you like, transfer, between 1 field and the other. The way it happened for me was, I would say, pure coincidence. I was working on an applied optimization problem in optics and using these algorithms to build good solar panels. And then, you know, I tried to start a business in that space. At the time, it wasn't like a good time in terms of the economics to do work in that area.
So I had to shift my focus. At some point, you know, I got a job. At some point, I got laid off from the job. And I landed at a startup that was trying to fight disinformation, through somebody I had met through my entrepreneurial experience. I was, like, the first employee of this company. This company, called Yonder, it's still around; it fights disinformation. My job there was to work on the DARPA projects. And these DARPA projects were related to this disinformation problem in the sense that we were working on NLP and trying to detect nefarious information, nefarious activity, by analyzing content on social media and stuff like that. So this is how I first got into NLP. Now in that context, we were using transfer learning around the time that the first model for transfer learning in NLP that many point to was coming out, which is ELMo. Around the same time, we were doing something very similar in a different context. We were using simulated data to reduce the requirement for actual training data we needed to solve a particular problem. So we would simulate the data. We would train our model on the simulated data first. This would get us maybe 70% of the way. And then we would fine tune on, like, a few hundred examples for each category of actual human labeled data. Whereas, if we didn't do that, we may need tens of thousands, which we didn't have the time or the money to do. So this is how I got introduced to the problem.
At the same time, you know, I'm a person from Ghana. Ghana is a country in West Africa. And a lot of Ghanaian languages, I would say even African languages, are considered to be low resource in the sense that there are not enough, you know, tools or data to train models, these kinds of models, to solve tasks in those languages. Translation tools don't exist and so on and so forth. And, of course, it helps to leverage transfer learning in this space because you are able to take maybe an English model and adapt it somehow to your language. And this makes it easier to solve the task than, you know, if you didn't do that. So I started thinking about this, and once I started doing this, I couldn't find references that in my opinion had the right balance between application and theory. Right? So the field is very theoretical.
It's still empirical, but it gets very detailed and specialized in the papers on arXiv. Right? And so you don't really need to understand 100% of the BERT paper to apply BERT. Right? You may need to understand some key concepts, and then that allows you to start making an impact. So this was the idea behind writing this book: building intuition for the problem, understanding the key ideas behind the problem without getting into, you know, proving things and the mathematics and notation and all of that. Code examples that work rather than, you know, very well understood theoretical examples. Right? So that I could just take that code and change the path, you know, and tune some parameters, and I already have something I can start doing real things with. Right? So this is the purpose for the book. It's a more applied perspective, a perspective that makes it more appealing to somebody without very deep, you know, expertise in machine learning and theoretical academic machine learning to start using these tools to make an impact. You were describing how the
[00:12:04] Unknown:
applications of transfer learning, 1 case is for these low resource languages, as you put it, you know, languages that don't have enough readily available textual data for being able to train models on, or languages that only have a very small population of people that speak them. So, you know, maybe languages in terms of some, like, Native American tribes that only have some of the elders who are still speaking it and that were never really primarily written languages. And I'm wondering, in addition to that, what are some of the other applications of natural language processing that are either difficult and impractical or intractable without using transfer learning as the basis to work through them, as opposed to using some of the more, quote, unquote, traditional approaches to natural language, so doing things like Word2Vec or, you know, TF-IDF and things like that. So I think language generation
[00:12:57] Unknown:
is probably a major example, a major answer. So you've probably heard of GPT-3. Yes. So its ability to, say, write poetry. I did an experiment where I wrote some poetry, and I put it on my social media. I got a lot of very positive comments. But it was just GPT-3, you know, I guess, replicating some patterns of data it was trained on. Alright. So this is an example. I mean, is that useful? Somebody could use that, you know, in a creative sense to, like, stimulate ideas. I don't know. Somebody could use it for writing or, like, just, you know, I guess, generating ideas. More practical applications of the same technology, like, for instance, generating code from a mere description of it. Because GPT-3 was trained on basically a lot of code examples, let's say, in Python.
You can ask it to, you know, generate a Python function to featurize something using Word2Vec. And, surprisingly, it's able to do pretty well. In many cases, generate very good functional code, especially if it is fine tuned for that task. Right? Not the general model, but fine tuned for that specific task. I would say that's something that almost feels like sci fi. Right? We need to stress, it's not completely eliminating the human from the picture. It's just giving the human a multiplier effect on their productivity. It's not gonna write it correctly a 100% of the time.
Right? Probably that code is going to just be a template that you start from. So maybe instead of copying it from Stack Overflow, which you used to do, now you start with GPT-3, which just kind of gives it to you with fewer clicks. And I don't know. Maybe you can voice search it or something, just to multiply your productivity. It's not eliminating you. I need to stress that because some people are afraid of that. We probably can't even predict what people are going to do with it tomorrow. At the beginning of this year, in my work at the organization called Ghana NLP, where we are trying to do this stuff for Ghanaian languages, we couldn't have imagined 6 months ago what we are able to do today with voice.
So voice, automatic speech recognition. Automatic speech recognition is when it transcribes the speech. Right? Nothing like this exists for Ghanaian languages. There are some academic examples in the papers somewhere, but there isn't, like, a well deployed method that's used. We were able to build it with just a couple hours of speech, using a pre trained model that was pre trained on a lot of different languages to learn some common things across them, and then add a couple hours of speech in our language. A couple hours. 1 person can do that. If they're sufficiently motivated to put their own language on the map, they just need a couple hours. Think about it. So I expect that, you know, within 6 months, all of a sudden, something that we couldn't even imagine was possible will become quite possible for basically almost all the languages out there. I think that's pretty dramatic.
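To make the workflow described here more concrete, below is a minimal, hypothetical sketch of adapting a multilingual pretrained speech model to a new language with only a small amount of transcribed audio, using the Hugging Face Transformers library. The XLSR checkpoint, the vocab.json character vocabulary, and the training details are illustrative assumptions, not the exact setup Ghana NLP used.

```python
# Hypothetical sketch: fine-tune a multilingual pretrained speech model on a
# couple of hours of transcribed audio in a new language.
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# A character vocabulary (vocab.json) built from your own transcripts.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the model that was pretrained on many languages and attach a fresh CTC
# head sized for the new language's vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=len(tokenizer),
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
)

# Freeze the convolutional feature encoder; only the transformer layers and the
# new head are updated by the small labelled dataset.
model.freeze_feature_encoder()
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4
)
# ...a standard training loop over (audio, transcript) batches would go here...
```

The point of the sketch is the ratio: almost all of the parameters arrive pretrained, and the couple of hours of labelled speech only have to teach the model the new language's characters.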
[00:15:52] Unknown:
So maybe in 5 years, we'll actually have a real live Babelfish that we don't necessarily have to stick in our ear, but we can actually use, you know, in practical conversations. I know that there are some sort of proof of concept implementations of that that Google has for being able to, you know, do live transcription from, you know, 1 language, translate it, and then, you know, text to speech back to the other person to be able to have sort of a halting conversation. But, you know, maybe in 5 years' time, we'll actually be able to have that be more fluid with more languages supported. Have you tried Google Translate lately? Only the textual version, but not on any sort of actual phrases.
[00:16:26] Unknown:
The speech recognition is pretty good already. For the large languages, already, I would say we are there. I mean, you have to do a couple clicks. It's not like a, you know, well integrated, smooth workflow. So maybe that's what we are waiting for, like, some engineering solution like that. And there are challenges there that I probably don't even know all of them, just about, you know, there's noise in the environment. So, like, the model we built, if you're trying to teach a person how to speak the language correctly, it's a great tool for that, because it will teach you how to enunciate. Otherwise, it won't recognize it correctly. Because the person who recorded it was a linguist who was trying to do everything correctly. But come on. We all know that's not how we actually speak in practice. If you want to build a system that we use while we are playing video games, we are not gonna sit there and enunciate. Right? So we need this noisier, more robust, you know, more real world model, transfer learned maybe from this more formal model.
And then, you know, test it
[00:17:21] Unknown:
in real settings because there are challenges that come with that. And we are doing some of that, by the way. Digging more into the actual practical aspects of applying transfer learning to NLP problems, can you just talk through the sort of high level steps that are involved in taking a base language model and translating it to be applicable to a particular problem domain that you're trying to solve for? So I would say there are 2 main steps, and this
[00:17:48] Unknown:
was captured very well by a method known as ULMFiT in transfer learning for NLP. And there are 2 main steps here. So the first step, of course, when you say base model, just so that our listeners understand what that means, that's the model that was pre trained on a large corpus of, say, multiple languages, or a large corpus in a particular language, that we are now going to adapt to a new problem. So usually, the first step is fine tuning that base model on a corpus of data to adapt it to the new data distribution, if that makes sense. So the data that the base model was trained on was data that was supposed to capture all the possibilities out there. Right? In terms of what it might have to recognize and do.
So it may be able to understand all business sectors, but at the expense of, you know, loss in accuracy, slight loss in accuracy. You can take that model and adapt it to, say, financial data or text from a financial domain or financial news to make that model more specific to your use case. That usually helps. The second step is fine tuning the downstream task layers. Just an additional layer you place on the base model that then maps, you know, the vectors coming out from the base model into, say, a classification problem, like a softmax, where it's picking between a couple of categories.
Or you're mapping it into some other activation function to do a regression problem. Right? So that's like a downstream task or task specific fine tuning step, where usually you have a small amount of supervised data or labeled data that then,
[00:19:31] Unknown:
you know, give you very good performance for your specific task. Yeah. It definitely does. And then in terms of the specifics of actually taking 1 of these pretrained models, and earlier, you're mentioning sort of removing a couple of layers of the network and then rebuilding additional layers on top of it to fit your particular domain. I'm just wondering, getting into the bits and bytes of it, what does that actually look like where you have this binary object that's this pretrained model? What's actually involved in being able to then say, you know, back out the last 2 or 3 layers of your neural net and then replace it with these other network layers, you know, in terms of, like, actually implementing that. What does that look like as far as what is the binary object of the model that you're working with? How are you able to tell it, you know, these are the layers that I want you to remove. And then from, you know, a machine learning perspective, I know that a number of people who work in this space will understand sort of building these additional neural nets. You know, it's essentially the same as working from scratch. It's just that it's a much shallower network that you're appending to the end of an existing network. But, you know, what are the steps of unwinding and rewinding those different layers?
[00:20:35] Unknown:
I think it depends very much on the framework you are using to do it. So if you are using, like, Keras or TensorFlow, there's a process, and we have examples in the book that take you through that. But basically, you know, in TensorFlow, you can take your original model description, load the weights, which will put the weights into it, and then remap some of the outputs to new outputs. Right? So if it was a base model, it was probably producing a 768 dimensional feature vector. Right? So you can take that and create a new model, which, you know, maps that to the softmax for your classification problem. That's pretty much all. And at that point, you have to call some functions to freeze the layers you don't wanna train and unfreeze the layers you do want to train. In tools such as Hugging Face Transformers, it's even easier than that. So, you know, you load a class that represents a model, like BERT for language modeling.
I don't remember the name exactly, but BERT for language modeling, that's 1 class they have. So you load the pre trained model into it, and then you can save that model, say, locally. Right? In whatever form. They even have ways of saving it as a TensorFlow saved model. You can save it as a PyTorch model. You can save it as a TensorFlow model. Right? Then you can take another class, let's say, BERT for sequence classification, which now has the new architecture you're going to use, like classification. Right? So you don't have to do any mapping. You just use a different class. You tell it how many outputs you want and what the activation function should be. And it has been built to understand that, okay, I'm loading a base model into a task specific model. So the base model goes in the encoder, and the rest is, you know, the new stuff you have to train.
Right? And so, again, in that space, you can either use the TensorFlow format to freeze and unfreeze if you are using the TensorFlow back end, or you can use the PyTorch syntax to freeze and unfreeze weights as needed. So there's some coding.
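As a concrete illustration of the freeze-and-fine-tune pattern just described, here is a minimal sketch using the Hugging Face Transformers PyTorch classes. The bert-base-uncased checkpoint, the 3-label setup, and the example sentence are placeholder assumptions, not anything specific from the episode.

```python
# Hypothetical sketch: load a pretrained encoder into a task-specific class,
# freeze the encoder, train only the new head, then unfreeze top layers.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# The task-specific class attaches a fresh, randomly initialised
# classification head on top of the pretrained encoder.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Freeze the encoder so that, at first, only the new classifier layer trains.
for param in model.bert.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

batch = tokenizer(["the market rallied after the announcement"], return_tensors="pt")
labels = torch.tensor([2])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Later, unfreeze the top couple of encoder layers and continue fine tuning
# with a lower learning rate, as described above.
for param in model.bert.encoder.layer[-2:].parameters():
    param.requires_grad = True
```

In Keras the same idea is expressed by setting layer.trainable to False on the layers you want to keep fixed before compiling the model.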
[00:22:52] Unknown:
We've all been asked to help with an ad hoc request for data by the sales and marketing team, then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV file via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud data warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to python podcast.com/census today to get a free 14 day trial and make your life a lot easier.
And we've been talking about a number of different models that are fairly well known. So there's BERT, there's ERNIE, there's ELMo, GPT-3. I'm sure that there are a number of others out there. I'm just wondering, as you say, you know, I have this particular problem that I'm trying to solve for. You know? I want to be able to build a, you know, language understanding model, or I wanna be able to do sentiment analysis, but I wanna be able to do it on a, you know, low resource language. And so how do I then determine what is the best base pretrained model to work from? Or, you know, maybe I want to do some type of economic modeling using news sources from Yahoo News or something. How do I go, you know, across these different problem domains and understand what are the best pretrained models to work from? And then, you know, once I say, okay, I've got this model, or maybe there are 2 or 3 different models that have different aspects of what I want to be able to build from, are there ways to be able to synthesize across those models to be able to make use of some of those different capabilities?
[00:24:31] Unknown:
So I think the answer, of course, depends on the problem statement. You gave several different ones. So if you are interested in classification, you're definitely going to try BERT, and you are probably going to try some smaller versions of BERT, like DistilBERT and TinyBERT, versions of BERT that were made smaller and faster, maybe at a small loss in performance. Right? Models like BERT, these transformer based models, are very good at classification problems. I guess with some experience, if you, say, read a reference which kind of tries to cover some of the strengths and weaknesses of the models, and it doesn't have to be a lot. Like, the final chapter of my book tries to do that. There are many other references that are good. Maybe even, like, 4 pages, like a recent survey of recent architectures and why they were built and what their strengths are. So it's gonna be good for classification. And sentiment analysis is a type of classification. Let's say you're interested in language generation.
Right? That immediately means something like GPT-3 or 1 of its alternatives out there, a generative model. Right? The generative models, you can also use them for classification. Right? They do produce a set of vectors that are useful for classification. In fact, when GPT-3 came out, it beat all the records in that domain, but over time, other models took over. So it does seem like the generative models are better for generating language, per se. So if you are trying to write poetry or, you know, generate a blog or generate code, right, you definitely need a generative model. That's where you're gonna start. If you are thinking about a different language, like a low resource language for which there's no data, then you need a multilingual variant of 1 of these models. So a lot of them have been pretrained on a lot of languages simultaneously in order to learn the cross lingual commonalities and to generalize to new languages. Right? So, of course, if you are interested in a new language, you will start with a multilingual model, because the pure English 1 probably doesn't contain as much useful information for you.
There are a lot of considerations to be had. Like, some models were built for long text. Right? So a lot of these models have a fixed input size, let's say 512 tokens. Right? Which a lot of the time is okay because, I mean, people usually put the most important stuff at the beginning. Right? And so that's already a good way to, like, even reduce your data. But sometimes, for some applications, that's not true. Sometimes you are looking for something that might be at the end. So if you are truncating, that's not working for you. So then you'll start looking at, okay, you know, Longformer, Reformer, BigBird. These are just some examples of models that were built for that specific use case.
Obviously, there are voice models and there are text models. Right? So there's that dimension. Some of them are cross lingual. Sorry, cross modal, like the text to image models now from OpenAI. So the answer is gonna depend very much on the particular problem you want to solve, but I would say these are like the general kind of strokes.
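In practice, much of the selection discussed here comes down to which pretrained checkpoint you point the library at. A hedged illustration with the Transformers Auto classes follows; the checkpoint names are common public examples chosen for illustration, not recommendations from the episode.

```python
# Hypothetical illustration: switching between the model families discussed
# above is largely a matter of loading a different pretrained checkpoint.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
)

# Classification on English text: BERT, or a smaller distilled variant.
clf_tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Classification on a low resource language: start from a multilingual checkpoint.
multi_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Language generation: a generative (causal) model; GPT-2 stands in here
# because GPT-3 itself is only reachable through a hosted API.
gen_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Long documents: an architecture built for longer input windows.
long_model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2
)
```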
[00:27:38] Unknown:
If you have something where, you know, maybe I want to be able to do, you know, classification, and then from that classification, do some sort of generative text. You know, I want to classify the general sentiments around the economic recovery and then be able to generate a summary of information from what I was just, you know, synthesizing. Is there a way to be able to combine those multiple different pretrained models into a new model for being able to combine those types of use cases?
[00:28:07] Unknown:
People like to group this transfer learning into various groups depending on whether you're training on a new data distribution, a new language, or a new task. And so the task area is called multitask learning, and that's where you're learning on multiple tasks simultaneously. And there are various ways of doing that. You can, you know, do them 1 after the other. Sometimes there are ways of concatenating them into the same feature vector. You can encourage the systems to work well together by constructing an appropriate loss function and then minimizing that loss function.
So there are various different strategies for this. But, yes, multitask learning is how you would accomplish this.
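For the multitask setup sketched here, a minimal hypothetical example in PyTorch is shown below: one shared pretrained encoder, two task heads, and a weighted combined loss. The encoder name, head sizes, loss weights, and the second "task" are all illustrative assumptions.

```python
# Hypothetical sketch of multi-task learning: a shared encoder feeds two task
# heads, and a single weighted loss trains everything together.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskModel(nn.Module):
    def __init__(self, encoder_name="distilbert-base-uncased",
                 num_sentiment_labels=3, token_vocab_size=30522):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Task A: sentence-level sentiment over the first token's vector.
        self.sentiment_head = nn.Linear(hidden, num_sentiment_labels)
        # Task B: a token-level head standing in for a second task.
        self.token_head = nn.Linear(hidden, token_vocab_size)

    def forward(self, input_ids, attention_mask):
        states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.sentiment_head(states[:, 0]), self.token_head(states)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = MultiTaskModel()
batch = tokenizer(["the recovery is gathering pace"], return_tensors="pt")
sent_logits, tok_logits = model(**batch)

# Weight the per-task losses and minimise the sum, so both heads and the
# shared encoder are trained jointly.
sent_loss = nn.functional.cross_entropy(sent_logits, torch.tensor([1]))
tok_loss = nn.functional.cross_entropy(
    tok_logits.view(-1, tok_logits.size(-1)), batch["input_ids"].view(-1)
)
total_loss = 0.7 * sent_loss + 0.3 * tok_loss
total_loss.backward()
```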
[00:28:47] Unknown:
In terms of the actual tools and frameworks that are available, you mentioned things like Keras, TensorFlow, PyTorch. Those are all pretty well known in terms of the, you know, general deep learning and machine learning frameworks. And then you also mentioned things like Hugging Face Transformers, and then there's a lot of stuff coming out of OpenAI. I'm just wondering if you can give your sense of what the current state of the ecosystem looks like for being able to use transfer learning in the NLP domain, what the current strengths are, and any areas of weakness or any gaps in the available tooling or, you know, availability of useful tutorials for being able to get started in this space?
[00:29:27] Unknown:
So I'll start with 1 that hasn't come up yet. It's fast.ai. That's an open source course Jeremy Howard and his partner developed. It's, I would say, in my personal opinion, very good for trying some very academic or otherwise inaccessible methods very quickly. So it has, like, functions for picking the best learning rates for your problem, which is something actually I haven't seen anywhere else. So, like, this method ULMFiT, in just a couple function calls, you are able to, you know, use this state of the art method right away. In terms of weaknesses, very quickly, it's very hard, in my experience, to adapt it to new, you know, very custom problems that I'm having, that they haven't written their API for, if that makes sense. So if I was building a very customized solution, at the end of the day, I'm probably going to end up in TensorFlow or PyTorch.
And the reason is deployment. On the deployment front, things like Hugging Face Transformers, for instance, provide an accelerated API that's paid, that will run fast. But if you are not using 1 of those kinds of solutions, you are going to have to use something like TensorFlow Serving or TorchServe to, you know, speed it up. Because you are probably going to be running it on maybe millions of pages or something in production, on real production grade problems, and just using the Python library is not gonna be good enough. But the Python library, Transformers, is pretty much where you're going to find the latest architectures, where you can just load them in, like, 2 lines of code and try them out and see. At least on a small data set, you can compare the different methods. And then you can export the model with PyTorch or TensorFlow and then put it in these more downstream tools. And other tools that should be mentioned: AWS SageMaker, obviously.
This is something that allows you to use, like, elastic inference accelerators, which are cheaper than GPUs for scaling, for instance. It allows you to, you know, take your, like, serving solutions, or maybe even Docker images if that's how you like to do things, and deploy them fast and at scale. Docker can be important if you have a very specific dependency, right? If you do not have access to a GPU locally, then you probably want to look at Kaggle Kernels, or you want to look at the Google tool, Colab notebooks. That will give you a free GPU, at least something to be doing research with if you don't have the resources. Azure has a solution that's similar to SageMaker, which is the Machine Learning Studio.
So I would say this pretty much covers it. Keras, of course, is important, and they're doing a lot of work in this direction now.
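The "load it in 2 lines and compare" workflow mentioned above can be as simple as the sketch below; the two sentiment checkpoints are public examples chosen for illustration.

```python
# Hypothetical quick comparison: pull down two different pretrained sentiment
# models and eyeball their outputs on a handful of samples before committing
# to a deployment path.
from transformers import pipeline

samples = [
    "The quarterly results beat expectations.",
    "The rollout was a complete disaster.",
]

for checkpoint in [
    "distilbert-base-uncased-finetuned-sst-2-english",
    "nlptown/bert-base-multilingual-uncased-sentiment",
]:
    classifier = pipeline("sentiment-analysis", model=checkpoint)
    print(checkpoint, classifier(samples))
```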
[00:32:16] Unknown:
And in terms of your experience of working in the space and researching the space and doing the research for the book? What are some of the most interesting or innovative or unexpected applications of transfer learning and NLP that you've seen?
[00:32:29] Unknown:
I think this has kind of been dripping through the conversation. I think the impact on the amount of data I need to train a model for a new language has been mind blowing to me, and just a very practical impact on something I care about. The code generation was pretty, you know, head snapping for me. It's like, I can do that. I can just tell the machine to basically outline the broad strokes of a library. And then, of course, I'm gonna have to get in there and test it, but that cuts down the development cycle so much. Most people say GPT-3 or a model like that isn't really understanding human language.
Right? But at the same time, you could argue it's passing the Turing test, because, you know, you put the text out there and people think you wrote something deep, right? So they think a human wrote it. Did it pass the Turing test? Right? So this raises questions like how important metrics are. Right? Most people train their translators for BLEU score. Right? But when you read the BLEU score papers, even the BLEU score authors outline ways in which it fails. That's what everybody does, though. Everybody trains on BLEU score because the people who first did it trained on BLEU score, and now we need to compare to their work. So we're stuck with BLEU score, but is it really the best way to measure things? I think as a field, maybe machine learning sometimes doesn't stop because of the speed at which things happen. Right? And so there's momentum and everything moves. But sometimes it may benefit us to kind of just slow down and think carefully about what we are doing rather than, you know, just scaling because it seems to be working. So let's go. Right?
I hope that falls within the purview of your question.
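For readers unfamiliar with the metric being criticised here, BLEU is an n-gram overlap score between a system translation and one or more references. A small hedged example with the sacrebleu package (made-up sentences):

```python
# Hypothetical example of computing BLEU with sacrebleu. High n-gram overlap
# can coexist with poor meaning, which is part of the criticism above.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one reference stream

score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {score.score:.1f}")
```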
[00:34:16] Unknown:
And then another thing that I've been seeing a lot lately is sort of a general fear, uncertainty, and doubt around the use of things like generative models for being able to create content, and sort of the deep fake issues around video and audio being applied to text, and being able to, you know, rapidly generate a bunch of false news reports or misleading information, and just sort of the general aspects of disinformation, which I know you've said that you've done some work in that space. And I'm wondering just what your thoughts are on the trade offs of these models being so powerful for being able to do beneficial work and also the fact that they're potentially being used to create disinformation and sow uncertainty and doubt?
[00:35:04] Unknown:
So 1 thing I would say, based on my work in the disinformation space, is that the most effective disinformation actors are actually not robots. They are cyborgs. Right? You really need people working really hard with the dissemination tools to, you know, make this stuff work. Right? So that inherently places some limits on how far these things can scale. At the same time, yes, it has lowered, you know, the entry barrier for some malicious actors to do terrible things. And I mean, it's a fundamental philosophical question, though, that I think we always return to. Should we innovate or should we not innovate? Because, you know, understanding the atom means we can create nuclear weapons, but it also means we can create nuclear energy. Right?
Should we not understand the atom because we are afraid of our darker nature? Right? Any problem like this is kind of like an active adversarial area where basically the good guys and the bad guys are fighting each other. At the end of the day, we hope that there is more support behind the good actors so that they are going to win. The only way to really defeat the evil is to be ahead of it, not by not playing, right? Saying, I'm not gonna participate? They'll probably figure out how to participate without you, right? So you probably should be there
[00:36:26] Unknown:
and balance it in the right direction. Yeah. I know that there was some general sort of back and forth when GPT 3 was first created, around how, first, they only released the light version of the model because they were worried about its applications, and then they kinda realized that, you know, it was going to come out 1 way or another. And so they released the full model. So it's definitely an interesting problem domain. And as you mentioned, it's very sort of existential and philosophical, but at the same time, a very practical problem. So it's interesting to see how it's playing out. Sometimes you may confuse political statements for the truth.
[00:36:59] Unknown:
There's a lot of power that the winners of this game, you know, the commercial game that's going on around all these technologies, are going to wield. And so, you know, people sometimes don't release their software because they want to keep an advantage. But, of course, they may say that they're doing it for a different reason. Right? How do you really
[00:37:18] Unknown:
know? And so in terms of your experience of working on the book and diving deeper into this particular space of transfer learning and natural language processing, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:37:33] Unknown:
In an area like this 1, it's very important to be, you know, focused on some key things you're going to tackle. Because every day, there's going to be a new paper, a new model that tries something different and something new. And it may seem that it's kind of deprecating the stuff you are working on. So there is, like, a balance of picking the problems that you think are going to, you know, stick around for a while. Right? Because you don't want it to become irrelevant tomorrow, because things move so quickly. I mean, you can make that determination if you look carefully. Like, we know BERT, for instance. Even today, there may be all these other models out there, but everyone still starts with BERT. Right? When they're working on a problem like this, BERT should be in your testing.
And a lot of companies have put resources behind making those models particularly fast to deploy and so on. And so, you know, it's very important to pick carefully what you're going to work on and not get too distracted or worried about, you know, stuff going on. There'll be a lot of stuff going on. The other part is, I guess, how little academics think about making some of these methods appealing to the wider public. There seems to be a gap between, you know, these amazing papers that are being produced, very applied, impactful papers on all this cutting edge machine learning, and making it actually easy for people to use it in production or write some code that actually uses it for some problem. I think that's an area that's not been given as much attention.
And I think there's a lot of reason to close that gap. And this is what this book tries to do. I've seen more resources like that coming out, which encourages me. But I think we still have some distance to go.
[00:39:31] Unknown:
And in terms of people who are looking at transfer learning and they're looking at natural language processing and they've got some sort of language based or textual project that they're trying to work on, what are the cases where transfer learning is the wrong approach and they might be better served with, you know, TF-IDF or Word2Vec or 1 of the other types of approaches?
[00:39:50] Unknown:
So 1 of the dangers of pre trained language models is the potential bias the model may carry that you don't know about, because you didn't train it. Right? So if you are working in an application where explainability is very important and you need to explain it, let's say you are working on detecting who is guilty based on their blood pressure or something like that. Right? I mean, if there's so much that can go wrong if you get it wrong, then, you know, you really should think about whether such a bias can have a bad impact like that. If you can build a model from scratch where you can, you know, explain all the pieces, that's ideal. And that's not always possible. Right? Some problems are way too hard.
And in that case, you should be working on some kind of explainability mechanism that explains how your model came to its decision. In some cases, your deployment strategy may not be sophisticated enough to handle some of these models. They are more expensive than some of the approaches you mentioned. You know? Like, running production grade neural networks, you know, it's not cheap, you know, if you don't have, like, a use case that's valuable enough. Right? So you may not need it. It may not be the right application if all those trade offs don't work out right for you. You may not need it if you have sufficient data out there already that you can train your own model from scratch. You know, like, training your own BERT model, I mean, it's not cheap, but for, like, a startup, it's not that expensive anymore. Right? So you may be able to train your own if you have sufficient data. Or, I guess, your problem is probably just simple enough, right? It doesn't need all of this stuff. You might be better served. Yeah.
[00:41:35] Unknown:
And as you continue to work in this space, what are some of the trends or techniques or new areas of research that you're keeping an eye on or that you're most excited for?
[00:41:44] Unknown:
The major 1 at the beginning of this year, I mentioned already, which is the focus on voice. So far, the past couple of years, it's all been text, text, text, text. Now all of a sudden, all these other modalities are being brought in. So now we've seen that the same sorts of techniques can be used in voice technologies, which is very exciting. And also, of course, the return back to computer vision. So all of this was inspired by computer vision, but it's different from computer vision. And now some of these new ideas are going back into computer vision. So now people are building image transformers, where they're trying to contextually embed objects.
Right? So train on unsupervised data where you kind of try to learn what kind of context, let's say, a camera usually appears in, or a human or glasses or headphones. Right? So that you can try to be generative, descriptive about the image, and so on. And so the use of transformers in computer vision, which is kind of the reverse direction of the ImageNet moment, is also very exciting.
[00:42:49] Unknown:
Are there any other aspects of transfer learning and natural language processing, these particular areas of research, or the work that you're doing on the book that we didn't discuss yet that you'd like to cover before we close out the show?
[00:43:00] Unknown:
I would like to stress some of the work that me and some of my colleagues are doing in, I guess, democratizing some of these technologies for African languages through the organization known as Ghana NLP. And in this context, we have done things like build new data resources for some African languages. We've done things like build translation apps, both for iOS and Android. The app is called Khaya, k h a y a. You can download it, give it a try. We started with Ghanaian languages. We have since expanded it to the rest of the African map because our users were asking for it. So we have languages like Swahili and Zulu and Wolof and Yoruba in beta testing now. We are also putting in the voice capabilities. So we recently built a Twi automatic speech recognition system. We are putting it in the apps and just trying to make sure that, you know, some of these tools that people take for granted, like Google Translate and so on, are available to, you know, say, people from Africa, which I think is important
[00:44:09] Unknown:
for all of us, for a more equitable world where we can all communicate and work together. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to choose a musical group called Infected Mushroom. They do electronic or techno or whatever you wanna call it, but just a very, you know, interesting group, a lot of, you know, variety in terms of how their different albums go. You know, some of it's entirely instrumental. Some of it has voice. So just a lot of fun for something to listen to, you know, while you're coding or zoning out or exercising, whatever it is. So definitely worth checking out if you're looking for a new group to listen to. And with that, I'll pass it to you, Paul. Do you have any picks this week? I watched that movie called Tenet, which was a very interesting concept.
[00:44:59] Unknown:
I thought the movie made it difficult to understand. But the concept of traveling backwards in time, not in 1 go, but in a linear way, where you just switch direction and you're going backwards, and then switch it again and go the other way, is kind of interesting.
[00:45:15] Unknown:
I don't think a lot of popular culture has thought about it that way. So that comes to mind. Yeah. That was definitely an interesting movie. Another guest has recommended that 1 as well. So, yeah, I watched it, and I'll have to agree with you. Definitely worth a watch, but make sure that you're paying attention because there's a lot going on. Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing on transfer learning and NLP, and for taking the time to write the book to make it more accessible to more people. So I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Thank you so much, Tobias. Thank you for having me, and I had a great time talking to everyone. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Paul Azunre Begins
Introduction to Transfer Learning
Paul's Journey into NLP and Transfer Learning
Applications of Transfer Learning in NLP
Practical Steps for Applying Transfer Learning
Choosing the Right Pretrained Model
Combining Multiple Pretrained Models
Tools and Frameworks for Transfer Learning
Challenges and Lessons Learned
When Not to Use Transfer Learning
Future Trends in Transfer Learning
Democratizing NLP for African Languages
Closing Remarks and Picks