Summary
Artificial intelligence applications can provide dramatic benefits to a business, but only if you can bring them from idea to production. Henrik Landgren was behind the original efforts at Spotify to leverage data for new product features, and in his current role he works on an AI system to evaluate new businesses to invest in. In this episode he shares advice on how to identify opportunities for leveraging AI to improve your business, the capabilities necessary to enable a successful project, and some of the pitfalls to watch out for. If you are curious about how to get started with AI, or what to consider as you build a project, then this is definitely worth a listen.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
- Equalum’s end to end data ingestion platform is relied upon by enterprises across industries to seamlessly stream data to operational, real-time analytics and machine learning environments. Equalum combines streaming Change Data Capture, replication, complex transformations, batch processing and full data management using a no-code UI. Equalum also leverages open source data frameworks by orchestrating Apache Spark, Kafka and others under the hood. Tool consolidation and linear scalability without the legacy platform price tag. Go to pythonpodcast.com/equalum today to start a free 2 week test run of their platform, and don’t forget to tell them that we sent you.
- Your host as usual is Tobias Macey and today I’m interviewing Henrik Landgren about his experiences building AI platforms to transform business capabilities.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by sharing your thoughts on when, where, and how AI/ML are useful tools for a business?
- What has been your experience in building AI platforms?
- For organizations who are considering investing in AI capabilities, what are some alternative strategies that they might consider first?
- What are the cases where AI is likely to be a wasted effort, or will fail to create a return on investment?
- In order to be successful in bringing AI products to production, what are the foundational capabilities that are necessary?
- What have you found to be a useful composition of roles and skills for building AI products?
- There are various statistics that all point to a remarkably low success rate for bringing AI into production. What are some of the pitfalls that organizations and engineers should be aware of when undertaking such a project?
- What is your strategy for identifying opportunities for a successful AI product?
- Once you have determined the possible utility for such a project, how do you approach the work of making it a reality?
- What are the common factors in what you built at Spotify and EQT ventures?
- Where do the two efforts diverge?
- Your work on Motherbrain is interesting because of the fact that it is dealing in what seems to be intangible or unpredictable forces. What kinds of input are you relying on to generate useful predictions?
- What are some of the most interesting, innovative, or unexpected uses of AI that you have seen?
- What are some of the biggest failures of AI that you are aware of?
- In your work at Spotify and EQT ventures, what are the most interesting, unexpected, or challenging lessons that you have learned?
- What advice or recommendations do you have for anyone who wants to learn more about the potential for AI and the work involved in bringing it to production?
Keep In Touch
- @hlandgren on Twitter
Picks
- Tobias
- Henrik
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- EQT Ventures
- Stockholm Sweden
- Motherbrain
- Accenture
- Spotify
- BASIC
- C#
- ASP.NET
- Javascript
- Hadoop
- McKinsey
- Deep Learning
- Data Engineer
- Data Scientist
- Machine Learning Engineer
- Discover Weekly Spotify Playlist
- GPT-3
- Deep Fakes
- DBT
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. And do you want to get better at Python? Now is an excellent time to take an online course. Whether you're just learning Python or you're looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top notch course for you. If you're just getting started, be sure to check out the Python for Absolute Beginners course. It's like the first year of computer science that you never took, compressed into 10 fun hours of Python coding and problem solving.
Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That's pythonpodcast.com/talkpython. And don't forget to thank them for supporting the show. Your host as usual is Tobias Macey. And today, I'm interviewing Henrik Landgren about his experiences building AI platforms to transform business capabilities. So Henrik, can you start by introducing yourself?
[00:01:48] Unknown:
Yes. Hi. My name is Henrik Landgren. I'm based out of Stockholm and come from Sweden. I work as an operating partner for the investment firm EQT Ventures, and I've been doing this now for 5 years, where we have been setting up a new venture fund, trying to disrupt the way that ventures work in a more tech driven way. So I focus on doing investment work, finding great founding teams and investing in them, and helping portfolio companies with everything when it comes to data and analytics. And I also work as a product owner for our own internal projects, which is helping ourselves become better investors using tech and data. We have manifested that into a platform which we call Motherbrain, which is basically a platform where we collect as much data as possible about founding teams out there, and companies and their performance, and then we try to build algorithms on top of that to help us become smarter investors and get to the companies that are most interesting first.
That's, in short, what I do right now. Before this, I have a background of coding since I was a kid. I worked as a developer for some years when I was younger and then wanted to add the business side of things. I went back to school, I have an industrial engineering and management degree, and then I went the management consultancy track at Accenture and then McKinsey for some years to learn the business side of things. And from that, I went back to tech, joining Spotify in 2010. I was part of building up that company over 5 years from 2010.
During that time I headed up the analytics team. So I summarize my life and my passion areas in 3 circles: tech, data, and business. Everything around those 3 circles is what I'm passionate about.
[00:03:31] Unknown:
And on the point of the Mother Brain project, when I was doing research for the podcast, the other thing that popped up was a reference to Metroid. So I'm wondering if that's where the name came from.
[00:03:40] Unknown:
That could be one of the sources. As it is with names of things, no one really knows exactly how the story of the name ended up in the end, because some people say that's one of the sources and some people say others. So we like it to be a bit mysterious, but, yes, it has some game references for sure.
[00:04:00] Unknown:
And so you mentioned that you've been programming for a long time. I'm wondering if you can share how you first got introduced to Python.
[00:04:07] Unknown:
So when I started coding, I was 6 years old. That was the year 87; the nineties. No, sorry, the eighties: 1987. And then it was BASIC, so I started coding with BASIC and learned some other things along the way. Around the dotcom era of the year 2000, there was a lot of C#. First it was ASP.NET and then C#. And JavaScript, a lot of JavaScript. In some other random story I will tell you someday, I was actually called "the JavaScript man". I was named that because I really loved JavaScript in the year 2000. And looking back, I am a bit proud of it because of all the explosiveness of JavaScript these days. Python: I was introduced to Python when it came to the first data jobs we started building at Spotify in 2010. It was super early then to use Python for data science or analytics, because everyone was using R.
The great developers I was working with at Spotify, they preferred Python. So that's kind of how I got introduced to that.
[00:05:08] Unknown:
As you mentioned, you were pretty instrumental in helping to bring the analytics capabilities into Spotify, which has come to be one of the poster-child companies for the use of artificial intelligence and machine learning to drive the core elements of the business. Yeah. So I'm wondering if you can start by giving an overview of your thoughts on the when, where, and how artificial intelligence or machine learning are useful for a business, and maybe some of the cases where it is just a time sink and isn't going to be as transformational as it can be for certain entities?
[00:05:43] Unknown:
Well, this is a topic we can talk for hours about, I think. But I'll try to start somewhere, at least. I think a big evolution happened around 2010, when I joined Spotify and was set out to build up this analytics team. And I say that because the new infrastructure, starting with Hadoop, allowed us to store every single event. So all the raw data, all the clicks, everything every user did, we were able to store into the data store and also analyze, and that opened up a completely new way of working. We could actually build a platform where we could ask any question we would like: why did this happen? Why would the user do this? And, you know, really understand in depth every single session of every single user.
That level of detail was amazing, but also paralyzing to figure out, like, how can we get value out of this? How can this help us to make better decisions? Because as interesting and fantastic as it was, it ended up being just as paralyzing, because it's such a time sink, as you mentioned, to have access to all of that data. If you're not super disciplined about what you look for, you're going to spend hours and hours and weeks of a team's time analyzing things that don't really help you in the end, just because you can. And all sorts of fancy visualizations that you can create these days, when you have so much data available, don't end up helping you make better decisions either. So you can waste so much time, and also money, if you are not disciplined here. And I think my learnings from my McKinsey years as a management consultant were really helpful there, because in that role you're really under time pressure and you really need to boil things down to what it is that you actually need. You have to take a hypothesis driven approach, where you are extremely disciplined in figuring out exactly just enough data points to make up your mind about whether you should make a decision, yes or no, all the time. There is no value added in doing extra analysis once you've come to the point where you can make the decision.
And I think that's a very important learning that I still try to teach, or to coach other people who are trying to do something here. Because many, many times during these last 10 years, when I've talked about AI with people, or data projects in general... excuse me, this is not COVID, this is just a normal cold, I promise... when I talk in general about data problems or data projects with traditional companies, they usually say that they've tried some examples themselves. But most of them say that they haven't really gotten any value out of it. Their projects just end up in some isolated experiment somewhere.
And I think most of the time the missing link is that, when people don't have that link between the data that they have and the decisions that they really want to make, they just start to explore data, and that's a recipe for disaster if you want to get something out of it. But if you do set it up like that, if you really have mapped out your metrics, where you really want to figure out what the key things for your business are, and you then direct all the resources, all the analytics, all the modern data tools to those kinds of problems, then you can really do wonders. And for me, AI and all the artificial intelligence methodologies are just part of that toolbox.
So if you want to figure out how to improve churn for your business, then you should of course use the best tools available there. And if that is just a standard clustering algorithm, or if you use something more advanced with machine learning or deep learning, then you should use that, of course. So for me, it's just another tool in a toolbox, but one that you should apply to the problems that really matter. And if you do, you will get much better results than if you didn't. One such example is from when we started the Motherbrain project. The first generations of the platform were about collecting data and having the platform be a data store where you could look up all the important data points for any company.
And then we started to build simple algorithms on top of it. We started with the classic statistical models, doing feature engineering to figure out what the factors are that are really linked to high performing companies in the different datasets we have. And that works really well; you can build super nice, well functioning statistical models that way. So that's how we started. It worked out well, but then we wanted to try: okay, what if we now use a deep learning approach instead? So we did that, and it was quite funny, because the results of the very first model outperformed all the results that we had gotten in the past with our statistical models that we had spent so much time building the features for.
So it was super fun and also a bit annoying, but we really could feel, here, this is how AI really will revolutionize the world, because we could send in not just our own features that we had created. We could actually send in the raw data, all of it, all the time series, everything at the rawest level, into a neural network, and then have that model actually find those patterns much better than what we could have defined ourselves. So that was a big, mind boggling moment in our history, and I think it's an example that exemplifies my point from the beginning as well. So, a super long answer, but just a starting point.
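To make that contrast concrete, here is a minimal sketch of the two approaches described above: hand-engineered features fed to a classical statistical model, versus the raw time series handed directly to a small neural network. All data, feature definitions, and model choices are invented for illustration; this is not Spotify's or EQT's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import tensorflow as tf

rng = np.random.default_rng(0)
# Hypothetical raw data: 1000 companies, 24 months of some traction signal.
raw_series = rng.normal(size=(1000, 24))
labels = (raw_series[:, -6:].mean(axis=1) > 0).astype(int)  # toy target

# Approach 1: hand-engineered features plus a classical statistical model.
features = np.column_stack([
    raw_series.mean(axis=1),                   # average traction
    raw_series[:, -3:].mean(axis=1),           # recent traction
    np.diff(raw_series, axis=1).mean(axis=1),  # average month-over-month change
])
classical = LogisticRegression().fit(features, labels)

# Approach 2: hand the raw series to a small neural network and let it
# discover the predictive patterns itself, with no feature engineering.
net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 1)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
net.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
net.fit(raw_series[..., None], labels, epochs=3, verbose=0)
```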
[00:10:55] Unknown:
Another potentially interesting digression is the conflicting nomenclature or the ways that people try to use the terms artificial intelligence and machine learning. And some people are very religious about the distinction between them. Some people use them interchangeably. I'm wondering what your thoughts are on that matter.
[00:11:14] Unknown:
Yeah. So for me, I follow this debate a lot, right? And I'm not at all religious about it. I don't actually care so much. For the purpose of the discussion, for me the most important thing is the output you get. I don't care so much about which bucket you put the different methods in. For me, there's a big toolbox of different methodologies that we should use, and we should always use the one best suited for the task. And I don't care so much about which bucket it is in, even if it's in the statistical toolkit bucket.
Yeah. It's really fascinating to follow everything that happens. Almost every week there is some new development in the community, new things that people have built. So it's extremely fun to be part of this evolution right now.
[00:11:52] Unknown:
When you were discussing some of the early days of bringing the analytical capabilities to Spotify, you mentioned a few different things such as Hadoop and being able to do event tracking. And there are a number of different potential data sources these days, with the possibilities continually expanding as there are new services, new storage, and new compute technologies. I'm wondering if you can set the baseline as to what you consider to be the foundational capabilities that are necessary for an organization to be successful in bringing an AI product into production?
[00:12:27] Unknown:
Yeah. There are a couple of things that you need to have in place, which is also something that I help portfolio companies with after we have invested. First and foremost, you need to have a way to collect the data from all the different sources that you have, both the internal sources but also external data that you can grab. You need to put them into the same place, one single data store, an infrastructure where you can analyze them. And once you've got them in there, you need a processing layer where you can transform that data and build data pipelines into, what is it called, the plethora of different information entities that you want to use later on. And if you do that well, you can build a logical sequence where you build out your pipelines, each logical step after the other, because then all these different entities that you define will have a very nice logical flow, and people can build something and then build on top of others' work, and this information model that you're collectively building in the organization will evolve, be maintained, and be used in an efficient way. If you do it wrong, you will have a big, big spaghetti monster of ad hoc queries that people write because they need to, which introduces a lot of inconsistencies and a lot of troubleshooting to understand why numbers are not the same in different tools and so forth.
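As a rough illustration of that layered idea, here is a minimal sketch in which each transformation produces a named entity that later steps (and other teams) build on, rather than everyone querying raw events ad hoc. All table, column, and function names here are hypothetical.

```python
import pandas as pd

def clean_events(raw: pd.DataFrame) -> pd.DataFrame:
    """Layer 1: standardize raw events collected from all sources."""
    events = raw.dropna(subset=["user_id", "timestamp"]).copy()
    events["timestamp"] = pd.to_datetime(events["timestamp"])
    return events

def user_activity(events: pd.DataFrame) -> pd.DataFrame:
    """Layer 2: a per-user entity derived from the cleaned events."""
    return events.groupby("user_id").agg(
        first_seen=("timestamp", "min"),
        n_events=("timestamp", "count"),
    )

def daily_signups(activity: pd.DataFrame) -> pd.Series:
    """Layer 3: a shared metric everyone reads from one definition."""
    return activity.groupby(activity["first_seen"].dt.date).size()

# Each step consumes the previous step's output, so the information model
# grows as a logical sequence instead of a spaghetti of one-off queries.
raw = pd.DataFrame({
    "user_id": ["a", "a", "b", None],
    "timestamp": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"],
})
metric = daily_signups(user_activity(clean_events(raw)))
```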
The pipelining layer is a very important one to get right, I think. Then after that, you need to have the environment where the data scientists or the analysts can work, with access to all the data in the platform. Nowadays the tools for this are really good, I think, and a lot of people use Python, of course. The data scientists should feel that they have what they need to analyze and explore and build their models, and also a place where they can put those models into production, into the pipelines. The last pieces are then the visualization layer, or really the tooling for the operators on the team, as one thing, and then the output that goes into serving models, or model output, to the products that you have live.
AI features in your apps, for example, also need to be served, of course. I usually focus a lot on the internal data points, because I am a strong believer that there is a lot of algorithmic help we can continue to provide to become better operators in our different companies out there, such as with Motherbrain, but we did the same at Spotify. The first thing people do is build dashboards, right? And that's great, we can visualize the data into the key dashboards. But I usually try to challenge everyone by saying that it's very easy to build dashboards, and very easy to build way too many dashboards. What you end up with is people, managers, product owners, people across the organization, spending a lot of time analyzing all these dashboards, and sometimes I wonder whether that is really an efficient use of their time. It's better to think one step ahead and really build the right metrics, so that they have few dashboards, few metrics to track, the most important ones, and then also build the tools into the dashboard so that they can act and take decisions as fast as possible, integrated into the back end systems that you have. So for example, if you want to build performance marketing tools, you should, if you can, drill down to the very few metrics that really matter for the team, so that they can track their campaigns with the exact best models that you have, on their LTV versus CAC ratios, for example, and then have them be able to increase or decrease the various campaigns in the actual tool immediately. That way they don't have to spend way too much time analyzing the data themselves, and they shouldn't have to go somewhere else to make those decisions. It should all be in one place. So I've coined the term "action board", which is not a dashboard; it's a place where you can get suggested actions from the data and then take those actions in the same tool. This is a way to integrate the data parts and the algorithms into the actual tools that operators use.
So, in short: infrastructure to collect the data and store it all in one place, the pipelining, and then the dashboards and the products that you build that you want to power with data.
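A hedged sketch of the "action board" idea, as opposed to a plain dashboard: surface a suggested action next to the LTV/CAC numbers and let the operator apply it in the same tool. The campaign names, thresholds, and ratios below are invented for illustration, not details from the episode.

```python
from dataclasses import dataclass

@dataclass
class Campaign:
    name: str
    ltv: float   # predicted lifetime value per acquired user
    cac: float   # cost to acquire a user
    daily_budget: float

def suggest_action(c: Campaign) -> str:
    """Turn the metric into a concrete, actionable suggestion."""
    ratio = c.ltv / c.cac
    if ratio > 3.0:
        return f"increase {c.name} budget by 20%"
    if ratio < 1.0:
        return f"pause {c.name} (spending more than each user is worth)"
    return f"keep {c.name} as is"

campaigns = [Campaign("search_se", ltv=42.0, cac=9.0, daily_budget=500),
             Campaign("display_us", ltv=12.0, cac=15.0, daily_budget=800)]
for c in campaigns:
    print(suggest_action(c))  # the operator applies the action in-tool
```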
[00:16:50] Unknown:
All of those different layers and components of the life cycle of collecting data, through to delivering it and providing actionable insights, require a pretty diverse set of skills, and each of those can potentially be a full time role for one person. Yep. I'm wondering what you have found to be some of the useful compositions in terms of the skill sets and the role definitions for an overall team that is looking to build an AI product and be able to serve it effectively?
[00:17:23] Unknown:
So in the beginning, when you start, it's hard to just recruit everyone at the same time, right? But I think there are two roles that are really important in the very beginning. One is the technical role, a data engineer type of role: someone who has the ability and experience to know what these different components look like and how they can be stitched together, and who has done that before. That person is the one that will take the architectural decisions on what to use for these different components and then actually put them together, because that work is a technical one.
If you are a great developer, you can do that in a smart way. If you're not, you're going to do it in a less efficient way or not use the right toolkit. So the data engineer is one of the two most important roles in the beginning, I think. The other is the person that actually asks the questions and does the analysis to make sense of this data; call it a data scientist or an analyst, but someone who is really good at querying the datasets, analyzing them, and building the first models. Someone who knows tech and can code, though not necessarily the best developer, but who also understands the business side of things, so that that person knows what the most important questions are to ask, coming back to the disciplined, hypothesis driven way of working that I mentioned before.
Because if you don't have that, if you only have the more developer type of profiles asking the questions, you can sometimes miss out on the most important questions for the business. And what I've seen over the years is that that role is something you can really benefit from having: someone who has both the tech and business sides of the equation. That person should also ideally be someone who knows, and is interested in, how you can leverage all the modern tools, including machine learning slash AI, so that the person can be creative around answering business type questions using a modern tech and data approach.
Because if you do, that person will start to prototype things on his or her own and build prototypes of new AI models, simple in the beginning, just to see if things work. And we all know that experimentation is extremely important here. We know that you're going to have to iterate many, many times. So it actually works pretty well to do that at a light scale, if you have the infrastructure in place to just play with it, build many, many models, and see what seems to work. And as soon as you get to something that seems interesting enough to put into production, then you work with the data engineer to create those pipelines for production use. That's where the data engineer comes in really handy, because that person is much better than the data scientist at knowing how to build things for scale, serve things in real time, and so forth. So the tandem, the pairwise work of those two, I think, is the most important duo to start with. And from that, you can start to build out the team. Those people will know how to bring in a specialist, a machine learning person, when that time comes. They can also have a more junior analyst or data scientist come in and do a lot of the work that has to happen in the beginning to define all the pipelines and all the metrics. There is a lot of work that goes into just cleaning the data and creating nice datasets to be used later downstream in the pipelines, so to say. So usually, I think, those two start, and then the team expands, with more specialists and more junior people joining to increase the capacity.
[00:20:40] Unknown:
Even with all of those roles and all of the technical platform in place, there are a number of different statistics that point to a remarkably low success rate for actually being able to bring artificial intelligence projects into production. Teams might be able to iterate and have some sort of model that's useful or interesting, but then not be able to actually deliver and capitalize on it. Or they might bring it into production initially, but the ongoing maintenance proves to be the downfall. I'm wondering what you have seen as some of the common pitfalls that organizations and engineers should be aware of when undertaking such a project, and things that they should be considering at the outset.
[00:21:22] Unknown:
In general, I think people try to make such a big thing out of it, where I just see it as another tool. If you think of it that way, there isn't even such a thing as "AI"; rather, you should always strive to be as data driven as possible and use the modern tools available for whatever you want to do. And if you think like that, then of course you will use the best model, which could be an AI model, for the task at hand. But if that's not the best model for that task, then you shouldn't use it; you should use something else. So if you try to shoehorn AI into whatever problem you're working on just because you have to, because someone really wants to have AI in the company, then that's probably not going to fly. And I think many companies have done just that. There is something coming from the top, like, we need to have AI here because I've read about it in the news. They start a project, and then they hire some machine learning experts to push something from the AI side into somewhere where it's not really being pulled from the business perspective. That's my hypothesis from the observations I've made: the times projects fail are when you have that kind of setup, when it's not linked to the actual problems that you really want to solve as a company.
[00:22:32] Unknown:
And as far as identifying the opportunities where AI could be successful in helping the business, or serving as a core product for the organization, what is your overall strategy for identifying those opportunities, or for evaluating them to determine when it's not worth the effort?
[00:22:57] Unknown:
The first one is the standard one: you need to have a lot of data to be able to build machine learning algorithms on top of it. If you don't have a lot of data, it's going to be too little to build significant training from, and then it's not really going to work. So that's one. But the other is that it's sometimes interesting to consider problems you might think are impossible to solve with AI alone, because humans need to do them. People will many times think that, you know, AI cannot simply solve this by itself. And that is, I think, true so many times. But then if you think about it: okay, how can we build a process where humans and AI work together? Then it starts to become really interesting. I think that's exactly what we have done in the Motherbrain team. We're not, right now, building a platform which automatically invests in companies, because at this point there are simply not enough data points or data available to do the full assessment that an investment team does before investing in a company.
But that doesn't mean that we cannot use a data driven process, using AI, to help ourselves get better, to make ourselves more efficient, and to surface to us the companies that look most interesting based on all these different factors. We serve that to the people in the team, they give their feedback on it, and then they start working on these companies; if they're interesting, they start to meet them, we gather more data points around the companies, and then we choose to invest or not. And every time we do something like that, first, we get better suggestions, because it makes a lot of sense that an algorithm should be able to determine from external factors whether a company is more or less interesting; not to invest in, but just interesting to look into.
And that's a reasonable scope for a machine learning project; we have enough data points for that, and so forth. The second point is that once we have acted on the suggestions from the platform, once we have invested more time to understand the companies better, we also provide the feedback back to the algorithm. So we get this learning loop. And since we have a process where all of us assess the companies that come from Motherbrain every week, that's actually a meaningful, significant stream of labels back to the algorithm itself. So it does not only help us become more productive and efficient by automating the analysis, or the screening phase; it also helps us improve the algorithm, because every single decision that we take is recorded, stored, and used as a label for the model itself.
So it becomes a self learning model, or I would say almost a self learning process, with investors in the loop, you could say. And I think that's an interesting aspect: if companies start to think like that, it's not AI that has to solve everything by itself. It's not AI versus us; it's us empowered by AI. How can we make our processes smarter using AI for different things, or any other kind of data method that suits the task? How can we build a process around that which will help us save time, become more efficient, and train the algorithm at the same time? Then I think you can open up and see many more use cases where this can help. Because trying to solve everything at once, everything that the whole company of experts does, is not going to work in the beginning. By taking this approach instead, it will help us right now, and it will slowly build us towards the final vision in the end, later down the road, when we will work in a completely different way: when we have collected lots of different data points, when we have evolved the algorithms based on all the learnings, and when we have also made great decisions along the whole road of getting there.
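A minimal sketch of that feedback loop, under an invented schema: the platform suggests companies, investors act on the suggestions, and every screening decision is stored as a label for the next training run. None of these names come from the episode.

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE decisions (
    company_id TEXT, decided_at TEXT, investor TEXT,
    decision TEXT CHECK (decision IN ('pursue', 'pass')))""")

def record_decision(company_id: str, investor: str, decision: str) -> None:
    """Every screening decision becomes a training label for the model."""
    db.execute("INSERT INTO decisions VALUES (?, ?, ?, ?)",
               (company_id, datetime.now(timezone.utc).isoformat(),
                investor, decision))

record_decision("acme-ab", "analyst_1", "pursue")
record_decision("widgets-gmbh", "analyst_2", "pass")

# Later, the accumulated labels feed the next training run of the
# ranking model, closing the human-in-the-loop learning cycle.
labels = db.execute("SELECT company_id, decision FROM decisions").fetchall()
```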
[00:26:39] Unknown:
Continuing on the topic of mother brain, that's another interesting project because of the fact that it is dealing with trying to evaluate the potential future success of different companies. And it's not as clear cut of a project where, for instance, with Spotify, as you mentioned, you're collecting all of the event data from users of the platform where it's fairly clear cut that you're trying to understand what are some useful recommendations, what are some useful engagement metrics for being able to understand either what new music to present to somebody or potential new features that will help to keep people engaged.
Whereas with Motherbrain, it's a little bit more intangible in terms of what the organization is doing, what they are trying to do, and who the people are who are building the organization. I'm curious what you have found to be some of the useful inputs to that model for generating predictions that are helpful in deciding whether or not to invest?
[00:27:40] Unknown:
Yeah. There is a lot. I mean, the fun part is that I see us building this platform for two use cases at the same time. One, we build it for the investment professionals that work in our team, that every day want to meet great new founders, assess them, and decide if we want to invest or not. And the other is to build the algorithm to help us do that. But if we focus the work primarily on building the platform to make the investment professionals as efficient as possible, to help them make their decisions faster and faster all the time, then we will learn what we need. We will learn what data points they, or we, look at to determine if a company is something to invest in or not. So gradually, as we work every week or every month to build the platform better and better for our users, our people in the team, we learn when it fails and when it's good.
We then learn what is missing in the platform, what we can automate, and what datasets we don't have today that people go somewhere else to find. So we have all those answers; they come out in this evolution of the platform that we do every month. And by doing so, we add more and more of the datasets that the users want to see themselves, and we automate more and more of the analysis that the users already do today, which makes them happy, right? Because then they have fewer places to go to. It makes them more efficient, because we can automate the analysis they do, and it makes things more objective, because the analysis is always done the same way, we get benchmarks, and the automation keeps the analysis objective. And in parallel, all these different data points are also what goes into the algorithm. It makes a lot of sense that the things people want to look at to determine whether they should invest in a company are probably also significant factors for an algorithm to predict whether the companies are interesting to invest in or not.
Summarizing the kinds of data points we look at right now: at this point, the standard ones are nothing exotic yet. It's information about who invested, the funding history and at what time and so forth, and also the performance, the traction, that we can see from different sources in terms of web traffic, app store rankings, and things like that for the company. We also have financial data that we can find about the companies, what their financials look like. But we also go in and visit every company's website to capture information about what they write, how they position the product, and so forth.
The mix of all these things, time series and hard, tabular data together with unstructured data like text, is super interesting, because that's exactly the kind of data where it's worth trying AI algorithms to predict the likelihood of the companies being interesting. That's the typical situation where those kinds of models thrive, where you can combine structured and unstructured data at the same time. And that's what we have seen work super well as well.
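As a sketch of how structured and unstructured inputs can feed one model, here is a toy pipeline combining tabular traction signals with raw website text. The column names, data, and model choice are all invented; the actual Motherbrain models are presumably far more involved.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "web_traffic_growth": [0.4, -0.1, 1.2, 0.05],       # structured signal
    "months_since_last_round": [6, 30, 3, 18],           # structured signal
    "website_text": ["developer tools for data teams",   # unstructured text
                     "artisanal candles, hand poured",
                     "api platform for payments",
                     "local bakery and cafe"],
    "interesting": [1, 0, 1, 0],  # toy label from investor feedback
})

pre = ColumnTransformer([
    ("text", TfidfVectorizer(), "website_text"),   # vectorize the raw text
    ("num", "passthrough", ["web_traffic_growth",  # keep numeric columns
                            "months_since_last_round"]),
])
model = Pipeline([("features", pre), ("clf", LogisticRegression())])
model.fit(df.drop(columns="interesting"), df["interesting"])
```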
[00:30:33] Unknown:
In your work between Spotify and EQT Ventures, with these different goals as to what you're trying to understand with this automated analysis from the AI engine, what have you found to be some of the common factors in the effort involved, and what are the areas where they diverge?
[00:30:52] Unknown:
So when I was at Spotify, we were earlier on, at a lower level you could almost say, when it came to using AI for decision making. You know, we were super early just in being able to collect the raw data, display it in meaningful ways, and make decisions from metrics defined at a much more granular level than before. That was the completely new thing. There wasn't so much AI going into business decisions at that point. Spotify did start to use AI to build the first recommendation engines, but my team didn't build those themselves; it was another super strong team that did that. So for decision making, in the earlier days, we didn't use much AI. And if I were to do that again now, at this point, we would definitely not spend a lot of time building all those dashboards, as I mentioned. We would instead think one step further and think, for every person that works at Spotify, where are the decisions that they make, try to build the decision tools, and then automate as much as possible of the inputs that they need to go through every day. Thinking again about the marketing team as a good example, I think that should work as a trading house, almost automated, where maybe you just oversee the different decisions and go in and verify spend levels and things like that. But it should always be suggested to you, instead of having you do all of that yourself. As for similarities with what we do today, I think it's funny that the first AI model there, which I find to be the recommendations of which music to listen to, was best manifested in this Discover Weekly feature that the team built.
And so every week you get recommendations of new music to listen to. All the thought that goes into that, to figure out, out of all the 40 million tracks, which we should recommend to you in this list of 20 songs in your weekly playlist. There are so many factors that go in there: what's your style, what degree of familiarity do you want, do you want to listen to your favorites, do you want new music, all sorts of psychological and taste data points, and also seasonality and locale. There's so much that goes into it. It's the same kind of thing we do here today at EQT Ventures, where the Motherbrain platform hands out to you, every week, a list of companies for you to look into, as suggested by the platform.
And we also need to take all these different aspects into consideration, and we can improve it a lot in the future too: you know, these companies are probably raising now, these companies have great investors backing them, they're in this sector that is interesting, they're maybe in a sector that you like, things like that. So designing that, it's almost, as we say, the EQT Ventures Motherbrain weekly list that goes out every week. It's kind of the same thing. Yes, there are definitely similarities in what we do, but there are also differences, mostly linked to the fact that we are in a different age now, where AI is much more built out than it was when I was doing this at Spotify.
[00:33:56] Unknown:
Equalum's end to end data ingestion platform is relied upon by enterprises across industries to seamlessly stream data to operational, real time analytics, and machine learning environments. Equalum combines streaming change data capture, replication, complex transformations, batch processing, and full data management using a no code UI. Equalum also leverages open source data frameworks by orchestrating Apache Spark, Kafka, and others under the hood. Tool consolidation and linear scalability without the legacy platform price tag. Go to pythonpodcast.com/equalum today, that's E-Q-U-A-L-U-M, to start a free 2 week test run of their platform, and don't forget to tell them that we sent you.
In your experience of working in this industry and being involved in bringing AI into production, I'm sure that there is a lot of information that you track for your own purposes to make sure that you're staying up to date with the industry and to understand what the trends are. And in your experience of working with AI and keeping up to date with it, what have you found to be some of the most interesting or innovative or unexpected ways that it's being used?
[00:35:09] Unknown:
Well, I really like the generative concept, where you use AI to generate things. I mean, the whole GPT-3 model, where it can generate new text based on a seed input. I think that's an extremely interesting next generation of features that can be built using AI. We haven't done anything with that yet, and I'm sure, if you do this podcast again 10 years from now, that's probably going to be the thing where I would say, oh, during the last 10 years we also started to use AI to generate content in our work. At this point we have not used it yet, but I think it's very fascinating to see how good it's starting to get, both in terms of text but also with photos, or even video, like all the deep fake videos that come out almost every day.
So I would say that's one very, very fascinating evolution that is happening right now, I think.
[00:35:56] Unknown:
And another interesting aspect of this is that because of the potential power for these systems, there's also a significant potential for failure as well. And I'm wondering what you have seen as some of the biggest or most interesting failures of AI that you're aware of.
[00:36:14] Unknown:
For sure. This is what happens with all new tech, I think, and definitely failures have been seen. I guess most have probably not been publicized anywhere, but some certainly have. One example that comes to mind is one I saw on Twitter about two weeks ago, I think, where the Twitter algorithm that picks how to thumbnail a photo that you post was working in a bit of a strange way. I don't know if you've seen this, but if you posted a picture of something and there is a face in that picture, when Twitter shows just a thumbnail of the photo, it will pick out the face and crop to that face only.
Presumably the algorithm has found that if you show a picture of a face, that's a more engaging image than some other part of the picture. That makes sense. But the problem is that, when you have multiple faces in the picture, it seemed, if those Twitter stories that I read were correct, that the algorithm would then pick the white dude's face if there were multiple faces and one of them was a white dude. And that's a complete failure, of course. People tried different variations of this in those threads: what if you pick the male versus something else? And it constantly came back to the algorithm seemingly overrepresenting white dudes when it came to cropping the right face as the thumbnail of the picture.
And I don't think that failure is something they did intentionally; of course I don't think that's the case. But it's interesting to think that this is one side effect you will get when you train systems using data: you will amplify the bias in your training data. Well, not amplify it, but you will reproduce the same bias in the output of your own models. And here, I think that failure is interesting because it touches on this point: the people and the companies that use algorithms as a way to serve content actually have a choice to control this, a unique kind of control that we haven't had before. We do that as well. We have a similar problem, right? If you train algorithms used for finding the best next company based on historic data, you will have a bias built into that. So instead of just using the outputs from the model, we have added another layer on top, where we create tools where we can decide on things that we want to over index on.
So in our models, we are over indexing on certain segments that we believe we want to focus more of our time on; for example, diverse teams versus non diverse teams. We can do that, and everyone who builds an AI to sort content can do that, and I think maybe should do that as well. Twitter could do that, for example. But it raises another question then: it means that all companies that serve content have to take a societal standpoint on what they want to change, something you need to apply to these models. Which I think is actually really interesting, because finally we can start to have some controls, to change the things that we really want to change in the feeds of users.
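A minimal sketch of what such an over-indexing layer could look like: rather than serving raw model scores, apply explicit, human-chosen boosts for segments the firm deliberately wants to see more of. The segment names and weights are invented for illustration.

```python
def adjusted_score(model_score: float, segments: set[str],
                   boosts: dict[str, float]) -> float:
    """Apply a multiplicative boost for each segment the company matches."""
    score = model_score
    for segment in segments:
        score *= boosts.get(segment, 1.0)  # unlisted segments are unchanged
    return score

# A deliberate, documented choice layered on top of the model's output.
boosts = {"diverse_founding_team": 1.3}

ranked = sorted([
    ("acme-ab", adjusted_score(0.71, {"diverse_founding_team"}, boosts)),
    ("widgets-gmbh", adjusted_score(0.78, set(), boosts)),
], key=lambda pair: pair[1], reverse=True)
```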
[00:39:11] Unknown:
Yeah. The overall topic of bias and ethics in AI is another interesting and long conversation that is always worth having. I think it's great that you called that out, because as we already said, it's possible to amplify the capabilities of an organization or of a dataset, and you need to be careful about which elements you're amplifying, because it can have an outsized positive impact, but it can also have an outsized negative impact. And that's something that always needs to be considered when you're building these tools.
[00:39:43] Unknown:
Exactly. But I think it's refreshing, because if you build something like this, where you have data and models to control what you're serving, it means that you can build mechanisms to actually control this feed in a better way. And if you don't do this, you will have no control. So I really like this: if we can do this, and if we can get companies to act in this way, I think we will actually have better content in the end.

In your work between Spotify and EQT Ventures and your other work in business and management, what are some of the most interesting or unexpected or challenging lessons that you have learned?

So, yeah. Well, one interesting one that people don't think about, and I was surprised myself at how important it is, comes from creating this platform we have now, where we have algorithms that suggest companies that we should meet and talk to.
To get people to use that information, you need to think a lot about the UX and the psychology of our own users of this platform. We have had to invest a lot to make sure that the platform is so simple and easy to use that people will not use anything else, because we have super smart experts in our team that will use whatever tool they think is most helpful for their daily work. If our Motherbrain platform is not considered to be on par with the best productivity tools out there, they simply won't use it; they will diverge and do something else. And that's a super nice challenge to have, I think, but it was also a big surprise for us that we needed to invest so much in UX, to really make the flows extremely easy and to come up with new ways to get the value of the models out into people's hands in a very simple, low friction way. That's a fascinating thing, the importance of UX when you build AI systems with humans in the loop. Another is the psychology of that, which is that you can't just keep serving things to people; if you serve too much to people, they won't want to use it anymore. So you need to take into consideration who the receiver is, the person on the other side: what is that person's workload, and how can I treat that with respect?
So the system has to get information about how much work the person on the other side has. Is the person overloaded or not? If the person is overloaded, don't give them any more work until they have completed the previous tasks, because if you do, they will just zone out, think this is not for them anymore, and start working somewhere else. Thinking about the users that are supposed to use the platform, the psychology of that, and building in some kind of mechanisms to integrate the platform nicely with the human workflow is really important to get it to work. We've done this now for 5 years, and in the beginning we didn't think like this, and so many attempts failed to get people to really do what they had to do, because they had all these other things they saw as more important. Now we have a nice interplay between the humans and the algorithms, thanks to the fact that we have shared the information about people's workload with the algorithms as well.
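A hedged sketch of that workload-aware serving idea: hold back new suggestions until the person's current queue has been worked through. The queue structure and limit here are illustrative assumptions, not details from the episode.

```python
from collections import defaultdict

OPEN_TASK_LIMIT = 5  # hypothetical capacity per person
queues: dict[str, list[str]] = defaultdict(list)

def maybe_assign(investor: str, company_id: str) -> bool:
    """Assign a suggestion only if the investor has capacity for it."""
    if len(queues[investor]) >= OPEN_TASK_LIMIT:
        return False  # hold it back rather than overload the person
    queues[investor].append(company_id)
    return True

def complete(investor: str, company_id: str) -> None:
    """Finishing a task frees capacity for the next suggestion."""
    queues[investor].remove(company_id)
```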
[00:42:31] Unknown:
Are there any other aspects of your work at Spotify and EQT Ventures, or any other advice or recommendations that you have for anyone who wants to build platforms for and products with AI, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:42:51] Unknown:
Well, I think it's mostly the agile thinking: start small and go all the way end to end, all the time. Because if you sit in a corner somewhere and build models, you're alone. If you hire a super smart machine learning PhD, but that person is not connected to the real problems, then it's not going to work. So really try to solve the actual business problem, and then see if you can use some data driven approach to solve that problem faster. That might be an AI problem, or in the beginning it might not be; it might be simple rules. Just start super small and get it all the way out and see: okay, that actually helped us. And then, how can we make it even better? Okay, that actually helped us even more. Okay, how can we bring in some new datasets? Did this work or didn't it? Okay, let's try another way. All the time, you take it all the way to the end, so you see whether the actual results, the actual decisions, became better. That, I think, would be the number one learning, after you put all the data infrastructure in place to be able to do any of this, of course.
[00:43:47] Unknown:
For anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'm going to move us into the picks. And this week, I'm going to choose a couple of tools that I came across recently. The first one is called bat, which is an alternative to the cat Unix utility for showing the contents of a file, where it adds things like syntax highlighting for languages and will automatically pipe the output through something like less so you can scroll through it.
It seems to be a pretty useful utility; I've started experimenting with it a bit, so it's worth checking out. And the other one is a tool called whale, which is a sort of baseline, easy to get started with way to build out a data catalog just with markdown, where it will connect to different databases, scan all the metadata, and dump it out into markdown files. And then you can add in things like SQL queries to add various metrics or sampling information. So it's just something very useful and easy to get started with, definitely worth taking a look at if you're just trying to get an understanding of what kind of data assets you even have. And so with that, I'll pass it to you, Henrik. Do you have any picks this week?
[00:45:05] Unknown:
So one platform that I really find super interesting, especially thanks to my JavaScript background that I mentioned, is a platform called Observable. Their website is observablehq.com. It's like a Jupyter notebook kind of platform, but built all in the browser, so client side JavaScript first. And it's super fast. It's built by the D3 founding team, and it's fascinating how fast you come to new insights when you use that platform, because you get the data in and then you do all the transformations in the client. And then you have access to all the different visualization libraries that are out there for JavaScript, which is just amazing, because you have the data in the right format already. So you just use something that exists in their amazing community, and immediately you can use any of the visualizations in there for your data. So I really like that platform.
The other one would be a team that I just spoke to the other week, a British team called Dataform. It's like dbt, which is a pipelining framework for building data pipelines; Dataform has a similar approach, where you can build the pipelines using SQL. But the interesting aspect of Dataform that I really liked was the emphasis on the front end layer, where you can build your pipelines using SQL code in super nice code editors online, so you don't have to go to the command line to do it. And I think that will open up the use case even more for the analyst type of profiles out there who want to build nicely structured pipelines of their queries. So: observablehq.com, and Dataform.
[00:46:31] Unknown:
Yeah. I'll definitely have to take a look at Observable, and I actually had the Dataform team and the dbt team on episodes of the Data Engineering Podcast, which is the other show I run. I'll add links to those in the show notes as well. Thank you very much for taking the time today to join me and share your experiences building AI products and bringing them into production successfully at a few different organizations. It's definitely a useful perspective to have, and a lot of interesting information that you shared. So thank you for all the time and energy you've put into your work, and I hope you enjoy the rest of your day. Thank you so much for having me. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Henrik Landgren and His Work
The Origin and Purpose of Motherbrain
AI and Machine Learning in Business
Foundational Capabilities for AI in Production
Team Composition for AI Projects
Common Pitfalls in AI Projects
Motherbrain: Evaluating Company Success
Comparing AI Efforts at Spotify and EQT Ventures
Innovative Uses and Failures of AI
Lessons Learned in AI and Business
Advice for Building AI Platforms
Picks and Recommendations