Summary
A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library and his plans for future development on it.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 Gbit network in all of their datacenters.
- If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcastinit.com/gocd. Professional support and enterprise plugins are available for added peace of mind.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems
Interview
- Introductions
- How did you get introduced to Python?
- What is Surprise and what was your motivation for creating it?
- What are the most challenging aspects of building a recommender system and how does Surprise help simplify that process?
- What are some of the ways that a user or company can bootstrap a recommender system while they accrue data to use a collaborative algorithm?
- What are some of the ways that a recommender system can be used, outside of the typical ecommerce example?
- Once an algorithm has been deployed how can a user test the accuracy of the suggestions?
- How is Surprise implemented and how has it evolved since you first started working on it?
- What have been the most difficult aspects of building and maintaining Surprise?
- Who do you see as the main competitors to Surprise, within the Python ecosystem and outside of it?
- What are the attributes of the system that can be modified to improve the relevance of the recommendations that are provided?
- For someone who wants to use Surprise in their application, what are the steps involved?
- What are some of the new features or improvements that you have planned for the future of Surprise?
Keep In Touch
- Website
- @hug_nicolas on Twitter
- nicolashug on GitHub
Picks
- Tobias
- Silk profiler for Django
Links
- Surprise
- Gridsearch
- Cold Start Problem
- Content-Based Recommendation
- Ensemble Learning
- Spotlight
- Lightfm
- Pandas
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports the show on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app. And now you can deliver your work to your users even faster with the newly upgraded 200 gigabit network in all of their data centers. If you're tired of cobbling together your deployment pipeline, then it's time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD, you get complete visibility into the life cycle of your software from one location. To download it now, go to podcastinit.com/gocd. Professional support and enterprise plug-ins are available for added peace of mind. You can visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email me at host@podcastinit.com.
To help other people find the show, please leave a review on iTunes or Google Play Music. Tell your friends and coworkers and share it on social media. Your host as usual is Tobias Macey. And today, I'm interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems. So Nicolas, could you please introduce yourself? Hi. Yes. Thank you for having me. So my name is Nicolas. I just finished my PhD a few months ago in machine learning,
[00:01:40] Unknown:
and I'm currently looking for a job, and I'm the main developer of, Surprise.
[00:01:46] Unknown:
And do you remember how you first got introduced to Python?
[00:01:49] Unknown:
Yeah, I do. It was a few years ago during a research internship where I was supposed to build a classifier for musical instrument sounds. And, basically, my supervisor gave me a choice between MATLAB and Python. So obviously, I went for Python, because it felt a lot more appealing: it's a general purpose language and it's also open source. So I was much more attracted to Python than I was to MATLAB. And I immediately got hooked, because before discovering Python, everything I needed to do, I was doing in C. So it made a very big difference to discover Python. I still love C, obviously, but I use it for other projects.
[00:02:43] Unknown:
And as I mentioned at the beginning, I invited you here to talk about your work on Surprise. So I'm just wondering if you can give us an idea of what Surprise is and what your motivation was for creating it.
[00:02:56] Unknown:
Sure. So Surprise is a library that will help you build and evaluate the performance of a recommendation algorithm. And more precisely, it's targeted towards collaborative filtering algorithms, which rely on explicit ratings. So maybe this is worth a bit of explanation. Basically, a recommendation system is a system that will try to predict the preference of a user for a given item. The item could be a book, a movie, a car, or anything. And the biggest part of the recommendation system is the prediction algorithm, which will output the actual prediction. We usually distinguish between two kinds of prediction algorithms, depending on the information that they will leverage. We have the content-based algorithms, which rely on metadata about the items, and the collaborative filtering algorithms, which rely on collaborative information about the users, and Surprise is targeted toward these kinds of algorithms. So the collaborative information usually takes the form of ratings.
Like, for example, Alice has rated Titanic with a rating of 4, and Bob has rated Toy Story with a rating of 2, etcetera. And the goal of the system is to predict other ratings for other users and other items so that it can ultimately produce recommendations. So maybe it will recommend Toy Story to Alice if the prediction for Alice is high, etcetera. And Surprise is a library that helps you do that with various tools. The motivation I had for creating Surprise was that during my PhD, I needed to investigate some new recommendation algorithm ideas. So I implemented these algorithms, and I needed to compare their performance with other state of the art algorithms. At that time, there was no library that really suited my needs, because I found that most of the implementation details were hidden in the code and not explicitly stated in the documentation. So rather than just read hundreds of lines of code to try to understand what they were doing, I decided to implement them myself. And it started like that.
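As a concrete illustration of the rating-prediction workflow Nicolas describes, here is a minimal sketch based on the public Surprise documentation (the exact API may differ slightly between versions, and the toy ratings data is made up):

```python
# A minimal sketch of the workflow described above, based on the Surprise docs.
import pandas as pd
from surprise import SVD, Dataset, Reader

# Toy explicit-ratings data in the Alice/Bob style mentioned above.
ratings = pd.DataFrame({
    'user':   ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'item':   ['Titanic', 'Up', 'Toy Story', 'Titanic', 'Up', 'Toy Story'],
    'rating': [4, 5, 2, 3, 4, 5],
})

reader = Reader(rating_scale=(1, 5))            # ratings go from 1 to 5
data = Dataset.load_from_df(ratings[['user', 'item', 'rating']], reader)

algo = SVD()                                    # one of the built-in prediction algorithms
algo.fit(data.build_full_trainset())            # train on all known ratings

# Predict the rating Alice would give to Toy Story.
print(algo.predict('Alice', 'Toy Story').est)
```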
[00:05:10] Unknown:
And a couple of things that came out from that. One, it sounds like Surprise is more focused on the development and evaluation of the recommender algorithms than it is on deploying and running them in a project. Yes. Exactly. And the other question is you touched on the idea of how the algorithms performed. So I'm just wondering if you can go a bit deeper on what you mean by performance in the context of a recommendation algorithm.
[00:05:46] Unknown:
You're making predictions, but you don't know if they are correct or not, because, obviously, these are predictions and you don't know the real value that you're trying to predict. So what people usually do is called A/B testing. To take the example of a commercial site, they'll present the old version of the recommender to half of their users and the new version to the other half. And they'll see which one of the two versions basically gives them the most money. So if the new recommender yields more money, then they'll keep it. And if not, they'll just change it. So that's a very basic way of comparing two recommendation systems. But I think it's the only way to find out which one of the systems is better once the algorithm is deployed in a real system. But before deploying a system, you will test it on some known data. So you'll train your algorithm on some data while still hiding a small part of it. And this small part, we call it the test set. You make predictions on the test set, and then you check how close the predictions are to the real actual values. And you know these actual values because they are data that you already have. So that's the very common way of evaluating the performance of a recommendation system, and that's the scheme that we use in general in machine learning. If you do it over many subsets, this is called cross validation.
So basically, one of the main metrics to evaluate the performance of the system is just to see how close the predictions are to the actual ground truth values.
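A sketch of the hold-out evaluation and cross validation just described, using helpers from Surprise's model_selection module as documented in recent versions (older releases exposed a slightly different evaluation API):

```python
# Hold-out evaluation and cross validation, as described above.
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import cross_validate, train_test_split

data = Dataset.load_builtin('ml-100k')      # MovieLens 100k, downloaded on first use

# Hold-out evaluation: hide 25% of the ratings as a test set.
trainset, testset = train_test_split(data, test_size=0.25)
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)                  # how far predictions are from the ground truth

# Cross validation: repeat the same procedure over 5 different splits.
cross_validate(SVD(), data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```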
[00:07:32] Unknown:
What are the most challenging aspects of building and maintaining a recommender system? And what are the aspects of Surprise that help simplify that process?
[00:07:43] Unknown:
So building a collaborative filtering system involves a few steps. You first have to gather a lot of training data, you have to choose a prediction algorithm, and you have to train this algorithm on your data. And finally, you have to evaluate its performance. For a collaborative system, one of the characteristics that we look at, as I said, is how close the predicted ratings are to the actual values. And Surprise has various features to help simplify this process. It offers a simple way of managing datasets, and it has various built-in prediction algorithms. So if you don't want to roll your own algorithm, you don't have to. But it also makes the process of building custom algorithms very easy.
And finally, it helps evaluate the performance of the algorithm. I tried to explain what cross validation was just before, and it has built-in ways to perform cross validation very easily. There are also other features that will help you tune the parameters of an algorithm. So you have the choice between many different prediction algorithms, and each of these algorithms has a lot of parameters, which will obviously make a difference in the accuracy of the predictions. And Surprise will give you some ways to automatically find the best set of parameters for a given algorithm. This is called grid search. It's a very common task with machine learning algorithms.
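For the grid search Nicolas mentions, here is a short sketch assuming Surprise's GridSearchCV helper (which mirrors scikit-learn's); the candidate parameter values are arbitrary examples:

```python
# Grid search over a few SVD hyperparameters, as in the Surprise docs.
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')

param_grid = {
    'n_epochs': [5, 10],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.4],
}

gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'])    # best cross-validated RMSE
print(gs.best_params['rmse'])   # the parameter combination that achieved it
```

The best parameter combination can then be used to instantiate the final algorithm before training on the full dataset.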
[00:09:25] Unknown:
And given the fact that collaborative algorithms typically need at least some decently sized set of data to be able to build the appropriate models from, if a company or project is just starting out, what are some of the ways that they can bootstrap the recommender system while they're in the process of accruing the necessary data to make it more effective?
[00:09:46] Unknown:
Yes. As you said, one of the main drawbacks of collaborative filtering algorithms is that they require a lot of data before they can be used, because the predictions are based on past ratings of a user. So if a new user arrives and has no ratings, then we cannot make any predictions for them. This is known as the cold start problem. It's a very common one. And, basically, the most common solution is to use the content-based approach that I referred to before. Instead of using ratings to make a prediction, you will use content-based features of the items. For example, if you're recommending books, you will know that this book is written by this author, it's a romantic book with these kinds of characters, etcetera. So these will be metadata, or content-based features, about the items.
And you will try to recommend to the user some books that are similar to ones they already liked. But if you have absolutely zero knowledge about the user, then it's very difficult to make personalized recommendations, and what we usually do is just recommend the most popular items. But in general, you will always have a recommender system that is a hybrid system. It's not just a collaborative filtering system or just a content-based system, but a mix between the two. And the more data you have, the more you tend towards the collaborative filtering approach, usually.
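The popularity fallback for cold-start users could look something like the following hypothetical helper; this is plain Python for illustration only and not part of Surprise:

```python
# Hypothetical cold-start fallback: recommend the most popular items
# to a user we know nothing about.
from collections import Counter

ratings = [                       # (user, item, rating) triples we already have
    ('alice', 'titanic', 4), ('bob', 'toy_story', 2),
    ('carol', 'titanic', 5), ('carol', 'up', 4), ('dave', 'titanic', 3),
]

def most_popular_items(ratings, n=2):
    """Fall back to raw popularity when a new user has no rating history."""
    counts = Counter(item for _, item, _ in ratings)
    return [item for item, _ in counts.most_common(n)]

print(most_popular_items(ratings))   # e.g. ['titanic', 'up'] for a brand-new user
```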
[00:11:21] Unknown:
And the canonical way that most people are familiar with interacting with recommendation systems is via ecommerce sites such as Amazon, where if you're looking at a product, it will say other people have also viewed. And I'm wondering what are some of the other contexts in which recommender systems are typically used?
[00:11:39] Unknown:
Well, as soon as you have some items and some users that may buy or interact with those items, you can use a recommender system. So it could be anything, really. You can recommend books, movies, restaurants. You can use a recommender system for online dating websites, or for recommending Twitter accounts to follow. A huge application is recommending music. So if you just wanna play around with a recommender system, you may very well, I don't know, download your favorite Spotify playlist and run a recommender system to discover new songs or stuff like that. So it's not just Amazon-like recommender systems. You also have a lot of other applications: recommending videos on YouTube, recommending friends on Facebook, anything really.
[00:12:28] Unknown:
And in terms of a hybrid system where it's using both the content-based and the collaborative filtering, I'm just curious what the internal mechanics of something like that would look like. Is it where you run it through the two systems at the same time and then merge the results? Or would it be more of a sequential process where you use the content-based filtering and then use the collaborative filtering on top of that, or vice versa?
[00:12:54] Unknown:
I guess it really depends on the application. I wouldn't really know what actual systems would be using. But the idea of stacking different algorithms, like doing the collaborative filtering prediction, then doing the content-based one, and merging the two, is a very common approach. It's called ensemble learning. And it usually works very well, because you have a lot of different predictors. But, honestly, I don't know what is actually used in practice. I don't think there is a hard threshold for choosing between a content-based system and a collaborative system. It can get very complicated, I think.
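As a purely hypothetical illustration of the simplest possible blend of two predictors (real ensemble methods are usually more involved), the merging step might be nothing more than a weighted average:

```python
# Hypothetical hybrid: a weighted blend of a collaborative score and a
# content-based score for the same (user, item) pair.
def hybrid_score(collab_score, content_score, alpha=0.7):
    """Blend two predicted ratings; alpha weights the collaborative part."""
    return alpha * collab_score + (1 - alpha) * content_score

# e.g. collaborative model predicts 4.2, content-based model predicts 3.5
print(hybrid_score(4.2, 3.5))   # 3.99
```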
[00:13:41] Unknown:
And once the algorithm has been chosen and the model has been trained for a recommender system and it's been deployed into production, what are some of the ways that the owners of the system can test the accuracy of the suggestions and feed that knowledge back into the running system?
[00:14:02] Unknown:
So I think this all boils down to the A/B testing that I described before. You'll just present one system to half of the users, and you'll present the other system to the other half, and compare the two. And the metric that you use for comparing the two usually is just how much money it gives you. There might be other criteria, but usually this is the one that is the most important for commercial applications. So they'll just compare the two in an actual setting, with real users, and then just
[00:14:36] Unknown:
choose one over the other. That's all. And digging into the internals of the Surprise library,
[00:14:44] Unknown:
could you talk a bit about how it's implemented and how that implementation has evolved since you first started working on it? Sure. So the code base at the beginning was very messy, because it was not intended to be an open source package at the beginning. It wasn't even supposed to be a package. It was just a bunch of hacky scripts that I made for myself to run my own experiments during my PhD. But at some point, I started to have a good code base with enough features, and I thought that it would be a good idea to transform it into a proper Python package, even just for the sake of teaching myself how to make a Python package.
And I think it was last year that I thought that it could actually be useful for other people as well. So I decided to make a clean and user friendly package with a lot of documentation. I put it on PyPI, created a website, and started advertising it, etcetera. So it was a very incremental process. I didn't start out thinking I'm gonna make a recommendation system package. It really happened almost by accident. And I really used Surprise as a way of teaching myself Python and good programming practices. So there still might be some pitfalls in the code, because I'm still learning, you know? And as for the implementation, I don't use NumPy much; well, I might use it, but the data structures of Surprise are quite simple. The ratings are stored in a dictionary, so it's like a sparse matrix.
Instead of using an actual SciPy sparse matrix, I just use a Python dictionary. The data structure is not really complicated.
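To illustrate the idea of a dictionary acting as a sparse matrix, here is a rough sketch; it is not necessarily Surprise's exact internal layout, only the general shape of the approach:

```python
# Illustrative sketch of sparse ratings stored in plain dictionaries:
# only the (user, item) pairs that actually have a rating take up memory.
ur = {}   # user -> list of (item, rating)
ir = {}   # item -> list of (user, rating)

def add_rating(user, item, rating):
    ur.setdefault(user, []).append((item, rating))
    ir.setdefault(item, []).append((user, rating))

add_rating('alice', 'titanic', 4)
add_rating('bob', 'toy_story', 2)

print(ur['alice'])    # [('titanic', 4)]
print(ir['titanic'])  # [('alice', 4)]
```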
[00:16:32] Unknown:
Yeah. I can definitely appreciate wanting to just use the dictionary data structure rather than pulling in all of SciPy just to use one of its features, because one of the big challenges in software engineering is determining when is the right time to pull in a particular dependency, because as soon as you do that, you become responsible for it. Yeah. Absolutely. So if all you needed was the sparse matrix capability, then just implementing it on your own, rather than pulling in all of that additional code with SciPy, seems like a smart move.
[00:17:13] Unknown:
Yeah. I think I use SciPy anyway for some of the statistical computation, so SciPy is actually one of the dependencies even if it's not explicit. But, yeah, I try to use as few other packages as possible and keep the code base clean and easy to read. Instead of using a library for only a small feature, I prefer to just code it myself, even if it's, like, just the same code. I think it's better not to have too many dependencies, so as not to obfuscate the code.
[00:17:51] Unknown:
That also makes it an easier decision for somebody who's considering using your library, because they don't then have to decide whether or not they also want to bring in all the other transitive dependencies by virtue of using Surprise. And what have been the most difficult aspects of building and maintaining Surprise?
[00:18:08] Unknown:
So I think the most difficult one is building a good API. It's very hard to find a good balance between what the users will need and the code complexity. So I try to keep the code as simple as possible. But right now, Surprise supports some features that are probably only useful for less than 1% of the cases, and they still make the code a lot more complex than it could be. So for some of these rare edge cases, I think it was probably not a good idea to support them. But, yeah, building a simple but powerful API is very difficult. And it was the first API that I ever made, so if I had to do it again, I think I would change some things. Yeah, definitely.
Because you still need the API to be as generic as possible while still being simple. And for recommendation systems, you can have a lot of different kinds of data and a lot of different kinds of cases, and it's difficult to merge them into a simple and generic API. So that was a bit difficult. But it's still subject to change, of course, in later versions of Surprise. And about maintaining Surprise, I think the most frustrating thing, what annoys me the most, is that some users will post issues on the bug tracker without even trying to solve their problems on their own, you know? Before asking for help, they will just do nothing. And this really annoys me, because until now, I have always tried to help those users even though I thought they were rude, but I'm honestly starting to lose patience. And I know I shouldn't be bothered. It's a problem for every open source project, and I'm learning to deal with it. But, yeah, I think it's one of the annoying parts. But most of the users are really nice and very helpful, and they bring a lot of useful feedback. So that's not the majority of the users. But, yeah.
[00:20:15] Unknown:
And one of the other things that I'm curious about is: what do you view as the main competitors to Surprise, in the Python ecosystem in particular and maybe outside of it as well?
[00:20:28] Unknown:
So as far as competitors go, I know that there is a guy that made LightFM, and he recently released, how is it called, Spotlight, which is a recommendation library using deep learning methods, which is really great. I haven't used it, but it works really well. I don't know, really. I think we all have different purposes. These two libraries, LightFM and Spotlight, are mostly targeted towards implicit ratings, and Surprise is targeted toward explicit ratings. To make the difference clear, explicit ratings are ratings like 1, 2, 3, 4, 5, etcetera. And implicit ratings are ratings that are not explicitly given by the users but rather inferred from their behavior. For example, if a user has clicked on an ad, this would be a positive implicit rating toward the item featured in the ad, etcetera. And the techniques that are used for implicit and explicit ratings are very different from each other. So the two libraries, or the three libraries, are not really competing, because we are not targeting the same use cases, I think.
[00:21:51] Unknown:
And going back to the idea of performance and being able to tune the system, I'm curious, what are the attributes that can be modified to improve the relevance and pertinence of the recommendations that are provided to the end user?
[00:22:08] Unknown:
So it truly depends on the algorithm that you will use, because each algorithm has its own parameters. So obviously, the most important way of improving the relevance of the recommendations is the choice of the algorithm itself. But after you have chosen the algorithm, there are various parameters that depend on the algorithm. Most of the time, those parameters would be set automatically during the training stage. Yeah, I'm not sure I'm really answering your question.
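For example, the tunable knobs differ from one algorithm to the next; the parameter names below come from the Surprise documentation, and the values are arbitrary:

```python
# Each algorithm exposes its own parameters.
from surprise import SVD, KNNBasic

# Matrix-factorization algorithm: number of latent factors, epochs, learning
# rate, and regularization strength.
svd = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

# Neighborhood algorithm: number of neighbors and the similarity measure instead.
knn = KNNBasic(k=40, sim_options={'name': 'cosine', 'user_based': True})
```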
[00:22:35] Unknown:
No. That's totally fine. I recognize that it's a difficult and very nuanced situation. So, going back to the idea of picking which algorithm to use, is it something that would be feasible for somebody who has deployed a system to determine that the algorithm they were using isn't giving the best results and to actually change the algorithm in a running system?
[00:22:57] Unknown:
Well, if you change your algorithm, you would have to retrain it on the whole data. I mean, you cannot do some kind of transfer from one algorithm to another. I don't think that's possible.
[00:23:12] Unknown:
And for somebody who wants to start using Surprise in their application to add recommendation capabilities, what are the steps involved in that process?
[00:23:22] Unknown:
So Surprise is BSD licensed, so anybody can use it, even for commercial use. And the package is on PyPI, so you can install it with pip as any other package. There's a command line interface if you want to use Surprise directly, but obviously the main feature of Surprise is the API. So you can, yeah, just use the API. There is a lot of documentation on the website. I tried to make it as clear and simple as possible, so there is a getting started guide, etcetera. So, yeah, I think the documentation is fairly clear to get started.
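A minimal getting-started sketch, assuming the PyPI package name scikit-surprise and the documented API (exact commands, flags, and defaults may vary by version):

```python
# Installation (the PyPI package is named scikit-surprise):
#   pip install scikit-surprise
# A command line interface is also available (see `surprise -h`), but the
# Python API is the main entry point:
from surprise import KNNBasic, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin('ml-100k')     # Surprise ships with built-in datasets too
cross_validate(KNNBasic(), data, measures=['RMSE', 'MAE'], cv=3, verbose=True)
```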
[00:24:01] Unknown:
And so if they're using it, for instance, in the context of a Django or Flask application, would the data that Surprise uses to base its suggestions on be able to just get pulled from a database? Or is there any particular method that would be necessary for hooking Surprise up to the necessary data sources?
[00:24:22] Unknown:
So there are two main ways of feeding data to Surprise right now. The first way is just with files, like rating files. Each line of the file will be a rating: this user has rated this item with this rating. And users and items are identified by their IDs. The other way is to use a pandas DataFrame. You can do the exact same thing with a pandas DataFrame, where you would have a user column, an item column, and a rating column. So, yeah, if you don't want to use files, you'll have to get pandas as a dependency.
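The two loading paths described above might look like this, following the Surprise documentation; the file path and column names are made up for the example:

```python
# Two ways of feeding explicit ratings to Surprise.
import pandas as pd
from surprise import Dataset, Reader

# 1) From a ratings file: one "user item rating" triple per line.
file_reader = Reader(line_format='user item rating', sep=',')
file_data = Dataset.load_from_file('ratings.csv', reader=file_reader)

# 2) From a pandas DataFrame with user, item, and rating columns.
df = pd.DataFrame({
    'userID': ['u1', 'u1', 'u2'],
    'itemID': ['i1', 'i2', 'i1'],
    'rating': [4, 2, 5],
})
df_reader = Reader(rating_scale=(1, 5))
df_data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], df_reader)
```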
[00:25:06] Unknown:
And so for a running system that has all of their product data stored in a database, it sounds like, basically, the user would just need to be able to pull that data into a DataFrame so that it can then be processed through Surprise?
[00:25:22] Unknown:
Yes. Exactly.
[00:25:23] Unknown:
I haven't added any compatibility with SQL databases or anything like that. Okay. And what are some of the new features or improvements that you have planned for the future of Surprise?
[00:25:35] Unknown:
So the biggest thing I'd like to work on is to support implicit ratings. Like I said, Surprise only supports explicit ratings for now. But that would require a lot of refactoring, and it would probably break the API, so that would be for a next version of Surprise. And I still can't quite wrap my head around it. As I said, explicit and implicit ratings are very different settings, and I'm still trying to figure out a generic way of handling both kinds of ratings and algorithms. It's kind of difficult for now, at least for me, but I haven't worked a lot on it. Another feature that I'd like to implement is to support some sort of incremental learning. Right now, with Surprise, you train an algorithm on a given dataset, and you predict ratings based on this dataset. But when new data arrives, there is currently no way to merge the new ratings into the algorithm. So you have to retrain the algorithm on the old ratings and the new ratings at the same time, which can be bad if you have a huge dataset. So I'd like to implement some way of adding new data to an algorithm without having to retrain on the whole dataset.
That would be possible for some algorithms, but not for others. And I'd like to make the API closer to that of scikit-learn. Scikit-learn is one of the most popular libraries for machine learning, but it doesn't do recommender systems. A lot of scikit-learn tools are replicated in Surprise, like automatic cross validation or grid search, etcetera, but the API is still a bit different. So the idea would be to make it as close as possible to that of scikit-learn so it could be easier for new users to adapt. I've been wanting to do all that for a while, but I never found the time. So if anyone is interested in participating in the project, then contributions are more than welcome.
[00:27:40] Unknown:
And are there any other topics or aspects of Surprise that you think we should talk about before we start to close out the show? No, I don't think so.
[00:27:49] Unknown:
I can't think of any right now. I think we have made a good
[00:27:54] Unknown:
overview of what it does. Yeah. And it definitely sounds like if somebody is interested in working with Surprise and contributing to its overall community, one of the ways that would be useful would be to provide some ready-built integrations with things like Django and other web applications, so that it's easy for a new user to have a drop-in addition to their system for getting it up and running in their application? Right. To be honest, I have absolutely zero experience
[00:28:22] Unknown:
with either Django or Flask. So, yeah, if anybody has experience with them, they would be more than welcome to discuss it.
[00:28:31] Unknown:
For anybody who wants to follow the work that you're doing and keep up to date, I'll have you add your preferred contact information to the show notes. Sure. And with that, I'll move us into the picks. And my pick this week is a library called Silk, which is a Django application that makes it easy to profile your Django app. I was recently trying to just run the cProfile profiler from the standard library, and it didn't integrate well with the runserver command from Django's manage.py, and Silk was an easy drop-in replacement for that capability, and it was very simple to get it up and running. So if anybody's trying to profile their app, it definitely seems like a very well put together library. And with that, I'll pass it to you. Do you have any picks for us this week, Nicolas?
[00:29:19] Unknown:
Sorry, I couldn't think of any. If you want, I can send you one afterwards, but I didn't think of anything.
[00:29:28] Unknown:
Sure. Yeah. If you think of anything that you want me to add to the show notes as a pick for you, then I can do that after the fact. And I just wanna say I appreciate you taking the time out of your day to talk to me about the work you've been doing with Surprise. It definitely seems like a very interesting and useful library, and I hope some other people can find some utility from it as well. So thank you for your time, and I hope you enjoy the rest of your day. Great. Thank you. Thank you so much for having me. Have a good day. Thanks. You as well.
Introduction to Nicolas Hug and Surprise Library
Nicolas' Journey with Python
Overview of Surprise Library
Evaluating Recommender System Performance
Challenges in Building Recommender Systems
Applications of Recommender Systems
Internals and Evolution of Surprise Library
Building a Good API and User Challenges
Competitors and Algorithm Selection
Using Surprise in Applications
Future Features and Improvements
Community Contributions and Closing Remarks