Summary
A relevant and timely recommendation can be a pleasant surprise that will delight your users. Unfortunately it can be difficult to build a system that will produce useful suggestions, which is why this week’s guest, Nicolas Hug, built a library to help with developing and testing collaborative recommendation algorithms. He explains how he took the code he wrote for his PhD thesis and cleaned it up to release as an open source library and his plans for future development on it.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 Gbit network in all of their datacenters.
- If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcastinit.com/gocd. Professional support and enterprise plugins are available for added peace of mind.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems
Interview
- Introductions
- How did you get introduced to Python?
- What is Surprise and what was your motivation for creating it?
- What are the most challenging aspects of building a recommender system and how does Surprise help simplify that process?
- What are some of the ways that a user or company can bootstrap a recommender system while they accrue data to use a collaborative algorithm?
- What are some of the ways that a recommender system can be used, outside of the typical ecommerce example?
- Once an algorithm has been deployed how can a user test the accuracy of the suggestions?
- How is Surprise implemented and how has it evolved since you first started working on it?
- What have been the most difficult aspects of building and maintaining Surprise?
- Who do you see as the main competitors to Surprise, within the Python ecosystem and outside of it?
- What are the attributes of the system that can be modified to improve the relevance of the recommendations that are provided?
- For someone who wants to use Surprise in their application, what are the steps involved?
- What are some of the new features or improvements that you have planned for the future of Surprise?
Keep In Touch
- Website
- @hug_nicolas on Twitter
- nicolashug on GitHub
Picks
- Tobias
- Silk profiler for Django
Links
- Surprise
- Gridsearch
- Cold Start Problem
- Content-Based Recommendation
- Ensemble Learning
- Spotlight
- Lightfm
- Pandas
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports the show on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app. And now you can deliver your work to your users even faster with the newly upgraded 200 gigabit network in all of their data centers. If you're tired of cobbling together your deployment pipeline, then it's time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD, you get complete visibility into the life cycle of your software from one location. To download it now, go to podcastinit.com/gocd. Professional support and enterprise plug-ins are available for added peace of mind. You can visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email me at host@podcastinit.com.
To help other people find the show, please leave a review on iTunes or Google Play Music. Tell your friends and coworkers and share it on social media. Your host as usual is Tobias Macey. And today, I'm interviewing Nicolas Hug about Surprise, a scikit library for building recommender systems. So Nicolas, could you please introduce yourself? Hi. Yes. Thank you for having me. So my name is Nicolas. I just finished my PhD a few months ago in machine learning,
[00:01:40] Unknown:
and I'm currently looking for a job, and I'm the main developer of, Surprise.
[00:01:46] Unknown:
And do you remember how you first got introduced to Python?
[00:01:49] Unknown:
Yeah, I do. It was a few years ago during a research internship where I was supposed to build a classifier for musical instrument sounds. And, basically, my supervisor gave me a choice between MATLAB and Python. So obviously, I went for Python, because it felt a lot more appealing: it's a general purpose language and it's also open source. So I was much more attracted to Python than I was to MATLAB. And I immediately got hooked, because before discovering Python, everything I needed to do, I was doing in C. So it made a very big difference to discover Python. I still love C, obviously, but I use it for other projects.
[00:02:43] Unknown:
And as I mentioned at the beginning, I invited you here to talk about your work on Surprise. So I'm just wondering if you can give us an idea of what Surprise is and what your motivation was for creating it.
[00:02:56] Unknown:
Sure. So Surprise is a library that will help you build and evaluate the performance of a recommendation algorithm. And more precisely, it's targeted towards collaborative filtering algorithms, which rely on explicit ratings. So maybe this is worth a bit of explanation. Basically, a recommendation system is a system that will try to predict the preference of a user for a given item. The item could be a book, a movie, a car, or anything. And the biggest part of the recommendation system is the prediction algorithm, which will output the actual prediction. We usually distinguish between two kinds of prediction algorithms, depending on the information that they will leverage. We have the content-based algorithms, which rely on metadata about the items, and the collaborative filtering algorithms, which rely on collaborative information about the users, and Surprise is targeted toward these kinds of algorithms. So the collaborative information usually takes the form of ratings.
Like, for example, Alice has rated Titanic with a rating of 4, and Bob has rated Toy Story with a rating of 2, etcetera. And the goal of the system is to predict other ratings for other users and other items so that it can ultimately produce recommendations. So maybe it will recommend Toy Story to Alice if the prediction for Alice is high, etcetera. And Surprise is a library that helps you do that with various tools. The motivation I had for creating Surprise was that during my PhD, I needed to investigate some new recommendation algorithm ideas. So I implemented these algorithms, and I needed to compare their performance with other state of the art algorithms. At that time, there was no library that really suited my needs, because I found that most of the implementation details were hidden in the code and not explicitly stated in the documentation. So rather than just read hundreds of lines of code to try to understand what they were doing, I decided to implement them myself. And it started like that.
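As a concrete illustration of the rating-prediction workflow Nicolas describes, here is a minimal sketch based on the public Surprise documentation (the exact API may differ slightly between versions, and the toy ratings data is made up):

```python
# A minimal sketch of the workflow described above, based on the Surprise docs.
import pandas as pd
from surprise import SVD, Dataset, Reader

# Toy explicit-ratings data in the Alice/Bob style mentioned above.
ratings = pd.DataFrame({
    'user':   ['Alice', 'Alice', 'Bob', 'Bob', 'Carol', 'Carol'],
    'item':   ['Titanic', 'Up', 'Toy Story', 'Titanic', 'Up', 'Toy Story'],
    'rating': [4, 5, 2, 3, 4, 5],
})

reader = Reader(rating_scale=(1, 5))            # ratings go from 1 to 5
data = Dataset.load_from_df(ratings[['user', 'item', 'rating']], reader)

algo = SVD()                                    # one of the built-in prediction algorithms
algo.fit(data.build_full_trainset())            # train on all known ratings

# Predict the rating Alice would give to Toy Story.
print(algo.predict('Alice', 'Toy Story').est)
```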
[00:05:10] Unknown:
And a couple of things that came out from that. One, it sounds like Surprise is more focused on the development and evaluation of the recommender algorithms than it is on deploying and running them in a project. Yes. Exactly. And the other question is you touched on the idea of how the algorithms performed. So I'm just wondering if you can go a bit deeper on what you mean by performance in the context of a recommendation algorithm.
[00:05:46] Unknown:
You're making predictions, but you don't know if they are correct or not, because, obviously, these are predictions and you don't know the real value that you're trying to predict. So what people usually do is called A/B testing. To take the example of a commercial site, they'll present the old version of the recommender to half of their users and the new version to the other half. And they'll see which one of the two versions basically gives them the most money. So if the new recommender yields more money, then they'll keep it. And if not, they'll just change it. So that's a very basic way of comparing two recommendation systems. But I think it's the only way to find out which one of the systems is better once the algorithm is deployed in a real system. But before deploying a system, you will test it on some known data. So you'll train your algorithm on some data while still hiding a small part of it. And this small part, we call it the test set. You make predictions on the test set, and then you check how close the predictions are to the real actual values. And you know these actual values because they are data that you already have. So that's the very common way of evaluating the performance of a recommendation system, and that's the scheme that we use in general in machine learning. If you do it over many subsets, this is called cross validation.
So basically, one of the main metrics to evaluate the performance of the system is just to see how close the predictions are to the actual ground truth values.
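A sketch of the hold-out evaluation and cross validation just described, using helpers from Surprise's model_selection module as documented in recent versions (older releases exposed a slightly different evaluation API):

```python
# Hold-out evaluation and cross validation, as described above.
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import cross_validate, train_test_split

data = Dataset.load_builtin('ml-100k')      # MovieLens 100k, downloaded on first use

# Hold-out evaluation: hide 25% of the ratings as a test set.
trainset, testset = train_test_split(data, test_size=0.25)
algo = SVD()
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)                  # how far predictions are from the ground truth

# Cross validation: repeat the same procedure over 5 different splits.
cross_validate(SVD(), data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```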
[00:07:32] Unknown:
What are the most challenging aspects of building and maintaining a recommender system? And what are the aspects of Surprise that help simplify that process?
[00:07:43] Unknown:
So building a collaborative filtering system involves a few steps. You first have to gather a lot of training data, you have to choose a prediction algorithm, and you have to train this algorithm on your data. And finally, you have to evaluate its performance. For a collaborative system, one of the characteristics that we look at, as I said, is how close the predicted ratings are to the actual values. And Surprise has various features to help simplify this process. It offers a simple way of managing datasets, and it has various built-in prediction algorithms. So if you don't want to roll your own algorithm, you don't have to. But it also makes the process of building custom algorithms very easy.
And finally, it helps evaluate the performance of the algorithm. I tried to explain what cross validation was just before, and it has built-in ways to perform cross validation very easily. There are also other features that will help you tune the parameters of an algorithm. So you have the choice between many different prediction algorithms, and each of these algorithms has a lot of parameters, which will obviously make a difference in the accuracy of the predictions. And Surprise will give you some ways to automatically find the best set of parameters for a given algorithm. This is called grid search. It's a very common task with machine learning algorithms.
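For the grid search Nicolas mentions, here is a short sketch assuming Surprise's GridSearchCV helper (which mirrors scikit-learn's); the candidate parameter values are arbitrary examples:

```python
# Grid search over a few SVD hyperparameters, as in the Surprise docs.
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')

param_grid = {
    'n_epochs': [5, 10],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.4],
}

gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'])    # best cross-validated RMSE
print(gs.best_params['rmse'])   # the parameter combination that achieved it
```

The best parameter combination can then be used to instantiate the final algorithm before training on the full dataset.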
[00:09:25] Unknown:
And given the fact that collaborative algorithms typically need at least some decently sized set of data to be able to build the appropriate models from, if a company or project is just starting out, what are some of the ways that they can bootstrap the recommender system while they're in the process of accruing the necessary data to make it more effective?
[00:09:46] Unknown:
Yes. As you said, one of the main drawbacks of collaborative filtering algorithms is that they require a lot of data before they can be used, because the predictions are based on past ratings of a user. So if a new user arrives and has no ratings, then we cannot make any predictions for them. This is known as the cold start problem. It's a very common one. And, basically, the most common solution is to use the content-based approach that I referred to before. Instead of using ratings to make a prediction, you will use content-based features of the items. For example, if you're recommending books, you will know that this book is written by this author, it's a romantic book with these kinds of characters, etcetera. So these will be metadata, or content-based features, about the items.
And you will try to recommend to the user some books that are similar to ones they already liked. But if you have absolutely zero knowledge about the user, then it's very difficult to make personalized recommendations, and what we usually do is just recommend the most popular items. But in general, you will always have a recommender system that is a hybrid system. It's not just a collaborative filtering system or just a content-based system, but a mix between the two. And the more data you have, the more you tend towards the collaborative filtering approach, usually.
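The popularity fallback for cold-start users could look something like the following hypothetical helper; this is plain Python for illustration only and not part of Surprise:

```python
# Hypothetical cold-start fallback: recommend the most popular items
# to a user we know nothing about.
from collections import Counter

ratings = [                       # (user, item, rating) triples we already have
    ('alice', 'titanic', 4), ('bob', 'toy_story', 2),
    ('carol', 'titanic', 5), ('carol', 'up', 4), ('dave', 'titanic', 3),
]

def most_popular_items(ratings, n=2):
    """Fall back to raw popularity when a new user has no rating history."""
    counts = Counter(item for _, item, _ in ratings)
    return [item for item, _ in counts.most_common(n)]

print(most_popular_items(ratings))   # e.g. ['titanic', 'up'] for a brand-new user
```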
[00:11:21] Unknown:
And the canonical way that most people are familiar with interacting with recommendation systems is via ecommerce sites such as Amazon, where if you're looking at a product, it will say other people have also viewed. And I'm wondering what are some of the other contexts in which recommender systems are typically used?
[00:11:39] Unknown:
Well, as soon as you have some items and some users that may buy or interact with those items, you can use a recommender system. So it could be anything, really. You can recommend books, movies, restaurants. You can use a recommender system for online dating websites, or for recommending Twitter accounts to follow. A huge application is recommending music. So if you just wanna play around with a recommender system, you may very well, I don't know, download your favorite Spotify playlist and run a recommender system to discover new songs or stuff like that. So it's not just Amazon-like recommender systems. You also have a lot of other applications: recommending videos on YouTube, recommending friends on Facebook, anything really.
[00:12:28] Unknown:
And in terms of a hybrid system where it's using both the content-based and the collaborative filtering, I'm just curious what the internal mechanics of something like that would look like. Is it where you run it through the two systems at the same time and then merge the results? Or would it be more of a sequential process where you use the content-based filtering and then use the collaborative filtering on top of that, or vice versa?
[00:12:54] Unknown:
I guess it really depends on the application. I wouldn't really know what actual systems would be using. But the idea of stacking different algorithms, like doing the collaborative filtering prediction, then doing the content-based one, and merging the two, is a very common approach. It's called ensemble learning. And it usually works very well, because you have a lot of different predictors. But, honestly, I don't know what is actually used in practice. I don't think there is a hard threshold for choosing between a content-based system and a collaborative system. It can get very complicated, I think.
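As a purely hypothetical illustration of the simplest possible blend of two predictors (real ensemble methods are usually more involved), the merging step might be nothing more than a weighted average:

```python
# Hypothetical hybrid: a weighted blend of a collaborative score and a
# content-based score for the same (user, item) pair.
def hybrid_score(collab_score, content_score, alpha=0.7):
    """Blend two predicted ratings; alpha weights the collaborative part."""
    return alpha * collab_score + (1 - alpha) * content_score

# e.g. collaborative model predicts 4.2, content-based model predicts 3.5
print(hybrid_score(4.2, 3.5))   # 3.99
```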
[00:13:41] Unknown:
And once the algorithm has been chosen and the model has been trained for a recommender system and it's been deployed into production, what are some of the ways that the owners of the system can test the accuracy of the suggestions and feed that knowledge back into the running system?
[00:14:02] Unknown:
So I think this all boils down to the A/B testing that I described before. You'll just present one system to half of the users, and you'll present the other system to the other half, and compare the two. And the metric that you use for comparing the two usually is just how much money it gives you. There might be other criteria, but usually this is the one that is the most important for commercial applications. So they'll just compare the two in an actual setting, with real users, and then just
[00:14:36] Unknown:
choose one over the other. That's all. And digging into the internals of the Surprise library,
[00:14:44] Unknown:
could you talk a bit about how it's implemented and how that implementation has evolved since you first started working on it? Sure. So the code base at the beginning was very messy, because it was not intended to be an open source package at the beginning. It wasn't even supposed to be a package. It was just a bunch of hacky scripts that I made for myself to run my own experiments during my PhD. But at some point, I started to have a good code base with enough features, and I thought that it would be a good idea to transform it into a proper Python package, even just for the sake of teaching myself how to make a Python package.
And I think it was last year that I thought that it could actually be useful for other people as well. So I decided to make a clean and user friendly package with a lot of documentation. I put it on PyPI, created a website, and started advertising it, etcetera. So it was a very incremental process. I didn't start out thinking I'm gonna make a recommendation system package. It really happened almost by accident. And I really used Surprise as a way of teaching myself Python and good programming practices. So there still might be some pitfalls in the code, because I'm still learning, you know? And as for the implementation, I don't use NumPy much; well, I might use it, but the data structures of Surprise are quite simple. The ratings are stored in a dictionary, so it's like a sparse matrix.
Instead of using an actual SciPy sparse matrix, I just use a Python dictionary. The data structure is not really complicated.
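To illustrate the idea of a dictionary acting as a sparse matrix, here is a rough sketch; it is not necessarily Surprise's exact internal layout, only the general shape of the approach:

```python
# Illustrative sketch of sparse ratings stored in plain dictionaries:
# only the (user, item) pairs that actually have a rating take up memory.
ur = {}   # user -> list of (item, rating)
ir = {}   # item -> list of (user, rating)

def add_rating(user, item, rating):
    ur.setdefault(user, []).append((item, rating))
    ir.setdefault(item, []).append((user, rating))

add_rating('alice', 'titanic', 4)
add_rating('bob', 'toy_story', 2)

print(ur['alice'])    # [('titanic', 4)]
print(ir['titanic'])  # [('alice', 4)]
```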
[00:16:32] Unknown:
Yeah. I can definitely appreciate wanting to just use the dictionary data structure rather than pulling in all of SciPy just to use one of its features, because one of the big challenges in software engineering is determining when is the right time to pull in a particular dependency, because as soon as you do that, you become responsible for it. Yeah. Absolutely. So if all you needed was the sparse matrix capability, then just implementing it on your own, rather than pulling in all of that additional code with SciPy, seems like a smart move.
[00:17:13] Unknown:
Yeah. I think I use SciPy anyway for some of the statistical computation, so SciPy is actually one of the dependencies even if it's not explicit. But, yeah, I try to use as few other packages as possible and keep the code base clean and easy to read. Instead of using a library for only a small feature, I prefer to just code it myself, even if it's, like, just the same code. I think it's better not to have too many dependencies, so as not to obfuscate the code.
[00:17:51] Unknown:
That also makes it an easier decision for somebody who's considering using your library, because they don't then have to decide whether or not they also want to bring in all the other transitive dependencies by virtue of using Surprise. And what have been the most difficult aspects of building and maintaining Surprise?
[00:18:08] Unknown:
So I think the most difficult one is building a good API. It's very hard to find a good balance between what the users will need and the code complexity. So I try to keep the code as simple as possible. But right now, Surprise supports some features that are probably only useful for less than 1% of the cases, and they still make the code a lot more complex than it could be. So for some of these rare edge cases, I think it was probably not a good idea to support them. But, yeah, building a simple but powerful API is very difficult. And it was the first API that I ever made, so if I had to do it again, I think I would change some things. Yeah, definitely.
Because you still need the API to be as generic as possible while still being simple. And for recommendation systems, you can have a lot of different kinds of data and a lot of different kinds of cases, and it's difficult to merge them into a simple and generic API. So that was a bit difficult. But it's still subject to change, of course, in later versions of Surprise. And about maintaining Surprise, I think the most frustrating thing, what annoys me the most, is that some users will post issues on the bug tracker without even trying to solve their problems on their own, you know? Before asking for help, they will just do nothing. And this really annoys me, because until now, I have always tried to help those users even though I thought they were rude, but I'm honestly starting to lose patience. And I know I shouldn't be bothered. It's a problem for every open source project, and I'm learning to deal with it. But, yeah, I think it's one of the annoying parts. But most of the users are really nice and very helpful, and they bring a lot of useful feedback. So that's not the majority of the users. But, yeah.
[00:20:15] Unknown:
And one of the other things that I'm curious about is: what do you view as the main competitors to Surprise, in the Python ecosystem in particular and maybe outside of it as well?
[00:20:28] Unknown:
So as far as competitors go, I know that there is a guy that made LightFM, and he recently released, how is it called, Spotlight, which is a recommendation library using deep learning methods, which is really great. I haven't used it, but it works really well. I don't know, really. I think we all have different purposes. These two libraries, LightFM and Spotlight, are mostly targeted towards implicit ratings, and Surprise is targeted toward explicit ratings. To make the difference clear, explicit ratings are ratings like 1, 2, 3, 4, 5, etcetera. And implicit ratings are ratings that are not explicitly given by the users but rather inferred from their behavior. For example, if a user has clicked on an ad, this would be a positive implicit rating toward the item featured in the ad, etcetera. And the techniques that are used for implicit and explicit ratings are very different from each other. So the two libraries, or the three libraries, are not really competing, because we are not targeting the same use cases, I think.
[00:21:51] Unknown:
And going back to the idea of performance and being able to tune the system, I'm curious, what are the attributes that can be modified to improve the relevance and pertinence of the recommendations that are provided to the end user?
[00:22:08] Unknown:
So it truly depends on the algorithm that you will use, because each algorithm has its own parameters. So obviously, the most important way of improving the relevance of the recommendations is the choice of the algorithm itself. But after you have chosen the algorithm, there are various parameters that depend on the algorithm. Most of the time, those parameters would be set automatically during the training stage. Yeah, I'm not sure I'm really answering your question.
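For example, the tunable knobs differ from one algorithm to the next; the parameter names below come from the Surprise documentation, and the values are arbitrary:

```python
# Each algorithm exposes its own parameters.
from surprise import SVD, KNNBasic

# Matrix-factorization algorithm: number of latent factors, epochs, learning
# rate, and regularization strength.
svd = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)

# Neighborhood algorithm: number of neighbors and the similarity measure instead.
knn = KNNBasic(k=40, sim_options={'name': 'cosine', 'user_based': True})
```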
[00:22:35] Unknown:
No. That's totally fine. I recognize that it's a difficult and very nuanced situation. So, going back to the idea of picking which algorithm to use, is it something that would be feasible for somebody who has deployed a system to determine that the algorithm they were using isn't giving the best results and to actually change the algorithm in a running system?
[00:22:57] Unknown:
Well, if you change your algorithm, you would have to retrain it on the whole data. I mean, you cannot do some kind of transfer from one algorithm to another. I don't think that's possible.
[00:23:12] Unknown:
And for somebody who wants to start using Surprise in their application to add recommendation capabilities, what are the steps involved in that process?
[00:23:22] Unknown:
So Surprise is BSD licensed, so anybody can use it, even for commercial use. And the package is on PyPI, so you can install it with pip as any other package. There's a command line interface if you want to use Surprise directly, but obviously the main feature of Surprise is the API. So you can, yeah, just use the API. There is a lot of documentation on the website. I tried to make it as clear and simple as possible, so there is a getting started guide, etcetera. So, yeah, I think the documentation is fairly clear to get started.
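A minimal getting-started sketch, assuming the PyPI package name scikit-surprise and the documented API (exact commands, flags, and defaults may vary by version):

```python
# Installation (the PyPI package is named scikit-surprise):
#   pip install scikit-surprise
# A command line interface is also available (see `surprise -h`), but the
# Python API is the main entry point:
from surprise import KNNBasic, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin('ml-100k')     # Surprise ships with built-in datasets too
cross_validate(KNNBasic(), data, measures=['RMSE', 'MAE'], cv=3, verbose=True)
```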
[00:24:01] Unknown:
And so if they're using it, for instance, in the context of a Django or Flask application, would the data that Surprise uses to base its suggestions on be able to just get pulled from a database? Or is there any particular method that would be necessary for hooking Surprise up to the necessary data sources?
[00:24:22] Unknown:
So there are two main ways of feeding data to Surprise right now. The first way is just with files, like rating files. Each line of the file will be a rating: this user has rated this item with this rating. And users and items are identified by their IDs. The other way is to use a pandas DataFrame. You can do the exact same thing with a pandas DataFrame, where you would have a user column, an item column, and a rating column. So, yeah, if you don't want to use files, you'll have to get pandas as a dependency.
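The two loading paths described above might look like this, following the Surprise documentation; the file path and column names are made up for the example:

```python
# Two ways of feeding explicit ratings to Surprise.
import pandas as pd
from surprise import Dataset, Reader

# 1) From a ratings file: one "user item rating" triple per line.
file_reader = Reader(line_format='user item rating', sep=',')
file_data = Dataset.load_from_file('ratings.csv', reader=file_reader)

# 2) From a pandas DataFrame with user, item, and rating columns.
df = pd.DataFrame({
    'userID': ['u1', 'u1', 'u2'],
    'itemID': ['i1', 'i2', 'i1'],
    'rating': [4, 2, 5],
})
df_reader = Reader(rating_scale=(1, 5))
df_data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], df_reader)
```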
[00:25:06] Unknown:
And so for a running system that has all of their product data stored in a database, it sounds like, basically, the user would just need to be able to pull that data into a DataFrame so that it can then be processed through Surprise?
[00:25:22] Unknown:
Yes. Exactly.
[00:25:23] Unknown:
I haven't added any compatibility with SQL databases or anything like that. Okay. And what are some of the new features or improvements that you have planned for the future of Surprise?
[00:25:35] Unknown:
So the biggest thing I'd like to work on is to support implicit ratings. Like I said, Surprise only supports explicit ratings for now. But that would require a lot of refactoring, and it would probably break the API, so that would be for a next version of Surprise. And I still can't quite wrap my head around it. As I said, explicit and implicit ratings are very different settings, and I'm still trying to figure out a generic way of handling both kinds of ratings and algorithms. It's kind of difficult for now, at least for me, but I haven't worked a lot on it. Another feature that I'd like to implement is to support some sort of incremental learning. Right now, with Surprise, you train an algorithm on a given dataset, and you predict ratings based on this dataset. But when new data arrives, there is currently no way to merge the new ratings into the algorithm. So you have to retrain the algorithm on the old ratings and the new ratings at the same time, which can be bad if you have a huge dataset. So I'd like to implement some way of adding new data to an algorithm without having to retrain on the whole dataset.
That would be possible for some algorithms, but not for others. And I'd like to make the API closer to that of scikit-learn. Scikit-learn is one of the most popular libraries for machine learning, but it doesn't do recommender systems. A lot of scikit-learn tools are replicated in Surprise, like automatic cross validation or grid search, etcetera, but the API is still a bit different. So the idea would be to make it as close as possible to that of scikit-learn so it could be easier for new users to adapt. I've been wanting to do all that for a while, but I never found the time. So if anyone is interested in participating in the project, then contributions are more than welcome.
[00:27:40] Unknown:
And are there any other topics or aspects of Surprise that you think we should talk about before we start to close out the show? No, I don't think so.
[00:27:49] Unknown:
I can't think of any right now. I think we have made a good
[00:27:54] Unknown:
overview of what it does. Yeah. And it definitely sounds like if somebody is interested in working with Surprise and contributing to its overall community, one of the ways that would be useful would be to provide some ready-built integrations with things like Django and other web applications, so that it's easy for a new user to have a drop-in addition to their system for getting it up and running in their application? Right. To be honest, I have absolutely zero experience
[00:28:22] Unknown:
with either Django or Flask. So, yeah, if anybody has experience with them, they would be more than welcome to discuss it.
[00:28:31] Unknown:
For anybody who wants to follow the work that you're doing and keep up to date, I'll have you add your preferred contact information to the show notes. Sure. And with that, I'll move us into the picks. And my pick this week is a library called Silk, which is a Django application that makes it easy to profile your Django app. I was recently trying to just run the cProfile profiler from the standard library, and it didn't integrate well with the runserver command from Django's manage.py, and Silk was an easy drop-in replacement for that capability, and it was very simple to get it up and running. So if anybody's trying to profile their app, it definitely seems like a very well put together library. And with that, I'll pass it to you. Do you have any picks for us this week, Nicolas?
[00:29:19] Unknown:
Sorry, I couldn't think of any. If you want, I can send you one afterwards, but I didn't think of anything.
[00:29:28] Unknown:
Sure. Yeah. If you think of anything that you want me to add to the show notes as a pick for you, then I can do that after the fact. And I just wanna say I appreciate you taking the time out of your day to talk to me about the work you've been doing with Surprise. It definitely seems like a very interesting and useful library, and I hope some other people can find some utility from it as well. So thank you for your time, and I hope you enjoy the rest of your day. Great. Thank you. Thank you so much for having me. Have a good day. Thanks. You as well.
Introduction to Nicolas Hug and Surprise Library
Nicolas' Journey with Python
Overview of Surprise Library
Evaluating Recommender System Performance
Challenges in Building Recommender Systems
Applications of Recommender Systems
Internals and Evolution of Surprise Library
Building a Good API and User Challenges
Competitors and Algorithm Selection
Using Surprise in Applications
Future Features and Improvements
Community Contributions and Closing Remarks