Summary
Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D’Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Davide Tateo and Carlo D’Eramo about MushroomRL, a library for building reinforcement learning experiments
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what reinforcement learning is and how it differs from other approaches for machine learning?
- What are some example use cases where reinforcement learning might be necessary?
- Can you describe what MushroomRL is and the story behind it?
- Who are the target users of the project?
- What are its main goals?
- What are your suggestions to other developers for implementing a successful library?
- What are some of the core concepts that researchers and/or engineers need to understand to be able to effectively use reinforcement learning techniques?
- Can you describe how MushroomRL is architected?
- How have the goals and design of the project changed or evolved since you began working on it?
- What is the workflow for building and executing an experiment with MushroomRL?
- How do you track the states and outcomes of experiments?
- What are some of the considerations involved in designing an environment and reward functions for an agent to interact with?
- What are some of the open questions that are being explored in reinforcement learning?
- How are you using MushroomRL in your own research?
- What are the most interesting, innovative, or unexpected ways that you have seen MushroomRL used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on MushroomRL?
- When is MushroomRL the wrong choice?
- What do you have planned for the future of MushroomRL?
- How can the open-source community contribute to MushroomRL?
- What kind of support are you willing to provide to users?
Keep In Touch
- Davide
- boris-il-forte on GitHub
- Website
- Carlo
- carloderamo on GitHub
- Website
Picks
- Tobias
- Davide
- 1984 by George Orwell
- Carlo
- Twin Peaks TV Series
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- MushroomRL
- TU Darmstadt
- MuJoCo
- PyBullet
- iGibson
- Habitat
- OpenAI Gym
- PyTorch
- RLLib
- Ray
- OpenAI Baselines
- Stable Baselines
- ROS
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey, and today I'm interviewing Davide Tateo and Carlo D'Eramo about MushroomRL, a library for building reinforcement learning experiments. So, Davide, can you start by introducing yourself? Hi. I'm Davide. I'm a postdoctoral
[00:01:11] Unknown:
researcher at TU Darmstadt. I focus mainly on robotics and robot learning, and also on developing MushroomRL together with Carlo. And Carlo, how about yourself? Yeah. I'm Carlo, a postdoctoral researcher in the IAS laboratory.
[00:01:29] Unknown:
My focus is mostly on reinforcement learning, with a particular focus on multitask and curriculum reinforcement learning. And before this experience, I graduated from Politecnico di Milano in Information Technology. In both these experiences, as a postdoctoral researcher and in my PhD, I worked with Davide; we were actually in the same office during our PhD.
[00:01:49] Unknown:
It's always great to see when people just happen across each other and end up collaborating through a big chunk of their careers. Going back to you, Davide, do you remember how you first got introduced to Python? Yeah. Actually, it's kind of funny because I wasn't a fan of Python at all. When Carlo started developing Mushroom,
[00:02:06] Unknown:
he started using Python. And the reason is because Python is very good for machine learning in general. In particular, there is SciPy and NumPy and a lot of basic libraries to do optimization. So that's how I got in touch with Mushroom, basically helping
[00:02:24] Unknown:
developing Mushroom. And Carlo, do you remember how you got introduced to Python? Yeah. So as Davide just said, I started working on Mushroom for my PhD. And at the time, there were some libraries in Python for reinforcement learning. 1 example among the many libraries is OpenAI Gym. So this library is for running experiments in reinforcement learning; it implements some environments for experiments. And at the time, I saw that Python was a good compromise for having good performance and an easy way to implement algorithms and run experiments.
So for having something flexible to use, also to be accessible for students,
[00:03:03] Unknown:
I started working with Python and implemented the whole Mushroom library based on that. Before we get too far into the Mushroom project itself, can you start by giving a bit of an overview about what reinforcement learning is and some of the ways that it differs from other approaches to machine learning that people might be familiar with? Sure. So the most famous machine learning approach, I would say, is supervised learning.
[00:03:24] Unknown:
In supervised learning, we are actually provided a dataset. So it's a collection of data with also a classification of this data. Basically, a human expert or something like this labels this data. And the purpose of supervised learning is to learn the mapping between an input and the output. And this can be applied, for example, to images. An expert will classify an image of an animal or a car, and our machine learning model has to learn the mapping from the image of a car to the label of the car. Reinforcement learning differs substantially from this approach because in reinforcement learning, we don't have a dataset.
So the purpose is to train an agent that is moving in an unknown environment to solve this environment, where solving means maximizing the reward that we get from the environment. And the reward can model whatever we are interested in. For example, in a financial application, the goal will be to maximize the profit. So the reward function will model this profit, and the agent has to learn the strategy, so the execution of actions that maximize the profit in a certain amount of time.
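To make the objective being described concrete, the standard textbook formulation (nothing specific to MushroomRL) is that the agent searches for a policy, its strategy for choosing actions, that maximizes the expected discounted sum of rewards collected over the horizon:

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right], \qquad 0 < \gamma \le 1
```

Here r(s_t, a_t) is the reward for taking action a_t in state s_t (the profit, in the financial example) and gamma is a discount factor that trades off immediate against future reward.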
[00:04:32] Unknown:
In terms of some of the use cases where reinforcement learning might be necessary or particularly applicable, I'm wondering if there are any industries or problem spaces that it's uniquely well suited for, or where it can potentially outperform other machine learning approaches, either in terms of the cost of execution or development or just its overall capability to learn and adapt to the environment?
[00:05:01] Unknown:
So as I said, in supervised learning, a human expert has to intervene and do something to help the model in understanding what we are aiming to understand. In reinforcement learning, we are actually free to leave the agent to explore the environment and learn its own strategy. So potentially, we don't need any human intervention. The applications are multiple and are basically all the cases where a planning model cannot be applied. Planning is basically, I would say, a parallel approach compared to reinforcement learning, where we are provided the model of the environment, and planning can use this model to plan the best strategy to solve this problem.
Whenever the model is not provided, so usually when the model is very complex or when the environment to model is very high dimensional, very unpredictable, reinforcement learning can be helpful. So reinforcement learning can actually be used in all real world problems, potentially. Yeah. So basically, whenever we cannot completely understand the environment. And this will include applications like I said before, financial applications, health care applications, also, for example, decision making on the kind of cure to apply to a patient according to the symptoms and stuff like this, or, as we are studying in this laboratory, robotics.
So in robotics, the model of a robot can be really complex, and also the model of the environment where the robot is acting can be really high dimensional. So reinforcement learning will put this robot moving in the environment and learning from its actions and from the feedback
[00:06:45] Unknown:
that it gets from the environment. And so in terms of the Mushroom RL project, can you give a bit of an overview about what you've built there and some of the story behind how it got created and some of the overall goals that you had in mind? So, yes, during my PhD, I was observing that there were some libraries in reinforcement learning, but they were all very
[00:07:04] Unknown:
hardcoded on specific applications. So the trend was to basically provide the code for a specific experimental campaign conducted in a specific paper. And these algorithms were not flexible and also not reproducible most of the time. So in the supervision of students, I found myself many times reimplementing the same stuff over and over. Of course, this was time consuming, plus also prone to having bugs in the code. So my purpose, also for the students that were supervised by me, was to develop a library, at first just for me, to have some kind of flexible implementation of algorithms that were always the same and reusable.
The nice thing is that it started like this, so, basically, something for me. But since I wanted to make it flexible and, let's say, also with the potential of being extended, the students started to like it. They started to add their own methods, and they also worked to improve the library. And then after years, also with the help of Davide, who joined the project, we have now a very big library that is very nice. It's being noticed also by the whole reinforcement learning community. So this is very satisfying for us. I would say that I
[00:08:19] Unknown:
joined Carlo, and I was previously developing another library in C++, but unfortunately, C++ is not very suitable for machine learning in general. So the project was cut off, and then I joined Carlo. And I guess that my previous experience with this previous C++ library was kind of helpful in understanding how to structure the code, how to make it modular, and how to build a robust platform that can be used by anyone.
[00:08:55] Unknown:
And in terms of the target users of the project, you mentioned that you initially built it for yourself to be able to simplify some of the work that you were doing, but then it got adopted by some of your students and now the broader community. And I'm wondering if you can just talk through some of the ways that the original focus of it being on yourself and then some of the ways that it has grown has influenced some of the design and structure and sort of feature direction of the project?
[00:09:21] Unknown:
Yeah. So, basically, I put myself in the position of a student wanting to implement experiments from another paper, reproduce the experiments, and also implement my own experiments. So when I developed Mushroom, the purpose was to make it very accessible to everyone and very flexible for me. But in the end, as I was thinking in terms of being just another researcher, this was also a good strategy for making a library that was appreciated by other people. So I started working on that, and then students, as I said, also used the library for their own experiments.
And as I was noticing that the modular structure I gave to the library was actually being appreciated and very useful to extend the library, I decided to proceed in this direction. So I was very happy to see that my initial idea was really working well for extending the library, and this gave me a strong motivation to continue working on that. Also, because I was seeing that the competition, let's call it competition, so other libraries seen on the Internet, were not so modular and flexible like Mushroom was even back in 2018. And so I was quite satisfied to see that this library, even if it was just developed by me and Davide, was very, very general purpose for many people doing reinforcement learning.
This actually paid off because we see now even students from other universities are sending us emails to thank us about the quality of Mushroom, saying that it helped a lot for the work in their theses. Yeah. We are seeing quite nice feedback from people that we don't know. So it's satisfying for us.
[00:11:05] Unknown:
In terms of your sort of experience of building this project and then having it gain some visibility and adoption within the broader community, what are some of the lessons that you've learned and suggestions that you might make to other developers who want to be able to build a library that ends up being successful and broadly adopted?
[00:11:27] Unknown:
Sure. So to me, the idea of putting myself in the mind of a student worked well because it allowed me to develop code that was general purpose for many people. And so I would suggest to avoid hardcoded implementations, trying to be as modular as possible, as Python also allows, and to really put a strong focus on the flexibility of the code. So I think that's definitely the most important lesson I learned from the work I did on Mushroom in these years.
[00:12:04] Unknown:
In terms of the core concepts that users need to be familiar with to be able to experiment with reinforcement learning, what are some of the elements and background that you found to be particularly useful for them to be able to make effective use of MushroomRL, and some of the ways that you've tried to encapsulate and ease the onboarding process for people who are first getting started in this area? The most important concept that you have to learn
[00:12:35] Unknown:
when starting to do RL is the concept of the agent, the environment, and their interaction. So this is key to understand reinforcement learning. Also, of course, knowing what a reward function is, and what an action-value function is, which is the value that you assign to a state-action pair in every point of your environment. So these are the key concepts that have to be known to understand reinforcement learning and start using this library.
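For reference, the action-value function mentioned here is usually written, in standard RL notation rather than anything Mushroom-specific, as the expected discounted return obtained by taking action a in state s and then following the policy pi:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s,\; a_t = a \right]
```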
Other concepts that are really important in general for reinforcement learning, but particularly for Mushroom, I would say, depend on the type of algorithm that you want to use. So if you want to use modern deep learning approaches, it's worth knowing what a neural network is and having a basic idea of how a neural network can be used. Instead, you may want to use the classical state of the art approaches, which in many scenarios are the ones that can actually work, at least for small projects, and can be deployed on a real system because they are all quite established, so you can trust the fact that they will give you some kind of result. These approaches are heavily based on features. So the concept of a feature, that is, an abstraction that maps the state of your environment to a set of values that tells you something about this state, is a very important concept.
The most basic feature that you can imagine is to create a tiling. So, basically, let's imagine that you are in a room. And what you can do is divide this room into a set of square tiles, okay? What you do is you convert your x and y coordinates into a vector that tells you in which tile you are in the room. This is very important because, depending on how you do this discretization and how many discretizations you use, you can control how much you generalize in your reinforcement learning algorithm, which means, for each position, how you share the information that you get in a given position with other nearby positions. So this concept is really important for classical reinforcement learning, particularly if you want to apply it on simple toy scenarios.
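As a rough illustration of the tiling idea described above, here is a minimal sketch, not MushroomRL code (the library ships its own feature utilities for this), with a made-up room size and tile count, that converts an (x, y) position into a one-hot feature vector indicating which tile the agent occupies:

```python
import numpy as np

def tile_features(x, y, room_w=4.0, room_h=4.0, n_tiles_x=4, n_tiles_y=4):
    """Map a continuous (x, y) position to a one-hot vector over square tiles.

    Coarser tilings (fewer tiles) share information across larger regions of
    the room, so the agent generalizes more; finer tilings generalize less.
    """
    # Find the tile index along each axis, clamping to the room boundaries.
    ix = min(int(x / room_w * n_tiles_x), n_tiles_x - 1)
    iy = min(int(y / room_h * n_tiles_y), n_tiles_y - 1)

    phi = np.zeros(n_tiles_x * n_tiles_y)
    phi[iy * n_tiles_x + ix] = 1.0
    return phi

# A position near the lower-left corner activates the first tile.
print(tile_features(0.3, 0.2))
```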
Instead, if you want to solve a very complex task, then you need to go into the neural network domain for sure. And in terms of structuring the environments,
[00:15:27] Unknown:
I know that MushroomRL also has integrations with some 3D engines for being able to simulate real world environments or build a sort of visual space for the agent to interact with, beyond just having a matrix of values that you want to explore in the simpler environmental case. I'm wondering if you can just talk through some of the design approaches that you have for creating these environments that you want the agent to explore, and some of the ways that you think about how to approach the overall problem construction to be able to identify the types of outcomes that you're aiming for? What we are trying to do is to use a standard interface to interact with the environment.
[00:16:10] Unknown:
And the standard interface is the classical interface that is used by classical reinforcement learning. As I talked about in the beginning, with this agent-environment interface you always have a step function: given the action that you want to apply in the current state, it gives you, as a result, the next state, and it also gives you the reward that you get during this transition. Plus, we have the reset function that is called to get the initial state of the episode. So the interface is very similar to the interface that is presented in a classical reinforcement learning algorithm.
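To make that contract concrete, here is a hedged sketch of the kind of step/reset interface being described. It follows the classical Gym-style convention rather than MushroomRL's exact base class, so treat the class and method signatures as illustrative assumptions:

```python
import numpy as np

class ToyGridEnv:
    """Toy environment exposing the classical step/reset interface."""

    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = np.array(goal)
        self.state = None

    def reset(self):
        # Called at the start of every episode to get the initial state.
        self.state = np.array([0, 0])
        return self.state.copy()

    def step(self, action):
        # Apply the chosen action in the current state...
        moves = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}
        self.state = np.clip(self.state + np.array(moves[action]), 0, self.size - 1)

        # ...and return the next state, the reward obtained during the
        # transition, and whether an absorbing (terminal) state was reached.
        absorbing = bool((self.state == self.goal).all())
        reward = 0.0 if absorbing else -1.0
        return self.state.copy(), reward, absorbing, {}
```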
Then we integrate some more complex 3D environments such as MuJoCo, PyBullet, or other physics simulators. Currently, we are thinking about integrating iGibson and Habitat from Facebook; the first, I don't remember, I think it's from NVIDIA, or I might be wrong. So we are integrating different environments, and for these, we are trying to provide an interface that is as abstract as possible to the user, such that it's not a problem for them to use these interfaces. When the simulator allows you to build an environment that is very generic, such as a PyBullet environment,
we are trying to provide the tools to build the basic building blocks of the environment, such as the reward function, how to reset the state, how to do the simulation step, and how to get the link and joint positions, velocities, and this kind of information in an easy way. About designing the reward function and other parts of the reinforcement learning problem, we don't provide anything. So this is for sure something that a user must learn by himself, but we can provide some hints. What I can say for sure is: never use reward functions that decay exponentially.
The reason is that we empirically saw that this kind of reward function is very difficult to learn. So I suggest always using quadratic or linear reward functions. And this is something that we just know empirically. We still don't know the reason why it's like this, but I would say that we have a lot of experience with this kind of reward function, and the exponentially decaying reward function, which is the easiest reward function that you can think about, does not work very well. Also, sparse reward functions are very difficult to learn, but this is another topic.
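As a toy illustration of that advice (the functional forms below are just examples, not something prescribed by MushroomRL), here is how a quadratic reward and an exponentially decaying reward on the distance to a goal compare for a simple reaching task; the exponential one is nearly flat everywhere except very close to the goal, which matches the guests' observation that it gives a weak learning signal:

```python
import numpy as np

def quadratic_reward(distance, scale=1.0):
    # Changes smoothly with distance everywhere, so the learning signal
    # keeps pointing toward the goal even far away from it.
    return -scale * distance ** 2

def exponential_reward(distance, scale=1.0, length=0.1):
    # Nearly zero except very near the goal, which empirically makes the
    # task much harder to learn.
    return scale * np.exp(-distance / length)

for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"d={d:.1f}  quadratic={quadratic_reward(d):+.3f}  "
          f"exponential={exponential_reward(d):+.6f}")
```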
[00:18:48] Unknown:
Digging more into the library itself, can you talk through some of the ways that it's architected and some of the component pieces that you've integrated into it to be able to support different machine learning libraries and environment plugins, and just the overall architectural aspects of how you think about structuring the library to make it accessible to a wider audience?
[00:19:20] Unknown:
The most effective solution that we found was to unify different reinforcement learning algorithms with a single core module. So this core module is a very, very flexible solution to unify offline and online methods, off-policy and on-policy methods, which are different, let's say, dichotomies of reinforcement learning algorithms. In many libraries other than Mushroom, I found that these different algorithms are actually handled with different modules, creating, of course, some kind of complexity that, for a user, makes everything a bit less accessible. In Mushroom, the core is a very simple function to unify all these methods, and it was working well in the beginning, when the methods were not so many.
Still now, after we added a lot of algorithms for deep reinforcement learning, actor-critic, and other methods, this very simple solution is very effective for keeping the simplicity of Mushroom. Davide can tell you more details about the compatibility with other libraries and stuff like this, also with PyTorch. What we do
[00:20:31] Unknown:
is we try to provide abstraction layers to access all of these libraries. So we create wrappers for every single environment. In particular, we created some wrappers for the PyBullet and MuJoCo simulators, and we are now extending this to many other environments. We are trying to keep the interface as clean as possible and as simple as possible, such that it's easy to use. And we try to hide the complexity of interfacing with these libraries inside the classes, so it's not problematic for the user to use them and to interact with them. And sometimes we also fix some problems of these libraries.
An important key problem in reinforcement learning is how to treat terminal states. When a state is terminal, you need to assign to that state a value of zero by definition. And libraries such as OpenAI Gym don't distinguish between a trajectory that has been cut because of the horizon and a state that is a true terminal state. To handle that, we kind of hack around the OpenAI Gym library such that we force the distinction between the end of the horizon and an absorbing state. And this is 1 of the examples of how we handle the connection with other libraries. The other integration that we have is a very good integration with PyTorch.
We support learning with Torch inside the library. So this simplifies a lot the use of this library to do deep learning with our agents and our learning methods.
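To show why that distinction matters, here is a generic sketch of a one-step bootstrapped target (standard TD learning, not MushroomRL's internal code): when the next state is truly absorbing its value is zero by definition and no bootstrapping happens, while a state where the episode was merely cut by the horizon is still bootstrapped from:

```python
def td_target(reward, next_q_max, gamma, absorbing):
    """One-step target for a value-based update.

    absorbing=True means the next state is a true terminal state, whose
    value is zero by definition. An episode that was only cut because the
    horizon was reached is *not* absorbing, so we keep bootstrapping.
    """
    if absorbing:
        return reward
    return reward + gamma * next_q_max

# Same transition, different interpretation of why the episode ended:
print(td_target(1.0, next_q_max=5.0, gamma=0.99, absorbing=True))   # -> 1.0
print(td_target(1.0, next_q_max=5.0, gamma=0.99, absorbing=False))  # -> 5.95
```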
[00:22:26] Unknown:
In all the work on Mushroom, we always wanted to keep the compatibility issues with other libraries invisible to the user. So we always wanted to provide a simple interface for the user to use libraries for environments or for tensor computation, like PyTorch.
[00:22:44] Unknown:
In terms of the actual workflow of setting up an experiment, or designing it in the first place and then going through execution, testing the outcomes, and maybe iterating on it to test out different algorithms for exploring the environment and testing out different reward functions, I'm wondering if you can share some of the overall process of how to think about all of that setup, and then the workflow of actually building that with MushroomRL and tracking the different inputs and states and outcomes of the experiments to be able to identify
[00:23:17] Unknown:
which ones are successful, which ones are trending in the right direction, and things like that. The first thing that you have to understand is whether you have a standard benchmark that already exists and you want to try a new algorithm to solve this standard benchmark, or you are creating a new environment to solve a different problem that is unknown. And these are 2 different workflows that have very, very different solutions. So in the case that you have a standard benchmark, what you do is you start by creating your MDP, then you design a standard learning agent provided by Mushroom.
An experiment in reinforcement learning consists of the interaction between the environment and the agent, and in Mushroom, this happens through the core. In Mushroom, you can call the function core.learn, and this will make the agent interact with the environment. This will also call the learning function, whenever necessary, on the dataset that you get from the interaction between agent and environment. What we suggest in Mushroom is to call the method evaluate after every epoch of learning, such that you have a clear evaluation of the performance of the agent after a specified amount of steps or episodes in the environment. Normally, this kind of evaluation is a single epoch, and what you do is use the Mushroom logger, or your own logger, to store the learning results of this epoch. You repeat this process for many epochs. This is how you normally structure a simple reinforcement learning experiment when you have a known benchmark.
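For readers who want to see roughly what that loop looks like in code, here is a minimal sketch in the spirit of the MushroomRL tutorials. The class names and signatures used here (GridWorld, EpsGreedy, QLearning, Core.learn/evaluate, compute_J) are recalled from the library's documented examples and may differ between versions, so check them against the current documentation before relying on them:

```python
import numpy as np

from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter
from mushroom_rl.utils.dataset import compute_J

# 1. Create the MDP (a standard benchmark environment).
mdp = GridWorld(width=5, height=5, goal=(4, 4), start=(0, 0))

# 2. Create a standard learning agent provided by Mushroom.
policy = EpsGreedy(epsilon=Parameter(0.1))
agent = QLearning(mdp.info, policy, learning_rate=Parameter(0.1))

# 3. The core handles the interaction between agent and environment.
core = Core(agent, mdp)

for epoch in range(10):
    # Learn for a fixed number of steps, fitting after every step...
    core.learn(n_steps=1000, n_steps_per_fit=1)
    # ...then evaluate the current policy and log its performance.
    dataset = core.evaluate(n_episodes=10)
    mean_return = np.mean(compute_J(dataset, mdp.info.gamma))
    print(f"epoch {epoch}: mean discounted return = {mean_return:.3f}")
```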
When the benchmark is not known, then this becomes much more difficult. What you have to do, first of all, is understand which kind of learning algorithm would be suitable for your environment. You have to design a reward function that not only represents the task that you have in mind, but is also such that the task that you are considering can be solved by a learning algorithm. Because sometimes, if you design the reward function, for example, too sparse or with a scale that varies a lot, this can cause issues for specific learning algorithms, particularly with neural networks.
When trying to solve a new task, the workflow is not so straightforward, and you have to iterate back and forth between the design of the reward function and, for example, state normalization and other tricks, the feature selection discussed before, and the actual tuning of the reinforcement learning algorithm. So this can be a continuous back and forth. Unfortunately, there is no standard technique either to design the reward function or to tune the reinforcement learning algorithm for a specific task. So this might be kind of difficult.
And, unfortunately, in MushroomRL, we don't have any tool to do this tuning. What we can offer is a visualization interface, so a graphical interface in which you can see, step by step, the action selected by the agent in terms of magnitude, of course, and you can see the boundaries of the actions. So you can see how the policy of the agent is behaving. If the policy is going too much out of bounds, or is not exploring sufficiently, or is exploring too much, so the signal is too noisy, you can more or less understand what's going on. And this can be quite useful in tuning these environments.
But in general, there is no standard recipe to solve this issue, unfortunately.
[00:27:09] Unknown:
In terms of the use case of MushroomRL and reinforcement learning in general in the research environment, I'm wondering if you have any experiments or can provide any juxtaposition as to how the approach to reinforcement learning and the design of the problems might differ between a pure research environment versus when trying to use it for solving some business problem, such as the financial example that you provided earlier, Carlo? So, yes, in research, experiments are rarely the purpose.
[00:27:42] Unknown:
The purpose, most of the time, is to develop a method, so an algorithm that has some theoretical benefits, and then to test the algorithm in experiments. So Mushroom is helpful to test an algorithm in experiments. And in research, what is commonly done is trying to find the theoretical properties of a method and prove them in the experiments. So we don't really care about maximizing the performance, but it's more about proving some theoretical claims in the experiments. I would say that now, with the explosion of deep reinforcement learning also in the research community, the importance of performance is becoming more and more prominent.
But still, in research, it's commonly less important than in an industrial application, of course. So in industry, you can even relax some constraints of a theoretical framework to really give more importance to the final performance. So if a method, for example, has a certain constraint on a particular parameter or something like this, in an industrial application, you can say, okay, let's not care about this, because we just want to maximize the performance. And if some theoretical guarantee will not hold anymore, we are still happy if the performance is maximized.
I would say that this is the main difference between research and industrial applications. And Mushroom, of course, allows both kinds of applications. So it is only a matter of how you tune your experiment
[00:29:10] Unknown:
and what you want to show. In terms of the overall space of reinforcement learning, I'm wondering what are some of the open questions and active areas of research, and some of the ways that you're using MushroomRL in your own work to help explore some of those problems? About this, I think both me and Davide can answer. I will start by saying that my current research is
[00:29:32] Unknown:
investigating methods for generalizing policies. Generalizing in reinforcement learning means to acquire a policy, so a sequence of actions, that can be applied not only in a specific task, but in different tasks. That is also what a human does. So a human is not only able to play a certain sport, for example; it can adapt and play different sports, or it's able to grab books of different shapes and different sizes. This is what is called multitask reinforcement learning. Common approaches in reinforcement learning, especially the early ones, just focus on a single task. Of course, this creates a very strong gap between reality and simulation, because in reality, we always have different situations, different environments.
Reinforcement learning agents trained with classical methods are almost never able to deal with real world problems. Deep reinforcement learning is trying to solve this issue, trying to extract more general features about the problem that can generalize across different tasks. So it's not anymore a matter of, for example, grabbing a book of 400 pages, but learning to grab books of different sizes from different positions. And this is my current research in multitask reinforcement learning. I'm also studying a slightly related branch called curriculum reinforcement learning, where the tasks are not so different, but they're actually put in a sequence of increasingly difficult tasks. So the task is more or less always in the same environment, but we are making it more and more challenging in this curriculum sequence.
So basically, the agent can start learning something in the easiest task and then progressively improve its performance until a target task, the most difficult one, is reached. I work mainly
[00:31:18] Unknown:
on the problem of applying reinforcement learning to robotics, in particular, how to use a learning agent on a real robotic platform and learn from scratch on this real robotic platform, without the need of a simulator in between and without the need to transfer from a simulator to the real robot. That approach is called sim-to-real, but it's very limiting because it means you have to use a simulator, and maybe the simulator does not match the reality. In order to learn directly on the robot, you need a way to explore the environment without causing harm to people, without causing harm to the robot, and without destroying the environment itself. So how to do that? You can explore the safe reinforcement learning area. And this safe reinforcement learning area is very active right now. So there are many different approaches.
And we are actually currently working on 1 of them. We want to learn from scratch to play air hockey. Actually, we are still quite far from learning directly on the real system, and we are still doing it in simulation. But what we want is that, when exploring in the simulation, we don't violate the safety constraints at any time step, because violating a safety constraint means crashing the robot. And we were actually able to learn to play air hockey, both with our robotic arm with 7 degrees of freedom and with a planar arm, while maintaining the constraints. And we actually used MushroomRL to implement everything, starting from the environment, which is based on PyBullet; the algorithm is fully implemented in Mushroom.
Of course, we use the core and the logging capabilities of Mushroom to design the experiments. So everything is done with Mushroom, and we will probably open source the code soon, so we will give access to these experiments on air hockey with safe reinforcement learning.
[00:33:26] Unknown:
And was it able to beat you at air hockey? No. Actually, not. The robots have not only
[00:33:32] Unknown:
the problem of learning the task, but they also have a lot of hardware problems that make them quite slow. The real robot that we have is a robotic arm from KUKA. This arm is really, really good in terms of technology, but it is still not optimal and not optimized to play the task. So right now, it's very difficult to make it competitive with humans. Of course, you can design robots that can beat you, but then you have to design the robot specifically for air hockey, so maybe a planar robot instead of a 7 degrees of freedom robot, because it will be much easier in terms of joint accelerations and these kinds of issues. This is a non trivial problem. There are some works, mainly in robotics, not in reinforcement learning, in which there are robots playing and beating humans at air hockey, but these robots are designed specifically to play air hockey. So they are planar robots that move only on the plane. Instead, what we are interested in is using robotic arms that have a much more complex kinematics, and the problem is really to move on the plane and to avoid damaging the cables and this kind of damage to the environment. That's the research question for us, not the best way to beat a human in a game, of course.
[00:34:58] Unknown:
And so in terms of the ways that you've seen MushroomRL used, both in your own research and in the community of people who have adopted it, what are some of the most interesting or innovative or unexpected applications of it that you've seen? So a guy that works in finance actually contacted us to apply
[00:35:15] Unknown:
some Mushroom algorithms in some financial application. And this was unexpected. At least to me, in the beginning, Mushroom was really focused on research, and seeing that people working on industrial applications were starting to use it was quite unexpected, I would say. Davide, are there any interesting
[00:35:33] Unknown:
applications of mushroom RL that you've seen either in your own work or in the community?
[00:35:38] Unknown:
Yeah. We are trying to use Mushroom to do robotics tasks. In particular, what we want to try in the future, but this is an open topic, is also to see what you can do while learning with other humans in the loop. So interaction with humans, and this kind of task that might require safety. I would say that right now, what we have done is mainly benchmarks of standard tasks. So nothing particularly exciting except this air hockey
[00:36:15] Unknown:
thing, which we think is quite interesting. But this is current research, I would say. Yeah. This is pretty common. In reinforcement learning, you can rarely apply algorithms out of the box in real world applications. And Mushroom is really focused on research, so it's not so easy to use its methods in applications like that. It's more for benchmarking the algorithms in
[00:36:37] Unknown:
well known environments in reinforcement learning. I would say that you can still apply the algorithms, but maybe you need to do some engineering on the reward function, some engineering on the features, some particular choices on the action space, maybe try to mix some learning with some hardcoded solution together. So it's not so easy. Mushroom allows you to do all of this, but it's still quite an effort, as reinforcement learning is a very new and open topic in research. And in your own experience
[00:37:16] Unknown:
of doing reinforcement learning research and building and maintaining the MushroomRL project, what are some of the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:37:26] Unknown:
Maintaining the library is not easy at all, because we are two people working on this library, and the RL research community is moving very fast. So every year, there are thousands of papers published and algorithms that are beating the state of the art. So keeping up with this amount of work is not easy for us. The unexpected part of working on Mushroom is exactly this. So all the amount of research that is being done was, when we initially started working on Mushroom, unexpected. So it's quite challenging to keep up with all of this. Our plan is more or less to stay updated with the most famous algorithms; for implementing, let's say, secondary methods that the literature provides, Mushroom can be used, but it doesn't provide these methods itself.
So it's a package that can be used as a fork or as a separate module for implementing other reinforcement learning methods. For the most important ones, we provide implementations. For others, we believe that Mushroom is flexible enough to also implement other methods, but they are outside the library. I would say that the
[00:38:35] Unknown:
lesson that we learned is that testing is never sufficient. It's never enough. You have to test everything multiple times. You have to document everything because, otherwise, it will be difficult to use for other people. Tutorials are very important. So what you want to do is a lot of tutorials. It's better if you can have video tutorials too, but this is, for us, quite hard to do. As Carlo said, there are just 2 of us. In general, I would always say that it's worth taking a bit more time to write code that is clear, easy to understand, and maintainable.
And also maybe use tools such as continuous integration and automatic code analysis, such that you keep the quality of your software high, because this pays off, not in the short term, but in the long run, it pays off significantly. So thanks to this attention to code quality, sometimes in Mushroom, it's much easier to implement new algorithms and new approaches and new functionalities that would have been very difficult to do without that attention to code quality, I would say. So that's crucial for us. For people who are interested
[00:39:59] Unknown:
in exploring reinforcement learning, or who are wondering if it might be a useful approach to solving some of their business problems, what are some of the cases where MushroomRL might be the wrong choice, and they might be better suited with either other reinforcement learning libraries or just avoiding reinforcement learning as a solution to their problem space?
[00:40:20] Unknown:
So, okay, Mushroom is, I would say, strongly research focused. So it's made with the purpose of facilitating the implementation of empirical results, of experiments from other papers, to favor the reproducibility of these experiments. So for, let's say, an industrial application, this is not really the most convenient thing, because they will maybe care more about the performance in terms of computation and time, and also the results of the algorithm. So Mushroom is not super powerful in terms of computation time. For this, there are other libraries, like, for example, RLlib, I think. Basically, it's a library based on the Ray Python library.
Ray is a library for strongly parallelizing the code. So they are developing this separate reinforcement learning project based on Ray that really cares about the computation time of the experiments. In reinforcement learning, it is very convenient most of the time to run multiple experiments in parallel, and even to run different instances of the same environment in parallel, to basically speed up the learning as much as possible. All these details on computation time are not really the strongest focus of Mushroom, which is mostly aimed at the research aspect. So if somebody really cares about the performance in terms of time to execute experiments, for example, let's say, a real time application, maybe Mushroom is not the best choice.
But for any researcher who wants to implement their methods and produce experiments, Mushroom, for me, is definitely
[00:42:01] Unknown:
a good choice. I would say that if you don't want to understand reinforcement learning, and you just want to take a method and use it as a black box, there are also standard libraries to do that, such as Stable Baselines or OpenAI Baselines, the first being a fork of the second. And with these methods, you can just plug and play a standard environment, or your own environment, and try to run the experiment on that. The problem is that you don't have fine control. So if you don't need fine control over your algorithm, over your method, if you don't need to dig a bit into reinforcement learning, then probably Mushroom is not a good choice for you, because it gives you a lot of control, but this means that you need quite a bit of knowledge. It's not very friendly toward people that have 0 knowledge of reinforcement learning.
It's very good if you want to learn instead.
[00:43:04] Unknown:
Let's say it's hiding some details about compatibility with other libraries that are not reinforcement learning stuff. But for the reinforcement learning part, the internals of the algorithms, Mushroom requires you to know what you are doing and to know where to put your hands when you are working on Mushroom. So the idea is that every user of Mushroom can either use it for learning or for doing their own research, knowing what they're doing. 1 thing that we didn't discuss yet that I'm curious about is where did the name come from? What made you decide to call your library Mushroom? It's funny because many people ask me that, and they expect me to say it's because of the mushroom in Super Mario Bros.
I am a big fan of Nintendo and Super Mario Bros, but that's not the reason. The reason was that, basically, I was trying to find a name for the library, and I was, for some days, struggling with finding this name. And then at some point, I basically said, okay, let's call it the name of a food or something like that. It sounds nice for a library, because I had also found many libraries called Pineapple, for example, or stuff like this. And I said, well, it's interesting, let's say, to call a library like that. And then I asked Davide, what about mushroom? We are both fans of Mario, so we said, okay, mushroom sounds nice for a library too.
Yes, it reminds you of Super Mario Bros. And we said, okay, let's call it Mushroom. And at the time, it was a game between us. It wasn't anything serious. But then we kept this name, and that's it. You forgot to mention
[00:44:40] Unknown:
2 things. So the first thing that you forgot to mention is that you got the inspiration by looking at the pizza of your supervisor that had mushrooms on top of it. This is true. Yes. And the second thing that you didn't mention is that, unfortunately, there is already a library called mushroom in the Python Package Index. So we had to rename our library as MushroomRL for this reason. And that was quite a bit problematic, because the library was already quite big, and we had to do a major refactoring of the code and remove the old name.
[00:45:18] Unknown:
Yeah. Yeah. I mean, the story of the pizza, it's funny because it's not really the reason. It's just because I was completely desperate to find a name. Then I saw the mushroom, and I said, this sounds nice, let's call it mushroom. That's it. And let's put an end to this torture of finding a name for this library.
[00:45:35] Unknown:
The hazards of naming things when you're hungry.
[00:45:39] Unknown:
Yes. Yes.
[00:45:42] Unknown:
So in terms of the near to medium term future of the project, what are some of the things that you have planned either in terms of new capabilities or features or just sort of code quality and organizational aspects or community growth that you're looking to add to?
[00:45:57] Unknown:
So to us, they are both very important aspects, because maintaining the library is not easy, and it's becoming harder and harder as our research careers proceed. So when you do research, you are not really supposed to work on the code. You are supposed to read papers and to supervise students. So maintaining the library is becoming harder for us. But still, we believe that Mushroom is something that we really did with passion, basically driven just by our passion for doing something nice with this library. And so I don't find it easy for us to completely abandon this project in the near future, because we like it.
We really appreciate how people are using this library. So we are strongly motivated to maintain Mushroom for a long time, possibly collaborating with some students that can maybe work on it during their master's thesis or something like this. And for adding new functionalities, at the moment, we are seeing in reinforcement learning a huge trend in multi agent reinforcement learning. So multi agent consists basically of training multiple agents at the same time, and the purpose is to maximize the performance of all the agents. This is something that Mushroom at the moment doesn't support, but we think that we can easily add this extension in Mushroom, and we are actually planning to do it in the next months.
[00:47:15] Unknown:
And also, we are planning to add more simulators. Actually, as I told you before, we have the iGibson and Habitat simulators that are currently on the development branch of the library. Of course, we would also like to integrate the library to be used with real robots, so we are thinking about ROS, which is Robot Operating System, support. We are trying to increase the capabilities of the logging. And we also have a benchmarking suite that you can use to run our algorithms on standard benchmarks, and we are planning to extend that as well. What we would like is to have contributions from the community. What we would like is to have implementations of new algorithms, particularly published ones. So we would like to extend our library to support more known approaches.
We would also like to extend our library with more tools for maybe visualization, maybe processing data. And we also would like to have feedback about the documentation and feedback about the presence of bugs or other issues in the code. So this is something that we really appreciate, and that we try to address as soon as possible when we receive a GitHub issue or when somebody writes to our email. So we are trying to respond as soon as possible.
[00:48:51] Unknown:
So, yes, feedback from users is certainly welcome. The most desirable thing for us would be to have some implementations of algorithms proposed by users. So for example, if they like a certain reinforcement learning method that is not in Mushroom, they can make a pull request on GitHub to propose the integration of their method in Mushroom. And we are actually providing, in the readme file, the instructions for being compliant with the idea of Mushroom, so with the architecture, with the style of the code, the need for docstrings, and stuff like this. So the users have all the information that they need for adding their methods, and this would be, let's say, the most desirable thing: to have active users adding their methods to the library, and this will also make our maintenance work much easier.
[00:49:43] Unknown:
Are there any other particular areas of contribution that you're looking for help with or types of support that you're looking to provide to active contributors or any other aspects of the mushroom project and reinforcement learning that we didn't discuss yet that you'd like to cover before we close out the show? So if someone can help us with a nice logo, it would be great. Otherwise, we will have to find a good graphical designer as, unfortunately,
[00:50:08] Unknown:
both me and Carlo are not very skilled on this topic.
[00:50:14] Unknown:
Our logo at the moment is a small mushroom, a small toy mushroom that my girlfriend bought for me. And I took a photo and tried to make some artistic modifications to this photo, but it's very raw, I would say. So if a professional designer, as Davide said, would be willing to make a logo, it would be nice.
[00:50:35] Unknown:
But also, video tutorials would be very nice to have. And we can give hints and help to anyone who wants to do this kind of work. Unfortunately, we don't have much time to improve the documentation in this way. For us, it's a bit too much. But for sure, we can give hints, and we can check the videos if someone
[00:51:01] Unknown:
wants to do this kind of work. Yeah. Something really important is the documentation. We saw people actually sending us emails asking for improvements to the documentation. At the moment, I would say it's informative enough, but not visually appealing. So we need people also with some skills in web programming for creating a visually good looking website for the documentation, for adding tutorials,
[00:51:26] Unknown:
and videos, which would help Mushroom very much to be more visible. I would point out that Mushroom is completely open source, and we don't have any commercial or closed source part of it. So it's all open, and it's all free for everyone. So we are not interested in the commercial part. We are interested in giving a tool to the open source community to work with, and to increase the quality of the research, and to spread good methodologies in reinforcement learning in general.
[00:52:02] Unknown:
Alright. Well, for anybody who does want to get in touch and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose the TV series Britannia. I just watched the 2nd season of that, and it had a very well developed storyline and an interesting combination of the older-style time period with some interesting musical accompaniment. So just a really well done show. I'm looking forward to seeing the next season whenever it comes out. So I definitely recommend that for people who are interested. I'll just note that it is not kid friendly, so keep that in mind before you go trying to show it to your kids. So with that, I'll pass it to you, Davide. Do you have any picks this week? Yeah. I would suggest
[00:52:47] Unknown:
1984
[00:52:48] Unknown:
by George Orwell. I think it's a great book. I'll second that 1. And, Carlo, do you have any picks this week? So as you talked about TV series, I will say that I can tell you about my favorite TV series, which is Twin Peaks season 3. For me, it's really the best thing that I have ever seen on a screen. So I will recommend it to every fan of David Lynch, for sure. All right. Well, thank you both for taking the time today for sharing the work that you're doing with MushroomRL and the research you're doing in reinforcement learning. It's definitely a very
[00:53:17] Unknown:
interesting area, something that I'm always looking to learn more about. So I appreciate you taking the time to share your work and the work that you're doing to help make it more accessible to the broader community. So thank you both again for all of that, and I hope you enjoy the rest of your day. Thank you. Thank you very much. It was very nice. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Guests' Backgrounds and Introduction to Mushroom RL
Overview of Reinforcement Learning
Mushroom RL Project Overview
Lessons Learned and Advice for Developers
Core Concepts for Reinforcement Learning
Structuring Environments and Reward Functions
Library Architecture and Integration
Workflow for Experiments and Iteration
Research vs. Industrial Applications
Open Questions and Active Research Areas
Interesting Applications of Mushroom RL
Challenges and Lessons Learned
When Not to Use Mushroom RL
Origin of the Name 'Mushroom RL'
Future Plans and Community Contributions
Call for Contributions and Support
Closing Remarks and Picks