Summary
Reinforcement learning is a branch of machine learning and AI that has a lot of promise for applications that need to evolve with changes to their inputs. To support the research happening in the field, including applications for robotics, Carlo D’Eramo and Davide Tateo created MushroomRL. In this episode they share how they have designed the project to be easy to work with, so that students can use it in their study, as well as extensible so that it can be used by businesses and industry professionals. They also discuss the strengths of reinforcement learning, how to design problems that can leverage its capabilities, and how to get started with MushroomRL for your own work.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Davide Tateo and Carlo D’Eramo about MushroomRL, a library for building reinforcement learning experiments
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what reinforcement learning is and how it differs from other approaches for machine learning?
- What are some example use cases where reinforcement learning might be necessary?
- Can you describe what MushroomRL is and the story behind it?
- Who are the target users of the project?
- What are its main goals?
- What are your suggestions to other developers for implementing a successful library?
- What are some of the core concepts that researchers and/or engineers need to understand to be able to effectively use reinforcement learning techniques?
- Can you describe how MushroomRL is architected?
- How have the goals and design of the project changed or evolved since you began working on it?
- What is the workflow for building and executing an experiment with MushroomRL?
- How do you track the states and outcomes of experiments?
- What are some of the considerations involved in designing an environment and reward functions for an agent to interact with?
- What are some of the open questions that are being explored in reinforcement learning?
- How are you using MushroomRL in your own research?
- What are the most interesting, innovative, or unexpected ways that you have seen MushroomRL used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on MushroomRL?
- When is MushroomRL the wrong choice?
- What do you have planned for the future of MushroomRL?
- How can the open-source community contribute to MushroomRL?
- What kind of support are you willing to provide to users?
Keep In Touch
- Davide
- boris-il-forte on GitHub
- Website
- Carlo
- carloderamo on GitHub
- Website
Picks
- Tobias
- Davide
- 1984 by George Orwell
- Carlo
- Twin Peaks TV Series
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- MushroomRL
- TU Darmstadt
- MuJoCo
- PyBullet
- iGibson
- Habitat
- OpenAI Gym
- PyTorch
- RLLib
- Ray
- OpenAI Baselines
- Stable Baselines
- ROS
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey, and today I'm interviewing Davide Tateo and Carlo D'Eramo about MushroomRL, a library for building reinforcement learning experiments. So, Davide, can you start by introducing yourself? Hi. I'm Davide. I'm a postdoctoral
[00:01:11] Unknown:
researcher at TU Darmstadt. I focus mainly on robotics and robot learning, and also on developing MushroomRL together with Carlo. And Carlo, how about yourself? Yeah. I'm Carlo, a postdoctoral researcher in the IAS laboratory.
[00:01:29] Unknown:
My focus is mostly on reinforcement learning, with a particular focus on multitask and curriculum reinforcement learning. And before this experience, I graduated from Politecnico di Milano in Information Technology. In both these experiences, as a postdoctoral researcher and in my PhD, I worked with Davide; we were actually in the same office during our PhD.
[00:01:49] Unknown:
It's always great to see when people just happen across each other and end up collaborating through a big chunk of their careers. Going back to you, Davide, do you remember how you first got introduced to Python? Yeah. Actually, it's kind of funny because I wasn't a fan of Python at all. When Carlo started developing Mushroom,
[00:02:06] Unknown:
he started using Python. And the reason is because Python is very good for machine learning in general. In particular, there is SciPy and NumPy and a lot of basic libraries to do optimization. So that's how I got in touch with Mushroom, basically helping
[00:02:24] Unknown:
developing Mushroom. And Carlo, do you remember how you got introduced to Python? Yeah. So as Davide just said, I started working on Mushroom for my PhD. And at the time, there were some libraries in Python for reinforcement learning. 1 example among the many libraries is OpenAI Gym. So this library is for running experiments in reinforcement learning; it implements some environments for experiments. And at the time, I saw that Python was a good compromise for having good performance and an easy way to implement algorithms and run experiments.
So for having something flexible to use, also to be accessible for students,
[00:03:03] Unknown:
I started working with Python and implemented the whole Mushroom library based on that. Before we get too far into the Mushroom project itself, can you start by giving a bit of an overview about what reinforcement learning is and some of the ways that it differs from other approaches to machine learning that people might be familiar with? Sure. So the most famous machine learning approach, I would say, is supervised learning.
[00:03:24] Unknown:
In supervised learning, we are actually provided a dataset. So it's a collection of data with also a classification of this data. Basically, a human expert or something like this labels this data. And the purpose of supervised learning is to learn the mapping between an input and the output. And this can be applied, for example, to images. An expert will classify an image of an animal or a car, and our machine learning model has to learn the mapping from the image of a car to the label of the car. Reinforcement learning differs substantially from this approach because in reinforcement learning, we don't have a dataset.
So the purpose is to train an agent that is moving in an unknown environment to solve this environment, where solving means maximizing the reward that we get from the environment. And the reward can model whatever we are interested in. For example, in a financial application, the goal will be to maximize the profit. So the reward function will model this profit, and the agent has to learn the strategy, so the execution of actions that maximize the profit in a certain amount of time.
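To make the objective being described concrete, the standard textbook formulation (nothing specific to MushroomRL) is that the agent searches for a policy, its strategy for choosing actions, that maximizes the expected discounted sum of rewards collected over the horizon:

```latex
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right], \qquad 0 < \gamma \le 1
```

Here r(s_t, a_t) is the reward for taking action a_t in state s_t (the profit, in the financial example) and gamma is a discount factor that trades off immediate against future reward.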
[00:04:32] Unknown:
In terms of some of the use cases where reinforcement learning might be necessary or particularly applicable, I'm wondering if there are any industries or problem spaces that it's uniquely well suited for, or where it can potentially outperform other machine learning approaches, either in terms of the cost of execution or development or just its overall capability to learn and adapt to the environment?
[00:05:01] Unknown:
So as I said, in supervised learning, a human expert has to intervene and do something to help the model in understanding what we are aiming to understand. In reinforcement learning, we are actually free to leave the agent to explore the environment and learn its own strategy. So potentially, we don't need any human intervention. The applications are multiple and are basically all the cases where a planning model cannot be applied. Planning is basically, I would say, a parallel approach compared to reinforcement learning, where we are provided the model of the environment, and planning can use this model to plan the best strategy to solve this problem.
Whenever the model is not provided, so usually when the model is very complex or when the environment to model is very high dimensional, very unpredictable, reinforcement learning can be helpful. So reinforcement learning can actually be used in all real world problems, potentially. Yeah. So basically, whenever we cannot completely understand the environment. And this will include applications like I said before, financial applications, health care applications, also, for example, decision making on the kind of cure to apply to a patient according to the symptoms and stuff like this, or, as we are studying in this laboratory, robotics.
So in robotics, the model of a robot can be really complex, and also the model of the environment where the robot is acting can be really high dimensional. So reinforcement learning will put this robot moving in the environment and learning from its actions and from the feedback
[00:06:45] Unknown:
that it gets from the environment. And so in terms of the Mushroom RL project, can you give a bit of an overview about what you've built there and some of the story behind how it got created and some of the overall goals that you had in mind? So, yes, during my PhD, I was observing that there were some libraries in reinforcement learning, but they were all very
[00:07:04] Unknown:
hardcoded on specific applications. So the trend was to basically provide the code for a specific experimental campaign conducted in a specific paper. And these algorithms were not flexible and also not reproducible most of the time. So in the supervision of students, I found myself many times reimplementing the same stuff over and over. Of course, this was time consuming, plus also prone to having bugs in the code. So my purpose, also for the students that were supervised by me, was to develop a library, at first just for me, to have some kind of flexible implementation of algorithms that were always the same and reusable.
The nice thing is that it started like this, so, basically, something for me. But since I wanted to make it flexible and, let's say, also with the potential of being extended, the students started to like it. They started to add their own methods, and they also worked to improve the library. And then after years, also with the help of Davide, who joined the project, we have now a very big library that is very nice. It's being noticed also by the whole reinforcement learning community. So this is very satisfying for us. I would say that I
[00:08:19] Unknown:
joined Carlo, and I was previously developing another library in C++, but unfortunately, C++ is not very suitable for machine learning in general. So the project was cut off, and then I joined Carlo. And I guess that my previous experience with this previous C++ library was kind of helpful in understanding how to structure the code, how to make it modular, and how to build a robust platform that can be used by anyone.
[00:08:55] Unknown:
And in terms of the target users of the project, you mentioned that you initially built it for yourself to be able to simplify some of the work that you were doing, but then it got adopted by some of your students and now the broader community. And I'm wondering if you can just talk through some of the ways that the original focus of it being on yourself and then some of the ways that it has grown has influenced some of the design and structure and sort of feature direction of the project?
[00:09:21] Unknown:
Yeah. So, basically, I put myself in the position of a student wanting to implement experiments from another paper, reproduce the experiments, and also implement my own experiments. So when I developed Mushroom, the purpose was to make it very accessible to everyone and very flexible for me. But in the end, as I was thinking in terms of being just another researcher, this was also a good strategy for making a library that was appreciated by other people. So I started working on that, and then students, as I said, also used the library for their own experiments.
And as I was noticing that the modular structure I gave to the library was actually being appreciated and very useful to extend the library, I decided to proceed in this direction. So I was very happy to see that my initial idea was really working well for extending the library, and this gave me a strong motivation to continue working on that. Also, because I was seeing that the competition, let's call it competition, so other libraries seen on the Internet, were not so modular and flexible like Mushroom was even back in 2018. And so I was quite satisfied to see that this library, even if it was just developed by me and Davide, was very, very general purpose for many people doing reinforcement learning.
This actually paid off because we see now even students from other universities are sending us emails to thank us about the quality of Mushroom, saying that it helped a lot for the work in their theses. Yeah. We are seeing quite nice feedback from people that we don't know. So it's satisfying for us.
[00:11:05] Unknown:
In terms of your sort of experience of building this project and then having it gain some visibility and adoption within the broader community, what are some of the lessons that you've learned and suggestions that you might make to other developers who want to be able to build a library that ends up being successful and broadly adopted?
[00:11:27] Unknown:
Sure. So to me, the idea of putting myself in the mind of a student worked well because it allowed me to develop code that was general purpose for many people. And so I would suggest to avoid hardcoded implementations, trying to be as modular as possible, as Python also allows, and to really put a strong focus on the flexibility of the code. So I think that's definitely the most important lesson I learned from the work I did on Mushroom in these years.
[00:12:04] Unknown:
In terms of the core concepts that users need to be familiar with to be able to experiment with reinforcement learning, what are some of the elements and background that you found to be particularly useful for them to be able to make effective use of MushroomRL, and some of the ways that you've tried to encapsulate and ease the onboarding process for people who are first getting started in this area? The most important concept that you have to learn
[00:12:35] Unknown:
when starting to do RL is the concept of the agent, the environment, and their interaction. So this is key to understand reinforcement learning. Also, of course, knowing what a reward function is, and what an action-value function is, which is the value that you assign to a state-action pair in every point of your environment. So these are the key concepts that have to be known to understand reinforcement learning and start using this library.
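For reference, the action-value function mentioned here is usually written, in standard RL notation rather than anything Mushroom-specific, as the expected discounted return obtained by taking action a in state s and then following the policy pi:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s,\; a_t = a \right]
```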
Other concepts that are really important in general for reinforcement learning, but particularly for Mushroom, I would say, depend on the type of algorithm that you want to use. So if you want to use modern deep learning approaches, it's worth knowing what a neural network is and having a basic idea of how a neural network can be used. Instead, you may want to use the classical state of the art approaches, which in many scenarios are the ones that can actually work, at least for small projects, and can be deployed on a real system because they are all quite established, so you can trust the fact that they will give you some kind of result. These approaches are heavily based on features. So the concept of a feature, that is, an abstraction that maps the state of your environment to a set of values that tells you something about this state, is a very important concept.
The most basic feature that you can imagine is to create a tiling. So, basically, let's imagine that you are in a room. And what you can do is divide this room into a set of square tiles, okay? What you do is you convert your x and y coordinates into a vector that tells you in which tile you are in the room. This is very important because, depending on how you do this discretization and how many discretizations you use, you can control how much you generalize in your reinforcement learning algorithm, which means, for each position, how you share the information that you get in a given position with other nearby positions. So this concept is really important for classical reinforcement learning, particularly if you want to apply it on simple toy scenarios.
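As a rough illustration of the tiling idea described above, here is a minimal sketch, not MushroomRL code (the library ships its own feature utilities for this), with a made-up room size and tile count, that converts an (x, y) position into a one-hot feature vector indicating which tile the agent occupies:

```python
import numpy as np

def tile_features(x, y, room_w=4.0, room_h=4.0, n_tiles_x=4, n_tiles_y=4):
    """Map a continuous (x, y) position to a one-hot vector over square tiles.

    Coarser tilings (fewer tiles) share information across larger regions of
    the room, so the agent generalizes more; finer tilings generalize less.
    """
    # Find the tile index along each axis, clamping to the room boundaries.
    ix = min(int(x / room_w * n_tiles_x), n_tiles_x - 1)
    iy = min(int(y / room_h * n_tiles_y), n_tiles_y - 1)

    phi = np.zeros(n_tiles_x * n_tiles_y)
    phi[iy * n_tiles_x + ix] = 1.0
    return phi

# A position near the lower-left corner activates the first tile.
print(tile_features(0.3, 0.2))
```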
Instead, if you want to solve a very complex task, then you need to go into the neural network domain for sure. And in terms of structuring the environments,
[00:15:27] Unknown:
I know that MushroomRL also has integrations with some 3D engines for being able to simulate real world environments or build a sort of visual space for the agent to interact with, beyond just having a matrix of values that you want to explore in the simpler environmental case. I'm wondering if you can just talk through some of the design approaches that you have for creating these environments that you want the agent to explore, and some of the ways that you think about how to approach the overall problem construction to be able to identify the types of outcomes that you're aiming for? What we are trying to do is to use a standard interface to interact with the environment.
[00:16:10] Unknown:
And the standard interface is the classical interface that is used by classical reinforcement learning. As I talked about in the beginning, with this agent-environment interface you always have a step function: given the action that you want to apply in the current state, it gives you, as a result, the next state, and it also gives you the reward that you get during this transition. Plus, we have the reset function that is called to get the initial state of the episode. So the interface is very similar to the interface that is presented in a classical reinforcement learning algorithm.
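To make that contract concrete, here is a hedged sketch of the kind of step/reset interface being described. It follows the classical Gym-style convention rather than MushroomRL's exact base class, so treat the class and method signatures as illustrative assumptions:

```python
import numpy as np

class ToyGridEnv:
    """Toy environment exposing the classical step/reset interface."""

    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = np.array(goal)
        self.state = None

    def reset(self):
        # Called at the start of every episode to get the initial state.
        self.state = np.array([0, 0])
        return self.state.copy()

    def step(self, action):
        # Apply the chosen action in the current state...
        moves = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}
        self.state = np.clip(self.state + np.array(moves[action]), 0, self.size - 1)

        # ...and return the next state, the reward obtained during the
        # transition, and whether an absorbing (terminal) state was reached.
        absorbing = bool((self.state == self.goal).all())
        reward = 0.0 if absorbing else -1.0
        return self.state.copy(), reward, absorbing, {}
```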
Then we integrate some more complex 3D environments such as MuJoCo, PyBullet, or other physics simulators. Currently, we are thinking about integrating iGibson and Habitat from Facebook; the first, I don't remember, I think it's from NVIDIA, or I might be wrong. So we are integrating different environments, and for these, we are trying to provide an interface that is as abstract as possible to the user, such that it's not a problem for them to use these interfaces. When the simulator allows you to build an environment that is very generic, such as a PyBullet environment,
we are trying to provide the tools to build the basic building blocks of the environment, such as the reward function, how to reset the state, how to do the simulation step, and how to get the link and joint positions, velocities, and this kind of information in an easy way. About designing the reward function and other parts of the reinforcement learning problem, we don't provide anything. So this is for sure something that a user must learn by himself, but we can provide some hints. What I can say for sure is: never use reward functions that decay exponentially.
The reason is that we empirically saw that this kind of reward function is very difficult to learn. So I suggest always using quadratic or linear reward functions. And this is something that we just know empirically. We still don't know the reason why it's like this, but I would say that we have a lot of experience with this kind of reward function, and the exponentially decaying reward function, which is the easiest reward function that you can think about, does not work very well. Also, sparse reward functions are very difficult to learn, but this is another topic.
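As a toy illustration of that advice (the functional forms below are just examples, not something prescribed by MushroomRL), here is how a quadratic reward and an exponentially decaying reward on the distance to a goal compare for a simple reaching task; the exponential one is nearly flat everywhere except very close to the goal, which matches the guests' observation that it gives a weak learning signal:

```python
import numpy as np

def quadratic_reward(distance, scale=1.0):
    # Changes smoothly with distance everywhere, so the learning signal
    # keeps pointing toward the goal even far away from it.
    return -scale * distance ** 2

def exponential_reward(distance, scale=1.0, length=0.1):
    # Nearly zero except very near the goal, which empirically makes the
    # task much harder to learn.
    return scale * np.exp(-distance / length)

for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"d={d:.1f}  quadratic={quadratic_reward(d):+.3f}  "
          f"exponential={exponential_reward(d):+.6f}")
```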
[00:18:48] Unknown:
Digging more into the library itself, can you talk through some of the ways that it's architected and some of the component pieces that you've integrated into it to be able to support different machine learning libraries and environment plugins, and just the overall architectural aspects of how you think about structuring the library to make it accessible to a wider audience?
[00:19:20] Unknown:
The most effective solution that we found was to unify different reinforcement learning algorithms with a single core module. So this core module is a very, very flexible solution to unify offline and online methods, off-policy and on-policy methods, which are different, let's say, dichotomies of reinforcement learning algorithms. In many libraries other than Mushroom, I found that these different algorithms are actually handled with different modules, creating, of course, some kind of complexity that, for a user, makes everything a bit less accessible. In Mushroom, the core is a very simple function to unify all these methods, and it was working well in the beginning, when the methods were not so many.
Still now, after we added a lot of algorithms for deep reinforcement learning, actor-critic, and other methods, this very simple solution is very effective for keeping the simplicity of Mushroom. Davide can tell you more details about the compatibility with other libraries and stuff like this, also with PyTorch. What we do
[00:20:31] Unknown:
is we try to provide abstraction layers to access all of these libraries. So we create wrappers for every single environment. In particular, we created some wrappers for the PyBullet and MuJoCo simulators, and we are now extending this to many other environments. We are trying to keep the interface as clean as possible and as simple as possible, such that it's easy to use. And we try to hide the complexity of interfacing with these libraries inside the classes, so it's not problematic for the user to use them and to interact with them. And sometimes we also fix some problems of these libraries.
An important key problem in reinforcement learning is how to treat terminal states. When a state is terminal, you need to assign to that state a value of zero by definition. And libraries such as OpenAI Gym don't distinguish between a trajectory that has been cut because of the horizon and a state that is a true terminal state. To handle that, we kind of hack around the OpenAI Gym library such that we force the distinction between the end of the horizon and an absorbing state. And this is 1 of the examples of how we handle the connection with other libraries. The other integration that we have is a very good integration with PyTorch.
We support learning with Torch inside the library. So this simplifies a lot the use of this library to do deep learning with our agents and our learning methods.
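To show why that distinction matters, here is a generic sketch of a one-step bootstrapped target (standard TD learning, not MushroomRL's internal code): when the next state is truly absorbing its value is zero by definition and no bootstrapping happens, while a state where the episode was merely cut by the horizon is still bootstrapped from:

```python
def td_target(reward, next_q_max, gamma, absorbing):
    """One-step target for a value-based update.

    absorbing=True means the next state is a true terminal state, whose
    value is zero by definition. An episode that was only cut because the
    horizon was reached is *not* absorbing, so we keep bootstrapping.
    """
    if absorbing:
        return reward
    return reward + gamma * next_q_max

# Same transition, different interpretation of why the episode ended:
print(td_target(1.0, next_q_max=5.0, gamma=0.99, absorbing=True))   # -> 1.0
print(td_target(1.0, next_q_max=5.0, gamma=0.99, absorbing=False))  # -> 5.95
```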
[00:22:26] Unknown:
In all the work on Mushroom, we always wanted to keep the compatibility issues with other libraries invisible to the user. So we always wanted to provide a simple interface for the user to use libraries for environments or for tensor computation, like PyTorch.
[00:22:44] Unknown:
In terms of the actual workflow of setting up an experiment, or designing it in the first place and then going through execution, testing the outcomes, and maybe iterating on it to test out different algorithms for exploring the environment and testing out different reward functions, I'm wondering if you can share some of the overall process of how to think about all of that setup, and then the workflow of actually building that with MushroomRL and tracking the different inputs and states and outcomes of the experiments to be able to identify
[00:23:17] Unknown:
which ones are successful, which ones are trending in the right direction, and things like that. The first thing that you have to understand is whether you have a standard benchmark that already exists and you want to try a new algorithm to solve this standard benchmark, or you are creating a new environment to solve a different problem that is unknown. And these are 2 different workflows that have very, very different solutions. So in the case that you have a standard benchmark, what you do is you start by creating your MDP, then you design a standard learning agent provided by Mushroom.
An experiment in reinforcement learning consists of the interaction between the environment and the agent, and in Mushroom, this happens through the core. In Mushroom, you can call the function core.learn, and this will make the agent interact with the environment. This will also call the learning function, whenever necessary, on the dataset that you get from the interaction between agent and environment. What we suggest in Mushroom is to call the method evaluate after every epoch of learning, such that you have a clear evaluation of the performance of the agent after a specified amount of steps or episodes in the environment. Normally, this kind of evaluation is a single epoch, and what you do is use the Mushroom logger, or your own logger, to store the learning results of this epoch. You repeat this process for many epochs. This is how you normally structure a simple reinforcement learning experiment when you have a known benchmark.
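For readers who want to see roughly what that loop looks like in code, here is a minimal sketch in the spirit of the MushroomRL tutorials. The class names and signatures used here (GridWorld, EpsGreedy, QLearning, Core.learn/evaluate, compute_J) are recalled from the library's documented examples and may differ between versions, so check them against the current documentation before relying on them:

```python
import numpy as np

from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter
from mushroom_rl.utils.dataset import compute_J

# 1. Create the MDP (a standard benchmark environment).
mdp = GridWorld(width=5, height=5, goal=(4, 4), start=(0, 0))

# 2. Create a standard learning agent provided by Mushroom.
policy = EpsGreedy(epsilon=Parameter(0.1))
agent = QLearning(mdp.info, policy, learning_rate=Parameter(0.1))

# 3. The core handles the interaction between agent and environment.
core = Core(agent, mdp)

for epoch in range(10):
    # Learn for a fixed number of steps, fitting after every step...
    core.learn(n_steps=1000, n_steps_per_fit=1)
    # ...then evaluate the current policy and log its performance.
    dataset = core.evaluate(n_episodes=10)
    mean_return = np.mean(compute_J(dataset, mdp.info.gamma))
    print(f"epoch {epoch}: mean discounted return = {mean_return:.3f}")
```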
When the benchmark is not known, then this becomes much more difficult. What you have to do, first of all, is understand which kind of learning algorithm would be suitable for your environment. You have to design a reward function that not only represents the task that you have in mind, but is also such that the task that you are considering can be solved by a learning algorithm. Because sometimes, if you design the reward function, for example, too sparse or with a scale that varies a lot, this can cause issues for specific learning algorithms, particularly with neural networks.
When trying to solve a new task, the workflow is not so straightforward, and you have to iterate back and forth between the design of the reward function and, for example, state normalization and other tricks, the feature selection discussed before, and the actual tuning of the reinforcement learning algorithm. So this can be a continuous back and forth. Unfortunately, there is no standard technique either to design the reward function or to tune the reinforcement learning algorithm for a specific task. So this might be kind of difficult.
And, unfortunately, in MushroomRL, we don't have any tool to do this tuning. What we can offer is a visualization interface, so a graphical interface in which you can see, step by step, the action selected by the agent in terms of magnitude, of course, and you can see the boundaries of the actions. So you can see how the policy of the agent is behaving. If the policy is going too much out of bounds, or is not exploring sufficiently, or is exploring too much, so the signal is too noisy, you can more or less understand what's going on. And this can be quite useful in tuning these environments.
But in general, there is no standard recipe to solve this issue, unfortunately.
[00:27:09] Unknown:
In terms of the use case of MushroomRL and reinforcement learning in general in the research environment, I'm wondering if you have any experiments or can provide any juxtaposition as to how the approach to reinforcement learning and the design of the problems might differ between a pure research environment versus when trying to use it for solving some business problem, such as the financial example that you provided earlier, Carlo? So, yes, in research, experiments are rarely the purpose.
[00:27:42] Unknown:
The purpose, most of the time, is to develop a method, so an algorithm that has some theoretical benefits, and then to test the algorithm in experiments. So Mushroom is helpful to test an algorithm in experiments. And in research, what is commonly done is trying to find the theoretical properties of a method and prove them in the experiments. So we don't really care about maximizing the performance, but it's more about proving some theoretical claims in the experiments. I would say that now, with the explosion of deep reinforcement learning also in the research community, the importance of performance is becoming more and more prominent.
But still, in research, it's commonly less important than in an industrial application, of course. So in industry, you can even relax some constraints of a theoretical framework to really give more importance to the final performance. So if a method, for example, has a certain constraint on a particular parameter or something like this, in an industrial application, you can say, okay, let's not care about this, because we just want to maximize the performance. And if some theoretical guarantee will not hold anymore, we are still happy if the performance is maximized.
I would say that this is the main difference between research and industrial applications. And Mushroom, of course, allows both kinds of applications. So it is only a matter of how you tune your experiment
[00:29:10] Unknown:
and what you want to show. In terms of the overall space of reinforcement learning, I'm wondering what are some of the open questions and active areas of research, and some of the ways that you're using MushroomRL in your own work to help explore some of those problems? About this, I think both me and Davide can answer. I will start by saying that my current research is
[00:29:32] Unknown:
investigating methods for generalizing policies. Generalizing in reinforcement learning means to acquire a policy, so a sequence of actions, that can be applied not only in a specific task, but in different tasks. That is also what a human does. So a human is not only able to play a certain sport, for example; it can adapt and play different sports, or it's able to grab books of different shapes and different sizes. This is what is called multitask reinforcement learning. Common approaches in reinforcement learning, especially the early ones, just focus on a single task. Of course, this creates a very strong gap between reality and simulation, because in reality, we always have different situations, different environments.
Reinforcement learning agents trained with classical methods are almost never able to deal with real world problems. Deep reinforcement learning is trying to solve this issue, trying to extract more general features about the problem that can generalize across different tasks. So it's not anymore a matter of, for example, grabbing a book of 400 pages, but learning to grab books of different sizes from different positions. And this is my current research in multitask reinforcement learning. I'm also studying a slightly related branch called curriculum reinforcement learning, where the tasks are not so different, but they're actually put in a sequence of increasingly difficult tasks. So the task is more or less always in the same environment, but we are making it more and more challenging in this curriculum sequence.
So basically, the agent can start learning something in the easiest task and then progressively improve its performance until a target task, the most difficult one, is reached. I work mainly
[00:31:18] Unknown:
on the problem of applying reinforcement learning to robotics, in particular, how to use a learning agent on a real robotic platform and learn from scratch on this real robotic platform, without the need of a simulator in between and without the need to transfer from a simulator to the real robot. That approach is called sim-to-real, but it's very limiting because it means you have to use a simulator, and maybe the simulator does not match the reality. In order to learn directly on the robot, you need a way to explore the environment without causing harm to people, without causing harm to the robot, and without destroying the environment itself. So how to do that? You can explore the safe reinforcement learning area. And this safe reinforcement learning area is very active right now. So there are many different approaches.
And we are actually currently working on 1 of them. We want to learn from scratch to play air hockey. Actually, we are still quite far from learning directly on the real system, and we are still doing it in simulation. But what we want is that, when exploring in the simulation, we don't violate the safety constraints at any time step, because violating a safety constraint means crashing the robot. And we were actually able to learn to play air hockey, both with our robotic arm with 7 degrees of freedom and with a planar arm, while maintaining the constraints. And we actually used MushroomRL to implement everything, starting from the environment, which is based on PyBullet; the algorithm is fully implemented in Mushroom.
Of course, we use the core and the logging capabilities of Mushroom to design the experiments. So everything is done with Mushroom, and we will probably open source the code soon, so we will give access to these experiments on air hockey with safe reinforcement learning.
[00:33:26] Unknown:
And was it able to beat you at air hockey? No. Actually, not. The robots have not only
[00:33:32] Unknown:
the problem of learning the task, but they also have a lot of hardware problems that make them quite slow. The real robot that we have is a robotic arm from KUKA. This arm is really, really good in terms of technology, but it is still not optimal and not optimized to play the task. So right now, it's very difficult to make it competitive with humans. Of course, you can design robots that can beat you, but then you have to design the robot specifically for air hockey, so maybe a planar robot instead of a 7 degrees of freedom robot, because it will be much easier in terms of joint accelerations and these kinds of issues. This is a non trivial problem. There are some works, mainly in robotics, not in reinforcement learning, in which there are robots playing and beating humans at air hockey, but these robots are designed specifically to play air hockey. So they are planar robots that move only on the plane. Instead, what we are interested in is using robotic arms that have a much more complex kinematics, and the problem is really to move on the plane and to avoid damaging the cables and this kind of damage to the environment. That's the research question for us, not the best way to beat a human in a game, of course.
[00:34:58] Unknown:
And so in terms of the ways that you've seen MushroomRL used, both in your own research and in the community of people who have adopted it, what are some of the most interesting or innovative or unexpected applications of it that you've seen? So a guy that works in finance actually contacted us to apply
[00:35:15] Unknown:
some Mushroom algorithms in some financial application. And this was unexpected. At least to me, in the beginning, Mushroom was really focused on research, and seeing that people working on industrial applications were starting to use it was quite unexpected, I would say. Davide, are there any interesting
[00:35:33] Unknown:
applications of mushroom RL that you've seen either in your own work or in the community?
[00:35:38] Unknown:
Yeah. We are trying to use Mushroom to do robotics tasks. In particular, what we want to try in the future, but this is an open topic, is also to see what you can do while learning with other humans in the loop. So interaction with humans, and this kind of task that might require safety. I would say that right now, what we have done is mainly benchmarks of standard tasks. So nothing particularly exciting except this air hockey
[00:36:15] Unknown:
thing, which we think is quite interesting. But this is current research, I would say. Yeah. This is pretty common. In reinforcement learning, you can rarely apply algorithms out of the box in real world applications. And Mushroom is really focused on research, so it's not so easy to use its methods in applications like that. It's more for benchmarking the algorithms in
[00:36:37] Unknown:
well known environments in reinforcement learning. I would say that you can still apply the algorithms, but maybe you need to do some engineering on the reward function, some engineering on the features, some particular choices on the action space, maybe try to mix some learning with some hardcoded solution together. So it's not so easy. Mushroom allows you to do all of this, but it's still quite an effort, as reinforcement learning is a very new and open topic in research. And in your own experience
[00:37:16] Unknown:
of doing reinforcement learning research and building and maintaining the MushroomRL project, what are some of the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:37:26] Unknown:
Maintaining the library is not easy at all, because we are two people working on this library, and the RL research community is moving very fast. So every year, there are thousands of papers published and algorithms that are beating the state of the art. So keeping up with this amount of work is not easy for us. The unexpected part of working on Mushroom is exactly this. So all the amount of research that is being done was, when we initially started working on Mushroom, unexpected. So it's quite challenging to keep up with all of this. Our plan is more or less to stay updated with the most famous algorithms; for implementing, let's say, secondary methods that the literature provides, Mushroom can be used, but it doesn't provide these methods itself.
So it's a package that can be used as a fork or as a separate module for implementing other reinforcement learning methods. For the most important ones, we provide implementations. For others, we believe that Mushroom is flexible enough to also implement other methods, but they are outside the library. I would say that the
[00:38:35] Unknown:
lesson that we learned is that testing is never sufficient. It's never enough. You have to test everything multiple times. You have to document everything because, otherwise, it will be difficult to use for other people. Tutorials are very important. So what you want to do is a lot of tutorials. It's better if you can have video tutorials too, but this is, for us, quite hard to do. As Carlo said, there are just 2 of us. In general, I would always say that it's worth taking a bit more time to write code that is clear, easy to understand, and maintainable.
And also maybe use tools such as continuous integration and automatic code analysis, such that you keep the quality of your software high, because this pays off, not in the short term, but in the long run, it pays off significantly. So thanks to this attention to code quality, sometimes in Mushroom, it's much easier to implement new algorithms and new approaches and new functionalities that would have been very difficult to do without that attention to code quality, I would say. So that's crucial for us. For people who are interested
[00:39:59] Unknown:
in exploring reinforcement learning, or who are wondering if it might be a useful approach to solving some of their business problems, what are some of the cases where MushroomRL might be the wrong choice, and they might be better suited with either other reinforcement learning libraries or just avoiding reinforcement learning as a solution to their problem space?
[00:40:20] Unknown:
So, okay, Mushroom is, I would say, strongly research focused. So it's made with the purpose of facilitating the implementation of empirical results, of experiments from other papers, to favor the reproducibility of these experiments. So for, let's say, an industrial application, this is not really the most convenient thing, because they will maybe care more about the performance in terms of computation and time, and also the results of the algorithm. So Mushroom is not super powerful in terms of computation time. For this, there are other libraries, like, for example, RLlib, I think. Basically, it's a library based on the Ray Python library.
Ray is a library for strongly parallelizing the code. So they are developing this separate reinforcement learning project based on Ray that really cares about the computation time of the experiments. In reinforcement learning, it is very convenient most of the time to run multiple experiments in parallel, and even to run different instances of the same environment in parallel, to basically speed up the learning as much as possible. All these details on computation time are not really the strongest focus of Mushroom, which is mostly aimed at the research aspect. So if somebody really cares about the performance in terms of time to execute experiments, for example, let's say, a real time application, maybe Mushroom is not the best choice.
But for any researcher who wants to implement their methods and produce experiments, Mushroom, for me, is definitely
[00:42:01] Unknown:
a good choice. I would say that if you don't want to understand reinforcement learning, and you just want to take a method and use it as a black box, there are also standard libraries to do that, such as Stable Baselines or OpenAI Baselines, the first being a fork of the second. And with these methods, you can just plug and play a standard environment, or your own environment, and try to run the experiment on that. The problem is that you don't have fine control. So if you don't need fine control over your algorithm, over your method, if you don't need to dig a bit into reinforcement learning, then probably Mushroom is not a good choice for you, because it gives you a lot of control, but this means that you need quite a bit of knowledge. It's not very friendly toward people that have 0 knowledge of reinforcement learning.
It's very good if you want to learn instead.
[00:43:04] Unknown:
Let's say it's hiding some details about compatibility with other libraries that are not reinforcement learning stuff. But for the reinforcement learning part, the internals of the algorithms, Mushroom requires you to know what you are doing and to know where to put your hands when you are working on Mushroom. So the idea is that every user of Mushroom can either use it for learning or for doing their own research, knowing what they're doing. 1 thing that we didn't discuss yet that I'm curious about is where did the name come from? What made you decide to call your library Mushroom? It's funny because many people ask me that, and they expect me to say it's because of the mushroom in Super Mario Bros.
I am a big fan of Nintendo and Super Mario Bros, but that's not the reason. The reason was that, basically, I was trying to find a name for the library, and I was, for some days, struggling with finding this name. And then at some point, I basically said, okay, let's call it the name of a food or something like that. It sounds nice for a library, because I had also found many libraries called Pineapple, for example, or stuff like this. And I said, well, it's interesting, let's say, to call a library like that. And then I asked Davide, what about mushroom? We are both fans of Mario, so we said, okay, mushroom sounds nice for a library too.
Yes, it reminds you of Super Mario Bros. And we said, okay, let's call it Mushroom. And at the time, it was a game between us. It wasn't anything serious. But then we kept this name, and that's it. You forgot to mention
[00:44:40] Unknown:
2 things. So the first thing that you forgot to mention is that you got the inspiration by looking at the pizza of your supervisor that had mushrooms on top of it. This is true. Yes. And the second thing that you didn't mention is that, unfortunately, there is already a library called mushroom in the Python Package Index. So we had to rename our library as MushroomRL for this reason. And that was quite a bit problematic, because the library was already quite big, and we had to do a major refactoring of the code and remove the old name.
[00:45:18] Unknown:
Yeah. Yeah. I mean, the story of the pizza, it's funny because it's not really the reason. It's just because I was completely desperate to find a name. Then I saw the mushroom, and I said, this sounds nice, let's call it mushroom. That's it. And let's put an end to this torture of finding a name for this library.
[00:45:35] Unknown:
The hazards of naming things when you're hungry.
[00:45:39] Unknown:
Yes. Yes.
[00:45:42] Unknown:
So in terms of the near to medium term future of the project, what are some of the things that you have planned either in terms of new capabilities or features or just sort of code quality and organizational aspects or community growth that you're looking to add to?
[00:45:57] Unknown:
So to us, they are both very important aspects, because maintaining the library is not easy, and it's becoming harder and harder as our research careers proceed. So when you do research, you are not really supposed to work on the code. You are supposed to read papers and to supervise students. So maintaining the library is becoming harder for us. But still, we believe that Mushroom is something that we really did with passion, basically driven just by our passion for doing something nice with this library. And so I don't find it easy for us to completely abandon this project in the near future, because we like it.
We really appreciate how people are using this library. So we are strongly motivated to maintain Mushroom for a long time, possibly collaborating with some students that can maybe work on it during their master's thesis or something like this. And for adding new functionalities, at the moment, we are seeing in reinforcement learning a huge trend in multi agent reinforcement learning. So multi agent consists basically of training multiple agents at the same time, and the purpose is to maximize the performance of all the agents. This is something that Mushroom at the moment doesn't support, but we think that we can easily add this extension in Mushroom, and we are actually planning to do it in the next months.
[00:47:15] Unknown:
And also, we are planning to add more simulators. Actually, as I told you before, we have the iGibson and Habitat simulators that are currently on the development branch of the library. Of course, we would also like to integrate the library to be used with real robots, so we are thinking about ROS, which is Robot Operating System, support. We are trying to increase the capabilities of the logging. And we also have a benchmarking suite that you can use to run our algorithms on standard benchmarks, and we are planning to extend that as well. What we would like is to have contributions from the community. What we would like is to have implementations of new algorithms, particularly published ones. So we would like to extend our library to support more known approaches.
We would also like to extend our library with more tools for maybe visualization, maybe processing data. And we also would like to have feedback about the documentation and feedback about the presence of bugs or other issues in the code. So this is something that we really appreciate, and that we try to address as soon as possible when we receive a GitHub issue or when somebody writes to our email. So we are trying to respond as soon as possible.
[00:48:51] Unknown:
So, yes, feedback from users is certainly welcome. The most desirable thing for us would be to have some implementations of algorithms proposed by users. So for example, if they like a certain reinforcement learning method that is not in Mushroom, they can make a pull request on GitHub to propose the integration of their method in Mushroom. And we are actually providing, in the readme file, the instructions for being compliant with the idea of Mushroom, so with the architecture, with the style of the code, the need for docstrings, and stuff like this. So the users have all the information that they need for adding their methods, and this would be, let's say, the most desirable thing: to have active users adding their methods to the library, and this will also make our maintenance work much easier.
[00:49:43] Unknown:
Are there any other particular areas of contribution that you're looking for help with or types of support that you're looking to provide to active contributors or any other aspects of the mushroom project and reinforcement learning that we didn't discuss yet that you'd like to cover before we close out the show? So if someone can help us with a nice logo, it would be great. Otherwise, we will have to find a good graphical designer as, unfortunately,
[00:50:08] Unknown:
both me and Carlo are not very skilled on this topic.
[00:50:14] Unknown:
Our logo at the moment is a small mushroom, a small toy mushroom that my girlfriend bought for me. And I took a photo and tried to make some artistic modifications to this photo, but it's very raw, I would say. So if a professional designer, as Davide said, would be willing to make a logo, it would be nice.
[00:50:35] Unknown:
But also, video tutorials would be very nice to have. And we can give hints and help to anyone who wants to do this kind of work. Unfortunately, we don't have much time to improve the documentation in this way. For us, it's a bit too much. But for sure, we can give hints, and we can check the videos if someone
[00:51:01] Unknown:
wants to do this kind of work. Yeah. Something really important is the documentation. We saw people actually sending us emails asking for improvements to the documentation. At the moment, I would say it's informative enough, but not visually appealing. So we need people also with some skills in web programming for creating a visually good looking website for the documentation, for adding tutorials,
[00:51:26] Unknown:
and videos, which would help Mushroom very much to be more visible. I would point out that Mushroom is completely open source, and we don't have any commercial or closed source part of it. So it's all open, and it's all free for everyone. So we are not interested in the commercial part. We are interested in giving a tool to the open source community to work with, and to increase the quality of the research, and to spread good methodologies in reinforcement learning in general.
[00:52:02] Unknown:
Alright. Well, for anybody who does want to get in touch and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose the TV series Britannia. I just watched the 2nd season of that, and it had a very well developed storyline and an interesting combination of the older-style time period with some interesting musical accompaniment. So just a really well done show. I'm looking forward to seeing the next season whenever it comes out. So I definitely recommend that for people who are interested. I'll just note that it is not kid friendly, so keep that in mind before you go trying to show it to your kids. So with that, I'll pass it to you, Davide. Do you have any picks this week? Yeah. I would suggest
[00:52:47] Unknown:
1984
[00:52:48] Unknown:
by George Orwell. I think it's a great book. I'll second that 1. And, Carlo, do you have any picks this week? So as you talked about TV series, I will say that I can tell you about my favorite TV series, which is Twin Peaks season 3. For me, it's really the best thing that I have ever seen on a screen. So I will recommend it to every fan of David Lynch, for sure. All right. Well, thank you both for taking the time today for sharing the work that you're doing with MushroomRL and the research you're doing in reinforcement learning. It's definitely a very
[00:53:17] Unknown:
interesting area, something that I'm always looking to learn more about. So I appreciate you taking the time to share your work and the work that you're doing to help make it more accessible to the broader community. So thank you both again for all of that, and I hope you enjoy the rest of your day. Thank you. Thank you very much. It was very nice. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Guests' Backgrounds and Introduction to Mushroom RL
Overview of Reinforcement Learning
Mushroom RL Project Overview
Lessons Learned and Advice for Developers
Core Concepts for Reinforcement Learning
Structuring Environments and Reward Functions
Library Architecture and Integration
Workflow for Experiments and Iteration
Research vs. Industrial Applications
Open Questions and Active Research Areas
Interesting Applications of Mushroom RL
Challenges and Lessons Learned
When Not to Use Mushroom RL
Origin of the Name 'Mushroom RL'
Future Plans and Community Contributions
Call for Contributions and Support
Closing Remarks and Picks