Summary
With libraries such as TensorFlow, PyTorch, scikit-learn, and MXNet readily available, it is easier than ever to start a deep learning project. Unfortunately, it is still difficult to manage the scaling and reproducibility of training for these projects. Mourad Mourafiq built Polyaxon on top of Kubernetes to address this shortcoming. In this episode he shares his reasons for starting the project, how it works, and how you can start using it today.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
- Finding a bug in production is never a fun experience, especially when your users find it first. Airbrake error monitoring ensures that you will always be the first to know so you can deploy a fix before anyone is impacted. With open source agents for Python 2 and 3 it’s easy to get started, and the automatic aggregations, contextual information, and deployment tracking ensure that you don’t waste time pinpointing what went wrong. Go to podcastinit.com/airbrake today to sign up and get your first 30 days free, and 50% off 3 months of the Startup plan.
- To get worry-free releases download GoCD, the open source continuous delivery server built by Thoughtworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- Your host as usual is Tobias Macey and today I’m interviewing Mourad Mourafiq about Polyaxon, a platform for building, training and monitoring large scale deep learning applications.
Interview
- Introductions
- How did you get introduced to Python?
- Can you give a quick overview of what Polyaxon is and your motivation for creating it?
- What is a typical workflow for building and testing a deep learning application?
- How is Polyaxon implemented?
- How has the internal architecture evolved since you first started working on it?
- What is unique to deep learning workloads that makes it necessary to have a dedicated tool for deploying them?
- What does Polyaxon add on top of the existing functionality in Kubernetes?
- It can be difficult to build a docker container that holds all of the necessary components for a complex application. What are some tips or best practices for creating containers to be used with Polyaxon?
- What are the relative tradeoffs of the various deep learning frameworks that you support?
- For someone who is getting started with Polyaxon what does the workflow look like?
- What is involved in migrating existing projects to run on Polyaxon?
- What have been the most challenging aspects of building Polyaxon?
- What are your plans for the future of Polyaxon?
Keep In Touch
- Website
- @mmourafiq on Twitter
- mouradmourafiq on GitHub
Picks
- Tobias
- Mourad
Links
- Polyaxon
- Investment Banking
- Luxembourg
- MATLAB
- Text Mining
- TensorFlow
- Docker
- Kubernetes
- Deep Learning
- Machine Learning Engineer
- Hyperparameters
- Continuous Integration
- PyTorch
- MXNet
- Scikit-Learn
- Helm
- Mesos
- Spark
- SparkML
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200 gigabit network, all controlled by a brand new API, you've got everything you need to scale. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute. Finding a bug in production is never a fun experience, especially when your users find it first. Airbrake error monitoring ensures that you'll always be the first to know so you can deploy a fix before anyone is impacted.
With open source agents for Python 2 and 3, it's easy to get started, and the automatic aggregations, contextual information, and deployment tracking ensure that you don't waste time pinpointing what went wrong. Go to podcastinit.com/airbrake today to sign up and get your first 30 days free and 50% off 3 months of the Startup plan. To get worry-free releases, download GoCD, the open source continuous delivery server built by Thoughtworks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. And with their new Kubernetes integration, it's even easier to deploy and scale your build agents.
Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons. And visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. Your host as usual is Tobias Macey, and today I'm interviewing Mourad Mourafiq about Polyaxon, a platform for building, training, and monitoring large scale deep learning applications.
[00:01:47] Unknown:
So, Mourad, could you start by introducing yourself? Yeah, sure. My name is Mourad. I'm the author of Polyaxon. Before I started working on Polyaxon, I worked for a couple of years in investment banking. Then I tried a couple of startup ideas that did not work out, and I decided to go back to work. I joined some tech companies, some startups. And recently I left my job again, and I'm working full time on Polyaxon. I don't know for how long, but for now I like it.
[00:02:16] Unknown:
Taking a leap of faith. Yeah. And do you remember how you first got introduced to Python? Yeah.
[00:02:23] Unknown:
I think the first time I was introduced to Python was back in 2010, I guess. It was through a colleague of mine. I was living in Luxembourg at that time, doing a lot of C and C++, some MATLAB, and also some Excel for the front office of an investment bank. My colleague was telling me about his experience writing some text mining scripts to extract sentiment that could be beneficial for trading. So I also started playing with the language, and I quite liked it, especially the numerical libraries.
I was also producing a lot of reports and analysis in a structured and organized way. By the end of that year I had started working on two projects: one was related to credit derivatives pricing and rating, and the other was a web app. So it was natural to use Python for both, and since then I've relied on Python for many of the things that I do.
[00:03:22] Unknown:
Yeah. It's always astonishing the breadth of things that you can do with Python and stay within the same language. And occasionally, you may need to integrate with a compiled binary to get some performance improvements. But at the user level, it's all still just Python. Yeah. Exactly. And can you start by giving a quick overview of what the Polyaxon project is and your motivation for creating it?
[00:03:47] Unknown:
So I started working on Poly... I mean, it was not called Polyaxon at that time. It was just a couple of scripts that I put together when I started using TensorFlow in 2015. At that time there was a small team at Google working on Scikit Flow. It was something similar to scikit-learn, but for TensorFlow. I liked it, and then they moved it inside TensorFlow as contrib.learn, and since then it kept breaking every time they upgraded. So I thought I would build my own small library, and I've been working on that since then.
Eventually it became Polyaxon, because I also started thinking about persisting the experiments, reproducing them, and how we can integrate training in different kinds of environments like Docker or Kubernetes. And last year in September, I thought that maybe I should just work full time on Polyaxon and basically try to realize the vision that I have for it.
[00:04:52] Unknown:
And in the context here, when you're referring to running an experiment, is that for just doing quick iterations on a machine learning model to figure out how best to approach a problem?
[00:05:04] Unknown:
So the thing is, when you are building a machine learning or a deep learning model, depending on how big or complex the model is, there are a couple of things you will be thinking about. For example, you first need to write some code, and I assume that this code is version controlled, so you are tracking the changes to your code. This code is responsible for doing many things: cleaning the data, data augmentation, extracting and engineering features, doing some data exploration and analysis. Then you start thinking about the core machine learning or deep learning model that you want to train. This model needs to load the data and start training, and eventually export some outputs: logs as well as other artifacts like model checkpoints, weights, and so on and so forth.
This process becomes more or less tedious each time. So a lot of people start thinking about abstracting this process into something where you can just put in some code and it runs, doing all of these things without you having to think, I should do this first and then that afterwards.
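The workflow just described can be pictured as a pipeline of plain functions. The following is a purely illustrative toy sketch in Python; none of these names are Polyaxon APIs, and the "model" is a trivial stand-in:

```python
# Toy sketch of the workflow described above: version-controlled code
# that cleans data, engineers features, trains a model, and returns
# artifacts. All names are illustrative stand-ins, not Polyaxon APIs.

def clean(rows):
    # Drop missing records.
    return [r for r in rows if r is not None]

def extract_features(rows):
    # Trivial "feature engineering": pair each value with its square.
    return [(x, x * x) for x in rows]

def train(features):
    # Stand-in "model": the mean of the squared feature.
    return sum(f[1] for f in features) / len(features)

def run_experiment(raw_data):
    data = clean(raw_data)
    features = extract_features(data)
    model = train(features)
    # In a real project, checkpoints, weights, and logs would be
    # persisted here as experiment artifacts.
    return {"model": model, "n_samples": len(data)}
```

The point of a platform like Polyaxon is that each of these stages, plus the surrounding environment, gets captured so the whole chain can be replayed.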
[00:06:20] Unknown:
With Polyaxon, you mentioned Kubernetes and Docker, so I'm wondering if you can describe a bit about how it's implemented and what the internal architecture looks like.
[00:06:30] Unknown:
So, as I said, a data scientist or a machine learning engineer basically writes code and tries to train a model with it. Eventually the machine learning engineer also thinks about restarting the experiments if she's not satisfied with the results, and potentially about sharing the work with the rest of the team, to compare the results with previous experiments. She might also decide to retrain the model, or to pass the work to one of her teammates. But the problem is that in order to restart an experiment or retrain the model in the same fashion, in the same way, it becomes really difficult, because you have to handle environment variables, dependencies, environment setup, and so many other things. Container technology gives you the ability to package everything and have it be reproducible.
So that's what I'm doing with Polyaxon. Whenever a machine learning engineer or data scientist tries to train a model or run an experiment, Polyaxon keeps track of all the dependencies that are needed for it. We track the version of the code: there is a version control system built into Polyaxon. We have a Docker registry that gets all the dependencies in a very simple way and builds a Docker image, and we run the experiments in what is basically a SaaS running on Kubernetes.
Kubernetes is very interesting because it gives you the ability to redeploy the same structure, the same infrastructure, on your computer, on a cloud platform, or on bare metal. You have this huge engine that abstracts all the orchestration of your containers and your resources. And that's basically how the core of Polyaxon works.
[00:08:51] Unknown:
What is it about deep learning workloads that makes it necessary to have a dedicated tool for deploying them, particularly for being able to manage a scale-out environment to parallelize the experiments that you're running?
[00:09:02] Unknown:
Yeah. I think deep learning, and machine learning in general, has been maturing in the last couple of years, but it's not as mature as software engineering. There's a lot of technical debt that gets introduced in every project. So depending on the organization, and on the processes they have in place for their teams, you'll see different kinds of structures. But in general it's not as rigorous as software engineering. There are a lot of dimensions of technical debt: debt related to data versioning, to configuration and hyperparameters, to code, to resources. Starting an experiment with all of these dimensions under control is really hard.
So when you have someone building a deep learning or machine learning model, or trying to solve a problem within a company, in general you have all of this configuration, and people rely on documentation to reproduce the results. So if, for example, one of the engineers leaves the company, or someone else tries to pick up the work, it's really hard to reproduce the same results. There's also another problem, which is the staleness of the model: you need to retrain the models every now and then, and a lot of knowledge is not distributed equally, because people are relying on documentation, which is not always a good idea. So with Polyaxon, we are trying to bring some kind of rigor and organization to deep learning and machine learning in general, so that the data science team can focus on the algorithm, and have a system that tracks all of these dependencies and gives them an environment in which they can reproduce the experiments and reach the same results again and again.
[00:11:03] Unknown:
And a lot of times, when data scientists are working through a problem and exploring the problem space and then they generate a model, occasionally it's necessary for another engineer to take that model and translate it to a different language or a different architecture, or just modify the way that it's implemented, to increase efficiency for deploying it in production. But if the model is already contained within a Docker container, have you found that it's possible to just take the output of the experiment that you're running with Polyaxon and push it straight to production?
[00:11:43] Unknown:
No. That's one of the things that Polyaxon brings to the table. You basically don't care about creating Docker images; you just tell Polyaxon, in order to run this code, I need these dependencies. Anything that can run on Docker is something we can use on Polyaxon. So if you have some code and you change it, for example to fine tune some parameters or to change the structure of the code, you just push the code, it's automatically version controlled in our system, and Polyaxon creates a Docker image and starts the experiment. So what we're doing with Polyaxon is abstracting, more or less, the data science workflow and simplifying it, but also allowing data scientists to focus and iterate on the algorithm without having to care about all of these dependencies and experiment-specific details. You just have a configuration file, a small YAML file, and everything is handled by Polyaxon: the configuration tracking, the resource allocation.
You also have an API, so you can do all of that programmatically if you don't want to use the command line interface. Polyaxon also allows different ways of scheduling jobs, so you can have experiments running on some kind of schedule, for example every week or every two weeks, or triggered when there is new data in the database; those triggers are being built right as we speak. So you can train many models and optimize many experiments on a cluster, using the maximum resources, GPUs, and everything.
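As a rough illustration of the small YAML file mentioned here, a minimal experiment specification looked something like the following around the time of this episode; the exact field names are from early Polyaxon documentation and may differ in current releases:

```yaml
---
version: 1
kind: experiment
declarations:
  batch_size: 128
  learning_rate: 0.001
build:
  image: tensorflow/tensorflow:1.4.1-py3
  build_steps:
    - pip install -r requirements.txt
run:
  cmd: python train.py --batch-size={{ batch_size }} --lr={{ learning_rate }}
```

The CLI then takes over from there, along the lines of `polyaxon run -f polyaxonfile.yml`.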
[00:13:37] Unknown:
So in a way, it sounds a lot like Polyaxon is aiming to be the continuous integration system for machine learning, because in a traditional continuous integration system for a pure software project, you would want to be able to have those periodic builds to ensure that everything's operating properly, and then also have the build on a code push. But because of the added complexity of requiring the data to be present for machine learning workflows, I imagine that adds some complexity to a standard continuous integration system.
[00:14:14] Unknown:
Yeah. As I said, there are two kinds of facets to it, two kinds of dimensions. There's the dashboard, which is basically just for viewing and reviewing the experiments, so that you have some knowledge distribution within the team. You're working on a project with the other team members, you're working on your own experiments, and you want to compare the experiments, their values, how the experiments differ, and what makes one experiment better than another in terms of hyperparameters and other kinds of configurations.
And there's the part where you want to automate as much as possible. The core model of machine learning is very small compared to all the entropy around it: all the data management, configuration, and everything else. So what we're trying to do is use Kubernetes, which is really good for orchestration, for managing the resources of the cluster, for reproducing the same deployments everywhere, and for scaling; and with Polyaxon we're adding a specific layer on top of Kubernetes that tries to provide reproducible and scalable machine learning and deep learning. So we have authentication and user management for machine learning projects, and also a modular architecture where you can install only some parts of the platform for doing a couple of things: for example, installing just the dashboard, or having the engine, the runner, or the pipelines for doing the whole continuous integration of machine learning. So you can say, for example, I want this experiment to run every Monday, or run whenever some trigger fires.
[00:16:05] Unknown:
And when you're doing those periodic builds, is it possible to set some sort of expectation of the output to ensure that it's staying within certain bounds so that you can be alerted if something is not operating as intended or whether there's new data that's causing the model to behave in an unpredictable fashion?
[00:16:25] Unknown:
So the idea behind these pipelines is that they give you, more or less, a higher abstraction for doing whatever you want, for example integration with different kinds of environments, like Slack. You can say, if the last experiment has an accuracy or a loss of this value or that value, send me a notification, or restart the training over a different time range, or whatever. With these kinds of building blocks you can build different kinds of workflows, and you don't need to watch your computer. Because right now, what most people do is they have a session somewhere on some machine, and they have to SSH in and check what's happening.
If the machine is gone, the logs and all the artifacts are gone. What we're trying to do with Polyaxon is have all of these components separated so that you never lose your logs, artifacts, and outputs, and you have the links between them: the lineage, basically, of how this experiment started, why it reached these results, and how to reproduce it.
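A threshold rule of the kind described, "if the last experiment's accuracy or loss crosses some value, notify me," can be sketched in a few lines. This is a hypothetical example, not a Polyaxon API; `notify` stands in for a Slack webhook or similar integration:

```python
def check_experiment(metrics, min_accuracy=0.90, max_loss=0.50, notify=print):
    """Hypothetical alert rule a pipeline could run after each experiment.

    `metrics` is the experiment's final metric values; `notify` stands
    in for e.g. a Slack webhook call.
    """
    alerts = []
    if metrics.get("accuracy", 1.0) < min_accuracy:
        alerts.append("accuracy %.3f below %.2f" % (metrics["accuracy"], min_accuracy))
    if metrics.get("loss", 0.0) > max_loss:
        alerts.append("loss %.3f above %.2f" % (metrics["loss"], max_loss))
    for message in alerts:
        notify(message)
    return alerts
```

The same shape generalizes to "restart training" or any other action: the pipeline evaluates a predicate over tracked metrics and triggers an integration.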
[00:17:38] Unknown:
And going back to the step of building the Docker container to be deployed onto Kubernetes and run via Polyaxon: as somebody who is new to the overall technology of containerization, it can be difficult to figure out how best to approach encapsulating everything into one image and having an entry point that will function properly. It sounds like you've attempted to abstract out all of those concerns of writing a Dockerfile, figuring out what needs to go into it, and how it gets executed. But for somebody who needs a bit more control, what are some of the tips or best practices that you've found for creating containers that are going to be used to execute these deep learning or neural network type workloads?
[00:18:27] Unknown:
I mean, how Polyaxon works is very simple, basically. You always have to provide some base image. If you're running your code on, for example, Python 3 and you need to install TensorFlow, there is a TensorFlow image; but imagine you want to install it on a plain Python 3 Docker image. You just provide the base image, then some build steps, for example to install TensorFlow, and then the command that you want to run on your code. So the process of creating a Dockerfile is abstracted to the point that it seems very simple: you don't need to provide many things, just the couple of dependencies you want installed on the machines. And going back to whether you want more control over the Docker images that are being built: of course, you can create a very complex container, upload it to some Docker registry, and then reuse it within Polyaxon. It's not mutually exclusive; if you're very proficient and you know what you want to do with your Docker image, you can do whatever you want and then just reuse it within Polyaxon. Polyaxon just provides this very simple interface between the user and Docker so that it's not forcing researchers to be proficient DevOps people. They can just run their experiments without having to think too much about the specifics of how a container runs.
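The abstraction described here, a base image plus build steps plus a command, essentially amounts to generating a Dockerfile. A simplified sketch of the idea in Python (illustrative only, not Polyaxon's actual generator):

```python
def render_dockerfile(base_image, build_steps=(), env=None):
    """Turn 'base image + build steps' into Dockerfile text.

    Simplified sketch of the abstraction the interview describes;
    not Polyaxon's actual code.
    """
    lines = ["FROM %s" % base_image]
    for key, value in (env or {}).items():
        lines.append("ENV %s %s" % (key, value))
    for step in build_steps:
        lines.append("RUN %s" % step)
    return "\n".join(lines)

print(render_dockerfile(
    "python:3.6",
    build_steps=["pip install tensorflow==1.4.1"],
))
```

The user supplies only the three inputs; everything else about the image build is handled for them.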
[00:20:04] Unknown:
And in Polyaxon, you're supporting a number of different libraries for building deep learning models. I'm wondering if you can discuss some of the relative trade-offs or comparisons between the available libraries or frameworks, and also maybe some of the complexities that are involved in being able to support all these different targets? Yeah. So,
[00:20:35] Unknown:
so, basically, since the platform is built on Docker for running the experiments, we can support any framework. In fact, you can run anything that can run on Docker. That being said, for users who wish to run distributed training, that's when the complexity increases, and we provide a very simple interface for distributing the work on a user-defined topology. For example, a simple TensorFlow model you can run on your computer or on any platform. But if you want to run distributed training on TensorFlow, you need to build some kind of topology, and all of these virtual machines need to communicate with each other. Same thing for the other frameworks. So what Polyaxon does is provide a very simple interface where you say, for example, I need this number of machines and this is their role: a worker, a parameter server, or a master. We build this topology, you run your code within it, and Polyaxon knows how to make these virtual machines talk with each other and reach a result. So we provide a simple interface for TensorFlow, as well as Horovod, and also for PyTorch and MXNet.
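For TensorFlow in particular, the cluster topology is conventionally handed to each process through the `TF_CONFIG` environment variable, a JSON document listing every host by role along with the current task's role and index. Below is a sketch of reading it in Python; exactly which variables a given scheduler injects is an assumption that may vary by version:

```python
import json
import os

def read_topology(environ=None):
    """Parse a TF_CONFIG-style cluster description.

    Returns which hosts play each role (worker / ps / master) and
    what this task's own role and index are.
    """
    environ = os.environ if environ is None else environ
    cfg = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster = cfg.get("cluster", {})   # role -> list of "host:port"
    task = cfg.get("task", {})         # this container's role and index
    return cluster, task.get("type"), task.get("index")
```

Each container in the topology reads the same cluster map but a different task entry, which is how a worker knows it is, say, worker 1 rather than the parameter server.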
There are some trade-offs between these technologies. Some of them are well documented and have a lot of tutorials, for example TensorFlow. For others it's really hard; for example, for MXNet you need to rebuild the Docker image and have some specific configuration enabled. Both TensorFlow and MXNet have a more or less similar architecture, in the sense that you have a master and you have parameter servers and workers. For PyTorch, you only have a master and a couple of workers: the master is the worker with rank 0, and all the other workers communicate with each other depending on how you want to do that. So in your code you have to specify what kind of architecture you want to build; Polyaxon only builds the topology and provides it to the user. And for somebody who is getting started with Polyaxon, what's involved
[00:23:01] Unknown:
in setting up the integration with Kubernetes, assuming that they already have the Kubernetes deployment available?
[00:23:08] Unknown:
Yeah. I mean, Polyaxon is using Helm charts. Helm is the official package manager for Kubernetes. We provide some charts, so the user just needs to install our charts, and the platform will start working on top of Kubernetes, in some namespace that the user also creates. From there, you need our command line interface to talk with the API, and from there everything just works. In the next few versions, we will separate the core runner, which requires Kubernetes, from the dashboard, which is basically just an interface over projects, experiments, groups of experiments, and user management, so that you can, for example, install the dashboard on a Docker Swarm and you don't need to use our runner. You can run the experiments on your own cluster in a different way and just upload all the data related to each experiment. For example, you have some hooks that you install within your code, in your experiments, and they send metadata about the experiments, so that you can have a dashboard without having to care whether you run your experiments
[00:24:26] Unknown:
with the Polyaxon runner or with a different kind of engine. So that would let somebody use, as you mentioned, Docker Swarm, or for instance something like Mesos, if they don't have Kubernetes and don't want to introduce that to their infrastructure?
[00:24:40] Unknown:
Exactly. So, for example, imagine you have Spark, and you're using the Spark ML library and running all your experiments within Spark, but you still want to track how these experiments are running, what kinds of configurations they're using, and what versions of the code they're using. You just install the Polyaxon dashboard, you run your experiments on Spark, and you use the Polyaxon API to persist all the different configurations for each and every experiment, so that you have comparisons and historical data over your machine learning process.
[00:25:26] Unknown:
And if somebody has an existing workflow, an existing code base for a machine learning project, is there anything special that's involved in migrating to using Polyaxon?
[00:25:38] Unknown:
Yeah. It depends, obviously, on what parts you want to install: for example, whether you want only the dashboard or also the runner. But in general, at least in my experience, most machine learning and data science teams run experiments on their own computers, or they have some kind of company cluster where they just SSH into the machines and run the experiments. In that case, if you are using only the dashboard, you only need to have our SDK installed in your code, and then whenever you run an experiment, it extracts all the metadata and sends it to the dashboard. If you want to use the full platform with the runner and everything, you need to have the command line interface, and then you have the specification files, the YAML files, where you say how you want to run your experiments over Kubernetes.
And from there, everything stays the same. You don't even have to change anything within your code, because everything is running within Polyaxon's platform, so it's tracked, and you can reproduce it whenever you want.
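The dashboard-only mode described here boils down to a small hook that gathers run metadata and posts it to an API. The sketch below is hypothetical, not the real Polyaxon SDK; `send` stands in for an HTTP POST:

```python
import json

def track_run(params, metrics, git_sha=None, send=print):
    """Hypothetical tracking hook: bundle an experiment's metadata and
    ship it to a dashboard API. `send` stands in for an HTTP POST."""
    payload = {
        "params": params,        # hyperparameters used for this run
        "metrics": metrics,      # final metric values
        "code_version": git_sha, # ties results back to the exact code
    }
    send(json.dumps(payload, sort_keys=True))
    return payload
```

Dropping a call like this at the end of a training script is the only code change the dashboard-only setup would require; the runner-based setup needs none at all.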
[00:26:45] Unknown:
And as I was looking through the documentation, I noticed that there's a library implementation that you have as part of this project that looks as though it aims to abstract out a lot of the steps involved in just building a simple model so that it's easier for people to get started without having to dive into TensorFlow or PyTorch, etcetera. I'm just wondering if that's an accurate interpretation of the intention of that library. And if so, how somebody would go about getting started with that?
[00:27:14] Unknown:
It's not very well maintained right now. Basically, the idea behind the Polyaxon library is to have describable models, so that you don't have to write code for most of the well-known machine learning and deep learning architectures. You just specify some YAML configuration where you say, I want to use this model and this is the architecture, and then the engine, Polyaxon, knows how to create the whole model and train it.
[00:27:51] Unknown:
And you mentioned that it's not currently being well maintained, so that was just an experiment to see if it was viable, and the main body of the work that you're doing now is more on managing the orchestration and deployment of the models. Yeah. Exactly. And what have been some of the most challenging aspects of building and improving Polyaxon, whether at a technical level or just in raising awareness and making it understandable to people who are interested in using it?
[00:28:22] Unknown:
I think there are two types of users for Polyaxon. The data science teams in general, they know most of these problems — if you work at any company with at least a mid-sized to big data science team, you will find at least some people working on some kind of internal tool to abstract this machine learning and deep learning training workflow. On the other hand, you also have the single developers, and for these people it's kind of hard to convince them to use Polyaxon, because it's a big piece of infrastructure for running the experiments of one user. But for anyone who wants to have reproducible models, for anyone who wants to have historical data about their machine learning experiments, I think it's necessary to have something comparable to Polyaxon to at least track everything that is running. And are there any particular
[00:29:20] Unknown:
features or improvements that you have planned for future releases of Polyaxon, or a particular direction that you're intending to take the project? Yeah. As I mentioned,
[00:29:29] Unknown:
separating the concerns between the dashboard and the Kubernetes runner is one of the things that I'm working on. Also getting the pipelines running, and all the integrations with different kinds of external tools — for example Git, GitLab, Slack, Trello, and many other platforms — so that you can use those integrations for things like triggering experiments and workflows. That's mostly what I'm trying to do. Also for authentication, a lot of people ask about LDAP and how, for example, they can integrate easily
[00:30:11] Unknown:
with their systems — for big companies in general. And if you were to start the whole project over again, do you think you would still use Python for building Polyaxon, or do you think that there is potentially a different language that would have made things simpler?
[00:30:24] Unknown:
Actually, when I started working on Polyaxon, at least this version of it, I had very small microservices talking to each other. I had some microservices in Go, I had the streamer in Node.js, but then the complexity became really hard to maintain for a single developer. That's why I decided to just go with Python, because it's more or less manageable for me to handle that kind of complexity. In the future, yes, whenever there's a microservice where I think performance is important, I would move it to Go, for example. But for most of the other things, I think Python is a really good language for building a lot of things. And one of the other things that I found interesting about the project is that you're effectively
[00:31:12] Unknown:
wrapping the Kubernetes API with the Polyaxon command line interface. And I'm wondering whether you faced any difficulties in terms of figuring out how that interaction should be structured, particularly given that you need to be able to reflect it internally to the Kubernetes API?
[00:31:33] Unknown:
Not really. I have one API for accessing the Polyaxon platform — it's basically through the specification. And from there, I have a couple of spawners that are framework dependent, and I'm building everything on top of the Kubernetes Python client to basically just create the pods and everything. I know that a lot of other platforms create CRDs — custom resource definitions — and customize the deployment. But I think the Kubernetes Python client is good enough that you can create many kinds of customizable workflows.
So, yeah, in general it's one single entry point, from the specification to the Polyaxon API, and from there, different kinds of customizable workflows for running the different frameworks.
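The spawner design described here — a framework-dependent component that turns a parsed experiment specification into the pod objects submitted through the Kubernetes Python client — could be sketched as follows. The spawner class and spec layout are illustrative assumptions; only the manifest fields follow the actual Kubernetes Pod spec.

```python
"""Hypothetical sketch of a framework-dependent spawner: it builds the pod
manifest that would be submitted via the Kubernetes Python client
(kubernetes.client.CoreV1Api().create_namespaced_pod). The class and spec
layout are illustrative, not Polyaxon's real code."""


class TensorflowSpawner:
    """Builds a Kubernetes pod manifest for a single-node TF experiment."""

    framework = "tensorflow"

    def pod_manifest(self, experiment_name, image, command):
        # The real client call would receive this dict (or the equivalent
        # V1Pod object) as the request body; here we only construct it.
        return {
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": experiment_name,
                "labels": {"framework": self.framework},
            },
            "spec": {
                "restartPolicy": "Never",  # experiments run once to completion
                "containers": [
                    {"name": "experiment", "image": image, "command": command}
                ],
            },
        }


# Usage: one entry point from the specification to a concrete pod.
spawner = TensorflowSpawner()
manifest = spawner.pod_manifest(
    "mnist-exp-1", "tensorflow/tensorflow:1.8.0", ["python", "train.py"]
)
```

Building plain pod manifests through the official client, rather than defining CRDs, keeps the platform's footprint on the cluster small — the trade-off Mourad describes above.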
[00:32:35] Unknown:
And are there any other topics that you think we should discuss before we start to close out the show, or anything that we didn't cover? Yeah. I believe that machine learning in the next two years will
[00:32:46] Unknown:
keep maturing, and there will be a lot of more or less similar platforms trying to solve at least some of these problems that I'm trying to solve around reproducibility in machine learning. There's, for example, the data management and data versioning problem — I think it's a very hard problem to solve. A lot of companies are also trying to abstract this huge dimension, not only in deep learning but also in software engineering, where you have multiple accesses to the data: how can you version it, especially given the space it takes and everything? Hopefully in the future Polyaxon will also be able to offer some kind of abstraction for data management and data versioning. But apart from that, I think Polyaxon, or any other platform like it, is really important if we want to bring some kind of rigor to the deep learning and machine learning space.
[00:33:43] Unknown:
Alright. So for anybody who wants to get in touch with you and follow the work that you're up to with Polyaxon, I'll have you add your preferred contact information to the show notes. And with that, I'll move us on to the picks. I know that we already spent a fair bit of time talking about Kubernetes, but I've been digging into it more recently, so I'm going to pick Kubernetes as a platform, because it really provides a great way of abstracting out a lot of the operational concerns of running software, and it creates a fairly robust contract between software engineers, machine learning engineers, and the operations engineers who need to manage the underlying infrastructure. So I've been reading the O'Reilly book, Kubernetes: Up and Running, and that's been helpful in trying to understand the concerns of Kubernetes and what that contract is.
Also, Kelsey Hightower has been doing a great job of creating a lot of communication tools and resources for people who are getting into the space. There's a great podcast that he was on recently, the Food Fight Show, where he was discussing that. So I'll add links to all of those things in the show notes for anybody who wants to dig a bit deeper into Kubernetes. And so with that, I'll pass it to you, Mourad. Do you have any picks this week? Actually, I'm rereading
[00:35:00] Unknown:
Schopenhauer recently. So I started reading Schopenhauer again. That's for when I'm not working on Polyaxon or something related.
[00:35:09] Unknown:
Well, I appreciate you taking the time today to join me and discuss the work that you're doing with Polyaxon. It's definitely a very interesting project, and I think that it's moving in the right direction of making machine learning more repeatable and easier to share around the businesses that people are working with. So thank you for that, and I hope you enjoy the rest of your day. Thank you. Have a nice day.
Introduction to Mourad Mourafiq and Polyaxon
Mourad's Journey with Python and Early Projects
Overview of Polyaxon and Its Motivation
Challenges in Deep Learning Workloads
Polyaxon Pipelines and Automation
Setting Up Polyaxon with Kubernetes
Challenges and Future Directions for Polyaxon
Closing Thoughts and Picks