Summary
Building and managing servers is a challenging task. Configuration management tools provide a framework for handling the various tasks involved, but many of them require learning a specific syntax and toolchain. PyInfra is a configuration management framework that embraces the familiarity of pure Python, allowing you to build your own integrations easily and package it all up using the same tools that you rely on for your applications. In this episode Nick Barrett explains why he built it, how it is implemented, and the ways that you can start using it today. He also shares his vision for the future of the project and how you can get involved. If you are tired of writing mountains of YAML to set up your servers then give PyInfra a try today.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This portion of Podcast.__init__ is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Nick Barrett about PyInfra, a pure Python framework for agentless configuration management
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what PyInfra is and its origin story?
- There are a number of options for configuration management of various levels of complexity and language options. What are the features of PyInfra that might lead someone to choose it over other systems?
- What do you see as the major pain points in dealing with infrastructure today?
- For someone who is using PyInfra to manage their servers, what is the workflow for building and testing deployments?
- How do you handle enforcement of idempotency in the operations being performed?
- Can you describe how PyInfra is implemented?
- How has its design or focus evolved since you first began working on it?
- What are some of the initial assumptions that you had at the outset which have been challenged or updated as it has grown?
- The library of available operations seems to have a good baseline for deploying and managing services. What is involved in extending or adding operations to PyInfra?
- With the focus of the project being on its use of pure Python and the easy integration of external libraries, how do you handle execution of Python functions on remote hosts that require external dependencies?
- What are some of the other options for interfacing with or extending PyInfra?
- What are some of the edge cases or points of confusion that users of PyInfra should be aware of?
- What has been the community response from developers who first encounter and trial PyInfra?
- What have you found to be the most interesting, unexpected, or challenging aspects of building and maintaining PyInfra?
- When is PyInfra the wrong choice for managing infrastructure?
- What do you have planned for the future of the project?
Keep In Touch
Picks
Links
- PyInfra
- Oxygem
- WordPress
- Lua
- Garry's Mod
- Java
- Ansible
- SaltStack
- Chef
- Puppet
- EC2
- Boto3
- HashiCorp Vault
- Vagrant
- Docker
- Testinfra
- Dockerfile
- Idempotence
- Nginx
- POSIX
- gevent
- Jinja2
- Click
- ZeroTier
- BSD
- AST Module
- RedBaron
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today, that's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey, and today I'm interviewing Nick Barrett about pyinfra, a pure Python framework for agentless configuration management. So, Nick, can you start by introducing yourself? Hi. Yeah. Thanks for having me, and thank you for listening. I'm Nick Barrett. I'm a software engineer working at a company in the retail analytics space where I focus on kind of infrastructure heavy projects. I also run a couple of side projects under the Oxygem
[00:01:19] Unknown:
brand and spend way too many hours on open source projects in addition.
[00:01:24] Unknown:
And do you remember how you first got introduced to Python?
[00:01:27] Unknown:
Yeah. So I actually started Python quite late. I started maybe 14 years ago with the kind of classic PHP, HTML, WordPress stack, fiddling with my own blog and little bits of the Internet, and then moved on to Lua in Garry's Mod, of all places. This is all pre-university. And then in university, we did, you know, Java and the big, slow ones. And eventually,
[00:01:55] Unknown:
after university, well, when I started working professionally, I moved into picking up Python. And I instantly fell in love with the language, and ever since, as I said, the rest is history. And so that has ultimately led you down the road of building your own configuration management framework. So can you give a bit more background about what pyinfra is and some of the origin story and what led you to build it in the first place? Absolutely.
[00:02:20] Unknown:
So pyinfra is a tool for managing infrastructure. It's designed to support both ad hoc command execution and kind of full state or configuration management. It started back when I first started working professionally: we used a lot of Ansible and a bit of Fabric, both of which I love, and both also written in Python. They were kind of the basis of the inspiration for pyinfra. The thing that really gave me the kick to start building pyinfra was that our infrastructure was growing rapidly, and as we grew, Ansible became quite slow and frustrating to use over large numbers of servers. I do recall a kind of infrastructure-wide deploy we had to set up, I believe it was Sensu at the time, which involved uploading a whole bunch of files onto each server.

And basically, as the infrastructure grew, this became exponentially slower. It got to the point where it would take 20, 30 minutes just to run one rollout of updated Sensu scripts. We were also using, as I mentioned, Fabric, which was still super fast, and we would occasionally mix a bit of Ansible and Fabric: Fabric to do the bits that Ansible was slower at, and Ansible to do the initial install and the configuration management side that it's so good at. So essentially, I took my favorite bits of Fabric, its speed and its kind of pure Python configuration, combined them with the state management concepts and ideas from Ansible, and that's how I hacked together pyinfra 0.1.
[00:03:59] Unknown:
And as you mentioned, there's Ansible, which is preexisting, and you highlighted some of the challenges and pain points there. And there are a number of other options both in Python and other languages for configuration management that have different layers of complexity and various options in terms of the ways that you approach it. And I'm wondering what the core goals are of pyinfra and some of the capabilities that it offers that might lead someone to choose it over other options such as Ansible or SaltStack or Chef or Puppet? Absolutely. So I think for me, the other part of the reason for building pyinfra, the kind of greatest benefit, is in debugging capability,
[00:04:39] Unknown:
particularly due to the way pyinfra works. It kind of roughly executes shell commands in the same way that you would if you were setting up a server by hand. And this leads nicely into when something does go wrong: instead of getting, like, a bespoke error traceback specific to the software, you just get the standard out or the standard error, as it should be, from the command that failed during the deployment. So I think this is huge; it's just instant debugging feedback, and it means you can rapidly iterate through building out a deploy or set of deploys with this in mind. And the other big win is the pure Python configuration of pyinfra, which kind of enables almost infinite possibilities.

As everything is configured in Python, this allows you to integrate with basically any Python package that already exists. So if you need to pull in, like, a bunch of EC2 hosts from AWS, you can just use Boto3 and integrate that. Or if you need to pull in secrets from HashiCorp Vault, you can just use the standard Python libraries for that. None of that needs to be built into pyinfra itself. It's kind of out of the box compatible with more or less everything that Python is compatible with.
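[As a concrete illustration of that point, here is a minimal sketch of a dynamic pyinfra inventory file that pulls running EC2 hosts with Boto3. The region and the web_servers group name are assumptions for the example, not details from the episode:]

```python
# inventory.py -- a hedged sketch of a dynamic pyinfra inventory using Boto3.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is illustrative

# In a pyinfra inventory file, a module-level list of hostnames defines a group.
web_servers = [
    instance["PublicDnsName"]
    for reservation in ec2.describe_instances()["Reservations"]
    for instance in reservation["Instances"]
    if instance["State"]["Name"] == "running"
]
```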
[00:06:01] Unknown:
So for somebody who is trying to build out infrastructure, there are a number of different ways that you can go about it, and different levels of complexity in terms of what you're trying to deal with. And I'm wondering what your major pain points are in terms of dealing with the infrastructure that you're working on, and some of the shortcomings that you see in terms of just the overall approach to tooling or
[00:06:20] Unknown:
available systems architectures, and how you're trying to tackle that with pyinfra in your own work? So I think I've previously mentioned, when we were talking about the other tools, that one of the pain points I've certainly experienced is the abstraction that tools apply over the top of what's going on under the hood. They can hide away the underlying commands or changes being made to the server, which is great when they work fine and they get the end result: the state is as defined in the playbook, role, or whatever. But when it doesn't work, it's much harder to pick out exactly how and why it has failed, which can lead to kind of unknowns in the infrastructure, as it were, if that makes sense. I think another pain point, which is kind of unrelated to pyinfra, to be honest, is resource creep. That's a real issue that I've experienced: especially if you've got, like, a bunch of cloud providers and maybe some dedicated servers thrown in there, spread across a whole bunch of data centers, it's really easy to essentially lose servers.

And occasionally you might find, like, an old box just lying around, which is not only a waste of money, but obviously becomes a security hole after a certain period of time, especially if you don't know about it and it's still attached to your networking infrastructure. Yeah. There's definitely a lot of different complications
[00:07:46] Unknown:
in trying to approach infrastructure management, and all of the interconnectedness of the systems after you get them up and running makes it very difficult to think about them from the initial starting point, and how to go from zero to a fully working system in a
[00:08:03] Unknown:
relatively straightforward progression. Yeah. Absolutely.
[00:08:07] Unknown:
And for somebody who is using pyinfra to configure their servers, can you just talk through the overall workflow for actually creating the deployment logic, working with pyinfra, and figuring out how to go from local development to the production environment? Yep. Absolutely.
[00:08:25] Unknown:
So when I first built pyinfra, the workflow was essentially to use Vagrant machines: you know, bring up a local Vagrant VM, write the deploy logic, execute that against the machine, and effectively manually verify that the deployment has completed. And then, once that's all good, you roll that onto production. And as of 0.8, 0.9 or so, pyinfra now integrates directly with Docker, which obviously avoids all the overheads of the Vagrant VM. So it becomes extremely easy to rapidly iterate, basically building a Docker container over and over again using pyinfra, which has the nice benefit that you can quite easily write some quick tests to verify that the contents of said container match the state defined in your deploy logic. I am also keen to expand this. There's a cool project called Testinfra, which I became aware of recently, which allows you to write unit tests for tools like pyinfra and Ansible and Salt and so on. And I'm looking forward to integrating pyinfra into that, because I think it would be really awesome to be able to write, like, full unit tests for that.
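[To make that workflow concrete, here is a minimal sketch of a deploy file together with the kind of Docker-connector invocation described above; the image and package are illustrative assumptions:]

```python
# deploy.py -- a minimal deploy to iterate on against a throwaway container,
# e.g. run with: pyinfra @docker/ubuntu:18.04 deploy.py
from pyinfra.operations import apt

apt.packages(
    name="Install nginx",
    packages=["nginx"],
    update=True,  # run apt-get update before installing
)
```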
[00:09:49] Unknown:
Yeah, I've used Testinfra for my own work and definitely appreciate being able to use it. And I've actually written a custom integration with SaltStack so that you can write the Testinfra tests using the SaltStack YAML syntax, and then it'll do the translation for you behind the scenes, which is ugly to look at but useful when you're actually using it. Yeah. That's awesome. It's a really cool project, and it kind of just landed on my radar, but I'm really looking forward to getting into it. Yeah. And I also appreciate your efforts to use pyinfra for being able to build Docker containers so that you can free yourself from the Dockerfile, which I personally don't like the approach of, but it has gotten us this far. So we'll try not to denigrate it too much.
[00:10:21] Unknown:
I agree. Yeah, I really struggle with the Dockerfile. Some days, I kind of like the lack of being able to have complex logic in there, but some days it is very frustrating, with the hacks that have to be used to get past it. So, I mean, that was part of the inspiration for integrating pyinfra with Docker. But, yeah, it's kind of a strange concept, building a Docker image from
[00:10:46] Unknown:
a kind of configuration management tool rather than a Dockerfile. But it works pretty well. So the other thing that you mentioned is that pyinfra has support for handling idempotency of the deployments so that you don't have to worry about running it multiple times. And I'm wondering what your approach is for enforcing that in the different operations being performed, some of the things that you have to address in the built-in operations to ensure that it does have that idempotency support, and edge cases that people who are customizing pyinfra need to look out for in terms of handling idempotency?
[00:11:22] Unknown:
Yeah. So it's a really interesting subject. It's probably easiest to explain pyinfra's approach and then go from there. pyinfra essentially achieves idempotency by executing deploys in two phases. During the first phase, which is read only, pyinfra will read state from the server: you know, where's this file, what packages are installed, and so on. This is then compared, in the operations, with the state defined by the user in the deploy logic. And then finally, pyinfra will spit out the commands required to alter that state to the one defined by the user. By doing this, the first time you run it, it executes the commands as normal; and the second time you run it, nothing will happen, because there's nothing to change. There are some operations that don't follow this. For example, there's a server.shell operation, which literally just executes any shell commands you give it. Obviously, there's no state to compare this against, so that command will execute every time. In pretty much every configuration tool, there is the option to just run this bash script, and then you're on your own as far as ensuring that it's not going to break every time you run it. Yes. Yeah. It's an interesting one. There's an additional edge case with this, because of the two phases. This is where I think pyinfra differs from Ansible and the like, where they do the check of whether the state is right on the machine during the execution phase. Because pyinfra does it in two phases, and allows you to essentially dry run a deploy to see what changes would be made, it does mean that you almost have to consider operations independent of each other to be fully idempotent, which is obviously kind of nonsense realistically. One good example is to install nginx using apt, and then you want to remove the symlink to the default site configuration. If you do that in pyinfra by just calling the two operations, apt.packages and files.link with present=False, it will install nginx as expected. But then when it gets to the files.link operation, it won't do anything, because at the time of the first phase, that file never existed. pyinfra doesn't know it's there, and therefore doesn't know to delete it. So I've included a number of kind of hint options in certain pyinfra operations. You can get the output of the call to apt.packages installing nginx, and based on that output, it'll tell you whether something's changed or not. And if something has changed, i.e. it has installed nginx, you can then pass the assume_present argument to the files.link operation, which will skip checking whether the link exists and just issue the removal based on the nginx change.
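[A short sketch of that nginx example as it might appear in a deploy file; the path and names are illustrative, following the hint pattern just described:]

```python
from pyinfra.operations import apt, files

install = apt.packages(
    name="Install nginx",
    packages=["nginx"],
    update=True,
)

# During the first (read-only) phase the default-site symlink does not exist
# yet, so hint files.link with the result of the apt.packages operation:
files.link(
    name="Remove the default nginx site",
    path="/etc/nginx/sites-enabled/default",
    present=False,
    assume_present=install.changed,  # skip the existence check if nginx was just installed
)
```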
[00:14:17] Unknown:
Digging deeper into pyinfra itself, can you talk through how it's implemented, and some of the overall design and implementation changes that it's gone through since you first began working on it, as you started to use it yourself and expose it for use by other people? Yeah. Absolutely. I think, obviously, I just mentioned the two-phase deploy. The two-phase deploy goes way back to the very beginning,
[00:14:41] Unknown:
and it is, yeah, core to the way pyinfra works. This is implemented essentially by a core API that links everything together, and then the main two kinds of objects, if you like, are facts and operations. Operations are used by users to describe the desired state: you know, ensure this file is here, install this package. And facts describe the current state on the remote side, i.e. this file doesn't exist, or these apt packages are installed. Operations use these facts to figure out what needs to change to match the state defined in the operation. That is essentially the first phase, and at the end of it, you end up with essentially a bunch of commands to run per individual target host. And when I say command, I mean either a standard shell command or something like a file upload, a file download, or a Python callback. So once phase one is complete and you've got this list, pyinfra essentially just executes it, by default going operation by operation and executing each of the host's commands as it goes. So that's the execution phase. And another key part of pyinfra, and this is a much more recent addition, is the idea of connectors. pyinfra was built against POSIX servers only, and SSH was basically the only thing it connected to for a long time. That's changed recently with the addition of things like the Docker connector and the Vagrant connector. These essentially define how commands are executed, and it makes for a very pluggable system. For example, there's also a WinRM (Windows Remote Management) connector in pyinfra now. It's at a very, very early stage, but that's the kind of flexibility this system allows.
[00:16:39] Unknown:
This portion of Podcast.__init__ is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place, sometimes fast and sometimes slow? Do you know why? With Datadog, you will. You can troubleshoot your app's performance with Datadog's end to end tracing and, in one click, correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt to keep you comfortable while you keep track of your apps.
And as you've been building out pyinfra, what are some of the libraries that have been essential to being able to
[00:17:27] Unknown:
make it happen, and some of the prior art that you've looked at as either positive or negative examples to learn from? From the library perspective, gevent is absolutely at the core of pyinfra. I'm an absolute raving mad fan of gevent as a library. Obviously, Python 3 has somewhat reduced the need for it with native async, but not only do I think the API of gevent is better than asyncio, it's also definitely quicker. I think it's a truly wonderful library to use. The only downside, of course, is the monkey patching part, which is kind of a bit hacky, but I like it. I mean, I've been running pyinfra on production workloads for, like, four years now, all of which has used gevent, and so I'm not concerned from that perspective, if you see what I mean. And the other major libraries it uses, sorry, two libraries, would be Jinja2 for templating, which Ansible and Salt, I believe, both use as well, and also Click for the CLI,
[00:18:35] Unknown:
which I think is another absolutely fantastic bit of glue code. Yeah. Those are definitely useful and ubiquitous libraries, and both of them from the mind of Armin Ronacher, who has also brought us Flask and many other good things. Indeed. Yeah. And so the other interesting piece to dig into is the set of operations that you've built into pyinfra to form the basis of idempotency, and to define the available functionality for interfacing with the different system aspects of the servers that you're trying to build out. I'm wondering how you have determined which pieces need to be built, and just the overall interface for defining new operations and adding them into the runtime of pyinfra?
[00:19:22] Unknown:
Yeah. So I still remember, when I was first figuring out pyinfra, looking at what the baseline was: what's the minimum you need to configure a server in a reasonable way? And it comes down to, like, the filesystem stuff: you know, managing files, directories, lines in files, links, and uploading and downloading files. That's kind of essential, I think, to any configuration management system, or even any kind of infrastructure management system. And then there's the server-side stuff, or it's called the server module in pyinfra, I don't know whether that's the right terminology, of user management, group management, you know, little things like managing the hostname, managing sysctl entries, managing the loaded kernel modules, and so on. Those two are the real core set of operations that I would say are used most by any deploy that I've seen. And then on top of that, there are the more tool-specific ones. So, like, apt for managing apt repositories and installing apt packages, and then yum, the same thing. They were essentially the only two I implemented at the beginning, because they were the only ones I needed, but that has now expanded. I think there's dnf, brew, apk; there's, you know, a whole bunch of them, which is cool. And a lot of them were implemented by contributors, which is absolutely fantastic.

As far as developing additional operations goes, the documentation contains a page on this. It's reasonably simple. It's basically just a matter of creating a Python function, with a decorator to flag it as an operation, that takes a state and a host object and then whatever keyword arguments you want. And normally these operations turn out to be really quite simple functions, because they'll just read some state from the host by way of facts, so they'll call, you know, host.fact.something, and then depending on what that state looks like, they'll basically issue some shell commands. That's generally how it looks. So it actually makes for quite a simple implementation from an operation perspective, and they should be quite easy to write, which I think is a real good thing.
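[As a rough illustration of the shape described here, this is a hypothetical operation in the pre-1.x style: a decorated function taking state and host that reads a fact and yields shell commands. The hostname fact access and the hostnamectl command are assumptions for the sketch:]

```python
from pyinfra.api import operation


@operation
def set_hostname(state, host, hostname):
    # Read current state from the host by way of a fact...
    if host.fact.hostname != hostname:
        # ...and yield the shell command(s) required to converge it.
        yield "hostnamectl set-hostname {0}".format(hostname)
```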
[00:21:52] Unknown:
Yeah. Reducing the amount of information and understanding that you need to have to be able to plug in something new is definitely beneficial, particularly for a new framework that's trying to gain adoption. I know that, for instance, with things like Salt, there's a lot of power there, but the amount of understanding that you need to have about how the system works to be able to contribute a new module is fairly substantial. And so there's a fairly high barrier to entry before you can really even get started adding new code to it. Yeah. Absolutely.
[00:22:22] Unknown:
Yeah. I've looked at Ansible modules as well, and it's the same kind of thing. Like, they're much more complex, which is partly just the way they're executed. But, yeah, I do think it's a real advantage
[00:22:34] Unknown:
of pyinfra that it's accessible, should we say, without, you know, a huge effort expended in learning it. And then another interesting thing that I saw as I was looking through the documentation is that you have the operation modules that allow you to execute these various shell commands, but you also mentioned the ability to execute Python code via pyinfra and using some of the ecosystem libraries. I'm wondering how you handle the execution of those functions on the remote hosts when you're not shipping the external dependencies and the Python runtime to that server, and just what the overall
[00:23:14] Unknown:
flow looks like at the code level of how that works? So the key here is actually that they're not executed on the remote host; they are executed on the local host, the user's computer. So they bypass that, in a sense: essentially, the actual execution of Python functions during a deploy happens locally, which means, obviously, your local virtual environment can have all the requirements and such installed. pyinfra itself will not run Python on the remote side. In fact, it won't run anything beyond the defaults: the default shell is the only requirement on the other side, and it could be any shell. It kind of bypasses the problem by running them locally, if you like, which means you can run them within your inventory file or within the deploy file. But the real use of the callback operations is that within the Python callback, even though it's running locally, you can still execute commands on and receive output from the remote host. So there's a virtual network software called ZeroTier, which you'd install via an apt repository.

It brings itself up, and it basically puts a unique identifier on the filesystem. And you need this identifier to authorize the machine within the ZeroTier UI to join your network. So one way of achieving that is to have a Python callback function within pyinfra, after you do the install, and within that function, you basically cat that file out, collect it back in your function, and then just call the ZeroTier API. So while you're obviously not executing anything on the remote server, you can dynamically speak to and collect output from the remote server mid-deployment, within the context of a function.
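[A rough sketch of that pattern using the python.call callback operation; the ZeroTier file path, the node ID parsing, and the authorize_node helper are illustrative assumptions, with host.run_shell_command following the callback examples in pyinfra's documentation:]

```python
from pyinfra.operations import apt, python


def authorize_node(state, host):
    # This function runs locally, but can still execute commands remotely:
    status, stdout, stderr = host.run_shell_command(
        command="cat /var/lib/zerotier-one/identity.public"
    )
    node_id = stdout[0].split(":")[0]  # node ID is the first field (illustrative)
    # ...then call the ZeroTier HTTP API locally to authorize node_id,
    # e.g. with the requests library (omitted here).


apt.packages(
    name="Install ZeroTier",
    packages=["zerotier-one"],
)

python.call(authorize_node)
```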
[00:25:03] Unknown:
And beyond the operations, you also mentioned the different execution contexts for interfacing with Docker versus POSIX. But what are some of the other extension points that exist for being able to plug into pyinfra and add different capabilities or functionality?
[00:25:20] Unknown:
I mean, there are kind of three areas currently. The first extension point is the connectors, as you just mentioned. Terraform is an example of one that could be built, and there is actually an issue open for it: essentially, read the TF state file, turn that into an inventory, and then execute against that inventory would be one example. Then there are facts. You can write custom facts quite easily; they're just Python classes with a command and a process method. These can be written and then used without actually adding them into pyinfra itself. So you could write a bespoke set of facts, and operations as well, and these could all be stored in, like, a bespoke package or whatever you want, really, and then called within pyinfra. And that's kind of extending pyinfra today. And then the other area where there's a lot more potential, I think, is pyinfra's API, which is fully fledged and in existence, but not currently stable. Well, it's fairly stable, but there are no guarantees on that stability yet. This is something I'm targeting for version 1.1: a stable API with semantic versioning guarantees. I think the API could offer some really interesting integrations, the ability to execute pyinfra deployments from within almost any context rather than just being a CLI, which could open up some really interesting work in the future. In the back of my mind, one thing I really want to do over the coming months is build a really lightweight pyinfra agent, which would allow pyinfra to run in a similar manner to the Salt agent or Chef agent, that kind of thing, which would obviously enable running pyinfra in an agent-based manner as well as its kind of native agentless mode.
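[For scale, a custom fact really is about this small — a sketch following the FactBase pattern from pyinfra's documentation, with an arbitrary command chosen for the example:]

```python
from pyinfra.api import FactBase


class UptimeSince(FactBase):
    # The shell command whose output this fact processes
    command = "uptime -s"

    def process(self, output):
        # output is a list of lines from the command
        return output[0].strip()
```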
[00:26:56] Unknown:
For people who are adopting pyinfra, what are some of the edge cases or points of confusion that they should be aware of, or that you see them commonly experience?
[00:27:13] Unknown:
This is a really interesting one, actually. As of today, there's one major gotcha I see, which is the idempotency edge case that we talked about earlier, where operations rely on each other. This can be a bit of a gotcha because it requires thinking about each operation as almost individual, and then any dependencies between operations have to be essentially manually reconciled by receiving the output of one operation and then using that to inform the arguments passed into the next operation. Unfortunately, this is somewhat the nature of the two-step deploy. And it's on my to-do list to write more documentation around this and examples of it, because it is quite rare, in my experience, but it's the kind of thing that, like, bites you in the ass because you don't realize, and so that's kind of annoying. Historically, there's been a whole bunch of gotchas, particularly the ordering of the operations, which I think was the biggest gotcha ever. But that's
[00:28:11] Unknown:
now resolved. And then as far as the community and the overall uptake of pyinfra, you mentioned that there has been some use of it and contributions from other users. I'm wondering what the overall response has been from developers who first encounter and test out pyinfra, and your overall approach to trying to promote it or grow the community.
[00:28:36] Unknown:
Absolutely. Yeah. Responses have generally been pretty good on the whole. I've been really, really pleased with people's feedback, and I was kind of taken aback, actually. I've obviously been building pyinfra for years and using it, and then it appeared on Hacker News a couple of months ago or something, not submitted by me, which was really interesting. It was really nice to see really great feedback, and I'm kind of glad to have it out there. I'm working on plans to promote it more. I probably haven't done it justice at all, to be honest. I think I posted it on Hacker News a year or two ago, and then, yeah, I kind of sat on it. I think that's one area where I lack when it comes to my open source projects, should we say: promoting them, I guess. And I think part of that probably stems from using it professionally myself. So I kind of haven't thought about, you know,
[00:29:38] Unknown:
where else it could be used, if that makes sense. Yeah. It's definitely easy to be focused on your own use case and overlook some of the potential other applications, just because it's not something that you have had to deal with or challenges that you're facing. And so that's definitely one of the great benefits of open source and making a tool available to other users: it can allow for those new use cases to be discovered and factored into the tool. Although it also ends up bringing in the challenge of having to know when to decline a contribution because, as one of my favorite adages goes, you know, open source is free as in puppy.
[00:30:19] Unknown:
That is very true. Yeah. There have been some really interesting contributions recently from kind of BSD users. I always had an OpenBSD machine in the Vagrant test machine set, but I never really used BSD firsthand, and certainly not as a daily driver, so I'm kind of winging it, if you like. This has been really interesting. I've learned a lot, you know, from other people contributing operations or improving operations within pyinfra.
[00:30:47] Unknown:
You know, it's really exciting to see that. What are some of the other interesting or unexpected or challenging lessons that you've learned in the process of building pyinfra, or some of the complexities that you've had to deal with in its development?
[00:31:01] Unknown:
So I think for me, the most interesting thing has been learning about all the different operating systems. I'm definitely an infrastructure nerd; I love technical infrastructure. It's been really interesting to properly deep dive into that and play around with various Linux distributions that I'd never used, BSD being another example, stuff like that. I found that element really interesting, especially in the beginning when I was figuring out what pyinfra would look like and what systems it would integrate with. And, obviously, the target then was just POSIX-based systems. But now, with the WinRM connector, I think the next interesting area is going to be learning about Windows remote management, because I haven't used Windows in years. So I'm really looking forward to figuring that out.

I think the really interesting part of developing pyinfra was the deploy file itself. So you can store your operations in a file, a Python file, whatever it's called, and this is what's used to execute against remote hosts. The way it works is that pyinfra will take this file and, for every single host in your inventory, it will execute the file. This essentially means that the file is executed a whole bunch of times, once per host, to generate that individual host's operations, which is all fine and well if it's just a simple series of operation calls. Where it became complex was when you had conditional statements or for loops or anything like that. Suddenly, depending on the order of the hosts in the inventory, you might end up with a different operation order because of these if statements. So if you had an if at the top of the file, that operation might not get seen, as it were, by pyinfra until the second host, so it would get put afterwards. That was the original implementation: essentially append-only, every time it saw a new operation. This was obviously less than favorable.

As I say, this was pyinfra's biggest issue over the years, and one that I solved late last year, I think. My second attempt at fixing this was to implement kind of with-block control statements. So instead of doing if, you'd do, like, with state.when, using context managers, with the contents of your if statement within that, and the same for loops. It was hacky, but it worked; it did the job. But it meant the Python code in the deploy file wasn't just Python anymore. It was Python, but with some weird control statements.

So the next approach, two years ago, was to essentially compile the deploy files, using the AST module, and then later, I think, I used RedBaron for a bit, which is also fantastic. I used those to essentially take the user's deploy code, rewrite it to swap if statements for the context manager versions, and the same with loops, and then execute the file. And this actually worked really well for a long time. But it was relatively slow to do all this compilation, and it did lead to some edge cases over the years, especially because you're fiddling with other people's code, essentially. I think that just opens you up to, like, an infinite number of potential edge cases, which resulted in some painful debugging sessions. So to fix all of this, pyinfra now, and this is how it works today, actually uses line numbers to order the operations. It calls the deploy file, and as it executes the file, every time an operation is called, it tracks those line numbers. And this happens in a nested manner: so if you include another file, it tracks the line of the include and then the line within the included file.

And it does exactly the same thing when you include, like, other people's packaged deploys. That is how pyinfra does its ordering today. It takes the list of operations and their operation lines, and then basically sorts them to come out with the final order. And it has the nice effect of being, like, human-understandable, if you like. If you run it with debug, you can literally see the lines being printed out. But, basically, the way you would read the file is the way the operations now execute, which was always the ultimate goal for operation ordering. And it took, yeah, the best part of three years to get it nailed down to where it is now. Yeah. It's funny how infrastructure kind of attracts a special kind of masochist who wants to actually deal with and handle all of these little edge cases that show up because of all the slight variances in the systems that you're using, the different distributions of Linux, the package naming, and how the file structures are laid out for people who want to deploy their code in different manners.
[00:36:13] Unknown:
As somebody who works in the space, I can commiserate with that. For somebody who is considering what tool to use for building and managing their infrastructure, what are the cases where pyinfra is the wrong choice, and they might want to just go with a Dockerfile or use one of the more full-blown configuration management frameworks like Salt or Ansible or something like that? I mean, there are definitely going to be cases where pyinfra doesn't have the necessary operations
[00:36:41] Unknown:
to fulfill the desired state, though on the whole, I think this is probably quite rare; obviously, ideally, pyinfra is perfect for all of these use cases. One of its limitations certainly would be the performance side. I think as you reach thousands, tens of thousands of targets, executing from a single host means you'll run out of CPU and network throughput at some point. So at that scale, I don't think pyinfra is the right tool. Perhaps the pyinfra agent might be at some point in the future. And then I think the other one is continuous enforcement of state, which I believe Salt does very nicely. Obviously, because pyinfra is agentless, it's not running all the time; it only runs whenever you run it. You can hack that together by running a cron job or Jenkins or whatever, but it's not the same as having a daemon sat on the machine making sure that things are constantly in the right state. Windows is the other one. Because pyinfra is so new to WinRM and it's, you know, really early days, I think if I were on Windows, I would not recommend using pyinfra currently.
[00:37:47] Unknown:
For the future of the project, you mentioned a few different goals that you have. But what is your overall vision for where it's going to end up, and is there anything that you are looking for help with, contributions or feedback or anything, as you continue to grow and build on the project? Yeah. So
[00:38:05] Unknown:
the initial big thing that I want to get over the line is version 1. There's no real major change from an end user perspective, but it removes a lot of the kind of old cruft that built up over those years of iteration, which will give a much cleaner slate for building on top of pyinfra, which I think is really important. I'm really keen on hearing feedback from the community on pyinfra's APIs and how pyinfra is used in different contexts, because ideally, it's, like, a super easy to use, very accessible tool; that is the aim. I think the fact that it's configured in Python probably increases the bar for entry a little bit compared to something like a YAML-based syntax, but I'm hoping the examples help, and I absolutely welcome and love people contributing any more examples, or just feedback on how it works, anything like that; I'm super keen to improve the kind of user-facing APIs. And then operation coverage is the other area for expansion. I think there are still plenty of things that pyinfra can't do, or can't natively do, and for which you might use, like, shell commands or an old bash script, and it would be nice to cover all of that. And the other major thing is, with the upcoming version 1.1, hopefully, the stable API. I'm extremely keen to see, you know, how people use that. Like, I have no idea what that looks like, but I'm very interested
[00:39:34] Unknown:
to see if and how people use it. Are there any other aspects of your work on pyinfra, or the ways that you're using it for infrastructure management, or just the overall space of configuration management that we didn't discuss yet that you'd like to cover?
[00:39:51] Unknown:
You got anything?
[00:39:52] Unknown:
The only other thing that I wanted to call out is that I think your approach of having everything be pure Python, and the fact that that enables you to distribute deployment logic as a Python package and to compose things together and manage dependencies in that way, is
[00:40:10] Unknown:
an excellent contribution to the ecosystem. So I appreciate that aspect of it. Thank you very much. Yeah, I hadn't thought of mentioning that, but you make a really good point. I think leaning on the built-in package management that Python offers, and what PyPI and setuptools enable kind of out of the box, like dependency management and bundling up a deploy as a Python package, is a really nice feature that kind of sidesteps the need for a custom implementation; Ansible Galaxy comes to mind, but, you know, there are others.
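[A sketch of what a packaged deploy can look like, using pyinfra's deploy decorator; the package and function names are hypothetical, and the explicit state/host passing follows the 1.x-era packaged-deploy style:]

```python
# my_deploy_package/nginx.py -- a hypothetical pip-installable deploy
from pyinfra.api import deploy
from pyinfra.operations import apt


@deploy("Install nginx")
def install_nginx(state, host):
    apt.packages(
        name="Install the nginx package",
        packages=["nginx"],
        update=True,
        state=state,  # passed through inside packaged deploys (1.x style)
        host=host,
    )
```

[A user's own deploy file could then import install_nginx from the package and call it like any other operation.]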
[00:40:49] Unknown:
Absolutely. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, or contribute to pyinfra or your other projects, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose a movie that I watched last night with the family called My Spy. It just became available on Amazon, and it was pretty hilarious. Just, you know, a great movie about a bumbling CIA agent who ends up embroiled with the targets who he's supposed to be surveilling. Not going to give any more of that away, but definitely a lot of fun. So for anybody who's looking for something to watch, I recommend it. And with that, I'll pass it to you, Nick. Do you have any picks this week? Sounds great. Yes. I've picked two things. One is tech, or tech-
[00:41:24] Unknown:
ish, which is the Das Keyboard Ultimate, which I would highly recommend to anyone. I think the combination of mechanical keys and blank keycaps has dramatically improved my typing. And pyinfra was written on this very keyboard, or the majority of it. And my other pick is completely unrelated; it's actually a food pick. There's a recipe, which I can provide the link for, that I found the other day, which we made, and it
[00:41:55] Unknown:
was extremely delicious. It's for slow-cooked Korean short ribs with kimchi fried rice. Highly recommend it. Alright. That definitely sounds like an interesting and enjoyable meal, so I'll have to take a look at that. So thank you again for taking the time today to join me and discuss the work that you've been doing with pyinfra. It's definitely an interesting tool and one that I plan to take a closer look at and possibly use for some of my personal infrastructure. I appreciate all the work you've done on that, and I hope you enjoy the rest of your day. Excellent. Thank you very much, and thank you for listening. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.

And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Nick Barrett's Journey to Python
Origin and Purpose of Pyinfra
Core Goals and Capabilities of Pyinfra
Challenges in Infrastructure Management
Workflow for Using Pyinfra
Idempotency in Pyinfra
Implementation and Design of Pyinfra
Operations and Extending Pyinfra
Community Feedback and Adoption
Lessons Learned and Challenges
When Not to Use Pyinfra
Future Vision and Contributions
Closing Remarks and Picks