Summary
Python is a powerful and expressive programming language with a vast ecosystem of incredible applications. Unfortunately, it has always been challenging to share those applications with non-technical end users. Gregory Szorc set out to solve the problem of how to put your code on someone else’s computer and have it run without having to rely on extra systems such as virtualenvs or Docker. In this episode he shares his work on PyOxidizer and how it allows you to build a self-contained Python runtime along with statically linked dependencies and the software that you want to run. He also digs into some of the edge cases in the Python language and its ecosystem that make this a challenging problem to solve, and some of the lessons that he has learned in the process. PyOxidizer is an exciting step forward in the evolution of packaging and distribution for the Python language and community.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This portion of Python Podcast is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at pythonpodcast.com/datadog. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Gregory Szorc about his work on PyOxidizer, a revolutionary new approach to building and distributing self-contained Python applications
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview on the shortcomings of the current state of the art for distributing Python projects, both for deployment and end-user consumption?
- What is PyOxidizer and what motivated you to create it?
- How does PyOxidizer differ from projects such as CxFreeze, Py2Exe, or Shiv?
- What are the characteristics of CPython and the packaging ecosystem that make it so challenging to easily distribute self-contained applications?
- For someone using PyOxidizer, what is their workflow for building an executable that they can share with end users?
- What are some of the edge cases or special considerations that they need to be aware of?
- How is PyOxidizer implemented?
- How has the design or direction evolved since you first began working on it?
- From your experience in working on PyOxidizer, what changes would you like to see in the Python language or the CPython reference implementation?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on PyOxidizer?
- What do you have planned for the future of PyOxidizer?
- What are the ways that listeners can contribute to PyOxidizer?
Keep In Touch
Picks
- Tobias
- Gregory
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- PyOxidizer
- Mercurial
- Mozilla
- Virtualenv
- Pip
- Docker
- Py2Exe
- CXFreeze
- Beeware
- Shiv
- FPM
- Python Build Standalone
- Importlib
- Rust
- Russell Keith-Magee Black Swans Keynote
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today. That's l i n o d e, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today. Your host as usual is Tobias Macey. And today, I'm interviewing Gregory Szorc about his work on PyOxidizer, a revolutionary new approach to building and distributing self-contained Python applications. So, Greg, can you start by introducing yourself?
[00:01:29] Unknown:
I am Gregory Szorc. I am the author and maintainer of PyOxidizer. I am also a somewhat prolific open source contributor. I worked at Mozilla for a bit over 7 years contributing to Firefox, its build system, and its source control infrastructure. I'm also a co-maintainer of the Mercurial version control tool, which is implemented mostly in Python.
[00:01:52] Unknown:
And do you remember how you first got introduced to Python?
[00:01:55] Unknown:
Not specifically. I believe I was exposed to it while hacking on something at a previous job, probably about 15 years ago at this point. I just started tinkering with Python, not really knowing much about it at first. And over time, I gradually became more exposed to Python, primarily at work and through open source, and just gradually picked it up.
[00:02:18] Unknown:
So in terms of the work on PyOxidizer, before we get too far into that, can you just start by giving a bit of an overview on the current state of the art for being able to distribute Python projects for both deployment into production server environments as well as end user consumption and some of the shortcomings that exist in that ecosystem?
[00:02:39] Unknown:
I would say the state of the art varies depending on the audience. For developers, we tend to have full control over the environments that we target. And so for the state of the art, you can choose from any number of existing tools, whether it be populating virtualenvs, pip installing packages, relying on system packages, or Docker images. The real problem in this space, where I would say the state of the art is not as advanced, is end user consumption. That is where you do not control the target environment. You can't rely on system libraries, a Python interpreter being installed, things of that nature. The state of the art here, I would say PyOxidizer is trying to become state of the art. I'm not sure if it's quite there yet. But there are other tools in this space: Py2Exe, Shiv. The BeeWare project is making good headway here for mobile.
I would say it's generally an unsolved problem in the Python ecosystem.
[00:03:36] Unknown:
For PyOxidizer itself, can you discuss a bit about what it is and what motivated you to create it?
[00:03:43] Unknown:
From the highest level, PyOxidizer is a tool that aims to make it easier for Python application maintainers to distribute Python applications, whether that be to environments that you control or to end users. My motivation in creating PyOxidizer was that there doesn't seem to be a single tool that does this in a modern way that works across all major platforms that you would want to target for Python. There are tools that work well on Windows. There are tools that work well on Linux and macOS. There isn't one tool that does things in a consistent way across all platforms with the performance properties and the distribution qualities that I would like to have in a tool for distributing Python applications.
[00:04:34] Unknown:
And one of the challenges there is also the fact that for different platforms, users have different expectations about how they think an application should behave. On Linux, people just expect there to be a binary that is executable and it's somewhere on their path. For Mac users, it's likely going to be a DMG file that they pull an application out of and put into their app drawer. For Windows users, there should be some MSI installer that puts the application into their start menu for being able to launch it from that or from a search query. So I'm wondering what challenges exist in terms of being able to build a tool that satisfies those different expectations and how far down that path you're looking to go.
[00:05:23] Unknown:
Yes. That is a very astute observation. The expectations on different platforms vary enormously. I would say the problem space is divided into largely 2 components, or at least it is in my head. Problem 1 is around the packaging of Python applications and creation of binaries and what is the actual thing that gets executed on machines. Part 2 of that problem space is then how do you package that entity up for distribution. That encapsulates problems like MSI installers, DMGs on macOS, Debian packages, YUM packages, things of that nature. Currently, PyOxidizer is making headway primarily in the first realm, creating the binaries themselves.
The creation of the distributable artifacts outside of just the executable binary, or outside of the list of files in addition to an executable binary, is still largely an unsolved problem. It's on my radar. I'd like to tackle it sometime. But my thinking here was that I would rather deliver a solution to that first problem because there's no general solution to it in the Python ecosystem space, or at least not a sufficiently modern one that meets my expectations of what that tool should look like. And so I wanted to have a robust solution in this space before I approach the more general problem of creation of installers, things of that nature.
[00:06:55] Unknown:
Yeah. I think that the biggest challenge that exists in the Python ecosystem right now is that problem of being able to produce a single statically linked binary that you can drop onto another system and just execute without having to have all of these ancillary installation steps: what version of Python do you have installed? Is it in your path? Do you have the right modules located? Are they in a virtual environment, or are they in a packaged object in the case of things like Shiv? And that's definitely an area where language communities such as Java and Go and Rust have definitely got the advantage. So I'm excited to see what you're building with PyOxidizer.
And then as far as being able to handle the installation piece of things, you might even be able to just lean on existing tools such as FPM for handling the packaging aspects of it for distribution and installation.
[00:07:52] Unknown:
Definitely. There are existing tools out there that do facilitate producing installers, producing packages. In my opinion, the quality of a lot of these tools, especially the meta tools, is functional. But when you really go down that path and you have a really successful project, my feeling is that you outgrow a lot of these meta tools in existence, and you have to spend the time to actually understand the platform specific packaging components of each. I suspect there's an opportunity to deliver a more powerful and generic tool for packaging here. It is, I feel, an unsolved problem in industry. Every major project that I've been interacting with, including the Firefox project and Mercurial, has had to spend an enormous amount of effort to solve this problem.
And it's arguably something that application distributors should not be required to know. Like, if I want to write a Python application and Python is just an implementation detail, why should I need to know the ins and outs of how MSIs work or DMGs work? A lot of that should be abstracted. But at the same time, as my application grows in complexity, I need the hooks into the customization of those installers and those distribution mechanisms to have full control. I don't think there's a good tool that solves that problem.
[00:09:20] Unknown:
Yeah. It's definitely one of those n by n complexity matrices that you end up in, where you have to be able to support n different versions of your project with n different potential configurations of it across n different platforms that each have their own specificities as to what they're expecting for the installation. Going back to the point of being able to produce a single binary for distribution, aside from the packaging complexities, what are the characteristics of the Python ecosystem, extensions, and the CPython runtime that make it so complex and challenging to be able to produce these statically linked objects that you can just drop onto a system and execute?
[00:10:10] Unknown:
It's an extremely challenging problem, which I suspect is one of the reasons why PyOxidizer is somewhat new and novel in this space. Each platform, each operating system has its own unique set of challenges. And so you can't just solve every problem the same way on different systems. But generally speaking, there's a problem that Python distributions are generally not built in a way that is portable across machines. An exception to this is Windows because Windows has a very strong commitment to backwards compatibility. And so you can build a binary on Windows that runs on pretty much any modern Windows machine. But on Linux, there are all these implicit dependencies on system libraries, things you probably take for granted like the readline library or gettext or SQLite or an XML parsing library. These are things that generally are always available on systems.
You don't think about them, but the way that the Python distribution is built on Linux, and on macOS to a certain extent, is highly dependent on the availability of these libraries. And it's very difficult and nuanced to produce a Python distribution that works, a fully featured Python distribution that works, I should say. It's relatively easy to get the core Python to work, but there's this myriad of random extensions in Python distributions that have dependencies on libraries that you just cannot rely on. And so I'd say a large portion of my time investment in PyOxidizer, probably at least 25%, has been figuring out how to build Python distributions that you can actually run on different systems.
It's an extremely nuanced problem and it's required hacking up the Python build system in ways that it wasn't designed for. Outside of the distributions, I would say that there are general problems around breaking assumptions about how Python applications run. So for example, many Python applications assume that Python modules are loaded from the file system, which is a fair assumption. That's the only way you can do it given the module importers available in the official Python distribution. But one of the goals of PyOxidizer is to load everything from memory so you can have a single file executable. And when you break assumptions about how things have traditionally worked, you find all of these packages and Python libraries that have been subtly making those assumptions, and they break when they run under PyOxidizer.
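As a rough illustration of the implicit dependencies described above, the following sketch checks whether a handful of standard library modules that are usually backed by external C libraries can be located in the running interpreter. The module list and the `probe_optional_modules` helper are assumptions made for this example; they are not part of PyOxidizer or of any build tooling mentioned in the interview.

```python
import importlib.util

# Standard library modules that are typically backed by external C libraries
# (readline, SQLite, bzip2, zlib, liblzma, OpenSSL, libffi). Illustrative only.
OPTIONAL_MODULES = ["readline", "sqlite3", "bz2", "zlib", "lzma", "ssl", "ctypes"]


def probe_optional_modules(names=OPTIONAL_MODULES):
    """Return {module_name: bool} for whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in probe_optional_modules().items():
        print(f"{name:10} {'available' if available else 'MISSING'}")
```

Locating a module is a weaker check than importing it: an extension module can be found on disk and still fail at import time if the shared library it links against is absent on the target machine.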
And so a big challenge of PyOxidizer is figuring out how to enable application distributors to live on the cutting edge to have a self contained binary containing their Python application, but also still have compatibility with all of the existing Python code in the world that relies on the old way of doing things.
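To make the "load everything from memory" idea concrete, here is a minimal sketch of a meta path importer that serves module source from a dictionary instead of the file system. It uses only the standard `importlib` machinery and is not PyOxidizer's implementation; the `InMemoryImporter` class and the `hello_mem` module are hypothetical names invented for this example.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "package": module name -> source code.
IN_MEMORY_SOURCES = {
    "hello_mem": (
        "GREETING = 'hello from memory'\n"
        "HAS_FILE = '__file__' in globals()\n"
    ),
}


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules from a dict instead of the file system."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the modules we hold; defer everything else.
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Put our importer ahead of the file system based finders.
sys.meta_path.insert(0, InMemoryImporter(IN_MEMORY_SOURCES))

import hello_mem

print(hello_mem.GREETING)
print(hasattr(hello_mem, "__file__"))
```

A module loaded this way has no `__file__` attribute, which is exactly the kind of assumption that existing code tends to make and that breaks when modules stop coming from disk.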
[00:13:20] Unknown:
On the point of being able to create reliable distributions of Python, I know that you've also got a separate project that PyOxidizer relies on, but that can be used independently for creating these standalone builds of Python that statically link things like the readline library, bzip2 or zlib for compression, or SSL for encryption. So I'm wondering what the challenges are in being able to manage that library and understand and locate the different edge cases of Python and CPython, and then also if there are other implementations of the Python interpreter that are easier to manage in this distributable fashion.
[00:14:07] Unknown:
The standalone project or the sister project to PyOxidizer you're referring to is Python Build Standalone. And the primary output of that is essentially a tarball of a Python distribution. And there's a JSON file in the tarball that describes the distribution. The tarball itself contains not only the fully built and assembled installation layout of a fully functioning Python interpreter with all of the requisite system libraries. It also contains the object files that go into the Python distribution, all the extensions.
It contains the metadata about what libraries are required by which Python extensions, the licensing of those extensions. One of the goals with Python Build Standalone has been to define this description of a Python distribution. And the intent of this is that PyOxidizer is able to ingest that description and make intelligent decisions about how to build Python binaries. Ultimately, I think it is possible for other Python implementations such as PyPy to define their own distributions that conform to the JSON schema that the Python Build Standalone project defines and for someone to just slot those distributions into PyOxidizer, and as long as PyOxidizer knows how to consume those distributions, it should, in theory, just work. Long term, I think there's potential for the Python project itself to standardize that distribution format to allow tools like PyOxidizer to reassemble Python interpreters and construct binaries.
Think of the way that wheels are a general package format for Python packages or libraries. We could imagine this archive format being a standard descriptor language for Python distributions themselves.
[00:16:09] Unknown:
And so for somebody who is using PyOxidizer, who has an application that they want to be able to package up and ship off to another system or to an end user. What is the overall workflow for getting started with packaging and building the binary and any of the potential edge cases or special considerations that they should be aware of as they're building their application and getting it ready for distribution?
[00:16:34] Unknown:
In the simplest case, say you just have a very simple hello world type application or command line tool, you can run pyoxidizer init. There's a few init commands that you can run, and it creates a scaffolding for your PyOxidizer enabled project. And what that does is it throws a configuration file into your project or creates a new project. From that configuration, you can say, here is the location of the Python package I want to install. Here is the Python code I want to execute when that binary starts up or here is the Python module to execute, kind of like your __main__ module. And once you have that minimal configuration in place, you can run pyoxidizer build or pyoxidizer run and it will do all of the heavy lifting to download the Python distribution, assemble it into a binary, combine that with your Python packaging, your Python source code and modules, your compiled bytecode, and produce an executable for you. In the ideal case, this takes only a few minutes of work.
In the complex case where you have to do customizations, it can take quite a bit of effort. And I'll be honest, there's some use cases that aren't fully supported there. A common problem people run into is that PyOxidizer is currently rather opinionated about being forward thinking and using modern practices. And so it attempts to do things like always load Python modules from memory. This breaks assumptions, as I was talking about earlier, and causes your application to fail at run time. So one of the biggest hurdles for people who wish to use PyOxidizer is ensuring that the mechanism by which their Python resources are loaded actually works.
So this can require, like, running your unit tests against the built binary and retooling. You may have to, like, retool your build system just to support that workflow. And then, of course, there's missing features in PyOxidizer. Like, we don't yet support the actual distribution workflow that well. So the best that PyOxidizer can currently give you is a binary with additional extra files if needed, if the binary is not self contained, and you have to pick those up and figure out how to package them and distribute them to your end users.
[00:18:59] Unknown:
You mentioned the extra files that you might need to ship with the built binary, so things like data assets or maybe if there's a web application or something that you're going to be executing within the context of a web browser that you wanna be able to load files from disk or be able to reference images for use in other systems or other applications. So I'm curious what the support is for PyOxidizer for being able to specify the information and those assets that are going to be accompanying the application and how the built binary can locate them at the point of distribution for being able to load them in at the time that they're needed.
[00:19:45] Unknown:
Yeah. In the case of resource assets, like text files, images, things like that, which you would typically ship next to your Python package, those can actually be loaded from memory and incorporated into the self contained binary. The caveat there is that doing so requires utilizing modern APIs in the Python standard library for loading resources. There's a long history of ways that Python applications and source code have located resources like this. The old school way is using __file__, resolving the directory, and then doing some path joins and loading files from the file system. And that's worked for well over a decade. Modern versions of Python have introduced newer and better APIs around resource loading.
The current state of the art here is the set of APIs in the importlib.resources module, which is part of the standard library. The problem there is that those APIs were not very robust until Python 3.7, I believe. And so lots of Python code in the wild does not target those modern APIs because it needs compatibility with older versions of Python. But over time, my hope is that more Python code uses the modern importlib APIs for loading these resources. And if they do that, then PyOxidizer is able to load these resources directly from the binary.
[00:21:19] Unknown:
Going deeper on the import capabilities, I know too that one of the other projects that has come about from your work with PyOxidizer is that the importing mechanism that you use for being able to load resources from memory has been released as its own library that people can use directly in their applications in order to get a bit of a performance improvement for loading resources and loading modules within their application. So I don't know if you want to talk a bit more about some of the capabilities that that brings along and some of the challenges of building that in a way that is consumable both within PyOxidizer and for the broader Python community?
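The two resource loading styles Gregory contrasts in his previous answer can be sketched side by side. The example below builds a throwaway package on disk so it is self-contained; `demo_pkg` and `greeting.txt` are hypothetical names, and `importlib.resources.files` assumes Python 3.9 or newer (older versions offered `importlib.resources.read_text` instead).

```python
import importlib
import importlib.resources
import os
import sys
import tempfile

# Build a throwaway package on disk so the example is self-contained.
# "demo_pkg" and "greeting.txt" are hypothetical names.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("")
with open(os.path.join(pkg_dir, "greeting.txt"), "w", encoding="utf-8") as f:
    f.write("hello resource")
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg


def load_with_dunder_file():
    """Old school approach: assumes the package lives on the file system."""
    here = os.path.dirname(demo_pkg.__file__)
    with open(os.path.join(here, "greeting.txt"), encoding="utf-8") as f:
        return f.read()


def load_with_importlib_resources():
    """Modern approach: asks the import system for the package's resource."""
    return (
        importlib.resources.files("demo_pkg")
        .joinpath("greeting.txt")
        .read_text(encoding="utf-8")
    )


print(load_with_dunder_file())
print(load_with_importlib_resources())
```

Both functions return the same text here because the package really is on disk. Only the second keeps working when a custom importer serves the package from somewhere else, since it never touches `__file__`.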
[00:22:02] Unknown:
PyOxidizer contains a runtime component that manages the embedded Python interpreter. And one of the bigger components or features that it contains is the ability to load resources from memory. And a few months ago, I realized that there was value in having that feature be a standalone feature outside the context of PyOxidizer because I've been involved in a number of Python projects where we identified that Python module import was actually a bottleneck in terms of application performance. This is especially true in command line tools. The Mercurial version control tool's test suite involves invoking the hg command thousands, maybe tens of thousands of times now.
And overhead of just 1 millisecond invoking a CLI command, multiplied by a thousand, can add up pretty quickly. And when we're talking about delays or lag of tens of milliseconds as part of importing oftentimes dozens or hundreds of Python modules when starting a Python interpreter, that overhead can add up to, you know, minutes of actual CPU time. In the case of Mercurial's test harness, something like 25 or 35 percent of the execution time is just the Python interpreter starting and loading modules. I wanted the larger Python community to start experimenting with alternate ways of loading Python resources not from the file system. This would both encourage the larger community to think about nontraditional ways of loading resources and force the community to reckon with, okay, how do we act in a future where there may not be a file system for loading modules and resources.
It would also bring more attention to the mechanisms that PyOxidizer is using and hopefully improve compatibility with the broader Python ecosystem.
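The arithmetic behind the startup cost argument is easy to reproduce. The sketch below times a fresh interpreter with and without a few imports and scales the result to a large number of invocations. The helper names, the imported modules, and the 10,000 invocation figure are assumptions chosen for illustration; they are not measurements from Mercurial's test suite.

```python
import subprocess
import sys
import time


def mean_startup_seconds(code="pass", runs=5):
    """Average wall-clock time to start a fresh interpreter and run `code`."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run([sys.executable, "-c", code], check=True)
    return (time.perf_counter() - start) / runs


def total_overhead_seconds(per_invocation_seconds, invocations):
    """Startup cost accumulated across many command invocations."""
    return per_invocation_seconds * invocations


if __name__ == "__main__":
    bare = mean_startup_seconds("pass")
    with_imports = mean_startup_seconds("import json, argparse, subprocess")
    print(f"bare interpreter: {bare * 1000:.1f} ms")
    print(f"with imports:     {with_imports * 1000:.1f} ms")
    # 10,000 invocations is an illustrative figure only.
    print(f"over 10,000 runs: {total_overhead_seconds(with_imports, 10_000):.0f} s")
```

For a per-module breakdown rather than a single total, CPython 3.7 and newer accept `python -X importtime`, which prints the time spent importing each module to stderr.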
[00:24:19] Unknown:
Features help teams visualize granular application data for more effective troubleshooting and optimization. Datadog Continuous Profiler analyzes your production level code and collects different profile types, such as CPU, memory allocation, IO, and more, enabling you to search, analyze, and debug code level performance in real time. Correlate and pivot between profiles and distributed traces to find slow or resource intensive requests. In addition, Datadog's Application Performance monitoring live search lets you search across a real time stream of all ingested traces from your services. For even more detail, filter individual traces by infrastructure, and custom tags.
Datadog has a special offer for Podcast.__init__ listeners. Sign up for a free 14 day trial at pythonpodcast.com/datadog. Install the Datadog agent and receive one of Datadog's famously cozy t-shirts for free. Digging more into PyOxidizer itself, can you talk through how it's implemented and some of the changes that it has undergone from when you first began building it to where you are now?
[00:25:29] Unknown:
So PyOxidizer is mostly written in the Rust programming language. I learned Rust, well, I'll say that I was learning Rust as I was building out PyOxidizer. And in a lot of the early code in PyOxidizer, I was very naive in terms of my Rust skills. And so the Rust code quality wasn't great. Rust is a fantastic programming language in that it allows you to make mistakes without some of the consequences, like the security vulnerabilities you get in other languages like C or C++. But the way I was using Rust was not very, to borrow a Python term, Rustonic, I guess. I don't know what the Rust community uses here, but it just wasn't following great practices in Rust. And so I've had to spend a lot of my time in recent months, as I've leveled up my Rust knowledge, changing PyOxidizer's code to just be higher quality Rust code, be more flexible.
And this has enabled me to publish things like the oxidized_importer Python extension, which I just talked about, for handling the standalone resource blob file format and importing resources from memory. It's also allowed me to publish standalone Rust crates containing just very specific components of the Python language stack. So there's a python-packaging Rust crate that I now publish, which defines some general primitives related to how Python is structured as a language. There's primitives in there defining Python source modules, bytecode files, how to compile a source module to bytecode, things of that nature.
[00:27:16] Unknown:
Have there been any particular changes in the overall scope or direction that you are trying to take PyOxidizer as you have dug deeper into this problem?
[00:27:26] Unknown:
I would say that early on in PyOxidizer, it was very much a science experiment. I wanted to see if I could solve this problem of can you actually produce a standalone executable embedding Python and have it work on all the major platforms. I was also not very comfortable with the Rust programming language. And so in a lot of the early efforts that went into PyOxidizer, I was actually writing code in Python. And as I became more comfortable with Rust, I shifted my scope a little bit and started porting Python code to Rust code. I also attempted to pivot from PyOxidizer being a science experiment to it being more generally usable.
That pivot, I would say, is still in progress, and it's a problem that I'm still grappling with: having to find the right balance between wanting to be opinionated about modern packaging practices while also supporting legacy code and making PyOxidizer a friendly end user experience. I would say that shifting from a science experiment mindset to something that wants to be more customer centric has been very challenging. And I've been humbled a number of times by the issues that people are reporting to the PyOxidizer project, the problems that people are encountering in the real world. That feedback has been very beneficial to guiding the direction of the project.
[00:29:08] Unknown:
And as you have explored this problem space, what are your thoughts on the ways that you need to conform to Python and its ecosystem versus your thoughts on ways that the Python language and the ecosystem for being able to package and distribute applications needs to change more fundamentally to maintain viability going into the future and working with modern systems as they continue to evolve in the ways that they manifest and the ways that we use them.
[00:29:39] Unknown:
I try to be a pragmatist on this and realize that you cannot change the world. Python is a massive ecosystem. I'm not sure how many Python developers there are in the world right now, but there's very few languages that have more. If Python isn't number 1 right now, I'm honestly not sure, but it's close. I realized that I, as one person, cannot change the Python ecosystem. And so I need to adapt, and PyOxidizer needs to adapt to accommodate others. At the same time, I also want to push the forefront of how the Python community thinks about things like how Python modules are loaded, whether you can load resources from memory, and how you build and distribute Python applications.
I think I have some good ideas here that the Python community could adopt and long term be in a better state as a result. I'm not sure when we're going to get there. It'll probably take years. But I think PyOxidizer is playing its part to nudge the community in a forward direction on some of these efforts.
[00:30:51] Unknown:
This is a bit of a tangent, but the subject of being able to load modules from memory, does that then open up the possibility of being able to hot swap modules in a running Python process for being able to handle things like security patches or version updates similar to what you can do with the Erlang virtual machine?
[00:31:12] Unknown:
Possibly. But my intuition is that that is more of a feature that would need to be supported in the Python interpreter and defined as part of the language. It's certainly possible to reload Python modules today, but it doesn't have the capabilities of other programming languages like Erlang here. I do think that there's a lot of potential in this space around leveraging multiple Python interpreters per process. And there is some interesting work in the larger Python ecosystem around subinterpreters and exposing the subinterpreter API to Python code.
And I think that is probably the easiest path forward for solving this larger problem around large scale in-process reloading, because the existing ways in which you can reload a module in an active interpreter are very limited, and I just don't see how Python can radically change that.
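As a rough illustration of the limitation described here, the sketch below uses the standard library's importlib.reload on a throwaway module. The module name demo_mod and its contents are invented for this example; nothing here comes from PyOxidizer. It only shows that a reference captured before the reload keeps pointing at the old code, which is one reason in-place reloading falls short of a true hot swap.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid stale bytecode caches so the reload always re-reads the source file.
sys.dont_write_bytecode = True

# Write a throwaway module (hypothetical name: demo_mod) to a temp directory.
tmp_dir = Path(tempfile.mkdtemp())
module_path = tmp_dir / "demo_mod.py"
module_path.write_text("def greet():\n    return 'v1'\n")
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

demo_mod = importlib.import_module("demo_mod")
old_greet = demo_mod.greet  # a reference taken before the reload

# "Patch" the module on disk, then reload it in the running interpreter.
module_path.write_text("def greet():\n    return 'v2 patched'\n")
importlib.invalidate_caches()
importlib.reload(demo_mod)

new_result = demo_mod.greet()  # looked up again on the module: new code
old_result = old_greet()       # previously captured reference: old code

print(new_result)
print(old_result)
```

Any object that imported or stored the old function (callbacks, class instances, from-imports in other modules) would stay on the old version in the same way.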
[00:32:17] Unknown:
And in terms of the overall adoption that you've seen and your thoughts on the sustainability and long term viability of the PyOxidizer project, how has that developed, and what is it that continues to motivate you to spend your time and energy on this problem?
[00:32:36] Unknown:
Well, foremost, I would say that it's been very difficult in the last 6 months to work on PyOxidizer to the degree I would like to. It seems that there's one major world issue after another, the latest being all the wildfire smoke in California where I live. And so I haven't been working on PyOxidizer as much as I would like to. It is very much an open source, personal time hobby project for me, although it's arguably the most important hobby project that I have going right now. What motivates me to work on PyOxidizer is a new approach to Python packaging. I am trying to push the forefront of what you can accomplish with Python. I am trying to find a solution to this existential problem of how do you distribute Python applications. It was, I believe, Russell Keith-Magee who referred to this problem as a black swan for the Python community. You have this massive community of programmers, and there's so much energy in the larger Python community.
But that energy and that potential are significantly undermined by the inability to easily distribute Python applications. And when I think about PyOxidizer, or about how I can contribute improvements to this space and enable Python to spread and be consumed on a wider basis, that's extremely motivating. You are giving Python a foothold into spaces where it cannot easily be used today. That's just extremely exhilarating for me.
[00:34:12] Unknown:
And on that note, what are some of the ways that listeners and the overall community can help contribute to your work on PyOxidizer and help to move this forward, either by contributing directly to it or maybe starting other projects that explore different avenues for solving the same problem, or just general discussions that can help to move the overall ecosystem forward?
[00:34:36] Unknown:
I would say the most appropriate way that the biggest cohort of users could get involved is to experiment with PyOxidizer on your Python application and report back your experience, preferably through a GitHub issue or on the mailing list. The issues that I see, the feedback that I get, is mostly negative: things that don't work. I'm very curious to know what some of the successes are. I know anecdotally that there are some people using it inside industrial settings, but I don't have a thorough accounting of all of them. And so I'm honestly not sure how successful PyOxidizer is yet. Obviously, in my mind, I have a lot of unfinished work to do and bugs to fix, but I just don't know how many people are successfully using it today.
I would also encourage the broader Python community to be receptive to people when they come along and request features like using the modern resources APIs in Python code and not relying on __file__, which PyOxidizer doesn't yet support when loading resources from memory. There have been a handful of cases where people are filing issues against popular Python packages and saying, like, __file__, you shouldn't use that. Use a modern API. And there's been some resistance to that. The resistance is understandable. __file__ has worked for probably 20 years.
So why should you have to change on account of PyOxidizer? But at the same time, if we want to move the Python programming language forward, we're going to need some level of buy-in and acceptance of more modern ways of doing things. And so I do encourage package maintainers and library authors to keep an open mind and think about putting in, you know, a little bit of effort to support PyOxidizer today because it may pay significant dividends in the long term.
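To make the request to package maintainers concrete, here is a minimal sketch contrasting the __file__ pattern with the standard library's importlib.resources API. The package name demo_pkg and the file greeting.txt are made up for this example, and it assumes Python 3.9 or newer for resources.files (older versions offered resources.read_text instead). Both approaches work here because the package lives on disk; only the second can still work when a loader serves the package from somewhere other than the filesystem.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package on disk (hypothetical name: demo_pkg).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "greeting.txt").write_text("hello from a package resource\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

demo_pkg = importlib.import_module("demo_pkg")

# Legacy pattern: assumes the module has a real __file__ on the filesystem.
legacy_text = (Path(demo_pkg.__file__).parent / "greeting.txt").read_text()

# Resources API: asks the import system for the data instead of
# building a filesystem path by hand.
modern_text = resources.files("demo_pkg").joinpath("greeting.txt").read_text()

print(legacy_text == modern_text)
```

For a library author, the change is usually a one-line swap at each place that joins a path onto __file__.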
[00:36:39] Unknown:
From your experience of working on PyOxidizer, what are some of the changes that you would like to see in the Python language or the CPython reference implementation to simplify the work of yourself and anyone else who's trying to modernize the packaging and distribution capabilities of Python applications?
[00:36:59] Unknown:
There's a handful of things, and I have captured many of these in PyOxidizer's documentation. Before I name things that haven't been done, I'd like to call out something that has been done, and that is Python 3.8 contains a new set of C APIs around controlling the initialization of the Python interpreter. And it gives you a level of control around an embedded Python interpreter that just wasn't easily achievable before these APIs existed. I'd really like to single out, I think it was Victor Stinner, for his work pushing that API through the upstream project. I'm not sure if PyOxidizer was responsible for the conception of that API, but I definitely got involved in its implementation and helped guide it. And that has greatly reduced the complexity of the runtime component of PyOxidizer and made it a lot easier to achieve things like importing resources from memory.
As for what I would like to see from the Python language or in this larger ecosystem: a large part of the technical problems that PyOxidizer has to solve involve reverse engineering or reimplementing core logic from the Python interpreter or behaviors in Python, like consuming wheel files, searching for package dependencies, things of that nature. What I was surprised by is that a lot of these things aren't defined in specifications. In Python, you go through a PEP to propose a change to something, and the PEP is a discussion about how maybe something exists today and what the new state is going to be. But at the end of the PEP, there's no formal specification for, like, what are the actual APIs for the module importing system.
What are the semantics for when certain attributes need to be defined? There is no formal language specification for the Python language on the way that many of its components behave. And I would love to see Python start to establish more formal specifications for how the core pieces of the language work. Like, have a specification for what the wheel format is. Have that specification under version control so I can see how it evolved over time. I want a PEP to be a diff to a specification and the corresponding rationale for that. Another consideration is I would really like the community to think about embedding use cases a bit more.
My perception is that embedding is oftentimes an afterthought for both the interpreter maintainers and people who author Python applications. If PyOxidizer is successful and it encourages the distribution of applications where you don't control the interpreter or where you don't control the runtime environment, we're going to see a lot more people embed the Python interpreter into larger applications. And that is going to break a lot of assumptions. And Python developers are going to have to think about how things work when they are embedded in a larger application.
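The idea of importing from memory can be sketched in pure Python with the standard import hooks. This is only an illustration under stated assumptions, not PyOxidizer's implementation (its importer lives in the Rust runtime component): the IN_MEMORY_SOURCES dictionary, the InMemoryFinder class, and the mem_hello module are all hypothetical names invented here.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory store: module name -> source text.
IN_MEMORY_SOURCES = {
    "mem_hello": "def greet(name):\n    return f'hello, {name}'\n",
}


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source lives in IN_MEMORY_SOURCES."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in IN_MEMORY_SOURCES:
            # No origin/location is given, so the module gets no __file__.
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the other finders handle everything else

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = IN_MEMORY_SOURCES[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, InMemoryFinder())

mem_hello = importlib.import_module("mem_hello")
greeting = mem_hello.greet("PyOxidizer")
has_file = hasattr(mem_hello, "__file__")

print(greeting)
print(has_file)
```

The missing __file__ attribute on the imported module is exactly what trips up code that builds paths from it.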
[00:40:21] Unknown:
And as far as the work that you're doing with PyOxidizer and the fact that you are able to lean on Rust and its capabilities and some of its compilation capabilities, how much of the feature set of PyOxidizer and its ability to produce these static binaries is attributable to the fact that you're able to use Rust for it? And how far do you think you would have been able to get if you were still leaning entirely on Python to be able to produce these executables?
[00:40:53] Unknown:
So the only part of PyOxidizer that needs to be implemented in a systems language like Rust or C is the runtime component. That is the code that is running in the binary, the executable that you produce. It needs to speak to the Python C APIs to control the embedded Python interpreter. That code could be written in Rust, could be written in C, it just needs to compile down to assembly and interoperate with C code. You could probably implement it in Go if you really wanted to, but you probably want a systems level language without a garbage collector. The code that's doing all the packaging could be implemented in any language you want. As I said earlier, Python was the initial language I was using heavily for this.
And as the complexity evolved and the code base grew, I found that maintaining that code in Rust was easier. It also gave me easier access to a lot of other functionality. So for example, PyOxidizer has some facilities for scanning binaries for certain symbols that may cause conflict issues. There are Python libraries out there that can parse ELF files. ELF is the executable file format for Linux machines and some other architectures. But it's just much easier when you're doing things like that in a systems programming language because the ecosystem is kind of geared towards solving problems in that domain. Traditionally, Python is geared towards a higher level problem space.
It doesn't need to get into the bowels of how, like, your system loads libraries. That's all abstracted away from you. And so the Python solutions in this space tend to be wrappers around existing tools, whereas Rust gives you raw access to libraries to do all this stuff natively. And you have full power at blindingly fast speed and the confidence that the software builds and can be maintained.
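To give a sense of what working with ELF files involves at the lowest level, here is a minimal Python sketch that inspects only the identification bytes at the start of a binary. The function name describe_elf_header is hypothetical, and this stops far short of the symbol scanning mentioned above; it relies only on the documented layout of the ELF identification bytes (a four-byte magic number, then a class byte and a data-encoding byte).

```python
from typing import Optional

ELF_MAGIC = b"\x7fELF"


def describe_elf_header(data: bytes) -> Optional[dict]:
    """Return basic facts from the ELF identification bytes, or None if not ELF."""
    if len(data) < 6 or data[:4] != ELF_MAGIC:
        return None
    ei_class = data[4]  # 1 = 32-bit objects, 2 = 64-bit objects
    ei_data = data[5]   # 1 = little-endian, 2 = big-endian
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "endianness": {1: "little", 2: "big"}.get(ei_data),
    }


# Synthetic header bytes for a 64-bit little-endian file (not a real binary).
sample = ELF_MAGIC + bytes([2, 1]) + bytes(10)
info = describe_elf_header(sample)
not_elf = describe_elf_header(b"MZ\x90\x00\x03\x00")

print(info)
print(not_elf)
```

On a Linux machine, passing the first few bytes of a shared library read in binary mode would exercise the same function on real data.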
[00:43:04] Unknown:
In terms of your experience of building and maintaining the PyOxidizer
[00:43:18] Unknown:
around Python extensions and getting them to load correctly. Part of this as well is how to build a redistributable Python distribution, especially on Linux and macOS. I have been humbled so many times by the subtle complexities involved here. I was a co-maintainer of the build system for the Firefox web browser for a number of years, and I was exposed to tons of complexity in that space. But the problem space in Python is arguably more complex because you're not talking about just one application. You're talking about literally thousands. And there are so many different libraries and use cases and ways of combining and using and loading Python packages and modules and extensions.
There's just so much complexity there. And trying to think through all of that and come up with, like, a unifying abstraction or solution for how to handle all of those edge cases is extremely challenging and humbling.
[00:44:27] Unknown:
And as you continue to work on the project, what are some of the things that you have planned for the future of PyOxidizer?
[00:44:33] Unknown:
So the next release of PyOxidizer is going to be using Python 3.8 exclusively and is going to hopefully solve a lot of the major reported issues that people are having, especially around extension compatibility. This is continuing to pivot away from an opinionated science experiment towards a more broadly usable tool. After that is done, and I am confident in the basic functionality of producing a binary, producing an artifact that can be run on any machine, I really want to start to tackle this problem of how do you actually distribute those artifacts to people? How do we make the building of MSIs, of DMGs, of RPMs as turnkey as possible? I really want to make it so that a Python application developer can easily generate installable artifacts for their application without their end users having to be concerned with the existence of Python.
I just feel there's so much energy that's wasted in the Python ecosystem by people repeatedly reinventing this wheel. And I really want to make that pivot. I don't know if I'm going to get to it in 2020, but that is the goal. We'll see what happens.
[00:45:52] Unknown:
Are there any other aspects of the work that you've been doing with PyOxidizer or your experience with the packaging and distribution ecosystem of Python or the language space in general that we didn't discuss yet that you'd like to cover before we close out the show?
[00:46:06] Unknown:
Another goal I have for PyOxidizer is to bridge the gap between the Python and Rust communities. I don't want to start a language war here. And I think both languages can coexist in harmony. But I think that there is opportunity, for example, for Rust programs to embed a dynamic programming language like Python. I think there's opportunities for large Python programs to port some of their functionality to Rust. And one of the goals of PyOxidizer is to expose an API of sorts for controlling Python interpreters and for making the embedding of Python interpreters in larger Rust applications, or in any Rust application for that matter, as turnkey as possible.
And I think that giving the Rust ecosystem access to embedded Python interpreters and the Python ecosystem access to easily writing Rust code to supplement or assimilate functionality in Python are both very powerful. And I look forward to seeing how PyOxidizer can bridge those two communities together.
[00:47:24] Unknown:
For anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. So this week, my pick is going to be Carlos Santana for some good music. If you're looking for something to listen to, just a great musician who has been around for ages and has some pretty amazing guitar skills. So definitely some good music to listen to if you're looking for something to relax. And so with that, I'll pass it to you, Gregory. Do you have any picks this week?
[00:47:55] Unknown:
My pick this week is not specific, I don't want to name any brands, but I'll say invest in a good, high quality air monitor for your home. I acquired one several months ago, and I've learned a lot about the effect that various activities within your home can have on the air quality, your CO2 levels, the particles in the air. And it's been very enlightening going through the wildfires in California in recent weeks and seeing how the air quality fluctuates and the effect it has on my well-being. And so I'd encourage people to invest in something there.
[00:48:31] Unknown:
Definitely something that gets overlooked all too frequently is the air inside your home, because it is where you spend a lot of your time, particularly these days. So definitely a worthy investment. Thank you again for taking the time today to join me and discuss your work on PyOxidizer. Definitely excited to see where it goes and try using it for some of my projects. So I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Alright. Thank you for having me. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Welcome
Interview with Gregory Szorc on PyOxidizer
Current State of Python Project Distribution
Motivation Behind PyOxidizer
Challenges in Cross-Platform Distribution
Existing Tools and Their Limitations
Complexities of Statically Linked Binaries
Python Build Standalone Project
Workflow for Using PyOxidizer
Handling Resource Assets
Importing Mechanisms and Performance Improvements
Implementation and Evolution of PyOxidizer
Future of Python Packaging and Distribution
Potential for Hot Swapping Modules
Adoption and Sustainability of PyOxidizer
Desired Changes in Python Language
Role of Rust in PyOxidizer
Challenges in Extension Compatibility
Future Plans for PyOxidizer
Bridging Python and Rust Communities
Contact Information and Picks