Summary
Software development is a complex undertaking due to the number of options available and choices to be made in every stage of the lifecycle. In order to make it more scalable it is necessary to establish common practices and patterns and introduce strong opinions. One area that can have a huge impact on the productivity of the engineers engaged with a project is the tooling used for building, validating, and deploying changes introduced to the software. In this episode, maintainers of the Pants build tool, Eric Arellano, Stu Hood, and Andreas Stenius, discuss the recent updates that add support for more languages, efforts made to simplify its adoption, and the growth of the community that uses it. They also explore how using Pants as the single entry point for all of your routine tasks allows you to spend your time on the decisions that matter.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Building data integration workflows is time consuming and tedious, requiring an unpleasant amount of boilerplate code to do it right. Rivery is a managed platform for building your ELT pipelines that offers the industry’s first native integration with Python, allowing you to seamlessly load and export Pandas dataframes to and from all of your databases, services, and data warehouses with a few clicks and no extra code. Rivery is hosting a live demo of their first-class Python support on February 22nd, and when you use the promo code "Python" during registration you will be entered to win a brand new Series 7 Apple Watch. Go to pythonpodcast.com/rivery today to learn more and register.
- Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano, Stu Hood, and Andreas Stenius about the Pants build tool and all of the work that has gone into it recently.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Pants is and the story behind it?
- What is the scope of concerns that Pants is focused on addressing?
- What are some of the notable changes in the project and its ecosystem over the past 1 1/2 years?
- How do you approach the work of defining the target scope of the Pants toolchain?
- What are some of your guiding principles to decide when a feature request belongs in the core vs as a plugin?
- What are some of the ergonomic improvements that you have added to simplify the work of getting started with Pants and adopting it across teams?
- What are some of the challenges that teams run into as they start to scale the size of their monorepos? (e.g. project design, boilerplate reduction, etc.)
- How are you managing the work of growing and supporting the community as you move beyond early adopters/experts into newcomers to Pants and programming?
- How are you handling support for multiple language ecosystems?
- What are some of the challenges involved with making Pants feel idiomatic for such a range of communities?
- How does the use of Python as the plugin/extension syntax work for teams that don’t use it as their primary language?
- What are the architectural changes that needed to be made for you to be capable of integrating with the different execution environments?
- How would you characterize the level of feature coverage across the different supported languages?
- Now that you have laid the foundation, how much effort is required to add new language targets?
- What are the most interesting, innovative, or unexpected ways that you have seen Pants used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pants?
- When is Pants the wrong choice?
- What do you have planned for the future of Pants?
Keep In Touch
- Eric
- Eric-Arellano on GitHub
- @earellanoaz on Twitter
- Stu
- Andreas
- @andreasstenius on Twitter
- kaos on GitHub
Picks
- Tobias
- Last Kingdom on Netflix
- Eric
- Stu
- Andreas
Links
- Pants
- Make
- Earthly
- MyPy
- PyRight
- Pylint
- Flake8
- Bazel
- pre-commit
- Underpants library
- PyOxidizer
- Eric PyCon Talk
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. Building data integration workflows is time consuming and tedious, requiring an unpleasant amount of boilerplate code to do it right. Rivery is a managed platform for building your ELT pipelines that offers the industry's first native integration with Python, allowing you to seamlessly load and export Pandas data frames to and from all of your databases, services, and data warehouses with a few clicks and no extra code. Rivery is hosting a live demo of their first-class Python support on February 22nd. And when you use the promo code Python during registration, you will be entered to win a brand new Series 7 Apple Watch.
Go to pythonpodcast.com/rivery today to learn more and register. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
Your host as usual is Tobias Macey. And today, I'm interviewing Eric Arellano, Stu Hood, and Andreas Stenius about the Pants build tool and all of the work that has gone into it over the past year and a half. So, Eric, can you start by introducing yourself? Yeah. Absolutely. I'm Eric. I use they/them pronouns.
[00:01:55] Unknown:
I have been using Python for about 6 years and have been a maintainer of Pants for a little over 3 years. I first started using the project as an intern at Foursquare and ended up changing my internship project about halfway through the summer to lead Pants' Python 3 migration. I fell in love with the community and have been contributing to it since, working for a while at Twitter and now at a startup called Toolchain.
[00:02:18] Unknown:
And, Stu, how about yourself?
[00:02:20] Unknown:
Sure. I was introduced to Python via the Pants project after spending a lot of time on the JVM. I've been working on Pants for something like 8 or 9 years now. We'll get more into the history, but not too deep; there's a lot of history. And I think Python has been just a huge boon for the project, so I'm really happy to talk more about how we use it. And, Andreas, how about yourself?
[00:02:43] Unknown:
Yeah. My name is Andreas Stenius. I joined the Pants build community about a year ago and fell in love quite straightaway with the design and the community and the spirit and everything there is. So I've been working closely with the team and became a maintainer last summer.
[00:03:07] Unknown:
I've been working on the Docker integration and the Docker backend in Pants. Eric and Stu, you already mentioned a bit about how you got introduced to Python. So, Andreas, do you remember how you first got introduced to Python?
[00:03:18] Unknown:
I was introduced to Python as a DevOps developer where I'm currently employed. That was 7 years ago, thereabouts, so I've been doing mainly Python development for the past 7 years.
[00:03:34] Unknown:
And so in terms of the Pants project itself, Stu, as you mentioned, it has a relatively long history. And for folks that wanna dig into the details of that, I'll point them back at the previous interview that you and Eric were on about a year and a half ago. But for folks who haven't listened to that yet, if you wanna just give kind of the CliffsNotes version of what Pants is and some of the story behind how it came to be and how you got to where you are today.
[00:04:01] Unknown:
Yeah. For sure. I think one of the things about Pants's history is that many build tools, and I'll explain a little bit more about what I mean there, start as single-project build tools and then evolve in the other direction. They evolve to support monorepos or to support larger projects that are made up of more units of code. Pants started in the opposite direction. It started from attempting to support monorepos as well as it possibly could and then scaling down as much as we possibly can. So by a build system, I mean we are a tool for executing all of the steps between writing your code and taking it to production, which involves running your tests as parallel as possible, running linters, formatters, and type checkers, doing codegen for you so that you don't have to commit it to your repository, making your scripts as formal and checkable as possible, running REPLs, packaging, and publishing. So there are a lot of steps that you would otherwise have to script inconsistently and then maintain scripts for.
All of those steps, we strive to pipeline, run in parallel, and run accurately and correctly. And so we integrate with all the tools you're used to, like pip and pytest and Black and MyPy and Twine and Docker. But then we attempt to provide the most consistent and easy-to-use interface that we can atop those. In many cases, those tools are shaped similarly. So you have maybe 8 or 9 linters that people will use commonly with Python, and almost no one knows the right arguments to invoke all of them. So they're going to put them in a script. We remove the need for that script, and we additionally prevent somebody having to write a script that's going to figure out how to invoke linters in parallel on only the changed files.
So we're focused on improving the developer experience regardless of repo size. But as I said, our history has been in monorepos, and that's where we still shine the most.
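To make the "remove that script" point concrete, here is a minimal Python sketch of the kind of hand-rolled script Stu describes: it builds one command per linter for only the changed Python files and runs them concurrently. It is not how Pants works internally. The `LINTERS` table and its flags are assumptions about a typical setup, and the `runner` callback is a hypothetical hook so the sketch can be exercised without the tools installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Assumed per-tool invocations; adjust flags to match your own configuration.
LINTERS: Dict[str, List[str]] = {
    "black": ["black", "--check"],
    "isort": ["isort", "--check-only"],
    "flake8": ["flake8"],
}


def build_commands(changed_files: Sequence[str]) -> Dict[str, List[str]]:
    """Return one argv per linter, restricted to the changed Python files."""
    py_files = sorted(f for f in changed_files if f.endswith(".py"))
    if not py_files:
        return {}  # nothing to lint
    return {name: [*base, *py_files] for name, base in LINTERS.items()}


def run_all(
    changed_files: Sequence[str], runner: Callable[[List[str]], int]
) -> Dict[str, int]:
    """Run every linter command concurrently and collect each exit code."""
    commands = build_commands(changed_files)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(runner, argv) for name, argv in commands.items()}
        return {name: future.result() for name, future in futures.items()}
```

In real use, `runner` could be `lambda argv: subprocess.run(argv).returncode`, with the changed files taken from something like `git diff --name-only`. Every team that writes this script ends up maintaining its own variant, which is the duplication a build tool's lint goal is meant to absorb.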
[00:06:01] Unknown:
And to your point of the scope of concerns that are encapsulated by build tools, I'm wondering if you can maybe talk through the range of considerations that Pants is trying to address and some of the pieces that you have explicitly opted not to include in the Pants experience and instead defer to other systems or other toolchains?
[00:06:29] Unknown:
I can say that our bias has been toward inclusion into the core of Pants for a few reasons. One of them is that our plugin API isn't stable yet. And so when people write plugins for Pants in the core, we gain the benefit of experience: we see, oh, you know, this plugin does something vastly different from what we could have expected, and because it's in the core, it gives us experience about how easy the plugin API is to use. We can notice the rough edges and help. And it gives anyone who contributes to the project the benefit of us maintaining that code. So as to the scope, we would like to be surprised. We would like somebody to come along and say, hey, we wanna add a plugin for, you know, x tool that I've never heard of. That can absolutely be in scope if there are, you know, non-private companies that wanna use it, or even if there are enough private companies that wanna use it. As long as it's not completely custom internal code, we would be interested in accepting it. And I think Andreas, in particular, has a lot of experience recently with understanding a use case better than we do and then contributing something, and it's, hey, that's in scope. We just didn't know.
So thank you for bringing it, with this Docker work.
[00:07:47] Unknown:
Yeah. We also take the approach that if you as a user experience a particular need or you're confused, for example, with a certain part of our docs, it's very likely that other people also have the same problem, and we simply don't know that they do. So we always appreciate whenever people come on our Slack, for example, and share that they're confused with a part of our docs or that they have a certain feature request, because we assume that they're speaking for dozens of other users who might not even have tried Pants yet. And because of that feedback, those users will be able to have a better experience.
[00:08:19] Unknown:
And to the point of having a consistent interface for all of the different stages of the software development life cycle, there are a number of other tools that have aimed to have a similar approach with maybe the most venerable being the make tool, but some more recent entries being something like the Earthly project. I'm just wondering if you can share your thoughts on the considerations that go into building this common interface and the design of it, and maybe some of the ways that pants might be compared to or against some of the tools like Make or Earthly or some of the other environments to try to give this consistent experience?
[00:09:03] Unknown:
I think one thing that is definitely different from Make, and I don't have experience with Earthly but would be interested, is that we are attempting to actually attach more semantic meaning to groups of things that you might wanna do. So Pants has separate goals, capital-G Goal, for various tasks that you might wanna execute. And we're trying to bundle all of the things that have the same semantic meaning into one command-line invocation. So if you wanna test, that's one goal. If you wanna lint, that's a separate goal. If you wanna check, that's a third goal. And lint and check are an interesting case because we've been differentiating between lightweight linters that are mostly about style and heavier-weight checks like type checking. Right? So MyPy and Pyright and even, ironically, Pylint are fairly heavy checks that use the transitive dependencies, essentially all of your code. Anytime they check anything, they take a lot longer to run. And so we consider them to have a different semantic meaning than just linters.
A better example, though, is probably packaging and publishing. They have a clear semantic meaning. If you were going to implement packaging and publishing in Make, for example, you'd be doing it from scratch, and you'd be figuring out what the semantics of packaging and publishing should be. Whereas when we model packaging and publishing, we're thinking through what that usually looks like for a user and making sure that the arguments are consistent and the behavior is consistent and performant.
[00:10:38] Unknown:
Another major differentiator for Pants is that it has a really fine-grained understanding of your project's dependencies. So it understands, for example, that this file depends on that one, which depends on this other one. And with that information, you can do things like only run tests that have been impacted. So if you change, like, a common library, we can figure out that 10 of your tests out of 100 have been affected, and we don't need to rerun the others. That all happens automatically. Normally, to have that really fine-grained information with older build systems like Bazel, you have to have a lot of boilerplate where you essentially duplicate your imports in these things called BUILD files, where you explicitly teach the tool that this depends on that. Instead, with Pants, it's a major goal of the project that it be really easy to adopt and a joy to use. So we have a feature called dependency inference, where we'll read your import statements for you and then map those imports back to the rest of your project.
So you get those benefits of fine grained metadata and fine grained dependencies without having a bunch of boilerplate that you need to maintain.
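Dependency inference as Eric describes it can be approximated in a few lines of Python: parse each file's import statements with the standard `ast` module and map the imported module names back to first-party files. This is a simplified illustration under stated assumptions, not Pants's actual implementation: it assumes the repository root is the only source root, ignores relative imports, and treats any import it cannot map as third-party.

```python
import ast
from typing import Dict, Set


def imported_modules(source: str) -> Set[str]:
    """Collect candidate module names from absolute import statements."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # `from app import util` may refer to the submodule app.util.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def module_name(path: str) -> str:
    """Map 'app/util.py' to 'app.util' and 'app/__init__.py' to 'app'."""
    parts = path[: -len(".py")].split("/")
    if parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)


def infer_dependencies(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Given {path: source}, return {path: first-party paths it imports}."""
    by_module = {module_name(path): path for path in files}
    deps: Dict[str, Set[str]] = {}
    for path, source in files.items():
        owners: Set[str] = set()
        for module in imported_modules(source):
            # Walk up the dotted name until it matches a known file, if ever.
            candidate = module
            while candidate:
                if candidate in by_module:
                    owners.add(by_module[candidate])
                    break
                candidate = candidate.rpartition(".")[0]
        deps[path] = owners - {path}
    return deps
```

The point of the design is that the import statements you already write are the source of truth, so there is no second copy of the dependency list to keep in sync.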
[00:11:47] Unknown:
Another aspect of having this common interface to all the different kinds of broad tasks that you might do in the software development life cycle: one of the other things that is interesting to think about as a user of Pants is how much ownership of all of that you want to push fully into Pants versus how much you might want to also execute via these other, more dedicated tools. The thing that comes to mind most notably for myself is pre-commit, where I have a number of pre-commit checks that I might want to run using their default, out-of-the-box supported plugins. So Flake8, MyPy, Black, all of those can run as pre-commit checks. I can also have Pants run all of those things.
And so I can get some measure of consistency by ensuring that they're deferring to, like, the setup.cfg or pyproject.toml for setting up how those different executables want to run. But there's also the question of, do I want pre-commit to run all these checks independently, or do I want pre-commit to call into Pants, or do I want Pants to call into pre-commit? So just figuring out, like, what are the different contexts in which you want to execute which tool.
[00:12:57] Unknown:
Absolutely. And I think that we're similar to pre-commit in that they are choosing a particular semantic task, which is everything that sort of, like, blocks committing. But by choosing that as their only goal, they are essentially limiting the scope of what you can do with pre-commit. You're probably not going to run your entire test suite with pre-commit because, you know, that's not gonna scale with your project. You don't wanna wait for that. Right? So by choosing that one semantic scope, things that are fast enough to run just before you commit, they cover what is only one of Pants's multiple goals. We provide a consistent interface across other tasks that you might do beyond just the things that are fast enough to run before commit.
So if pre-commit is adding value, using pre-commit to call us would get you some consistency, in that you can then use Pants directly or use Pants in CI. I think having the whole suite of goals is kind of an interesting thing because in common usage, you're gonna iterate on code by running your tests or linting or type checking. And you may not want to run all of those. You're gonna cherry-pick the things you know are relevant at a particular point in time. You know that you haven't made enough of a change; you've only changed a comment. You're not gonna bother running all of the tests. Maybe you're just gonna run formatting because docformatter might complain about that otherwise.
Like, the developer is usually picking and choosing how much they need to do at any given point in time, and we're just providing a consistent interface for that.
[00:14:39] Unknown:
Yeah. And that being said, another major focus in the past year has been allowing you to incrementally adopt Pants. We know a lot of organizations already have really big repos and might have workflows and tools that are working well for them. And we want it to be easy to add Pants incrementally: you might start by only using it for your formatters and linters, for example, or only using it for tests. So there have been a lot of features where we can complement the workflow that you already have. For example, one of our users is still sticking with their current test runner while they migrate, but they're using the Pants dependency information that I was talking about earlier to find out which tests to run. So they run Pants for the query and then pipe the result into their original test workflow.
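The workflow in which Pants answers "which tests are affected?" and another runner executes them boils down to a reverse-dependency walk. Here is a minimal sketch, assuming a `{file: files it imports}` map like the one dependency inference produces and a `test_*.py` naming convention. Both are assumptions made for illustration; this is not the query Pants actually runs.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def affected_tests(deps: Dict[str, Set[str]], changed: Iterable[str]) -> Set[str]:
    """Return test files that transitively depend on any changed file."""
    # Invert the graph: for each file, which files import it?
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for path, imports in deps.items():
        for target in imports:
            dependents[target].add(path)

    # Breadth-first walk from the changed files through their dependents.
    seen = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    return {p for p in seen if p.rsplit("/", 1)[-1].startswith("test_")}
```

Printing `sorted(affected_tests(...))` one path per line gives output that could be piped into an existing runner, for example with `xargs pytest`, which matches the incremental-adoption setup described above.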
[00:15:28] Unknown:
At work, we have used pre-commit to invoke Pants to get consistent output. Like, there shouldn't be a difference if I run the pre-commit hook or if I run Pants directly; it should say the same thing about the current state of my code. If pre-commit ran the same tool, but potentially with some different configuration, that would possibly not be the case. And I think it has worked out really well.
[00:15:57] Unknown:
And in terms of the Pants project itself, as I mentioned at the beginning, we did an episode about the Pants project about a year and a half ago. And since that time, it seems like there's been a very rapid uptake in the pace of development and the size of the community and the number of capabilities that have been built into it. I'm wondering if you can give some of the notable changes in the project and its ecosystem and community that have taken place over that time.
[00:16:18] Unknown:
I would say that the largest change in the last year has been that we've spent a lot of time going more polyglot, adding more languages. In particular, we've added support for Go, Scala, and Java. And if you count our Docker integration, which is fantastic, Dockerfiles are a language in and of themselves, I suppose. We've learned a huge amount and improved the plugin API to support those languages better. And users have been satisfied that they've been able to keep this consistent interface across multiple languages within their repository, which is one of the premises that we are attempting to fulfill.
We have also spent more time improving the lockfile story. I think in our last conversation, one of the gray areas was the question of having incompatible projects within a monorepo, those with overlapping requirements where one library requires one version and another library requires another. Handling those overlapping requirements without having a global lockfile is interesting, because the way single-project, polyrepo tools like Poetry and tox and Pipenv or pip freeze would do it is to have a lockfile per project or unit of code.
And we think that we found a really great design that threads the needle between one lockfile for your entire repository and one lockfile per unit of code, with potentially inconsistent, incompatible dependencies between all of your units of code.
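The resolve idea can be illustrated with a small consistency check. In this hypothetical model, which is not Pants's lockfile format or configuration, each project names the resolve it belongs to. Projects sharing a resolve would share one lockfile and therefore must agree on pinned versions, while different resolves are free to diverge.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical model: project -> (resolve name, {package: pinned version}).
Projects = Dict[str, Tuple[str, Dict[str, str]]]


def resolve_conflicts(projects: Projects) -> Dict[str, List[str]]:
    """Return, per resolve, the packages pinned to more than one version."""
    pins: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for resolve, requirements in projects.values():
        for package, version in requirements.items():
            pins[resolve][package].add(version)

    conflicts: Dict[str, List[str]] = {}
    for resolve, packages in pins.items():
        clashing = sorted(p for p, versions in packages.items() if len(versions) > 1)
        if clashing:
            conflicts[resolve] = clashing
    return conflicts
```

A single global lockfile corresponds to putting every project in one resolve, and one lockfile per project corresponds to giving each project its own resolve. The design Stu describes sits between those two extremes.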
[00:17:58] Unknown:
And we're hoping to ship that in the next few weeks. We can definitely talk more about that. Yeah. Beyond code changes, you're right that the community has grown a lot in the past year. We've had a lot of new users and organizations joining our Slack every day. One of the things I've been most excited about is homing in on what being a part of our community means. Originally, we used to primarily think of contributions in terms of code. And this past year, we restructured everything so that we no longer call our maintainers committers; we now call them maintainers. And we recognize that contributions take a ton of different forms, including docs. But one of the big things we've focused on is how useful it is to get feedback about where things are confusing and about feature requests.
Even if you never write a piece of code or don't change docs, simply letting us know how you're using Pants is extremely helpful to our project so that we can focus on making the tool more useful for everyday users.
[00:19:01] Unknown:
Going back to the question of the scope of the project and figuring out what you want to build: obviously, if somebody comes to you from the community with a contribution, there's not as much scoping to do there. But there is also the question of whether it belongs in core or as a plugin. And I'm just wondering if you can talk through some of the conversations that you have as the core maintainer team and some of the ways that you engage with the community to figure out what belongs in core versus what belongs as a plugin, and how to prioritize the baseline capabilities that Pants should provide out of the box and the interfaces that it can expose to give end users control to customize it to suit their needs?
[00:19:49] Unknown:
I think one useful recent example was that the autoflake plugin starts to cross the line between an autoformatter and a sort of fixer, where it does potentially change the semantic meaning of your code by deleting import statements, and it's incredibly useful. That was a contribution from the community, and I think we have to continue to bias toward inclusion in all cases because contributions like that force us to think about where our semantic boundaries between these goals are wrong. I think the hit rate is pretty high when somebody is considering whether something should go upstream or not. And there's survivorship bias, of course, because there are probably a lot of things that people keep private and don't tell us about.
But when people have even an inkling that something might be useful to the wider project, our answer is yes. Yes, that is useful. And something like 95 or 98% of the use cases that people have so far fit into the buckets that we have designed, the goals. And I think that's been reassuring. We continue to like to be surprised. If somebody really wants to lean in on making deployment something that we should be orchestrating with Pants, that's something we could definitely discuss. It's not something we have a goal for yet. Right now, we will go as far as packaging and publishing, but we're not necessarily gonna orchestrate, you know, the restarting of your cluster, for example. People might have custom goals for that. It's something that we'd be willing to discuss and include. But it's always a learning experience to have potential contributions. And so that's where we're at. Bring your ideas, and we're gonna bias toward bringing them on board.
[00:21:42] Unknown:
To your point of making it easier to incrementally adopt Pants and easier to get started, I know that the tailor capability hadn't quite made it into the core, or was just very recently added to the core, the last time we spoke. And I know that it has been going through some evolution, and I've used it myself a few times. So I'm just wondering if you can talk to some of the ergonomic improvements that you've added to make it easier to manage adoption and getting started, and to reduce the level of effort that's required to try out Pants and understand if it's right for your organization?
[00:22:11] Unknown:
Yeah. We have a really strong belief in the project that the tool should adapt to you rather than you having to adapt to the tool. So we set up Pants intentionally so that it can handle multiple different code structures. Like we were talking about, it's possible, and hopefully easy, to integrate Pants incrementally alongside your current workflow. So we made a couple of changes to make the onboarding process even faster. One of them, which you mentioned, is tailor. We have these BUILD files, which are usually only 1 to 2 lines, that give us metadata about your code, so you can do things like setting a timeout on certain tests.
Now we'll scan your repository and generate those 1 to 2 line files for you. And with that, we are also inferring your dependencies, like we talked about. Another really important part of ergonomics that we think about is the difference between power users and everyday users. Pants is usually something that every engineer at your organization will use. And we hear a lot from power users, who are the people on our Slack or who are opening GitHub issues. But we try to think a lot about the everyday user who might not be as active but is still using Pants, and we put a lot of focus on optimizing the experience for them. That includes a really intense focus on error messages, because honestly, we assume that most everyday users don't read our docs very thoroughly. And rather than expecting users to change and adapt to us, we try to adapt to the user. So within the past year, we've audited a lot of our error messages and rewritten and improved them so that even if you didn't read our docs, you can intuit what's going on and figure it out.
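As a rough illustration of what a `tailor`-style pass does, the sketch below scans a list of file paths and emits a one- or two-line BUILD file per directory. The target names `python_sources()` and `python_tests()` mirror Pants's Python target types, but the detection logic here (a `test_` prefix or `_test.py` suffix) is a simplification assumed for the example, not tailor's actual implementation.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, Set


def is_test_file(path: str) -> bool:
    """Assumed convention: test_*.py or *_test.py files contain tests."""
    name = posixpath.basename(path)
    return name.startswith("test_") or name.endswith("_test.py")


def generate_build_files(paths: Iterable[str]) -> Dict[str, str]:
    """Return {'<dir>/BUILD': contents} for every directory with .py files."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        if not path.endswith(".py"):
            continue  # only Python files get targets in this sketch
        directory = posixpath.dirname(path)
        targets[directory].add("python_tests()" if is_test_file(path) else "python_sources()")
    return {
        posixpath.join(directory, "BUILD"): "\n".join(sorted(kinds)) + "\n"
        for directory, kinds in targets.items()
    }
```

Because the generated files are this small, most of the metadata stays implicit; the BUILD file becomes a place to add the occasional override, such as a test timeout.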
[00:24:05] Unknown:
In terms of the challenges teams run into as they're starting to work through the adoption of Pants, and maybe they're starting to move into a monorepo structure for their code for the first time, what are some of the complexities that they run into as they're starting to figure out how to architect the repository layout, how to architect the workflow of their Pants configurations, figuring out what are the appropriate places to add the Python case, what are the appropriate places to add the Python distribution configurations versus just letting it be a Python source target, and the types of custom plugins that they might want to build to simplify their workflows, to say, you know, this is my version schema, so that across the board, whenever I run Pants, it will generate the right version for setup.py? Just any of those kinds of considerations as they're starting to scale adoption?
[00:25:05] Unknown:
I think at a fundamental level, monorepos are about a few things, but the primary thing that they're about is a desire for consistency and scalability. So consistency really matters, and not just across the projects that you have, but maybe also across multiple languages if you do have multiple languages in a repository. And so the challenges in adopting Pants and adopting a monorepo depend a lot on whether you're converting from a polyrepo, having lots of projects in different repositories, or converting from a monorepo using a different tool to using Pants. Those 2 cases are pretty different in the challenges you encounter. The thing when converting from polyrepo to monorepo is that you are probably already inconsistent unless you've done a huge amount of work to reuse the boilerplate.
For example, if you've used a template generator, you've generated the code in one place, but then you've committed it. And so it can diverge, because people are gonna edit all that boilerplate, and they're gonna end up with inconsistent requirements and scripts and all this. So going from polyrepo to monorepo, it really depends how quickly you're trying to get the consistency. And that's part of why our resolve strategy, which I mentioned earlier, is built around the idea of not necessarily having a single global resolve for your repository and not necessarily having a resolve per project: it's definitely a spectrum between a long-time monorepo with incredible consistency across all the projects, where you're using a single version of almost everything, and a repository that is just onboarding to that experience.
And it's not necessarily the case that a monorepo that is 100% consistent and doesn't allow the use of multiple versions, which some tools make difficult, is better. You know? It's not always the case that stricter is better. It's a spectrum, and having the flexibility to have inconsistency is an important thing. So it depends on which end of the spectrum you're migrating from: for people migrating from a monorepo with a different tool, we might be making things more flexible and easier. And for people coming from a polyrepo, we might be applying the consistency that you've been lacking by having, you know, a bunch of copy-pasted code in a bunch of repos.
So, hopefully, with this resolve strategy, we're very optimistic that in the next few weeks, about a month, we'll have more to share on that. The other thing is, you know, anytime somebody comes to us, they have huge test suites. And I think our goal is that Pants makes CI a one-liner regardless of your repository size. We have a long way to go to achieve that, but CI being a one-liner regardless of repository size requires a lot of things. It requires caching. If your repository is huge, it might require remote execution, which we also support.
If you want to run a variety of different goals, and maybe even package and publish, you'd like to be able to include all of those on one line. And if you include all of those in a single invocation of Pants, you'd want them all to run concurrently. So we think there's an opportunity to remove a huge amount of the boilerplate of CI config, and the lock-in you get at various CI providers from huge amounts of YAML, and probably YAML generators and YAML generator generators. The idea is to make the CI experience very, very similar to what you run on the command line locally, just for a smaller scope. Right? Locally, I'm testing just this one file as opposed to the entire repository, but the command is similar. And the scalability means you don't need an entire separate framework for CI versus local.
So for teams onboarding to monorepos, that's the promise we'd like to achieve. The challenge is always that the larger the project, the more work it is to achieve that goal. So we'll continue working on that, and we're happy to help teams onboard to this monorepo experience.
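The caching requirement mentioned above can be illustrated with a small, self-contained sketch. This is not Pants's implementation (its engine is written in Rust and, roughly speaking, caches the results of individual processes); it only shows the general idea, under the assumption that a task's result depends solely on its command line and the contents of its input files. `ProcessCache`, `run_cached`, and `fake_test_runner` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict, List

# Hypothetical signature for "something that runs a tool": (command, input files) -> output.
Runner = Callable[[List[str], Dict[str, str]], str]


class ProcessCache:
    """Hypothetical in-memory cache keyed on a digest of a task's inputs."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def digest(command: List[str], files: Dict[str, str]) -> str:
        # Hash the command plus every input path and its contents, in sorted
        # order, so identical inputs always produce the same key.
        payload = json.dumps({"cmd": command, "files": sorted(files.items())})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_cached(self, command: List[str], files: Dict[str, str], run: Runner) -> str:
        key = self.digest(command, files)
        if key in self._results:
            self.hits += 1  # inputs unchanged: reuse the stored result
            return self._results[key]
        self.misses += 1  # new or edited inputs: actually run the tool
        result = run(command, files)
        self._results[key] = result
        return result


def fake_test_runner(command: List[str], files: Dict[str, str]) -> str:
    # Stand-in for invoking a real test runner.
    return f"ran {command[0]} on {len(files)} file(s)"


cache = ProcessCache()
inputs = {"app/util.py": "def add(a, b):\n    return a + b\n"}
first = cache.run_cached(["pytest"], inputs, fake_test_runner)   # miss: runs
second = cache.run_cached(["pytest"], inputs, fake_test_runner)  # hit: reused
edited = {**inputs, "app/util.py": "def add(a, b):\n    return b + a\n"}
third = cache.run_cached(["pytest"], edited, fake_test_runner)   # miss: input changed
print(cache.hits, cache.misses)  # 1 2
```

Because the key covers every input, a repository-wide command only does real work for the parts whose inputs changed, which is what makes a single CI invocation affordable as a repository grows.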
[00:29:13] Unknown:
And to that point of consistency and onboarding, particularly as you're expanding into supporting multiple different language runtimes, one of the complexities that comes about is managing the execution environment, and a lot of developer teams these days are leaning on Docker for that. Andreas, I know you recently added native support for Docker to the Pants build toolchain. I'm wondering if you can talk to how that manifests and what the workflow looks like for people who are using Pants and want to use Docker to manage the execution context, without having to do a bunch of setup on developer machines or replicate that in their CI ecosystem.
[00:29:55] Unknown:
One of the pain points I was looking to solve when I discovered Pants was to deprecate and get away from the custom tools we had built around building and managing our Docker images. We have this custom Docker build tool that we call Welder. It's basically our own version of a multistage Dockerfile, from before Docker had support for multistage builds. What it does is set up all the different arguments to run Docker, build the images in sequence across various pipelines, and tag and push images left and right. Maintaining those build pipelines using that tool was becoming increasingly difficult.
Enter Pants. I fell in love with the engine and the rule system, and thought, hey, it shouldn't be too difficult to implement our Docker infrastructure needs in Pants. Reading more about it, I noticed that others had asked for Docker support in Pants. So last summer I raised my hand and said I would be interested in implementing Docker support for Pants, and I got the go-ahead. The experience was a real delight. I implemented it incrementally, starting with basic support for invoking a simple Docker build command and integrating with tailor to generate the BUILD file entries needed for adding a Docker image target. What Pants needs to know in order for you to use Docker is just where the Dockerfile is, and whatever dependencies you want included in the Docker build context.
If you use a PEX file, which is a Python executable that you can package with Pants, Pants can even infer the dependency on it from your Dockerfile. On top of that, we have built support for publishing images to registries. You can also chain your Docker images: if you have a common base image, it gets built first, and then Pants goes on to build the other images that depend on that base image.
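Chained images imply a build order: a base image must exist before anything built FROM it. The sketch below shows that ordering with Python's standard-library `graphlib` (Python 3.9+). The image names are hypothetical, and this illustrates the scheduling idea only; it is not how the Pants Docker backend is implemented. Images in the same wave have no dependency on each other, so they could in principle be built concurrently.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical images, each mapped to the images it is built FROM.
image_deps: Dict[str, Set[str]] = {
    "base": set(),
    "api": {"base"},
    "worker": {"base"},
    "api-debug": {"api"},
}


def build_in_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group images into waves; every image's bases are in an earlier wave."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if images depend on each other in a loop
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark this wave as built, unlocking its dependents
    return waves


print(build_in_waves(image_deps))
# [['base'], ['api', 'worker'], ['api-debug']]
```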
[00:32:36] Unknown:
And thanks to the infrastructure we already have, all of that works with change detection. So if a file has been edited, the Docker images that depend on that file, perhaps a whole chain of them, will show up in change detection, and you can determine which Docker images need to be republished. And I think from an architectural perspective, this gets to my point about CI and making it essentially a one-liner. To achieve that, you would need to actually rebuild the relevant images. If you had any sort of native code going into those Docker images, you might want to execute the compilation of the relevant wheels either inside the Docker image, or outside of it but in parallel. Right? For as many Docker images as you have, you might want that to execute on a remote machine if possible.
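Change detection comes down to walking the dependency graph backwards: starting from the edited files, collect everything that transitively depends on them. A minimal sketch, assuming a hypothetical graph in which a PEX and two images share a base image; names like `image:app` are illustrative, not real Pants target addresses. Pants exposes this kind of query through options such as `--changed-since`, but the code below is only a conceptual model.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical graph: target -> the files and targets it depends on.
DEPS: Dict[str, Set[str]] = {
    "image:base": {"docker/base/Dockerfile"},
    "pex:app": {"src/app/main.py", "src/app/util.py"},
    "image:app": {"docker/app/Dockerfile", "pex:app", "image:base"},
    "image:tools": {"docker/tools/Dockerfile", "image:base"},
}


def affected(changed: Iterable[str], deps: Dict[str, Set[str]]) -> Set[str]:
    """Return every target that transitively depends on a changed file."""
    # Invert the graph so we can walk from an input to everything that uses it.
    dependents: Dict[str, Set[str]] = {}
    for target, target_inputs in deps.items():
        for inp in target_inputs:
            dependents.setdefault(inp, set()).add(target)
    # Breadth-first search outward from the changed files.
    seen: Set[str] = set()
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected(["src/app/util.py"], DEPS)))
# ['image:app', 'pex:app']
print(sorted(affected(["docker/base/Dockerfile"], DEPS)))
# ['image:app', 'image:base', 'image:tools']
```

Filtering the result to `image:` targets gives the set of images that would need to be rebuilt and republished.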
And so the Docker support is fantastic. One of the frontiers we'd like to continue to explore with it is removing the steps before and after Pants in CI. That might be: okay, I've got a custom wheel that I need to build in order to put it in the Docker container, or I'm going to invoke Docker to do something before the build. We want to continue to expand our support for essentially cross-building in Docker: your execution platform might be macOS or Linux, but you essentially cross-build into Docker from your local platform. A local developer running on Windows or macOS, transparently using Docker only when they need to for the cross-building portion, is something that we'd like to continue to push in the next year. There are improvements planned in that area.
Right now, cross-platform Docker builds are possible as long as there's no native code involved. So we'd like to improve that.
[00:34:28] Unknown:
In terms of the support for multiple language runtimes, that's also an interesting challenge, beyond just managing the execution context, as far as this goal of consistency and ease of adoption for end users of the Pants build tool. And given that Pants itself is written largely in Python with a Rust core for the execution engine, I'm wondering how you've approached the design of the experience for these additional language runtimes so that it feels idiomatic and approachable for people who are maybe not Python native, or maybe don't even use Python in the repository at all?
[00:35:09] Unknown:
Yeah. It's a great question. We now support Go, Java, and Scala, and we've supported Python for the past 10 years. One of the really interesting ones is shell support. With each of these languages, we spend a lot of time first thinking about the unique strengths of Pants in that ecosystem, and what the ecosystem already does well. We very much view the role of Pants as complementary. For example, with shell, Pants hooks up the amazing ShellCheck linter and the shfmt formatter, which is kind of like Black: it will make your scripts pretty automatically. And then a unit test framework called shUnit2.
We decided to focus on those three things with shell rather than trying to hook up things like running or packaging, because shell already has really good, simple support for things like executing your script, and we didn't think Pants would add much value there. Whereas we could add a lot of value by installing tools like ShellCheck for you and making sure that everyone, whether in CI or across different developers, uses the exact same version, then running it with a consistent interface. We run all of that in parallel, so you can run ShellCheck at the same time as Flake8, Black, isort, and so on.
Same with Go. Go already has really strong tooling, so we actually leverage a lot of the underlying Go tooling and make it better with things like a consistent interface and benefits like caching.
[00:36:45] Unknown:
Yeah. And I would also say that the check goal I referenced earlier is an interesting example of making a consistent experience across multiple language ecosystems. mypy had actually been in a goal called typecheck until recently. We deprecated that goal and renamed it to check, because there is a shared semantic meaning across Python, Go, and JVM languages like Java and Scala: you want to do as much as possible to ensure that your code is correct. That might be type checking, but it might just be compilation. For Go, Java, and Scala, it is compilation: the check goal runs compilation.
And it does so in a similar amount of time, in relative terms, to mypy type checking your code; mypy is definitely a little bit faster. But needing your transitive dependencies for this goal is a common thread. So the check goal is an example of finding the shared semantic meaning across languages. It's definitely not linting; it's doing something more heavyweight. You definitely want to run it before you submit your code, but you may not necessarily want to run it on absolutely every edit. Maybe you do. So I think that's an example of the consistency we're trying to apply.
[00:38:04] Unknown:
To that point of consistency across these different language environments, how would you characterize the overall feature coverage, I don't know if you'd call it that, or the coverage of specific targets or goals supported across these environments? And were any foundational changes necessary in Pants itself to support adding these different runtime environments?
[00:38:26] Unknown:
Go was interesting in that users have very different expectations of what they are going to build. It's directory-centric, which isn't super common; Python is very file-centric, and Java and Scala are as well. So as we added these languages, we had some surprises. At the same time, the JVM languages were pleasantly straightforward to add, so I think the foundation we have has proved itself to be really useful. That said, now that we have half a dozen languages, we've definitely noticed some boilerplate for plugin authors that we'd like to remove. So as Go, Java, and Scala stabilize, we'll be doing a bit more work internally to remove that boilerplate, so that as the next dozen languages are added,
there's less for rule authors to trip on or to mindlessly copy. The other thing we've definitely seen as we've added these languages is that dependency inference has been a success in all of them. It's useful regardless of how structured your import statements are or aren't. Java and Scala have gained significant benefit, in that you can compile at a very fine-grained level. In Go, it's just an expectation that you don't have to write a bunch of metadata about your build in order for things to compile, because the Go tooling uses your import statements much the same way dependency inference does. Likewise, in the Python ecosystem, people don't want to repeat themselves, and we're trying to avoid that. So dependency inference scaling to all these languages has been really important as well. As we continue to expand language support, I think it'll be interesting to lean further on the assumption that dependency inference is present, and see what we can do to further lower the boilerplate, either for users creating a repository or for plugin authors.
You know, what would it look like for dependency inference not to be optional? How much can we remove in that case? It's on by default, to be clear.
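The dependency inference described above can be approximated for Python with the standard-library `ast` module: read the import statements, then map each imported module to whatever owns it. A simplified sketch, assuming a hypothetical `owners` table from module names to files or third-party requirements; Pants's real inference handles far more cases (relative imports, `from pkg import submodule`, string imports, and so on), which this sketch deliberately skips.

```python
import ast
from typing import Dict, Set


def imported_modules(source: str) -> Set[str]:
    """Collect the absolute module names that a Python file imports."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            modules.add(node.module)
    return modules


def infer_dependencies(source: str, owners: Dict[str, str]) -> Set[str]:
    """Map imports to owning files/requirements via a hypothetical owner table."""
    deps: Set[str] = set()
    for mod in imported_modules(source):
        # Try the full dotted name, then its parents: "a.b.c" -> "a.b" -> "a".
        parts = mod.split(".")
        for i in range(len(parts), 0, -1):
            owner = owners.get(".".join(parts[:i]))
            if owner:
                deps.add(owner)
                break  # unknown modules (e.g. the stdlib here) are simply skipped
    return deps


owners = {"app.util": "src/app/util.py", "requests": "3rdparty:requests"}
src = "import os\nimport requests\nfrom app.util import add\n"
print(sorted(infer_dependencies(src, owners)))
# ['3rdparty:requests', 'src/app/util.py']
```

The point of the sketch is that the import statements users already write carry enough information to derive most dependencies, which is why no separate dependency metadata has to be maintained by hand.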
[00:40:29] Unknown:
And as far as the growth of the community and overall adoption of the project, I'm curious what your strategy is for being able to scale the community and scale the interaction and engagement patterns that have helped you go from small scale to where you are now and how you're thinking about the continued growth of the ecosystem now that you seem to be kind of at a tipping point where you're adding these additional language environments. You're trying to expand beyond your initial base of enthusiastic and power users into people who are finding it and maybe just want to have something that runs and they don't necessarily care about being as enthusiastically engaged or just some of that overall community growth aspect of the project?
[00:41:15] Unknown:
My first interaction with the community when I approached Pants was to ask about a feature that I thought was missing. The response I got, really quickly, from Eric was: sure, why don't you put up a PR for it? I was delighted by the welcoming spirit, and it really encouraged me to contribute more, get to know Pants better, and learn more about it. I think I see that in other members of the community too: they come in, they're enthusiastic about what they see, and they have all these ideas. How those ideas are welcomed and received is what encourages people to stay, continue to invest, go deeper into the community, and become contributors or maintainers in the long run.
[00:42:28] Unknown:
Yeah. I think as a maintainer, it's always a question how much time we spend directly interacting with the community and, for example, mentoring possible new contributors. One thing that really helps frame our perspective here is the idea of the curse of knowledge, which relates to what some Buddhist circles call beginner's mind. The curse of knowledge is that once you learn something, it's really hard to go back to not knowing it. So once you're a Pants power user, it's hard to remember what it was like when you first started using it, and to take that perspective, no matter how much you try. And like we were talking about earlier, we think a lot about power users versus everyday users, people who are just using this to get something done. We want it to be a great experience for everyday users.
So we very intentionally seek feedback from beginners, and we think beginners often have a perspective that makes the entire project a lot better. We're always eager to actively support people who are trying out Pants for the first time or who want to make a new contribution; we often pair program with them, for example. And beyond helping them have a good experience, it also helps the project that we get to see things from their perspective.
[00:43:47] Unknown:
Yeah. And it's also the case that a huge number of people who use Pants will never contribute to it. We love any sort of contribution, and I guess that depends on the definition of the word contribute: a lot of people's contribution might just be answering questions. So in terms of scaling a project, building a community that's welcoming, where people pass on the assistance they received, goes a long way. When encouraging contributions or patches, you always ask, but you're also willing to dive in and do it yourself.
So we love all contributions, but if you can't contribute, we're always willing to help.
[00:44:31] Unknown:
Another aspect of community growth, community engagement, and ease of adoption is having useful examples that you can point to: this is how you use Pants, or here's a list of generally available plugins that you can use in your project. I'm curious what your thinking is on how to encourage that level of contribution and adoption, so that you don't necessarily have to retread the same ground with everybody. You can say, if you're coming from this language ecosystem, here is the reference implementation of how you can get up and running, here are a set of useful plugins, and things like that. And because a lot of plugin development happens inside the monorepo, how do you think about architecting that experience so that it is more conducive to contributing those plugins back upstream to Pants, or into a separate repository, or even just an awesome list on GitHub?
[00:45:33] Unknown:
Yeah. I can say that we love when people contribute examples, because it's a demonstration of how much boilerplate we still have. It's also a way for us to learn how people learned about the project. Right? If they didn't discover some feature and we can help improve their example, that's a lesson we can take back to improve our documentation, so that when people are getting started, they don't need to bend over backwards. And the other thing about example code is that it's always a good thing to look at in terms of how much boilerplate you have. Right? If our examples all consisted of a single line, like I was promising with CI, then we would know that we had no more boilerplate to remove. Right? What's the example? Well, I run this one line, and you run that one line. Oh, how can we reduce that further?
So for any given example, you wanna push its size down. You wanna play a game of golf in terms of the total number of lines required to accomplish some goal. So we love seeing contributed examples. Also, to be clear, our goal is absolutely to stabilize the plugin API and not require that people write things in core. I really hope that's when we'll blossom in terms of lists of third-party plugins existing. Right now, we absolutely love a third-party plugin, but in many cases it's going to be easier, for both the contributor and any consumers of the plugin, if it lives in core, because we can essentially maintain it for you. We can continue to expand the API.
Third-party plugins are absolutely something that we want to further encourage, and we promise that we're gonna make that easier in the future.
[00:47:20] Unknown:
As you have been growing and scaling the community and the project itself and growing the number of use cases that it supports, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?
[00:47:33] Unknown:
I tend to push the boundaries of what you're meant to do with a piece of software or technology, or anything, really. One of the first things my mind started doing when I discovered Pants was asking, where else can I use this? So I have a proof-of-concept project where I try to leverage the Pants rule engine for any kind of Python application. Instead of writing plugins and BUILD files and running the regular Pants command line to get into the engine, you can use the Pants engine as a library that you load from your Python code, set up the engine, and then just start writing rules right off the bat. It was surprisingly easy to get working, actually, so there's an underpants library that does that.
[00:48:38] Unknown:
Pants, the name that keeps on giving. I was really appreciative recently: we had somebody contribute a PyOxidizer plugin, and it was mind-bending. When I first considered integrating PyOxidizer, because PyOxidizer is written in Rust and so is the Pants engine, I initially went down the path of attempting to integrate directly with the Rust code. But a contributor came along and essentially dropped it in atop our Python distribution support. So PyOxidizer is going to be supported relatively soon in an experimental fashion; probably the 2.10 release will have some experimental support.
And I think that was innovative and unexpected, given how simple it ended up being once you ignore the fact that there's Rust involved in both of these projects: they can integrate across the Python distribution boundary. So I think that's both innovative and exciting as an alternative to Docker for folks who know their deployment environment really, really well.
[00:49:40] Unknown:
And in your experience of being contributors and maintainers to the pants project and I'm sure consumers of it as well, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:49:53] Unknown:
For me, it's been how awesome a combination Rust and Python are when used together. For any Spanish speakers, I actually gave a Python talk last year about when to use Rust native extensions. About 70% of Pants is written in Python 3, and about 30%, our core engine, is written in Rust. That's the engine that schedules everything, like how to sequence tasks, and handles things like caching. Rust is pretty new for me; I've only learned it in the past year. There's this amazing library called PyO3 that makes it surprisingly easy to integrate Rust and Python, so we have Python code calling into Rust and the other way around.
[00:50:38] Unknown:
Yeah. And I would say that something that's not necessarily new, and probably a classic issue, is that async code, where you're no longer using operating system thread stacks, whether in Python or on the JVM or in Rust, is kind of a pain to observe, to apply metrics to, and to get stack traces from. Luckily, there's a growing body of infrastructure to make all of that possible, but in some cases it does come down to creating your own tools for observability and performance work, which is definitely a focus for us. Go, for example, can't use operating-system-level stack traces; you need custom tooling for that. So async has lots of benefits, but a few downsides as well.
[00:51:30] Unknown:
And so for folks who are interested in having a consistent experience across the software development life cycle, what are the cases where Pants is the wrong choice and they're better off using discrete tooling, some homegrown solution, or something built into their CI framework?
[00:51:51] Unknown:
Sure. The most obvious answer is that there are still some languages and ecosystems that we don't have first-class support for yet. A big one is JavaScript. There are some ways to get JavaScript working with Pants, but in general we find that most users, for now, are still managing JavaScript with tools like Yarn and npm and integrating that with their Python setup. We're just wrapping up a community survey, and JavaScript is one of the biggest things our community tells us they'd love us to add proper first-class support for this next year. Even if Pants doesn't yet support your language, the plugin API means that we can add support, and the community is really responsive. But that's an obvious case where you might need to use multiple different workflows and tools.
[00:52:40] Unknown:
And the amount of boilerplate required to get a project going can still mean that Pants is not a perfect fit for your tiny, tiny, tiny project. Right? The bar at which Pants becomes useful continues to move, and hopefully we continue to move it in the direction of: sure, as soon as you hit 200 lines, Pants is worth adding to your project. We'd like that bar to be ever lower. And I think we've done a reasonably good job, but there's always more to do.
[00:53:05] Unknown:
You've mentioned a number of things you have planned for the near to medium term future of the Pants project. So rather than digging more into that, I'd be interested to explore the sustainability of the project, and how you're able to spend so much of your time and focus on continuing to scale it, grow it, and make it available for end users.
[00:53:31] Unknown:
Well, I can say that the Pants community has ever more open source maintainers from more diverse backgrounds, and that's a great thing. We're super happy to have Andreas, and we have another maintainer who came on board recently from another organization. That's always helpful: you want your open source project to have a really diverse community, and that is our goal. At the same time, we also have corporate backing, and that is useful too. Right? People need support for their projects when they have enterprise use cases or other needs. So we are continuing to strive for a good balance of open source governance and corporate support when you need it.
[00:54:08] Unknown:
From my perspective, either we develop the tooling ourselves, in house, with a custom toolchain and all the maintenance that comes with it, or we get involved with an open source one where we have a whole community that will pitch in and help develop it. We can develop the features that make sense for us and get the benefit of all the additional features from everyone else that we didn't even think of. Together, we maintain it and move it forward. So it's a win-win situation to be part of.
[00:54:47] Unknown:
Well, for anybody who wants to get in touch and follow along with each of you, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to choose a new show I started watching recently called The Last Kingdom on Netflix. It's a very engaging, well-written, and well-executed historical drama set in the early Middle Ages, or Dark Ages, in England, focused on the invasion of the Danes and their conquest of England and surrounding regions. It's a really well-done show, so I definitely recommend it for folks who are looking for something to watch. And so with that, I'll pass it to you, Eric. What do you have for a pick this week?
[00:55:29] Unknown:
Sure. My pick is a new show on Netflix from Jonathan Van Ness, one of the hosts of Queer Eye, called Getting Curious. One of the episodes that came out last week is a 30-minute segment on gender non-binary people. I imagine most listeners grew up not really realizing that non-binary people exist, or how much gender controls things like how we dress, how we talk, what sports you can play, and so on. I thought it was a really engaging and informative 30-minute episode that can help you better get to know your coworkers, family members, or even open source maintainers.
[00:56:08] Unknown:
And, Stu, how about yourself?
[00:56:10] Unknown:
I've really enjoyed the Checks and Balance podcast, which is sort of The Economist's American podcast. They really introduce history in a useful way, and it helps to put the story of the day in context. So every episode is a good lesson, not just about the present, but also about the past. And, Andreas, how about you? What's your pick for this week?
[00:56:32] Unknown:
My pick would be the book by Andy and Dave, The Pragmatic Programmer. I read it many years ago when I was starting out as a software developer, and it has influenced me deeply ever since. So I can highly recommend it.
[00:56:51] Unknown:
They published the 20th anniversary edition a few years ago. It's a classic. And not only has it supported my programming career, it's also literally supporting my laptop right now by raising it a few inches off the table. So, great book. I have it memorized, so I don't need to crack it open very often.
[00:57:07] Unknown:
Another related one that's really good is The Effective Engineer.
[00:57:11] Unknown:
Alright. Well, thank you all very much for taking the time today to join me and share the work that you've been doing on pants. It's a tool that I've been enjoying using and has helped a lot with some of the projects that I'm building at work. So thank you all for the time and energy you put into that, and I hope you enjoy the rest of your day. Thank you, Tobias. It's always a pleasure. Thank you. Thank you for listening. Don't forget to check out our other show, the Data Engineering podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Overview
Meet the Guests: Eric, Stu, and Andreas
The History and Evolution of Pants Build Tool
Scope and Goals of Pants Build Tool
Integration with Other Tools and Incremental Adoption
Recent Developments and Community Growth
Ergonomics and Onboarding Improvements
Docker Integration and Cross-Platform Builds
Supporting Multiple Language Runtimes
Scaling the Community and Ecosystem
Innovative Uses and Lessons Learned
When Pants Might Not Be the Right Choice
Sustainability and Future Plans
Picks and Recommendations