Summary
In a software project, writing code is just one step of the overall lifecycle. There are many repetitive steps, such as linting, running tests, and packaging, that need to be run for each project that you maintain. In order to reduce the overhead of these repetitive tasks, and to simplify the process of integrating code across multiple systems, the use of monorepos has been growing in popularity. The Pants build tool is purpose-built for addressing all of that drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin-oriented architecture adds flexibility for you to customize Pants to your specific needs.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level.
- Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, and 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan.
- Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Pants is and how it got started?
- What’s the story behind the name?
- What is a monorepo and why might I want one?
- What are the challenges caused by working with a monorepo?
- Why are monorepos so uncommon in Python projects?
- What is the workflow for a developer or team who is managing a project with Pants?
- How does Pants integrate with the broader ecosystem of Python tools for dependency management and packaging (e.g. Poetry, Pip, pip-tools, Flit, Twine, Pex, Shiv, etc.)?
- What is involved in setting up Pants for working with a new Python project?
- What complications might developers encounter when trying to implement Pants in an existing project?
- How is Pants itself implemented?
- How have the design, goals, or architecture evolved since Pants was first created?
- What are the major changes in the v2 release?
- What was the motivation for the major overhaul of the project?
- How do you recommend developers lay out their projects to work well with Python?
- How can I handle code shared between different modules or packages, and reducing the third party dependencies that are built into the respective packages?
- What are some of the most interesting, unexpected, or innovative ways that you have seen Pants used?
- What have you found to be the most interesting, unexpected, or challenging aspects of working on Pants?
- What are the cases where Pants is the wrong choice?
- What do you have planned for the future of the pants project?
Keep In Touch
- Eric
- Eric-Arellano on GitHub
- @EArellanoAZ on Twitter
- Stu
Picks
- Tobias
- Cursed TV show
- Eric
- Stu
- Faster Than Lime blog
Links
- Pants
- Foursquare
- Toolchain
- Bazel build tool
- Ant build tool
- Monorepo
- isort
- Tox
- Poetry
- distutils
- setuptools
- mypy
- Bandit
- Flake8
- Sample Python Pants Project
- gRPC
- Protocol Buffers
- Rust
- GIL == Global Interpreter Lock
- PEP 420
- Blog post about using Pants to migrate from Python 2 to 3
- Pex
- Shiv
- PyOxidizer
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying your code. Teams using feature flags release new software with less risk and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code and 9 other platforms. By adopting ConfigCat, you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules.
You can roll out new features to a subset of your users for beta testing or canary deployments. And with their simple API, clear documentation, and pricing that is independent of your team size, you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan. Your host as usual is Tobias Macey. And today, I'm interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos. So, Eric, can you start by introducing yourself?
[00:02:02] Unknown:
Sure. My name is Eric Arellano. I first got involved with the Pants project about 3 years ago as a summer intern at Foursquare, and my project that summer was to lead Pants's Python 3 migration. I fell in love with the community and kept working on Pants as an open source contributor after that. I then got offered a job at Twitter working on Pants, and I now work on Pants for most of my time as a software engineer at a startup called Toolchain.
[00:02:31] Unknown:
And, Stu, how about yourself?
[00:02:33] Unknown:
Yeah. I'm Stu Hood. I got started working on Pants at Twitter many years ago. As we'll get into later, the Pants project has sort of a long history that I've been involved with for a little while now. Other folks not on the call are Benjy Weinberger and John Sirois; they've been fundamental in getting the project started. But in the past few months, I also have ended up at Toolchain, which is a company doing some great work in this space.
[00:03:00] Unknown:
And, Eric, going back to you, do you remember how you first got introduced to Python?
[00:03:04] Unknown:
Yeah. It was actually an earlier internship that I had the summer of my freshman year. The first language I learned was Java, and I remember learning Python for that internship and being amazed at how much easier programming could be. My love for Python has really continued since then. Actually, in between graduating and working for Toolchain, I was a middle school computer science teacher for a couple months, teaching intro Python to 12 and 13 year olds. And it was a really cool experience being able to teach them, explaining that the same language they were using was the language used to create Instagram.
That was really empowering for a lot of the students, to see how powerful the language is. So we'll talk a little bit later today about why we use Python for Pants and why we're so invested in it. But I think that experience I had, of Python being so easy to get into yet also so powerful, is what makes me keep coming back to Python.
[00:04:02] Unknown:
And, Stu, do you remember how you first got introduced to Python? I do in particular because it was via Pants.
[00:04:08] Unknown:
I was sort of a JVM devotee for a long time; I started on Pants about 8 years ago. And Pants at that time was a modern Python 2 code base, but had not started this transition to Python 3. So I was sort of a hesitant Python user, and I have, at this point, become a devoted Python user because of all the changes in the Python ecosystem, Python 3, and types. And I think in many ways it's ended up being a superior platform. So I'm really enjoying it.
[00:04:38] Unknown:
Can 1 of you start by giving a bit of an overview about what the Pants project is and some of the history behind it?
[00:04:44] Unknown:
Yeah. Sure. So as I alluded to earlier, Pants got started at Twitter something like 8 years ago and was open sourced 6 or 7 years ago. But its history is really intertwined with the history of monorepos. And monorepos are sort of a 10 year arc of changing ideas about how to lay out code, as our understanding has evolved of how larger and larger projects at Google and other large companies are developed. So as monorepos have evolved, or the understanding of monorepos has evolved, the need for monorepo build tooling has become more obvious.
And so Pants started out effectively as an inspired-by clone of Google's Blaze build system and has become its own sort of independent, strong system in the intervening time.
[00:05:37] Unknown:
And before we go too much more into monorepos and Pants itself, can you give a bit of a story about how the name came about?
[00:05:44] Unknown:
Sure. Yeah. So the name actually originally stood for Python Ant. At the very, very beginning, there was a connection to Ant that we haven't had for a long time since then, but the name stuck, as quirky as it is. What also stuck was that since the beginning, we've implemented Pants in Python and had this focus on Python.
[00:06:06] Unknown:
Yeah. And I mentioned John Sirois. He's said a few times that over his dead body would we change the name. I agree. It's a fun name.
[00:06:14] Unknown:
Digging a bit more into monorepos, you mentioned that some of the history comes about because of these large tech companies and the code bases that they have to manage. But I'm wondering if you can dig a bit more into what the benefits of a monorepo are and why I might decide that I want to actually organize my code in that structure versus just having individual repositories for each unit of code.
[00:06:37] Unknown:
These units of code: you used a great word, because many people think of it as a project. You have a project, and it has a name, and it's a pet, if you think of the deploy philosophies now, you know, cattle versus pets. If you have a very large volume of code, you may not be very precisely naming each module per se, but you still want very fine-grained control over those libraries with a minimum amount of boilerplate. And so if you have many libraries and many deployed binaries, some of them will be named and some of them will be very first class, but you don't wanna have to pay this huge cost for each of them to build a brand and, you know, have your own set of custom scripts in the repository to manage it. So people use monorepos to remove a lot of that per-unit-of-code boilerplate.
And one of the ways it removes boilerplate is that you don't need to actually give things globally unique names that can go to PyPI and won't collide with something. And in general, you don't have to version things. So if you're depending on a library within a monorepo and it's 2 or 3 hops away, you do not have to bump versions for each of those units of code in order to make an edit. And you certainly don't have to commit, and you certainly don't have to publish to PyPI, you know, for each of those intervening libraries. And removing that cost makes it really cheap to have more fine-grained units of code, and it sort of allows your code layout to be natural. You can lay it out based on what a good module boundary is rather than what you're willing to, like, attach a particular brand to. And so it's particularly helpful to have a monorepo when you have lots of microservices or lots of notebooks from a data science perspective.
Because your notebooks may, you know, sort of build their own brands, but you have a lot of support code. And maybe you're just doing throwaway experiments. That's sort of naturally going to lead to a repository that I can experiment in without giving these precise names to all of my notebook repositories or something like that.
[00:08:51] Unknown:
For the case where I might be considering a monorepo and I don't know much about what's actually involved in getting it started, what are some of the challenges that I should be aware of before I actually go down the path of committing all of my code into one source code management system and handling the layout that way, versus going what has become the standard route in Python: having my libraries deployed as Python packages that I then pull down at a particular version and use in my various projects, with each project deployed independently and versioned on its own?
[00:09:25] Unknown:
I think monorepos require good tooling. And so the history of monorepo build tools and monorepos is pretty intimately intertwined, because it's hard to have one without the other. The reason you need the tooling is that if you don't have it, you end up manually maintaining sort of partitions of some larger code base. And the CI example is a pretty clear one: when you get to the point that you have thousands of tests, you're gonna find that you wanna run some of them on a different machine, and you don't wanna run the ones that aren't affected by a code change.
And so you need tooling to help you do that so that you don't have to lay out manually in your CI config the various partitions of your tests. These types of tools are designed to make what you do for a single project roughly the same as what you do for a very large project. And so Pants scales down to smaller projects, and I run, you know, pants test :: to test everything under some subdirectory. And then likewise, I can scale that up and run all of those tests, or filter to just some tests, even if I might wanna run them on one machine.
The tooling enables the monorepo.
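As a rough illustration of that kind of affected-test selection, here is a toy sketch. The file names and the hand-written dependency graph are invented for the example; a real tool like Pants derives the graph automatically:

```python
# Toy affected-test selection: given which files changed, pick only the
# tests that (transitively) depend on them. DEPS is hand-written here;
# real build tools compute it from the code.

DEPS = {
    "tests/test_api.py": {"src/api.py"},
    "tests/test_util.py": {"src/util.py"},
    "src/api.py": {"src/util.py"},
}

def affected_tests(changed: set) -> set:
    def transitive(node):
        out = set(DEPS.get(node, ()))
        for dep in set(out):
            out |= transitive(dep)
        return out
    return {t for t in DEPS if t.startswith("tests/")
            and (t in changed or transitive(t) & changed)}

# Editing src/util.py affects both tests (test_api depends on it via api).
print(sorted(affected_tests({"src/util.py"})))
```

With this in place, CI can run only the returned subset instead of partitioning the test suite by hand.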
[00:10:40] Unknown:
The scale factor is something that I think is worth digging into as well, because there are some cases where I need to use a monorepo because I have 500 engineers all working on various pieces of the code, and there are dependencies between them and intimate interlinking, where I need to be able to see all of the different ways that a code change is going to affect all of the downstream dependencies, which can be difficult or impossible if you have them all in separate repos. But what is the smallest scale at which you think it makes sense to use something like Pants for actually managing your build tooling versus just using setup.py or Poetry or, you know, whatever the tool du jour is for your de facto Python project?
[00:11:25] Unknown:
Right. That's a good question. You actually alluded to an important point that I missed, which is that this fine-grained dependency tracking between projects is critical to enable you to know which portion of the repository is relevant to you. And even if it's a small team, if you just have enough code already written, it might be relevant to not run all 100 tests but just one, and have the rest be cached. Pants does as well as it can to minimize the amount of boilerplate that we add to your repository, and we try to get out of the way and allow the tools that we integrate to configure themselves. And so most of the time, you might end up with a config file for something like isort, for example, if you're configuring isort.
And so we add some small amount of boilerplate above and beyond that to say isort is configured. Right? You need to tell Pants that isort is available. We strive to make that boilerplate as light as possible, as thin as possible. And so the cost of adding Pants to your repo should be really minimal, above and beyond, you know, the configuration for the various tools you're using. And in many cases, if tools have good defaults, you basically are toggling one value in your Pants configuration file to say, use Black, because it has effectively no config, and getting this smooth integration.
And so even for really small projects, Pants should be straightforward to get started with and not add much overhead in your repository.
[00:12:58] Unknown:
You're totally right. There are a lot of great tools out there now, like Poetry and tox, that you can use when you have a smaller repository. One of the things that we've heard from users that they really like about Pants is being able to have a uniform interface. In a modern Python project you might be using mypy, and using two or three linters like Bandit and Flake8, and you might be using Black and isort all at the same time. And Pants gives you a uniform interface to run all of those linters and all of those formatters. It's as simple as saying ./pants lint and then the file that you wanna run on, and Pants will orchestrate all of those different linters and formatters that you have set up. It'll run them in parallel for you and cache the results, rather than you having to make six different calls to different tools sequentially.
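A minimal sketch of that fan-out, with asyncio standing in for Pants's real scheduler: the linter names come from the conversation, but the delays and the runner itself are made up for illustration:

```python
import asyncio
import time

# Illustration only: running several linters concurrently behind a single
# command, the way one `./pants lint` invocation does. The delays are
# stand-ins for actually invoking the tools.

async def run_linter(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # pretend to run the real tool
    return f"{name}: ok"

async def lint_all() -> list:
    jobs = [("flake8", 0.05), ("bandit", 0.05), ("isort", 0.05)]
    return await asyncio.gather(*(run_linter(n, d) for n, d in jobs))

start = time.monotonic()
results = asyncio.run(lint_all())
elapsed = time.monotonic() - start
print(results)  # all three finish in roughly one delay, not three
```

Running the three "linters" concurrently takes about as long as the slowest one, rather than the sum of all three sequential calls.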
[00:13:45] Unknown:
For somebody who's actually interested in using Pants, can you dig a bit more into the workflow of what's involved in actually getting it set up, and then actually using it for driving their development process: building the packages and running the tests and all of the different aspects of the software development life cycle?
[00:14:03] Unknown:
Sure. So a major goal with Pants, as Stu was talking about, is that we have minimal boilerplate and also that it's easy to adopt Pants incrementally. We know when you have a preexisting repository that, for some tools, there can be a high barrier to entry to adopting them, and we wanted it to be a goal that you can try out Pants in a small portion of your code base. You can use it, for example, to only run linters, and deal with packaging your code and publishing your code as a separate step.
[00:14:33] Unknown:
We have an example Python repo in which we've strived to reduce the config to the absolute minimum. When you get started with Pants, generally you're just adding a list of tools that you'd like to use. And if you have no additional configuration for each of those tools, such as, you know, Black not having any configuration, you've basically just enabled that tool for the repository. And that now gets integrated smoothly into the Pants format command. Right? And the Pants format command will run as many formatters as you have in a consistent way, sequentially, so that they actually feed the output of one formatter into the other. When you're linting, it will run all those linters in parallel. And so just enabling them, even if you don't further configure isort, for example, is really easy. So the example config file we have is 50 lines, but it could probably be something closer to about 20. And I think we are continuously striving for an empty config file to be useful for you, and so a lot of this can probably be further pruned.
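For reference, a minimal pants.toml along those lines might look like the sketch below. The backend package names follow the Pants 2.x docs of this era and may differ between versions, so treat this as illustrative rather than exact:

```toml
# Illustrative pants.toml: enabling a tool is one backend entry.
[GLOBAL]
backend_packages = [
  "pants.backend.python",             # core Python support
  "pants.backend.python.lint.black",  # enable Black with its defaults
  "pants.backend.python.lint.isort",
  "pants.backend.python.lint.flake8",
]
```

With only this file in place, commands like ./pants fmt and ./pants lint pick up all of the enabled tools.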
[00:15:37] Unknown:
For somebody who has an existing project where they might already be using some of those linters or they might be using a particular process for building the project, how does pants integrate with that broader ecosystem of tools where you already mentioned things like auto formatters and linting, but in particular, the concept of dependency management and ensuring the proper sequence of steps for being able to build the subprojects within a monorepo, for instance.
[00:16:05] Unknown:
Yeah. So there are a couple of different integrations that we have out of the box for packaging your code with Pants. One of them uses what most repos are using, setuptools: Pants will actually auto-generate your setup.py for you and then be able to build a wheel to create the asset that you wanna deploy to PyPI. We find for a lot of users it's really useful that Pants will auto-generate that file, because Pants has this understanding of all of your project's dependencies: you can both depend on external projects, and you can also depend on your own source code. And Pants will see if you have multiple different distributions or multiple different projects in the same monorepo.
Pants will auto-generate in that setup.py that this project depends on that project, and do all that hard work with that mapping for you. We also support generating in the PEX format, which stands for Python executable. It bundles up all of your code into a single zipped file that you can simply run on the command line to start your app; it has everything you need. And we support creating AWS Lambdas. There are a lot of other formats out there, like PyInstaller, that we haven't added yet but are really interested in adding, and our plugin API is designed so that it's easy to write plugins to hook up to whatever format you're using to package your code.
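As a sketch of how those outputs get declared, packaging in Pants is driven by targets in BUILD files. The target and field names below follow Pants 2.x-era documentation and have been renamed across versions (early v2 used python_binary, for instance), so treat them as illustrative:

```python
# Illustrative BUILD file (target and field names vary across Pants versions).

python_distribution(
    name="mylib-dist",
    dependencies=[":mylib"],
    provides=setup_py(
        name="mylib",      # the distribution name published to PyPI
        version="0.1.0",   # Pants fills in dependencies when generating setup.py
    ),
)

pex_binary(
    name="app",
    entry_point="myproject.app:main",  # everything bundled into one runnable .pex
)
```

A single packaging command can then build the wheel and the PEX from these declarations, with the inter-project dependencies filled in automatically.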
[00:17:28] Unknown:
Right. As Eric mentioned, we have a really powerful plugin API. And so the thing that enables the integration with each of Pants's goals, like format my code, lint my code, package my code, is this very powerful plugin API. And so if we don't already support a Python tool that somebody is used to, we can definitely help people get it integrated, and we have great docs for those plugins as well.
[00:17:51] Unknown:
Digging more into Pants itself, I know that there have been some fairly significant changes, particularly recently with the move to version 2. But can you just dig a bit into how Pants is built itself, how you've structured the code, and maybe some of the ways that the overall design and goals of the project have evolved since it was first created?
[00:18:11] Unknown:
In Pants v2, plugins are developed as pure coroutines, or async functions, called rules. And this is a huge improvement from v1 because it means that it's very, very challenging to get things incorrect. You can't forget to indicate that a rule depends on some other code, because the dependencies of rules are actually dependency-injected by the Pants v2 plugin system. And it's important that we know what dependencies the rule code has, because it's critical to make sure that we've gotten your dependencies correct. And when the build tool needs to rerun, it needs correct incremental information to know what's changed since the last time it ran.
So v2, with rules as the core of its plugin API, is dramatically faster than v1 and is able to do more work incrementally. It uses a daemon to keep the outputs of rules that haven't changed warm. And so rerunning Pants after having run it before, with a minor edit, now takes, you know, less than 200 milliseconds to do the actual substantial work. We have more improvements planned.
[00:19:31] Unknown:
So about the main way that Pants is implemented: we use the same API whether you're a plugin author or you're writing core Pants, and we break down each build step into a couple of small composable parts. If you think about running a tool like Black, for example, there are a couple steps to it. We first need to determine what files you wanna run on, and we need to download the actual tool and any plugins you have. And then we need to run the actual subprocess where we've downloaded Black, we have all the files, and now we let Black run like you would normally invoke it on the command line. Each of those steps is modeled through Python, using type hints and using pure functions, which are async functions that we call rules.
And each rule, or each function, says what it needs to run and then what it's going to give back. So if we talk about downloading Black, for example, we need to know the version of Black that we should download. The function will say that it needs this information, and then it will give back the downloaded Black binary that we just got. Because we model each step as a function, we're able to automatically get things like caching: we can see that if the input to this step was the exact same as before, then we can use the cache.
But if any part of the input changed, then we need to rerun this step, and so on.
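A toy model of that idea, with made-up names and none of the real Pants machinery: rules as typed async functions whose results are cached on their inputs, so the expensive "download" step runs only once:

```python
import asyncio
from dataclasses import dataclass

# Toy model of "rules" (invented names, not the real Pants API): each rule
# is a pure async function from typed inputs to a typed output, and the
# engine caches results keyed on those inputs.

@dataclass(frozen=True)
class BlackRequest:
    version: str

@dataclass(frozen=True)
class DownloadedTool:
    name: str
    version: str

CACHE: dict = {}
DOWNLOADS: list = []

async def download_black(req: BlackRequest) -> DownloadedTool:
    # The rule declares what it needs (a version) and what it gives back.
    key = ("download_black", req)
    if key not in CACHE:
        DOWNLOADS.append(req)                 # stand-in for a network fetch
        CACHE[key] = DownloadedTool("black", req.version)
    return CACHE[key]

async def fmt(files: tuple, version: str) -> str:
    tool = await download_black(BlackRequest(version))
    return f"ran {tool.name} {tool.version} on {len(files)} file(s)"

first = asyncio.run(fmt(("a.py", "b.py"), "20.8b1"))
second = asyncio.run(fmt(("c.py",), "20.8b1"))   # same version: cache hit
print(first, second, len(DOWNLOADS))
```

Because the inputs are frozen dataclasses, they hash consistently and make natural cache keys; the second invocation reuses the "downloaded" tool instead of fetching it again.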
[00:21:05] Unknown:
And one of the fundamental differences between v1 and v2 is that, for that list of inputs, we have moved from sort of an include-based fingerprinting model, where you have to explicitly list the things you depend on and you might get that wrong, to an exclude-based model where everything is included by default. We know what dependencies you used in your rule, and it's impossible to miss one. But then you would need to do extra work to remove something from a fingerprint, and this allows the rule fingerprinting to be much more accurate and your cache keys to be more accurate.
[00:21:42] Unknown:
And my understanding too is that with the move to version 2, you've done a fairly major overhaul of the source code of the project, and you're using Rust for sort of the core engine, with Python as the layer on top of that to handle all of the plugins and the interoperability. I'm wondering if you can talk through the motivation for that major shift in how the project was built and some of the downstream impacts that you think that's going to have on the future of the project?
[00:22:12] Unknown:
So the Rust engine has enabled us to execute the scheduling of all those pure coroutines while minimizing the amount of time that we're actually holding the GIL. And so a large amount of Pants's I/O and scheduling, downloading of files, sort of snapshotting of inputs, is all accomplished in native code. And so we spend relatively little time with the GIL acquired, aside from when our plugins' rule code is running. And that's clearly the right choice for us, because Python is just such a great language to be writing plugins in. We found that Rust is a great addition to Python in the sense that you can provide a great native Python API with a safe, concurrent, fast core.
[00:22:58] Unknown:
Another aspect of this switch to the version 2 release is that you've actually decided to drop support for the large variety of languages and build systems that you had in the previous version. And I'm wondering why that was an acceptable trade off, and what your thoughts are on the future of Pants in terms of the broader ecosystem of working with different languages and monorepos.
[00:23:24] Unknown:
So moving to v2, yeah, we made the hard decision to drop support for some languages. And as we mentioned, it is sort of a fundamentally new tool. We felt that that was the right decision just because some of those languages are reasonably well served by other tools right now, and Python represented a really underserved use case. Pants v2 is fully generic, and so we expect to add back support for all of those languages in the near term. And in fact, we'd love people's feedback on which languages they'd like back first. But this focus on Python, we think, has enabled us to better serve those users in v2, and we think the results speak for themselves.
[00:24:07] Unknown:
Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1 on 1 video calls, a tuition-back guarantee that means you don't pay until you get a job, resume preparation, and interview assistance, there's no reason to wait.
Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. And digging more into the specifics of Python, for people who are looking to lay out their project as a monorepo, or maybe combine multiple different repositories into one to be able to take advantage of the code sharing and discoverability aspects: what have you found to be some useful strategies for determining how to structure those repositories, so you can effectively have shared modules and libraries, and narrowly scope the third-party dependencies so that you don't end up having to pull in 15 other packages for one library just so that you can actually use one aspect of that code base?
[00:25:37] Unknown:
So one of the main goals with Pants, as we were talking about, is being able to adopt it easily when you have an existing repository. So we don't want you to have to, and you should never have to, rearchitect or change how your entire project is set up. There are hundreds of different ways that people set up their monorepo, and Pants is designed to be flexible: whether you have multiple different top-level folders or everything lives under one folder, whether you have tests living right next to their source files or in an entirely separate part of your project, Pants can work with all of those different layouts.
[00:26:18] Unknown:
One of the things that enables that flexibility is minimal boilerplate. Dependency inference is a feature in Pants that is really important because it enables you to have very precise information about what code depends on what other code, without needing to repeat all of that in BUILD files. And so here's the comparison, if you're not familiar with a BUILD file: if you had hundreds of units of code, of libraries or whatever unit, in one repository, and you needed to track precisely the dependencies between all of them, you would have import statements that indicate that you use a particular library, but then you would also, in your build metadata, maybe your setup.py or in this case a BUILD file, need to repeat your import statements and say, well, I depend on this version of this named library. And dependency inference removes the need to do that, because Pants is aware of your import statements, and it actually parses import statements to determine what other libraries code depends on. And this is really important because it allows for this very, very precise information that can't go out of date. Right? Assuming you have linters enabled, which are trivial to enable, you can't have an unused import statement.
And an unused import statement would represent a dependency that went into your binary that you didn't need. If you don't have unused imports, and your dependency information is precise because it matches your import statements, your binaries end up with exactly the things that are required. They're minimal. They're very accurate. And having these accurate, tiny binaries also means that cache keys are tiny. Pants strives to cache and carefully invalidate all of the work that it does to save you time. And if you have, you know, hundreds of tests, thousands of tests, we're actually caching those tests as well to avoid having to rerun them. So this fine-grained dependency tracking allows for these smaller binaries.
It allows for more cache hits, and it doesn't require the boilerplate that you might expect, where you have to repeat the information that's already in your import statements. So we're really happy with how dependency inference has come out.
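The core idea of parsing import statements to infer dependencies can be sketched in a few lines of Python. This is not Pants's actual implementation, just a minimal illustration using the standard library `ast` module:

```python
import ast

def infer_imports(source: str) -> set:
    """Return the set of top-level module names imported by a source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports; they resolve within the same project.
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules

source = """
import os
import numpy as np
from requests.adapters import HTTPAdapter
from . import sibling
"""
print(sorted(infer_imports(source)))  # ['numpy', 'os', 'requests']
```

A real build tool would then map each inferred module name to the library or requirement that provides it, which is where the precision Stu describes comes from.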
[00:28:40] Unknown:
And digging further into that, because that's one of the things that I'm personally very interested in, is how you would approach defining the dependencies at the repository level. For instance, I've seen a number of projects that may be using distutils or setuptools, where they have the setup.py, or Poetry, where they have the pyproject.toml, and they need to have multiple of those defined at different layers of the project: I have a setup.py for this subpackage, and then I have a setup.py for another subpackage, but then I have a requirements.txt at the top layer to be able to do development of the repository at large.
Or, you know, with Poetry, I have a pyproject.toml that installs all the dependencies for everything in the project, or I have to have one that scopes the dependencies to the specific subpackage. How would I actually go about handling that type of situation with Pants, being able to have all the dependencies installed in my virtualenv for development, but then only include the actual third-party packages that I need in the built project?
[00:29:46] Unknown:
The easiest case in that type of environment is where everything is actually a subset. If all of the subprojects, or all of the project's libraries within one repository, actually use a subset of some larger consistent set of libraries, it's very easy to say this binary depends on a subset of the larger set of libraries. The way Pants operates is that you have perhaps a single requirements.txt, but each binary might only use a small portion of it. Pants also supports, though, having different binaries or libraries with different non-overlapping, non-subset requirements.
And so you could have a binary that has a specialized version of NumPy versus some other binary, and that works fine. The only thing that you then have to do is be specific about which portion of the repository you're operating on. So, for example, if you're loading an IDE, the version of NumPy that is in use for a particular portion of the code might be different than it is for another portion of the code. If it's a pure subset, it's very straightforward, because we're taking just the subset of the larger set of requirements. But we also support cases where they have different requirements.
[00:31:09] Unknown:
You brought up a great case about the problem with a lot of Python projects where you have that duplication of your requirements in your setup.py file while also maintaining those same dependencies in your requirements.txt. That is a specific problem that we solve: Pants will auto-generate your setup.py for you based on what your project's dependencies are. So you only need to define your requirements once, in that requirements.txt, and your entire project can use that. And then, like Stu was talking about, we take the subset, we see that this project is using a subset of those global requirements,
[00:31:48] Unknown:
and we'll generate the setup.py file for you with exactly what you need. Yeah. That definitely solves a big problem that I've seen where I might have a project that has two distinct modules that are each doing their own thing. And then I have a library folder that has some helper elements that I might use in one of the subprojects and might use in the other, and there's some subset of those that are shared. And the library module has a specific set of dependencies that I pull in, where maybe one of the modules within that library is handling MySQL connections, so I need to have the PyMySQL package installed, and then, you know, another one might require interacting with MongoDB versus the other one that uses Redis. The disjoint aspect is Mongo and Redis. So package a gets the MySQL dependency and the Mongo dependency, and package b gets the MySQL dependency and the Redis dependency. And it sounds like I just have one requirements.txt or pyproject.toml at the top level that installs all of those for my development environment, but then when Pants goes to actually build those projects, it just uses the subset of the modules for each of packages a and b.
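The scenario just described can be made concrete with a small sketch. Assuming a single top-level requirements file and per-module third-party usage (the module and package names here are invented for illustration, and Pants infers this mapping from imports rather than having you declare it), the subset each deployable package needs is the union over the library modules it uses:

```python
# Third-party requirements used by each internal library module.
module_requirements = {
    "lib.mysql_helpers": {"PyMySQL"},
    "lib.mongo_helpers": {"pymongo"},
    "lib.redis_helpers": {"redis"},
}

# Which library modules each deployable package imports.
package_modules = {
    "package_a": ["lib.mysql_helpers", "lib.mongo_helpers"],
    "package_b": ["lib.mysql_helpers", "lib.redis_helpers"],
}

def requirements_for(package: str) -> set:
    """Union of the third-party requirements of every module the package uses."""
    return set().union(*(module_requirements[m] for m in package_modules[package]))

print(sorted(requirements_for("package_a")))  # ['PyMySQL', 'pymongo']
print(sorted(requirements_for("package_b")))  # ['PyMySQL', 'redis']
```

The development virtualenv gets the full superset from requirements.txt, while each built artifact carries only its computed subset.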
[00:32:57] Unknown:
Exactly. And if you're not happy with one of those requirements from the global superset, then you can swap that out and say, I don't want to use this normal version; instead, for only this project, I want to use this version that uses CUDA and the GPU.
[00:33:14] Unknown:
Yeah. It's definitely very helpful. And so for people who are starting to adopt Pants and use it in their projects, are there any potential edge cases or common points of confusion that you've seen that you think people should be aware of?
[00:33:29] Unknown:
Yeah. So when people begin to adopt Pants, we're enabling a lot of tooling that people might have been invoking manually. For instance, if you were doing codegen manually, generating gRPC stubs that you were then going to load, you might have had scripts for that, you might not have. But you then need to tell your IDE that those are a thing. So we enable more sets of tooling than people might have been used to using, but we wouldn't recommend actually checking the generated gRPC code into your repository. The build tool manages this generated code and makes sure that it's accurate, but you then need to integrate with other tools that consume your build. So we've done a lot of work, and we will do more work, to make sure that Pants smoothly integrates with things like IDEs.
But sometimes people have bumps on their way to getting a good IDE experience.
[00:34:21] Unknown:
For people who are using pants, what are some of the most interesting or unexpected or innovative ways that you've seen it used?
[00:34:28] Unknown:
I think the most exciting thing for me to see is the plugins that people have been writing. We only finished writing our plugin docs about a month ago, and already we've had a couple of really cool plugins written by people in a relatively short time. We had a user who showed up on Slack and within two days wrote a Python plugin. We had another who got Jupyter working for them, which we're planning on adding first-class support for soon, but they were able to get it working with their specific project's requirements. We've had a couple of users write Docker plugins that leverage a lot of the benefits of Pants, like running in parallel.
And we've had people also build on top of what we were talking about, that Pants will auto-generate that setup.py for you. We have a hook where you can add your own plugin logic to generating that setup.py. So we've had some users who are using git submodules to dynamically come up with what their package's version is, and a couple of people are reading from files, for example, to determine that information. That's been really exciting and validating for us, because a major motivation for this v2 project was to make it much easier to write a plugin if you want to. A big insight we had is that there are so many different workflows and builds out there that no matter how hard we try to make the core Pants experience work out of the box, we will always have users with a certain workflow that we're not able to reach, and the way we reach them is by having that very powerful plugin API written in Python.
[00:36:06] Unknown:
And in your own work building and rebuilding and promoting the pants project, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
[00:36:17] Unknown:
For me, working on this v2 project with Pants has been a constant reminder of how important it is to not only understand your own project and the Pants tool, but also to have a deep understanding of the way that Python and Python's tooling work. We spend a lot of time, for example, iterating on __init__.py files. As many of you have probably seen, __init__.py files are pretty complex in Python, serving two purposes: the first is to indicate what is a module or what is a package, if you're not using PEP 420 namespace packages, and the second is that sometimes people put content in their __init__.py file.
We went through a lot of iterations to make sure that Pants always behaves like Python does in the real world: we will always include your __init__.py files, not only for the current directory but also for all of your ancestor packages. And if you have content in an __init__.py file, we make sure that, even if you're not importing that file directly, we properly pull in all of the dependencies of any of those ancestors. You might not have an import statement like you normally would, but if there is code there, then those imports need to work, and you have a dependency. So that, I think, has been the most interesting for me: when we think we have a solid understanding of Python, finding that we need to read the PEP a little bit more closely to really, truly understand the ecosystem.
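The two roles of __init__.py mentioned here can be demonstrated directly. Under PEP 420, a directory without an __init__.py is still importable as a namespace package, while a directory with one is a regular package whose __init__.py code runs as a side effect of import. A quick sketch (the package names are made up):

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# A regular package: has an __init__.py, whose code runs on import.
os.makedirs(os.path.join(root, "regular_pkg"))
with open(os.path.join(root, "regular_pkg", "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

# A PEP 420 namespace package: no __init__.py at all.
os.makedirs(os.path.join(root, "ns_pkg"))
with open(os.path.join(root, "ns_pkg", "mod.py"), "w") as f:
    f.write("VALUE = 'hello'\n")

sys.path.insert(0, root)

regular = importlib.import_module("regular_pkg")
print(regular.VALUE)       # 42 -- the __init__.py code ran on import
print(regular.__file__)    # path ending in regular_pkg/__init__.py

ns_mod = importlib.import_module("ns_pkg.mod")
print(ns_mod.VALUE)        # hello -- importable despite no __init__.py
ns = importlib.import_module("ns_pkg")
print(getattr(ns, "__file__", None))  # None -- namespace packages have no file
```

Because importing `regular_pkg.anything` executes that __init__.py, any imports inside it are real dependencies, which is exactly why a build tool has to pull in the dependencies of every ancestor package's __init__.py.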
[00:37:46] Unknown:
And I would say that it's probably not unexpected, it's a classic issue, but making something that's powerful and performant, and also really easy to use, is just a constant struggle. Like, what is worth putting in a plugin developer's face because it's critical for them to understand, versus hiding? I think we've hit a pretty good spot. I can certainly go back and look at earlier versions of the rule API from a year or two ago, before we were using coroutines. There was this whole declarative syntax that you used to declare what your dependencies were, and it was awkward. It was horrible. We've ended up with something that's massively better and very natural, and a lot of that is thanks to, you know, type hints in Python 3 and a little bit of introspection.
And I think the API now is at a really sweet spot, but it's been a bumpy road.
[00:38:38] Unknown:
For people who are considering using Pants or they're just getting started and trying to figure out what build tools and what development cycles they wanna have for their projects, what are the cases where Pants is the wrong choice?
[00:38:51] Unknown:
We strive to minimize the amount of boilerplate necessary to get started with Pants. And so we would hope that even really small projects that don't anticipate having multiple, you know, deploy binaries or multiple published libraries within one repository might still be a good fit for Pants. And we will always keep that in mind, to make sure that we still add value for the single-project repository without, you know, too much of a boilerplate cost. But it is the case that we add some amount of configuration above what you might otherwise use.
So as I said, we'll continue to strive to make sure that our overhead in terms of configuration is low, that our overhead in terms of runtime performance is low, and that we're really adding value even for those tiny repositories. But it's certainly the case that the more code and the more deploy binaries or libraries you have in a repository, the more value we're going to add for you. And, obviously, the concurrency and caching are great from a latency perspective as you're just iterating on a small project to reformat or lint your code, or type check and test it.
[00:40:00] Unknown:
And the deploy binary piece actually reminds me of another thing that I forgot to ask about. When you're working on the project, there's the matter of dependencies and linting and maintaining code quality, but then there's the case where you actually have something that you want to push to production, or possibly to PyPI, and particularly when you want a deployable binary. What are some of the useful approaches for that, and how does Pants help support it? I know that out of the box it has strong support for things like PEX, but there are also projects like Shiv or the newer PyOxidizer, and there are some considerations that developers might want to look at there.
[00:40:43] Unknown:
Yeah. So Pants has a long history of integration with PEX. But particularly in v2, we are fully pluggable, so there's no reason for PEX to be the only way that you build deployed binaries. We don't have support for Shiv or PyOxidizer yet, but as a project with a significant amount of Rust, we would love to work with somebody to integrate PyOxidizer, or perhaps do it ourselves. There's a distinct possibility that Pants itself will be deployed using PyOxidizer at some point in the future, given that we have 35% of our code in Rust. And so we'd love to help integrate those.
The v2 plugin API makes it a level playing field. We don't have a particular favorite in mind when it comes to deploys.
[00:41:29] Unknown:
And in terms of other plans that you have for the future, or new capabilities or contributions that would help move it forward, what are some of the overall plans that you have for the project?
[00:41:42] Unknown:
We spent the past year, I'd say, really focusing on getting out this 2.0 release, which represents the first stable release with the v2 engine and v2 rewrite. So now we are turning our attention to asking the community where they would like us to focus next. There are a lot of different ideas that we have and that users are submitting, including support for new languages like Java and Go, but also supporting different Python workflows more robustly, such as better integration with Jupyter. And we were just talking about some different deploy binaries that we would like to support.
[00:42:20] Unknown:
So one other issue that's close to my heart is actually improving that case you were mentioning earlier, where you have heterogeneous dependencies. Right? Not every project within the repository is just a subset of some global requirements file. We already support that, but it's really important to have the same rigor of being able to generate lockfiles like you can if you have homogeneous dependencies. What you then have to do, without requiring people to have a huge amount of boilerplate or a lockfile per library, is to allow for a lockfile per actual distinct resolve that you might want.
And so a binary that definitely needs a particular version of, you know, the MySQL connection library or NumPy can specify that, and that triggers the need for an additional lockfile. If you're homogeneous across the whole repository, you only need one lockfile; if you have these mixed requirements, you need more. So we'd like to support that even more rigorously than we already do. Pants v2 has been built to support remote execution and caching: all of the dependency tracking and file fingerprinting that we do supports these quick, low-latency builds, but it also supports the cases where you have many hundreds or thousands of tests, or native dependencies that need to be built.
And when doing that, particularly on a larger team, it's helpful to be able to run those tests or those builds on a cluster. So Pants already has support for remote execution and remote caching, but we would like to make it even easier to use, in particular cross-platform. So if you're on a macOS laptop, being able to run portions of the build on a remote Linux cluster, for example. Or if you're doing machine learning, being able to take advantage of a cluster that has particular CPU instructions that you don't have locally. So we have a lot of ideas in this general area.
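The lockfile-per-resolve idea can be sketched in miniature: each distinct set of (possibly conflicting) requirement pins is its own "resolve", and the number of lockfiles you need is the number of distinct resolves, not the number of binaries. A toy illustration, with invented binary names and pins:

```python
# Pinned third-party requirements per deployable binary.
binary_requirements = {
    "api_server": frozenset({"numpy==1.19.0", "flask==1.1.2"}),
    "batch_job":  frozenset({"numpy==1.19.0", "flask==1.1.2"}),
    "ml_trainer": frozenset({"numpy==1.18.0"}),  # conflicting NumPy pin
}

# Each distinct requirement set is one resolve and needs its own lockfile.
resolves = set(binary_requirements.values())
print(len(resolves))  # 2: one shared resolve, one for the conflicting pin
```

If `ml_trainer` used the same NumPy pin as the others, everything would collapse into a single homogeneous resolve and a single lockfile, which is the easy case described above.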
[00:44:21] Unknown:
Are there any other aspects of the overall space of monorepos and working within them and the benefits or drawbacks and the specifics of pants and how it enables all these different workflows that we didn't discuss yet that you'd like to cover before we close out the show?
[00:44:36] Unknown:
I would say one other interesting aspect of monorepos is just the ability to use git bisect to find the precise change that might have broken you. That helps more when you have larger teams than smaller teams, but it's built around the fact that you're not committing in a bunch of different repositories; you're able to interact with one. You can create a branch that changes 10 different libraries all at once and commit it atomically. And that means that the builds of master are sort of always green, and you get this continuous, top-to-bottom integration testing, which is really beneficial. The default in a monorepo, assuming you run all of the tests, or run all the tests affected by a particular commit, is that you've tested top to bottom, and you know that commit is trustworthy. Right? You don't have to have some explicit integration step between libraries or binaries.
And I think this is really powerful.
[00:45:38] Unknown:
And I mentioned at the start that I first got involved in Pants working on its Python 3 migration, and also working on Foursquare's migration. To me, the reason I got so interested in Pants, beyond the community, is realizing how a build tool like Pants can be useful when you're doing huge projects like an incremental migration. I'm really excited that we're adding some new functionality in the next week, with a blog post to go with it, about ways that Pants can give you insight into your project for your migrations, so that you can see, for example, what your most-used code is and know what to prioritize.
And you can incrementally keep track of the fact that part of your codebase might still be Python 2 only and part of it might be Python 3 only, and that works with the tool; it doesn't need to be one consistent thing. One of the really cool features we added recently to Pants is that we will run your linters and your tests using the appropriate interpreter for that code. So if you have some Python 2 only code and some Python 3 only code, because we run each test as a separate process and they run in parallel, that's perfectly okay. We will run those Python 2 tests using a Python 2 interpreter, and the Python 3 ones using a Python 3 interpreter.
And what that unlocks is an incremental migration.
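In current Pants versions, this per-target interpreter selection is expressed with the `interpreter_constraints` field on targets in BUILD files. A sketch of what that might look like during a migration (the directory names here are hypothetical):

```python
# legacy/BUILD -- tests in this directory still run under Python 2
python_tests(
    name="tests",
    interpreter_constraints=["CPython==2.7.*"],
)

# modern/BUILD -- tests in this directory run under Python 3
python_tests(
    name="tests",
    interpreter_constraints=["CPython>=3.6"],
)
```

Because each test target carries its own constraint, `./pants test ::` can run both halves of the repository in one invocation, each under the right interpreter.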
[00:46:56] Unknown:
Right. And you get all the same parallelism and caching across this sort of heterogeneous repository.
[00:47:02] Unknown:
And that also brings up the other aspect of monorepos, or even regular project repos, that might have multiple different languages, with maybe the most common case being a Django project where you also have JavaScript that you want to bundle and deploy to S3, for instance. Or you might have some C extension for your Python project that needs to be built before you can actually use it within the rest of your project. So having Pants be able to handle all of those different aspects, I can see as being very valuable.
[00:47:31] Unknown:
Absolutely. And I think that's a definite advantage of monorepos, even if they're small. But as they get larger, you will have multiple languages involved, and if you're a Django project, you immediately will. Less so in that case, but more so with cross-language integrations, the fact that you don't need to version and publish removes this sort of n-by-m problem, where I don't have to have my JavaScript package manager understand PyPI version numbers in order to have JavaScript consume Python, or vice versa. So that's really powerful. And particularly if you were to have your native extensions in this repository, I don't need to know how to package them in a way that is versioned and publishable to PyPI. Although I can, I don't have to maintain that boilerplate just to consume them. And that's really powerful. So we're excited for more language support to be added.
[00:48:27] Unknown:
For anybody who wants to get in touch with either of you or the rest of the Pants team and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the TV show Cursed, which I watched recently on Netflix. There's one season of it out so far, but it's a very interesting and well-done reimagining of the Arthurian legends, taking place before King Arthur comes into power, with the story around the sword. It's just a really well-done story. I had a lot of fun watching it, and I look forward to future seasons. And so with that, I'll pass it to you, Eric. Do you have any picks this week?
[00:49:03] Unknown:
Yeah. I think for me, it goes back to when I was teaching the 12-to-13-year-olds. The way that I taught them was using Tracy the Turtle, through the built-in module in Python called turtle, or turtle graphics. It's a really cool little DSL, if you haven't used it, where you give a little turtle commands like move forward 40 pixels or turn left 90 degrees, and you're able to come up with really awesome designs; a lot of students come up with things like Darth Vader designs and volcanoes. The reason I'm bringing it up is that if you're interested in introducing programming in Python to anyone, especially young people in your life, I think Tracy the Turtle and turtle graphics are an awesome way to get them started.
[00:49:46] Unknown:
Yeah. I definitely second turtle graphics as a great way to get a quick, easy win for somebody who's just learning programming. So definitely recommend that. And so, Stu, what do you have for this week?
[00:49:57] Unknown:
I've really appreciated recently some writing by a blogger; his domain is fasterthanli.me, and his name is Amos. He writes great in-depth posts that explain systems programming top to bottom, and he does some really interesting things and goes way deeper than you would expect. Right? If you don't pay attention to the scroll bar, you'll think that he's not going to go any deeper, but then he does: you're a level lower, you've gone further into the rabbit hole. He writes these really fun blog posts that are very informative, so I've appreciated that.
[00:50:33] Unknown:
Well, thank you both very much for taking the time today to join me and discuss the work that you've been doing with Pants. It's a very interesting project, and one that I've been keeping an eye on for a while and plan to start using for some of my own work soon. So thank you both for all the time and energy you've put into it, and the recent reimagining of it, and I hope you enjoy the rest of your day.
[00:50:51] Unknown:
Thank you. Thank you so much. I look forward to helping out.
[00:50:57] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Eric Arellano and Stu Hood
Eric's Introduction to Python
Stu's Introduction to Python
Overview of the Pants Project
Benefits of Monorepos
Challenges of Monorepos
Using Pants for Build Tooling
Workflow for Setting Up Pants
Integration with Existing Projects
Pants v2 and Plugin API
Building Pants and Design Evolution
Rust Engine and Python Integration
Dropping Support for Other Languages
Structuring Monorepos in Python
Defining Dependencies and Managing Subprojects
Adopting Pants and Common Issues
Innovative Uses of Pants
Lessons Learned from Building Pants
When Pants is the Wrong Choice
Deploying with Pants
Future Plans for Pants
Benefits of Monorepos
Multi-language Support in Monorepos
Contact Information and Picks