Summary
In a software project, writing code is just one step of the overall lifecycle. There are many repetitive steps, such as linting, running tests, and packaging, that need to be run for each project that you maintain. In order to reduce the overhead of these repetitive tasks, and to simplify the process of integrating code across multiple systems, the use of monorepos has been growing in popularity. The Pants build tool is purpose-built for addressing all of that drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin-oriented architecture adds flexibility for you to customize Pants to your specific needs.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level.
- Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, and 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan.
- Your host as usual is Tobias Macey and today I’m interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Pants is and how it got started?
- What’s the story behind the name?
- What is a monorepo and why might I want one?
- What are the challenges caused by working with a monorepo?
- Why are monorepos so uncommon in Python projects?
- What is the workflow for a developer or team who is managing a project with Pants?
- How does Pants integrate with the broader ecosystem of Python tools for dependency management and packaging (e.g. Poetry, Pip, pip-tools, Flit, Twine, Pex, Shiv, etc.)?
- What is involved in setting up Pants for working with a new Python project?
- What complications might developers encounter when trying to implement Pants in an existing project?
- How is Pants itself implemented?
- How have the design, goals, or architecture evolved since Pants was first created?
- What are the major changes in the v2 release?
- What was the motivation for the major overhaul of the project?
- How do you recommend developers lay out their projects to work well with Python?
- How can I handle code shared between different modules or packages, and reducing the third party dependencies that are built into the respective packages?
- What are some of the most interesting, unexpected, or innovative ways that you have seen Pants used?
- What have you found to be the most interesting, unexpected, or challenging aspects of working on Pants?
- What are the cases where Pants is the wrong choice?
- What do you have planned for the future of the pants project?
Keep In Touch
- Eric
- Eric-Arellano on GitHub
- @EArellanoAZ on Twitter
- Stu
Picks
- Tobias
- Cursed TV show
- Eric
- Stu
- Faster Than Lime blog
Links
- Pants
- Foursquare
- Toolchain
- Bazel build tool
- Ant build tool
- Monorepo
- isort
- Tox
- Poetry
- distutils
- setuptools
- mypy
- Bandit
- Flake8
- Sample Python Pants Project
- gRPC
- Protocol Buffers
- Rust
- GIL == Global Interpreter Lock
- PEP 420
- Blog post about using Pants to migrate from Python 2 to 3
- Pex
- Shiv
- PyOxidizer
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying your code. Teams using feature flags release new software with less risk and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code and 9 other platforms. By adopting ConfigCat, you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules.
You can roll out new features to a subset of your users for beta testing or canary deployments. And with their simple API, clear documentation, and pricing that is independent of your team size, you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan. Your host as usual is Tobias Macey. And today, I'm interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos. So, Eric, can you start by introducing yourself?
[00:02:02] Unknown:
Sure. My name is Eric Arellano. I first got involved with the Pants project about 3 years ago as a summer intern at Foursquare, and my project that summer was to lead Pants's Python 3 migration. I fell in love with the community and kept working on Pants as an open source contributor after that. I then got offered a job at Twitter working on Pants, and I now work on Pants for most of my time as a software engineer at a startup called Toolchain.
[00:02:31] Unknown:
And, Stu, how about yourself?
[00:02:33] Unknown:
Yeah. I'm Stu Hood. I got started working on Pants at Twitter many years ago. As we'll get into later, the Pants project has sort of a long history that I've been involved with for a little while now. Other folks not on the call are Benjy Weinberger and John Sirois; they've been fundamental in getting the project started. But in the past few months, I also have ended up at Toolchain, which is a company doing some great work in this space.
[00:03:00] Unknown:
And, Eric, going back to you, do you remember how you first got introduced to Python?
[00:03:04] Unknown:
Yeah. It was actually an earlier internship that I had the summer of my freshman year. The first language I learned was Java, and I remember learning Python for that internship and being amazed at how much easier programming could be. My love for Python has really continued since then. Actually, in between graduating and working for Toolchain, I was a middle school computer science teacher for a couple months, teaching intro Python to 12 and 13 year olds. And it was a really cool experience being able to teach them, explaining that the same language they were using was the language used to create Instagram.
That was really empowering for a lot of the students, to see how powerful the language is. So we'll talk a little bit later today about why we use Python for Pants and why we're so invested in it. But I think that experience I had, of Python being so easy to get into yet also so powerful, is what makes me keep coming back to Python.
[00:04:02] Unknown:
And, Stu, do you remember how you first got introduced to Python? I do in particular because it was via Pants.
[00:04:08] Unknown:
I was sort of a JVM devotee for a long time; I started on Pants about 8 years ago. And Pants at that time was a modern Python 2 code base, but had not started this transition to Python 3. So I was sort of a hesitant Python user, and I have, at this point, become a devoted Python user because of all the changes in the Python ecosystem, Python 3, and types. And I think in many ways it's ended up being a superior platform. So I'm really enjoying it.
[00:04:38] Unknown:
Can 1 of you start by giving a bit of an overview about what the Pants project is and some of the history behind it?
[00:04:44] Unknown:
Yeah. Sure. So as I alluded to earlier, Pants got started at Twitter something like 8 years ago and was open sourced 6 or 7 years ago. But its history is really intertwined with the history of monorepos. And monorepos are sort of a 10 year arc of changing ideas about how to lay out code, as our understanding has evolved of how larger and larger projects at Google and other large companies are developed. So as monorepos have evolved, or the understanding of monorepos has evolved, the need for monorepo build tooling has become more obvious.
And so Pants started out effectively as an inspired-by clone of Google's Blaze build system and has become its own sort of independent, strong system in the intervening time.
[00:05:37] Unknown:
And before we go too much more into monorepos and Pants itself, can you give a bit of a story about how the name came about?
[00:05:44] Unknown:
Sure. Yeah. So the name actually originally stood for Python Ant. At the very, very beginning, there was a connection to Ant that we haven't had for a long time since then, but the name stuck, as quirky as it is. What also stuck was that since the beginning, we've implemented Pants in Python and had this focus on Python.
[00:06:06] Unknown:
Yeah. And I mentioned John Sirois. He's said a few times that over his dead body would we change the name. I agree. It's a fun name.
[00:06:14] Unknown:
Digging a bit more into monorepos, you mentioned that some of the history comes about because of these large tech companies and the code bases that they have to manage. But I'm wondering if you can dig a bit more into what the benefits of a monorepo are and why I might decide that I want to actually organize my code in that structure versus just having individual repositories for each unit of code.
[00:06:37] Unknown:
These units of code: you used a great word, because many people think of it as a project. You have a project, and it has a name, and it's a pet, if you think of the deploy philosophies now, you know, cattle versus pets. If you have a very large volume of code, you may not be very precisely naming each module per se, but you still want very fine-grained control over those libraries with a minimum amount of boilerplate. And so if you have many libraries and many deployed binaries, some of them will be named and some of them will be very first class, but you don't wanna have to pay this huge cost for each of them to build a brand and, you know, have your own set of custom scripts in the repository to manage it. So people use monorepos to remove a lot of that per-unit-of-code boilerplate.
And one of the ways it removes boilerplate is that you don't need to actually give things globally unique names that can go to PyPI and won't collide with something. And in general, you don't have to version things. So if you're depending on a library within a monorepo and it's 2 or 3 hops away, you do not have to bump versions for each of those units of code in order to make an edit. And you certainly don't have to commit, and you certainly don't have to publish to PyPI, you know, for each of those intervening libraries. And removing that cost makes it really cheap to have more fine-grained units of code, and it sort of allows your code layout to be natural. You can lay it out based on what a good module boundary is rather than what you're willing to, like, attach a particular brand to. And so it's particularly helpful to have a monorepo when you have lots of microservices or lots of notebooks from a data science perspective.
Because your notebooks may, you know, sort of build their own brands, but you have a lot of support code. And maybe you're just doing throwaway experiments. That's sort of naturally going to lead to a repository that I can experiment in without giving these precise names to all of my notebook repositories or something like that.
[00:08:51] Unknown:
For the case where I might be considering a monorepo and I don't know much about what's actually involved in getting it started, what are some of the challenges that I should be aware of before I actually go down the path of committing all of my code into one source code management system and handling the layout that way, versus going what has become the standard route in Python: having my libraries deployed as Python packages that I then pull down at a particular version and use in my various projects, with each project deployed independently and versioned on its own?
[00:09:25] Unknown:
I think monorepos require good tooling. And so the history of monorepo build tools and monorepos is pretty intimately intertwined, because it's hard to have one without the other. The reason you need the tooling is that if you don't have it, you end up manually maintaining sort of partitions of some larger code base. And the CI example is a pretty clear one: when you get to the point that you have thousands of tests, you're gonna find that you wanna run some of them on a different machine, and you don't wanna run the ones that aren't affected by a code change.
And so you need tooling to help you do that so that you don't have to lay out manually in your CI config the various partitions of your tests. These types of tools are designed to make what you do for a single project roughly the same as what you do for a very large project. And so Pants scales down to smaller projects, and I run, you know, pants test :: to test everything under some subdirectory. And then likewise, I can scale that up and run all of those tests, or filter to just some tests, even if I might wanna run them on one machine.
The tooling enables the monorepo.
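As a rough illustration of that kind of affected-test selection, here is a toy sketch. The file names and the hand-written dependency graph are invented for the example; a real tool like Pants derives the graph automatically:

```python
# Toy affected-test selection: given which files changed, pick only the
# tests that (transitively) depend on them. DEPS is hand-written here;
# real build tools compute it from the code.

DEPS = {
    "tests/test_api.py": {"src/api.py"},
    "tests/test_util.py": {"src/util.py"},
    "src/api.py": {"src/util.py"},
}

def affected_tests(changed: set) -> set:
    def transitive(node):
        out = set(DEPS.get(node, ()))
        for dep in set(out):
            out |= transitive(dep)
        return out
    return {t for t in DEPS if t.startswith("tests/")
            and (t in changed or transitive(t) & changed)}

# Editing src/util.py affects both tests (test_api depends on it via api).
print(sorted(affected_tests({"src/util.py"})))
```

With this in place, CI can run only the returned subset instead of partitioning the test suite by hand.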
[00:10:40] Unknown:
The scale factor is something that I think is worth digging into as well, because there are some cases where I need to use a monorepo because I have 500 engineers all working on various pieces of the code, and there are dependencies between them and intimate interlinking, where I need to be able to see all of the different ways that a code change is going to affect all of the downstream dependencies, which can be difficult or impossible if you have them all in separate repos. But what is the smallest scale at which you think it makes sense to use something like Pants for actually managing your build tooling versus just using setup.py or Poetry or, you know, whatever the tool du jour is for your de facto Python project?
[00:11:25] Unknown:
Right. That's a good question. You actually alluded to an important point that I missed, which is that this fine-grained dependency tracking between projects is critical to enable you to know which portion of the repository is relevant to you. And even if it's a small team, if you just have enough code already written, it might be relevant to not run all 100 tests but just one, and have the rest be cached. Pants does as well as it can to minimize the amount of boilerplate that we add to your repository, and we try to get out of the way and allow the tools that we integrate to configure themselves. And so most of the time, you might end up with a config file for something like isort, for example, if you're configuring isort.
And so we add some small amount of boilerplate above and beyond that to say isort is configured. Right? You need to tell Pants that isort is available. We strive to make that boilerplate as light as possible, as thin as possible. And so the cost of adding Pants to your repo should be really minimal, above and beyond, you know, the configuration for the various tools you're using. And in many cases, if tools have good defaults, you basically are toggling one value in your Pants configuration file to say, use Black, because it has effectively no config, and getting this smooth integration.
And so even for really small projects, Pants should be straightforward to get started with and not add much overhead in your repository.
[00:12:58] Unknown:
You're totally right. There are a lot of great tools out there now, like Poetry and tox, that you can use when you have a smaller repository. One of the things that we've heard from users that they really like about Pants is being able to have a uniform interface. In a modern Python project you might be using mypy, and using two or three linters like Bandit and Flake8, and you might be using Black and isort all at the same time. And Pants gives you a uniform interface to run all of those linters and all of those formatters. It's as simple as saying ./pants lint and then the file that you wanna run on, and Pants will orchestrate all of those different linters and formatters that you have set up. It'll run them in parallel for you and cache the results, rather than you having to make six different calls to different tools sequentially.
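A minimal sketch of that fan-out, with asyncio standing in for Pants's real scheduler: the linter names come from the conversation, but the delays and the runner itself are made up for illustration:

```python
import asyncio
import time

# Illustration only: running several linters concurrently behind a single
# command, the way one `./pants lint` invocation does. The delays are
# stand-ins for actually invoking the tools.

async def run_linter(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # pretend to run the real tool
    return f"{name}: ok"

async def lint_all() -> list:
    jobs = [("flake8", 0.05), ("bandit", 0.05), ("isort", 0.05)]
    return await asyncio.gather(*(run_linter(n, d) for n, d in jobs))

start = time.monotonic()
results = asyncio.run(lint_all())
elapsed = time.monotonic() - start
print(results)  # all three finish in roughly one delay, not three
```

Running the three "linters" concurrently takes about as long as the slowest one, rather than the sum of all three sequential calls.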
[00:13:45] Unknown:
For somebody who's actually interested in using Pants, can you dig a bit more into the workflow of what's involved in actually getting it set up, and then actually using it for driving their development process: building the packages and running the tests and all of the different aspects of the software development life cycle?
[00:14:03] Unknown:
Sure. So a major goal with Pants, as Stu was talking about, is that we have minimal boilerplate and also that it's easy to adopt Pants incrementally. We know when you have a preexisting repository that, for some tools, there can be a high barrier to entry to adopting them, and we wanted it to be a goal that you can try out Pants in a small portion of your code base. You can use it, for example, to only run linters, and deal with packaging your code and publishing your code as a separate step.
[00:14:33] Unknown:
We have an example Python repo in which we've strived to reduce the config to the absolute minimum. When you get started with Pants, generally you're just adding a list of tools that you'd like to use. And if you have no additional configuration for each of those tools, such as, you know, Black not having any configuration, you've basically just enabled that tool for the repository. And that now gets integrated smoothly into the Pants format command. Right? And the Pants format command will run as many formatters as you have in a consistent way, sequentially, so that they actually feed the output of one formatter into the other. When you're linting, it will run all those linters in parallel. And so just enabling them, even if you don't further configure isort, for example, is really easy. So the example config file we have is 50 lines, but it could probably be something closer to about 20. And I think we are continuously striving for an empty config file to be useful for you, and so a lot of this can probably be further pruned.
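For reference, a minimal pants.toml along those lines might look like the sketch below. The backend package names follow the Pants 2.x docs of this era and may differ between versions, so treat this as illustrative rather than exact:

```toml
# Illustrative pants.toml: enabling a tool is one backend entry.
[GLOBAL]
backend_packages = [
  "pants.backend.python",             # core Python support
  "pants.backend.python.lint.black",  # enable Black with its defaults
  "pants.backend.python.lint.isort",
  "pants.backend.python.lint.flake8",
]
```

With only this file in place, commands like ./pants fmt and ./pants lint pick up all of the enabled tools.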
[00:15:37] Unknown:
For somebody who has an existing project where they might already be using some of those linters or they might be using a particular process for building the project, how does pants integrate with that broader ecosystem of tools where you already mentioned things like auto formatters and linting, but in particular, the concept of dependency management and ensuring the proper sequence of steps for being able to build the subprojects within a monorepo, for instance.
[00:16:05] Unknown:
Yeah. So there are a couple of different integrations that we have out of the box for packaging your code with Pants. One of them uses what most repos are using, setuptools: Pants will actually auto-generate your setup.py for you and then be able to build a wheel to create the asset that you wanna deploy to PyPI. We find for a lot of users it's really useful that Pants will auto-generate that file, because Pants has this understanding of all of your project's dependencies: you can both depend on external projects, and you can also depend on your own source code. And Pants will see if you have multiple different distributions or multiple different projects in the same monorepo.
Pants will auto-generate in that setup.py that this project depends on that project, and do all that hard work with that mapping for you. We also support generating in the PEX format, which stands for Python executable. It bundles up all of your code into a single zipped file that you can simply run on the command line to start your app; it has everything you need. And we support creating AWS Lambdas. There are a lot of other formats out there, like PyInstaller, that we haven't added yet but are really interested in adding, and our plugin API is designed so that it's easy to write plugins to hook up to whatever format you're using to package your code.
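As a sketch of how those outputs get declared, packaging in Pants is driven by targets in BUILD files. The target and field names below follow Pants 2.x-era documentation and have been renamed across versions (early v2 used python_binary, for instance), so treat them as illustrative:

```python
# Illustrative BUILD file (target and field names vary across Pants versions).

python_distribution(
    name="mylib-dist",
    dependencies=[":mylib"],
    provides=setup_py(
        name="mylib",      # the distribution name published to PyPI
        version="0.1.0",   # Pants fills in dependencies when generating setup.py
    ),
)

pex_binary(
    name="app",
    entry_point="myproject.app:main",  # everything bundled into one runnable .pex
)
```

A single packaging command can then build the wheel and the PEX from these declarations, with the inter-project dependencies filled in automatically.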
[00:17:28] Unknown:
Right. As Eric mentioned, we have a really powerful plugin API. And so the thing that enables the integration with each of Pants's goals, like format my code, lint my code, package my code, is this very powerful plugin API. And so if we don't already support a Python tool that somebody is used to, we can definitely help people get it integrated, and we have great docs for those plugins as well.
[00:17:51] Unknown:
Digging more into Pants itself, I know that there have been some fairly significant changes, particularly recently with the move to version 2. But can you just dig a bit into how Pants is built itself, how you've structured the code, and maybe some of the ways that the overall design and goals of the project have evolved since it was first created?
[00:18:11] Unknown:
In Pants v2, plugins are developed as pure coroutines, or async functions, called rules. And this is a huge improvement from v1 because it means that it's very, very challenging to get things incorrect. You can't forget to indicate that a rule depends on some other code, because the dependencies of rules are actually dependency-injected by the Pants v2 plugin system. And it's important that we know what dependencies the rule code has, because it's critical to make sure that we've gotten your dependencies correct. And when the build tool needs to rerun, it needs correct incremental information to know what's changed since the last time it ran.
So v2, with rules as the core of its plugin API, is dramatically faster than v1 and is able to do more work incrementally. It uses a daemon to keep the outputs of rules that haven't changed warm. And so rerunning Pants after having run it before, with a minor edit, now takes, you know, less than 200 milliseconds to do the actual substantial work. We have more improvements planned.
[00:19:31] Unknown:
So about the main way that Pants is implemented: we use the same API whether you're a plugin author or you're writing core Pants, and we break down each build step into a couple of small composable parts. If you think about running a tool like Black, for example, there are a couple steps to it. We first need to determine what files you wanna run on, and we need to download the actual tool and any plugins you have. And then we need to run the actual subprocess where we've downloaded Black, we have all the files, and now we let Black run like you would normally invoke it on the command line. Each of those steps is modeled through Python, using type hints and using pure functions, which are async functions that we call rules.
And each rule, or each function, says what it needs to run and then what it's going to give back. So if we talk about downloading Black, for example, we need to know the version of Black that we should download. The function will say that it needs this information, and then it will give back the downloaded Black binary that we just got. Because we model each step as a function, we're able to automatically get things like caching: we can see that if the input to this step was the exact same as before, then we can use the cache.
But if any part of the input changed, then we need to rerun this step, and so on.
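A toy model of that idea, with made-up names and none of the real Pants machinery: rules as typed async functions whose results are cached on their inputs, so the expensive "download" step runs only once:

```python
import asyncio
from dataclasses import dataclass

# Toy model of "rules" (invented names, not the real Pants API): each rule
# is a pure async function from typed inputs to a typed output, and the
# engine caches results keyed on those inputs.

@dataclass(frozen=True)
class BlackRequest:
    version: str

@dataclass(frozen=True)
class DownloadedTool:
    name: str
    version: str

CACHE: dict = {}
DOWNLOADS: list = []

async def download_black(req: BlackRequest) -> DownloadedTool:
    # The rule declares what it needs (a version) and what it gives back.
    key = ("download_black", req)
    if key not in CACHE:
        DOWNLOADS.append(req)                 # stand-in for a network fetch
        CACHE[key] = DownloadedTool("black", req.version)
    return CACHE[key]

async def fmt(files: tuple, version: str) -> str:
    tool = await download_black(BlackRequest(version))
    return f"ran {tool.name} {tool.version} on {len(files)} file(s)"

first = asyncio.run(fmt(("a.py", "b.py"), "20.8b1"))
second = asyncio.run(fmt(("c.py",), "20.8b1"))   # same version: cache hit
print(first, second, len(DOWNLOADS))
```

Because the inputs are frozen dataclasses, they hash consistently and make natural cache keys; the second invocation reuses the "downloaded" tool instead of fetching it again.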
[00:21:05] Unknown:
And one of the fundamental differences between v1 and v2 is that, for that list of inputs, we have moved from sort of an include-based fingerprinting model, where you have to explicitly list the things you depend on and you might get that wrong, to an exclude-based model where everything is included by default. We know what dependencies you used in your rule, and it's impossible to miss one. But then you would need to do extra work to remove something from a fingerprint, and this allows the rule fingerprinting to be much more accurate and your cache keys to be more accurate.
[00:21:42] Unknown:
And my understanding too is that with the move to version 2, you've done a fairly major overhaul of the source code of the project, and you're using Rust for sort of the core engine, with Python as the layer on top of that to handle all of the plugins and the interoperability. I'm wondering if you can talk through the motivation for that major shift in how the project was built and some of the downstream impacts that you think that's going to have on the future of the project?
[00:22:12] Unknown:
So the Rust engine has enabled us to execute the scheduling of all those pure coroutines while minimizing the amount of time that we're actually holding the GIL. And so a large amount of Pants's I/O and scheduling, downloading of files, sort of snapshotting of inputs, is all accomplished in native code. And so we spend relatively little time with the GIL acquired, aside from when our plugins' rule code is running. And that's clearly the right choice for us, because Python is just such a great language to be writing plugins in. We found that Rust is a great addition to Python in the sense that you can provide a great native Python API with a safe, concurrent, fast core.
[00:22:58] Unknown:
Another aspect of this switch to the version 2 release is that you've actually decided to drop support for the large variety of languages and build systems that you had in the previous version. And I'm wondering why that was an acceptable trade off, and what your thoughts are on the future of Pants in terms of the broader ecosystem of working with different languages and monorepos.
[00:23:24] Unknown:
So moving to v2, yeah, we made the hard decision to drop support for some languages. And as we mentioned, it is sort of a fundamentally new tool. We felt that that was the right decision just because some of those languages are reasonably well served by other tools right now, and Python represented a really underserved use case. Pants v2 is fully generic, and so we expect to add back support for all of those languages in the near term. And in fact, we'd love people's feedback on which languages they'd like back first. But this focus on Python, we think, has enabled us to better serve those users in v2, and we think the results speak for themselves.
[00:24:07] Unknown:
Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1 on 1 video calls, a tuition-back guarantee that means you don't pay until you get a job, resume preparation, and interview assistance, there's no reason to wait.
Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level. And digging more into the specifics of Python, for people who are looking to lay out their project as a monorepo, or maybe combine multiple different repositories into one to be able to take advantage of the code sharing and discoverability aspects: what have you found to be some useful strategies for determining how to structure those repositories, so you can effectively have shared modules and libraries, and narrowly scope the third-party dependencies so that you don't end up having to pull in 15 other packages for one library just so that you can actually use one aspect of that code base?
[00:25:37] Unknown:
So one of the main goals with Pants, as we were talking about, is being able to adopt it easily when you have an existing repository. So we don't want you to have to, and you should never have to, rearchitect or change how your entire project is set up. There are hundreds of different ways that people set up their monorepo, and Pants is designed to be flexible: whether you have multiple different top-level folders or everything lives under one folder, whether you have tests living right next to their source files or in an entirely separate part of your project, Pants can work with all of those different layouts.
[00:26:18] Unknown:
One of the things that enables that flexibility is minimal boilerplate. Dependency inference is a feature in Pants that is really important because it enables you to have very precise information about what code depends on what other code, without needing to repeat all of that in BUILD files. And so here's the comparison, if you're not familiar with a BUILD file: if you had hundreds of units of code, of libraries or whatever unit, in one repository, and you needed to track precisely the dependencies between all of them, you would have import statements that indicate that you use a particular library, but then you would also, in your build metadata, maybe your setup.py or in this case a BUILD file, need to repeat your import statements and say, well, I depend on this version of this named library. And dependency inference removes the need to do that, because Pants is aware of your import statements, and it actually parses import statements to determine what other libraries code depends on. And this is really important because it allows for this very, very precise information that can't go out of date. Right? Assuming you have linters enabled, which are trivial to enable, you can't have an unused import statement.
And an unused import statement would represent a dependency that went into your binary that you didn't need. If you don't have unused imports, and your dependency information is precise because it matches your import statements, your binaries end up with exactly the things that are required. They're minimal. They're very accurate. And having these accurate, tiny binaries also means that cache keys are tiny. Pants strives to cache and carefully invalidate all of the work that it does to save you time. And if you have, you know, hundreds of tests, thousands of tests, we're actually caching those tests as well to avoid having to rerun them. So this fine-grained dependency tracking allows for these smaller binaries.
It allows for more cache hits, and it doesn't require the boilerplate that you might expect, where you have to repeat the information that's already in your import statements. So we're really happy with how dependency inference has come out.
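The core idea of parsing import statements to infer dependencies can be sketched in a few lines of Python. This is not Pants's actual implementation, just a minimal illustration using the standard library `ast` module:

```python
import ast

def infer_imports(source: str) -> set:
    """Return the set of top-level module names imported by a source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports; they resolve within the same project.
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules

source = """
import os
import numpy as np
from requests.adapters import HTTPAdapter
from . import sibling
"""
print(sorted(infer_imports(source)))  # ['numpy', 'os', 'requests']
```

A real build tool would then map each inferred module name to the library or requirement that provides it, which is where the precision Stu describes comes from.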
[00:28:40] Unknown:
And digging further into that, because that's one of the things that I'm personally very interested in, is how you would approach defining the dependencies at the repository level. For instance, I've seen a number of projects that may be using distutils or setuptools, where they have the setup.py, or Poetry, where they have the pyproject.toml, and they need to have multiple of those defined at different layers of the project: I have a setup.py for this subpackage, and then I have a setup.py for another subpackage, but then I have a requirements.txt at the top layer to be able to do development of the repository at large.
Or, you know, with Poetry, I have a pyproject.toml that installs all the dependencies for everything in the project, or I have to have one that scopes the dependencies to the specific subpackage. How would I actually go about handling that type of situation with Pants, being able to have all the dependencies installed in my virtualenv for development, but then only include the actual third-party packages that I need in the built project?
[00:29:46] Unknown:
The easiest case in that type of environment is where everything is actually a subset. If all of the subprojects, or all of the project's libraries within one repository, actually use a subset of some larger consistent set of libraries, it's very easy to say this binary depends on a subset of the larger set of libraries. The way Pants operates is that you have perhaps a single requirements.txt, but each binary might only use a small portion of it. Pants also supports, though, having different binaries or libraries with different non-overlapping, non-subset requirements.
And so you could have a binary that has a specialized version of NumPy versus some other binary, and that works fine. The only thing that you then have to do is be specific about which portion of the repository you're operating on. So, for example, if you're loading an IDE, the version of NumPy that is in use for a particular portion of the code might be different than it is for another portion of the code. If it's a pure subset, it's very straightforward, because we're taking just the subset of the larger set of requirements. But we also support cases where they have different requirements.
[00:31:09] Unknown:
You brought up a great case about the problem with a lot of Python projects where you have that duplication of your requirements in your setup.py file while also maintaining those same dependencies in your requirements.txt. That is a specific problem that we solve: Pants will auto-generate your setup.py for you based on what your project's dependencies are. So you only need to define your requirements once, in that requirements.txt, and your entire project can use that. And then, like Stu was talking about, we take the subset, we see that this project is using a subset of those global requirements,
[00:31:48] Unknown:
and we'll generate the setup.py file for you with exactly what you need. Yeah. That definitely solves a big problem that I've seen where I might have a project that has two distinct modules that are each doing their own thing. And then I have a library folder that has some helper elements that I might use in one of the subprojects and might use in the other, and there's some subset of those that are shared. And the library module has a specific set of dependencies that I pull in, where maybe one of the modules within that library is handling MySQL connections, so I need to have the PyMySQL package installed, and then, you know, another one might require interacting with MongoDB versus the other one that uses Redis. The disjoint aspect is Mongo and Redis. So package a gets the MySQL dependency and the Mongo dependency, and package b gets the MySQL dependency and the Redis dependency. And it sounds like I just have one requirements.txt or pyproject.toml at the top level that installs all of those for my development environment, but then when Pants goes to actually build those projects, it just uses the subset of the modules for each of packages a and b.
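The scenario just described can be made concrete with a small sketch. Assuming a single top-level requirements file and per-module third-party usage (the module and package names here are invented for illustration, and Pants infers this mapping from imports rather than having you declare it), the subset each deployable package needs is the union over the library modules it uses:

```python
# Third-party requirements used by each internal library module.
module_requirements = {
    "lib.mysql_helpers": {"PyMySQL"},
    "lib.mongo_helpers": {"pymongo"},
    "lib.redis_helpers": {"redis"},
}

# Which library modules each deployable package imports.
package_modules = {
    "package_a": ["lib.mysql_helpers", "lib.mongo_helpers"],
    "package_b": ["lib.mysql_helpers", "lib.redis_helpers"],
}

def requirements_for(package: str) -> set:
    """Union of the third-party requirements of every module the package uses."""
    return set().union(*(module_requirements[m] for m in package_modules[package]))

print(sorted(requirements_for("package_a")))  # ['PyMySQL', 'pymongo']
print(sorted(requirements_for("package_b")))  # ['PyMySQL', 'redis']
```

The development virtualenv gets the full superset from requirements.txt, while each built artifact carries only its computed subset.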
[00:32:57] Unknown:
Exactly. And if you're not happy with one of those requirements from the global superset, then you can swap that out and say, I don't want to use this normal version; instead, for only this project, I want to use this version that uses CUDA and the GPU.
[00:33:14] Unknown:
Yeah. It's definitely very helpful. And so for people who are starting to adopt Pants and use it in their projects, are there any potential edge cases or common points of confusion that you've seen that you think people should be aware of?
[00:33:29] Unknown:
Yeah. So when people begin to adopt Pants, we're enabling a lot of tooling that people might have been invoking manually. For instance, if you were doing codegen manually, generating gRPC stubs that you were then going to load, you might have had scripts for that, you might not have. But you then need to tell your IDE that those are a thing. So we enable more sets of tooling than people might have been used to using, but we wouldn't recommend actually checking the generated gRPC code into your repository. The build tool manages this generated code and makes sure that it's accurate, but you then need to integrate with other tools that consume your build. So we've done a lot of work, and we will do more work, to make sure that Pants smoothly integrates with things like IDEs.
But sometimes people have bumps on their way to getting a good IDE experience.
[00:34:21] Unknown:
For people who are using pants, what are some of the most interesting or unexpected or innovative ways that you've seen it used?
[00:34:28] Unknown:
I think the most exciting thing for me to see is the plugins that people have been writing. We only finished writing our plugin docs about a month ago, and already we've had a couple of really cool plugins written by people in a relatively short time. We had a user who showed up on Slack and within two days wrote a Python plugin. We had another who got Jupyter working for them, which we're planning on adding first-class support for soon, but they were able to get it working with their specific project's requirements. We've had a couple of users write Docker plugins that leverage a lot of the benefits of Pants, like running in parallel.
And we've had people also build on top of what we were talking about, that Pants will auto-generate that setup.py for you. We have a hook where you can add your own plugin logic to generating that setup.py. So we've had some users who are using git submodules to dynamically come up with what their package's version is, and a couple of people are reading from files, for example, to determine that information. That's been really exciting and validating for us, because a major motivation for this v2 project was to make it much easier to write a plugin if you want to. A big insight we had is that there are so many different workflows and builds out there that no matter how hard we try to make the core Pants experience work out of the box, we will always have users with a certain workflow that we're not able to reach, and the way we reach them is by having that very powerful plugin API written in Python.
[00:36:06] Unknown:
And in your own work building and rebuilding and promoting the pants project, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
[00:36:17] Unknown:
For me, working on this v2 project with Pants has been a constant reminder of how important it is to not only understand your own project and the Pants tool, but also to have a deep understanding of the way that Python and Python's tooling work. We spend a lot of time, for example, iterating on __init__.py files. As many of you have probably seen, __init__.py files are pretty complex in Python, serving two purposes: the first is to indicate what is a module or what is a package, if you're not using PEP 420 namespace packages, and the second is that sometimes people put content in their __init__.py file.
We went through a lot of iterations to make sure that Pants always behaves like Python does in the real world: we will always include your __init__.py files, not only for the current directory but also for all of your ancestor packages. And if you have content in an __init__.py file, we make sure that, even if you're not importing that file directly, we properly pull in all of the dependencies of any of those ancestors. You might not have an import statement like you normally would, but if there is code there, then those imports need to work, and you have a dependency. So that, I think, has been the most interesting for me: when we think we have a solid understanding of Python, finding that we need to read the PEP a little bit more closely to really, truly understand the ecosystem.
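The two roles of __init__.py mentioned here can be demonstrated directly. Under PEP 420, a directory without an __init__.py is still importable as a namespace package, while a directory with one is a regular package whose __init__.py code runs as a side effect of import. A quick sketch (the package names are made up):

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# A regular package: has an __init__.py, whose code runs on import.
os.makedirs(os.path.join(root, "regular_pkg"))
with open(os.path.join(root, "regular_pkg", "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

# A PEP 420 namespace package: no __init__.py at all.
os.makedirs(os.path.join(root, "ns_pkg"))
with open(os.path.join(root, "ns_pkg", "mod.py"), "w") as f:
    f.write("VALUE = 'hello'\n")

sys.path.insert(0, root)

regular = importlib.import_module("regular_pkg")
print(regular.VALUE)       # 42 -- the __init__.py code ran on import
print(regular.__file__)    # path ending in regular_pkg/__init__.py

ns_mod = importlib.import_module("ns_pkg.mod")
print(ns_mod.VALUE)        # hello -- importable despite no __init__.py
ns = importlib.import_module("ns_pkg")
print(getattr(ns, "__file__", None))  # None -- namespace packages have no file
```

Because importing `regular_pkg.anything` executes that __init__.py, any imports inside it are real dependencies, which is exactly why a build tool has to pull in the dependencies of every ancestor package's __init__.py.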
[00:37:46] Unknown:
And I would say that it's probably not unexpected, it's a classic issue, but making something that's powerful and performant, and also really easy to use, is just a constant struggle. Like, what is worth putting in a plugin developer's face because it's critical for them to understand, versus hiding? I think we've hit a pretty good spot. I can certainly go back and look at earlier versions of the rule API from a year or two ago, before we were using coroutines. There was this whole declarative syntax that you used to declare what your dependencies were, and it was awkward. It was horrible. We've ended up with something that's massively better and very natural, and a lot of that is thanks to, you know, type hints in Python 3 and a little bit of introspection.
And I think the API now is at a really sweet spot, but it's been a bumpy road.
[00:38:38] Unknown:
For people who are considering using Pants or they're just getting started and trying to figure out what build tools and what development cycles they wanna have for their projects, what are the cases where Pants is the wrong choice?
[00:38:51] Unknown:
We strive to minimize the amount of boilerplate necessary to get started with Pants. And so we would hope that even really small projects that don't anticipate having multiple, you know, deploy binaries or multiple published libraries within one repository might still be a good fit for Pants. And we will always keep that in mind, to make sure that we still add value for the single-project repository without, you know, too much of a boilerplate cost. But it is the case that we add some amount of configuration above what you might otherwise use.
So as I said, we'll continue to strive to make sure that our overhead in terms of configuration is low, that our overhead in terms of runtime performance is low, and that we're really adding value even for those tiny repositories. But it's certainly the case that the more code and the more deploy binaries or libraries you have in a repository, the more value we're going to add for you. And, obviously, the concurrency and caching are great from a latency perspective as you're just iterating on a small project to reformat or lint your code, or type check and test it.
[00:40:00] Unknown:
And the deploy binary piece actually reminds me of another thing that I forgot to ask about. When you're working on the project, there's the matter of dependencies and linting and maintaining code quality, but then there's the case where you actually have something that you want to push to production, or possibly to PyPI, and particularly when you want a deployable binary. What are some of the useful approaches for that, and how does Pants help support it? I know that out of the box it has strong support for things like PEX, but there are also projects like Shiv or the newer PyOxidizer, and there are some considerations that developers might want to look at there.
[00:40:43] Unknown:
Yeah. So Pants has a long history of integration with PEX. But particularly in v2, we are fully pluggable, so there's no reason for PEX to be the only way that you build deployed binaries. We don't have support for Shiv or PyOxidizer yet, but as a project with a significant amount of Rust, we would love to work with somebody to integrate PyOxidizer, or perhaps do it ourselves. There's a distinct possibility that Pants itself will be deployed using PyOxidizer at some point in the future, given that we have 35% of our code in Rust. And so we'd love to help integrate those.
The v2 plugin API makes it a level playing field. We don't have a particular favorite in mind when it comes to deploys.
[00:41:29] Unknown:
And in terms of other plans that you have for the future, or new capabilities or contributions that would help move it forward, what are some of the overall plans that you have for the project?
[00:41:42] Unknown:
We spent the past year, I'd say, really focusing on getting out this 2.0 release, which represents the first stable release with the v2 engine and v2 rewrite. So now we are turning our attention to asking the community where they would like us to focus next. There are a lot of different ideas that we have and that users are submitting, including support for new languages like Java and Go, but also supporting different Python workflows more robustly, such as better integration with Jupyter. And we were just talking about some different deploy binaries that we would like to support.
[00:42:20] Unknown:
So one other issue that's close to my heart is actually improving that case you were mentioning earlier, where you have heterogeneous dependencies. Right? Not every project within the repository is just a subset of some global requirements file. We already support that, but it's really important to have the same rigor of being able to generate lockfiles like you can if you have homogeneous dependencies. What you then have to do, without requiring people to have a huge amount of boilerplate or a lockfile per library, is to allow for a lockfile per actual distinct resolve that you might want.
And so a binary that definitely needs a particular version of, you know, the MySQL connection library or NumPy can specify that, and that triggers the need for an additional lockfile. If you're homogeneous across the whole repository, you only need one lockfile; if you have these mixed requirements, you need more. So we'd like to support that even more rigorously than we already do. Pants v2 has been built to support remote execution and caching: all of the dependency tracking and file fingerprinting that we do supports these quick, low-latency builds, but it also supports the cases where you have many hundreds or thousands of tests, or native dependencies that need to be built.
And when doing that, particularly on a larger team, it's helpful to be able to run those tests or those builds on a cluster. So Pants already has support for remote execution and remote caching, but we would like to make it even easier to use, in particular cross-platform. So if you're on a macOS laptop, being able to run portions of the build on a remote Linux cluster, for example. Or if you're doing machine learning, being able to take advantage of a cluster that has particular CPU instructions that you don't have locally. So we have a lot of ideas in this general area.
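The lockfile-per-resolve idea can be sketched in miniature: each distinct set of (possibly conflicting) requirement pins is its own "resolve", and the number of lockfiles you need is the number of distinct resolves, not the number of binaries. A toy illustration, with invented binary names and pins:

```python
# Pinned third-party requirements per deployable binary.
binary_requirements = {
    "api_server": frozenset({"numpy==1.19.0", "flask==1.1.2"}),
    "batch_job":  frozenset({"numpy==1.19.0", "flask==1.1.2"}),
    "ml_trainer": frozenset({"numpy==1.18.0"}),  # conflicting NumPy pin
}

# Each distinct requirement set is one resolve and needs its own lockfile.
resolves = set(binary_requirements.values())
print(len(resolves))  # 2: one shared resolve, one for the conflicting pin
```

If `ml_trainer` used the same NumPy pin as the others, everything would collapse into a single homogeneous resolve and a single lockfile, which is the easy case described above.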
[00:44:21] Unknown:
Are there any other aspects of the overall space of monorepos and working within them and the benefits or drawbacks and the specifics of pants and how it enables all these different workflows that we didn't discuss yet that you'd like to cover before we close out the show?
[00:44:36] Unknown:
I would say one other interesting aspect of monorepos is just the ability to use git bisect to find the precise change that might have broken you. That helps more when you have larger teams than smaller teams, but it's built around the fact that you're not committing in a bunch of different repositories; you're able to interact with one. You can create a branch that changes 10 different libraries all at once and commit it atomically. And that means that the builds of master are sort of always green, and you get this continuous, top-to-bottom integration testing, which is really beneficial. The default in a monorepo, assuming you run all of the tests, or run all the tests affected by a particular commit, is that you've tested top to bottom, and you know that commit is trustworthy. Right? You don't have to have some explicit integration step between libraries or binaries.
And I think this is really powerful.
[00:45:38] Unknown:
And I mentioned at the start that I first got involved in Pants working on its Python 3 migration, and also working on Foursquare's migration. To me, the reason I got so interested in Pants, beyond the community, is realizing how a build tool like Pants can be useful when you're doing huge projects like an incremental migration. I'm really excited that we're adding some new functionality in the next week, with a blog post to go with it, about ways that Pants can give you insight into your project for your migrations, so that you can see, for example, what your most-used code is and know what to prioritize.
And you can incrementally keep track of the fact that part of your codebase might still be Python 2 only and part of it might be Python 3 only, and that works with the tool; it doesn't need to be one consistent thing. One of the really cool features we added recently to Pants is that we will run your linters and your tests using the appropriate interpreter for that code. So if you have some Python 2 only code and some Python 3 only code, because we run each test as a separate process and they run in parallel, that's perfectly okay. We will run those Python 2 tests using a Python 2 interpreter, and the Python 3 ones using a Python 3 interpreter.
And what that unlocks is an incremental migration.
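In current Pants versions, this per-target interpreter selection is expressed with the `interpreter_constraints` field on targets in BUILD files. A sketch of what that might look like during a migration (the directory names here are hypothetical):

```python
# legacy/BUILD -- tests in this directory still run under Python 2
python_tests(
    name="tests",
    interpreter_constraints=["CPython==2.7.*"],
)

# modern/BUILD -- tests in this directory run under Python 3
python_tests(
    name="tests",
    interpreter_constraints=["CPython>=3.6"],
)
```

Because each test target carries its own constraint, `./pants test ::` can run both halves of the repository in one invocation, each under the right interpreter.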
[00:46:56] Unknown:
Right. And you get all the same parallelism and caching across this sort of heterogeneous repository.
[00:47:02] Unknown:
And that also brings up the other aspect of monorepos, or even regular project repos, that might have multiple different languages, with maybe the most common case being a Django project where you also have JavaScript that you want to bundle and deploy to S3, for instance. Or you might have some C extension for your Python project that needs to be built before you can actually use it within the rest of your project. So having Pants be able to handle all of those different aspects, I can see as being very valuable.
[00:47:31] Unknown:
Absolutely. And I think that's a definite advantage of monorepos, even if they're small. But as they get larger, you will have multiple languages involved, and if you're a Django project, you immediately will. Less so in that case, but more so with cross-language integrations, the fact that you don't need to version and publish removes this sort of n-by-m problem, where I don't have to have my JavaScript package manager understand PyPI version numbers in order to have JavaScript consume Python, or vice versa. So that's really powerful. And particularly if you were to have your native extensions in this repository, I don't need to know how to package them in a way that is versioned and publishable to PyPI. Although I can, I don't have to maintain that boilerplate just to consume them. And that's really powerful. So we're excited for more language support to be added.
[00:48:27] Unknown:
For anybody who wants to get in touch with either of you or the rest of the Pants team and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the TV show Cursed, which I watched recently on Netflix. There's one season of it out so far, but it's a very interesting and well-done reimagining of the Arthurian legends, taking place before King Arthur comes into power, with the story around the sword. It's just a really well-done story. I had a lot of fun watching it, and I look forward to future seasons. And so with that, I'll pass it to you, Eric. Do you have any picks this week?
[00:49:03] Unknown:
Yeah. I think for me, it goes back to when I was teaching the 12-to-13-year-olds. The way that I taught them was using Tracy the Turtle, through the built-in module in Python called turtle, or turtle graphics. It's a really cool little DSL, if you haven't used it, where you give a little turtle commands like move forward 40 pixels or turn left 90 degrees, and you're able to come up with really awesome designs; a lot of students come up with things like Darth Vader designs and volcanoes. The reason I'm bringing it up is that if you're interested in introducing programming in Python to anyone, especially young people in your life, I think Tracy the Turtle and turtle graphics are an awesome way to get them started.
[00:49:46] Unknown:
Yeah. I definitely second turtle graphics as a great way to get a quick, easy win for somebody who's just learning programming. So definitely recommend that. And so, Stu, what do you have for this week?
[00:49:57] Unknown:
I've really appreciated recently some writing by a blogger; his domain is fasterthanli.me, and his name is Amos. He writes great in-depth posts that explain systems programming top to bottom, and he does some really interesting things and goes way deeper than you would expect. Right? If you don't pay attention to the scroll bar, you'll think that he's not going to go any deeper, but then he does: you're a level lower, you've gone further into the rabbit hole. He writes these really fun blog posts that are very informative, so I've appreciated that.
[00:50:33] Unknown:
Well, thank you both very much for taking the time today to join me and discuss the work that you've been doing with Pants. It's a very interesting project, and one that I've been keeping an eye on for a while and plan to start using for some of my own work soon. So thank you both for all the time and energy you've put into it, and the recent reimagining of it, and I hope you enjoy the rest of your day.
[00:50:51] Unknown:
Thank you. Thank you so much. I look forward to helping out.
[00:50:57] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Eric Arellano and Stu Hood
Eric's Introduction to Python
Stu's Introduction to Python
Overview of the Pants Project
Benefits of Monorepos
Challenges of Monorepos
Using Pants for Build Tooling
Workflow for Setting Up Pants
Integration with Existing Projects
Pants v2 and Plugin API
Building Pants and Design Evolution
Rust Engine and Python Integration
Dropping Support for Other Languages
Structuring Monorepos in Python
Defining Dependencies and Managing Subprojects
Adopting Pants and Common Issues
Innovative Uses of Pants
Lessons Learned from Building Pants
When Pants is the Wrong Choice
Deploying with Pants
Future Plans for Pants
Benefits of Monorepos
Multi-language Support in Monorepos
Contact Information and Picks