Keep Your Code Clean Using pre-commit with Anthony Sottile

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

When you're ready to launch your next app, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, NodeBalancers, and a 200 gigabit network, all controlled by a brand new API, you've got everything you need to scale.

Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.

And visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes.

Your host as usual is Tobias Macey, and today I'm interviewing Anthony Sottile about pre-commit, a framework for managing and maintaining hooks for multiple languages. So, Anthony, could you start by introducing yourself?

Sure. I'm a software developer currently doing developer experience infrastructure at Lyft.

And do you remember how you first got introduced to Python?

Yeah. Sure. My first experience with Python was actually writing an autograder for a university, and then I was immersed in Python at work as a front-end developer that turned full stack and eventually turned into infrastructure.

So before we start getting into pre-commit the project, can we first establish what we're discussing when we talk about a pre-commit hook, and some of the ways that they're useful for developers?

Sure. During the Git life cycle, there are many hooks that run as a side effect of running Git commands. One in particular is the pre-commit step, which happens right before a commit is finished and added to your Git history. During the pre-commit stage, you can run a series of checks that make sure that the code that's being committed is correct or what you want it to be.

And so there are also, I believe, some different life cycle hooks in tools such as Mercurial, and I'm sure in other source control systems. And I'm wondering if there's been any thought of extending the pre-commit framework to support some of those other version control systems.

Yeah. So there was actually an issue early on about adding Mercurial support, but I personally don't use Mercurial, so I didn't actually get a chance to take a stab at it. But there's nothing really stopping it from happening. We just need somebody to invest some time and send a PR.

And so in a number of the projects that I've worked on, there have been various implementations of pre-commit hooks for managing different routines in the Git life cycle before code gets pushed to the remote repository. And a lot of times, they've just been various Bash scripts or maybe Python scripts. And I'm wondering what motivated you to create a more structured framework for managing these various hooks, and some of the benefits that it's provided to you as a developer using that framework?

Yeah. So I actually had my own terrible several-hundred-line Python script that did basically the same thing, and I checked it into all of my repositories. I had this precommit.py script that drifted and was pretty terrible and one-off every time.

And I found that it really just didn't scale to projects that I wanted to work on. Keeping a giant Bash script or giant Python script in sync and installed across a bunch of different developers just really wasn't something that I was interested in doing. So I wanted a way to have a smaller declarative set of metadata that described what hooks should be run, and to allow that to be configured per repository.
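As a rough illustration of that declarative metadata, the sketch below writes a small .pre-commit-config.yaml into a repository. The repository URL and hook ids are real entries from the pre-commit-hooks project, but the pinned rev value and the write_config helper are assumptions made for illustration, not something stated in the conversation.

```python
from pathlib import Path

# Illustrative configuration text. The `rev` pin is an assumed example tag;
# pin whichever release you have actually reviewed.
SAMPLE_CONFIG = """\
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.0.0
    hooks:
    -   id: trailing-whitespace
    -   id: end-of-file-fixer
    -   id: check-added-large-files
"""


def write_config(repo_root: Path) -> Path:
    """Write the sample configuration to a repository root (hypothetical helper)."""
    config_path = repo_root / ".pre-commit-config.yaml"
    config_path.write_text(SAMPLE_CONFIG, encoding="utf-8")
    return config_path
```

Calling write_config(Path(".")) from the top of a checkout would create the file next to the .git directory, which is where the framework looks for it.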

And I know that there are some other attempts at a more general-purpose way of managing pre-commit hooks that have been built by other people using different approaches. And I'm curious how the pre-commit framework that you've built compares to some of those other tools that you may have come across.

Yeah. There are a couple of things that I think pre-commit does exceptionally well, and in some ways it's just the same as the others. The thing that really sets it apart to me is that pre-commit puts a strong focus on user-space installations. So by default, you don't need to install any external tools to run pre-commit. You just need Python, and that's it. Whereas with others, you might need to install flake8 to your system, or install gofmt wherever you need it (although you always have gofmt, so maybe that's not a good example), or some other tool would need to be installed to your system. And managing system dependencies, keeping those in sync, and upgrading or downgrading them is a difficult task, especially in an organization.

The other thing that really sets pre-commit apart is that it tries to support a lot of different programming languages. Several other frameworks focus entirely on JavaScript or entirely on Ruby and don't really give you the flexibility to work with a bunch of different languages.

And so pre-commit tries to enable support for lots of languages at once.

And for somebody who is first getting started with pre-commit, either as a new user trying to set up their own integrations with their repository or as somebody who's joining a team that's already using pre-commit, can you talk through the steps of getting set up and getting the hooks integrated into their workflow?

Yeah. Sure. So the first thing that you would do, of course, is install the pre-commit framework. Pre-commit is available through pip or through Homebrew. And once you have it installed, you'll want to create a pre-commit configuration in your repository. There's a command, pre-commit sample-config, which gives you a very, very basic starting point. You can add a bunch of hooks to that configuration.
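The setup steps in this answer can be sketched as a small bootstrap script. The commands themselves (pip install pre-commit, pre-commit sample-config, pre-commit install) are the tool's real command line; the bootstrap_commands and bootstrap helpers are hypothetical, and the sketch assumes it is run inside an existing Git checkout with pip available.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def bootstrap_commands() -> List[List[str]]:
    """Return the install commands a new user would typically run, in order."""
    return [
        # Install the framework into the current Python environment.
        [sys.executable, "-m", "pip", "install", "pre-commit"],
        # Add the hook script to the repository so it runs on every commit.
        ["pre-commit", "install"],
    ]


def bootstrap(repo_root: Path) -> None:
    """Run the setup steps inside an existing Git checkout (hypothetical helper)."""
    install_tool, install_hook = bootstrap_commands()
    subprocess.run(install_tool, cwd=repo_root, check=True)

    config = repo_root / ".pre-commit-config.yaml"
    if not config.exists():
        # `pre-commit sample-config` prints a starter configuration to stdout.
        result = subprocess.run(
            ["pre-commit", "sample-config"],
            cwd=repo_root, check=True, capture_output=True, text=True,
        )
        config.write_text(result.stdout, encoding="utf-8")

    subprocess.run(install_hook, cwd=repo_root, check=True)
```

An existing configuration file is left untouched, so the script is safe to rerun when a new developer joins a team that already uses the framework.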

The site pre-commit.com/hooks has a list of available hooks. Once you have a configuration set up, you'll run pre-commit install. This will add the hook script to the Git repository such that when you make a commit, it will run, and you just go from there.

And so as we discussed earlier, there are a number of different life cycle events that you could optionally hook into. So I'm wondering what it is about the pre-commit hook in particular that makes it especially useful, and if there are any other events that you have either extended pre-commit to take advantage of or have considered doing that for?

Yeah. Sure. So to me, a commit is the most basic atom in a Git history, and so it makes sense to catch stuff as early as possible. And to me, commit is the place to get that done. Pre-commit, the framework itself, despite being called pre-commit, actually supports a couple of other Git hooks. Currently, those are pre-commit, pre-push, and commit-msg. So if you don't want to lint directly at commit time, you can lint at push time. The commit-msg hook also gives you the opportunity to adjust the commit message or make sure it adheres to your style guidelines.

So discussing pre-commit versus pre-push, I'm thinking of a number of different ways that you could structure your workflow where you would be able to take advantage of one or both of those.

And I know that for pre-commit in particular, because it can be a blocking operation before somebody can continue doing anything else, there are a number of steps or routines that you can put in there that will cause people to want to just bypass those checks entirely, whereas putting them in the pre-push hook might allow them to get into a flow where they're committing a number of times locally before they push, and then have those longer-running operations happen at that point. So I'm wondering if you can talk through some of the ways that you can inadvertently discourage people from taking advantage of the pre-commit framework, and some of the types of checks that are most useful at those different stages?

Absolutely. Yeah. So pre-commit is a blocking operation, like you said, so it's very important to keep those checks as fast as possible. Some other things that I've seen discourage people about checks are, well, one, if they take too long, but two, creating false positives or unnecessary noise. One type of hook that I think is really, really useful, and where the framework really shines, is when a code formatter will rewrite your files directly for you. This is particularly powerful because you don't really have to think about style; it just makes it happen. As for pre-push, it's often useful to move the slower or less important checks to pre-push such that you can git push and walk away, essentially.

Yeah. And that seems like a place where you would want to maybe automatically run your unit tests because, as you said, you can walk away, or you can, say, push and then continue to do other development while it's waiting to do that. And if you were to do that during the pre-commit stage, then you're preventing anybody from making forward progress while the unit test suite runs. And particularly for some projects where that can take upwards of 5 to 10 minutes, it can really slow down your workflow of being able to write some code, commit to make sure that you don't lose your progress, and then continue to do other things.

Yeah. For me, if it takes more than a couple of seconds, it probably doesn't belong in pre-commit, the Git hook.
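One way to apply that rule of thumb is to time each check and decide where it belongs. The sketch below is only an illustration: the two-second budget stands in for "a couple of seconds", and time_check and suggest_stage are hypothetical helpers, not part of pre-commit. Checks that land on the slow side can then run at push time, for example by installing the framework with pre-commit install --hook-type pre-push.

```python
import subprocess
import time
from typing import Sequence

# Assumed budget standing in for "a couple of seconds".
COMMIT_BUDGET_SECONDS = 2.0


def time_check(command: Sequence[str]) -> float:
    """Run a check command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=False)
    return time.perf_counter() - start


def suggest_stage(duration_seconds: float,
                  budget: float = COMMIT_BUDGET_SECONDS) -> str:
    """Suggest which Git hook a check belongs in, based on how long it takes."""
    return "pre-commit" if duration_seconds <= budget else "pre-push"
```

A single timing run is noisy, so in practice it would make sense to time a check a few times on a realistic change before moving it between stages.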

And given that some of the most useful steps for that pre-commit stage are the fast-running ones, and you mentioned some of the linters or code formatters, I'm curious what other types of operations you've seen people put into that stage of the life cycle to help improve the cleanliness and health of the code base or the productivity of the developer?

Yeah. So I think, like you said, the most important ones are your basic code linters that do static analysis and type checking, those sorts of things, as well as formatters, which will make your code adhere to a specific style. Beyond that, I haven't seen too many other strange uses. Probably the most interesting use I've seen in a pre-commit hook is to check for security credentials or secrets that you don't really want to check into a repository, or making sure that you're not accidentally adding really large files, encouraging people to use tools like Git LFS.
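A check like the large-file guard can be sketched as a small script in the shape a hook usually takes: it receives file names as arguments and returns a non-zero exit status to fail the commit. The 500 KB default, the function names, and the use of argparse are illustrative assumptions; the pre-commit-hooks project ships its own, more complete check for this.

```python
import argparse
import os
from typing import List, Optional, Sequence

# Assumed size limit, for illustration only.
DEFAULT_MAX_KB = 500


def find_large_files(filenames: Sequence[str], max_kb: int) -> List[str]:
    """Return the files whose size on disk exceeds max_kb kilobytes."""
    return [
        name for name in filenames
        if os.path.getsize(name) > max_kb * 1024
    ]


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point: file names come in as arguments, exit status goes out."""
    parser = argparse.ArgumentParser()
    parser.add_argument("filenames", nargs="*", help="files being committed")
    parser.add_argument("--max-kb", type=int, default=DEFAULT_MAX_KB)
    args = parser.parse_args(argv)

    too_large = find_large_files(args.filenames, args.max_kb)
    for name in too_large:
        print(f"{name} is larger than {args.max_kb} KB; consider Git LFS")
    # A non-zero return value marks the check as failed.
    return 1 if too_large else 0
```

To use it as a script, the module would end by passing the result of main() to sys.exit, so that the return value becomes the process exit status.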

Probably the strangest one I've seen, though, is a hook that tried to deduplicate files in a repository by replacing duplicates with symlinks.

That was probably the weirdest one that I saw.

And given that these hooks are executing across your code base, there's the potential that they can rewrite or modify files. And, you know, in the case of linters or code formatters, that's beneficial, but there's also the possibility that you could end up putting some code in there that will actually have detrimental effects on your code base. So I'm curious what are some ways that you can guard against that, or at least verify the hooks that you're using, particularly since the pre-commit framework is essentially a package manager of sorts, so that you're not pulling in code that you don't want to be executed against your environment?

Yeah. Unfortunately, I don't have a great answer for this one. One thing that I can talk about here is that pre-commit tries to aim for repeatability wherever it's possible. So you specify a specific version that gets pinned in your pre-commit configuration, and it relies on Git to ensure some amount of, I guess, integrity there. I actually just remembered one thing from earlier. You reminded me of a feature that sets pre-commit apart from other frameworks, and that's that pre-commit tries to smooth out a lot of the sharp edges of Git commits. So one case in particular that a lot of time was spent on is when you commit a partially staged file: it will make sure that you're only checking the code that you're actually checking in and not the unstaged parts.

Yeah. And that's something that I've found very useful in my own uses of the pre-commit framework. For instance, if you're doing something like a 2-to-3 conversion for Python, or running something a little bit more aggressive like the Black formatter against your code, it's not going to run across your entire code base by accident. It's only going to affect the files that you've actually already edited and that you're already going to be verifying, so you don't inadvertently balloon the size of the pull request that you're trying to submit.

Mhmm. Yeah. And when you're committing a merge conflict, it's only going to check the conflicting files and not every single file that everyone else touched upstream. Oh, and I guess one thing I can talk about from two questions ago is that pre-commit will never automatically stage files that it makes changes to, and so this gives the user a chance to double-check the code that gets changed as part of the hook.

Yeah. That's definitely a good way to verify that you're making the changes that you wanted to make.

Yeah. Because linters often have bugs, so it's a good chance to make sure that they're good.

Absolutely. And one of the other things that we've glossed over so far is the multi-language aspect of pre-commit. So I'm wondering what have been some of the most challenging or difficult aspects of enabling support for running checkers and linters for multiple languages, and that are implemented in multiple languages.

Oh, for sure. Yeah. This was actually one of the design decisions from the beginning of the framework: it would segment different programming languages into different language implementations.

Honestly, the hardest part of working with a lot of different programming languages is that you need to learn all the idiosyncrasies of their package managers, their different installation structures, how to make them install in user space, and, probably the worst part, how to make them work on Windows.

Yeah. That's always the sort of forgotten stepchild of developers: oh, it also runs on Windows, kind of.

Yeah. Yeah. For pre-commit, probably the hardest language support has been with Ruby. It still doesn't work on Windows because I haven't figured it out yet. Part of that's probably because I don't have a lot of experience with Ruby, but another part of that is that all of the Ruby management tools are written in Bash, and so they're not super portable. Or they rely on things like overriding cd, which I'm not comfortable with.

Yeah. Yeah. The Ruby community has an inherent love of monkey patching and magic that just happens because of reasons.

Yeah.

That said, it mostly works on Linux and mostly works on macOS, so, yeah, that's our primary use case.

And I'm wondering if you can discuss a bit further how the pre-commit framework itself is actually architected and implemented, and some of the ways that the design has evolved from when you first started working on it.

In a lot of ways, the design of the framework itself hasn't changed much since the beginning. It's always been kind of a two-part system: one part being remote repositories that you can add or remove in a configuration file, and the other part being overriding the configuration in your local file. Beyond that, the place where pre-commit has grown the most is in adding other hooks in the Git life cycle, as well as adding other languages that pre-commit, the framework, supports directly.

And given the fact that, by design, Git won't automatically install the hook into your locally checked-out repository and it requires a manual operation, how do you ensure that all of the developers on a team are actually using the hooks that are configured for the repository, and help to reinforce the behavior of running these checks locally before committing, rather than just relying on code reviewers to take care of what the pre-commit hooks are intended to do automatically?

Yeah. So there is actually a new feature in brand new versions of Git that allows you to set a template for your Git repository.

I actually haven't looked into this too much because that's not the strategy I use to get pre-commit set up. The most common strategy that I use is to add pre-commit install to common operations inside the repository: things like common make targets or tox targets or other things that you would commonly do. I also have a sneaky shell alias that does it for me when I cd into a repository. So I have some terrible magic for myself. And maybe if I find it to be not so terrible, I'll document it and share it with others.

And then before we go into the conversation a bit further on general best practice and hygiene for developers, I'm curious: if you were to completely start over and rewrite the pre-commit framework today, what are some of the things that you would do differently?

And do you think that you would still implement it in Python?

Yeah. So, actually, the first thing that I would probably do is pick a better name. When pre-commit was originally implemented, it really only targeted the pre-commit Git hook, and so it was an appropriate name. But now that pre-commit also supports pre-push, commit-msg, and probably other hooks in the future, the name makes less and less sense. Unfortunately, I have SEO, so it's too hard to go back from there. Another thing that I would probably change if I were to start over is the configuration file language. YAML is a nice configuration language for human-writable things, but pre-commit does a lot of automatic rewriting of the configuration file, which is incredibly difficult in a language like YAML, especially when there aren't any good round-trip rewriters of YAML. As for Python, I think Python is still a good choice for pre-commit. If I were to write it in another language, I would have to consider portability.

And for the platforms that I target, Python is pretty easy to install and generally just works.

As you mentioned, the YAML configuration is how somebody interfaces with setting up and configuring the different plugins and checkers that you run during the pre-commit hook. So I'm curious, for somebody who's interested in writing their own plugin that they want to be able to run during those life cycle events, what is involved in getting that set up, and what is the overall workflow of writing that plugin and then making it available to be installed by pre-commit and used by other people?

Yeah. Sure. So the first step to writing a pre-commit plugin is to get yourself a Git repository. Each plugin in pre-commit is distributed by Git, and so that's how you have to get started. There are some directions on the pre-commit website that tell you how to set up a repository configuration. This will basically provide some metadata out of the box so that pre-commit knows how to install your hook and run it. And then from there, you make an executable and go. There are some other options if you don't want to make it distributable. You can write what's called a repository-local hook, which will just run for your specific repository. This is useful if you need one-off checks or, like, really simple repository-specific checks.

And so the main purpose of using these life cycle hooks is to help enforce code hygiene and developer productivity. And I'm curious what are some of the other methods that you use and that you encourage in your team and other teams that you've worked on to make sure that you're producing high-quality code and repeatable processes?

Cool. Yeah. So some of the things that our team uses to encourage high-quality code are working with tests, considering a change based on code coverage, and, of course, using linters like pre-commit and flake8 and those sorts of things. For Python specifically, we've been doing a lot recently with type annotations such that we can get gradual typing and catch some bugs before they hit test time or before they hit production. Some other kind of broader organizational tools that we've been using recently are tools that provide refactoring across a lot of different repositories.

This allows us to do organization-wide upgrades and ensure that every repository is up to a specific spec.

We've actually used this to add linters to a bunch of repositories, or upgrade old packages to the newest versions, or make sure that things that need to be changed are changed.

And on the pre-commit framework itself, what have you found to be some of the most challenging aspects of building and maintaining the project and the community around it, and some of the things that you've learned that you didn't expect to come across when you first started on that path?

Yeah. So pre-commit itself started as basically taking a list of file names and running an executable with them. The first implementation used Bash and xargs to do a lot of this, but Bash and xargs aren't super portable, and so they've been replaced by Python implementations of the same functionality.
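The core idea of handing a list of file names to an executable, without depending on xargs, can be sketched in Python as follows. This is a simplified illustration and not pre-commit's actual implementation: the partition and run_in_batches names and the 4096-character limit are assumptions chosen for the example.

```python
import subprocess
from typing import List, Sequence

# Assumed cap on command-line length; real limits vary by platform.
MAX_COMMAND_LENGTH = 4096


def partition(command: Sequence[str], filenames: Sequence[str],
              max_length: int = MAX_COMMAND_LENGTH) -> List[List[str]]:
    """Split filenames into batches so each full command stays within max_length."""
    base_length = sum(len(part) + 1 for part in command)
    batches: List[List[str]] = []
    current: List[str] = []
    current_length = base_length
    for name in filenames:
        needed = len(name) + 1
        # Start a new batch when adding this file would exceed the limit.
        if current and current_length + needed > max_length:
            batches.append(current)
            current, current_length = [], base_length
        current.append(name)
        current_length += needed
    if current:
        batches.append(current)
    return batches


def run_in_batches(command: Sequence[str], filenames: Sequence[str]) -> int:
    """Run command once per batch; return 1 if any batch failed, else 0."""
    failed = False
    for batch in partition(command, filenames):
        result = subprocess.run([*command, *batch], check=False)
        failed = failed or result.returncode != 0
    return 1 if failed else 0
```

Passing the arguments as a list rather than through a shell is what keeps this portable: no quoting rules are involved, so file names with spaces behave the same on every platform.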

Probably the strangest things that I've encountered as part of pre-commit are all of the various edge cases in each of the tools that are tested. Pre-commit itself has a very thorough test suite, so it often happens that pre-commit will find a bug inside Git itself and then cause a patch to happen upstream. I think this happened, like, two or three times. I've actually been able to contribute to Git myself in fixing some of these.

Probably the weirdest thing I've seen while working on pre-commit is trying to fix a Unicode bug in Python 2 and hitting a segfault in Python. That was probably the strangest thing. It ended up being a strange edge case in shlex.

And what are some of the plans that you have for the future of pre-commit, whether in terms of feature additions or bug fixes, or plans to try and encourage new contributors to the project, or maybe different directions or a different sort of implementation of the overall approach that you're trying to achieve with pre-commit?

Yeah. So for the future direction of the project, I think most of what needs to happen is more hooks providing metadata so that they're easy to use out of the box, additional language support to encourage other hooks to be integrated into the community, probably also additional Git life cycle hooks, and maybe even other version control systems.

And so for anybody who wants to follow you or get in touch or keep up to date with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the movie Tag because I watched it recently, and it was hilarious.

And the fact that it's based on the true story of a group of grown men who have been playing the same game of tag every month of May for 30 years is pretty amazing. So it's definitely worth the watch. It's a lot of fun. And with that, I'll pass it to you, Anthony. Do you have any picks this week?

That sounds great. I'll have to check that out. Yeah. So one thing that I've been paying attention to recently is this YouTube channel called Yes Theory. Their idea is to be super positive and to seek discomfort. Their idea is that, like, life's too boring; get out there and experience it. There are no strangers in the world, just friends that you haven't met yet, and push positivity wherever possible.

Alright. Well, thank you for that, and thank you for taking the time today to discuss the work you've done with pre-commit. I've used it in numerous projects and across a few different jobs, so I appreciate the work you've put into that. I definitely recommend other people check it out and use it for their own projects. And I hope you enjoy the rest of your evening.

Thanks very much.
