Summary
Maintaining the health and well-being of your software is a never-ending responsibility. Automating away as much of it as possible makes that challenge more achievable. In this episode Anthony Sottile describes his work on the pre-commit framework, which simplifies the process of writing and distributing functions that make sure you only commit code that meets your definition of clean. He explains how it supports tools and repositories written in multiple languages, how it helps enforce team standards, and how you can start using it today to ship better software.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at podcastinit.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Anthony Sottile about pre-commit, a framework for managing and maintaining hooks for multiple languages
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what a pre-commit hook is and some of the ways that they are useful for developers?
- What was your motivation for creating a framework to manage your pre-commit hooks?
- How does it differ from other projects built to manage these hooks?
- What are the steps for getting someone started with pre-commit in a new project?
- Which other event hooks would be most useful to implement for maintaining the health of a repository?
- What types of operations are most useful for ensuring the health of a project?
- What types of routines should be avoided as a pre-commit step?
- Installing the hooks into a user’s local environment is a manual step, so how do you ensure that all of your developers are using the configured hooks?
- What factors have you found that lead to developers skipping or disabling hooks?
- How is pre-commit implemented and how has that design evolved from when you first started?
- What have been the most difficult aspects of supporting multiple languages and package managers?
- What would you do differently if you started over today?
- Would you still use Python?
- For someone who wants to write a plugin for pre-commit, what are the steps involved?
- What are some of the strangest or most unusual uses of pre-commit hooks that you have seen?
- What are your plans for the future of pre-commit?
Keep In Touch
- asottile on GitHub
- @codewithanthony on Twitter
- anthonywritescode on Twitch
- anthonywritescode on YouTube
Picks
- Tobias
- Anthony
Links
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200 gigabit network, all controlled by a brand new API, you've got everything you need to scale. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute. And visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. Your host as usual is Tobias Macey, and today I'm interviewing Anthony Sottile about pre-commit, a framework for managing and maintaining hooks for multiple languages. So, Anthony, could you start by introducing yourself?
[00:00:58] Unknown:
Sure. I'm a software developer currently doing developer experience infrastructure at Lyft. And do you remember how you first got introduced to Python? Yeah. Sure. My first experience with Python was actually writing an autograder for a university, and then I was immersed in Python at work as a front-end developer that turned full stack and eventually turned into infrastructure.
[00:01:15] Unknown:
So before we start getting into pre-commit the project, can we first establish what we're discussing when we talk about a pre-commit hook, and some of the ways that they're useful for developers?
[00:01:28] Unknown:
Sure. During the Git life cycle, there are many hooks that run as a side effect of running Git commands. One in particular is the pre-commit step, which happens right before a commit is finished and added to your Git history. During the pre-commit stage, you can run a series of checks that make sure that the code being committed is correct, or is what you want it to be.
[00:01:52] Unknown:
And so there are also, I believe, similar life cycle hooks in tools such as Mercurial, and I'm sure in other source control systems. I'm wondering if there's been any thought about extending the pre-commit framework to support some of those other version control systems.
[00:02:08] Unknown:
Yeah. So there was actually an issue early on about adding Mercurial support, but I personally don't use Mercurial, so I didn't get a chance to take a stab at it. But there's nothing really stopping anyone from making it happen. We just need somebody to invest some time and send a PR.
[00:02:31] Unknown:
And so in a number of the projects that I've worked on, there have been various implementations of pre-commit hooks for managing different routines in the Git life cycle before code gets pushed to the remote repository. A lot of times, they've just been various Bash scripts or maybe Python scripts. I'm wondering what motivated you to create a more structured framework for managing these various hooks, and some of the benefits that it has provided to you as a developer using that framework?
[00:02:57] Unknown:
Yeah. So I actually had my own terrible, several-hundred-line Python script that did basically the same thing. I checked it into all of my repositories as a precommit.py script that drifted, was pretty terrible, and was one-off every time. I found that it really just didn't scale to the projects that I wanted to work on. Keeping a giant Bash script or giant Python script in sync and installed across a bunch of different developers just wasn't something that I was interested in doing. So I wanted a way to have a smaller, declarative set of metadata that described what hooks should be run, and to allow that to be configured per repository.
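For comparison, the kind of hand-rolled precommit.py described here can be sketched as below. This is a hypothetical, minimal example, not Anthony's script: Git runs an executable file at .git/hooks/pre-commit before recording a commit, and a non-zero exit status aborts the commit. The helper names (staged_files, find_problems, main) and the two checks are illustrative assumptions.

```python
from __future__ import annotations

import subprocess
import sys


def staged_files() -> list[str]:
    """Names of added, copied, or modified files in the index (deleted files are skipped)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [name for name in out.split("\0") if name]


def find_problems(files: dict[str, str]) -> list[str]:
    """Report trailing whitespace and leftover merge-conflict markers, one message per line."""
    problems = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if line != line.rstrip():
                problems.append(f"{name}:{lineno}: trailing whitespace")
            if line.startswith(("<<<<<<< ", ">>>>>>> ")):
                problems.append(f"{name}:{lineno}: merge conflict marker")
    return problems


def main() -> int:
    # This reads the working-tree copy of each file, so unstaged edits in a
    # partially staged file get checked too: a classic sharp edge of ad hoc hooks.
    contents = {}
    for name in staged_files():
        with open(name, encoding="utf-8", errors="replace") as f:
            contents[name] = f.read()
    problems = find_problems(contents)
    for problem in problems:
        print(problem, file=sys.stderr)
    # Any non-zero exit status makes Git abort the commit.
    return 1 if problems else 0
```

To try it, save it as .git/hooks/pre-commit, add a `#!/usr/bin/env python3` line at the top and `raise SystemExit(main())` at the bottom, and mark the file executable. Copying a file like this into every repository by hand is exactly the drift problem described above.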
[00:03:38] Unknown:
And I know that there are some other attempts at a more general-purpose way of managing pre-commit hooks, built by other people using different approaches. And I'm curious how the pre-commit framework that you've built compares to some of those other tools that you may have come across.
[00:03:53] Unknown:
Yeah. There are a couple of things that I think pre-commit does exceptionally well, and in some ways it's just the same as others. The thing that really sets it apart, to me, is that pre-commit puts a strong focus on user-space installations. So by default, you don't need to install any external tools to run pre-commit. You just need Python, and that's it. Whereas with others, you might need to install Flake8 to your system, or install gofmt wherever you need it, although you always have gofmt, so maybe that's not a good example.
Or, like, would need to be installed to your system. And managing system dependencies, keeping them in sync, and upgrading or downgrading them is a difficult task, especially in an organization. The other thing that really sets pre-commit apart is that it tries to support a lot of different programming languages. Several other frameworks focus entirely on JavaScript or entirely on Ruby and don't really give you the flexibility to work with a bunch of different languages. And so pre-commit tries to enable support for lots of languages at once.
[00:05:02] Unknown:
And for somebody who is first getting started with pre-commit, either as a new user trying to set up their own integration with their repository, or as somebody joining a team that's already using pre-commit, can you talk through the steps of getting set up and getting the hooks integrated into their workflow? Yeah. Sure. So the first thing that you would do, of course, is install the pre-commit framework. Pre-commit is available through pip or through Homebrew.
[00:05:35] Unknown:
And once you have it installed, you'll want to create a pre-commit configuration in your repository. There's a command, pre-commit sample-config, which gives you a very, very basic starting point. You can add a bunch of hooks to that configuration; pre-commit.com/hooks has a list of them. Once you have a configuration set up, you'll run pre-commit install. This will add the hook script to the Git repository such that when you make a commit, it will run, and you just go from there.
[00:06:11] Unknown:
And so as we discussed earlier, there are a number of different life cycle events that you could optionally hook into. So I'm wondering what it is about the pre-commit hook in particular that makes it especially useful, and whether there are any other events that you have either extended pre-commit to take advantage of or have considered doing that for?
[00:06:31] Unknown:
Yeah. Sure. So to me, a commit is the most basic atom in a Git history, and so it makes sense to catch problems as early as possible, and to me, commit time is the place to get that done. Pre-commit, the framework itself, despite being called pre-commit, actually supports a couple of other Git hooks. Currently, those are pre-commit, pre-push, and commit-msg. So if you don't want to lint directly at commit time, you can lint at push time. The commit-msg hook also gives you the opportunity to adjust the commit message or make sure it adheres to your style guidelines.
[00:07:09] Unknown:
So, discussing pre-commit versus pre-push, I'm thinking of a number of different ways that you could structure your workflow to take advantage of one or both of those. I know that for pre-commit in particular, because it's a blocking operation where somebody can't continue doing anything else, there are steps or routines you can put in there that will make people want to bypass those checks entirely, whereas putting them in the pre-push hook might let them get into a flow where they commit a number of times locally before they push, and then have those longer-running operations happen at that point. So I'm wondering if you can talk through some of the ways that you can inadvertently discourage people from taking advantage of the pre-commit framework, and some of the types of checks that are most useful at those different stages? Absolutely. Yeah. So for pre-commit, it's a blocking operation, like you said, so it's very important to keep those checks as fast as possible.
[00:08:05] Unknown:
Some other things that I've seen discourage people about checks are, first, if they take too long, but second, if they create false positives or unnecessary noise. One type of hook that I think is really, really useful, and where the framework really shines, is when a code formatter rewrites your files directly for you. This is particularly powerful because you don't really have to think about style; it just makes it happen. As for pre-push, it's often useful to move the slower or less important checks there, such that you can git push and walk away, essentially.
[00:08:43] Unknown:
Yeah. And that seems like a place where you would maybe want to automatically run your unit tests because, as you said, you can walk away, or you can push and then continue doing other development while it runs. If you were to do that during the pre-commit stage, then you're preventing anybody from making forward progress while the unit test suite runs. Particularly for some projects where that can take upwards of 5 to 10 minutes, it can really slow down the workflow of writing some code, committing to make sure that you don't lose your progress, and then continuing on to other things. Yeah. For me, if it takes more than a couple of seconds, it probably doesn't belong in pre-commit. Pre-commit
[00:09:25] Unknown:
the GitHub.
[00:09:28] Unknown:
And given that some of the most useful steps for that pre-commit stage are the fast-running ones, and you mentioned linters and code formatters, I'm curious what other types of operations you've seen people put into that stage of the life cycle to help improve the cleanliness and health of the code base, or the productivity of the developer?
[00:09:58] Unknown:
Yeah. So I think, like you said, the most important ones are your basic code linters that do static analysis and type checking, those sorts of things, as well as formatters, which make your code adhere to a specific style. Beyond that, I haven't seen too many other strange uses. Probably the most interesting use I've seen in a pre-commit hook is checking for security credentials or secrets that you don't want to check into a repository, or making sure that you're not accidentally adding really large files, encouraging people to use tools like Git LFS. Probably the strangest one I've seen, though, is a hook that tried to deduplicate files in a repository by replacing them with symlinks.
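The secret and large-file checks mentioned here can be approximated with a short hook like the following. It is a hypothetical sketch: the private-key markers are an incomplete sample, the --max-kb option and its default of 500 are assumptions, and dedicated secret scanners are far more thorough.

```python
from __future__ import annotations

import argparse
import os
from collections.abc import Sequence

# Header lines of common PEM/OpenSSH private keys (illustrative, not exhaustive).
PRIVATE_KEY_MARKERS = (
    b"BEGIN RSA PRIVATE KEY",
    b"BEGIN EC PRIVATE KEY",
    b"BEGIN DSA PRIVATE KEY",
    b"BEGIN OPENSSH PRIVATE KEY",
    b"BEGIN PRIVATE KEY",
)


def check_files(filenames: Sequence[str], max_kb: int) -> list[str]:
    """Return one message per problem found; an empty list means the files look fine."""
    problems = []
    for name in filenames:
        size_kb = os.path.getsize(name) / 1024
        if size_kb > max_kb:
            problems.append(f"{name}: {size_kb:.1f} KB exceeds {max_kb} KB; consider Git LFS")
            continue  # skip reading oversized files into memory
        with open(name, "rb") as f:
            data = f.read()
        if any(marker in data for marker in PRIVATE_KEY_MARKERS):
            problems.append(f"{name}: appears to contain a private key")
    return problems


def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical large-file and secret check.")
    parser.add_argument("filenames", nargs="*")
    # The default threshold is an arbitrary assumption; tune it per repository.
    parser.add_argument("--max-kb", type=int, default=500)
    args = parser.parse_args(argv)
    problems = check_files(args.filenames, args.max_kb)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

Both checks only read the files they are given, so they stay well within the couple-of-seconds budget that makes a check suitable for commit time.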
[00:10:39] Unknown:
That was probably the weirdest one that I saw. And given that these hooks are executing across your code base, there's the potential that they can rewrite or modify files. With linters or code formatters that's beneficial, but there's also the possibility that you could end up putting some code in there that will actually have detrimental effects on your code base. So I'm curious what are some ways that you can guard against that, or at least verify the hooks that you're using, particularly since the pre-commit framework is essentially a package manager of sorts, so that you're not pulling in code that you don't want executed against your environment?
[00:11:20] Unknown:
Yeah. Unfortunately, I don't have a great answer for this one. One thing that I can talk about here is that pre-commit aims for repeatability wherever possible. So you specify a specific version that gets pinned in your pre-commit configuration, and it relies on Git to ensure some amount of, I guess, integrity there. I actually just remembered one thing from earlier: you reminded me of a feature that sets pre-commit apart from other frameworks, and that's that pre-commit tries to smooth out a lot of the sharp edges of Git commits. One case in particular that a lot of time was spent on was when you commit a partially staged file: it will make sure that you're only checking the code that you're actually checking in and not the unstaged parts.
[00:12:03] Unknown:
Yeah. And that's something that I've found very useful in my own use of the pre-commit framework. For instance, if you're doing something like a 2to3 conversion for Python, or running something a little more aggressive like the Black formatter against your code, it's not going to run across your entire code base by accident. It's only going to affect the files that you've actually edited and that you're already going to be verifying, so you don't inadvertently balloon the size of the pull request that you're trying to submit.
[00:12:40] Unknown:
Mhmm. Yeah. And when you're committing a merge conflict, it's only going to check the conflicting files and not every single file that everyone else touched upstream. Oh, and I guess one thing I can talk about from two questions ago is that pre-commit will never automatically stage files that it makes changes to, and so this gives the user a chance to double-check the code that gets changed as part of the hook.
[00:12:58] Unknown:
Yeah. That's definitely a good way to verify that you're making the changes that you wanted to make. Yeah. Because linters often have bugs, so it's a good chance to make sure that they're good. Absolutely. And one of the other things that we've glossed over so far is the multi-language aspect of pre-commit. So I'm wondering what have been some of the most challenging or difficult aspects of enabling support for running checkers and linters for multiple languages, and that are implemented in multiple languages?
[00:13:33] Unknown:
Oh, for sure. Yeah. This was actually one of the design decisions from the beginning of the framework: it would segment different programming languages into different language implementations. Honestly, the hardest part of working with a lot of different programming languages is that you need to learn all the idiosyncrasies of their package managers, their different installation structures, how to make them install in user space, and, probably the worst part, how to make them work on Windows.
[00:14:09] Unknown:
Yeah. That's always the sort of forgotten stepchild of developers: oh, it also runs on Windows, kind of.
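One way to picture segmenting languages into separate implementations is a small interface that every language fulfils: install an isolated environment in user space, then run the hook inside it. The Language protocol, the two classes, the run helper, and the LANGUAGES registry below are hypothetical names for illustration, not pre-commit's internal API; giving each Python hook repository its own virtualenv is an assumption about how user-space installation can work.

```python
from __future__ import annotations

import os
import subprocess
import sys
import venv
from typing import Protocol


class Language(Protocol):
    """Hypothetical interface that each language implementation provides."""

    def install_environment(self, repo_dir: str, env_dir: str) -> None: ...

    def run_hook(self, env_dir: str, entry: list[str], filenames: list[str]) -> int: ...


class SystemLanguage:
    """The tool is expected to be installed already, so there is nothing to set up."""

    def install_environment(self, repo_dir: str, env_dir: str) -> None:
        pass

    def run_hook(self, env_dir: str, entry: list[str], filenames: list[str]) -> int:
        return subprocess.run([*entry, *filenames]).returncode


class PythonLanguage:
    """Installs the hook repository into its own virtualenv, in user space."""

    @staticmethod
    def _bin_dir(env_dir: str) -> str:
        return os.path.join(env_dir, "Scripts" if sys.platform == "win32" else "bin")

    def install_environment(self, repo_dir: str, env_dir: str) -> None:
        venv.create(env_dir, with_pip=True)
        python = os.path.join(self._bin_dir(env_dir), "python")
        subprocess.run([python, "-m", "pip", "install", repo_dir], check=True)

    def run_hook(self, env_dir: str, entry: list[str], filenames: list[str]) -> int:
        # Resolve the entry point inside the virtualenv rather than on the global PATH.
        executable = os.path.join(self._bin_dir(env_dir), entry[0])
        return subprocess.run([executable, *entry[1:], *filenames]).returncode


# Adding a language means adding one entry here; the rest of the runner stays unchanged.
LANGUAGES: dict[str, Language] = {
    "system": SystemLanguage(),
    "python": PythonLanguage(),
}


def run(language: str, repo_dir: str, env_dir: str, entry: list[str], filenames: list[str]) -> int:
    impl = LANGUAGES[language]
    if not os.path.isdir(env_dir):
        impl.install_environment(repo_dir, env_dir)
    return impl.run_hook(env_dir, entry, filenames)
```

Supporting another language then means writing one more class that knows that ecosystem's package manager, which is where the per-language idiosyncrasies, and the Windows work, end up living.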
[00:14:16] Unknown:
Yeah. Yeah. For pre-commit, probably the hardest language support has been Ruby. It still doesn't work on Windows because I haven't figured it out yet. Part of that's probably because I don't have a lot of experience with Ruby, but another part is that all of the Ruby management tools are written in Bash, so they're not super portable, or they rely on things like overriding cd, which I'm not comfortable with. Yeah. Yeah.
[00:14:43] Unknown:
The Ruby community has an inherent love of monkey patching and magic that just happens because of reasons. Yeah. That said, it mostly works on Linux and mostly works on macOS, so that's... Yeah. That's our primary use case. And I'm wondering if you can discuss a bit further how the pre-commit framework itself is architected and implemented, and some of the ways that the design has evolved from when you first started working on it?
[00:15:09] Unknown:
In a lot of ways, the design of the framework itself hasn't changed much since the beginning. It's always been kind of a two-part system: one part being remote repositories that you can add or remove in a configuration file, and the other part being overriding the configuration in your local file. Beyond that, the place where pre-commit has grown the most is by adding other hooks in the Git life cycle, as well as adding other languages that pre-commit the framework supports directly.
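The two-part design can be modeled as plain data. This sketch assumes the YAML file has already been loaded into Python dictionaries; the keys (repos, repo, rev, hooks, id, args, entry, language) and the special repo value local follow the format of current .pre-commit-config.yaml files, while the dataclasses and validation rules are simplified assumptions rather than pre-commit's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Hook:
    id: str
    args: list[str] = field(default_factory=list)
    entry: str | None = None  # only local hooks must supply entry and language
    language: str | None = None


@dataclass
class Repo:
    url: str
    rev: str | None  # remote repositories are pinned; local ones are not
    hooks: list[Hook]

    @property
    def is_local(self) -> bool:
        return self.url == "local"


def parse_config(data: dict[str, Any]) -> list[Repo]:
    """Turn an already-loaded config mapping into Repo objects, checking both cases."""
    repos = []
    for raw in data.get("repos", []):
        hooks = [
            Hook(
                id=h["id"],
                args=list(h.get("args", [])),
                entry=h.get("entry"),
                language=h.get("language"),
            )
            for h in raw.get("hooks", [])
        ]
        if raw["repo"] == "local":
            # Local hooks have no remote metadata, so they must describe themselves.
            for hook in hooks:
                if hook.entry is None or hook.language is None:
                    raise ValueError(f"local hook {hook.id!r} needs 'entry' and 'language'")
            repos.append(Repo(raw["repo"], None, hooks))
        else:
            # Pinning a revision is what makes runs repeatable across machines.
            if not raw.get("rev"):
                raise ValueError(f"{raw['repo']}: remote repositories must pin a 'rev'")
            repos.append(Repo(raw["repo"], raw["rev"], hooks))
    return repos
```

A remote entry only names its hooks and pins a revision, because the hook repository supplies the rest of the metadata; a local entry has to describe each hook itself.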
[00:15:48] Unknown:
And given the fact that, by design, Git won't automatically install the hook into your locally checked-out repository and it requires a manual operation, how do you ensure that all of the developers on a team are actually using the hooks that are configured for the repository, and help to reinforce the behavior of running these checks locally before committing, rather than just relying on code reviewers to catch what the pre-commit hooks are intended to do automatically?
[00:16:25] Unknown:
Yeah. So there is actually a new feature in brand new versions of Git that allows you to set a template for your Git repository. I actually haven't looked into this too much, because that's not the strategy I use to get pre-commit set up. The most common strategy that I use is to add pre-commit install to common operations inside the repository: things like common make targets or tox targets, or other things that you would commonly do. I also have a sneaky shell alias that does it for me when I cd into a repository. So I have some terrible magic for myself, and maybe if I find it to be not so terrible, I'll document it and share it with others.
[00:17:10] Unknown:
And then before we go a bit further into the conversation about general best practices and hygiene for developers, I'm curious: if you were to completely start over and rewrite the pre-commit framework today, what are some of the things that you would do differently?
[00:17:20] Unknown:
And do you think that you would still implement it in Python? Yeah. So, actually, the first thing that I would probably do is pick a better name. When pre-commit was originally implemented, it really only targeted the pre-commit Git hook, and so it was an appropriate name. But now that pre-commit also supports pre-push, commit-msg, and probably other hooks in the future, the name makes less and less sense. Unfortunately, I have SEO, so it's too hard to go back from there. Another thing that I would probably change if I were to start over is the configuration file language. YAML is a nice configuration language for human-writable things, but pre-commit does a lot of automatic rewriting of the configuration file, which is incredibly difficult in a language like YAML, especially when there aren't any good round-trip rewriters for YAML. As for Python, I think Python is still a good choice for pre-commit. If I were to write it in another language, I would have to consider portability.
And for the platforms that I target, Python is pretty easy to install and generally just works.
[00:18:25] Unknown:
As you mentioned, the YAML configuration is how somebody interfaces with setting up and configuring the different plugins and checkers that run during the pre-commit hook. So I'm curious, for somebody who's interested in writing their own plugin that they want to run during those life cycle events, what is involved in getting that set up, and what is the overall workflow of writing that plugin and then making it available to be installed by pre-commit and used by other people? Yeah. Sure. So the first step to writing a pre-commit plugin is to get yourself a Git repository.
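On the hook-repository side, the metadata that tells pre-commit how to install and run a hook lives in a manifest file (named .pre-commit-hooks.yaml in current versions of pre-commit), where each hook declares at least an id, name, entry, and language. The sketch below shows, under simplified assumptions, how such manifest entries might be validated and combined with a user's overrides such as args; the function names are hypothetical and the merge rule is illustrative, not pre-commit's exact behavior.

```python
from __future__ import annotations

from typing import Any

# Keys every hook definition must provide in this simplified model.
REQUIRED_KEYS = ("id", "name", "entry", "language")


def load_manifest(entries: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Index a hook repository's metadata by hook id, rejecting incomplete entries."""
    hooks: dict[str, dict[str, Any]] = {}
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"hook {entry.get('id', '?')!r} is missing {missing}")
        hooks[entry["id"]] = entry
    return hooks


def resolve_hook(manifest: dict[str, dict[str, Any]], user_hook: dict[str, Any]) -> dict[str, Any]:
    """Start from the repository's defaults and let the user's config override them."""
    if user_hook["id"] not in manifest:
        raise KeyError(f"unknown hook id {user_hook['id']!r}")
    return {**manifest[user_hook["id"]], **user_hook}
```

Keeping the defaults in the hook repository is what lets a user's configuration stay short: in the common case it only names a hook id and, optionally, a few overrides.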
[00:19:00] Unknown:
Each plugin in pre-commit is distributed by Git, and so that's how you have to get started. There are some directions on the pre-commit website that tell you how to set up a repository configuration. This will basically provide some metadata out of the box so that pre-commit knows how to install your hook and run it. And then from there, you make an executable and go. There are some other options if you don't want to make it distributable. You can write what's called a repository-local hook, which will just run for your specific repository. This is useful if you need one-off checks or, like, really simple, repository-specific checks.
[00:19:36] Unknown:
And so the main purpose of using these life cycle hooks is to help enforce code hygiene and developer productivity. I'm curious what are some of the other methods that you use and encourage on your team, and on other teams that you've worked on, to make sure that you're producing high-quality code and repeatable processes?
[00:20:02] Unknown:
Cool. Yeah. So some of the things that our team uses to encourage high-quality code are working with tests, considering a change based on code coverage, and, of course, using tools like pre-commit and linters like flake8, those sorts of things. For Python specifically, we've been doing a lot recently with type annotations so that we can get gradual typing and catch some bugs before they hit test time or before they hit production. Some other, broader organizational tools that we've been using recently are tools that provide refactoring across a lot of different repositories.
This allows us to do organization-wide upgrades and ensure that every repository is up to a specific spec. We've actually used this to add linters to a bunch of repositories, upgrade old packages to the newest versions, or make sure that things that need to be changed get changed.
[00:20:57] Unknown:
And on the pre-commit framework itself, what have you found to be some of the most challenging aspects of building and maintaining the project and the community around it, and what are some of the things you've learned that you didn't expect to come across when you first started on that path?
[00:21:15] Unknown:
Yeah. So pre-commit itself started as basically taking a list of filenames and running an executable with them. The first implementation used Bash, or Bash and xargs, to do a lot of this, but Bash and xargs aren't super portable, and so they've been replaced by Python implementations of the same functionality. Probably the strangest things that I've encountered as part of pre-commit are all of the various edge cases in each of the tools that are tested. Pre-commit itself has a very thorough test suite, so it often happens that pre-commit will find a bug inside Git itself and then cause a patch to happen upstream. I think this has happened two or three times. I've actually been able to contribute to Git myself in fixing some of these.
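Replacing xargs in Python mostly comes down to splitting a long list of filenames into several command lines that each fit within the operating system's length limit. The partition and xargs functions below are a hypothetical sketch of that idea, not pre-commit's own implementation; the 4096-character default is a deliberately conservative assumption (Windows caps a command line at 32,767 characters, and POSIX systems impose their own ARG_MAX limit).

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence


def partition(cmd: Sequence[str], filenames: Sequence[str], max_length: int) -> list[list[str]]:
    """Split filenames into command lines whose total length stays within max_length.

    Length is approximated as the sum of each argument plus one separator.
    A single filename longer than the limit still gets its own batch.
    """
    base = sum(len(part) + 1 for part in cmd)
    if base > max_length:
        raise ValueError("the command alone is longer than max_length")
    batches: list[list[str]] = []
    current: list[str] = []
    size = base
    for name in filenames:
        arg_len = len(name) + 1
        if current and size + arg_len > max_length:
            batches.append([*cmd, *current])
            current, size = [], base
        current.append(name)
        size += arg_len
    if current:
        batches.append([*cmd, *current])
    return batches


def xargs(cmd: Sequence[str], filenames: Sequence[str], max_length: int = 4096) -> int:
    """Run cmd once per batch and return 1 if any batch failed, else 0."""
    failed = False
    for batch in partition(cmd, filenames, max_length):
        if subprocess.run(batch).returncode != 0:
            failed = True
    return 1 if failed else 0
```

For example, partition(["lint"], ["aaa", "bbb", "ccc"], max_length=13) returns [["lint", "aaa", "bbb"], ["lint", "ccc"]]. Because subprocess.run takes an argument list, there is no shell quoting to get wrong, which is part of what makes a pure-Python version more portable than Bash plus xargs.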
Probably the weirdest thing I've seen while working on pre-commit is trying to fix a Unicode bug in Python 2 and hitting a segfault in Python. That was probably the strangest thing. It ended up being a strange edge case in Schlax.
[00:22:18] Unknown:
And what are some of the plans that you have for the future of pre-commit, whether in terms of feature additions, bug fixes, plans to encourage new contributors to the project, or maybe different directions or a different implementation of the overall approach that you're trying to achieve with pre-commit?
[00:22:39] Unknown:
Yeah. So for the future direction of the project, I think most of what needs to happen is more hooks providing metadata so that they're easy to use out of the box, additional language support to encourage other hooks to be integrated into the community, probably additional Git life cycle hooks, and maybe even other version control systems.
[00:23:08] Unknown:
And so for anybody who wants to follow you, get in touch, or keep up to date with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the movie Tag, because I watched it recently and it was hilarious. The fact that it's based on the true story of a group of grown men who have been playing the same game of tag every May for 30 years is pretty amazing. So it's definitely worth the watch. It's a lot of fun. And with that, I'll pass it to you, Anthony. Do you have any picks this week? That sounds great. I'll have to check that out. Yeah. So one thing that I've been paying attention to recently is this YouTube channel called Yes Theory. Their idea is to be super positive and to seek discomfort: life's too boring, so get out there and experience it. There are no strangers in the world, just friends that you haven't met yet, and push positivity wherever possible. Alright. Well, thank you for that, and thank you for taking the time today to discuss the work you've done with pre-commit. I've used it in numerous projects and across a few different jobs, so I appreciate the work you've put into it. I definitely recommend other people check it out and use it for their own projects, and I hope you enjoy the rest of your evening. Thanks very much.
Introduction and Guest Introduction
Understanding Pre-Commit Hooks
Motivation Behind Pre-Commit Framework
Getting Started with Pre-Commit
Pre-Commit vs Pre-Push Hooks
Useful Operations in Pre-Commit Stage
Challenges with Multi-Language Support
Architectural Evolution of Pre-Commit
Future Directions and Improvements
Closing Remarks and Picks