Summary
One of the perennial challenges in software engineering is to reduce the opportunity for bugs to creep into the system. Some of the tools in our arsenal that help in this endeavor include rich type systems, static analysis, writing tests, well-defined interfaces, and linting. Phillip Schanely created the CrossHair project in order to add another ally in the fight against broken code. It sits somewhere between type systems, automated test generation, and static analysis. In this episode he explains his motivation for creating it, how he uses it for his own projects, and how to start incorporating it into yours. He also discusses the utility of writing contracts for your functions, and the differences between property-based testing and symbolic analysis with SMT solvers. This is an interesting and informative conversation about some of the more nuanced aspects of how to write well-behaved programs.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Phillip Schanely about CrossHair, an analysis tool for Python that blurs the line between testing and type systems.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of what the CrossHair project is and how it got started?
- What are some examples of the types of tools that CrossHair might augment or replace? (e.g. Pydantic, Doctest, etc.)
- What are the categories of bugs or problems in your code that CrossHair can help to identify or discover?
- Can you explain the benefits of implementing contracts in your software?
- What are the limitations of contract implementations?
- What are the available interfaces for creating and validating contracts?
- How does the use of contracts in your software influence the overall design of the system?
- How does CrossHair compare to type systems in terms of use cases or capabilities?
- Can you describe how CrossHair is implemented?
- How has the design or goal of CrossHair changed or evolved since you first began working on it?
- What are some of the other projects that you have gained inspiration or ideas from while working on CrossHair? (inside or outside of the Python ecosystem)
- For someone who wants to get started with CrossHair, can you talk through the developer workflow?
- I noticed that you recently added support for validating the functional equivalency of different method implementations. What was the inspiration for that capability?
- What kinds of use cases does that enable?
- How much of CrossHair are you able to dogfood while developing CrossHair?
- What are some of the most interesting, innovative, or unexpected ways that you have seen CrossHair used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on CrossHair?
- When is CrossHair the wrong choice?
- What do you have planned for the future of the project?
Keep In Touch
- pschanely on GitHub
- @pschanely on Twitter
Picks
- Tobias
- Phillip
- Hammock chairs! (affiliate link)
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- CrossHair
- NLTK == Natural Language ToolKit
- ACL2
- Liquid Haskell
- SMT Solver
- Doctest
- Property Based Testing
- Hypothesis
- Halting Problem
- Pydantic
- PEP 316
- icontract
- Eiffel programming language
- Design By Contract
- Metamorphic Testing
- Higher Order Types
- Fuzz Testing
- The Fuzzing Book
- Python Audit Hooks
- GitHub Scientist
- Laboratory Python implementation of GitHub Scientist
- Taint Analysis
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Phillip Schanely about CrossHair, an analysis tool for Python that blurs the line between testing and type systems. So, Phillip, can you start by introducing yourself?
[00:01:08] Unknown:
My name is Phillip. I am just a Python fanboy and the primary dev on CrossHair. For my day job, I work on maps data for Google.
[00:01:18] Unknown:
And do you remember how you first got introduced to Python?
[00:01:20] Unknown:
In the mid-2000s, I joined a startup doing news aggregation. NLTK was a thing back then, and I started off by porting some prototype code from Java to Python. And that's where I got started.
[00:01:33] Unknown:
And so that led you down the path of using Python fairly regularly, I'm sure. And I'm wondering if you can give a bit of an overview now of what the CrossHair project is and some of the story behind how that got started and your motivation for putting your time and energy into it.
[00:01:48] Unknown:
I've always been interested in software verification and proof assistants, my whole life, way back to when ACL2 was a thing, which I still love to look at. And more recently, I've been excited about a project called Liquid Haskell, which is a refinement type system that lets you say even more things about your Haskell code. It uses a thing called an SMT solver, which is this amazing technology that lets you do all sorts of deductions. And I was really curious about trying to apply some of that technology to Python. And so I had a few missteps and things that didn't work out along the way, but the current incarnation of CrossHair is actually sort of applying some of that intelligence to Python code.
[00:02:30] Unknown:
Can you give a bit of an example of the types of use cases that it enables or some of the ways that it fits into the overall development flow and maybe some of the other tools in the Python ecosystem that it might either augment or replace?
[00:02:44] Unknown:
In many cases, CrossHair can be used alongside existing tools. So one of the ways you can use it is to put your conditions, the contracts, the properties you want to specify, inside the docstring. You can do that alongside your doctests, so you can have some examples and then also specify properties that should always hold about your code. So there's some nice synergy there. There is other work sort of in this space. So if you're familiar with property testing tools, there's a product called Hypothesis that is also in the space of trying to figure out, like, oh, what's true about my code, and how can I verify that it's going to be true for all inputs?
And so this is very much the same sort of thing that CrossHair is trying to do, and one of the places where I think we'll see some convergence. Hypothesis is a probabilistic kind of tool where it throws random data at your code, and CrossHair tries to be a little bit smarter, but actually the probabilistic
[00:03:43] Unknown:
approach is better in some other cases. I think in the long run, we're gonna see some convergence around that kind of thing. Yeah. It's an interesting area to dig into, and I've actually had the Hypothesis developer on the podcast a while ago. And I'm curious sort of what the relative strengths are for contract based development and theorem proving versus the probabilistic approach, where it uses the type system to generate randomized data to understand when things break.
[00:04:12] Unknown:
One thing I'll say is actually I think there are two axes here. So you can talk about contracts and you can talk about property testing, and then you can separately talk about randomized and symbolic approaches. All four combinations are actually possible. For instance, I have been talking with Zach Hatfield-Dodds, who is one of the Hypothesis maintainers, and he is interested in using some of CrossHair to check Hypothesis tests, because ultimately the same engine can be used for both. And conversely, I've been looking at how to get more randomized test inputs into verifying CrossHair contracts.
And so you can actually imagine all four of these scenarios, and they all have different advantages and disadvantages. In some sense, the property testing and contract distinction isn't actually that important. The only real difference is the contract is, like, an intrinsic part of the function you're writing, whereas the property test sort of sits outside the definition of that thing. But in some sense, that distinction isn't terribly important. The way in which you try to verify a property, symbolic or this sort of randomized approach, does significantly change how effective the tool is going to be in terms of trying to figure out what you're doing. So some typical things that are great for randomized approaches are things like where you need to hash a value.
You're doing some complex manipulation of a thing. Another example is, like, oh, I'm gonna take the string of a number and figure out what the third character is. And as you do more of these complicated computations, it gets harder for a symbolic engine to understand, like, okay, the third character of this string is a 3 when the original number looked like these other numbers. So the more of those things you pile on, the harder it gets for the symbolic approach, and sometimes just throwing lots of random data at it is more effective. So I think the ideal solution will ultimately incorporate aspects of both of these systems. So that's work that's actively going on right now.
[00:06:10] Unknown:
In terms of the categories of problems, or ways that the program might violate expectations, what are the types of constraints that you can impose by using CrossHair and adding these contracts, and being able to discover ways that the contract is not being met by the code that's actually implementing the logic?
[00:06:34] Unknown:
Basically, a contract can verify anything you can write in Python code. So one advantage over some other systems: they'll try to, like, narrow the language used to describe the properties of your program. CrossHair doesn't try to do that. If you can run some Python code, it will try to find some way to make that Python code return false, no matter what that Python code is. So in some sense, that is fairly freeing for the Python developer in terms of what you can express. It doesn't mean that everything you can express, CrossHair is going to find for you. So there are plenty of things you can write that do have counterexamples that we simply won't be able to find. It is a bug finding tool. It is not a real software verification tool. The other sorts of things that it can't handle are things you might expect or might not. Properties that you might wanna test might include things like run time, or another common thing people talk about is termination. CrossHair does not do termination verification or anything like that right now. So you haven't solved the halting problem?
No. And so the only way we get around that is by saying, like, yeah, sometimes we can't do it, you know.
[00:07:37] Unknown:
And another category of tools that I was thinking about in terms of what you might use in the same breath as CrossHair is something like Pydantic, where you have this validation that the inputs to a particular object are going to match a certain set of types, and there are these validators that you can execute to ensure that the values are within a certain range or meet a certain criteria. And I'm curious how you see things like Pydantic and CrossHair either coexisting or obviating the need for each other.
[00:08:10] Unknown:
I don't think they obviate the need for each other. I actually did try it with CrossHair recently, and it doesn't work for reasons that I will maybe get into later. But one thing you might do is use them together. So one thing CrossHair is not really doing is anything about the type systems that are built into Python already. So you can use a type checker. You can use Pydantic to get data into your system. In terms of checks, you are bleeding a little bit into what CrossHair cares about, and contracts have this thing called invariants, which are related here: for a particular type of data, we expect these conditions to always be true. So types get you part of the way there, where you're like, oh, this always has to be an integer, but you can also say, like, this optional field should only be present when this enum is set to this other value. And when you get to that kind of level of constraints, something like an invariant really helps you. The other thing I'll say is, like, CrossHair supports multiple contract systems right now, and there's not really any reason that it can't support sort of arbitrary constraints that are specified in other ways. And so if you had a Pydantic constraint that you custom coded, like, it would not be hard to adapt CrossHair to understand that as an invariant of things of that type, because we already do that. You can do that with at least two out of the three contract syntaxes right now.
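For illustration, the enum-and-optional-field invariant he describes might look like this with the icontract library; a sketch, and the Order class and its fields are hypothetical:

```python
import enum
from typing import Optional

import icontract


class Status(enum.Enum):
    PENDING = 1
    SHIPPED = 2


# The invariant is checked after __init__ and around every public method call:
# a tracking number may only be present once the order has shipped.
@icontract.invariant(
    lambda self: self.tracking_number is None or self.status is Status.SHIPPED
)
class Order:
    def __init__(self, status: Status, tracking_number: Optional[str] = None) -> None:
        self.status = status
        self.tracking_number = tracking_number

    def ship(self, tracking_number: str) -> None:
        self.status = Status.SHIPPED
        self.tracking_number = tracking_number


Order(Status.PENDING)           # fine
Order(Status.PENDING, "ZX42")   # raises icontract.ViolationError
```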
[00:09:34] Unknown:
Digging a bit more into that, when I was looking at the documentation, I saw that there are these different levels or ways that you can specify the contract, going from simple assert statements, which just feel very natural and, you know, run the risk of possibly being optimized out if you use the -O flag when you're compiling your Python bytecode, through to the PEP 316 format where it's embedded in the docstring, or using a decorator approach where you say, this is the contract being applied to this function. And I'm wondering if maybe you can talk through some of the trade offs of those different approaches and the relative strengths that they have? Historically, contract implementations have been a runtime thing. So the system that we integrate with, the decorator-based system, for instance, called icontract,
[00:10:23] Unknown:
is a system that normally is intended to just run at run time, and it does also have that risk of being optimized out with -O flags. CrossHair is doing a pretty different thing. It really is not intending to do anything at the time your code is actually running in production. It's meant to just be run by the developer, almost as a static analysis tool. In some sense, if you are using CrossHair exclusively, a lot of those checks sort of don't need to make it to production. Assert statements are a really nice way to get started because everyone knows assert statements, and no one's gonna look at you funny for having an assert statement in your code. It does make some assumptions about how they work, in that your function is expected to lead with assert statements, and those are intended to be preconditions. And if you have some leading asserts, then you're sort of implying to CrossHair that, you know, this function isn't gonna throw an exception or hit a later assert that turns out false. Basically, as long as an input gets past the leading asserts, you're implying to CrossHair that this is a safe function to run, because you've already done all the checking you expect to do. The other contract implementations, the docstring and decorator based ones, are a little bit more featureful. So the assert ones we think of as sort of a gateway drug to contracts.
And at some point, you may get to a place where you're like, oh, I have this class, and I need invariants, and I wanna make sure every method of this class maintains the invariant. That's a little bit annoying to do with the asserts. As you find yourself needing a little bit more power, you might wanna sort of upgrade to a more featureful contract implementation.
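To make the three styles concrete, here is the same hypothetical clamp function under each syntax CrossHair understands; a sketch to compare shapes, not a full tutorial:

```python
import icontract


# 1. Assert-based: CrossHair treats leading asserts as preconditions, then
#    tries to find inputs that raise or falsify a later assert.
def clamp_asserts(x: int, lo: int, hi: int) -> int:
    assert lo <= hi
    result = max(lo, min(hi, x))
    assert lo <= result <= hi
    return result


# 2. PEP 316 style: conditions live in the docstring; __return__ names the result.
def clamp_pep316(x: int, lo: int, hi: int) -> int:
    '''
    pre: lo <= hi
    post: lo <= __return__ <= hi
    '''
    return max(lo, min(hi, x))


# 3. Decorator style, via the icontract library.
@icontract.require(lambda lo, hi: lo <= hi)
@icontract.ensure(lambda lo, hi, result: lo <= result <= hi)
def clamp_icontract(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))
```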
[00:11:58] Unknown:
Yeah. And your point about using it as a static analysis tool more than a runtime validator was something else I wanted to understand: is this something that you expect to validate the inputs to these functions at execution time, or is it more just a development tool that hooks in with your other linters and your test executions to make sure that the code is ready for production? And then once it goes into production, CrossHair is kind of hands off.
[00:12:23] Unknown:
Yes. That's absolutely the case. You very much do not want CrossHair to be running in production. We do lots of weird things to the CPython interpreter to make all of the symbolic analysis work. We actually monkey patch large swaths of the standard library and do all sorts of things that are CPython specific and not something you would want to run in production. So when you run crosshair check on your code, that is its own process and should stay its own process. And hopefully, it'll tell you useful things about your code. But that is definitely the intended use case for CrossHair.
[00:12:58] Unknown:
Yeah. And when I was going through looking at the documentation, it's always fun when you see a warning that says to be very careful analyzing code that uses shutil.rmtree, because you could destroy your file system.
[00:13:08] Unknown:
Yeah. So one of the things about the symbolic execution is it's actually pretty good at getting coverage over your code, and so it will be more sneaky than you might imagine in terms of finding interesting inputs to functions, and it doesn't know which interesting inputs are bad and which ones are good. Actually, more recently, I have been working on an integration with Python 3.8's auditing capabilities, which basically allow you to block a lot of these side effects. You can't do it in a way that's, like, perfect sandboxing, but I expect it to cover a lot of the common cases and will hopefully make it a little bit more safe.
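The auditing mechanism he mentions is Python 3.8's audit hooks (PEP 578). A minimal sketch of the blocking idea, not CrossHair's actual integration, and the specific event list here is just illustrative:

```python
import sys


def block_destructive_events(event: str, args: tuple) -> None:
    # CPython raises named audit events for many side-effecting operations;
    # raising an exception from the hook aborts the operation.
    if event in ("os.remove", "os.rmdir", "shutil.rmtree", "socket.connect"):
        raise RuntimeError(f"blocked side effect during analysis: {event}")


# Hooks cannot be removed once installed, which is one reason this suits a
# dedicated analysis process rather than a production one.
sys.addaudithook(block_destructive_events)
```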
[00:13:46] Unknown:
In terms of using these contracts as part of your development process, I'm curious how you've seen that influence the overall design approach of the software system that you're building and some of the ways that it influences the way that you think about a particular problem space?
[00:14:03] Unknown:
There are a lot of opinionated people about this. The original contract idea, I believe, came out of the programming language Eiffel. And in the text, it's often referred to as design by contract, like it comes with a design philosophy. There's no reason that you need to have a design philosophy around it. So, you know, if you just wanna verify some assert statements, just do that. But you can adopt some design philosophy around it. In the original incarnations, the idea was, like, you actually write your contracts before you write your function. In some sense, this is not different than, like, test driven development, where you write your test first and then implement your function. The difference, of course, is that contracts sort of, like, specify almost infinitely many test cases.
That's the thing. I will also say, though, that one of the things that I personally find very different is I think about interfaces a lot more. So one of the interesting things about interfaces is that, you know, typically we think of them as just, I take things of certain types and return other things of certain types. But when you think about them with a contract, and then think about applying that contract across all of the implementations of that interface, you get this really powerful kind of multiplication of the value of your contract. Basically, once you specify the contract on the interface, every class that implements that interface needs to also have the behavioral properties that you specified, and you've only specified them once, and it's sort of magical. For me personally, it does encourage me to think a lot more about, like, okay, what are the core interfaces of my system, and what are the properties that go beyond just the types that the methods expect?
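For illustration, that contract "multiplication" across implementations might look like this in PEP 316 style (hypothetical classes; CrossHair, following PEP 316, applies a base class's conditions to subclass implementations):

```python
from typing import List


class Tokenizer:
    def tokenize(self, text: str) -> List[str]:
        '''
        post: all(tok != "" for tok in __return__)
        '''
        raise NotImplementedError


class WhitespaceTokenizer(Tokenizer):
    # Inherits the postcondition stated once on the interface: CrossHair will
    # search for a `text` that makes this implementation return an empty token.
    def tokenize(self, text: str) -> List[str]:
        return text.split()
```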
[00:15:43] Unknown:
Interface driven design is something that is interesting in the Python space, because on one side, it's, you know, very dynamic and flexible, and you can do all kinds of duck typing. And then on the other side, you have a lot of the sort of scalability issues that come with that. And there are libraries such as zope.interface to be able to bring in some of the same ideas as Java interfaces, to say that this class and these functions have these interface types, and then you can create a concrete implementation of it. And so it's a useful design tool, but it's also something that's somewhat fraught within the Python community because of the tendency to wanna be very dynamic.
[00:16:21] Unknown:
Yes. Absolutely. I think the right way to think about a CrossHair contract on an interface is that you're just applying tests to your class. It's an easy way to apply tests to your class. That's the way to think about it. Absolutely. And that also leads me to wonder
[00:16:37] Unknown:
how you see the trade offs and sort of the level of effort that CrossHair saves you in writing the tests that you're creating for your software, and sort of some of the types of tests that you might not bother writing because they're covered by CrossHair, or, you know, some of the types of tests that you will focus on writing because CrossHair doesn't cover them or because it frees you from writing some of these more baseline tests?
[00:17:04] Unknown:
You know, this is where there's a lot of great literature on property based testing. They have a lot of similar challenges and advantages. So sometimes there aren't good universal properties that you can write, or none that seem obvious. And so having a couple of unit tests often really helps, where you're just saying, like, oh, here's some complicated computation between two integers; the only way to describe the behavior of the thing is actually to just implement the function. And so there may not be an obvious property to write there, and that's where a unit test is perfectly reasonable. But there are some tactics for writing contracts and property based tests that don't come to developers naturally. You have to develop a little bit of a skill for it. An example here is, you know, you're always looking for: can I compare the behavior of this function to the behavior of some other function that I know is correct? So if you're ever reimplementing something, property based and contract tests are exactly what you want. It's so perfect because you have an oracle. You have this thing. You know you have to replicate that behavior.
There are other strategies to think about. One is called metamorphic testing, where you basically say, if my input changes in this way, I expect my output to change sort of directionally in another way. An example of this is, like, oh, if I have a function that averages a bunch of numbers and I add a number that's bigger than the result, I expect the new result to be bigger than it was previously. That's another way to think about it: like, oh, what kinds of property tests could I be writing? A good example of this is, I need to write a function that takes a test score and produces a letter grade. So it takes a score, and if it's a 95, it produces an A. You know, writing the unit tests for that is sort of annoying. It's like, okay, well, if it's below 60, it's an F, and then maybe you'll do one check for every letter. And if you're really diligent, you'll check some boundary cases. Maybe you'll write some tests for whether it's out of the range of valid values at all. In the contract mode, you can really get away with very little. You can say, like, okay, if it's below 60, then it's an F. And you've instantly covered all of those cases. Right? You don't have to worry about whether you picked the bad number that didn't give you the F. And then you can write a test in this metamorphic style that's like, oh, if I add 10 points to any score, I get the next letter.
And actually, those two contracts are enough to, like, fully specify the function, assuming a precondition that kind of, like, says, oh, the input number is in this range. So you're writing fewer tests and getting basically, like, this kind of perfect coverage of the input space, and that's really the power of property based testing and contracts.
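His letter-grade example translates almost directly into PEP 316 conditions. A sketch, where the grade boundaries and helper names are illustrative:

```python
def letter_grade(score: int) -> str:
    '''
    pre: 0 <= score <= 100
    post: (score < 60) == (__return__ == 'F')
    '''
    if score >= 90:
        return 'A'
    if score >= 80:
        return 'B'
    if score >= 70:
        return 'C'
    if score >= 60:
        return 'D'
    return 'F'


GRADES = 'FDCBA'  # worst to best


def grade_goes_up_with_ten_points(score: int) -> bool:
    '''
    pre: 50 <= score <= 89
    post: __return__
    '''
    # Metamorphic property: adding 10 points moves the grade up exactly one
    # letter (the precondition excludes scores already at or near the top).
    return GRADES.index(letter_grade(score + 10)) == GRADES.index(letter_grade(score)) + 1
```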
[00:19:41] Unknown:
And that plays into the idea that you should make invalid states unrepresentable within the program. And I know that some people will talk about using things like higher order types for being able to handle some of those types of situations. And this is where CrossHair, as you admit in the tagline, starts to bleed into the idea of type systems. And I'm wondering if you can just talk through some of the capabilities that CrossHair adopts from those types of approaches or from the sort of ideas of higher order types?
[00:20:13] Unknown:
Yeah. Absolutely. So it's really interesting to me how much of computer science, the field, is sort of obsessed with types. I think it's important to remember that types are really just another way to help us make sure our code does what we expect it to. So in some sense, tests aren't really different. They all have the same objective. One thing is that historically, academia has built on this assumption that things should be decidable. Your type checker should be able to tell you yes or no. That assumption has kind of led us into this world of, like, okay, we can make our type system more expressive, but we also have to put these boundaries around it. And these boundaries often mean that the type systems themselves are actually sort of hard to understand.
So we have this thing of, like, oh, we can get more expressivity, but we actually make the type system fairly complicated and not very user friendly. And instead of having that dichotomy, you can ditch the decidability part. Like, okay, maybe sometimes it can't tell you. And then you get something like CrossHair, which is, like, oh yeah, it's easy to understand, it's very expressive, but sometimes it can't tell you the answer. So this is a big part of why I'm excited about CrossHair, because, you know, a lot of these dependent types still aren't making their way into mainstream programming. It's kind of sad because they're very expressive.
And so I'm very interested in ways that we can get that expressivity but retain that kind of, like, ease of use.
[00:21:34] Unknown:
Digging a bit more into CrossHair now, can you talk through how it's implemented and some of the ways that the overall goals or design of the project have evolved since you first began working on it?
[00:21:44] Unknown:
Initially, I was sort of interested in fuzz testing. And so I was trying to figure out ways to, like, do these contracts or property based tests based on some of the modern fuzz testers, which are a whole other talk if you wanna get into those. And then I saw some of the Liquid Haskell stuff and the symbolic stuff, and so I was like, okay, maybe there's some combination of these things that would be effective. So I'll sort of save you a lot of the missteps, but to talk a little bit about where we are right now. Basically, what CrossHair does is, when your function takes an integer, it gives you not quite an integer. It gives you a Python object that implements all of the integer-like behaviors, but inside of it, it doesn't actually have a value yet.
Instead, what it does is it records what sorts of things are done to it and what things are asked of it, and records those in a way that we can feed into the SMT solver, so it sort of remembers those things. And then when we get to somewhere we need to make a decision, say we have an if statement, basically we just decide, and we sort of decide randomly which way to go. But we record in the solver that we took that branch, because now that we've made this decision, we will never make another decision that contradicts it. So we remain consistent with ourselves even though we haven't actually picked values for anything yet. And so the objective here is just to explore different execution paths through your code, and really we're just looking for some goals. For a contract implementation, what this means is we're trying to get past the preconditions, and then we wanna try to get to a place where the postcondition returns false.
If we can get there, then we can ask the solver for concrete values: give me some actual inputs that correspond to this execution path. And then we can report a counterexample to the user. There are a lot of details on how you make that work, but that's basically it. And I sometimes refer to these values as sort of like quantum Python values. They're values that don't know what value they hold yet until you observe properties about them.
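A toy illustration of those "quantum values": a proxy object that records operations as SMT terms instead of computing on concrete data. This is a simplified sketch using the Z3 Python bindings directly (the z3-solver package), not CrossHair's real proxy classes:

```python
import z3


class SymbolicInt:
    def __init__(self, solver: z3.Solver, term: z3.ArithRef) -> None:
        self.solver, self.term = solver, term

    def __add__(self, other: int) -> "SymbolicInt":
        # No arithmetic happens here; we just build a bigger SMT term.
        return SymbolicInt(self.solver, self.term + other)

    def __gt__(self, other: int) -> bool:
        # A branch point: pick a feasible direction and assert it, so every
        # later decision stays consistent with this one.
        self.solver.push()
        self.solver.add(self.term > other)
        if self.solver.check() == z3.sat:
            return True
        self.solver.pop()
        self.solver.add(self.term <= other)
        return False


solver = z3.Solver()
x = SymbolicInt(solver, z3.Int("x"))
if x + 5 > 10:               # takes a feasible branch, recording x + 5 > 10
    solver.check()
    print(solver.model())    # a concrete input for this path, e.g. [x = 6]
```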
[00:23:41] Unknown:
Yeah. It's definitely an interesting way to look at it. And the sort of symbolic approach is interesting and, you know, analogous to just having some unknown value that you work through to figure out at the end what the concrete value is. I think it's interesting how you're able to wrap these base type instances with the behavior of those types without actually assigning them a specific value until you absolutely have to. Yeah. Absolutely. In terms of the other projects that you've looked at for inspiration or guidance, I'm curious what you have been able to pull from them and some of the examples that you've used, either inside or outside of the Python ecosystem. I know you mentioned Liquid Haskell already, but I'm wondering if there's anything else.
[00:24:32] Unknown:
I also sort of hinted at a lot of the developments with fuzz testing lately. Fuzz testing has gotten really interesting because it's become more intelligent than just sort of throwing random data at things. There's a lot of work into guided fuzz testing, which is sort of understanding what execution path your code is taking as you give it example inputs. And then you sort of say, like, oh, if I can get to this new space, then I'll work on different inputs that look like that input and see if I can get to some new place that I haven't been before. And this is surprisingly effective and also was, I think, pretty influential for me. The other thing that really influenced the current version of this is there's a book called The Fuzzing Book, which goes through a lot of details around some of the fuzzing things I've talked about, but also fuzzing with symbolic components to it.
There's a term for something very similar to what CrossHair does called concolic execution, which is sort of a combination of concrete and symbolic, where you're running some aspects of your program concretely and some symbolically. So this idea of systems that are sort of hybrid, and especially systems that can scale between the two depending on the problem and what you're trying to do, I think is really compelling. And in some ways, it's the future of how computers help developers.
[00:25:54] Unknown:
On the point of contracts, I know that there are some other libraries in the Python ecosystem that provide a way to be able to design and enforce these contracts. And I'm wondering if you can give a bit of a comparison to the approach that CrossHair has taken and some of the trade offs that are present in the choice of the different libraries and what makes CrossHair stand out from some of the other options in the ecosystem?
[00:26:19] Unknown:
The biggest difference is that almost all the other contract libraries you find out there will be intended for enforcement at runtime. So CrossHair is very much not in that camp. To some extent, I don't see CrossHair as an alternative to a contract library. You know, it supports a couple of syntaxes, and it's actually not too hard to add support for more. So I know there are a few other contract libraries in Python, and I'm hoping that, you know, maybe someday CrossHair will be able to statically check all of them. Right? But it is just another tool to help you sort of verify properties of your programs. And one way to talk about a property of a program is a contract. So that's sort of the way I see it.
[00:27:00] Unknown:
Yeah. It's definitely always good when you can sit sort of above a class of tool and integrate across all of them, rather than having to push somebody into a choice between one or the other. Yes. Completely. And so for somebody who wants to get started with using CrossHair and implementing contracts in their code, and being able to validate statically, ahead of deploy time, that their code is meeting the expectations and the invariants that they have defined, what is the process of getting started with CrossHair, and how does it integrate into the overall workflow of writing a program?
[00:27:35] Unknown:
So I definitely recommend, if you're starting out, to start out in the assert-based mode. We talked about a few different modes. And so the way you do this is you just set up some assert statements at the top of your functions, and these sort of represent the preconditions of what you expect to be true. And if you do this, CrossHair will sort of assume that the rest of your function should execute correctly. It should not raise exceptions, and future assert statements shouldn't be able to turn out false. And so this is a fairly lightweight way to do things, because you can just install CrossHair. You don't need to import anything into your code. You just add some assert statements. For the way you actually run CrossHair, there are actually a couple of different ways.
There is an IDE integration for Emacs, and we're working on one for VS Code, and I hope to obviously add more. And this will do a very lightweight checking of your assert statements. There is another mode of running it where you run crosshair watch, and you can give it, you know, your whole directory tree, and it will look for things that match the pattern of things it's looking for. And it will sit there and chew on it sort of indefinitely. And so this mode is much more like a fuzzer. It will be more effective, but it will sit there and consume a lot of resources and think about your stuff for sort of an arbitrarily long amount of time if you would like it to.
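A minimal picture of that assert-based starting point (the function is hypothetical; the commands are CrossHair's CLI entry points, though the exact flags are worth confirming with crosshair --help):

```python
# dedupe.py
from typing import List


def dedupe(items: List[int]) -> List[int]:
    # Leading assert = precondition: CrossHair only considers inputs that pass it.
    assert len(items) < 100
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    # A later assert is a claim CrossHair tries to falsify over all valid inputs.
    assert len(out) <= len(items)
    return out

# One-shot check:         crosshair check dedupe.py --analysis_kind=asserts
# Keep checking forever:  crosshair watch dedupe.py --analysis_kind=asserts
```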
[00:28:52] Unknown:
In terms of some of the other capabilities, I noticed that in one of your recent releases you added a diff behavior capability for being able to validate the functional equivalency of different implementations of a given method. And I'm wondering if you can just talk through what was your motivation for adding that capability and some of the additional use cases that that enables beyond just the creation and validation of these contracts?
[00:29:17] Unknown:
One of the weird things about CrossHair is, like, it isn't necessarily specific to contracts. At some level, the engine is just a thing that asks: what inputs can I provide to this Python code to get to some goal state? There's, like, a surprising number of problems that you can compact into that description. And so I was sort of thinking about, like, okay, in my day-to-day job, what sorts of things do I need help with? Some of these things are just very simplistic things like, okay, I did some refactoring. How do I know that I didn't accidentally change the behavior of the program?
The diff behavior command is basically just another version of this, where it's like, okay, run the old version and run the new version, and assert that they are the same when you provide the same symbolic inputs, and then you can maybe find a counterexample. And if you can, then you can say, like, oh, here, I gave these numbers to this function, and I got a different result. This is pretty handy. And actually, there's a git worktree command that allows you to sort of materialize an old version of your code, and so you can do some tricks and get this diff behavior thing working on your own changes, which I find to be the most compelling way to use this thing. You can also use it if you're making a small change to one piece of the program but you wanna make sure the other main flow of your program is working the same; you can run diff behavior on that, even though you've made this other change that you hope to be unrelated. And then you can even use it when you want the behavior to be different, but you need help finding test cases that you should be adding. Because this thing can tell you, oh, here are the inputs that give different behavior, and those inputs can be useful to help you add unit tests.
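A sketch of that workflow (the module and functions are hypothetical; diffbehavior is the actual subcommand name):

```python
# mymod.py -- two implementations that are supposed to agree
def total_old(nums: list) -> int:
    result = 0
    for n in nums:
        result += n
    return result


def total_new(nums: list) -> int:
    # Refactored version with a deliberate bug (skips the first element),
    # the kind of divergence diffbehavior surfaces as a counterexample.
    return sum(nums[1:])

# Compare them:  crosshair diffbehavior mymod.total_old mymod.total_new
# -> reports inputs where the two functions disagree, e.g. nums=[1]
```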
[00:30:53] Unknown:
There have been a few other tools that I've seen along the lines of, like, the GitHub Scientist project, and there's a Python equivalent whose name is escaping me, for being able to say, I'm changing the implementation of this code, but the actual logic and behavior should remain the same, and being able to run them side by side to make sure that they have the same outputs, or that you're comfortable with switching to the new version
[00:31:17] Unknown:
before you delete all of the old code. And so this is an interesting alternative way to go about it. So I was intrigued when I saw that in the release notes. You know, in some sense, there's an analog here: Scientist is, I believe, doing the sort of, like, runtime thing; it's meant to let you do a launch where you run them both side by side in production. So I think it's very similar to how contracts have historically worked, and CrossHair has a lot of that same capability, just trying to bring it closer to the developer, more like a static analysis tool. In terms of the actual development of CrossHair, how much do you dogfood it on itself as you're writing the code that enables you to validate these invariants or validate the functional equivalency of functions? I always had a sort of an idea in my head that we would use CrossHair to verify itself.
In the beginning, I was frightened to do this, because when you have a bug, it's hard to know whether the bug is in the original system or the system that's checking itself. But more recently, we actually have started to add some tests where we are just using CrossHair to verify parts of itself. So I mentioned that we, like, monkey patch large swaths of the standard library. We do actually run CrossHair on some of those patches to ensure that they have the same behavior as the native CPython versions.
[00:32:36] Unknown:
And in terms of the usage of CrossHair, what are some of the most interesting or innovative or unexpected ways that you've seen it used or that you've used it yourself?
[00:32:45] Unknown:
One example that I had not expected, that I heard someone talk about, is using this for, I guess, what they call taint analysis, where you want to understand whether some input to a piece of logic isn't actually read by that logic. The strategy here is you say, like, oh, I provide a symbolic input for the value that I don't want read, and a different symbolic value on a second run; it shouldn't matter that those things are different, and the end result should be the same. And if you can come up with a way to make those two inputs differ and get a different result, then you know the logic did something with it, which I thought was a really fascinating idea.
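One way to phrase that taint-style check as an ordinary contract (a sketch; both functions here are hypothetical):

```python
def render_page(user_id: int, secret_token: str) -> str:
    # Hypothetical logic that should never depend on secret_token.
    return f"profile-{user_id}"


def token_is_unread(user_id: int, token_a: str, token_b: str) -> bool:
    '''
    post: __return__
    '''
    # If CrossHair can find token_a != token_b that changes the output, the
    # supposedly-unread input actually influenced the logic.
    return render_page(user_id, token_a) == render_page(user_id, token_b)
```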
[00:33:24] Unknown:
In your experience of building the Crosshair library and evolving it and bringing in new capabilities, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:33:37] Unknown:
Using SMT solvers, which is the theorem prover technology that underlies CrossHair, is pretty challenging. Basically, the language that these tools use, called SMT-LIB, is very expressive. You can say lots of things, but it doesn't mean that the solver will do reasonable things with them. And there's very little humanistic intuition about what are easy and what are hard problems; that does not map well to what the solver actually treats as easy and hard problems. So actually, learning how to use SMT solvers effectively in practice has been a large aspect of the learning for me in this project. Some examples of this are, like, avoid quantifiers in your logical expressions.
Other things are, like, when you bridge different theories, like if you want to get from integers to floating points and back, that's, like, incredibly hard for an SMT solver to do. So you often need some other tactics when you're doing that sort of thing. I guess a completely unrelated kind of learning in this project for me has been around just community development. One thing I tell people now a lot is, like, don't be afraid to email people. There are people out there who sort of care about the thing you care about and care about things that are related. And I think a lot of us live in this social media world where, like, oh, we have to care about, like, likes and, like, big broadcasts to lots and lots of people. I've actually found it to be really more valuable to just try to develop one-on-one connections and find those places where people are really passionate about the thing you're passionate about. And I found so much traction from just sending an out-of-the-blue email to someone. It's just really interesting.
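To ground the SMT-solver observations earlier in that answer, here's a small taste of talking to a solver through the Z3 Python bindings (the z3-solver package); the quantified query at the end illustrates why he suggests avoiding quantifiers:

```python
import z3

x = z3.Int("x")

# Quantifier-free: "find an x with x*x == 49 and x < 0".
# Queries like this are the solver's bread and butter.
s = z3.Solver()
s.add(x * x == 49, x < 0)
print(s.check(), s.model())   # sat, with a model like [x = -7]

# Quantified: "is every integer's square non-negative?"
# Adding ForAll moves the query into territory where solvers
# frequently give up and answer 'unknown'.
s2 = z3.Solver()
s2.add(z3.ForAll([x], x * x >= 0))
print(s2.check())
```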
[00:35:21] Unknown:
Yeah. I can definitely relate to that, both from running the podcast and being able to email people to say, hey, this project that you're working on is really cool, let's chat about it, but also just in personal life: I've got this problem, and I see that somebody has, you know, either a similar problem, or somebody has built a library that I'm trying to use and understand more thoroughly. Sometimes it's easier, rather than trying to break through the noise in their community Slack channel, to send a well thought out email to say, this is the problem, these are the specifics of it, here is what I'm trying to do, and just give them an easy way to provide an answer without forcing them to do a lot of upfront work. So the same kinds of behaviors that go into filing a well crafted bug report, you know, being able to say in an email, you know, either I'm having this problem with the project, or
[00:36:14] Unknown:
I have this idea of how I might use it, and here are some of the other things in the tool chain that I'm trying to integrate it with. What are your thoughts on the viability of that? Yes. Yes. Completely. You definitely wanna make it easy for the person. And my experience has been that if you make it easy for the person you're trying to talk to, people are so helpful and willing to chat. It's sort of amazing.
[00:36:35] Unknown:
Absolutely. Particularly people who work in open source; they're doing it because they enjoy it and because they want to help people. And so giving them an opportunity to help you, and making it easy for them to do so, there's a high probability that they will actually follow through on it. Yes. Completely. And so for people who are considering using CrossHair, and they want to be able to provide these invariants to their programs and try to identify potential problems in their logic, what are the cases where CrossHair is the wrong choice?
[00:37:05] Unknown:
I will sort of mention that CrossHair is an experimental tool. It definitely has bugs. You may find bugs. Let me get through some of the basic constraints first, though: Python 3.7 and up, CPython only. We actually hook into some internal stuff there. Another thing we mentioned is it is not meant for code with side effects. So you do not want to write contracts around code that will write to the disk or make network calls or things like that. Once you get past those physical, very base-level restrictions, there will be cases where it will perform better or worse.
Typically, you will find that CrossHair is best suited for checking smallish hunks of code. It can check larger ones, and you can run it for a longer period of time, but basically you will start to lose effectiveness as you scale up. One way we get around this is that if you build contracts at all levels of the call stack, CrossHair can actually make use of some of those subcontracts. So if you build up enough contracts, you can actually overcome some of those challenges. Regardless, as you scale up, you may find that a probabilistic approach is a better one, and features like that are coming. Like I mentioned, I'm working on collaborating with some of the Hypothesis people. And in the ideal world, this could all be transparent in terms of whether you're getting probabilistic testing or symbolic or a combination of the two, which is a very exciting avenue for future work. But today, when you run CrossHair, you're mostly gonna get the symbolic aspects. Things that symbolic execution is really good at include getting past conditions that are tricky for randomized approaches to get through. So, like, if you need to check that the response the person typed in their input field was yes, the string yes, it's actually very hard to randomly generate the string yes, but CrossHair will get through that kind of condition very quickly. So it is a little bit hard to get an intuition for which approaches will be really good and which ones aren't.
So if you do want to adopt CrossHair, you know, treat it as you're also contributing to the community in terms of coming up with test cases, and it may work and maybe won't work. But I will be incredibly grateful to you for trying it out.
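His "the string yes" point as a tiny contract (a sketch with a hypothetical function): random input generation essentially never produces that exact string, while a symbolic engine can solve for it directly:

```python
def greet(name: str) -> str:
    '''
    post: __return__ != "bad"
    '''
    # Random fuzzing almost never stumbles onto the exact string "yes";
    # the symbolic engine solves for it and reports name="yes" as a
    # counterexample to the postcondition.
    if name == "yes":
        return "bad"
    return f"hello, {name}"
```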
[00:39:20] Unknown:
And as you continue to build CrossHair and use it in your own work, what are some of the additional features or capabilities or improvements you have planned for the future of the project that we haven't already discussed?
[00:39:32] Unknown:
There's almost an infinite amount of work in terms of handling more cases symbolically. So basically, a lot of CrossHair is about kind of making sure that we can keep things symbolic as long as possible, because as soon as you get concrete, then it gets hard to get past those conditions. So for instance, there are a number of functions that you can call on strings right now which could be implemented symbolically in the solver but aren't. I think a lot of the work is just in continuing to implement more and more of those built-in functions in the solver. So, basically, you know, there is an arc around just sort of improving the effectiveness of the system in terms of how long we can keep values symbolic.
The other arc, I guess I did mention previously, is just doing more scaling between concrete and symbolic execution. And definitely IDE integrations; we want this thing to be easy for people to use, and so we want to meet the developer where they are. I have to tell you, using CrossHair in Emacs, you do get to this point where it is almost like you are pair programming with someone. Right? So, oh, what does this function do? I say, like, oh, it does this. I type in my contract, and I work on it for a little while. And then the IDE is like, oh, well, what about this case? Did you think about this? And I'm like, oh, no, I didn't. That's the future that I would like. I just believe in this stuff a lot. Yeah. I did wanna say thank you for having Emacs be the first implementation that you have support for, because as a fellow Emacs user, I appreciate that. Very good. Yes. Yeah. Yeah.
[00:41:04] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose a movie that I watched over the weekend with the family called The War with Grandpa, and it's just a hilarious movie. Great fun for the whole family. Just watch it and have a good laugh. I'll just leave it at that. And so with that, I'll pass it to you, Phillip. Do you have any picks this week? So my pick is hammock chairs.
[00:41:31] Unknown:
I recently put a hammock chair into our apartment. You only need to attach it to one point in the ceiling if you have a good rafter. And I'll tell you, just, like, everything I do in the hammock chair is, like, 20% better. You can have a drink. You can read a book. You can work on some code. Everything in the hammock chair is 20% better. That's a very specific quantity.
[00:41:52] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing on CrossHair. It's a very interesting tool and one that I look forward to experimenting with on my own code. So I appreciate all the time and energy you put into that, and I hope you have a good rest of your day. Thank you so much. This was a lot of fun. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to the Episode
Guest Introduction: Phillip Schanely
Overview of CrossHair
Use Cases and Integration with Other Tools
Contract-Based Development vs. Probabilistic Testing
Constraints and Limitations of CrossHair
Comparison with Pydantic and Type Systems
Different Approaches to Specifying Contracts
Static Analysis vs. Runtime Validation
Influence on Software Design and Interfaces
Property-Based Testing and Writing Effective Contracts
CrossHair and Type Systems
Implementation and Evolution of CrossHair
Inspiration from Other Projects
Comparison with Other Contract Libraries
Getting Started with CrossHair
Diff Behavior Capability
Using CrossHair for Self-Verification
Interesting Use Cases
Challenges and Lessons Learned
When Not to Use CrossHair
Future Plans for CrossHair
Contact Information and Picks