Summary
Static typing versus dynamic typing is one of the oldest debates in software development. In recent years a number of dynamic languages have worked toward a middle ground by adding support for type hints. Python’s type annotations have given rise to an ecosystem of tools that use that type information to validate the correctness of programs and help identify potential bugs. At Instagram they created the Pyre project with a focus on speed to allow for scaling to huge Python projects. In this episode Shannon Zhu discusses how it is implemented, how to use it in your development process, and how it compares to other type checkers in the Python ecosystem.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Shannon Zhu about Pyre, a type checker for Python 3 built from the ground up to support gradual typing and deliver responsive incremental checks
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Pyre is and the story behind it?
- There have been a number of tools created to support various aspects of typing for Python. How would you describe the various goals that they support and how Pyre fits in that ecosystem?
- What are the core goals and notable features of Pyre?
- Can you describe how Pyre is implemented?
- How have the design and goals of the project changed/evolved since you started working on it?
- What are the different ways that Pyre is used in the development workflow for a team or individual?
- What are some of the challenges/roadblocks that people run into when adopting type definitions in their Python projects?
- How has the evolution of type annotations and overall support for them affected your work on Pyre?
- As someone who is working closely with type systems, what are the strongest aspects of Python’s implementation and opportunities for improvement?
- What are the most interesting, innovative, or unexpected ways that you have seen Pyre used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pyre?
- When is Pyre the wrong choice?
- What do you have planned for the future of Pyre?
Keep In Touch
- shannonzhu on GitHub
Picks
- Tobias
- Lord Of The Rings: The Rings of Power on Amazon Video
- Shannon
- King’s Dilemma board game
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- PYre
- MyPy
- PyRight
- PyType
- MonkeyType
- Java
- C
- PEP 484
- Flow
- Hack
- Continuous Integration
- OCaml
- PEP 675 – Arbitrary literal strings
- Gradual Typing
- AST == Abstract Syntax Tree
- Language Server Protocol
- Tensor
- Type Arithmetic
- PyCon: Securing Code With The Python Type System
- PyCon: Type Checked Python In The Real World
- PyCon: Łukasz Lange 2022 Keynote
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast Thought in It, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, and dedicated CPU and GPU instances. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover.
Go to python podcast dotcom/linode today to get a $100 credit to try out their new database service. And don't forget to thank them for their continued support of this show. The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select STAR's data discovery platform solves that out of the box with a fully automated catalog that includes lineage from where the data originated all the way to which dashboards star will set everything up in just a few hours. Go to Python podcast star will set everything up in just a few hours. Go to python podcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan. Your host as usual is Tobias Macy. And today, I'm interviewing Shannon Zhu about pyre, a type checker for Python 3 built from the ground up to support gradual typing and deliver responsive incremental checks. So, Shannon, can you start by introducing yourself?
[00:01:44] Unknown:
Yeah. Hi. My name is Shannon. I'm a software engineer at Meta, and I've been working on Pyre team building a type checker and static analyzer for Python for the last 5 or so years.
[00:01:55] Unknown:
And do you remember how you first got introduced to Python?
[00:01:58] Unknown:
Yeah. So my very first programming languages, I believe, were Java and C. And I think my initial deeper exposure to Python was when I started using it to prep for my very first coding interviews. That was the first opportunity that I really started to get familiar with the language, and I think I kind of quickly realized that some of the same, like, properties that make it such a good choice for technical interviews also makes it a really good learning language too. So, yeah, that was probably my first exposure.
[00:02:26] Unknown:
Can you describe a bit about what the Pyre project is and some of the story behind either how it got started, if you are familiar with that history or at least your introduction to it and why you wanted to spend your time and energy on this particular area.
[00:02:41] Unknown:
The first part, you know, is like, what is Pyre? It's, I guess, simply type checker for Python, and we originally started building this back in 2017. And some of the initial motivation there was to solve the type checking needs for Instagram, which is a huge Python code base. And they were 1 of the very first early adopters for Python annotations when they were kind of introduced to the language in PEP 484. And the codebase is growing really fast, and we really needed faster type checking results to keep that development experience reasonable. So that was 1 big motivation. Another thing was that Meta actually has a somewhat, like, long standing track record for adding types to dynamic languages and then building static analysis tools for security on top of it. So some examples prior to pyre would be flow for JavaScript and hack for PHP.
And so there was some, you know, intention to do the same thing for Python since those have been so successful in the past. So I thought it was a really exciting space. I like Python from what I had used it for in the past, and I thought it was a really interesting problem to kind of build, type checker for such a dynamic language. I joined the team around when it was started, so that's kind of the story for for the team.
[00:04:00] Unknown:
In terms of type checking, particularly with Pyre, but also with some of the other suite of tools that are out there, what are some of the ways that that capability fits into the development workflow as you're working on a particular code base, either starting from scratch or jumping into a massive code base such as Instagram?
[00:04:19] Unknown:
Yeah. So the most common use case probably is for a developer who's already working on a project that has type checking turned on, and they'll interact with the type checker via either the command line or more commonly their editor when they just open a Python file. We spin up a type checking server quietly by default in the background, and they'll just get type errors as they write code. And then they'll probably see it again in continuous integration. So when they submit their changes for review, we usually provide signal both, you know, whether this change introduced type issues in your project or maybe in a downstream dependency. But there are also maybe additional lints or, like, coverage delta signal that you might get in the CI as well. So that's kind of the most common, I think, use case. But definitely for teams who are interested in, like, just starting out with types or or maybe taking their, like, really driving type annotations, We also have, like, different interfaces and tools for them to sort of, like, push some buttons and either turn on type checking, make it more strict, or do code mods for type coverage, things like that. So I suppose that's 1 other dimension where you might interact with Pyre that's not just, showing type errors in your editor or in your CI.
[00:05:33] Unknown:
In Python, there have been a number of other tools that have come about for being able to interact with types and specifically for type checking where maybe the canonical example is mypy because of the fact that Guido was 1 of the people who was involved in its early development efforts. And I'm wondering if you can describe some of the landscape and ecosystem of type inference and type validation and type annotations, particularly as it pertains to these type checkers and some of the different goals and focus that those different tools adhere to?
[00:06:08] Unknown:
Yeah. So MyPy is a great example. There is also HiType at Google and Pyrate at Microsoft. So I think these 4 are typically the biggest names in the type checking space for Python. I guess Pyre in particular definitely started out with the goal of just being, like, blazingly fast on huge code bases, you know, Instagram being our, like, urgent need. And it's written in OCaml, which I think is pretty different from a lot of the implementations of other type checkers, which are usually themselves written in Python. And so a big focus for us was to get, like, sub second response times for changes in the editor that are also, you know, absolutely guaranteed to always be consistent with a full cold type check. So we did a lot of work on performance and consistency reliability, basically, on really large code bases. So that was that was definitely, like, our number 1 number 1 goal and a big reason that we didn't just go with MyPy, for example, internally, which is what Instagram started on. And I think some other differences, you know, Pyre also tends to be a relatively more opinionated type checker in the space. So, we only really support 2 modes. There's gradual typing and there's strict typing.
And within each of these modes, Pyre takes a pretty strong stance on what is type safe and what is not, and it doesn't really allow users to, like, fiddle with tons of flags and opt ins and optionality for, like, you know, enabling or disabling different pieces of the type system or, like, different interpretations of Python type system, which maybe is either a good or bad thing, but the type system itself is pretty specified as, like, kind of a skeleton, like, their syntax and their operators. There's some rules, but there isn't a lot of, like, concrete specification on the expected type output, like, type error output or behavior of every particular pattern.
So that is 1 way that different type checkers might might be different. Like, they actually may output slightly different errors in different places, things like that. And I think the last thing that might be, you know, interesting to call out is is, again, the static analysis for security application. So that has been a focus for the Pyre team in the past, and it's even bled into some PEPs. Like, if you're curious to, like, you know, hear more about security applications for types, We recently published, and it's included in 311 with 675, which is arbitrary literal string. So that's another thing that at least our team really focuses on as, like, 1 application of type checking that I don't think is really as much in the forefront of some of the other tools in the ecosystem.
[00:08:36] Unknown:
The fact that it's implemented in OCaml is an interesting juxtaposition where OCaml is a very strongly typed language and functional, and you're working on a dynamically typed language that is, I guess, agnostic to paradigm. And so working in OCaml to do type checking in Python, I'm curious what you have seen as some of the interesting kind of juxtapositions or lessons learned as you jump between the 2 different environments.
[00:09:04] Unknown:
Yeah. That is really interesting to call out. They're just worlds apart. You know, OCaml is kind of the language where you're fighting with the compiler to the death, but, like, once it compiles, it, like, definitely works, or, like, more or less definitely works. And so I think it hasn't felt so strange, honestly. Like, it doesn't feel too weird to be working in OCaml for Python, but it definitely does, you know, help us have a concrete appreciation for what a very sound type system or a very complete type system can give you. The level of just safety guarantee. Like, you can just do refactors and make changes with just such an insane level of confidence that it'll tell me if I forgot something, if I wasn't aware of some dependency, and it's something you're pretty used to not having in Python.
So I guess in some ways, it does provide that kind of tangible illustration of what a really sound type system can do. I'm wondering,
[00:09:57] Unknown:
as you go from working in OCaml to working in Python, if there are any ways that that influences the way that you approach development of your Python code to be maybe more strict or detailed with the it type annotations that you're adding to try and get back some of that level of confidence and just some of your, some some of the perspective that it provides you working between those language environments?
[00:10:21] Unknown:
I actually say not really. I think we really keep in mind the value of, like, the simplicity and cleanliness of Python. And, you know, the goal isn't really to build a totally sound type system and capture all of the dynamic possibilities of the language and express them in the type system. Although, you may be surprised by, like, just how expressive the type system can be on things that are quite dynamic. But I don't think working in OCaml brings us to the Python type checking space with, like, more of an eye for, like, really sound complex types. Like, I think the languages are different enough that we we definitely viewed the type system in Python as, like, a completely different beast.
It was tacked on afterwards on, like, a pretty fully fledged language, and that's pretty neat. So, yeah, I I don't think it it necessarily applies in a very concrete sense, but it perhaps, like, gives us, yeah, that appreciation for, you know, if we had, like, very well typed code, like, what could the development and iteration process look like, and how fast could it be, if that makes sense.
[00:11:26] Unknown:
Another interesting element of to your point, the type system well, I guess the type system was already there, but the type annotations and type enforcement in Python is something that was added as not necessarily an afterthought, but added as something that to to an already existing language and ecosystem, which brings up the idea of gradual typing, which is 1 of the goals of Pyre. And I'm wondering if you can talk to some of the ways that that manifests. And for people who might not be familiar with that concept, sort of what that actually means to be able to do gradual typing and how you might incorporate that into an existing code base?
[00:12:02] Unknown:
Basically, gradual typing is a necessary crutch in order to convert an existing Python project that doesn't have types into 1 that does without saying halting all production and getting your entire team to just annotate things for a few months, which is just never realistic in the real world. So gradual typing essentially is a way to, like, add a stopgap. Like there's basically a type that we use that kind of breaks the type system. It can be compatible with any other type, and this is used in gradual typing to allow you to incrementally add types and only deal with type inconsistencies between the types you've already added rather than having to deal with all the downstream effects of every type that you've added, which essentially becomes 1 of those viral problems where you end up with more and more and more errors and fixing those results in more errors until you find yourself sitting down to convert your entire code base in 1 sitting, which is not possible. So, yeah, gradual typing is basically a mode where, you know, you can add types as you go, and the type checker just makes blanket assumptions about everything that doesn't have a type and assumes it's compatible.
This trades off obviously with soundness and safety guarantees. If you have a gradually typed project, you might not know exactly how much of it is covered. You You might not know how much your of your dependencies are covered, and so you don't know how much to trust the type checker when it says, like, you're all good. There's no type errors. So it definitely is like a a phase that we try to move projects through and and out of eventually into stricter type checking to gather more of those guarantees. But it's totally necessary for a language and a type system like this where types are being added to existing things in most cases.
[00:13:43] Unknown:
Is there any real distinction from that kind of gradual typing mode and the type inference that a type checker might do where you say, I have a variable foo and I initialize it to an integer 1. And so I'm going to assume that for the lifetime of this project that foo is has a type of int? And then if I see that you assign a string to foo, then I'm going to throw an error. Like, is it the the similar approach to that, or is it a kind of different layer above that?
[00:14:12] Unknown:
Oh, yeah. So I would actually separate those 2 concepts. I think gradual typing, at least when we talk about it on pyre, I think mostly in the typing community, it refers to the adoption process for types. Whereas type inference is definitely a a critical part and baked into every type checker and maybe more manifest in the idea that not everything in the Python program needs types. So conventionally, we only ask for explicit annotations on really globally accessible values. So that might be the globals, class attributes, and function signatures. And then everything else, a lot of local values, etcetera, are all inferable from that set of globally accessible values.
And, therefore, Python doesn't wind up looking like a language where every single variable initialization needs an explicit type. In terms of being able to actually go from a project that
[00:15:06] Unknown:
is functioning, you're using it, it's running in production, and then you say, okay. I wanna start adding type information into this program because I'm starting to hit some of these roadblocks where I change a value or I change the type of a variable, and then all of a sudden my code starts breaking. So I just wanna understand what is this value even supposed to be? Because I'm 5 layers deep in the stack, and I don't know where it originated. And so I know that there are projects such as monkey type that are aimed at looking at the runtime behavior of your program and then generating the type stubs so that you can layer those in afterwards. And I'm wondering, you know, over and above that, if there are any other kind of useful strategies that you've seen teams take for being able to start figuring out what are the right types to add and what are the right places to think about adding those types so that I can get some of those kind of safety checks and kind of guardrails in the project.
[00:15:56] Unknown:
Yeah. So monkeytype is a great example of 1 way to automate adding annotations at scale. So it's really neat because it's dynamic and it it samples from your actual run time traffic and figures out what types might be. Then there's the static inference as well, which essentially just reuses the same engine that powers inference in the type checker itself to propagate whatever types it can infer and add explicit annotations, like, guess at explicit annotations. So those are definitely 2 ways to just automate the process entirely, but not everything can necessarily be automated. And automation tools also do benefit from, like, a seeding of, like, manually added types in many cases. Right? Like, some partial percentage of type annotations often makes at least a static inference a lot more powerful. So other approaches that I've seen teams take with quite a lot of success usually revolve around measuring coverage and making it very visible to developers. So a couple different forms, like either as the project as a whole and setting kind of better engineering goals and measuring individual contributions to type coverage. For example, changes that get submitted can very easily be dealt against coverage stats on the base and figure out, you know, like, who's making the effort to annotate more code, annotate new code or existing code, etcetera.
Another thing that we've recently been pushing for is something we call expression level type coverage, which basically exposes holes in the type inference process. So, I mean, basically, every type coverage hole will usually come from some missing annotation somewhere upstream. But, you know, like, I might be somewhere in in the body of some function where my function is perfectly typed and my entire module is perfectly typed. And maybe my whole project is strictly typed, and we have 99% type coverage. But then I might be shocked to find that I run this code and I get type errors that weren't caught by the type checker. And oftentimes, that's because I'm calling some function that's in the dependency and that project is poorly typed. So being able to expose to the user in, like, a visual way what the type inference engine thinks it's confused about is also another interesting way to both motivate more type coverage and maybe even identify, like, the libraries that are causing the most coverage holes downstream and, you know, push people to annotate those. And also avoid surprising people when, you know, they have a well typed module. And, for some reason, the type checker is still blind to some pieces.
That helps with the level of safety guarantee even if they can know, like, where the blind spots are.
[00:18:37] Unknown:
Question of coverage is also interesting because it's definitely possible to have everything in your code base with a type annotation and still not get any of the safety guarantees because everything is just in any type. And I'm wondering if there's any utility or capacity for being able to visualize not just the coverage, but the specificity of the types and whether or not it is actually covering the kind of right level of constraint where you say, okay. You have 95% type coverage, but 30% of your dictionaries are just type string any, so maybe you need to be a little bit more specific about that.
[00:19:12] Unknown:
Yeah. So for the purposes of coverage statistics, any counts as not types. But you're totally right. They can be embedded in other types. And so oftentimes, coverage gets a little bit more complicated where we have the concept of, like, a partially typed value where you might have, like, a list of any. So you know something about it, but you don't know everything, and you'll still get unknowns when you pull out of this list. Or, yeah, like the common dict string any oftentimes used for JSON. So, there is a little bit more nuance too in flagging not just, like, black and white, what is covered and what isn't, but also what's partially covered. As far as I'm aware, I think some other type checkers also have a, like, similar concept of, like, what is uncovered but is inferable. So, like, the type checker is making an educated guess, but it would be better if you annotated it, something like that. So there are different nuances to measuring coverage and showing it to developers in a useful way.
[00:20:05] Unknown:
Digging more into the Pyre project itself, you already mentioned that it's written in OCaml. I'm wondering if you can talk through some of the implementation and design and some of the ways that you think about the architecture of a type checker as compared to maybe like a web application.
[00:20:22] Unknown:
Sure. Yeah. I can do sort of a high level overview of how the type checking is put together. So, essentially, Pyre does have a, like, a Python client which handles, you know, the command line interface, file system changes, and code editor integrations, but the core type checking is all done in OCaml. We spin up a server, basically, that can be pinged for by the client for any incremental changes. And the server itself, when you type check something for the very first time, it will parse your Python source code and build an abstract syntax tree. And then it does some pre processing on this tree to deal with like name qualification, handle versioning, you know, expand some syntactic sugar and all that kind of stuff. So that's kind of the first step. And then we enter a phase that we call environment building.
So for pyre, we basically have a bunch of parallelizable layers and each layer collects some kind of global information that subsequent layers depend on. So, you know, we'll be collecting things like class names, function signatures, type aliases, like global values, and then eventually we'll, after being able to gather all of this information, we can build out, like, a class hierarchy and an ordering of types. So all of these are essential to answer questions about, you know, whether some type is less than or equal to some other type and to know, like, if I make a function call, what's the type like, basically, be able to resolve the types of anything that's globally visible. And then the final step really is the type checking itself. And so we type check all local scopes of code in parallel.
Once we have that global information, all of the local scopes are independent. I guess, nesting is maybe an asterisk, but, essentially, we have the AST, and we turn that into a control flow graph. You know, branches on if else statements, but generally just steps through, like, the execution path of the code. We analyze that statement and apply changes to, like, a set of inferred local types. We do this on a fixed point, and so we eventually have a snapshot of what the local type environment is at every statement, and then we can generate errors. And in the final exit statement of the local scope, we have a final set of errors at the end. We do a little bit of post processing, and that's what we show the users.
Let me know if I can get into any more detail, or that's, like, the high level bird's eye view of of how Pyre is implemented.
[00:22:44] Unknown:
Yeah. And 1 thing I'm interested in digging a little deeper into is kind of the state management where when you're working across a large code base, you obviously don't want to have to recompute all of the type information every time you come into it. And then there's also the question of being able to particularly if you're in exceedingly large code base, brings up the possibility of being able to share that state information across developers where you're you're working on different machines. And I'm wondering if you can talk to some of that state management aspect of Pyre and how you think about being able to generate that initial state and then maintain it and possibly propagate it for sharing?
[00:23:22] Unknown:
Yeah. That's a great question. I think that's something that we care a lot about. I mean, you can have a fast cold start, but it's still gonna be on the order of minutes on a massive, you know, multimillion line code base. So what we really want is for changes to give you results in seconds or less than a second, ideally. What we essentially do is we have this pretty neat, like, layering of dependency tables. So every single step that I had described before, so, you know, parsing, preprocessing, all the layers of environment building, and then type checking.
They each rely on certain artifacts produced by a previous step or a previous layer, And we maintain tables that track the, essentially, the fetches or reads of things that were produced from a previous step. And this essentially builds out, like, as we do the calculations, builds out a dependency table based on, you know, in actuality, like, what did this calculation need. And so when something changes in the source, we can really quickly tell, like, okay, what changed in the app checks and tax tree? And then we can recompute no more or no less than exactly what is necessary in order to get an updated set of type errors. And this problem this wasn't our original solution either. Like, we used to have a system where we would track dependencies on a module level, but that would mean that if you change something in a very interconnected module, maybe you even change something that doesn't really matter, like, that function itself or that, like, attribute itself isn't that interconnected, but, you know, its wrapping scope is, you would end up triggering type checks of just all these dependencies because, yeah, they depend on this module and name. Right? They import this thing, and then that would just be really slow. So this dependency table setup allows us to both with, like, really high, like, guarantee for reliability and with really fast speed, like, handle incremental updates. So, yeah, that's how we do incremental updates and all of the, like, environment state is saved on the server from the initial cold start. So then we're listening for file system changes, and then we have this server that like, this long running process that grabs that change and does this sort of dependency table informed recomputation.
We can also save that state. So we have, I guess, intuitively, they're called safe states where we can generate this cold start, like, basically, the server state of the memory and save that. And we have done something in Instagram where for, like, shared commits, like, basically, published commits, we will generate these things async, and then multiple developers who might be developing on just, like, 1 or 2 commits above this shared published commit just grab that appropriate safe state from storage and can essentially start with an incremental check rather than starting with a cold start. As far as the kind of time space trade off, I'm wondering what the
[00:26:10] Unknown:
storage cost of that stateful type information is in relation to the code base that it pertains to? So, like, are we talking where you have the code base and maybe the type information is half the size, a quarter of the size? I'm just wondering what the kind of relative scale is of the storage space that's required to be able to maintain that information.
[00:26:29] Unknown:
Oh, it's not okay. I don't have actually accurate number here, so I won't guess, but it's much smaller. So there was not any, like, real storage concern with storing. We did, like, I think, 1 in every 10 commits or something, and only for the past couple days. And so that was quite small. Like, we didn't have to really petition for extra storage or, you know, it's really special casing, the internal infrastructure we have for for storing and fetching things. It doesn't remember any of the source code itself. It remembers, like, basically, the the output of the environment building phase. So a lot of these are just tables of, like, globals and type hierarchies and things like that. So the space needs weren't huge, but then again, it also wasn't small enough that we could just store every single commit for quite a long time. And there also are a lot of projects where, you know, for example, if you're doing development on, like, a shared server that's warmed up for you beforehand, we don't even really need the save state for that project. Like, we can just modify the warm up process for a developer server. That just warms up your type checking beforehand, and you don't really have to deal with the system at all. Another interesting aspect of this
[00:27:44] Unknown:
problem space is that because of the fact that the Python type annotations and type definitions is a moving target and has been for several releases now, I'm wondering how that has impacted the work that you're doing on pyre being able to keep up with those changes in the ways that type annotations are specified, the types of types that are available, and, you know, the introduction of the typing module, and then the subsequent, you know, additions to it and changes where you can actually just use the built in name of that type without having to use the capitalized version and just all those kind of evolutions that we're going through and some of the ways that you're engaging with the Python developers and the Python community to help maybe shape that dialogue?
[00:28:25] Unknown:
I would hardly call it, like, keeping up with a changing system so much as, like, we're oftentimes the ones proposing changes. I think it's pretty exciting that the Python type system in many ways is really young, and it's changing a lot, and there are a lot of use cases and, you know, user pain points that we get to hear about, try to solve, and then eventually think of ways to really express this better, maybe more precisely or more clearly or more simply in the type system, and then bring these proposals to the open source typing community and write PEPs. So we write a lot of PEPs, actually. And so I guess we we kind of think of it as definitely a privilege that we can mold the type system that we're implementing a type checker against.
And oftentimes, we even implement things before they get accepted as, like, a proof of implementation, like, get feedback on, you know, a lot of users actually using this in real Python code, and that actually really helps the evolution or ability to accept the PEP with clear eyes on how this pans out. So, yeah, that is a big part of our work and something I really enjoy. And it's also pretty neat to see kind of, like, how the sausage gets made in terms of changes to the language. The community is really great, and it's really interesting getting to also weigh in on other proposals or other pain points and and see the type system change a lot, honestly, even over the past few years. But it's changed like crazy since 484 has been pushed out. Like, at that time, it was just like a comment style convention, really. So the rapid changes are really cool, and we're kind of at the point where we're, like, pushing a lot of those changes, which is a lot of fun.
[00:30:04] Unknown:
Another area that's been seeing a lot of growth and evolution alongside those changes to the typing system is the level of developer tooling and the integrations with that type system and how that manifests in terms of the developer experience. I'm thinking, in particular, things like editor integrations, language server protocol, in implementations. And I'm wondering how that has also shaped some of the feature direction and the areas of focus on Pyre for being able to work alongside some of those capabilities and being able to match some of the developer expectations of saying I've got my editor. I want it to tell me when I'm typing the wrong thing or when I'm using this variable in the wrong way. I guess editor integration for type results
[00:30:47] Unknown:
in general is was a was and remains, like, a really big priority because that is the primary interface that most people use to interact with Pyre. But there are also a lot of different applications for developer tooling that are made better by types. So, yeah, like you said, little things like go to definition, like hover, autocomplete, you know, like a lot of these, yeah, code navigation tools, code modding tools, also just smarter linters. So for Pyre, we've developed a query interface where a lot of these tools can ask of running Pyre server, not just for type results, but, you know, it's already doing the work of, like, listening to file system changes and updating its its understanding, right, incrementally. And you can also ping this server for, like, live results about what's the type of this thing at this location, or tell me if this is a subclass of that, or, yeah, tell me all of the names of something. So that has been 1 way that we're trying to do this integration, but a lot of this focus is also, like, something we're currently working on. So we really want to, like, support first class a lot of these kinds of linters or even additional analyses that kind of reuse the back end of the type checker and answer some slightly similar but not quite type system related questions and also support code navigation tooling. So, yeah, I think that there's, like, a really big space, honestly, in everything in the Python development workflow above the runtime that I think having, like, a a tool that has deep semantic understanding of control flow and the entirety of your project can provide just a lot of value in so many different ways. Absolutely.
[00:32:21] Unknown:
And as somebody who has been working with the Python community to, you know, understand and evolve the type system. I'm wondering what you see as being some of the strongest aspects of Python's approach to typing and some of the opportunities for improvement or some of the areas where you think the community has the potential to grow and improve upon the existing foundations that have been set over the past few releases?
[00:32:51] Unknown:
I would say I guess in some ways, I think the flexibility of the type system and the fact that it's like a kind of a living thing that's growing pretty quickly, I might describe that as both the strongest or kind of most impressive part of the type system, but also, like, maybe the reason why there's so much room for improvement. So I think it's pretty impressive, honestly, that we can design, like, a gradual type system that can provide a lot of soundness and value without stifling the dynamic language it was kind of tacked onto. I think when it comes to opportunities for improvement, I think there are kind of 2 directions, and sometimes they pull at each other. 1 of those being more expressability. Like, there are, you know, a lot of cases still that can't be expressed very well in the type system, and a lot of them are in, like, the numeric computing or, like, tensor typing areas where you need type arithmetic and a lot of other constructs that just, you know, the type system as it is can't fully express.
The other axes I was mentioning is, like, we don't want to detract from the cleanliness and the simplicity of Python and its syntax, and we really want types to be, like, kind of, like, augmenting the clarity and the readability of the language that I think is 1 of Python itself, like 1 of its points of pride, and like strong suits. So I think those 2 things are both ways that the type system can still improve. Like, a lot of peps are either expanding expressiveness of the type system, or they're trying to take something that can already be expressed, but kind of in a clunky way, and make it more sleek, more Pythonic. I guess a recent example of this that comes to mind is the self type I don't remember the PEP number off the top of my head, but it's 1 of the recent ones that's in 311.
11. It really reduces the amount of boilerplate in order to use type variables to bind to the type of self. So things like that where you can, you know, Python's very intuitive and you want you want the type system to reflect that, you know, like, what you think this probably should be structured as is kind of just what works. So, yeah, I think there's a lot of room for improvement.
[00:34:59] Unknown:
In terms of Python's investment in the type system and to your point of some of the earlier investments in projects like Flow and Hack, and I know that the Ruby community has started investing in adding some type annotation capabilities. I'm wondering what you see as some of the juxtaposition of Python's approach of wholly adopting typing as a core capability in the language where it's a fully supported syntax and feature and something that is exposed for tools like Pyre, etcetera, to be able to build on top of versus some of the things like Flow or TypeScript that is an entire language over and above JavaScript or the the work being done on Hack where it's not something that is being folded into the core language definition. Just some of the ways that that manifests as far as the broader community adoption and availability of tooling and support?
[00:35:53] Unknown:
You know, we've run into some implications of runtime evaluations of of types recently. So types are actually evaluated at runtime even though they're not validated. So we've kind of realized this with a lot of unexpected pain, perhaps when we propose to stop evaluating types at runtime, for performance reasons and, you know, some other trade offs. And we discovered like, the typing community, like, to our surprise, discovered that there were a whole lot, like, a whole set of of tools and applications and projects and libraries that, like, came out of the community saying, like, no, we rely on having these available at runtime. And so, like, you can't do this. So that, you know, kicked off a whole other discussion around, like, how yeah. Like, what are the trade offs and, like, when and how is this type of value make sense? It poses some issues too with, like, forward references and things that maybe other type systems won't struggle with. That's 1 aspect that might be different.
And I think in some ways too, it it is nice that the Python steering council and, you know, other, like, people who are responsible for sort of distilling and reflecting the community or the desires of the community, like, what they want from Python, have, like, a a lot of say and a lot of concerns with every change we propose to typing. There's a lot of syntax that is it would be nice to have maybe to express type annotations and more, you know, in more succinct ways or cleaner, quote, unquote, maybe, like, shorter ways. But, you know, adding that syntax would create a lot of complications for the parser and the runtime because that's actually part of the language. You can't just, like, throw in some angle brackets in the type and call that call that fine. Like, that's just part of the type system. Like, that actually is very difficult to do in Python because we I don't know if a lot of people actually want angle brackets brackets introduced into the syntax. It could bleed into other things. They could wind up getting picked up by other runtime features, etcetera. So it does make changes maybe a little bit harder, but I do think it kind of keeps the type system bound more tightly maybe to the rest of the language, both in terms of, like, syntax and
[00:37:56] Unknown:
style and readability, things like that. Yeah. That's definitely an interesting point as well, particularly for things like Flow or Hack where it's a layer on top of the underlying language, where if that underlying language decides to change their syntax and it happens to conflict with some of the syntax that was developed for the type annotations or the way that you represent that type information, then all of a sudden you have to do a whole huge refactor that because of the fact that the change was made that's outside of your control, where being part of the language community and something that is directed by the core developers and the broader ecosystem, it does have to maintain those same idioms. It does add some constraints and slows things down, but it makes sure that you don't have something that breaks all of a sudden because of something outside of your control.
And in terms of your experience of working on the PyRE project and working with people who are using it for their day to day development and interaction with their development processes, what are some of the most interesting or innovative or unexpected ways that you've seen Pyre applied?
[00:38:58] Unknown:
There are a lot of different, you know, tools and structures that you end up needing when you, like, scale out Python to support huge production systems. And so there are, actually, I think are quite a lot of innovative ways that you can make semantic understanding of control flow, like, quite applicable. I think the first thing that comes to mind is is our effort to detect security vulnerabilities. So I've mentioned this before, you know, but in that context, I was really talking about building, like, additional static analysis that reuses the Python type checkers back end and, of course, reuses the type information to actually do, like, taint analysis, for example. But the type system and the type checker itself can actually also play a pretty neat role in catching and eliminating kind of entire classes of vulnerabilities, for example, like command or SQL injection, even in ways that just standard linters can't, like data validation, etcetera.
For this topic, I would actually really recommend a talk by my coworkers at PyCon this past PyCon 2022. It's titled Securing Code with the Python Type System. It's given by, Pradeep Kumar Srinivasan and Graham Bleany. So maybe that's something we can link, but, that is a truck full of examples of like basically exactly this, innovative ways to apply the type system to catch security vulnerabilities via the type checker. So that's pretty neat and something that we've invested in a lot in and care a lot about. And then I think in addition to that, Pyre has frequently been pretty well positioned to just answer a lot of more complicated context based questions about the code base, either by exposing type information to other tools or, you know, inheritance structure, etcetera, or by just building some very simple additional analysis by itself. Like, we can detect, like, you know, dead code, unweighted, avoidables, etcetera, or even like more complicated questions about like how exceptions are thrown and whether they follow certain rules, you know, which functions throw what. So there are a lot of, like, I guess, glorified linters really that are also kind of innovative applications for the same basis of information that powers type checking.
[00:41:07] Unknown:
1 thing that I forgot to dig into briefly is for people who are adopting pyre for some of their development workflow, is it something where it's an either or, where you're only going to use pyre or you're only going to use mypyre, or do you see people using both or multiple different type checkers for different aspects of their kind of validation in their code base?
[00:41:30] Unknown:
No. I actually haven't seen very much use of both. Really, I think when a project is running both, it's in the process of transitioning to 1 or the other. And I think this might be because I think both type checkers are pretty up to standard when it comes to, like, feature completeness, so there isn't really a big class of errors that you might get in 1 and not in another. And then the annoyance probably of having, like, slightly different, like, location on errors in some cases where really they're exposing the same underlying problem. So it's not really that there is, like, either or would do just fine. But having both, I could see posing some issues when it comes to, suppressing errors, for example, like, all type checkers provide some escape hatches for, you know, either when you just don't have time to fix something or or something's, like, too dynamic and you want to just shut up the type checker. So in those cases, you would put in line suppressions, and I feel like 2 type checkers can oftentimes make that really messy.
So,
[00:42:33] Unknown:
no, I think usually people will pick a lane as far as I've seen, but I could be proven wrong. I haven't seen very many. In your work on developing the Pyre project and working in this type ecosystem for Python, what are some of the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:42:50] Unknown:
I really liked learning about, you know, how hard it is to scale a tool from, like, a greenfield implementation project to something that's actually in the core developer loop of, like, thousands of engineers every day. And there are a lot of, like, hard questions around how to measure, but maybe not, you know, overly numericize, like, user experience and developer speed? Like, how do you push against that in a falsifiable way? And, also, how do we scale our own internal infrastructure to be really a stable core developer team that that actually does support all of this traffic and and and we're, you know, in the blocking path of a lot of changes landing. So I I think that was a really interesting, just like scaling problem for us since we started from scratch. Like, we built this, and initially, it was we had, like, no users, and then we had Instagram, and now we have users just basically any Python project internally. So that's been really interesting. I think kind of what I mentioned before too, I've really enjoyed getting more exposure to the process of writing Peeps and just getting to know, like, kind of how Python is driven by just, like, open source community discussions and how much influence people who care can have on the language. I think that's been really neat and definitely something that was unexpected for me, you know, when I signed up to just start working on a type checker. So, yeah, I think that's been really cool.
[00:44:20] Unknown:
For people who are interested in being able to do some of this type validation and integrate that into their development work flow? What are the cases where Pyre is the wrong choice?
[00:44:30] Unknown:
Yeah. That's a good question. I would say it could be not worth it if you're doing small, like, throwaway prototyping. If what you really value is, like, speed of iteration and really not the longevity or reusability of your code. And maybe also in really dynamic code Or, I guess, right now, if you're working with NumPy or ML pipelines, in those cases, I would say Pyre is not the right choice at the moment. There's actually a lot of complexity in, like, dynamic patterns that can be very, very elegantly expressed with types. So you might be surprised if you think your project falls under the too dynamic category, like, it might actually not. But, yeah, I think either too dynamic or, like, just not intended for reuse might not be the best choice for investing in types.
And there's also kind of an interesting debate around how much value type annotations provide to Python when it's used as a learning language. So I've heard kind of both sides of this debate where some people argue that it it really helps with clarity and understanding to introduce types out of the gate to someone who's learning how to code for the first time, which Python is oftentimes, you know, used as that first language. And then, you know, other people argue that we really want the first experience to be, like, as close to English as possible. We don't want to make things more verbose. Like, even if those kind of expose the learner to what types are and, like, how values work in programming languages earlier, there is some debate there, and I can kind of see the argument on both sides. So maybe, you know, for certain new language learners, maybe types aren't the best choice yet. I don't know. I'm biased. So I I think types add clarity. So but I do see the other argument there. And, yeah, I guess, on the flip side, I think, you know, where pyre is the right choice in particular is probably just really large codebases that need to be maintained and need to be refactored.
That's definitely where a type checker like Pyre shines and provides the most value.
[00:46:22] Unknown:
As you continue to work on the project and iterate on its capabilities and engage with the Python community on the future of typing, what are some of the things you have planned for the near to medium term?
[00:46:34] Unknown:
Yeah. So for the future of typing itself, just 1 example of an area where I feel like there could be a lot of potential for change and evolution is that, like, type system expressiveness around numeric computing or tensor typing. There are a lot of question marks in that space, but there are a lot of really smart people thinking about it and drafting proposals and trying to figure out how to do this in a way that adds value and doesn't add too much complexity. So I think that is definitely something to keep an eye on in terms of the future of the type system itself. And then I think for Pyre, I also just see a lot of room for, like, Pyre to provide value for different tooling applications. I think we talked about some of this before. So, like, code indexing, code navigation, linting, code mods, refactors, etcetera, like, all these tools. And we really want to see a very consistent, like, well supported development ecosystem that's just kind of a joy to work in for Python developers. So I think we're, in many ways, still many steps away from that. So I think that'll be something like, 1 thing that Pyre is focused on in addition to just making type checker faster and better. There are also some pretty neat applications, honestly, in the data use and privacy space. I won't get into the details of, but, you know, as Python's kind of used for more, like, massive and privacy sensitive products, having, you know, language level features and enforcement that can be done statically can also be pretty powerful and enable a lot of different
[00:48:01] Unknown:
guarantees and checks and things like that. Are there any other aspects of the Pyre project or applications of typing for development ergonomics and developer experience that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:15] Unknown:
Nothing comes to mind in particular, but I do definitely want to call out we sort of launched into the podcast assuming that types are given, and we're talking about type checkers. Right? The things that validate them. I think there's a whole discussion potentially to be had about, like, why types in lie in Python. And if that's something that's interesting, there are a couple talks that I would super recommend that I think we can link. They're all from PyCon, which I mean, there are a lot of great talks at PyCon, but in particular, Carl Meyer had a 2018 PyCon talk called Type Check Python in the Real World. This is sort of a snapshot of of what Instagram was like when it was just converting over to Pyre, but was really largely still using Mypy, but it's really focused on, like, why types? Like, why do we bother to type check a huge untyped code base like Instagram and push it to, like, 99% typed where it's at now? And then also a really recent 1, so the 2022 PyCon keynote by Lukas Langa is also really great. So, yeah, would recommend those for sort of more discussion on why types and how does it integrate with the developer experience in Python, and what are the trade offs, and what are the concerns? Why is it worthwhile?
I think that's it.
[00:49:25] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes, and I'll definitely add those links that you mentioned in there as well. And so with that, I'll move us into the picks. And this week, I'm going to choose the new Lord of the Rings TV series that Amazon put out, the Rings of Power. I just watched the first couple of episodes of that, so definitely very excited to see how that develops. Definitely recommend that for anybody who is at all interested in the whole Lord of the Rings world and ecosystem franchise, however you wanna label that. And so with that, I'll pass it to you, Shannon. Do you have any picks this week? Well, first of all, I've been watching it too. It's very beautiful.
[00:50:02] Unknown:
Really impressive show. Yeah. I think I recently played King's Dilemma. It's a board game. I would really recommend it. It's very different from the style of, like, any other board game I've played. It's a narrative driven game where it's semi cooperative, but also very adversarial.
[00:50:20] Unknown:
You know, it's like you're kind of representing a house and you're making decisions basically on a king's council. You don't really even know the win condition of the game when you play. It's a legacy game, and I had a ton of fun playing this. If that kind of style sounds interesting to anyone, I would highly recommend it. Yeah. That definitely sounds interesting. I'll have to take a look at that game. I'm a fan of board games myself, so definitely we'll have to give that a look. So thank you again for taking all the time today to join me and share the work that you've been doing on Pyre and in the typing ecosystem for Python. It's definitely very interesting project and 1 that I've used myself a few times. So I appreciate all the time and energy that you and the other maintainers are putting into that, and I hope you enjoy the rest of your day. Thank you so much. Thank you for having me.
[00:51:03] Unknown:
Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hostspythonpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Shannon Zhu: Introduction and Background
The Pyre Project: Origins and Motivation
Type Checking in Development Workflow
Landscape of Python Type Checkers
OCaml and Python: Juxtapositions and Lessons
Gradual Typing and Type Inference
Strategies for Adding Types to Existing Codebases
Implementation and Design of Pyre
State Management and Incremental Updates
Keeping Up with Python's Evolving Type System
Developer Tooling and Editor Integrations
Strengths and Opportunities in Python's Type System
Comparison with Other Typed Languages
Innovative Applications of Pyre
Lessons Learned from Developing Pyre
When Pyre is the Wrong Choice
Future Plans for Pyre and Python Typing
Final Thoughts and Recommendations