Summary
The release of Python 3.9 introduced a new parser that paves the way for brand new features. Every programming language has its own specific syntax for representing the logic that you are trying to express. The way that the rules of the language are defined and validated is with a grammar definition, which in turn is processed by a parser. The parser that the Python language has relied on for the past 25 years has begun to show its age through mounting technical debt and a lack of flexibility in defining new syntax. In this episode Pablo Galindo and Lysandros Nikolaou explain how, together with Python’s creator Guido van Rossum, they replaced the original parser implementation with one that is more flexible and maintainable, why now was the time to make the change, and how it will influence the future evolution of the language.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Pablo Galindo and Lysandros Nikolaou about their work on replacing the parser in CPython and what that means for the language
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by discussing the role of the parser in the lifecycle of a Python program?
- What were the limitations of the previous parser, and how did that contribute to complexity and technical debt in the CPython runtime?
- What are the options for styles of parsers, and what are the benefits of using a PEG style grammar?
- How does the new parser impact the approachability of the CPython code for new contributors?
- What was the process for reimplementing the parser and guarding against regressions in the syntax?
- As developers switch to the 3.9 release, what potential edge cases/bugs might they see from introducing the new parser?
- What new syntax options does this parser provide for the Python language?
- Are there any specific features that are planned for implementation in the 3.10 release that are enabled by the new parser grammar?
- As the language evolves due to new capabilities offered by the updated parser, how will that impact other implementations such as PyPy?
- What were the most interesting, unexpected, or challenging aspects of this project?
- What other aspects of the CPython code do you think should be reconsidered or reimplemented in light of the changes in computing and the usage of the language?
Keep In Touch
- Pablo
- pablogsal on GitHub
- @pyblogsal on Twitter
- Lysandros
- lysnikolaou on GitHub
- @lysnikolaou on Twitter
Picks
- Tobias
- Pablo
- Raised By Wolves TV Series
- Lysandros
- Afterlife TV show
Links
- PEP 617 – New PEG Parser for CPython
- Podcast Episode About Parsers
- CPython
- Bloomberg
- PEG Parsers
- Seafair
- LL(1) Parsers
- Łukasz Langa
- Parser Generator
- Concrete Syntax Tree
- Abstract Syntax Tree
- PyPy
- RustPython
- IronPython
- Structural Pattern Matching – PEP 622
- Pylint
- Astroid
- Hy
- Walrus Operator/Assignment Expressions
- C99
- Reference Counting
- Cycle Hunting/Generational Garbage Collection
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today, that's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today. Your host as usual is Tobias Macey, and today I'm interviewing Pablo Galindo and Lysandros Nikolaou about their work on replacing the parser in CPython and what that means for the language. So Pablo, can you start by introducing yourself?
[00:01:28] Unknown:
So I'm Pablo Galindo. I come from Spain, although I live in London right now. I've been a core developer for almost 5 years now. I've been centering my efforts mainly on the compiler pipeline, which includes the parser, the compiler, the AST creation, the garbage collector, although I've also been working all over the place. I currently work at Bloomberg, in the Python infrastructure team, where we take care of basically the Python experience for all Python developers at the company. I think that's a great introduction.
[00:01:56] Unknown:
And, Lysandros, how about you? I'm Lysandros. I come from Greece, and I used to live in Berlin, Germany for 4 and a half years, but now I live in Athens. Yeah. I've been a core dev since June this year, and all my efforts were focused on the new PEG parser project. And prior to that, I had done some stuff on the GitHub bots we have in the Python team. Yeah. That's my experience with Python core, and I'm currently working at Seafair, an early stage startup in shipping.
[00:02:33] Unknown:
And, Pablo, do you remember how you first got introduced to Python?
[00:02:36] Unknown:
Oof, that was actually quite a while ago. I think I was first introduced to Python when I was doing my PhD. I worked on researching black holes; I did my background in physics. Basically, the software that we worked on was, you know, C++ and CUDA and things like that. And I started to use Python for the first time for what I think is very typical in the scientific world: basically gluing different computing pipelines together and things like that. You know, I fell in love with the language and I started to contribute. I think, like most people, with some documentation typos, which many people think are not worthy contributions, but I think they are, and everybody has to start somewhere. And, yeah, it's been a long road, but that was how I started with Python.
[00:03:23] Unknown:
And, Lysandros, do you remember how you first got introduced to Python?
[00:03:26] Unknown:
I think it was back in high school when I had some projects in computer science, and I remember using C for the first time. Then after a while, I found out about this awesome language, Python, which is much easier than C, which was the language I had been using while knowing nothing about programming. That's where it all started. I fell in love with the language, like Pablo said. It was just amazing. A short while after that, I developed an interest in finding out how the language works, and that's how I got started working on internals and wanting to find out further details about the language.
[00:04:06] Unknown:
As I mentioned, we're discussing the work that you've both been doing on replacing the parser within the CPython runtime. So I'm wondering if you can just start by giving an overview of the role of the parser in the life cycle of a Python program.
[00:04:20] Unknown:
Basically, from a technical point of view, the parser is in charge of reading the text that composes a program and, let's say, making sense of it. In technical terms, it takes the text as input and it will output an abstract syntax tree, which is a tree structure that is then used by the compiler to produce bytecode, or, in other languages or implementations, it can be machine code or any other thing. Right? But ideally, it will basically grab your text as it is and it will produce a tree out of it. Right. But I think the technical description doesn't do justice to what the parser does. I think the parser encodes basically what the language is. Right? Because this particular tree that we're talking about now, the abstract syntax tree, is not only something that the compiler finds handy because it doesn't have some of the things that are not important, things like commas and whitespace; the compiler doesn't care about that. This abstract syntax tree basically holds the actual structure of your language, not only from a technical standpoint, but also from what people normally understand as Python.
In some talks that I gave, I referred to the parser as the soul of the beast, the beast being Python here, because I think it's what reflects what people think about the language more than any other piece of the VM.
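To make the stages Pablo describes concrete, here is a minimal sketch using only the standard library: the parser turns source text into an AST, which the compiler then lowers to bytecode that the interpreter runs.

```python
import ast

source = "x = 1 + 2"

tree = ast.parse(source)                   # parser: source text -> abstract syntax tree
print(ast.dump(tree))                      # Module(body=[Assign(...)], ...)

code = compile(tree, "<example>", "exec")  # compiler: AST -> bytecode
exec(code)                                 # interpreter: run the bytecode
```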
[00:05:40] Unknown:
And in terms of the existing parser and CPython, what were the limitations of it, and how did that contribute to the overall complexity and technical debt that exists in the CPython code base?
[00:05:53] Unknown:
Basically, the limitations of the old parser were threefold. The problem with it was that there were some constructs in the language that were impossible to parse, and that means that we had to implement several workarounds for the fact that the parser itself could not parse those constructs, and that's things like assignment targets or multiline with statements. All those were impossible to handle at parse time. In the days of the old parser, they were handled at compilation time, at bytecode compilation time. And that's the first limitation. Then the second one, not so much a limitation, but something that we felt we could do better, is that the parser used to output a parse tree that was then translated into this AST Pablo mentioned before, this abstract syntax tree, and we felt that we could optimize that. We could drop that intermediate parse tree altogether, and that's what the new parser does. It goes directly from a token stream, from the actual source code, to the abstract syntax tree, the soul of the beast.
And also there were other things, mostly related to LL(1) parsers in general, like the lack of left recursion, and that the parser was not actually expressing what the language actually looks like. Things like addition or subtraction. When a parser parses these things, it usually parses something that tells you which addition will be made first. For example, when you have a + b + c, it's gonna be a plus b, and then, outside that, plus c. And the old parser did not use to do that, and we had an intermediate step that used to take these kinds of things, like additions or right-hand sides, and put them in the correct order. So all of these things the new parser now handles automatically, without the need for more intermediate steps.
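The left-associative shape Lysandros describes is visible directly in the AST: the left operand of the outer addition is itself the inner addition.

```python
import ast

tree = ast.parse("a + b + c", mode="eval")
print(ast.dump(tree.body))
# Abridged output: BinOp(left=BinOp(left=Name('a'), op=Add(), right=Name('b')),
#                        op=Add(), right=Name('c'))
```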
[00:07:57] Unknown:
So I've been discussing this recently with Łukasz Langa. He's the release manager of 3.9 and the author of Black, as many people would know him. And we have different views of what happened, really. I'll tell my part of the story. So, I think 2 years ago I was having lunch with him at Facebook, and we were discussing something that Black has a problem with, which is that when you have multiple context managers, like, let's say you write with something-something comma something-else, and then you have a bunch of those. The problem is that Black couldn't format those in a nice way because you cannot surround the whole group of context managers with parentheses. That's something that was not allowed. And sadly, that was a restriction of the old parser, this LL(1) parser that we used to have in 3.8 and before; it doesn't allow you to write this. So he approached me, because I was working on the parser at the time, and he said, like, oh, I have this problem in Black. You know? If I could parenthesize these with statements, I could format them nicely, one per line. Right? The same way you format imports: you write import, open parenthesis, and then you put a bunch of things separated by commas and then a close parenthesis.
And then I tried that, and it almost worked; that was the single construct that couldn't be made to work. And for that, I had to do a huge amount of hacks, all the tricks in the book, to make the old LL(1) parser kind of accept it. Sadly, I couldn't make every single thing work. And particularly, if you have a yield in your context manager, which is something weird, but you can do it, it couldn't work. And since then, I've been trying to find every single thing that I could use to hack around this. Like, I asked researchers on parsers, I've been looking at all my old books on parsing, and I've been battling with this, you know, multiline context managers surrounded by parentheses, for a long time. So, apparently, this was something Łukasz also mentioned to Guido. And when Guido started this project about, like, okay, let's change the parser, I was super happy to jump in. And my one and single goal was this multiline, surrounded-by-parentheses with statement. Right? One could say that we've been working for a year to make that work. So I'm pretty happy to live in a world where this is now possible, finally.
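For reference, this is the construct in question, with placeholder filenames; the new parser accepts it, and it became officially supported syntax in Python 3.10:

```python
# Parenthesized, multiline with statements, formatted one context manager
# per line, the same way imports can be parenthesized.
with (
    open("a.txt") as a,
    open("b.txt") as b,
    open("c.txt") as c,
):
    ...
```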
[00:10:04] Unknown:
And I know that in some of the initial blog posts that Guido put out that covered some of his initial work on using a PEG parser within CPython, he points to the fact that part of the reason that he landed on an LL(1) parser when Python was new, and why it has been in the language for so long, is that while PEG parsers did exist, they were too computationally expensive to be reasonable to include in the CPython runtime, and that with the evolution of compute and the increased power that most devices have, it's more reasonable to use that PEG parser. So I'm wondering if you can talk a bit more about some of the options for other parsing strategies and what it is about the current point in time that makes PEG parsers the right answer for CPython going forward.
[00:10:52] Unknown:
I just wanted to say that, basically, a PEG parser is not computationally more complex than an LL(1) one. Maybe it is, but not in the sense that it's not feasible; it was not feasible back then. The thing to consider is that its space complexity is much, much worse, because PEG parsers and packrat parsing use a cache, and this cache needs to hold results for all the tokens of the program, and that could be huge. And basically, back then, memory was not that cheap. And as it gets cheaper with the years, parsers or programming techniques that just use more space to trade for time become more feasible.
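As a toy illustration of the packrat idea Lysandros is describing (a simplified sketch, not CPython's actual implementation): every (rule, position) pair is parsed at most once and the result is cached, which guarantees linear time at the cost of a cache proportional to the input.

```python
from functools import lru_cache

tokens = ["1", "+", "2", "+", "3"]  # toy token stream

@lru_cache(maxsize=None)  # the packrat cache: one entry per (rule, position)
def number(pos):
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    return None

@lru_cache(maxsize=None)
def expr(pos):
    # expr <- number '+' expr / number   (PEG-style prioritized choice)
    parsed = number(pos)
    if parsed is not None:
        value, nxt = parsed
        if nxt < len(tokens) and tokens[nxt] == "+":
            rest = expr(nxt + 1)
            if rest is not None:
                return value + rest[0], rest[1]
        return value, nxt
    return None

print(expr(0))  # (6, 5): the parsed value and the final position
```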
[00:11:40] Unknown:
Right. I think it's also kind of interesting that, in the end, the reason we went with a PEG parser is a bit simpler than what you might imagine. Even the original choice of LL(1) is very interesting, because normally, when you design a grammar to be LL(1), what you want is to hand-write your parser, because LL(1) grammars are not only straightforward to generate a parser from automatically, you can also hand-write a parser for them very easily. So that is normally one of the goals. But the old parser was not handwritten; it was automatically generated, and in a way that was not standard. I think it was the first of its kind. It used a very specific way of creating automata and then trimming them down, which, by itself, is a known technique, but applying it that way was quite new, I think. The reason it could do that is because the Python grammar doesn't have null productions, which is basically a rule that doesn't produce anything. Like, if you allow the empty string to be valid in a rule of the grammar, you have null productions, or epsilon productions as they're called, and that was not allowed. So we already had an interesting, custom parser, and we like to automatically generate the parser from the source of truth, which is the grammar. So when we started the PEG parser, independently of PEG or other choices, we certainly wanted a parser generator. Right? We didn't want to write a handwritten parser.
And PEG is just one of a bunch of options. I mean, if you go to other kinds of parsers, you can also find parser generators, but it's much more complex. We could have selected other technologies. I think it was also mentioned in one of the talks that we gave, or one of the meetings that we had in the core team, that one of the reasons we went with PEG is because it's easy to understand what's going on. So it was easy enough for us to write a parser generator that is not insanely big. Because, for instance, for LL(k) grammars and things like that, there are parser generators already, like yacc and all those. Bison, yacc, I don't remember which one is the parser generator and which one is the tokenizer, but there are already some of these. And the problem is that it's much more complex to build one of those. We wanted to fine-tune the parser generator for doing different things. For instance, now we have soft keywords; implementing soft keywords in those tools may be easy, maybe not, but we wanted something that we could understand and could tweak, you know, something specifically made for Python, and PEG was good enough and easy enough to understand. And, you know, it also makes contributing to this a bit easier than it could have been had we chosen something different.
[00:14:10] Unknown:
Yeah. I think that's a really important point, that not only is the generator itself easy to understand, but also the generated parser. And so debugging it or working on it is much easier, and it opens space for new contributors to work on it, because it's really easy to understand. Every function of the parser actually has the same structure,
[00:14:33] Unknown:
and you can go through it and actually understand everything. Yeah. And I'm interested in digging a bit more into the fact that it does pave the way for new contributors to the CPython runtime because it eliminates one level of complexity. So I'm wondering if you can talk a bit about some of the opportunities that it unlocks for people who might want to introduce new capabilities to the language that might have been too difficult or too cumbersome or too daunting to do before this parser replacement?
[00:15:04] Unknown:
This is only my view; probably other people in the project don't share it. I don't think it makes it easier. I mean, maybe it does in some way. Like, for instance, the generation of the AST. We used to produce this concrete syntax tree, which we don't produce anymore, as Lysandros was mentioning before. But the thing that was translating that tree to the actual final one, the AST, was an insanely big file. It's called Python/ast.c in the CPython source. And that was, I don't remember exactly, but it was multiple thousands of lines of code long. And that was insane, because it needed to reason about the number of nodes, because it's very tightly coupled with the shape of the CST. So there were for loops reasoning about things like, oh, I have 3 nodes, then it's an if statement; if I have 6 nodes, then it's a for statement with an else, maybe. And that was very difficult to understand. So in that regard, our parser may be much easier, because from the grammar it goes directly to the AST. Those are much closer. Right? So a contributor that already understands something about grammars and parsers may certainly find it much easier to modify the grammar. But it still is complex, right? Because you need to understand several things about how parsers work in particular. You need to understand how PEG parsers work, which is new. Right? And unfortunately, one of the downsides of PEG parsing is that when it works, it's very easy to make it work, but when it fails, it's quite difficult. You need to understand very well what's going on, because PEG parsers are very well known for this kind of surprise. Like, sometimes you put something in, and then you don't understand why the thing is not parsing.
We can go into examples; maybe Lysandros can give some of the ones that we found. I think that the net gain is probably zero. Well, maybe let's say it's positive, you know, to stay positive in life, but it's very close to zero, because we gained the removal of this complexity of manually handwritten stuff. So now we have automatically generated C code, which is great. But I think we lost a bit on the other side. Well, maybe not. Maybe the LL(1) parser, even if it looks simple, is also complex in the sense that you need to understand what an LL(1) violation is to know why a rule that you've written doesn't work. So I don't know, maybe it's a positive change, but I don't think it's a great one, or something so distinctive that we can say, oh, now with this parser a contributor can jump in without any knowledge of grammars or of Python and make a change. That, unfortunately, is probably still false.
[00:17:28] Unknown:
Right. And implementing a language feature does not only involve the parser, and it's very important to know that in order to implement a new feature in Python, you have to go through the parser and then the bytecode compiler and then the bytecode interpretation. So in order for us to say that we've made it easier for someone to implement a new feature, we'd have to have an easier way to do all those things, but we actually don't. We've made it, and that's arguable, but I think we've made it a bit easier to tweak the parser.
But we've done nothing about the other compilation parts and the other things that go into implementing something new. So, yeah, someone might find it easier to work on the parser, but only that. Maybe the parser generator as well, which, the old one, I found actually impossible to work with. But,
[00:18:27] Unknown:
yeah, I think that's pretty much it. I think, now that Lysandros has mentioned this, there is a good point where it may be easier, which is that before, if you got a syntax error on some construct that you were writing, knowing where that syntax error was coming from was not trivial, in the sense that, because of the LL(1) limitation, many of the syntax errors, for instance, were raised much later. Like, the LL(1) parser was basically parsing constructs that are not valid Python, because it couldn't make the distinction in an LL(1) way. And while we were translating the CST to the abstract syntax tree, which is after parsing, then we were saying, oh, if the abstract syntax tree that we are generating looks like this, this is actually a syntax error. So we were raising a syntax error there. And sometimes even after the abstract syntax tree was created, in the compiler itself, when the compiler was actually reading that code, the compiler was also raising syntax errors. So, basically, the definition of what Python is was spread between the parser, the AST, and the compiler, which is not great. Right? Because if someone is trying to understand not only the grammar, but also the language itself, they need to look at the whole compiler pipeline. Right? Parser, AST, compiler.
Right now, although there are some small exceptions, almost all syntax errors are raised in the parser, so in the grammar itself. So if you go to the Python grammar right now, the new one, you can immediately see, let's say, 99% of the syntax errors. So you know what is Python and what is not. Like, if someone is now, for instance, creating an implementation of Python, it's enough to go to this grammar, if they understand the PEG formalism. So it's enough to go to this grammar and look at it, and they can, maybe not immediately implement a parser, but they can know what Python is and how it is defined. They don't need to read C code, or they only need to read 1% of the C code, let's say. I mean, these numbers are arbitrary, but there's a very small number of cases. Things like you cannot assign to __debug__: that is a valid assignment, except that the name is one of the reserved ones, and the parser doesn't really know that. I don't think this is one of the ones that we still have, but it used to be. But anyway, that's the kind of thing people would be looking for. So in that sense, it's easier now.
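The `__debug__` case Pablo mentions is easy to see from the REPL: grammatically it looks like any other assignment, so the error comes from a later check rather than from the grammar itself.

```
>>> __debug__ = False
  File "<stdin>", line 1
SyntaxError: cannot assign to __debug__
```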
[00:20:44] Unknown:
And in terms of the overall process of actually making the migration from the LL(1) parser to the PEG parser, because of the fact that this is so core to the actual processing of Python, what was your process for guarding against introducing any regressions or new edge cases, and what were the issues that you encountered in the process of all of this work?
[00:21:15] Unknown:
Basically, the migration to the new parser is a multistep process. If we talk about it at a very high level, then we know that in 3.9 there are both parsers, and the default one is the new one. But you can always switch back to the old one in case you find anything odd and you want the old one to parse your program. And then in 3.10, the new one is going to be the only parser, and the old one will be removed. In the actual development of the new parser, there were many steps and many efforts we made in order to be as sure as possible that it's not going to introduce any regressions. Basically, we first started out with trying to implement all the grammar until we could parse all of Python. And then after that came the need for finding edge cases, in syntax errors, or in syntax that's allowed and should not be, or that's not allowed and should be.
So there were many edge cases. We ran a test suite pretty regularly that tried to parse all of the standard library, with every tweak to the parser generator. We used that test to make sure that the standard library still passed. And then, after we were somewhat sure that we'd completed most of the development and the background work, we implemented some scripts that downloaded basically a whole bunch of PyPI packages and tried to parse those. And then we identified some bugs there, things like syntax errors in f-strings that had some edge cases we hadn't already found.
And so after we did these tests, we were actually pretty confident that most of the syntax works. So most of valid Python, our parser can parse. The thing that's even more difficult is verifying negative input, and by that I mean that it's pretty hard, if you have a valid Python program, to somehow make a sensible change to that syntax and render it invalid Python syntax: to actually take that input, make it somehow invalid, and then test on that in order to see if the parser actually rejects that input. We did do some tests on that, mainly handwritten tests, in order to make sure that the syntax errors are correct, and that where the syntax error points is correct, because the tree structure holds meta information about where the syntax error is.
That was, and still is, something we may need to work harder on.
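A rough sketch of the kind of corpus test Lysandros describes (hypothetical, not the actual CPython test script): walk the installed standard library and report every file the parser rejects.

```python
import ast
import pathlib
import sysconfig

stdlib = pathlib.Path(sysconfig.get_path("stdlib"))
failures = []
for path in stdlib.rglob("*.py"):
    try:
        ast.parse(path.read_text(encoding="utf-8", errors="replace"))
    except SyntaxError as exc:
        failures.append((path, exc))

# Note: a few stdlib files contain intentionally bad syntax for tests,
# so a real harness would keep an allowlist of expected failures.
print(f"{len(failures)} files failed to parse")
```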
[00:24:14] Unknown:
In this regard, there is an interesting fact about why this problem is so hard that many people don't know. It turns out that if you think about it, okay, imagine that you have 2 parsers, in this case the old one and the new one, and you ask: is there any way to prove that these 2 parsers are equivalent? Surprisingly, computer science tells you that this problem is undecidable. It's equivalent to the halting problem. You cannot prove that 2 parsers parse the same things in the same way. And not only can you not prove that 2 grammars are the same, like, if you have 2 grammars, you cannot prove that they parse the same language; this goes so far that even with a single grammar, and it needs to be a context-free grammar, but a lot of them are context-free, you cannot prove that the grammar accepts all strings, which I find insane. That is undecidable. It's a problem that cannot be solved. So it's a very hard problem, because the only way to approach it is basically what Lysandros mentioned: you need a corpus, big enough, of good programs and bad programs, and then you need to test both parsers on it. Every further validation basically works like this. Right? You have some program that you know is bad, and then you test it with the old parser and with the new parser, and then you check: is the rejection the same? Is the error the same? Is the position of the error the same? And so on. The good cases matter the most, because you don't want a valid Python program to be claimed invalid.
So for the good cases, we have the PyPI corpus. We have all the Python packages from people, which, you know, exercise a lot of grammar rules; there are some very weird things going on on PyPI. Right? But for the bad cases, there is no huge repository; people don't publish invalid Python programs. So unfortunately, we couldn't do that there. The standard library, though, has a very good collection of tests exercising syntax errors, you know, strings where it asks, okay, is this invalid, or what sort of invalid is this? We found a bunch of errors by running those tests. So we are pretty confident that we handle negative cases well enough, but you also have this very small, lingering doubt that maybe you are missing something. For instance, without going very far, it turns out that we missed a valid Python case that was not exercised in any of the Python packages that we saw, which involves star unpacking, the same way you can write star-variable comma b equals an iterable, like *a, b = iterable.
So in the same way you can do that, you can also do this star unpacking in with statements, and it turns out that we missed that in the 3.9.0 release. We've fixed it now, so in 3.9.1 you will be able to use star unpacking in with statements. Which is already weird, because you normally use star unpacking when you have an iterable and you don't know how many items it has, and with statements normally don't have 600 context managers, but who knows? That was valid, and unfortunately we missed it. With all these words, what I want to say is that this is a hard problem. There are already discussions on python-dev about how to make this easier, but any automatic way that you can think of is going to be very tricky. We tried a very good and promising tool; there's some work on top of Hypothesis, Hypothesmith I think it's called. It's a fantastic package. It generates random Python programs given a grammar, and we tried to use it. It's really impressive how it works. I mean, the whole Hypothesis project is really impressive.
Unfortunately, our experience, and this doesn't mean that the package is not good, was that it was not good enough for finding actual errors, because the grammar space is infinite. The number of possible valid Python programs is infinite. So exploring an infinite space when the errors are finite, I mean, there could also be infinitely many errors, but hopefully there are not, is very difficult. Right? The probability that you find something wrong is very low.
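For reference, this is, as best I can reconstruct it, the shape of the case Pablo describes: a starred target inside a with statement, valid Python that the first 3.9 release of the PEG parser rejected.

```python
from contextlib import contextmanager

@contextmanager
def cm():
    yield range(5)              # a context manager yielding something unpackable

*a, b = range(5)                # ordinary star unpacking: a == [0, 1, 2, 3]

with cm() as (*head, tail):     # the same starred-target form as a with target
    print(head, tail)           # [0, 1, 2, 3] 4
```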
[00:28:03] Unknown:
Another interesting element of the fact that you're switching from this LL(1) parser to a parser generator using a PEG grammar is whether or not the other implementations of the language, such as PyPy or RustPython or IronPython, would be able to take advantage of that fact and just use the PEG grammar that you've defined and then run their own parser generators to then use within their own runtimes?
[00:28:31] Unknown:
I will pick this one up. So, many Python implementations, and I'm going to talk about the ones I know: I think PyPy, last time I saw, and I could be wrong, is using an LL(1) parser. So if they want to pick this up, I mean, if we don't change the grammar in any way, they don't need to do anything. If we change the grammar in a way that cannot be parsed by an LL(1) parser, and they were using an LL(1) parser, they will need to change. So, for instance, in Python 3.10, if the pattern matching PEP is accepted, that PEP is only possible with the new parser, with a PEG parser. I mean, you could use other parsers, it doesn't necessarily need a PEG parser, but certainly you cannot do it with an LL(1) parser. So if PyPy is using an LL(1) parser, unless they find a way to hack around it, which maybe for this particular construct is possible, they may not be able to keep their LL(1) parser. In general, if PyPy is using an LL(1) parser, they will probably need to change, to PEG or to something different. For instance, I don't know what parser RustPython is using; I think it's using a third-party Rust library for that. So if that third-party Rust library is based on an LL(k) parser, for instance, which is more generic, they may be safe, because you may be able to express these new constructs in LL(k). All these parser names are a bit insane, this combination of letters; it's kind of difficult to pronounce for me. But if they are using something more powerful than LL(1), they may be safe. If not, then there is the other problem, which is that they will probably need to translate the PEG grammar to whatever grammar formalism they have. But that's probably not super difficult to do. In any case, they can also switch to a PEG parser as well. But, yes, this is not only a problem for Python implementations. This may be a problem for tools in the Python ecosystem that were relying on LL(1) parsers, which unfortunately is a bunch of them. For instance, I'm very sad to say this, but Black was using an LL(1) parser, so Black needs to change to something else. So, Łukasz, if you are hearing this, I'm very sorry.
[00:30:19] Unknown:
Yeah. I just wanted to say that one step of the migration, and an important thing, is that in 3.9 there are not going to be any new language features, so that tools that rely on LL(1) parsers can switch to PEG parsers and not have any problems. So in 3.9, no new language features are gonna be there. An LL(1) parser can still parse 3.9, but probably not 3.10.
[00:30:49] Unknown:
So for folks who are in the process of updating their code from 3.8 or 3.7 or what have you up to 3.9, what are some of the possible edge cases or bugs that they might see because of the introduction of the new parser that would make them want to fall back to the old LL(1) parser?
[00:31:07] Unknown:
Hopefully, no errors are gonna be there. So the hope is, of course, that for positive input, that means for valid Python programs, we do not have any bugs. But, yeah, today showed that we actually do have some of them, the thing Pablo mentioned before, that we do have some bugs. Probably most of the edge cases lie around the meta information the tree has. So there might be some cases where the position recorded for a token that was parsed by the parser is wrong. So maybe for multiline strings or stuff like that, a syntax error that appears might point to a different place. But that's actually only for negative input. And I think we've done a good job of verifying that we're gonna be able to parse 99.999% of valid Python programs. There are gonna be some edge cases we've missed, and there are gonna be some bugs we weren't able to discover, but I think it's not going to be that much.
When it comes to syntax errors and meta information about Python parse trees, it's gonna be a different story. I mean, I'm sure we've covered most of the cases, but there are gonna be some edge cases there as well. And because it's so difficult to be sure that you've covered everything, I think there are gonna be some bugs there. But,
[00:32:40] Unknown:
yeah, hopefully they're gonna be reported, and we'll be able to fix those in time. One important thing here is that one story that happened before, with the previous Python, like Python 3.8 and earlier, is that all this meta information that Lysandros was mentioning, unfortunately, even for positive parsing cases, was wrong. This is a well-known thing. It was pretty annoying, because remember, the code that actually injected this meta information is this super gigantic file that now is gone, this ast.c, which was handwritten. And, you know, handwritten code is very likely to include bugs and inconsistencies, and there were plenty of them. I think some of the PyPy folks were working on something that was underlining, in a syntax error, what part of your expression is actually raising the error. So you have function a plus function b, and it tells you which function is at fault in the syntax error. And they found some problems, because the actual meta information that we were reporting was wrong. The good thing now is that, because we are generating the AST automatically, this meta information for positive parsing is correct. We actually found a huge number of these bugs and corrected them; we corrected the ones possible in Python 3.8 as well. But now all this meta information is of much, much higher quality, just because it's not handwritten. So now tools like the one I was mentioning become possible. For negative parsing cases, as Lysandros was mentioning, it is possible that there may be some errors as well. You know, please, if you find any, report them to us. People that, for instance, run or maintain linters and tools that check source code, like Black, Pylint, flake8, all of these, they hit these things more than users will, for certain. So they are already raising some of these to us. But in general, we have a lot of confidence. Right? Because we not only have the information from the standard library for negative cases, which, again, is not as much as for positive ones, but is still a bunch; we also ran the test suites of flake8 and Pylint and a bunch of them. And some of these folks reported some of the errors that they found to us, and we corrected them. So it's possible that there may be some more, and I'm certain there will be. We will fix them as soon as possible, but we are pretty happy, I think, with the state of that. So people shouldn't be afraid that, oh no, now my syntax error is going to be reported in Mordor.
That won't happen. It's fine. There will be good syntax errors.
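The meta information being discussed is visible on any SyntaxError: the exception carries the line, column, and source text that the parser blamed, which is exactly what the old handwritten translation used to get subtly wrong.

```python
try:
    compile("x = (1 +", "<example>", "exec")
except SyntaxError as err:
    print(err.lineno, err.offset, err.text)
```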
[00:35:05] Unknown:
Of course. Of course. And it's pretty important to note that all these bugs we found during development of the PEG parser, in 3.8 and previous versions, were never discovered by users. So it's not going to be that someone finds a new syntax error out of the blue. That's not gonna be the case. Yeah. As Pablo said, we're pretty happy with our error reporting, and we hope that it's gonna be even better in the future.
[00:35:32] Unknown:
Yeah. I'm definitely excited to see improved syntax errors that do a better job of pointing specifically to the point of the program that threw the error, versus what you were saying before, where it has to go through 2 additional layers of processing before it then has to try and backtrack to see where it went wrong with the parsing. And the other point that you made, about tools that tie directly into the AST, I'm wondering if the removal of that ast.c is going to simplify their work for staying up to date, most notably thinking of things like Pylint and Astroid, where there's usually a little bit of a lag between when a new release of Python goes out and when they're able to lint that code, because of changes in how the AST is generated and the changes within the AST itself, or any other projects like Hy, which is a Lisp that compiles to the Python AST, for instance?
[00:36:25] Unknown:
That is kind of a different problem, in the sense that we expose the AST, which, as you know, all these tools use. Like, we have a way for these tools to obtain the AST directly in Python form. But as you can imagine, precisely because the grammar changes, and also because we find new optimizations, the output of the ast.parse function, which is the one that parses to the AST, may be different between minor versions. That is the main reason there is always a lag between a new release of Python and these tools releasing new versions with support for it: sometimes we need to change the AST. That is independent of the parser technology; it's just that when we add new grammar, obviously there may be some new AST nodes, but sometimes we also change existing nodes. It used to be that we had different nodes for numbers and strings, and now we have a single unifying node called Constant. So they need to adapt to that, unfortunately. So that's one thing. The other thing is that, normally, unfortunately for these maintainers, it was true that they needed to read this horrible file, as you mentioned. But in general, we try to do our best to document these changes in the What's New document for a new release. So for 3.9, for instance, you go to What's New, and any change to the AST will be documented there. So in reality, in a fantastic world, all that the maintainers of these tools need to do is go to the What's New, look at the section on porting to Python 3.9 or something like that, and there will be all the things that they need to change. I mean, I'm certain maintainers listening to me will be reaching for pitchforks and torches, because they probably will need to do something different, and we are sometimes inconsistent, but we try to do our best. So we document this there. Right? Unfortunately, even with the new PEG parser, there will be some delay on this, maybe even bigger if we introduce new grammar rules. For instance, PyPy, even nowadays, is not able to parse the walrus operator, because it changed fundamental assumptions in their handwritten parser, particularly that some nodes were final and couldn't take assignments and things like that.
Unfortunately, for instance, in 3.8, PyPy is able to use the positional-only arguments syntax that I introduced, but not the walrus operator, because the walrus operator was a huge undertaking all over the parser. So, unfortunately, these folks will need to adapt in some ways. Hopefully, if they need to find something weird now, there is a single file they need to go to, which is the grammar file, which is what describes the AST now, and which I think is an advantage. But I don't expect any super quality-of-life improvement in that regard, unfortunately.
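The node unification Pablo refers to is easy to observe, and it's the kind of change that forces tools like Astroid to adapt between versions: numbers and strings used to parse to distinct Num and Str nodes, while modern Python produces Constant for both (output shown roughly as on 3.9).

```python
import ast

print(ast.dump(ast.parse("42", mode="eval").body))    # Constant(value=42)
print(ast.dump(ast.parse("'hi'", mode="eval").body))  # Constant(value='hi')
```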
[00:38:51] Unknown:
As we've discussed already, because of the fact that we do have this new parser, with a more expressive grammar available to it, it makes it possible or easier to introduce new syntax that was too cumbersome or too impractical with the LL(1) parser. I'm wondering what the scope of those new capabilities is, and if there are any specific features or PEPs that are planned for implementation in the 3.10 release that rely on this new parser.
[00:39:22] Unknown:
We all know about pattern matching; the new PEPs about pattern matching. They rely on match being a soft keyword, which means that match is going to be a keyword only in the context of a match statement, so that code with match as a variable name does not break. And that's only feasible because of the new parser, where we've implemented this feature of soft keywords, which can handle soft keywords pretty easily. It was not that hard to do. I think, and Pablo can correct me if I'm wrong, that in the old parser it was not possible to have soft keywords like that in such an easy way. We used to have async and await being soft keywords, but there were many hacks around things like that. And now it's pretty easy to implement those, and pretty readable for someone to skim through the code and find out why this works.
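As a concrete illustration of what a soft keyword means in practice (pattern matching ultimately landed in Python 3.10): the same word can be an ordinary name or a keyword, depending on context.

```python
match = ["a", "b"]     # fine: here match is just a variable name

match match:           # here the parser treats match as a keyword
    case [first, *rest]:
        print(first, rest)   # prints: a ['b']
```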
Something other than the match statement, which is not accepted yet, by the way...
[00:40:25] Unknown:
The parenthesized with statements! Don't forget about those. That's the whole deal. Who cares about match statements? The important thing is those parentheses. I'm joking. I mean, that is obviously a small thing, but as you can imagine, I'm very excited. One of the things I want to add to what Lysandros said is that, so, it's true, it was not easy to implement this soft keywords feature in the old parser. I mean, it was kind of possible, because, as Lysandros mentioned, async and await were in a way soft keywords, but the way we hacked around that is that we made async and await their own tokens. So, normally, when you have a word or a keyword, the tokenizer, which is the first phase of parsing, this tokenization, basically grabs your source code as a bunch of bytes, and then it creates these items called tokens, which are basically words, but with some information. So, for instance, it says, okay, this variable called blah is a name, or this is a keyword, or this is an operator, like the plus sign, for instance. Right? So, normally, when the tokenizer found a variable name like async or await, either it said it was a keyword, or it said it was something else; in particular, before async and await were anything, it said it was a name, because any word that isn't a keyword is tokenized as a name. But for async and await, the trick was saying, okay, async and await are now going to be these new special tokens called ASYNC and AWAIT. And that was then fed to the parser, and the parser, you know, could distinguish whether async and await were used in the context of a keyword or not.
Even if you set aside whether it's a hack or not, the main problem is that your tokenizer should be independent of your grammar. I mean, it's not only quality of life; that is what all of parsing theory gives you. There is some super complex parsing technique called tokenizer feedback, in which the tokenizer asks the parser what kind of token this may be, and the parser can answer with different kinds of tokens depending on the context. That is a thing; you certainly learn that if something is possible to do, someone will do it. But in general, in normal parsing, the tokenizer doesn't know anything about your grammar; you can actually have different grammars for the same tokenizer. And then the actual thing that gives meaning to the tokens is the parser. The parser, with the grammar, is the one that tells you, okay, this word is now an assignment, or is a keyword for a for loop, or something like that. Right? So it was a bit of a hack, because the tokenizer was cheating, and it knew that async and await are something special. So it was saying, you take care of this, right? I don't know what to do with this; it's async, I don't know what to do. And then the parser was saying, okay, you know, this is async, and it's in this context. And this was 3.7, right? Because async and await were soft keywords as well at the time when they were introduced. Actually, I don't know if it's 3.7 or 3.5. Well, anyway, the first time async and await were introduced, I think it's 3.5, so let's say it's 3.5, you could assign to variables called async and await. And this was to not break everyone on the first try, giving them a release to adapt.
Right? And then in, let's say, 3.6, I may have these dates a bit wiggled around, we made them strong keywords again. So you cannot assign to async and await anymore. Right? It turns out that mypy still needs the soft keyword technology to parse old Python code. So there is a way to make them soft keywords again, but this, as you see, is all hacks built on top of hacks. The main principle is that you don't want your tokenizer to know about these things, and that's the way it had to be done before 3.9; with the new parser, the tokenizer doesn't need to know.
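You can watch the tokenizer stage Pablo describes from Python itself, using the stdlib tokenize module; note how it labels words and operators without knowing anything about the grammar around them.

```python
import io
import tokenize

code = "async def f(): await g()"
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
```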
[00:43:54] Unknown:
And another interesting point to what you were saying about parentheses, but going in the opposite direction, is the idea that Guido briefly floated, reinstating print as a statement and not just a function call.
[00:44:06] Unknown:
Yeah. And that is not only print; basically, it was now pretty easy to implement a grammar for Python where every function could be called without parentheses. So that could be done in the new parser relatively easily as well, because the PEG parser syntax is actually pretty strong and pretty powerful, and you can do hacks and things like that. So, yeah, when Guido back then proposed bringing back print as a statement, he also showcased some code which actually implemented every function call being a statement as well. So, yeah, that was pretty cool, of course. It didn't make it into a PEP, but it's pretty cool to see that the new parser is so powerful.
[00:44:55] Unknown:
Right. I mean, depending on whether you like the idea or not, you may argue that it's better or worse. I will not state my opinion here, but I think it's a good testament to how powerful this new technology, this parser, is. I mean, again, it's not the most powerful parser in the world. I'm sure that, you know, parser experts are saying, like, you chose a PEG parser, you are so stupid, or whatever. But, look, we have something that we are comfortable with, because it's easy to understand and it's easy to generate. Someone that understands the grammar can run the generator, and they don't need to understand how the parser is generated or all the little details of the parser. Right? As long as they understand the grammar, they can keep evolving Python. So, going back to one of the previous questions, now I realize that that actually does make contributing a bit easier. Right? But the important fact is these examples of how many things you can do now that were very difficult or impossible to do before. So I think, even if you don't like this particular idea of suppressing parentheses, it's a good statement on how healthy
and, you know, versatile the new parser is.
and, you know, versatile is the new parser. In the work of bringing in this new parser and figuring out how to integrate it into the runtime and what new capabilities it will enable, I'm wondering what you have found to be some of the most interesting or unexpected or challenging aspects of that work.
[00:46:14] Unknown:
Yeah. The thing we've mentioned loads of times until now is that there were many edge cases when it comes to metadata about the tree. And I think that was the most challenging part: making sure that everything is correct, and that the parser truly parses what it needs to parse and stores the right metadata. Yeah. I think that was the most challenging part, because after we were done with developing the grammar itself, we spent about 2 or 3 months, I think it was, fixing bugs and finding such edge cases and fixing those. So it wasn't an easy job. I think that was the most challenging part. A very interesting aspect of developing the parser was trying to make it as fast as possible, and maybe a bit faster than the old parser.
We tried some things out there. We also tried a custom virtual machine for the parser at one time, but that didn't work out. We tried out many different variations of rule reorderings, because PEG has a prioritized choice operator. That means every alternative gets tried out in the order it's written. You can reorder those rules and make them go faster or slower for a specific input. That means, for example, that if it's more probable that a name will follow an equals sign, then you've gotta put the name alternative first and follow that path until you fail, because that's the most probable one. And that means that for most programs, you're gonna be faster. And I think that was the really interesting part of all our work.
We spent lots of time on it and it was pretty fun to work on that.
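A toy illustration of the prioritized choice reordering Lysandros describes (hypothetical rules, not CPython's generated code): alternatives are tried strictly in order and the first success wins, so listing the most likely alternative first avoids wasted attempts on typical input.

```python
# Stand-in alternative parsers for the sketch: each returns the token on
# success, or None on failure.
parse_name   = lambda t: t if t.isidentifier() else None
parse_number = lambda t: t if t.isdigit() else None
parse_string = lambda t: t if t.startswith(("'", '"')) else None

def parse_value(tok):
    # value <- NAME / NUMBER / STRING   (most common alternative first)
    for alternative in (parse_name, parse_number, parse_string):
        result = alternative(tok)
        if result is not None:   # first success wins; later ones never run
            return result
    return None                  # every alternative failed

print(parse_value("x"), parse_value("42"), parse_value("'hi'"))
```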
[00:48:10] Unknown:
For me, I think one of the worst things that we needed to deal with is that, because CPython is written in C, there's the problem that, in the parser, you want to put together a bunch of nodes that are of different types, and the container that we had was doing type erasure. So, basically, we held an array of void pointers, and those void pointers were actually pointing to different things. Ideally, they are what you expect them to be, but there were times, while we were developing the parser, when we were returning the wrong thing, or there were some unexpected bugs. The problem is that the bugs related to this were manifesting super far away from the source, even in the compiler itself. Right? Like, you generate some AST, but the AST is corrupted, because something that you said was a node of type expression turns out to be a node of type statement, or something like that. And then it's the compiler that grabs that memory and says, I don't know what to do with this, and it segfaults. Right? This problem gave us so many headaches. It's horrible. I've been working recently, or we've, sorry, we've been working recently, on adding type information to this, to do proper typing and some generics in C99, which is not elegant, unfortunately, because the language is not prepared to do generics, but we did our best. Now, if someone is modifying the parser, they probably won't need to deal with this problem anymore, because the compiler will say, no, no, no, you are returning this type here, and now you are adding it to this other thing; that's not possible. So instead of having to deal with super spooky bugs at a distance, now the compiler will raise proper compiler errors, which is much better. Right? But at the time, I think that was the thing haunting me at night: oh man, these bugs are insane.
[00:49:54] Unknown:
As you continue to work on CPython and help to improve the overall language and bring in new capabilities, are there any other aspects of the way that it's implemented that you think should be reconsidered or redone in light of the changes in the capabilities of compute and the hardware that we have and the ways that the language itself is being used?
[00:50:15] Unknown:
I'm going to answer this. This is a very com I mean, it's a very good question. It's also very complex. Before saying what I think about this, I will want to say that in CPython, at least, it's very difficult to do any substantial changes. And this is a truth that many people don't realize when they propose newer rules or when they complain about things. Right? Because several of the stuff. Right? 1 of the problems is that we have a huge surface for a CAPI. So the CAPI is a way that you have to implement c extension modules done in c or other languages that have ABI compatibility with c, and basically interface with inter this is things like NumPy. Right? Like NumPy is embedded in a mix of c and Fortran, and it interacts with Python using the CAPI. Right? And this CAPI has a huge amount of function. So and we use it internally. It's not something that we only expose. It's something that is used internally in CPython. So sometimes we want to make some of these CAPI calls a bit better, but it turns out that we cannot because it will break these c extensions. For instance, there was this attempt to do something like pi pi, which is that you have a list of just integers.
So instead of having Python integers inside, which is not the best representation for it, because a Python integer is a struct that has an object header and a reference count and a bunch of other things, and somewhere inside that big struct is the actual C integer. So one optimization was: okay, if the list contains only integers, instead of storing Python objects in that list, we could store the integers themselves and represent the list as a plain array. And when someone asks for a specific element, we box it, which means we create the Python object on the fly and give it to the user. Right? And if the user then appends something that is not an integer, we box all the integers, creating Python objects, and we transform the optimized list back into a normal list. So that is a cool optimization, because doing things like adding all the integers together or iterating over the list becomes much, much faster. Well, it turns out that we cannot do it, because there are many C API functions that will retain pointers directly to the data in the list, unfortunately. This means that this kind of weird optimized list is incompatible with some of the C API calls. So we cannot do that. And the same way we cannot do this with lists, we cannot do it with many, many other things.

Think about this, right? We talked before about the AST. Even when we change the AST, which technically we are allowed to do, because if you change the grammar you need to change the AST, all the people that are building tools like flake8 and Black and all that stuff start complaining and saying, oh, you're breaking this and that. And again, I'm not trying to trivialize this. We like our users and we do the best that we can for them. But for us it's a nightmare, because as core developers we are trying to provide a better experience for users and make the Python interpreter faster. Speed is something that everybody wants, independently of who you are. And, unfortunately, doing that in a way where we don't break anything is hugely complicated, whether it's for the AST, for the C API, or even changing the compiler.

Once I tried to fix a super weird bug in the compiler. The bug was that if you have some conditional that is always false, like `if False:` followed by something, the compiler was basically stripping that block out. It was saying: this is never going to be executed, I don't need to generate code for it. But it turned out that if inside that `if` you had a syntax error, the syntax error was not being raised, which is an obscure corner case, and, I mean, who cares? But it's technically a bug, right? So I fixed it. But then the compiler was actually visiting those blocks, and coverage tools started reporting that those blocks existed. And because coverage could now see those blocks, and the blocks were never going to be executed by a Python program, since the condition is always false, the total coverage of your program came out slightly lower, because now there is this block that wasn't there before. And the coverage people, you know, complained and said: this is unacceptable. Right?
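A minimal sketch of the dead-code elimination Pablo mentions (exact behavior can vary between CPython versions): the always-false branch simply produces no bytecode, which is why making the compiler visit those blocks was observable from the outside.

```python
import dis

def f():
    if False:            # always false: the compiler emits no bytecode
        unreachable = 1  # for this block, so it never appears below
    return 2

dis.dis(f)  # the disassembly shows only the code for `return 2`
```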
And, I mean, without trivializing the error, I understand why it was bad, and we fixed it, because we care about coverage and our users. I'm not trying to trivialize this, but I wanted to say that, from our point of view, even changing the compiler, which is something users don't interact with directly, is super tricky to do in a way that doesn't break people or behave differently for some packages and things like that. Right? So for the coverage case in particular, we found a more elegant way to solve it, and we solved it in a way that fixed the problem we were trying to fix and made coverage happy. So that ended up very well. Thank you, coverage people, for raising this with us. But, unfortunately, it's super difficult. So I can go into what I would like to change, but it's a very difficult thing to do as a core developer, unfortunately. And I think that people don't normally realize how complex it is to make improvements in ways that don't break anyone. Like, if we have a faster Python but NumPy doesn't work, who is really going to be happy about that? Right?
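To make the unboxed-list idea from a moment ago concrete, here is a hypothetical Python sketch of the strategy (CPython would do this in C, and the names here are invented): keep raw machine integers while the list stays homogeneous, box on access, and fall back to ordinary objects as soon as a non-integer arrives.

```python
from array import array

class IntOptimizedList:
    """Stores unboxed 64-bit integers while homogeneous, then falls back."""

    def __init__(self):
        self._ints = array("q")  # raw C longs, no per-item Python object
        self._objects = None     # populated only after the fallback

    def append(self, value):
        if self._objects is None:
            # A real version would also handle bools and integers too
            # large for 64 bits; this sketch keeps the check simple.
            if type(value) is int:
                self._ints.append(value)  # stay in the fast representation
                return
            # A non-integer arrived: box every element and switch over.
            self._objects = list(self._ints)
        self._objects.append(value)

    def __getitem__(self, index):
        if self._objects is None:
            return self._ints[index]  # boxing happens here, on demand
        return self._objects[index]
```

The catch Pablo describes is that several C API functions hand out pointers directly into a list's internal array of object pointers, and this optimized representation has no such array to point into.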
[00:55:12] Unknown:
Lysandros, do you have anything to add?
[00:55:15] Unknown:
Yeah, I totally agree with Pablo. It's a thing that most people don't understand, that it's pretty difficult to come up with a change, then implement it, and not break people's code. And that's a thing that surely makes it much, much more difficult to implement new features, to change the language, or to change the grammar and such. When we're working on Python core, backwards compatibility is always a major consideration, especially after the 2-to-3 transition. It's something we always consider, and it makes changing the language much, much more difficult and cumbersome. In terms of what I'd like to see changed, right now I'm thinking of working on two main projects. The first one is f-string parsing, and I think that's the reasonable next step with this new parser we have now.
It could improve how f-strings are parsed, because f-strings also need to be parsed: besides the literal parts, they have the expressions, they have the format specifiers. Currently, they're parsed by a handwritten parser. We've had discussions about moving f-string parsing into the new PEG parser and how that could make things even more maintainable and readable. And another thing we've had discussions about is better error reporting: how the PEG grammar enables that, how we could further customize our error messages and display nicer information to the user when they have a syntax error, and how we could make that experience better for users.
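A quick way to see why f-strings need real parsing: even a short literal mixes plain text, an embedded expression, and a format specifier, and all of that structure shows up in the AST.

```python
import ast

# One f-string, three kinds of pieces: literal text, an embedded
# expression (price * count), and a format specifier (>10.2f).
tree = ast.parse('f"total: {price * count:>10.2f}"', mode="eval")
print(ast.dump(tree))
# The body is a JoinedStr holding a Constant ("total: ") and a
# FormattedValue whose value is the BinOp for `price * count`.
```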
[00:56:59] Unknown:
I realized I didn't say what I like to change. I will do it very quickly, so I still don't don't talk a lot. But, 1 of the things I I wanted to this is something that other callers were interested on, is to move CPython from a reference count language into something that uses a different garbage collector. So our garbage collector's strategy right now is a mixture of preference counting and sickle cycle hunting. And ideally, the pro 1 of the problems that we have is doing things like removing the gill or adding some specific, you know, improvements in in multithreading is very difficult when you have reference counting. So ideally, 1 of the things I think is is for CPython in particular will be important is to try to move I mean, this probably will break a bunch of things, but so doing this in our way, that doesn't break is going to be the difficult part. But moving this into a different kind of garbage collector, probably not moving, not compacting garbage collector or something like that, Something that I really look forward for, you know, unlocking new implementations, things like removing the guilt for instance, but many many other. That is probably very ambitious. The 1 that I really, really look forward and we can certainly do it, I think, without breaking people, is to change how the Python compiler is handling signals. So, unfortunately, right now, the way we handle signals may some Python. This is surprising. I mean, I will understand myself here because we don't want to be here all day. But 1 of the things that people don't realize is that technically with a, with a, you know, with a statement, if an exception is being raised, you know, your exit function is being executed. Right? That's the purpose of the with a statement.
It turns out that if a signal arrives at a super specific moment, and it's super rare, but it can happen, then your context manager's exit won't be called. Right? And many other things can happen: if a signal arrives at very specific points in the evaluation loop, very weird things can happen. Nathaniel Smith has a very good blog post explaining how this is a super big challenge and how it affects Trio, for instance, but it shows up in many different ways. And this is because there is no way for the compiler to say: between this bytecode and this bytecode, you cannot handle signals, because it's not safe. Right now, signals are handled at intervals between bytecodes, between every possible pair of bytecodes, but there are combinations of bytecodes where we need to say: not here. We cannot mark regions and say: please, between this point and this point, don't handle signals, because it's not safe. Things like try/finally, or context managers, or anything that involves unwinding the stack or raising exceptions, those are very, very tricky. And that requires some changes to how the compiler works and how it emits bytecode. So I really look forward to working on this in the future. I mean, I have other projects, as Lysandros mentioned, specifically creating better syntax errors, which is probably what I'm going to focus on with Lysandros next. But this is something I would like to collaborate on, for instance with Mark Shannon or other people, because it's something that's been bothering me, and it involves some changes to the compiler that I think are not insane to do, and it would be a good improvement. But, yeah, probably not the most exciting change for people. People may expect some other crazy things, but I'm a simple person.
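A minimal illustration of the window Pablo describes; the bytecode steps in the comments are a conceptual simplification, not the exact opcode sequence of any particular CPython version:

```python
class Resource:
    def __enter__(self):
        print("acquired")  # the side effect has already happened...
        return self

    def __exit__(self, *exc_info):
        print("released")  # ...but this can be skipped if a signal lands
        return False       # in exactly the wrong window

# Conceptually, `with Resource(): ...` runs in several bytecode steps:
#   1. call Resource() and its __enter__   <- the resource is now held
#   2. register the try/finally machinery
#   3. run the body
#   4. call __exit__
# A KeyboardInterrupt delivered between steps 1 and 2 unwinds the frame
# without ever reaching step 4, so the resource is never released.
with Resource():
    pass
```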
[01:00:06] Unknown:
Yeah, it's definitely interesting to hear the perspective of folks who are hands-on with the CPython runtime. As somebody who is working on the actual runtime and keeping it maintained and up to date, there's a whole different set of priorities, and so it's always interesting to get that insight.
[01:00:32] Unknown:
We try to keep our users happy. Right? People need to understand that we are humans. We are human beings, and we do this in our free time. I mean, yeah, people know this drill, but it's true. We are human beings; we have feelings and all of that. And it's not only that: we have many different kinds of users. Many people say, ah, the core developers, they are in their own world, and they don't even know what users want. Well, it turns out that there are lots of users. Right? And some users want one thing, other users want something else, and no one thinks that they are wrong. And, unfortunately, the things that we need to care about sometimes look different, or sometimes we try to do things that one set of users wants, and the other set of users says: ah, this is horrible, why are you doing this? You know, there are people saying: why don't you make Python faster? I mean, I work daily on making Python faster; I'm one of the people that has dedicated the most time to this, but it's so hard. And, you know, "faster or new features" is a false dichotomy, or dichotomy, I don't know how that is pronounced, because you can have both, or none, depending on core developer time. Right? So it's a complex thing, but I just want to say that we listen to everyone, or we try to listen to everyone. And even if you think we are off in our own core developer world or whatever, we are people just like you, contributors doing this in our free time, and we try to understand what the users want, and we try to work towards that.
It's not only about the things that we are interested in. But, you know, sometimes it's very difficult to explain that these kinds of changes are needed before those other changes can happen. Or sometimes it's difficult to say: if you want this, then you are not going to have this other thing. Like people who say: I want more performance. Many people don't realize: okay, then you won't have NumPy for a while. And then they say: no, I want NumPy, but I also want performance. And it's very difficult to say: well, that is not possible, at least not in an easy way. Right? So, please, be good citizens. We are just people like you, and we try to listen to you. We are friendly people; you can come to us, and we will do our best, or at least Lysandros and myself, I know for a fact, will.
[01:02:40] Unknown:
Are there any other aspects of the work that you're doing on the PEG parser and some of the new capabilities that it allows for, or your other work on the CPython runtime, that we didn't discuss yet that you'd like to cover before we close out the show? Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the annual Python developer survey. It's going on right now at the time of recording, and hopefully it's still going on by the time this episode goes out in about a week. So I definitely recommend contributing to that so that we can get as high-quality data as possible about the overall Python community and the ways that the language is being used. My other pick this week is the Jessica Jones TV show. I started watching that recently on Netflix and have been enjoying the story. So if you're looking for something new to watch, I can recommend that. And with that, I'll pass it to you, Pablo. Do you have any picks this week? One show that I've been enjoying a lot is called Raised by Wolves.
[01:03:40] Unknown:
It's a TV series. It's a bit difficult to keep track of whether it's a Netflix or Amazon series or on HBO; I think it's HBO, but I could be wrong. Raised by Wolves. A very good thing about it is that it's directed by Ridley Scott, if I'm not mistaken. I'm bad with names, so I'm very sorry if I'm mistaken here, but I think it's Ridley Scott. And although it mirrors a lot of the Alien world, I think it's very interesting. It's very intriguing. It's sci-fi, so if you are into that, I think you will like it. So, yeah, 100% recommend. The Pablo seal of approval.
[01:04:08] Unknown:
And, Lysandros, do you have any picks this week?
[01:04:11] Unknown:
Actually, I very recently watched After Life on Netflix, and I think it's a great show to watch and just think about things. It's the only show I've watched recently where you can laugh and cry at the same time. I think it's great, and I'd definitely recommend it. It's written and directed by Ricky Gervais. So, yeah, definitely recommend that.
[01:04:42] Unknown:
Well, thank you both again for taking the time today to join me and discuss your work on bringing in this new parser capability to the CPython runtime. I appreciate all the time and effort you've put into that, and I look forward to taking advantage of the new capabilities that that brings to the language. So thank you both again, and I hope you have a good rest of your day. Thank you very much for inviting us. Been a pleasure to be here. Thanks. Thank you for listening. Don't forget to check out our other show, The Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Guests and Their Backgrounds
Overview of the Parser in CPython
Limitations of the Old Parser
Why PEG Parsers Are Now Feasible
Opportunities for New Contributors
Migration Process to the New Parser
Impact on Other Python Implementations
Potential Edge Cases and Bugs
Impact on Tools Tied to the AST
New Capabilities Enabled by the PEG Parser
Challenges and Interesting Aspects of the Work
Future Changes and Improvements in CPython
Closing Remarks and Picks