Summary
When you’re writing Python code and your editor offers a suggestion, where does that suggestion come from? The most likely answer is Jedi! This week David Halter explains the history of how the Jedi autocompletion library was created, how it works under the hood, and where he plans on taking it.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing David Halter about Jedi, an awesome autocompletion and static analysis library for Python
Interview
- Introductions
- How did you get introduced to Python?
- Can you explain what Jedi is and what problem you were trying to solve when you created it?
- What is the story behind the name?
- While reading through the documentation I noticed that there is alpha support for linting with Jedi. Can you compare the linting approach and capabilities with those found in other tools such as pylint and flake8?
- What does the internal architecture and design look like?
- From the research that I did for the show it seems that, rather than using the AST to determine the structure of the code being completed, you built your own parser and your own recursive evaluation for determining accurate completions?
- What was lacking in existing parsers that led you to build your own?
- What are some of the difficulties that you have encountered building and maintaining the grammar definitions and higher level API for parsing multiple versions of Python, including the 2 vs 3 split?
- What are some of the biggest challenges associated with introspecting user code?
- What are some of the ways that Jedi can be confounded by a user’s project?
- What are some of the most difficult technical hurdles that you have been faced with while building Jedi?
- What are some unusual or unexpected uses of Jedi that you have seen?
- What do you have planned for the future of Jedi?
Keep In Touch
- davidhalter on GitHub
- @jedidjah_ch on Twitter
Picks
Links
- Cloudscale.ch
- Vim
- YouCompleteMe
- Neocomplete
- pyflakes
- pycodestyle
- pylint
- Parser Generator
- Parser Error Recovery
- lib2to3
- Python grammar file
- Finite state automata
- Type inference
- yapf
- AST module
- MyPy
- IPython
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
[00:00:14] Unknown:
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music. Tell your friends and coworkers and share it on social media. Your host as usual is Tobias Macey. And today I'm interviewing David Halter about Jedi, an awesome auto completion and static analysis library for Python. So Dave, could you please introduce yourself?
[00:01:03] Unknown:
Well, I'm Dave. I do some open source work, mostly on auto completion stuff. You might know me from the Jedi library. I also work for a company called cloudscale.ch. We are basically doing cloud stuff like DigitalOcean, but for Switzerland, so that kind of direction, if you're ever interested in good cloud hosting in Switzerland where privacy is still somewhat intact. I like doing all kinds of creative things, where I think programming is one of the things that I really do a lot, but also very many other things.
[00:02:02] Unknown:
And do you remember how you first got introduced to Python?
[00:02:05] Unknown:
Oh, yeah. That's kind of funny because it started out with a boss who was looking for a better language than PHP. I wrote like a thousand lines of regex in PHP, which makes no sense at all. So that's kind of how it started, and then he forced it on me. Well, yeah, kind of, but it was a good choice.
[00:02:34] Unknown:
And so can you start by explaining what the Jedi project is and the problem that you were trying to solve when you first created it?
[00:02:44] Unknown:
So how it started was very simple. When I started working in Python, I didn't really use an IDE. I used Vim, and for Vim there wasn't really good auto completion. And I think there's still not a lot else in the Python world other than Jedi and PyCharm. And so I just thought, well, why wouldn't I kind of improve that state of the art? But at the same time, I was very new to Python, and I also think that I wasn't the best programmer back then. But I just started and it kind of grew.
[00:03:26] Unknown:
Yeah. My understanding is that at least for Python auto completion in Vim, there's the YouCompleteMe project and neocomplete, both of which are decent, but I definitely think Jedi is far and above the best one in terms of Python completion for the Vim editor, as well as many others. And when you were creating the project, how did you settle on the name of Jedi?
[00:03:50] Unknown:
Well, I think before we clear up the name, one thing first is that if you're using YouCompleteMe, you're using Jedi below it. Like, if you're using it for Python auto completion; if you're using it for C++, there's other stuff that does it behind the scenes. But YouCompleteMe is not a completion library. So one of the confusions a lot of people have around Jedi is that there are now two projects that are maintained by me, or mostly by me. One is called Jedi and one is called jedi-vim, and jedi-vim is the plug in for Vim. Like, there's Jedi and other stuff.
And Jedi is the auto completion library. And the auto completion library is used for a lot of different plugins and editors. Yeah, for a lot of different editors and IDEs. Okay. So the name itself: I think my idea was just that my name should be in there, and Jedi is part of my second name. And so that was kind of a fun play with a name. Now I'm a bit afraid about trademarks, I guess, because it might not be the best choice after all, even though it's a fun name.
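To make the library/plugin distinction concrete, here is a minimal sketch of calling the Jedi library directly, outside any editor. It assumes the older Script(source, line, column, path) form from around the time of this episode; newer releases moved to Script(code).complete(line, column).

```python
import jedi

source = "import json\njson.lo"

# older Jedi API: Script(source, line, column, path); newer versions use
# Script(code, path=...).complete(line, column) instead
script = jedi.Script(source, 2, len("json.lo"), "example.py")
for completion in script.completions():
    print(completion.name)   # e.g. load, loads
```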
[00:05:19] Unknown:
Yeah. I think Jedi has become so colloquial that it basically exists as its own word in the language, you know, independent of its original incarnation with the Star Wars series. And so when I was reading through the documentation, I noticed that there's also alpha support for linting capabilities within Jedi. And I'm wondering if you could compare and contrast the approach that you take for linting and the capabilities that it has in contrast to other tools such as Pylint or Flake8 that people might be familiar with?
[00:05:49] Unknown:
So I think the alpha support is really something that started out a few years ago, and I've kind of given up on it for the near future. I still think it has tremendous potential for the faraway future, but not for now. The idea is basically that if you compare it to the pyflakes project (Flake8 is the combination, but pyflakes is the relevant piece), pyflakes is doing some stuff really, really well and is really, really fast, but it doesn't really find all the bugs it potentially could. So if you're cross referencing from one module to another, pyflakes will not find that. And so the time came where I kind of started realizing that if you're writing a lot of type inference, you can also combine that and start doing static analysis with it, and all of the other stuff that is kind of advanced IDE stuff, like refactoring.
Or also, there's a library that was called pep8 and is now called pycodestyle. And it's basically just printing out issues that you have in your code that are not programming mistakes but style mistakes, like stuff that doesn't conform with PEP 8. And that is also something that I'm now working on, but that might not even be part of Jedi, but be part of another project called Parso.
[00:07:37] Unknown:
When I was reading through the documentation, I did notice that, as you said, you have your own parser that you're using for deconstructing the Python syntax rather than using the AST module built into the Python language. So I'm wondering if you can explain a bit about why you found the need to write your own parser rather than using the built in AST, and some of the benefits and trade-offs that has enabled?
[00:08:01] Unknown:
I didn't actually set out to write my own parser. Like, in the beginning I kind of did, because the biggest problem you have in auto completion, contrary to kind of static analysis, is you have to deal with broken code. Like, you want to be able to say foo dot and then have a completion, and that is already broken code, because that is not valid Python. In parser theory, I think you call that error recovery. And error recovery is something kind of hard to do in the beginning if you don't know how you're doing it. And so I made so many mistakes there until I realized that I could actually be using a real parser and not do it all by myself with my hands. And so I started noticing Guido van Rossum's lib2to3 parser that he wrote.
And basically, the parser that I'm using now for Jedi is the lib2to3 parser, just with a few additions like error recovery. And I mean, there's a lot of code around it, but in essence, in the center of it all, there's still Guido's parser. And I think the parser idea is also important. Like, I still actually want to do refactoring with Jedi. And if you want to do refactoring, you have to have all the whitespace information, so you can modify a syntax tree and generate code from it again. Like, that's just the easiest way to make refactoring work. In parser theory it's called round trips. So you can actually parse something, have your parse tree, then call a function on it and you have code again.
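Both properties David describes here, error recovery on broken code and a lossless round trip back to source, are easy to see in Parso, the parser that grew out of this work (a hedged sketch of its API, not of lib2to3 itself):

```python
import parso

# typical "broken" editor state: the user is mid-keystroke
broken = "def incomplete(:\n    foo.\n"

module = parso.parse(broken)        # error recovery still produces a tree
print(module.get_code() == broken)  # True: whitespace and errors survive the round trip
```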
[00:10:12] Unknown:
And what led you to choose the lib2to3 capability for parsing rather than some of the other parsers that are available in the Python ecosystem, such as PLY or pyparsing or things like that?
[00:10:27] Unknown:
That's actually a good question. I think the biggest reason is that PLY and pyparsing and so on, they are not parser generators. And I think most of the listeners here might not know the difference between a parser and a parser generator. But basically, what Python does by itself is you give Python a grammar file. You can Google for that; there's a Python grammar file that is very nice to read and very, very obvious about what it does. So the theory is that you can actually write a parser that parses that file, checks what the structure of that file is, and builds a finite automaton or something. Like, basically, it just uses that grammar file to build a parser from it.
So what you're writing in the end, what a parser generator is, is a parser for grammar files, and then kind of wrapping that to generate a parser. It's kind of weird, the concept of it. But what I like about it is that with a parser generator you can actually have different grammars. So if you want to both parse Python 2.7 and Python 3.6, you're going to need a generator. Right? Because if something in the grammar changed, you have to write a lot of Python code somewhere if you don't have that. But, like, changing for me from Python 3.5 to Python 3.6 was just adding a different grammar file, and the whole thing still worked.
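For reference, the grammar file in question is CPython's Grammar/Grammar; the rules look roughly like this (excerpted and lightly simplified):

```
funcdef: 'def' NAME parameters ['->' test] ':' suite
parameters: '(' [typedargslist] ')'
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
return_stmt: 'return' [testlist]
```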
[00:12:23] Unknown:
Yeah. I know that that issue crops up whenever there's a new release of Python for things like Pylint, in that it takes a little while for them to upgrade the Pylint repository to be able to lint the newer version of Python, because things in the AST have changed and so there are different attributes that it doesn't recognize yet. So you have to wait for their upgrade cycle after the Python upgrade cycle completes. And so it sounds like with your approach, what you're doing is you're actually just pulling in the grammar files that define the specific version of Python and then associating that with whichever runtime you're executing against.
[00:12:59] Unknown:
Exactly. Like, you just save those grammar files, and, I mean, they're not long. If you look at one, it's like 50 lines of code. And that's really awesome, because now you have, like, 6 different Python versions and 6 times 50 lines of code and you can parse them all. What you still have to do, obviously, is this second step after parsing that is called type inference. And, I mean, once the async statement came in, there are new types in the language, so you still have to do some work. But at least it's not like, oh my god, there's this new keyword and now I need to write new functions to parse it and also do the inference stuff. With parser generators, there's only half the amount of work.
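A hedged sketch of the per-version idea using Parso, which exposes those grammar files directly; the load_grammar/parse calls are Parso's API, and the snippet only illustrates the general shape:

```python
import parso

code = "async def fetch():\n    pass\n"   # syntax that only some grammars know

for version in ("2.7", "3.6"):
    grammar = parso.load_grammar(version=version)
    module = grammar.parse(code)
    # error recovery always yields a tree; under the older grammar the new
    # syntax simply shows up as error nodes instead of aborting the parse
    print(version, sorted({child.type for child in module.children}))
```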
[00:13:55] Unknown:
It sounds like it's also a little bit more robust to failures for when that new syntax does get introduced because the parser generator will still be able to churn through the actual source code and generate a representation of it. It's just that as you're going through for trying to build the auto completion, there may be just a few holes where it's not able to recognize what's going on, but it will still be able to operate on the rest of the syntax that it's already familiar with.
[00:14:21] Unknown:
Absolutely. And, like, that's one of the big reasons why I'm releasing this thing, because it will actually empower a lot of projects, like the pycodestyle project or the yapf project, if they consider switching and so on. Like, if you're just using the AST module, which is fine, then if the code is not correct, everything will just fail. And with the parser approach that does error recovery, that's actually a big empowerment for a lot of projects and makes a lot of stuff possible that was previously very, very hard to do and involved a lot of manual work.
[00:15:11] Unknown:
And so we've talked a bit about the parsing capabilities that are used for Jedi, but I'm wondering if you can dig a bit further into the internal architecture of how Jedi operates and what the overall design looks like?
[00:15:25] Unknown:
Well, I guess, at a high level, there are, like, 3 parts of Jedi. One part is very simple: that is just the API, where some stuff is made public and some other stuff is not. And then there are 2 central, integral parts of Jedi. One is the parser and one is the type inference. We've talked a little bit about that; I think we don't need to talk about the parser in detail anymore, maybe we can do that later. But the type inference is kind of complicated because it involves a lot of manual code. If you actually want to have a good imagination of how type inference works, it's actually kind of similar to a compiler, where instead of getting one type in the end, or one value, you get multiple values. And what's also interesting is that instead of analyzing code forwards, you're analyzing code backwards. And analyzing code backwards means there's a lot of recursion.
So what you do is, if you have a function call, you first analyze the return statement of that function call, or the return statements if there are multiple ones. And once you have that, you check what it returns. And if it returns a name, that name is looked up. And so that's kind of how it works. It's just recursing over all that stuff. I'm not exactly sure, because my university degree was a while ago, but I think you can also call it backtracking. That's kind of what it does.
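As a toy illustration of that backwards, recursive style of inference, here is a sketch built on the standard ast module; it is not Jedi's engine, which works on its own parse tree and handles far more cases:

```python
import ast

SOURCE = """
def make(flag):
    if flag:
        return []
    return "fallback"

x = make(True)
"""

def infer(node, functions):
    """Return the set of possible type names for an expression node."""
    if isinstance(node, ast.List):
        return {"list"}
    if isinstance(node, ast.Constant):
        return {type(node.value).__name__}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        func = functions.get(node.func.id)
        if func is None:
            return {"<unknown>"}
        # "analyze backwards": look at every return statement and recurse
        results = set()
        for sub in ast.walk(func):
            if isinstance(sub, ast.Return) and sub.value is not None:
                results |= infer(sub.value, functions)
        return results
    return {"<unknown>"}

tree = ast.parse(SOURCE)
functions = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
assignment = next(n for n in tree.body if isinstance(n, ast.Assign))
print(infer(assignment.value, functions))   # both list and str are possible values for x
```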
[00:17:22] Unknown:
So you were talking about backtracking through the code and trying to infer the types of the potential return values for a function, particularly for cases where there are multiple return statements based on different conditionals. What are some of the unique challenges that are posed by trying to understand those different types and be able to generate appropriate completions, given how dynamic Python can be?
[00:17:52] Unknown:
Well, I think there's the standard kind of backtracking approach that is very easy. Like, you can even program that very quickly. It's a lot of work, but it's not impossible. What gets really hard is that in Python, oftentimes you want completion if you're inside a function: what's the value of that parameter? And so you kind of have to recurse for that parameter as well. Like, you have to search function calls. And so that just gives you very nasty recursions. So you have to be able to kind of stop your program at some point, because you want to be able to still analyze some other parts of the program if something is a dead end, but it shouldn't take too much time. So performance is very much an issue with auto completion, because you want it in real time. Right? One of the other core issues we've had is basically lists and, like, all those container types, or let's call it state. Like, in Python you can just put anything inside a list, and try understanding that. Jedi does some analysis on it. Like, you will get completions on lists and stuff. It's not perfect, but it's probably better than PyCharm and others.
So, yeah, I think those are the biggest issues. And then there's just stuff that you cannot understand with auto completion, stuff like metaclasses. Like, try understanding a Django metaclass. It's just impossible. You can kind of imitate it, but you have to hard code it. It's not something your program will understand. For each and every metaclass, you have to write code to understand it. And so that is just something that doesn't scale really well, which is unfortunate, because it would be really nice.
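The parameter problem in miniature, again as a toy ast-based sketch rather than Jedi's implementation: to know what text is inside the function, you search the module for call sites and infer the arguments passed at each one.

```python
import ast

SOURCE = """
def shout(text):
    return text.upper()   # imagine the cursor is after "text." here

shout("hello")
shout(["not", "a", "string"])
"""

tree = ast.parse(SOURCE)
possible = set()
for node in ast.walk(tree):
    # find call sites of `shout` and look at what gets passed in
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "shout":
        arg = node.args[0]
        if isinstance(arg, ast.Constant):
            possible.add(type(arg.value).__name__)
        elif isinstance(arg, ast.List):
            possible.add("list")
print(possible)   # str and list are both possible, so completions are the union of both
```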
[00:19:54] Unknown:
And is it possible for a user to provide those definitions to Jedi so that they can have a sort of maybe a plug in for Jedi for their particular project so that it is able to provide those completions for some of the complicated edge cases that Jedi might not necessarily be able to intuit on its own?
[00:20:13] Unknown:
There's no plug in system at the moment. It's definitely something that is kind of planned. It's just kind of hard to implement that stuff right, because it's kind of a chicken and egg problem with plug ins: if your API doesn't exist, nobody works with it. So you basically have to be the first user of it. Like, you have to create the plug in yourself, because just creating a random API doesn't really help most of the time. So, yeah, in conclusion, there's no API. It's kind of planned, but it will probably take another year, because proper virtualenv support is coming, and that is definitely more valuable for most people.
[00:21:02] Unknown:
And in addition to being able to complete at the point of the cursor, where, for instance, you're trying to complete the bar attribute on an instance of the Foo class, which is something that a lot of people would be able to understand, one of the other things that Jedi can do is actually find all of the usages of a particular function definition as well, which can be quite useful for things like refactoring or just doing code exploration when you don't necessarily know all the different places that something is happening. So are there any differences or additional complexities
[00:21:36] Unknown:
that arise from being able to provide that capability as well? I think usages is, in my opinion, just horribly buggy. Like, I, or someone else, just has to rewrite that stuff. In my opinion, now, it's not a lot of additional complexity, because I've refactored Jedi's internals for, like, the 5th time now. And so it's getting to be less and less code that you have to write to make something like usages possible. The biggest challenge there is that it's just very hard to not take 10 seconds for such a search, because with usages, if you search, like, a thousand files, it will just take a long, long time. So in general, for usages or also refactoring stuff, it's just a time constraint that makes it hard.
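From the library side, the usages feature looks roughly like this; method names have moved around between Jedi releases (older ones exposed Script.usages(), newer ones get_references), so treat this as a sketch:

```python
import jedi

source = """def greet(name):
    return "hi " + name

greet("Tobias")
greet("David")
"""

# cursor placed on the name `greet` in the definition
script = jedi.Script(source, 1, 5, "example.py")
for ref in script.usages():
    print(ref.line, ref.column)   # the definition plus both call sites
```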
[00:22:31] Unknown:
Yeah. It sounds like something where you would need to create some sort of caching capability, where as you're parsing the source of the project that you're working with, you would necessarily need to cache the parsed representation so that you could then use that as a search space for the usages capability, so that you don't need to repeat the same operations of finding all the places that something occurs.
[00:22:53] Unknown:
Yeah. I mean, that's definitely true. But that's something we already do. We do some caching, like caching of parsers. But the bigger problem is that you also have to cache the references of those files, like the cross references of modules. When you start caching cross references, it's just a very hard problem, because you have this big graph, and once one node is invalidated, it basically invalidates everything. And a lot of companies actually work around that by not always providing... like, take PyCharm, for example.
I think the way PyCharm does it is either they sometimes provide invalid suggestions, but those don't really matter because they hide it very well, or it just takes a lot of time in certain edge cases. And I think they work a lot more with databases and that kind of stuff, where Jedi is really something almost stateless, except for parser caches.
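The cross-reference invalidation problem in miniature, as a made-up sketch of the general issue rather than of how Jedi or PyCharm actually store things:

```python
# modules and what they import (a made-up project)
deps = {
    "app": {"models", "utils"},
    "models": {"utils"},
    "utils": set(),
}
cache = {name: f"cached analysis of {name}" for name in deps}

def invalidate(changed, deps, cache):
    """Drop the cached result for `changed` and everything that imports it."""
    cache.pop(changed, None)
    for module, imported in deps.items():
        if changed in imported and module in cache:
            invalidate(module, deps, cache)

invalidate("utils", deps, cache)
print(cache)   # {}: editing the lowest-level module wiped every cached cross reference
```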
[00:24:05] Unknown:
Yeah. It's definitely a difficult problem. And in my own use, I generally tend to fall back on things like grep and ag for just searching for the particular string that I'm trying to find all the usages of. But there are some limitations to that, particularly if you're trying to find a function that has the same name across different modules with potentially different function signatures.
[00:24:29] Unknown:
But does it work for you? Like, are you actually using it? Because I'm not, and I created it.
[00:24:39] Unknown:
I think I've used it a few times in the past, but for the most part now I generally just search the code base for a particular function name, and most of the projects I'm working with are small enough that it works fairly well. But there are definitely cases where you have a sort of common method name, and so you have to try and visually parse: okay, this is coming from that module, this is coming from that module. You need to do some of your own sort of visual analysis
[00:25:04] Unknown:
where if it was built into the tool, it might save you a bit of that time. I see that. Yeah, I also use grep a lot. So that's also my poor man's method. Yeah. But it will get better. It will get better, I promise. Like, it's not finished. That's not where it ends.
[00:25:22] Unknown:
Yeah. It definitely seems like a fairly complicated and difficult problem to solve. But if and when it does become more robust and production grade, it's certainly something that I would happily take advantage of. Me too. Speaking of ways that analyzing users' code is difficult, particularly in terms of trying to find the usages, what are some of the other edge cases that Jedi runs up against that it has difficulty trying to perform completions for, or some of the other project structures or sort of module imports that can confound Jedi in the process of trying to find appropriate completions?
[00:25:56] Unknown:
Well, I think there's a lot of stuff that can kind of make it not work, but we've gradually improved. I think there's a small list of things that Jedi just cannot understand at the moment, and some of it will not be possible in the future either. Like, some of the issues involve calls to locals and globals, to those functions. Then there's stuff like set; if you call set... there's obviously metaclasses that are just impossible. Like, set would be kind of possible maybe in the future, but it's really hard. And then there's a lot of binary code. I think that, of all of the things that I mentioned, is the biggest issue: if you're using PyQt or, I think, PySide or whatever, all those Python libraries that are written in C below, completions are just lacking, because there's no way to really understand that code. I mean, you can call dir on it, but if there's no docstring that kind of explains what it returns, you just have no idea what it returns. And I think that's the biggest issue for a lot of people. Yeah. When I was reading through the documentation, I was noticing that when you refer to the subject of trying to perform completions
[00:27:24] Unknown:
on binary modules, it sort of violates your safety guarantees of not actually executing any of the Python code to generate the completions, because you don't have any other way of understanding what the function signatures are that are available from that particular module.
[00:27:41] Unknown:
Exactly. Exactly. Like, there's no security guarantee that Jedi doesn't execute code. Jedi doesn't really execute Python code; there's one exception for now that we will fix. But at the same time, you will be executing that code most of the time anyway. Right. But it's not a good feeling. It doesn't have a good feel coming with it. I don't like that it is that way. Like, a tool should be safe by default.
[00:28:15] Unknown:
And by virtue of the fact that it actually doesn't need to execute the majority of the code that it's working with to perform its analysis, it seems like it could be a potential benefit for building tools on top of it for doing things like security analysis of the project to try and find vulnerabilities or even, maliciously inserted modules.
[00:28:37] Unknown:
Yeah. Exactly. Exactly. That's something I wouldn't do now.
[00:28:44] Unknown:
And you also briefly alluded to the fact that because of the fact that you are actually parsing the code and not just relying on the AST, you're actually able to use docstrings to be able to infer things about the function signatures of the code in question as long as it's formatted appropriately. So I'm wondering if you can talk a bit about how that works and some of the mechanisms that you're using to be able to evaluate and understand those docstrings.
[00:29:10] Unknown:
Well, understanding docstrings is actually not that hard. It's, like, the part of Jedi where I have done the least amount of work, because a lot of people actually like that functionality a lot, so they contribute to it. It's kind of the same with PEP 380, or was it 430? I'm not sure, whichever PEP it was that introduced type annotations. But coming back to docstrings, it's just checking for certain patterns. It's a little bit of regex, regular expressions. And once you're done with that, you just statically analyze the expression that's there. Like, there's not a lot of magic to evaluating docstrings, I think.
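A hedged sketch of what "a little bit of regex" can look like for the Sphinx/reST style; Jedi's real handling is more involved and supports several docstring conventions:

```python
import re

DOCSTRING = """Add two sizes.

:type a: int
:type b: int
:rtype: int
"""

# hypothetical patterns for illustration; the extracted type expressions can
# then be analyzed like any other expression
param_types = dict(re.findall(r":type (\w+): *(.+)", DOCSTRING))
return_type = re.search(r":rtype: *(.+)", DOCSTRING)

print(param_types)            # {'a': 'int', 'b': 'int'}
print(return_type.group(1))   # int
```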
[00:29:59] Unknown:
Yeah. I think it really helps that there are such well established standards for the particular formatting of docstrings within the Python community so that it simplifies the job of actually trying to get meaningful information out of it.
[00:30:10] Unknown:
Yeah, I think that is very, very helpful. There are, like, 2 or 3 competing standards, and that is very easy to work with. And I also think that adding doc types to your docstrings is kind of a thing of the past, at least in Python 3.5 or so, because adding them as type annotations is just way nicer and way more readable most of the time, and it helps with understanding the code.
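For contrast, the same signature written both ways: doc types in the docstring versus Python 3.5+ annotations that tools like Jedi and mypy can read directly (a generic example, not one from the show):

```python
from typing import List

def scale_old(values, factor):
    """Multiply every value by a factor.

    :type values: list of float
    :type factor: float
    :rtype: list of float
    """
    return [v * factor for v in values]

def scale_new(values: List[float], factor: float) -> List[float]:
    """Multiply every value by a factor."""
    return [v * factor for v in values]
```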
[00:30:42] Unknown:
And with the introduction of the typing module in 3.5, it's also more functionally applicable as well. So that rather than having to go to the docstring itself to try and mentally compile how things are hanging together, you can actually use that module and other modules built on top of it to let the computer do that work for you.
[00:30:59] Unknown:
Exactly. And, like, docstrings get very old sometimes. And I mean, type annotations can also get old, but at least if you're referencing your own classes and you rename them, you will get actual errors. And that is something very nice, I think. There's tools for that, and, like, even the runtime throws errors there. And also with the introduction of mypy, I think there's a lot of work going into a direction where you can actually kind of validate
[00:31:34] Unknown:
certain parts of your programs. And I like that. And we've talked about some of the different technical difficulties that you have encountered and overcome in the process of building Jedi, but I'm wondering if there are any others that you'd like to highlight.
[00:31:48] Unknown:
There were so many, but I think, by and large, the biggest mistake I've made was that I didn't use a real parser in the beginning. That cost me probably a year of work, and by year I mean a year of working full time. Because what I did was just kind of play around with Python, but it wasn't really well programmed. And what I kind of realized, and what I'm now getting better at, is that if you don't do something right, it will bite you at some time. And it has really given me a hard time that I didn't have that. And now I'm slowly getting back and cleaning up my code, removing a lot of old stuff that is not really needed anymore, because there's a proper representation of your code and not something fuzzy that you have to deal with, where you have to write a lot of ifs and loops to analyze stuff that is not valid anyway, that is kind of partially valid. Now you just know: either that code is valid and you analyze it, or you don't. And that is kind of my big learning, I guess. And the biggest technical challenge was removing the old parser and replacing it with a new one, which was really, really hard, because you can kind of imagine that replacing the heart of something... like, the parser is really the lowest component. You build on that. The parse tree that you get is where everything ends. There's not really a lot of things you don't have to touch if you remove the parser. So I basically rewrote the whole thing.
[00:33:41] Unknown:
And are there any projects or libraries that you have seen using Jedi in unusual or unexpected ways?
[00:33:49] Unknown:
Well, I think there are a couple. One of the things that I kind of like seeing is that sometimes, at conferences or even by mail, some people write me that, oh, we kind of used your library, but it's just internal in our company's own editor. And I'm kind of wondering why they're writing their own editor. But, anyway, that's kind of funny to me. And I think the second one is IPython now, IPython version 6. So starting with version 6, IPython comes with Jedi. And there, Jedi is not used as the classical static analysis library, but more like: please also do that sometimes, but also analyze the objects that we give you. And so that makes auto completion great in IPython now. And that's very unusual and very new. A lot of people didn't think that was actually possible, but it is.
[00:34:58] Unknown:
Yeah. I've actually got a personal RC file for my Python shell as well that uses Jedi just within the regular REPL, because it does provide so many niceties when you're just trying to do exploratory analysis of a particular code base or just trying to feel your way around a particular problem. It's nice having that easy auto completion, so having that as such an easy addition to any Python terminal is quite useful.
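For anyone who wants the same setup, a sketch of what such an RC file can contain; setup_readline comes from Jedi's documented REPL support, though the exact incantation can differ between versions:

```python
# contents of the file pointed to by PYTHONSTARTUP (the "RC file" mentioned above)
try:
    from jedi.utils import setup_readline
    setup_readline()
except ImportError:
    # fall back to the standard library's completer if Jedi isn't available
    import readline
    import rlcompleter
    readline.parse_and_bind("tab: complete")
```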
[00:35:23] Unknown:
I think so too. Sometimes it doesn't work, and I'm not really sure why. I'm hoping that will improve a lot now that IPython is using it, because it's an extremely large user base. And, like, from the beginning on, I've gotten a lot of issues that we can fix now and a lot of feedback, and that's very nice. There are more contributors coming because they want certain things fixed. And so I really hope that REPL auto completion is improving and is going to get even better.
[00:35:59] Unknown:
And are there any other features or projects that you have planned for the future of Jedi?
[00:36:04] Unknown:
Well, like I said, kind of my road map is virtualenvs: understanding virtualenvs, being able to switch between virtualenvs, and making that possible. I think that's the first thing I'm going to do. Then refactoring is something that will come to Jedi. That might still take a long time, but it's definitely in the works. One of the things I'm thinking about is actually supporting Django with a plugin architecture, and also maybe its template language.
So, kind of branching out into cross language support. That might be interesting, because now I feel like I'm getting closer to actually knowing how I can understand a language in a really fast way, because what I've been doing now is mostly improving my code base and not adding features to it. So, like, the last 3, 4 years were not just features. Mostly it was just refactoring, and not the refactoring feature that might come to Jedi, but refactoring Jedi's code itself. So, yeah, I have a lot of plans and ideas. I've even had the idea of maybe creating a Rust auto completion, but that's kind of far away. And it's also an issue for me because I don't have extreme amounts of time. I have, like, maybe 10 to 20 hours a week. That's kind of it.
[00:37:56] Unknown:
It's nice to be able to have that amount of time to dedicate to open source and we all appreciate your efforts. It's definitely not always easy to find the time to be able to work on things like that because of other pressures in life.
[00:38:09] Unknown:
Oh, yeah. It is. Like, I'm in a good space. I'm not that old, I don't have a family for now, and I work 80%, I don't work 100%. So I have a bit more time, but still, it's a challenge. But I just like doing it. It's just so much more relaxing than working at a company where you have to perform. And here, something can just take half a year. People will be happy that it's out after half a year, and not after 2 weeks.
[00:38:44] Unknown:
Yeah. That's definitely great being able to have side projects like that because it does give you more room for creativity without necessarily having to work through the pressures of whatever the external requirements are, and it lets you have a lot more self direction.
[00:38:58] Unknown:
Oh, yeah. And, like, the biggest problem is that it's financially just unsustainable. That's still, I think, my biggest issue. You can't really make money with it, especially if you want to make it perfect, which is good, basically, but still.
[00:39:17] Unknown:
So are there any other topics or questions that you think we should cover before we start to close out the show?
[00:39:23] Unknown:
No, not really. Like, I would recommend people to watch out for Parso. Parso is coming in, like, a month or 2. It's a Python parser, and for everybody that is working with static analysis stuff, that might be really interesting. But, yeah, for me, that's it.
[00:39:43] Unknown:
So for anybody who wants to get in touch with you and follow the work that you're up to, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us to the picks. And this week, I'm going to choose the UNIX patch utility, which lets you download a patch file, for instance, from a GitHub pull request or from a diff and then be able to apply that to another code base because there are times when just relying on your version control system to be able to provide those merges isn't necessarily practical.
[00:40:14] Unknown:
And being able to have that utility to fall back on is quite useful. In fact, I ended up using it just yesterday. So with that, I'll pass it to you. Do you have any picks for us today? I'll start off with a band. It's called Bear Sten. Really nice band; I like listening to them. They're awesome. And then there's all the creative stuff I mentioned in the beginning that I like doing. Like, in the last few months, I think it was soccer, singing, and dancing. That's really awesome. And then for technical stuff, I have one thing called docopt.
docopt is a competitor to argparse and those kinds of things, where it's just very easy to build command line interfaces. It just makes your life easier. You just write documentation and everything's working, basically. Really small library, but really good as well. And the other technical thing I have, I just thought I'd mention it because nobody ever mentions it, is OpenStack. We're actually working with it at our company, and I actually like how clean the code base is, for how large it is. Like, it's a few million lines of code, I think, and it's just very, very readable. And I really like that, because most of the open source code I've read is not. So, yeah, I think if you're ever planning on using an open source cloud solution, OpenStack is really in a good place.
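For context on the docopt pick: the usage text itself is the parser. A minimal, hypothetical script in the style of docopt's own examples:

```python
"""Ship mover.

Usage:
  ship.py move <x> <y> [--speed=<kn>]
  ship.py (-h | --help)

Options:
  -h --help      Show this screen.
  --speed=<kn>   Speed in knots [default: 10].
"""
from docopt import docopt

if __name__ == "__main__":
    # docopt parses the usage text above and hands back a dict of arguments
    arguments = docopt(__doc__)
    print(arguments)
```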
[00:41:53] Unknown:
Yeah. It's definitely a very sizable and very ambitious project, but I've used it a few times and it's definitely quite nice. Oh, yeah. Alright. Well, I appreciate you taking the time out of your day to join me and talk to me about the Jedi project and digging into its past and its internals. It's definitely one that I use pretty regularly and one that I'm happy to continue using. So I appreciate your work on that, and I appreciate your time. Well, thank you a lot for this interview. It was really nice talking to you.
Hello, and welcome to podcast dot in it, the podcast about Python and the people who make it great. I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out linode at ww w.podcastinnit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on Itunes or Google Play Music. Tell your friends and coworkers and share it on social media. Your host as usual is Tobias Macy. And today I'm interviewing David Halter about JEDI, an awesome auto completion and static analysis library for Python. So Dave, could you please introduce yourself?
[00:01:03] Unknown:
Well, I'm Dave. I do some work, some open source work, mostly on on all the completion stuff. You might know me from the JEDI library. I also I work for a company called CloudScale dotch. We are basically doing the cloud stuff like DigitalOcean. But for Switzerland, like that so that kind of direction, if you're ever interested in good cloud hosting in Switzerland where privacy is still somewhat active. I I like doing all kind of creative things, where I think programming is 1 of the things that I really do a lot, but also very many other things.
[00:02:02] Unknown:
And do you remember how you first got introduced to Python?
[00:02:05] Unknown:
Oh, yeah. That's that's kind of funny because, like, it started out with a with a boss that looked for a better language than PHP. I I wrote like a 1, 000 lines of rec x in PHP, which makes no sense at all. But so that's kind of how it started, and then he forced it on me. I'd well, yeah, kind of, but it was a good choice.
[00:02:34] Unknown:
And so can you start by explaining what the Jedi project is and the problem that you're trying to solve when you first created it? So how it started was very simple. When
[00:02:44] Unknown:
when I started working in in Python, I didn't really use an IDE. I used VIM and for VIM there wasn't really good auto completion. And I think still there, like, there's not a lot else in the Python world other than Jedi and PyCharm. And so I just thought, well, why wouldn't I kind of improve that state of the art? But but at the same time, I I was very new to Python. And I also think that I wasn't the best programmer back then. So but I I I just started and it kind of grew.
[00:03:26] Unknown:
Yeah. My understanding is that at least for Python auto completion and Vim, there's the you complete me project and neocomplete, both of which are decent, but I definitely think it's far and above the best 1 in terms of Python completion for the Vim editor, as well as many others. And when you were creating the project, how did you settle on the name of JEDI?
[00:03:50] Unknown:
Well, I I I think before we clear the name, like, 1 thing first is that, like, if you're using U Complete. Me, you're using Jedi below. Like you complete for like, if you if you're using it for Python auto completion, if you're using it for C plus plus there's other stuff that does it behind. But YouComplete Me is not a completion library. Like, so there's like 1 of the confusions, a lot of people have around Jedi is that there's now 2 projects called that are maintained by me or mostly by me. And and 1 is called JEDI and 1 is called JEDI VIM. And 1 is like the the plug in for VIM. Like, there's JEDI and other stuff.
And 1 is the the the auto completion library. And the auto completion library is used for a lot of different plugins and editors. Yeah. For a lot of different editors and IDs. Okay. So the name itself is like, I think my idea was just that my name should be in there and like Jedi is part of my second name. And so that was kind of a fun play with a name. Now I'm a bit afraid about trademarks, I guess, because it's it might not be the best choice after all, even though it's a fun name.
[00:05:19] Unknown:
Yeah. I think Jedi has become so colloquial that that it basically exists as its own word and language as, you know, independent of its original incarnation with the Star Wars series. And so when I was reading through the documentation, I noticed that there's also alpha support for linting capabilities within Jedi. And I'm wondering if you could compare and contrast the approach that you take for linting and the capabilities that it has in contrast to other tools such as Pylent or Flak8 that people might be familiar with?
[00:05:49] Unknown:
So I think the alpha support is really something that started out a few years ago, and I've kind of given up on it for the near future. I still think it has tremendous potential for the faraway future, but not not for now. The idea is basically that if you compare it to the flake project, I think it's called flake. Right? Yes. Pie flakes or yeah. Flake 8. Yeah. Flake a is the combination, but pie flakes. So pie flakes is doing some stuff really, really well and is really, really fast, but it doesn't really find all box it potentially could. So if you're cross referencing from 1 module to another, PyFlakes will not find that. And so the time came where I kind of started realizing that if you're writing a lot of type inference, you can also combine that and start doing static analysis with it and all and all of the other stuff that is kind of advanced ID stuff where like like refactoring.
Or also now, there there's a library that was called PEP 8 and is now called PyDocs no. Py codes code style. And it's basically just printing out issues that that you have in your code that are not like programming mistakes but style mistakes. They're like stuff that doesn't conform with PEP 8. And that is also something that I'm now working on, but that might not even be part of JEDI, but be part of another another project called Parso.
[00:07:37] Unknown:
When I was reading through the documentation, I did notice that, as you said, you have your own parser that you're using for destructuring the Python syntax rather than using the AST module built into the Python language. So I'm wondering if you can explain a bit about why you found the need to write your own parser rather than using the built in AST and some of the benefits
[00:08:01] Unknown:
and trade offs that that has enabled? I didn't actually start writing my own parser. Like, in the beginning, I I kind of did because the biggest problem you have in auto completion, contrary to kind of static analysis, is you have to deal with broken code. Like, you want to be able to say foo dot and then have a completion. And that is already broken code because there's like, that is not a valid Python. So in I think in parser theory, you call that error recovery. An error recovery is something kind of hard to do in the beginning if you, if you don't know how you're doing it. And so I I kind of like, I made so many mistakes there until I realized that I could actually be using a real parser and, like, not do all by myself and with my hands. And so I started noticing, guido van Rossum's lib 2 to 3 parser that he wrote.
And basically, the parser that I'm using now for Jedi is the lib 2 to 3 parser just with a few additions like error recovery. And I mean, there's a lot of code around it, but in in essence and, like, in in the center of it all, there's still guido's parser. And I think, like the parser idea is also important. Like I want I still actually want to do refactoring with JEDI. And if you want to do refactoring, you have to know like, you have to have all the white space information so you can kind of modify an a syntax tree and generate code from it again. So you don't have to yeah. Like like, that's just the easiest way to make re refactoring work. It's called it's called also in parser theory, it's called, round trips. So you can actually parse something and then have have your parser tree and then call a function on it and you have code again.
[00:10:12] Unknown:
And what led you to choose the lib 2 to 3 capability for parsing rather than some of the other parsers that are available in the Python ecosystem such as, ply or py parsing or things like that?
[00:10:27] Unknown:
That's actually a good question. I think, like, the biggest reason is that ply and py parsing and so on, they're they are not parser generators. And I think most of the listeners here might not know the difference between a parser and a parser generator. But basically, what what Python does by itself is you give it you give Python a a grammar file. You can Google for that. There's a Python grammar file, that is very nice to read and very, very obvious what it does. So the theory is that you can actually write a parser that parses that file and checks for what your structure, like, and checks for what the structure is of that file and builds a finite automaton or something. Like like, basically, it just it just uses that that that parser file, that grammar it's called a grammar file.
It uses that to build a parser from it. So what you're writing in the end, what a what a parser generator is is is is a parser for grammar files and then kind of wrapping that, to or generating a parser. It's it's kind of weird, the concept of it. But what I like about it is that you can actually have with a parser generator, you can actually have different grammars. So if you want to both parse Python 27 and Python 36, you're going to need a generator. Right? Because if something in the grammar changed, you have to write a lot of Python code somewhere if you don't have that. But if, like, my chain like, changing for me from 335 Python 35 to Python 36 was just adding a different grammar file and the whole thing still worked.
[00:12:23] Unknown:
Yeah. I know that that issue crops up whenever there's a new release of Python for things like Pylint is that it takes a little while for them to upgrade the Pylint repository to be able to lint the newer version of Python because of the fact that things in the AST have changed. And so there are different attributes that it doesn't recognize yet. So you have to wait for their upgrade cycle after the Python upgrade cycle completes. And so it sounds like with your approach, what you're doing is you're actually just pulling in the grammar files that are defining the specific version of python and then associating that with whichever runtime you're executing against.
[00:12:59] Unknown:
Exactly. Like you just save those grammar files and they I mean, they're not long. That's like it's like if you look at it, it's like 50 lines of code. And that's really awesome because now you have, like, 6 different Python versions and 6 times 50 lines of code and you can parse them all. What you what I mean, obviously, what you still have to do is there's this second step after parsing that is called type inference. And, I mean, you have to basically, once the async statement came in, like, there's new types in the language. So it's kind of you you you still have to do some work. But at least it's not like, oh my god. I'm doing there's like this new keyword and now I need to write new functions to parse it and also do the inference stuff. Now you're, like, with parser generators, there's only half the amount of work.
[00:13:55] Unknown:
It sounds like it's also a little bit more robust to failures for when that new syntax does get introduced because the parser generator will still be able to churn through the actual source code and generate a representation of it. It's just that as you're going through for trying to build the auto completion, there may be just a few holes where it's not able to recognize what's going on, but it will still be able to operate on the rest of the syntax that it's already familiar with.
[00:14:21] Unknown:
Absolutely. And and, like, that's 1 of the big reasons why I'm why I'm releasing this thing because it will actually empower a lot of projects like like the Pycode style project or like the Yap project or like maybe if they consider switching and so on where, like, they just have an like, they have if if you're just using the AST module, which is which is fine, like, but it like, if the code is not correct, it will just like, everything will just fail. And with the with the parser approach that does error recovery, like that's that's actually a big empowerment for a lot of for a lot of projects and makes a lot of stuff possible that it was previously very, very hard to do and involved a lot of manual work.
[00:15:11] Unknown:
And so we've talked a bit about the parsing capabilities that are used for Jedi, but I'm wondering if you can dig a bit further into the internal architecture of how Jedi operates and what the overall design architecture looks like?
[00:15:25] Unknown:
Well, I guess, like, the the biggest part at at a at a high level part, there's there's, like, 3 parts of Jedi. Like, 1 part is very simple that is just the API where, like, some stuff is made public and some other stuff is not. And then there's, like, 2 central in integral parts of of Chad. I am. 1 is the parser and 1 is the type inference. We've talked we've talked a little bit about that. I think we don't need to talk about the parser in detail anymore. Maybe we can do that later. But so the type inference is is kind of complicated because it involves a lot of manual code. If you actually want to have a good imagination of how type inference works, it's actually kind of similar to a compiler where instead of getting 1 type in the end or 1 value, you get multiple values. And what what what's also interesting forwards, you're analyzing code backwards. And analyzing code backwards means there's a lot of recursion.
So what you do is if you if you have a function call, you first analyze the return statement of that function call or the return statements if there's multiple ones. And once you have like, you check what what it returns. And if if if it returns a name name, that name is looked up. And so that's kind of how it works. It it's just recursing over all that stuff. It's yeah. I I think I I'm not I'm not exact exactly sure because my my my university degree has been a while ago, but I think you can also call it backtracking. That's kind of what you what it does.
[00:17:22] Unknown:
So you were talking about backtracking through the code and trying to infer the types of the potential return values values for a function and particularly for cases where there are multiple return statements based on different conditionals. What are some of the unique challenges that are posed by trying to understand those different types and be able to generate appropriate completions given how dynamic Python can be? Well, I I think like there's the standard
[00:17:52] Unknown:
kind of backtracking approach that is very easy. Like you can you can even program that very quickly. Like, there it's a lot of work, but it's not it's not impossible. Like, what gets really hard is that in Python, oftentimes you want you want completion if you're inside a function. What's what's the value of that parameter? And so you kind of have to recurse for that parameter as well. Like you have to you have to search function calls. And so that just gives you very nasty recursions. So you have to be able to to kind of stop your program at some point to be able to you want to be able to still analyze some other parts of the program if if if something is a dead end, but it shouldn't take too much time. So performance is very much an issue, with auto completion because you want it in real time. Right? 1 of the other core issues we've had is basically lists and, like, all those container types or let's call it state. Like in Python, you can just put anything inside a list and try understanding that. Jedi does some analysis on it. Like you will you will get completions on lists and stuff. It's not perfect, but it's it's probably better than PyCharm and others.
So, yeah, I think those are the biggest issues. And then there's just stuff that you cannot understand with auto completion, stuff like metaclasses. Try understanding a Django metaclass; it's just impossible. You can kind of imitate it, but you have to hard code it. It's not something your program will understand: for each and every metaclass, you have to write code to understand it. And so that is just something that doesn't scale really well, which is unfortunate, because it would be really nice.
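To make those 2 problems concrete, here is a small, purely hypothetical example (the names are invented for illustration): the parameter's type is only discoverable by finding call sites, and the metaclass invents an attribute that no parser will ever see in the source.

    class Report:
        def render(self):
            return "rendered report"

    def process(data):
        # Statically, 'data' could be anything. To complete "data.<tab>" here,
        # an engine has to find every caller of process() and infer the type
        # of each argument, which is the expensive recursive search described
        # above.
        return data

    process(Report())       # here data is a Report
    process("plain text")   # here data is a str

    class Registry(type):
        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            # An attribute invented at class-creation time, invisible in the source.
            setattr(cls, "created_by_metaclass", lambda self: name)
            return cls

    class Model(metaclass=Registry):
        pass

    # This works at runtime, but a static tool cannot know the attribute exists
    # without hard coded knowledge of this particular metaclass.
    print(Model().created_by_metaclass())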
[00:19:54] Unknown:
And is it possible for a user to provide those definitions to Jedi so that they can have a sort of plug in for Jedi for their particular project, so that it is able to provide those completions for some of the complicated edge cases that Jedi might not necessarily be able to intuit on its own?
[00:20:13] Unknown:
There's, at the moment, no plug in system. It's definitely something that is kind of planned. It's just kind of hard to implement that stuff right, because it's a chicken and egg problem with plug ins: if your API doesn't exist, nobody works with it. So you basically have to be the first user of it. You have to create the plug in yourself, because just creating a random API doesn't really help most of the time. So, yeah, in conclusion, there's no API. It's kind of planned, but it will probably take another year, because proper virtualenv support is coming, and for most people that is definitely more valuable.
[00:21:02] Unknown:
And in addition to being able to complete at the point of the cursor, where, for instance, if you're trying to complete the bar attribute on an instance of the Foo class, that's something that a lot of people would be able to understand. But 1 of the other things that Jedi can do is to actually find all of the usages of a particular function definition as well, which can be quite useful for things like refactoring or just doing code exploration when you don't necessarily know all the different places that something is happening. So are there any differences or additional complexities
[00:21:36] Unknown:
that arise from being able to provide that capability as well? I think usages is, in my opinion, just horribly bugged. I, or someone else, just has to rewrite that stuff. Now it's not a lot of additional complexity, because I've refactored Jedi's internals for, like, the 5th time now, and it's getting to be less and less code that you have to write to make something like usages possible. The biggest challenge there is that it's just very hard for such a search not to take 10 seconds, because if you're searching, like, a 1,000 files, it will just take a long, long time. So in general, with usages, or also refactoring stuff, it's just the time constraint that makes it hard.
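For reference, this is roughly how the feature was exposed through Jedi's public API around the time of this conversation; the method names have shifted in later releases, so treat it as a sketch to check against the current documentation rather than a definitive example.

    import jedi

    source = '''\
    def greet(name):
        return "hello " + name

    greeting = greet("world")
    '''

    # Point the script at the definition of greet (line 1, column 4).
    script = jedi.Script(source, 1, 4, 'example.py')

    # usages() searches for every reference to that name, which is why it can
    # take a long time on projects with thousands of files.
    for definition in script.usages():
        print(definition.line, definition.column, definition.description)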
[00:22:31] Unknown:
Yeah. It sounds like something where you would need to create some sort of caching capability, where, as you're parsing the source of the project that you're working with, you would cache the parsed representation so that you could then use it as a search space for the usages capability, so that you don't need to repeat the same operations of finding all the places that something occurs.
[00:22:53] Unknown:
Yeah. I mean, that's definitely true, but that's something we already do. We do some caching, like caching of parsers. But the bigger problem is that you also have to cache the references of those files, like the cross references of modules. When you start caching cross references, it's just a very hard problem, because you have this big graph, and once 1 node is invalidated, it basically invalidates everything. A lot of companies actually work around that. Take PyCharm, for example.
I think the way PyCharm does it is that either they sometimes provide invalid suggestions, but those don't really matter because they hide it very well, or it just takes a lot of time in certain edge cases. And I think they work a lot more with databases and that kind of stuff, whereas Jedi is really almost stateless, except for the parser caches.
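A toy illustration of the cascade David describes (purely hypothetical, not Jedi's implementation): with a cross-reference graph, dropping 1 cached module transitively drops everything that was analyzed on top of it.

    # Hypothetical map of module -> modules that import it.
    dependents = {
        "utils": ["models", "views"],
        "models": ["admin"],
        "views": [],
        "admin": [],
        "legacy": [],
    }

    def invalidate(module, cache):
        """Drop a module's cached analysis and everything that depends on it."""
        stack = [module]
        while stack:
            name = stack.pop()
            if name in cache:
                del cache[name]
                stack.extend(dependents.get(name, []))

    cache = {name: f"<analysis of {name}>" for name in dependents}
    invalidate("utils", cache)   # a single edit to utils.py ...
    print(sorted(cache))         # ... leaves only ['legacy'] in the cache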
[00:24:05] Unknown:
Yeah. It's definitely a difficult problem. And in my own use, I generally tend to fall back on things like grep and ag for just searching for the particular string that I'm trying to find all the usages of. But there are some limitations to that, particularly if you're trying to find a function that has the same name across different modules with potentially different function signatures.
[00:24:29] Unknown:
But does it work for you? Are you actually using it? Because I'm not, and I created it.
[00:24:39] Unknown:
I think I've used it a few times in the past, but for the most part now, I generally just grep the code base for a particular function name, and most of the projects I'm working with are small enough that it works fairly well. But there are definitely cases where you have a common method name, and so you have to try and visually parse: okay, this is coming from that module, this is coming from that module. You need to do some of your own sort of visual analysis
[00:25:04] Unknown:
where, if it was built into the tool, it might save you a bit of that time. I see that. Yeah, I also use grep a lot, so that's also my poor man's method. But it will get better. It will get better, I promise. It's not finished; that's not where it ends.
[00:25:22] Unknown:
Yeah. It definitely seems like a fairly complicated and difficult problem to solve. But if and when it does become more robust and production grade, it's certainly something that I would happily take advantage of. Me too. Speaking of ways that analyzing users' code is difficult, particularly in terms of trying to find the usages, what are some of the other edge cases that Jedi runs up against that it has difficulty performing completions for, or some of the other project structures or module imports that can confound Jedi in the process of trying to find appropriate completions?
[00:25:56] Unknown:
Well, I think there's a lot of stuff that can kind of make it not work, but we've gradually improved. I think there's a small list of things that Jedi just cannot understand at the moment, and some of it will not be possible in the future either. Some of the issues involve calls to locals and globals, to those functions. Then there's stuff like setattr: handling that would be kind of possible maybe in the future, but it's really hard. There's obviously metaclasses, which are just impossible. And then there's a lot of binary code. Of all of the things that I mentioned, I think that is the biggest issue: if you're using PyQt or, I think, PySide or whatever, all those Python libraries that are written in C below, completions are just lacking, because there's no way to really understand that code. I mean, you can call dir on it, but if there's no docstring that kind of explains what it returns, you just have no idea what it returns. And I think that's the biggest issue for a lot of people.
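The compiled-module case is easy to see for yourself; here is a small sketch (math stands in here for any C extension, such as the Qt bindings): introspection can list the names, but there is no Python source to analyze, so docstrings are the only static hint about what anything returns.

    import inspect
    import math  # an extension module: no Python source behind it

    print(dir(math)[:5])                  # the names are discoverable ...

    try:
        inspect.getsource(math.cos)       # ... but there is no source to parse
    except TypeError as err:
        print("no source available:", err)

    # The only static hint left is the docstring, if the author wrote one.
    print(math.cos.__doc__)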
[00:27:24] Unknown:
Yeah. When I was reading through the documentation, I noticed that when you refer to the subject of trying to perform completions on binary modules, it sort of violates your safety guarantees of not actually executing any of the Python code to generate the completions, because you don't have any other way of understanding what the function signatures are that are available from that particular module.
[00:27:41] Unknown:
Exactly. Exactly. There's no guarantee that Jedi doesn't execute code. Jedi doesn't really execute Python code; there's 1 exception for now that we will fix. And at the same time, you will be executing that code yourself most of the time anyway. Still, it's not a good feeling; it doesn't have a good feel coming with it. I don't like that it is that way. A tool should be safe by default.
[00:28:15] Unknown:
And by virtue of the fact that it doesn't need to execute the majority of the code that it's working with to perform its analysis, it seems like it could be a potential benefit for building tools on top of it for doing things like security analysis of a project, to try and find vulnerabilities or even maliciously inserted modules.
[00:28:37] Unknown:
Yeah. Exactly. Exactly. That's something I wouldn't do now.
[00:28:44] Unknown:
And you also briefly alluded to the fact that, because you are actually parsing the code and not just relying on the AST, you're able to use docstrings to infer things about the function signatures of the code in question, as long as they're formatted appropriately. So I'm wondering if you can talk a bit about how that works and some of the mechanisms that you're using to evaluate and understand those docstrings.
[00:29:10] Unknown:
Well, understanding docstrings is actually not that hard. It's the part of Jedi where I have done the least amount of work, because a lot of people actually like that functionality a lot, so they contribute to it. It's kind of the same with PEP 380, or was it 430? I'm not sure; the PEP that introduced type annotations. But coming back to docstrings, it's just checking for certain patterns. It's a little bit of regex, regular expressions. And once you're done with that, you just statically analyze the expression that's there. There's not a lot of magic to evaluating docstrings, I think.
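As a minimal sketch of that pattern matching (these regular expressions are illustrative, not the ones Jedi actually ships), you can pull the type fields out of a reST-style docstring and then hand the extracted expressions to whatever does the static evaluation:

    import re

    DOCSTRING = """Fetch a user record.

    :param user_id: numeric id of the user
    :type user_id: int
    :rtype: dict
    """

    # Patterns for Sphinx/reST style fields; numpydoc or Google style
    # docstrings would each need their own patterns.
    PARAM_TYPE = re.compile(r"^\s*:type\s+(\w+):\s*(.+)$", re.MULTILINE)
    RETURN_TYPE = re.compile(r"^\s*:rtype:\s*(.+)$", re.MULTILINE)

    print(dict(PARAM_TYPE.findall(DOCSTRING)))      # {'user_id': 'int'}
    print(RETURN_TYPE.search(DOCSTRING).group(1))   # dict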
[00:29:59] Unknown:
Yeah. I think it really helps that there are such well established standards for the particular formatting of docstrings within the Python community, which simplifies the job of actually trying to get meaningful information out of them.
[00:30:10] Unknown:
Yeah. I think that is very, very helpful. There are, like, 2 or 3 competing standards, and they are very easy to work with. I also think that adding doc types to your docstrings is kind of a thing of the past, at least in Python 3.5 or so, because adding them as type annotations is just way nicer and way more readable most of the time, and it helps with understanding the code.
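For listeners who haven't made the switch, here is the same information expressed both ways (a generic example, not code from the show); the annotated version survives renames and refactors because tools such as mypy and the editor can actually check it.

    # The older, docstring-only way of recording types:
    def scale_legacy(value, factor):
        """Multiply a measurement by a factor.

        :type value: float
        :type factor: int
        :rtype: float
        """
        return value * factor

    # The Python 3.5+ equivalent: the types live in the signature itself and
    # can be validated by static checkers instead of going stale silently.
    def scale(value: float, factor: int) -> float:
        return value * factor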
[00:30:42] Unknown:
And with the introduction of the typing module in 3.5, it's also more functionally applicable, so that rather than having to go to the docstring itself to try and mentally compile how things are hanging together, you can actually use that module, and other modules built on top of it, to let the computer do that work for you.
[00:30:59] Unknown:
Exactly. And docstrings get very old sometimes. I mean, type annotations can also get old, but at least if you're referencing your own classes and you rename them, you will get actual errors, and that is something very nice, I think. There are tools for that, and even the runtime throws errors there. And with the introduction of Mypy, I think there's a lot of work going in a direction where you can actually validate
[00:31:34] Unknown:
certain parts of your programs. And I like that. And we've talked about some of the different technical difficulties that you have encountered and overcome in the process of building JEDI. But I'm wondering if there are any others that you'd like to highlight.
[00:31:48] Unknown:
There were so many, but I think by far the biggest mistake I made was that I didn't use a real parser in the beginning. That cost me probably a year of work, and by a year, I mean a year of working full time. What I did was just kind of play around with Python, but it wasn't really well programmed. And what I kind of realized, and what I'm now getting better at, is that if you don't do something right, it will bite you at some point. It has really given me a hard time that I didn't have that. Now I'm slowly getting back and cleaning up my code, removing a lot of old stuff that is not really needed anymore, because there's a proper representation of your code and not something fuzzy that you have to deal with, where you have to write a lot of ifs and loops to analyze stuff that is only partially valid. Now you just know: either that code is valid and you analyze it, or you don't. And that is kind of my big learning, I guess. And the biggest technical challenge was removing the old parser and replacing it with a new 1, which was really, really hard. You can imagine that replacing the heart of something like the parser, which is really the lowest component, touches everything. You build on that; the parse tree that you get is where everything ends up. There's not a lot you don't have to touch if you remove the parser, so I basically rewrote the whole thing.
[00:33:41] Unknown:
And are there any projects or libraries that you have seen using JEDI in unusual or unexpected ways?
[00:33:49] Unknown:
Well, I think there are a couple. 1 of the things that I kind of like seeing is that sometimes at conferences, or even by mail, some people write to me that, oh, we kind of used your library, but it's just internal in our company's own editor. And I'm kind of wondering why they're writing their own editor, but anyway, that's kind of funny to me. And I think the second 1 is IPython. Starting with version 6, IPython comes with Jedi. And there, Jedi is not used as the classical static analysis library, but more of a "please also do that sometimes, but analyze the objects that we give you." And so that makes auto completion great in IPython now. And that's very unusual and very new; a lot of people didn't think that was actually possible, but it is.
[00:34:58] Unknown:
Yeah. I've actually got a personal RC file for my Python shell as well that uses Jedi just within the regular REPL, because it does provide so many niceties when you're just trying to do exploratory analysis of a particular code base or just trying to feel your way around a particular problem. It's nice having that easy auto completion, so having it as such an easy addition to any Python terminal is quite useful.
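For anyone who wants the same setup, Jedi ships a readline helper that you can load from the startup file referenced by the PYTHONSTARTUP environment variable; a sketch along those lines (double check the helper's name against the Jedi docs for your version, since APIs move around):

    # ~/.pythonrc.py, referenced by the PYTHONSTARTUP environment variable
    try:
        from jedi.utils import setup_readline
        setup_readline()  # tab completion in the plain Python REPL, via Jedi
    except ImportError:
        # Fall back to the standard library completer if Jedi isn't installed.
        import readline
        import rlcompleter
        readline.parse_and_bind("tab: complete")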
[00:35:23] Unknown:
I think so too. Sometimes it doesn't work, and I'm not really sure why. I'm hoping that will improve a lot now that IPython is using it, because it's an extremely large user base. From the beginning, I've gotten a lot of issues that we can fix now, and a lot of feedback, and that's very nice. There are more contributors coming because they want certain things fixed. And so I really hope that REPL auto completion is improving and is going to get even better.
[00:35:59] Unknown:
And are there any other features or projects that you have planned for the future of JEDI?
[00:36:04] Unknown:
Well, like I said, my road map is virtualenvs: understanding virtualenvs, being able to switch between them, and making that possible. I think that's the first thing I'm going to do. Then refactoring is something that will come to Jedi. That might still take a long time, but it's definitely in the works. And 1 of the things I'm thinking about is actually supporting Django with a plugin architecture, and also maybe its template language.
So, kind of branching out into cross language support. That might be interesting, because I feel like I'm getting closer to actually knowing how I can understand a language in a really fast way, because what I've been doing now is mostly improving my code base and not adding features to it. The last 3, 4 years were not just features; mostly it was refactoring, and not the refactoring feature that might come to Jedi, but refactoring Jedi's code itself. So, yeah, I have a lot of plans and ideas. I've even had the idea of maybe creating a Rust auto completion, but that's kind of far away. And it's also an issue for me because I don't have extreme amounts of time; I have maybe 10 to 20 hours a week, and that's kind of it.
[00:37:56] Unknown:
It's nice to be able to have that amount of time to dedicate to open source and we all appreciate your efforts. It's definitely not always easy to find the time to be able to work on things like that because of other pressures in life.
[00:38:09] Unknown:
Oh, yeah. It is. I'm in a good space: I'm not that old, I don't have a family for now, and I work 80%, not a 100%, so I have a bit more time. But still, it's a challenge. I just like doing it, though. It's just so much more relaxing than working at a company where you have to perform; here, something can just take half a year, and people will be happy that it's out after half a year and not after 2 weeks.
[00:38:44] Unknown:
Yeah. That's definitely great being able to have side projects like that because it does give you more room for creativity without necessarily having to work through the pressures of whatever the external requirements are, and it lets you have a lot more self direction.
[00:38:58] Unknown:
Oh, yeah. And the biggest problem is that it's financially just unsustainable. That's still, I think, my biggest issue. You can't really make money with it, especially if you want to make it perfect, which is good, basically, but still.
[00:39:17] Unknown:
So are there any other topics or questions that you think we should cover before we start to close out the show?
[00:39:23] Unknown:
No, not really. I would recommend that people watch out for Parso. Parso is coming in, like, a month or 2. It's a Python parser, and for everybody that is working with static analysis stuff, it might be really interesting. But, yeah, for me, that's it.
[00:39:43] Unknown:
So for anybody who wants to get in touch with you and follow the work that you're up to, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us to the picks. And this week, I'm going to choose the UNIX patch utility, which lets you download a patch file, for instance, from a GitHub pull request or from a diff and then be able to apply that to another code base because there are times when just relying on your version control system to be able to provide those merges isn't necessarily practical.
[00:40:14] Unknown:
And being able to have that utility to fall back on is quite useful. In fact, I ended up using it just yesterday. So with that, I'll pass it to you. Do you have any picks for us today? I'll start off with a band called Bear Sten. Really nice band; I like listening to them. They're awesome. And then there's all the creative stuff I mentioned in the beginning that I like doing. In the last few months, I think it was soccer, singing, and dancing. That's really awesome. And then for technical stuff, I have 1 thing called Docopt.
Docopt is an alternative to argparse and those kinds of things, where it's just very easy to build command line interfaces. It just makes your life easier: you just write documentation and everything's working, basically. It's a really small library, but really good as well. And the other technical thing I have, which I just thought I'd mention because nobody ever mentions it, is OpenStack. We're working with it at our company, and I actually like how clean the code base is, for how large it is. It's a few million lines of code, I think, and it's just very, very readable. I really like that, because most of the open source code I've read is not. So, yeah, I think if you're ever planning on using an open source cloud solution, OpenStack is really in a good place.
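Since docopt came up as a pick, this is the usual shape of a docopt program (a generic example, not something from the show): the usage text in the module docstring is the argument parser.

    """Greet somebody from the command line.

    Usage:
      greet.py <name> [--shout]
      greet.py (-h | --help)

    Options:
      -h --help   Show this help.
      --shout     Print the greeting in upper case.
    """
    from docopt import docopt

    if __name__ == "__main__":
        args = docopt(__doc__)  # parses sys.argv against the usage text above
        greeting = f"Hello, {args['<name>']}!"
        print(greeting.upper() if args["--shout"] else greeting)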
[00:41:53] Unknown:
Yeah. It's definitely a very sizable and very ambitious project, but I've used it a few times and it's definitely quite nice. Oh, yeah. Alright. Well, I appreciate you taking the time out of your day to join me and talk about the Jedi project and dig into its past and its internals. It's definitely 1 that I use pretty regularly and 1 that I'm happy to continue using. So I appreciate your work on that, and I appreciate your time. Well, thank you a lot for this interview. It was really nice talking to you.
Introduction and Guest Introduction
David Halter's Background and Work
Introduction to Python and JEDI
JEDI Project Origins and Naming
Linting Capabilities and Comparisons
Custom Parser and Its Benefits
Internal Architecture of JEDI
Challenges in Code Analysis and Completions
Future Plans for JEDI
Closing Remarks and Picks