Summary
Creating well designed software is largely a problem of context and understanding. The majority of programming environments rely on documentation, tests, and code being logically separated despite being contextually linked. In order to weave all of these concerns together there have been many efforts to create a literate programming environment. In this episode Jeremy Howard of fast.ai fame and Hamel Husain of GitHub share the work they have done on nbdev. They explain how it allows you to weave together documentation, code, and tests in the same context so that it is more natural to explore and build understanding when working on a project. It is built on top of the Jupyter environment, allowing you to take advantage of the other great elements of that ecosystem, and it provides a number of excellent out of the box features to reduce the friction in adopting good project hygiene, including continuous integration and well designed documentation sites. Regardless of whether you have been programming for 5 days, 5 years, or 5 decades you should take a look at nbdev to experience a different way of looking at your code.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Jeremy Howard and Hamel Husain about nbdev, a library for turning Jupyter notebooks into Python libraries.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what nbdev is and the goals of the project?
- What is the story behind how and why it got started?
- Who is the target audience for the nbdev project?
- How does that focus influence the features and design of nbdev?
- What do you see as the primary challenges of building and collaborating on projects written in notebooks?
- What are some of the other projects that are working to simplify or improve the experience of using notebooks?
- How does nbdev compare to or complement those other tools?
- Can you describe how nbdev is implemented?
- How has the design and goals of the project evolved since it was first started?
- What is the workflow of someone who is using nbdev?
- At what point in the lifecycle of a notebook oriented project should someone start integrating nbdev?
- How does nbdev scale when working on a project that spans multiple notebooks/modules?
- How does working in a notebook environment change your approach to software development and project design?
- What are the most interesting, innovative, or unexpected ways that you have seen nbdev used?
- What are the most interesting, unexpected, or challenging lessons that you have learned from working on nbdev?
- When is nbdev the wrong choice?
- What do you have planned for the future of the project?
Keep In Touch
- Jeremy
- @jeremyphoward on Twitter
- jph00 on GitHub
- Hamel
- hamelsmu on GitHub
- Website
- @HamelHusain on Twitter
Picks
- Tobias
- Jeremy
- Hamel
- Moonwalking With Einstein by Joshua Foer (affiliate link)
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- nbdev
- fast.ai
- GitHub
- Perl
- Fastmail
- R Studio
- R Markdown
- Literate Programming
- fastcore
- JupyterLab
- nteract
- Jupyter Voilà
- GitHub Actions
- Sphinx
- Google Colab
- Working In Public by Nadia Eghbal (affiliate link)
- Jekyll
- Hugo
- Cython
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
[00:00:56] Unknown:
Your host as usual is Tobias Macey. And today, I'm interviewing Jeremy Howard and Hamel Husain about nbdev, a library for turning Jupyter Notebooks into Python libraries. So, Jeremy, can you start by introducing yourself?
[00:01:08] Unknown:
Sure. I'm Jeremy Howard. I'm a founding researcher at fast.ai.
[00:01:13] Unknown:
And Hamel, how about you?
[00:01:15] Unknown:
I'm Hamel Husain. I'm a staff machine learning engineer at GitHub. I spend a lot of my time working on fast.ai with Jeremy.
[00:01:23] Unknown:
Going back to you, Jeremy, do you remember how you first got introduced to Python?
[00:01:27] Unknown:
I was largely a Perl programmer. I started a company called Fastmail, which is an email provider, and I used Perl for that. And I remember when Python started getting popular, I was not particularly interested in it because I thought that Perl was really great. But as I did more and more stuff with machine learning, Python became a bigger and bigger part of my life, and now it's what I spend most of my time doing. And, Hamel, do you remember how you first got introduced to Python?
[00:01:55] Unknown:
I just started using Python. I was kind of a data analyst at some point, and I wanted to automate some things. And at that time, I started actually with R and, you know, I wanted another programming language, and Python seemed like a good one at that time. And so I just kind of naturally drifted into that.
[00:02:14] Unknown:
R has got a lot of cool libraries, but I never liked it as a language. I much prefer working in Python as a language. But there are a lot of libraries from R that I miss.
[00:02:24] Unknown:
Yeah. The R ecosystem is definitely pretty attractive, and there's definitely a lot of stuff from there that has inspired things in Python because people working in Python wanted to be able to have all the nice tools that the R folks did. But I'll agree that the language, coming from somebody who's worked primarily in Python, is definitely a bit foreign.
[00:02:41] Unknown:
R libraries just tend to be more elegantly put together, though, I find. You know, Python libraries have a tendency to get the job done, but they tend to have more clunky APIs, whereas the R ecosystem, you know, that community seems to really care about the developer experience a lot, which I really like.
[00:02:59] Unknown:
There's one thing I really miss about R, which is relevant to this conversation: the development environment that I used when I was working with RStudio and R Markdown, you know, where you could write prose and code in the same sort of context. I thought that was really nice, and I sort of missed that when I went to Python, you know, until I discovered nbdev, which we'll talk about more.
[00:03:22] Unknown:
So going into nbdev, can you give a bit of an overview about what the project is and the overall goals of it?
[00:03:29] Unknown:
nbdev comes from, you know, my decades of enthusiasm for literate programming and exploratory programming and never quite finding the right tool for the job. The closest I got was Mathematica, which I always really enjoyed working in, but I always found it nearly impossible to deploy and very difficult to get good performance from. But the actual idea of being able to mix any kind of outputs you want, whether they be animations or hypertext or whatever, along with code and hierarchically structured documents, and have your coding and documentation all in one place, I always found that just quite brilliant.
So, obviously, when Jupyter Notebooks came along, I got very excited about that, but I had the same problem, which is, you know, how do you deploy these things? Like, it's a great scientific journal kind of environment, but I was trying to build artifacts that other people could use easily. So nbdev is something which brings the worlds of software development, Python libraries, and notebooks together so that you can use notebooks. A single notebook will create your tests, your documentation, and your actual Python module all in one place. And I just love working with that. I find it dramatically more enjoyable and more productive.
[00:05:01] Unknown:
In terms of some of the storyline behind it and how you got it off the ground and how it got to where it is today, I'm wondering if there are any interesting anecdotes that you can share.
[00:05:11] Unknown:
Well, it all really came out of the development of the fastai library, which is one of the most popular libraries for doing deep learning. And, originally, the fastai library kind of came out of creating courses. So fast.ai has, I think, pretty much the world's most popular courses for learning deep learning, and they're all done in Jupyter Notebooks. It's such a great way to teach and such a great way to learn. And we were also doing a lot of research in Jupyter Notebooks, finding better ways to train better models. And often, the course would include a whole lot of stuff about, like, oh, here's some research we just did. Let's learn about the research together and see how it's done and understand the motivations behind it and so forth.
So we would very often be creating new algorithms or implementing algorithms that were in papers but didn't have code. And so we really needed to find ways to make that code available easily to everybody. So that was really where it started, taking fast.ai research and educational materials and turning them into libraries. But, really, the long term goal for fast.ai is to make it so that anybody can do deep learning without needing to do much education. So the fastai library has kind of become a focal point of that work.
And so it's just been a very natural progression of using notebooks to do research and to build educational materials and to build libraries. And it's been really wonderful to see how many other people have found that the same approach to development works for them, and nbdev is now getting really popular, which is great to see.
[00:07:03] Unknown:
And I know that there are a lot of different types of people, and there are a number of different verticals and industries where people are using notebooks. I'm wondering if you can talk about who the target audience is for the nbdev project and how that influences its features and design.
[00:07:25] Unknown:
I think nbdev works well for really any kind of project that you want to do. I mean, it's not, you know, limited to data science projects at all. In fact, we've been using nbdev for a number of different types of projects, including various utilities, DevOps tools, APIs, API clients, and lots of other things. And so it's really a general software engineering tool. I know that some people ask, you know, when might you not want to use nbdev, or when might that be challenging? You know, I've tried to introduce nbdev to a lot of people and into a lot of projects, and it is a new way of developing software. And so you have to kind of look at what your colleagues are using and what your colleagues are willing to use. And you have to assess whether or not it's worth it to transition a project to nbdev, or whether your colleagues will be willing to give it a go and write software in nbdev. And you kind of have to have an open mind to try this new type of software environment.
So that's the main consideration I see in deciding, you know, when to use nbdev for a project.
[00:08:35] Unknown:
One nice thing about nbdev is you don't have to set out deciding to use it. You know, you can just start hacking something together in a notebook, which is often what I want to do just to explore a new API or to explore an idea or explore an algorithm. Maybe you don't even have a sense yet that that exploration is going to lead anywhere useful. And then I find that once I get to a point where I think, oh, this is actually turning out quite nicely, it's very easy to then kind of nbdev-ify that notebook.
You just add, you know, like, one comment to each cell that you actually want to export. And the nice thing is, for anybody who cares about code productivity and code quality or project quality, you end up very quickly in a very nice place, because once you decide, okay, I do want to make this something that other people can use, if you've used notebooks, then you nbdev-ify it. You now have a really nice, high quality documentation site for free. You have parallelized tests for free. You have a pip and a conda installer for free. You have a README generated for you for free. So all those kinds of things that make a project complete, you know, and helpful to developers and reliable and maintainable, they're all done for you automatically.
Before I was using nbdev, all of that just seemed like a huge learning curve with lots of things to maintain, and I'd have, like, 10 different places where I had the same information and dozens of different tools trying to work together. So it makes it really easy to go from an experiment you're hacking around with to a really high quality, complete library.
[00:10:24] Unknown:
Yeah. It's definitely easy to, like you said, start off with something. And then before you know it, realize that you have something that's actually full blown and you want to be able to use it in more places. And I'm wondering if you can talk to the primary challenges that you see of using a notebook itself as a means of building and collaborating on projects, particularly with other people, because notebooks are definitely useful for exploratory programming, as you said, and they can be very useful for sharing the results of your work from a documentation and display perspective. But in terms of the collaborative or team oriented aspects of it, I know that there are some shortcomings. I'm wondering if you could just talk to the challenges that you've come across in that regard.
[00:11:09] Unknown:
If you're not using nbdev and you're just using notebooks, there are a lot of shortcomings. Take something very simple, which is that notebooks don't play nicely with version control out of the box. So you end up with the conflict markers that Git adds to the file, which make it not JSON anymore, which means Jupyter can't read it anymore, and that's going to be a real mess. You end up with a lot of conflicts because metadata can change in the cells, which creates dozens of conflicts even in cells you haven't changed. So nbdev has its own diffing and merging tool, which actually ends up really nice because it does it at a cell level. It knows to ignore metadata.
It ignores differences in outputs. So we actually end up with quite a nice Git integration. One of the things I really like about working with notebooks is there's a really nice web based tool you can use, ReviewNB, which all your code reviews and PRs can go through. And when I'm reviewing a PR, it's really nice, because I'm not just looking at source code. I'm also looking at the documentation, the outputs, the hypertext, you know, in a web page. So I can see, has somebody made a PR that has reduced the clarity of the image augmentations in the fastai library, for example. Normally with a plain diff, I'd never be able to see that. I'd only see the code. But when you're working with notebooks in this way, suddenly, it becomes really, really nice that you actually get to see how it changed the outputs.
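The cell-level comparison described here can be pictured with a small sketch. This is not nbdev's actual merge tool; it is an illustration that assumes a notebook is a plain dict in Jupyter's JSON layout, and the helper names `normalize` and `changed_cells` are hypothetical.

```python
import copy

def normalize(nb):
    "Return a copy of notebook dict `nb` without outputs, counts, or metadata."
    nb = copy.deepcopy(nb)
    nb["metadata"] = {}
    for cell in nb["cells"]:
        cell["metadata"] = {}
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

def changed_cells(old, new):
    "Indices of cells that still differ after noise is removed (same cell count assumed)."
    pairs = zip(normalize(old)["cells"], normalize(new)["cells"])
    return [i for i, (a, b) in enumerate(pairs) if a != b]

# Invented example data: two code cells.
old = {"metadata": {}, "cells": [
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": "x = 1"},
    {"cell_type": "code", "metadata": {}, "execution_count": 2,
     "outputs": [], "source": "print(x)"},
]}
new = copy.deepcopy(old)
new["cells"][0]["execution_count"] = 7             # noise: the cell was re-run
new["cells"][0]["metadata"] = {"collapsed": True}  # noise: UI state changed
new["cells"][1]["source"] = "print(x + 1)"         # a real edit

print(changed_cells(old, new))
```

Only the second cell is reported, because the first differs only in execution count and metadata, which is the kind of change that produces spurious conflicts in a line-based diff.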
So that'd be one example. Another example of a challenge is simply that code that is in one notebook can't be imported into and used in another notebook. So, again, for collaboration, that's a nightmare, and not just for collaboration, but for yourself. You know, you kind of end up putting everything into one notebook or copying and pasting. So, again, with nbdev, that all gets handled for you. Notebooks get turned into libraries so that you can import code from one into another just like a plain Python library. Another problem with collaboration is viewing notebooks.
There are some quite nice notebook viewers on the web, and GitHub has a basic notebook viewer, but they're not nearly as nice to work with as properly indexed documentation with proper hyperlinks and tables of contents and search and so forth. So, again, nbdev will add that for you. So, yeah, all the limitations of working with notebooks, of which there are many, suddenly actually become features when you add nbdev on top of it.
[00:13:48] Unknown:
I just want to add to this. Your question was, you know, what challenges does nbdev present to collaboration? And there's a little bit of fixed cost for a contributor to learn nbdev. But in my experience, once people do that little bit of learning about nbdev, collaboration actually becomes a lot easier because nbdev promotes a very nice workflow for software engineering and promotes best practices. So nbdev really encourages you to write documentation and tests because you do it in the same context. You write your code, your documentation, and your prose and tests all together.
And so when someone is trying to contribute to your project, and I've experienced this many times at work, you know, that person is forced to explain the code that they're adding. And oftentimes in that process, we realize, hey, we're not able to really explain that code, or that code is too complicated. It ends up being naturally refactored because you're writing docs and tests at the same time in the same context. And you're really treating documentation as a first class citizen and writing code so that it can be presented to other people and understood. And so I found that that really helps with collaboration. It kind of naturally works out. I find I'm doing less back and forth with people.
[00:15:16] Unknown:
Yeah. I mean, that's a good point. I find as an open source maintainer, the PRs I receive are higher quality in nbdev projects because when somebody's adding code, they're in the middle of the tests and the documentation. So, you know, it's pretty rare for somebody to misunderstand the context of why their code is there because they're literally in the middle of documentation about it as they write their code. It's pretty rare that they wouldn't have tests because, again, they're adding code in amongst all the tests. So, yeah, I do find I get higher quality PRs with nbdev projects.
[00:15:49] Unknown:
When I first started out with nbdev, I jumped into this project called fastcore. It's a fairly advanced Python library built by Jeremy. And I thought there's no way I'm going to understand this. This is, like, basically magic. But because nbdev allowed me to read the documentation and code together and play with it in a very nice interactive environment, I was able to catch on really fast, much faster than with any other project of similar complexity.
[00:16:22] Unknown:
And the nice thing is those explorations that you did, Hamel, became part of the documentation. Because quite a lot of those explorations, you made part of a PR to say, like, here's how this thing works. So I thought that was kind of cool, that your exploring in the notebook became explorations that other people could then learn from.
[00:16:37] Unknown:
Yeah. Definitely. It was really gratifying, you know, like, the learning also paid off. Anytime I would read code, I would say, hey, let me just add a little bit to the documentation here. Let me add another test if it's not clear. So that's what really got me hooked. I really saw the power of nbdev. Because what frustrates me as a user when I use any Python library is lack of documentation. I think documentation is really underrated. And so that's something that, you know, nbdev really promotes.
It allows you to just write it in a very natural way.
[00:17:13] Unknown:
There are a number of other projects that work to complement the overall ecosystem of working with Jupyter Notebooks. You know, there's the JupyterLab project to make it a little bit more like an IDE. There are a number of different plugins to Jupyter itself. And then there's also the overall ecosystem of other notebook environments beyond just Jupyter. And I'm wondering if you can just talk to how nbdev compares to or complements some of those other tools either within or outside of the Jupyter ecosystem.
[00:17:42] Unknown:
Yeah. JupyterLab is an exciting development of Jupyter Notebooks. The most recent version, version 3, that just came out a week or two ago, includes an integrated graphical debugger, which is a really cool step. The nice thing is that nbdev works fine with whatever Jupyter Notebooks host or Jupyter Notebooks server you're using. So nbdev works just as well with the classic notebook as with nteract, as with JupyterLab, or whatever you prefer. So it's great to see how the notebook community is rapidly iterating and improving. You know, other cool stuff happening in the notebooks world includes stuff like Voilà. Voilà is a system that lets you create graphical web applications entirely in Jupyter.
And JupyterLab even now has a beta version of a drag and drop GUI builder that will create a Voilà app from a notebook for you. And, again, all this stuff integrates really well with nbdev because once you've got things working the way you want, nbdev will then let you turn that into a library that anybody can pip install or conda install, with continuous integration and tests and documentation.
[00:18:58] Unknown:
Digging a bit more into nbdev itself, can you talk to how it's implemented and the feature set that it provides, and how the overall design and goals of it have evolved since you first began working on it?
[00:19:09] Unknown:
There are a lot of features in nbdev. Something that Jeremy just mentioned is continuous integration, which is really exciting. A lot of people don't really understand continuous integration or find it very difficult. I mean, certainly, when I first learned about continuous integration, I thought it was pretty difficult to get my hands around. And so nbdev sets up CI for you out of the box without any intervention from the user. nbdev allows you to write tests in notebooks in a very natural way. You don't have to learn a special API. For example, you don't have to learn pytest. You can just write tests with assert statements.
The nbdev machinery will find those and execute them as tests automatically, and it'll also run them in CI. So when you write your code and you push it, let's say, to GitHub, it will run in GitHub Actions for you and execute those tests and let you know whether or not all your tests are passing. So that's pretty advanced, you know, production level best practice stuff that gets done for you automatically.
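As a rough illustration of the plain-assert testing style described here, two notebook cells might look like the following. The function `slugify` is a made-up example and is not part of nbdev.

```python
# Cell 1: the code under test (`slugify` is a hypothetical example function).
def slugify(title):
    "Lowercase `title` and join its words with hyphens."
    return "-".join(title.lower().split())

# Cell 2: tests are ordinary assert statements, with no test framework to learn.
assert slugify("Hello World") == "hello-world"
assert slugify("  nbdev   rocks ") == "nbdev-rocks"
assert slugify("") == ""
```

If any assert fails, the cell raises an AssertionError, which is all a runner needs in order to treat that cell as a failing test.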
[00:20:22] Unknown:
And to get to that point, you literally just type one command. There are various command line tools installed with nbdev, and one of them is nbdev_new. And that will create a project for you. And one of the things that's created in that project is a GitHub Actions continuous integration runner. Now if you don't use GitHub, you use something else, you would obviously need to modify that a little bit to work with your CI, but it's pretty straightforward to do that. And then you'll see that as soon as you push, you'll actually get an email saying, oh, your continuous integration is currently failing. So it's actually set up so that it shows you how to write and pass your first test. So, out of the box, you're actually being told about the fact that continuous integration is there. It's set up for you, and it shows you how to get your first test passing.
[00:21:14] Unknown:
Another really central feature of nbdev, perhaps one of the most central ones, is how the documentation gets built. So you don't have to know anything about HTML, CSS, web hosting, anything like that. You don't have to know Sphinx. I don't know Sphinx myself. You don't have to learn any kind of special presentational API. Notebooks get rendered into documentation for you and get hosted for you on GitHub Pages. So, you know, you don't really have to do anything. And the documentation has a lot of nice touches that are added in for you automatically. So one of my favorite features of the documentation is if you surround the name of a module in backticks, either from your library or the Python standard library or other things, nbdev will automatically introspect that, find the link to the source code, and create a link for it. And not just modules, but also functions and also classes, pretty much any kind of symbol. Yeah. Definitely. And, you know, it'll create a table of contents.
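To make the backtick behaviour concrete, here is a minimal sketch of the general idea. It is not how nbdev does it: nbdev is described as introspecting symbols to find their source, whereas this sketch swaps in a hand-written lookup table. The table `DOC_LINKS`, the function `link_backticks`, and the example.com URL are all hypothetical.

```python
import re

# Hypothetical lookup table from symbol name to documentation URL.
DOC_LINKS = {
    "export_cells": "https://example.com/docs/export#export_cells",
    "json.loads": "https://docs.python.org/3/library/json.html#json.loads",
}

def link_backticks(text, links=DOC_LINKS):
    "Replace `name` with a Markdown link when `name` is a known symbol."
    def repl(match):
        name = match.group(1)
        url = links.get(name)
        # Unknown names are left exactly as they were written.
        return f"[`{name}`]({url})" if url else match.group(0)
    return re.sub(r"`([^`]+)`", repl, text)

print(link_backticks("Use `json.loads` first, then `unknown_thing`."))
```

The known name becomes a link while the unknown one stays as plain backticked text, so prose written in a notebook cell does not need any special link syntax.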
It will automatically kind of expand documentation for you if you have, you know, docstrings. It's very robust, so you can hide cells, show cells, hide output, show output. You can have collapsible cells. So it's really easy to use. It's very customizable. That's another feature that is super exciting for me.
[00:22:45] Unknown:
All of these things happen from these simple command line tools I mentioned. So one of the nice things about this is, you know, you can work in whatever environment you like, because they're just tools that you run at the terminal. You can integrate them into any scripts or processes or whatever, and they'll integrate well with any other extensions that you're using and so forth. So a big part of the design of nbdev has been to ensure that it's very flexible and doesn't lock you into any particular details about the tools that you're using other than that you're writing stuff in notebooks.
[00:23:24] Unknown:
And in terms of the workflow of somebody who's using nbdev within a notebook environment to build a project, can you just talk through some of the steps involved? I know you mentioned the commenting on certain cells and how you're able to mark them as being used for particular purposes, whether it's the code or the documentation or tests, et cetera, hiding and showing. And, also, for somebody who is working in Jupyter, at what point should they start thinking about whether they want to bring nbdev in, and what is the overall experience of building a project with it?
[00:23:57] Unknown:
When I start a new project, I always start by typing nbdev_new, regardless of whether I actually think this is going to end up being something that I export into a library and documentation with nbdev or not, just because that's going to create the, you know, basic structure that I need regardless. And there are certain nice little things that are going to be created there. Like, if I type make release, it'll upload things to PyPI and Anaconda for me. If I build a library, it'll create a README for me. So I can get a bunch of nice functionality even if I don't actually need nbdev for that much stuff for a particular project. So I'll start by typing nbdev_new.
Pretty much anything I do, regardless of whether I'm creating a server or a command line application or a, you know, model training library for deep learning or whatever, I'll start in a notebook because a notebook is basically a REPL. But it's a REPL that is highly flexible and is not just text and is not just line oriented. So it's this incredibly flexible, powerful REPL. And so then I'll generally start exploring. You know, I very rarely know exactly what I want to build and exactly how to build it. I'll often have to learn about some API I haven't tried before or try to implement an algorithm or whatever. So I'll start exploring.
And often, just to help myself explore, I'll write little bits of Markdown prose here and there. So, for example, recently I played with the GitHub API. It has a fairly new OpenAPI specification. And I've never used an OpenAPI specification directly before, so I started just, like, loading in the JSON, finding out what keys were in it, and so forth. And as I did that, I was just adding little bits and pieces of Markdown to explain to myself as I went along what it was that I was doing. And, yeah, at some point, I thought, oh, okay. Those steps I just did look like a pretty good way to, you know, pull the list of methods out of an OpenAPI specification.
So I merge them into a cell, create a function, and then at the top of that cell, I write #export. And so that now is going to be the first thing in my library. And then the Markdown that's around that, along with the docstring and the signature, will become the documentation for it. So I can just gradually build out from there.
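The step from flagged cells to library code can be sketched as follows. This is a simplified illustration under stated assumptions, not nbdev's exporter: the notebook is an in-memory dict in Jupyter's JSON layout, each cell's source is a single string (real notebook files may store a list of lines), and `export_cells` and `method_names` are hypothetical names.

```python
# A tiny in-memory notebook; the contents are invented for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "Pull method names out of a spec."},
        {"cell_type": "code",
         "source": "#export\ndef method_names(spec):\n    return sorted(spec)"},
        # An exploratory cell with no flag: it stays in the notebook only.
        {"cell_type": "code", "source": "method_names({'b': 1, 'a': 2})"},
    ]
}

def export_cells(nb):
    "Return module source built from code cells whose first line is `#export`."
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        first_line, _, body = cell["source"].partition("\n")
        if first_line.strip() == "#export":
            chunks.append(body)
    return "\n\n".join(chunks) + "\n"

module_source = export_cells(notebook)
print(module_source)
```

Only the flagged cell ends up in the module text, while the Markdown and the exploratory call remain in the notebook, where they can serve as documentation and examples.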
[00:26:32] Unknown:
I'm interested in understanding the scalability of this solution as you work on projects that grow to span multiple notebooks and modules.
[00:26:46] Unknown:
You don't have to have everything export to the same module. A notebook is very customizable. You can have different cells export to different modules, but, you know, you can also have a whole notebook export to one module. So it's not that different from writing code in a text editor with regard to organizing that code. You know, oftentimes, we'll have almost a one to one mapping between notebooks and Python files. So it scales pretty well. There are no issues that I can see where scaling per se is a concern.
[00:27:25] Unknown:
I mean, the fastai library, for instance, is a pretty big and complex library with many dozens of modules. But, yeah, as Hamel said, most of the time, a notebook just maps to a module. It doesn't really look any different from any other kind of Python library you would build.
[00:27:45] Unknown:
Another aspect of working with notebooks is the ability to do out of order execution where, particularly if you're exploring, you start with cell 1, and then you get down to cell 15, and then decide, oh, I need this value back in cell 4. And so you might go bouncing between various cells in, you know, a semi random order, and then you want to be able to ensure that everything actually works from top to bottom. And I'm just wondering what that looks like in terms of your workflow when using nbdev to build an exportable module and just ensuring that you aren't confusing the functionality of the code as it is displayed with the inherent internal state that's built up over the course of working within that notebook?
[00:28:28] Unknown:
The ability to bounce around and manipulate the state in a notebook is kind of a much misunderstood feature of the environment, which is actually critical to all kinds of explorations. So, for example, in deep learning, often, it's gonna take a few hours to train a model, and you don't wanna, like, have that few hours retrained every time you modify a cell. You know? So the ability to have state and manipulate it is critical. Or if you've downloaded, you know, some big JSON data structure, you don't wanna be having to deal with, like, figuring out what things to serialize and then load back and find some way to optimize things so that you can work interactively.
It's just like using your shell, whether it be bash or zsh or whatever: your shell is stateful. You know, your file system is stateful. You create files, delete files, move files, and depending on the order of things, you know, it's not fully reproducible unless you rerun those commands in the same order. So a notebook's really like that. Now as you say, once you've done that, since you're gonna want to turn this into a library, into a module or a bunch of modules that other people can run, they are gonna run it from top to bottom. So both Jupyter and nbdev have things to make this convenient.
Both the continuous integration and the integrated interactive tests that can happen at your terminal with nbdev run things from top to bottom, and they run every cell from top to bottom. So that will let you know if anything's not working. And then Jupyter itself lets you run every cell from top to bottom starting out with a clean state. And unfortunately, out of the box, it doesn't come with a key binding. So it's one of the first things I do when I set up a new machine, though nowadays that's all automated. But I always tell my students: put a key binding on the Restart and Run All command in Jupyter, because that's something that you wanna be running from time to time just to double check that everything's working smoothly.
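Editor's sketch: why a clean top-to-bottom run catches problems that out-of-order interactive work can hide. This is a minimal stand-in using plain `exec`, not how Jupyter or nbdev actually execute notebooks; `run_top_to_bottom` is a hypothetical name.

```python
def run_top_to_bottom(code_cells):
    """Execute cell sources in order in a fresh namespace.

    Returns None if every cell runs, otherwise (index, exception) for the
    first cell that fails. The empty dict plays the role of a restarted kernel.
    """
    namespace = {}
    for index, source in enumerate(code_cells):
        try:
            exec(source, namespace)
        except Exception as exc:
            return index, exc
    return None


# Authored out of order: the first cell relies on a name defined in the second.
# Interactively this can appear to work because the kernel still holds `values`.
out_of_order = ["total = sum(values)", "values = [1, 2, 3]"]
in_order = ["values = [1, 2, 3]", "total = sum(values)"]

failure = run_top_to_bottom(out_of_order)
success = run_top_to_bottom(in_order)
```

The out-of-order notebook fails at its first cell with a `NameError`, while the reordered one runs cleanly, which is the kind of check a restart-and-run-all or a CI run provides.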
[00:30:36] Unknown:
We've talked a bit about this as far as the experience and the change in perspective that comes from working in a notebook and using this literate environment and how that influences your approach to software engineering. But I'm wondering if you can just talk through some of the more detailed aspects of how you change your approach to writing software if you're in a text editor such as VS Code or Emacs or Vim versus working in Jupyter and just how that changes the way you think about the project design and the approach to building the software.
[00:31:14] Unknown:
When developing software in notebooks like this, one thing that has changed for me is compared to a text editor where you might have a bunch of code and, you know, you have various functions, and those functions may have entry points. It's unclear, like, what the entry point to that function is or what code path leads to that function. So debugging can sometimes be a little bit complicated. But when you develop code in notebooks, along with the documentation, you're creating a playground where you want to show everybody what is the entry point to that function, how to execute it, what are the dependencies. You know, you kinda create this environment with the minimal dependencies required to execute that function or method, and that is really powerful.
And you want to also be able to do that to specify your test in a convenient way. That is one thing. Another thing is I try to simplify my code a lot. Because when you're writing documentation, if something is trying to do too many things, you know, that can be really painful for you while you're, you know, trying to explain it. So it really forces you to write better code.
[00:32:24] Unknown:
One of the things, actually: Hamel was talking about kind of having this playground to explore. There's a feature in nbdev, an optional feature you can turn on in the configuration, that will automatically add a launch in Colab button at the top of every page of the documentation. So Colab is a free online Jupyter environment. And so this means that you can literally click a button or your users can click a button in your documentation, and instantly, that documentation has been converted from something you read to something you interact with. And that's really great because I love working with other people's nbdev libraries because I can click that button, and then I can start actually experimenting with the examples they have in their documentation.
I mean, overall, you know, I've been coding for, gosh, many, many decades, and I find working in Jupyter Notebooks and nbdev, I am some multiples more productive than I am using VS Code or Visual Studio or Vim or, you know, other things. I've used a lot of different environments. And I hear this a lot from other people as well. Quite a few people, you know, come on to our Discord chat and say, my workplace, you know, has not standardized on nbdev, and I have to use something else. And, literally, we hear people sharing stories of which companies let you use nbdev, and people are, like, talking about quitting their jobs in order to go to another job where they can use nbdev. That's, like, the level of love that people have for using this and frustration they have when they can't.
[00:34:07] Unknown:
I think it's really counterintuitive to people that there can be a much better development environment and way to develop software because those tools haven't changed for so long. And when you say that to someone, it's almost like a disbelief. Like, what are you talking about? They look at you like you're a quack. But it's only when you try it that these things become apparent to you and you realize, hey, I am a lot more productive. My code is more maintainable, and I'm spending a lot less time toiling away on these, you know, tasks I don't care about. And so, yeah, I think that's what we're seeing.
[00:34:41] Unknown:
And for somebody who has an existing project that has been written in just the quote unquote standard fashion of just flat files that they're organizing into a hierarchical structure, what is the process of converting that to use nbdev and moving from the previous approach of my documentation lives here, my code lives over here, my tests are in a different place, and merging them all back together in a more natural form?
[00:35:10] Unknown:
There are tools out there which will help do that for you. It's important to remember that a notebook is just a JSON file, and each cell basically is part of a JSON array, and so then there's a dictionary with one attribute that says whether it's a code cell or a markdown cell and one attribute saying what the contents are. So it's actually trivially easy to turn a Python module source code file, you know, into a JSON file, splitting each function or class into a cell. And so there are tools that'll do that for you, but that's only the first part of the process, because to actually take advantage of this properly, you really wanna be thinking about the flow of that notebook, because somebody reading it is not just reading it. Hopefully, they're interacting with it. So I would kind of start with some automated tool to create a notebook that basically does the job from the source code of the module.
And then I'd start, you know, looking at my tests and thinking, okay. Well, which of these are really quite descriptive of what this module's really doing? Can I turn those into kind of documentation tests? You know? And then what things in the documentation can I kind of integrate with those and, you know, just gradually bring it together one piece at a time? You don't have to do it all at once.
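Editor's sketch: the "a notebook is just a JSON file" point, shown as a rough module-to-notebook conversion using only the standard library. It makes one code cell per top-level statement (imports, functions, classes) and drops standalone comments between them; the function name `module_to_notebook` and the sample source are invented for the example, and real conversion tools will differ.

```python
import ast
import json


def module_to_notebook(source):
    """Return a notebook dict with one code cell per top-level statement."""
    tree = ast.parse(source)
    lines = source.splitlines()
    cells = []
    for node in tree.body:
        # Decorators sit above `node.lineno`, so start from the earliest one.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        text = "\n".join(lines[start - 1 : node.end_lineno])
        cells.append(
            {
                "cell_type": "code",  # "markdown" cells would carry prose instead
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text.splitlines(keepends=True),
            }
        )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


source = (
    "import math\n"
    "\n"
    "def area(r):\n"
    "    return math.pi * r ** 2\n"
    "\n"
    "class Circle:\n"
    "    def __init__(self, r):\n"
    "        self.r = r\n"
)
nb = module_to_notebook(source)
nb_json = json.dumps(nb, indent=1)  # this string is what an .ipynb file contains
```

As noted in the conversation, this mechanical split is only a starting point; the markdown, examples, and tests still have to be woven in by hand.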
[00:36:36] Unknown:
And another interesting point is how nbdev integrates with the rest of the Python ecosystem. I'm thinking in particular about things like dependency management, whether you wanna use pip or Poetry or pip-tools, and how it fits with things like linting and just the overall integration points that are available for using nbdev for actually building the project, but also taking advantage of the rest of the developer tooling that exists for people using Python and building Python projects?
[00:37:11] Unknown:
The integration is pretty good with, you mentioned pip, for example. So nbdev automatically generates standard setuptools packages. One of the nice things about it is that you have a single configuration file that your version number and description and so forth are in. So for something like your PyPI package, when it's uploaded, that'll all be used. It'll automatically use your index.ipynb notebook file to create the description that will appear on PyPI. You know, things like Poetry and stuff are not particularly either here or there. You can use whatever environment you like.
Most of the developers of nbdev generally use conda environments, but you can do whatever you like there. For linting, that's pretty orthogonal to nbdev. You can use whatever linter you like. JupyterLab has extensions that let you plug into whatever linter you prefer, or you could do it as part of GitHub Actions, again, working on the JSON file. It's not opinionated at all about what the rest of your environment should look like and what other tools you might use.
[00:38:30] Unknown:
In terms of people who are using nbdev and building things with it, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:38:39] Unknown:
There's a lot of cool things that I've seen. So one example that sticks out in my mind is what Jeremy was describing earlier about the Python client for GitHub's API that uses the OpenAPI spec. You know, if you go through GitHub's documentation, you have to click on 20 different pages to see all the endpoints. But because he's generating things from the OpenAPI spec, there's, like, a one pager of, like, all the endpoints, and that's linked to all the various things that you need to know about using that endpoint. And that's integrated deeply into the documentation itself for the Python client also.
And so when you try to use the Python client, it's called ghapi, and you call help on an endpoint, you get a link in the docs that takes you to the GitHub documentation for the endpoint. I've seen some really cool things people have done with documentation to make the documentation richer with regards to linking to other relevant sources automatically. I think that's really cool.
[00:39:42] Unknown:
In terms of your experience of building and working with nbdev, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
[00:39:53] Unknown:
I mean, the interesting lesson for me overall is the power of this type of software development. It's fairly under the radar. People don't know about it, but it certainly is a really good insight into how powerful this technique is for writing software. And it also gives you a window into, like, maybe how these tools could improve, you know, in the future.
[00:40:19] Unknown:
I guess one thing that's been a challenge, and slightly surprising, is that some people kind of push back against the very idea of using anything other than a standard text editor to create code. I find that very new programmers and extremely experienced programmers are very interested in nbdev and really wanna try it out. But there's a group of people who are, like, kind of at the 3 to 8 year experience mark, or kind of intermediate level programmers, who seem to find it almost threatening, the idea that people might wanna use something other than Emacs or Vim or VS Code or something. And people get sometimes quite emotional about, like, no, that's not how real software engineers write real code. And this kind of emotional response from some people is not something I expected.
[00:41:19] Unknown:
I'm actually not surprised by that dynamic, I suppose. I find that with many tools, there often is a resistance to change, especially something like developer tools. So people have just gotten used to the idea that they've taken developer tools for granted, and they haven't changed, and people are resistant to the idea that it could be an order of magnitude better. So I was skeptical too. When I first got into it, I said, okay. I mean, certainly, you know, developer tools are kind of a staple. They would have been improved if they could have been. Like, you know, how is it that much better? You know, I tried it and I was really surprised.
Another interesting phenomenon is so there's this book called Working in Public: The Making and Maintenance of Open Source Software by Nadia Eghbal. She's a former GitHub employee. She's studied a lot of open source projects and kind of the dynamics of them. One thing that she documents in her book is the deluge that maintainers face in terms of low quality pull requests that they have to deal with. And I've talked with Jeremy about this before. Like, we don't really see that across fast.ai. And fast.ai has tons of projects and is extremely popular on GitHub, and it has a lot of activity.
And I think the reason for that is nbdev kind of forces you to write high quality PRs. And so I think it saved Jeremy's sanity as a side effect, which is really interesting from a maintainer's perspective and the open source economy perspective.
[00:42:58] Unknown:
Yeah. And speaking as somebody who uses Emacs and has become very comfortable there, you know, the thought of editing in my browser is painful in the regard of I've gotten so used to the keyboard commands, but I'm also very attracted by the possibility of weaving together the code and the documentation and the tests because it can be all too easy to be working in a text editor and write the main body of the functionality of the code and then say, okay. Well, I'll get back to the test another time. You know, you can have, you know, your test open in one window or in one buffer and your code in another and bounce between them. But, you know, I'm definitely interested in experimenting with nbdev to see how it works. But I'm also curious what level of support there is for people who are very comfortable in their text editing environment, but still want to be able to take advantage of what nbdev has to offer.
[00:43:51] Unknown:
That describes me very much. I've been coding for many decades, and as you can imagine, I'm in love with tooling since I invest so much in tooling. So I, yeah, I know every keyboard shortcut pretty much of every, you know, piece of software I use. So I certainly love to jump into Vim and, you know, do some stuff with a quick macro or some motion commands or whatever. And, yeah, that's fine. You can do that with nbdev. You can edit the modules, the text files directly, and sync back into the notebooks automatically. I will say though that the more I use nbdev and notebooks, the less I find myself doing that.
I used to do it a little bit, but it was mainly kind of habit. It's very nice to be able to jump around to cells rather than code and to kind of jump through hierarchies. You can kind of, like, hack together hierarchies in Emacs and Vim and so forth, but I really like the true hierarchical nature of notebooks, that you can create actual headings and stuff like that. So, yeah, you can certainly use your own editor if you want to, but I find I do it less and less.
[00:45:14] Unknown:
For people who are interested in experimenting with nbdev, are there any problem domains or integrations with existing libraries or workflows where you see nbdev as being the wrong choice or something that is incompatible with the existing environment?
[00:45:32] Unknown:
Hamel and I are both working on something that doesn't lend itself very well to nbdev, which is we're working on build tools. So we're doing a lot of stuff with, you know, makefiles and conda packages and automatic build systems running on GitHub Actions. And so there's basically almost no Python involved, and it's, yeah, it doesn't lend itself particularly well to nbdev. We were just saying to each other this morning that we wished it did, because we aren't really enjoying being outside of the notebook environment. This is something I'd like to improve actually because Jupyter can do other kernels other than Python. There's a bash kernel, for example, which is kind of cool, and I've written some nice documentation using the bash kernel.
To my surprise, I found that nbdev and notebooks work very well for creating servers. I didn't really expect that at first, but, actually, I found I could write servers with nbdev very nicely. So, yeah, generally, I mean, I haven't found too much stuff that is largely Python based which isn't suited to nbdev. I don't know if you, Hamel, have thought of other things like that.
[00:46:45] Unknown:
To be quite honest, at this point, nbdev is like crack to me. Like, I just, it's hard not to use it. It's very painful not to use it.
[00:46:49] Unknown:
As you continue to work on the project, what are some of the plans that you have in store for the near to medium future?
[00:46:56] Unknown:
We are doing a rewrite of nbdev at the moment. I tend to rewrite my major pieces of software every year or two, which I really like. And the new version's gonna be orders of magnitude faster. We're also looking at replacing the Jekyll based documentation with Hugo based documentation. Again, one of the reasons there is for performance: Hugo is really fast, which is very nice. We kind of love working with tools that are fast enough that things feel almost instant. That definitely isn't the case with Jekyll. You know, one of the things I've been thinking about also is supporting directly building C-based extensions by integrating Cython with nbdev.
So those are some of the big things that we're hoping to implement in the coming months.
[00:47:48] Unknown:
Are there any other aspects of the nbdev project or working in the notebook environment that we didn't discuss yet that you'd like to cover before we close out the show?
[00:47:56] Unknown:
I think, you know, one thing that we may have not covered is fastcore. So, you know, fastcore is kind of an extension. You can think of it as an extension almost to the Python programming language. I mean, don't take those words literally, but, you know, it adds a lot of functionality that's easy to access. That's important for nbdev because there's a lot of utilities in fastcore that make using Python and nbdev a lot easier. So for example, let's say you have a really big class that has tons of methods in it, and you wanna write prose around them. You might want to define, like, a method in a different cell. You might not want this one giant cell for, let's say, you know, your class.
Well, fastcore gives you easy ways to kind of break up that class so you can just pull the methods out into a different cell. It's all tested and works with this integration testing with nbdev. So there are a lot of utilities that just make your life a lot easier. I would recommend checking that out. It's a very interesting library.
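Editor's sketch: the idea of pulling a method out of a class body so it can live in its own cell. fastcore provides a decorator for this; the code below is not fastcore's implementation but a minimal stand-in using only the standard library, with the hypothetical name `patch_method`. It assumes the target class is given as the type annotation of the function's first parameter.

```python
import inspect


def patch_method(func):
    """Attach `func` as a method of the class annotated on its first parameter.

    Hypothetical helper written for illustration only.
    """
    first_param = next(iter(inspect.signature(func).parameters.values()))
    cls = first_param.annotation
    setattr(cls, func.__name__, func)
    return func


# "Cell 1": the class itself stays small.
class Circle:
    def __init__(self, radius):
        self.radius = radius


# "Cell 2": a method defined later, with its own surrounding prose and tests.
@patch_method
def area(self: Circle):
    return 3.14159 * self.radius ** 2


circle_area = Circle(2).area()
```

Each method can then sit next to the markdown that explains it and the assertions that exercise it, instead of everything being packed into one large class cell.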
[00:49:01] Unknown:
For anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose an audiobook I listened to with the family recently called Rivals, Frenemies Who Changed the World. It's just a really fun production of some short pieces from history about different people who were at one point friends and then ended up creating these historic rivalries and the impact that that's had on our modern world. So a lot of fun stories there and just fun production value to keep kids interested in learning some history. So definitely recommend checking that out. And so with that, I'll pass it to you, Jeremy. Do you have any picks this week?
[00:49:42] Unknown:
My pick is the game of chess, which I always assumed was really boring until my 5 year old daughter started getting into it. And so we started playing a bit together, and I suddenly discovered it's actually really deep and much more fun than I expected.
[00:50:00] Unknown:
Yeah. I'll definitely second that one. And, Hamel, how about you? Any picks this week?
[00:50:08] Unknown:
Actually, Jeremy recommended this book to me, which I've been reading with great interest and surprise. It's called Moonwalking with Einstein by Joshua Foer. Before reading this book, I thought not having a good memory was a sign of stupidity. But, actually, this book goes really deep, in great detail, into how memory works, common misconceptions about memory, what techniques people that have good memory often use, and what it means. So it's really fascinating.
[00:50:36] Unknown:
Well, thank you both for taking the time today to join me and share the work that you've been doing with nbdev. It's definitely a very interesting project and one that I'll have to experiment with myself to try and understand the benefits that it can provide to my own development. So thank you for the time and effort you've put into that, and I hope you enjoy the rest of your day.
[00:50:55] Unknown:
Thank you very much.
[00:50:57] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Guests and nbdev
Overview and Goals of nbdev
Target Audience and Use Cases
Challenges and Benefits of Using nbdev
Implementation and Features of nbdev
Workflow and Scalability
Comparison with Traditional Development Environments
Integration with Python Ecosystem
Lessons Learned and Community Feedback
Future Plans for nbdev