Exploring Literate Programming For Python Projects With nbdev

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode.

With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform,

including simple pricing, NodeBalancers, 40 gigabit networking,

dedicated CPU and GPU instances, and worldwide data centers.

Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.

Your host as usual is Tobias Macey. And today, I'm interviewing Jeremy Howard and Hamel Husain about nbdev, a library for turning Jupyter Notebooks into Python libraries. So, Jeremy, can you start by introducing yourself?

Sure. I'm Jeremy Howard. I'm a founding researcher at fast.ai.

And Hamel, how about you?

I'm Hamel Husain. I'm a staff machine learning engineer at GitHub. I spend a lot of my time working on fastai with Jeremy.

Going back to you, Jeremy, do you remember how you first got introduced to Python?

I was largely a Perl programmer. I started a company called Fastmail, which is an email provider, and I used Perl for that.

And I remember when Python started getting popular,

I was kinda not particularly interested in it because I thought that Perl was really great. But as I did more and more stuff with machine learning,

Python became a bigger and bigger part of my life, and now it's what I spend most of my time doing. And, Hamel, do you remember how you first got introduced to Python?

I just started using Python. I was kind of a data analyst at some point, and I wanted to automate some things. At that time, I actually started with R, and, you know, I wanted another programming language, and Python seemed like a good one at that time. So I just kind of naturally drifted into that.

R has got a lot of cool libraries, but I never liked it as a language. I much prefer working in Python as a language. But there are a lot of libraries from R I miss.

Yeah. The R ecosystem is definitely

pretty attractive, and there's definitely a lot of stuff from there that has inspired things in Python because people working in Python wanted to be able to have all the nice tools that the R folks did. But I'll agree that the language

coming from somebody who's worked primarily in Python is definitely a bit foreign.

R libraries just tend to be more elegantly put together, though, I find. You know, Python libraries have a tendency to get the job done, but they tend to have clunkier APIs. The R ecosystem, you know, that community seems to really care about the developer experience a lot, which I really like.

There's one thing I really miss about R, which is relevant to this conversation: the development environment that I used when I was working with RStudio and R Markdown, you know, where you could write prose and code in the same sort of context. I thought that was really nice, and I sort of missed that when I went to Python, you know, until I discovered nbdev, which we'll talk about more.

So going into nbdev,

can you give a bit of an overview about what the project is and the overall goals of it?

nbdev comes from, you know, my decades of enthusiasm for literate programming and exploratory programming, and never quite finding the right tool for the job. The closest I got was Mathematica, which I always really enjoyed working in, but I always found it nearly impossible to deploy and very difficult to get good performance from. But the actual idea of being able to mix any kind of outputs you want, whether they be animations or hypertext or whatever, along with code and hierarchically structured documents, and have your coding and documentation all in one place, I always found that just quite brilliant.

So, obviously, when Jupyter Notebooks came along, I got very excited about that, but I had the same problem, which is, you know, how do you deploy these things? It's a great kind of scientific journal environment, but I was trying to build artifacts that other people could use easily.

So nbdev is something which brings the worlds of software development of Python libraries and notebooks together. A single notebook will create your tests, your documentation, and your actual Python module all in one place. And I just love working with that. I find it dramatically more enjoyable and more productive.

In terms of some of the storyline behind it and how you got it off the ground and how it got to where it is today, I'm wondering if there are any interesting anecdotes that you can share.

Well, it all came out of really the development of the fastai library, which is one of the most popular libraries for doing deep learning. And, originally, the fastai library kind of came out of creating courses. fast.ai has, I think, pretty much the world's most popular courses for learning deep learning, and they're all done in Jupyter Notebooks. It's such a great way to teach and such a great way to learn.

And

we were also doing a lot of research

in Jupyter Notebooks, finding better ways to train better models.

And often, the course would include a whole lot of stuff about, like, oh, here's some research we just did. Let's learn about the research together and see how it's done and understand the kind

of motivations behind it and so forth.

So

we would very often be creating

new algorithms or implementing algorithms that were in papers but didn't have code.

And so we really needed to find ways to

make that code available easily to everybody. So that was really where it started,

was

taking fast.ai

research

and educational

materials and turning them into libraries.

But, really, the long term goal for fast.ai is to make it so that anybody can do deep learning without needing to do much education. So the fastai library has kind of become a focal point of that work.

And so

it's just been a very natural progression

of using

notebooks to do research and to build educational materials and to build libraries.

And it's been really wonderful to see

how many other people have

found that same approach to development works for them, and nbdev is now getting

really popular, which is great to see.

And I know that there are a lot of different types of people, and there are a number of different verticals and industries where people are using notebooks. I'm wondering if you can talk about who the target audience is for the nbdev project and how that influences the features and design of the project.

I think nbdev works well for really any kind of project that you want to do. I mean, it's not, you know, limited to data science projects at all. In fact, we've been using nbdev for a number of different types of projects, including various utilities, DevOps tools, APIs, API clients, and lots of other things. So it's really a general software engineering tool.

I know that some people ask, you know, when might you not want to use nbdev, or when might that be challenging? I've tried to introduce nbdev to a lot of people and into a lot of projects, and it is a new way of developing software. So you have to look at what your colleagues are using and what your colleagues are willing to use. You have to assess whether or not it's worth it to transition a project to nbdev, or whether your colleagues will be willing to give it a go and write software in nbdev. And you kind of have to have an open mind to try this new type of software environment.

So that's the main consideration I see in deciding, you know, when to use nbdev for a project.

Yeah. One nice thing about nbdev is you don't have to set out deciding to use it.

You know, you can just start hacking something together in a notebook, you know, which is often what I wanna do just to explore a new API or to explore an idea or explore an algorithm.

Maybe you don't even have a sense yet that that exploration is gonna lead anywhere useful.

And then I find that once I get to a point where I think, oh, this is actually turning out quite nicely,

then it's very easy to kind of nbdev-ify that notebook. You just add, you know, like, one comment to each cell that you actually wanna export.

And the nice thing is, for anybody who cares about code productivity and code quality or project quality, you very quickly end up in a very nice place, because once you decide, okay, I do wanna make this something that other people can use, if you've used notebooks, then you nbdev-ify it. You now have a really nice, high quality documentation site for free. You have parallelized tests for free. You have a pip and a conda installer for free. You have a README generated for you for free. So, like, all those kind of things that make a project

complete,

you know, and helpful to developers

and reliable and maintainable. They're all done for you

automatically,

whereas before I was using nbdev, it just seemed like a huge learning curve and lots of things to maintain, and I'd have, like,

10 different places that I had the same information

and dozens of different tools trying to work together. So it makes it, like, really easy to go from an experiment you're hacking around with to a really high quality

complete library.

Yeah. It's definitely easy to, like you said, start off with something. And then before you know it, realize that you have something that's actually full blown and you want to be able to use it in more places. And I'm wondering if you can talk to

the primary challenges that you see of using a notebook itself as a means of

building and collaborating on projects, particularly with other people, because notebooks are definitely

useful

for exploratory programming, as you said, and they can be very useful for sharing the results of your work from a documentation

and display perspective. But

in terms of the

collaborative or team oriented aspects of it, I know that there are some shortcomings. I'm wondering if you could just talk to the challenges that you've come across in that regard.

If you're not using nbdev and you're just using notebooks, there are a lot of shortcomings. Take something very simple: notebooks don't play nicely with version control out of the box. When there's a conflict, the markers that Git adds to the file make it not valid JSON anymore, which means Jupyter can't read it anymore, and that's gonna be a real mess. You also end up with a lot of conflicts, because metadata can change in the cells, and that creates dozens of conflicts even in cells you haven't changed.

So nbdev has its own diffing and merging tool, which actually ends up really nice because it works at a cell level. It knows to ignore metadata. It ignores differences in outputs. So we actually end up with quite a nice Git integration.
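
To make the metadata and output problem concrete, here is a minimal sketch of the general idea of cleaning a notebook before it goes into version control. It is not nbdev's actual implementation: `clean_notebook` is a hypothetical helper, and the tiny notebook dictionary is hand-written in the standard notebook JSON layout purely for illustration.

```python
import json


def clean_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and cell metadata removed.

    Hypothetical helper for illustration only; nbdev's own tooling is more thorough.
    """
    cleaned = json.loads(json.dumps(nb))  # deep copy of JSON-compatible data
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}  # cell metadata changes often and causes noisy diffs
        if cell.get("cell_type") == "code":
            cell["outputs"] = []  # outputs differ from run to run
            cell["execution_count"] = None  # so do execution counters
    return cleaned


# A tiny hand-written notebook: one markdown cell and one executed code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {"collapsed": False},
            "execution_count": 7,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
            "source": ["print(1 + 1)"],
        },
    ],
}

cleaned = clean_notebook(notebook)
print(json.dumps(cleaned, indent=1, sort_keys=True))
```

With the volatile fields gone, two people who ran the same cells in a different order would produce the same file, so only real changes to the source show up in a diff.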

One of the things I really like about working with notebooks is there's a

really nice web based tool you can use, ReviewNB,

which all your code reviews and PRs can go through. And when I'm doing a PR, it's really nice. So I'm not just looking at source code. I'm also looking at the documentation, the outputs, the hypertext,

you know,

in a web page. So I can see, has somebody made a PR that has

reduced the clarity of the

image augmentations in the fastai library, for example. So normally, with a plain diff,

I'd never be able to see that. I'd only see the code. But when you're working with notebooks in this way,

suddenly, it becomes

really, really nice that you actually get to see how it changed the outputs.

So that'd be one example.

Another example of a challenge is simply that code that is in one notebook can't be imported into and used in another notebook. So, again, that's a nightmare, not just for collaboration, but for yourself. You know, you end up putting everything into one notebook or copying and pasting. So, again, with nbdev, that all gets handled for you. Notebooks get turned into libraries, so you can import code from one into another just like a plain Python library.

Another problem with collaboration is viewing notebooks. There are some quite nice notebook viewers on the web, and GitHub has a basic notebook viewer, but they're not nearly as nice to work with as properly indexed documentation with proper hyperlinks and tables of contents and search and so forth. So, again, nbdev will add that for you.

So, yeah, all the kind of limitations

of working with notebooks, of which there are many,

suddenly actually become

features when you add nbdev on top of it. I just wanna add to this. Your question was, you know, what challenges does nbdev present for collaboration? There's a little bit of a fixed cost for a contributor to learn nbdev. But in my experience, once people do that little bit of learning about nbdev, collaboration actually becomes a lot easier, because nbdev promotes a very nice workflow for software engineering and promotes best practices. nbdev really encourages you to write documentation and tests because you do it in the same context. You write your code, your documentation, your prose, and your tests all together.

And

so when someone is trying to contribute to your project, and I've experienced this many times

at work, you know, that person is forced to explain the code that they're adding. And oftentimes

in that process, we realize, hey. Like, we're not able to really explain that code or that code is too complicated. It ends up being naturally refactored because you're writing docs and tests at the same time in the same context.

And you're really treating your documentation as a first class citizen and writing code so that it can be presented to other people and understood. And so I found that that really helps with collaboration. It kind of naturally works out. I find I'm doing less back and forth with people.

Yeah. I mean, that's a good point. I find as an open source maintainer,

the PRs I receive are higher quality in nbdev projects, because when somebody's adding code, they're in the middle of the tests and the documentation. So,

you know, it's pretty rare for somebody to

misunderstand the context of why their code is there because they're, like, literally in the middle of documentation about it as they write their code. Pretty rare that they wouldn't have tests because, again, they're kind of adding code in amongst all the tests.

So, yeah, I do find I get higher quality PRs with nbdev projects.

When I first started out with nbdev, I jumped into this project called fastcore. It's a fairly advanced Python library built by Jeremy. And I thought, there's no way I'm gonna understand this. This is, like, basically magic. But because nbdev allowed me to read the documentation and code together and play with it in a very nice interactive environment, I was able to catch on really fast, much faster than with any other project of similar complexity.

And the nice thing is those explorations that you did, Hamel, became part of the documentation. Because quite a lot of those explorations, you made part of a PR to say, like, here's how this thing works. So I thought that was kind of cool, that your exploring in the notebook became explorations that other people could then learn from.

Yeah. Definitely. It was really gratifying. You know, the learning also paid off. Anytime I would read code, I would say, hey, let me just add a little bit to the documentation here. Let me add another test where it's not clear. So that's what really got me hooked. I really saw the power of nbdev.

Because what frustrates me as a user

when I use any Python library

is lack of documentation.

I think documentation is really underrated.

And so that's something that, you know, nbdev really promotes. It allows you to just write it in a very natural way.

There are a number of other projects that work to complement the overall ecosystem of working with Jupyter Notebooks.

You know, there's the JupyterLab project to make it a little bit more like an IDE.

There are a number of different plugins to Jupyter itself.

And then there's also the overall ecosystem of other

notebook environments

beyond just Jupyter. And I'm wondering if you can just talk

to how nbdev compares to or complements some of those other tools either within or outside of the Jupyter ecosystem.

Yeah. JupyterLab is an exciting development of Jupyter Notebooks. The most recent version, version 3, that just came out a week or two ago, includes an integrated graphical debugger, which is a really cool step. The nice thing is that nbdev works fine with whatever Jupyter Notebook host or server you're using. So nbdev works just as well with the classic notebook as with nteract, as with JupyterLab, or whatever you prefer.

So it's great to see how the notebook community is rapidly iterating and improving. You know, other cool stuff happening in the notebooks world includes stuff like Voila. Voila is a system that lets you create graphical web applications entirely in Jupyter. And JupyterLab even now has a beta version of a drag and drop GUI builder that will create a Voila app from a notebook for you. And, again, all this stuff integrates really well with nbdev, because once you've got things working the way you want, nbdev will then let you turn that into a library that anybody can pip install or conda install, with continuous integration and tests and documentation.

Digging a bit more into nbdev itself, can you talk to how it's implemented and the feature set that it provides, and how the overall design and goals of it have evolved since you first began working on it?

There are a lot of features in nbdev. Something that Jeremy just mentioned is continuous integration, which is really exciting. A lot of people don't really understand continuous integration or find it very difficult. I mean, certainly, when I first learned about continuous integration, I thought it was pretty difficult to get my head around. And so nbdev runs CI for you out of the box without any intervention from the user.

nbdev allows you to write tests in notebooks in a very natural way. You don't have to learn a special API. Like, for example, you don't have to learn pytest. You can just write tests with assert statements. The nbdev machinery will find those and execute them as tests automatically, and it'll also run them in CI. So when you write your code and push it, let's say, to GitHub, it will run in GitHub Actions for you and execute those tests and let you know whether or not all your tests are passing. So that's pretty advanced, you know, production level best practices stuff that gets done for you automatically.
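
As a rough illustration of what "tests as plain assert statements" can look like, here is a sketch of a function followed by the kind of cell that would sit right below it in a notebook. The function `slugify` is a made-up example, not part of nbdev or any library mentioned here.

```python
def slugify(title: str) -> str:
    """Turn a heading into a lowercase, hyphen-separated slug."""
    return "-".join(title.lower().split())


# In a notebook, these asserts would live in the cell directly below the function.
# They document how to call it and fail loudly if its behaviour changes.
assert slugify("Hello World") == "hello-world"
assert slugify("  Literate   Programming ") == "literate-programming"
assert slugify("") == ""
```

Because the asserts are ordinary Python, running the notebook from top to bottom is the test run: any failing assert stops execution with an error.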

And to get to that point, you literally just type a command. There are various command line tools installed with nbdev, and one of them is nbdev_new. That will create a project for you. And one of the things that's created in that project is a GitHub Actions continuous integration workflow. Now, if you don't use GitHub and you use something else, you would obviously need to modify that a little bit to work with your CI, but it's pretty straightforward to do that. And then you'll see that as soon as you push, you'll actually get an email saying, oh, your continuous integration is currently failing. It's actually set up so that it shows you how to write and pass your first test. So, out of the box, you're actually being told about the fact that continuous integration is there. It's set up for you, and it shows you how to get your first test passing.

Another really central feature of nbdev, perhaps one of the most central ones, is how the documentation gets built. You don't have to know anything about HTML, CSS, web hosting, anything like that. You don't have to know Sphinx. I don't know Sphinx myself. You don't have to learn any kind of special presentational API. Notebooks get rendered into documentation for you and get hosted for you on GitHub Pages.

So, you know, you don't really have to do anything. And the documentation has a lot of nice touches to it that are added in for you automatically. One of my favorite features of the documentation is that if you surround the name of a module in backticks, either from your library or the Python standard library or other things, nbdev will automatically introspect that, find the link to the source code, and create a link for it. And not just modules, but also functions and also classes, pretty much any kind of symbol. Yeah. Definitely. And, you know, it'll create a table of contents. It will automatically expand documentation for you if you have, you know, docstrings. It's very robust, so you can hide cells, show cells, hide output, show output. You can have collapsible cells. So it's really easy to use. It's very customizable.

That's another feature that is super exciting for me. All of these things

happen

from these simple command line tools I mentioned.

So one of the nice things about this is, you know, you can work in

whatever

environment you like, you know, because they're just tools that you run at the terminal. You can integrate them into any

scripts or processes or

whatever, and they'll integrate well with any other extensions that you're using and so forth. So a big part of the design of nbdev has been to ensure that it's

very flexible

and doesn't lock you into

any particular details about the tools that you're using

other than that you're writing stuff in notebooks.

And in terms of the workflow of somebody who's using nbdev within a notebook environment to build a project, can you just talk through some of the steps involved? I know you mentioned the

commenting on certain cells and how you're able to mark them as being used for particular purposes, whether it's the code or the documentation or tests, et cetera, hiding and showing.

And, also, for somebody who is working in Jupyter, at what point should they start thinking about whether they want to bring nbdev in and just the overall experience of building a project with it?

When I start a new project, I always start by typing nbdev_new, regardless of whether I actually think this is gonna end up being something that I export into a library and documentation with nbdev or not, just because that's gonna create the, you know, basic structure that I need regardless. And there are certain nice little things that are gonna be created there. Like, if I type make release, it'll upload things to PyPI and Anaconda for me. If I build a library, it'll create a README for me. So I can get a bunch of nice functionality even if I don't actually need nbdev for that much stuff for a particular project. So I'll start by typing nbdev_new.

Pretty much anything I do, regardless of whether I'm creating a server or a command line application or a, you know, model training library for deep learning or whatever, I'll start in a notebook, because a notebook is basically a REPL. But it's a REPL that is highly flexible and is not just text and is not just line oriented. So it's this incredibly flexible, powerful REPL.

And so then I'll generally start exploring. You know, I very rarely know exactly what I want to build and exactly how to build it. I'll often have to learn about some API I haven't tried before or try and implement an algorithm or whatever. So I'll start exploring. And often, just to help myself explore, I'll write little bits of markdown prose here and there. So, for example, recently I played with the GitHub API. It has a fairly new OpenAPI specification. And I'd never used an OpenAPI specification directly before, so I started just, like, loading in the JSON, finding out what keys were in it, and so forth. And as I did that, I was just adding little bits and pieces of markdown to explain to myself as I went along what it was that I was doing.

And, yeah, at some point, I thought, oh, okay, those steps I just did look like a pretty good way to, you know, pull the list of methods out of an OpenAPI specification. So I merged them into a cell, created a function, and then at the top of that cell, I wrote #export. And so that now is gonna be the first thing in my library. And then the markdown that's around that, along with the docstring and the signature, will become the documentation for that. So I can just gradually build out from there.
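
A cell like the one described might look roughly like the sketch below. The function name `list_operations` and the toy specification are invented for illustration; this is not the code Jeremy wrote, and the real GitHub specification is far larger. The only pieces taken from the conversation are the export comment at the top of the cell and the idea of pulling methods out of an OpenAPI document.

```python
#export
def list_operations(spec: dict) -> list:
    "Return sorted `(HTTP method, path)` pairs described by an OpenAPI `spec`."
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
    return sorted(
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in http_methods  # skip non-method keys such as "parameters"
    )


# Exploration and test in one place: a tiny hand-written spec, not the real GitHub one.
toy_spec = {
    "openapi": "3.0.3",
    "paths": {
        "/repos/{owner}/{repo}": {"get": {}, "patch": {}, "parameters": []},
        "/user": {"get": {}},
    },
}
assert list_operations(toy_spec) == [
    ("GET", "/repos/{owner}/{repo}"),
    ("GET", "/user"),
    ("PATCH", "/repos/{owner}/{repo}"),
]
```

The comment on the first line is plain Python, so the cell runs exactly the same whether or not any export tooling is installed. The assert underneath stays in the notebook as a worked example and a test.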

I'm interested in understanding the scalability of this solution as you work on projects that grow

the notebook to export to. Now you don't have to

have everything export to the same module. A notebook is very customizable. You can have different cells export to different modules,

but, you know, you can also have a whole notebook export to a module. So it's not that different from writing code in a text editor with regards to organizing that code. You know, oftentimes, we'll have, like, a one to one mapping almost between notebooks and Python files. So it scales pretty well. There are no issues that I can see where scaling per se is a concern.

I mean, the fastai library, for instance, is a pretty big and complex library with many dozens of modules. But, yeah, as Hamel said, really, most of the time a notebook just maps to a module. It doesn't really look any different from any other kind of Python library you would build.
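
The notebook-to-module mapping can be pictured with a toy exporter like the one below. This is only a sketch of the concept under simplifying assumptions: `export_source` is a hypothetical function, cells are plain dictionaries with their source as a single string, and the real nbdev exporter handles far more (target modules, imports, `__all__`, and so on).

```python
def export_source(cells: list) -> str:
    """Join code cells whose first line is an export marker into module source."""
    exported = []
    for cell in cells:
        if cell["cell_type"] != "code":
            continue  # markdown cells become documentation, not module code
        lines = cell["source"].splitlines()
        if lines and lines[0].replace(" ", "") == "#export":
            exported.append("\n".join(lines[1:]).strip())
    return "\n\n".join(exported) + "\n"


# Toy notebook: prose, an exported function, an unexported test, an exported constant.
cells = [
    {"cell_type": "markdown", "source": "# Arithmetic helpers"},
    {"cell_type": "code", "source": "#export\ndef add(a, b):\n    return a + b"},
    {"cell_type": "code", "source": "assert add(1, 2) == 3"},
    {"cell_type": "code", "source": "# export\nSCALE = 2"},
]

module_source = export_source(cells)
print(module_source)

# The generated text is ordinary Python, so it can be executed like any module body.
namespace = {}
exec(module_source, namespace)
assert namespace["add"](1, 2) == 3
assert namespace["SCALE"] == 2
```

Only the marked cells end up in the module, while the prose and the test cell stay behind in the notebook, which is why one notebook per Python file tends to feel like ordinary library layout.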

Another aspect of working with notebooks is the ability to do out of order execution where, particularly if you're exploring, you start with cell 1, and then you get down to cell 15, and then decide, oh, I need this value back in cell 4.

And so you might go bouncing between various cells in, you know, a semi random order, and then you want to be able to ensure that everything actually works from top to bottom. And I'm just wondering what that looks like in terms of your workflow when using nbdev to build an exportable module and just ensuring that you

aren't confusing

the functionality

of the code as it is displayed with the inherent internal state that's built up over the course of working within that notebook?

The ability to bounce around and manipulate the state in a notebook is kind of a much misunderstood feature of the environment, which is actually critical to all kinds of explorations.

So, for example, in deep learning,

often, it's gonna take a few hours to

train a model,

and you don't wanna, like, have those few hours of training repeated every time you modify a cell. You know? So the ability to have state and manipulate it is critical.

Or if you've downloaded, you know, some big JSON data structure

and you don't wanna be having to deal with, like, figuring out what things to serialize and then load back and find some way to optimize

things so that you can work interactively.

It's just like using your shell, whether it be bash or zsh or whatever: your shell is stateful. You know, your file system is stateful. You create files, delete files, move files, and depending on the order of things, you know, it's not fully reproducible unless you rerun those commands in the same order. So a notebook's really like that.

Now as you say,

once you've done that, since you're gonna want to turn this into

a library, into a module or a bunch of modules that other people can run,

they are gonna run it from top to bottom. So both Jupyter and nbdev

have things to make this convenient.

Both the continuous integration and the

integrated interactive tests that can happen at your terminal with nbdev

run things from top to bottom, and they run every cell from top to bottom. So that will let you know if anything's not working.

And then Jupyter itself lets you run every cell from top to bottom starting out with a clean state. Unfortunately, out of the box, it doesn't come with a key binding for that. So that's one of the first things I do when I set up a new machine, though nowadays it's all automated. But I always tell my students: put a key binding on the restart and run all command in Jupyter, because that's something that you wanna be running from time to time just to double check that everything's working smoothly.
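
The reason a clean top-to-bottom run matters can be shown with a small sketch. The helper `run_top_to_bottom` is hypothetical and only mimics the idea of restarting and running every cell in order. It is not how Jupyter or nbdev execute notebooks.

```python
def run_top_to_bottom(cell_sources):
    """Execute cell sources in order in a fresh namespace.

    Returns the index of the first failing cell, or None if every cell ran.
    """
    namespace = {}  # fresh state, like restarting the kernel
    for index, source in enumerate(cell_sources):
        try:
            exec(source, namespace)
        except Exception:
            return index
    return None


# During exploration the second cell was run first, so the notebook "worked",
# but read from top to bottom the first cell uses a name that does not exist yet.
cells = [
    "total = scale * 10",
    "scale = 3",
]
assert run_top_to_bottom(cells) == 0

# Reordering the cells so definitions come first makes the clean run succeed.
assert run_top_to_bottom(list(reversed(cells))) is None
```

Hidden state from interactive work masks ordering bugs like this one, and a clean run in document order is what surfaces them before anyone else runs the code.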

We've talked a bit about this as far as the

experience and the change in perspective that comes from working in a notebook and using this literate environment

and how that influences your approach to software engineering. But I'm wondering if you can just talk through some of the more detailed aspects of

how you change your approach to writing software if you're in a text editor such as VS Code or Emacs or Vim versus working in Jupyter

and just how that changes the way you think about the project design and the approach to building the software.

When developing software in notebooks like this, one thing that has changed for me is, compared to a text editor, where you might have a bunch of code and, you know, you have various functions, and those functions may have entry points, it's unclear what the entry point to a function is or what code path leads to that function. So debugging can sometimes be a little bit complicated. But when you develop code in notebooks along with the documentation, you're creating a playground where you want to show everybody what the entry point to that function is, how to execute it, and what the dependencies are. You know, you create this environment with the minimal dependencies required to execute that function or method, and that is really powerful. And you also want to be able to do that to specify your tests in a convenient way. That is one thing. Another thing is I try to simplify my code a lot. Because

when you're writing documentation, if something is trying to do too many things, you know, that can be really painful for you while you're trying to explain it. So it really forces you to write better code.

One of the things, actually: Hamel was talking about having this playground to explore. There's a feature in nbdev, an optional feature you can turn on in the configuration, that will automatically add a launch in Colab button at the top of every page of the documentation.

So Colab is a free online

Jupyter

environment.

And so this means that you can literally click a button

or your users can click a button in your documentation, and instantly, that documentation has been converted from

something you read

to something you interact with. And that's really great because

I love working with other people's nbdev

libraries because I can click that button, and then I can start actually experimenting

with the examples they have in their documentation.

I mean, overall,

you know, I've been coding for, gosh, many, many decades,

and I find working in Jupyter Notebooks and MB Dev, I am

some multiples

more productive

than I am

using

Versus Code or Visual Studio or VM or, you know, other I've used a lot of different environments.

And I hear this a lot from other people as well. Quite a few people, you know, come on to our Discord chat and

say, my workplace,

you know, has not standardized on nbdev, and I have to use something else.

And, literally, we hear people talking about

sharing

stories of which companies

let you use nbdev, and people are, like, talking about quitting their jobs in order to go to another job where they can use nbdev. That's, like, the level of

love that people have for using this and frustration they have when they can't.

I think it's really counterintuitive

to people that there can be

a much better

development

environment and way to develop software because those tools haven't changed for so long.

And when you say that to someone, it's almost like a disbelief.

Like, what are you talking about? They look at you like you're a quack.

But it's only when you try it that these things become apparent to you and you realize, hey, I am a lot more productive. My code is more maintainable, and I'm spending a lot less time toiling away on these, you know, tasks I don't care about. And so, yeah, I think that's what we're seeing.

And for somebody who has an existing project that has been written in just the quote unquote standard fashion of just flat files

that they're organizing into a hierarchical structure,

what is the process of converting that to use nbdev

and moving from the

previous approach of my documentation lives here, my code lives over here, my tests are in a different place, and merging them all back together in a more natural form.

There

are tools out there

which

will help do that for you. It's important to remember that a notebook is just a JSON file. Each cell is basically an element of a JSON array, and each element is a dictionary with one attribute that says whether it's a code cell or a markdown cell and one attribute saying what the contents are. So it's actually trivially easy to turn a Python module source code file into, you know, a JSON file, splitting each function or class into a cell. And so there are tools that'll do that for you,

but that's only

the first part of the process because to actually take advantage of this properly,

you really wanna be thinking about the flow of that notebook in terms of somebody reading it. And they're not just reading it; hopefully, they're interacting with it. So I would kind of start with some automated tool to create a notebook that basically does the job from the source code of the module.

And then I'd start looking at my tests and thinking, okay, well, which of these are really quite descriptive of what this module's really doing? Can I turn those into kind of documentation tests? And then, what things in the documentation can I kind of integrate with those? And, you know, just gradually bring it together one piece at a time. You don't have to do it all at once.
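To make the notebook-as-JSON point concrete, here is a minimal sketch in Python using only the standard library. It assumes the version 4 notebook format, where a notebook is a dictionary with a `cells` array and each cell is a dictionary with `cell_type` and `source` keys. The function name `module_to_notebook` and the sample source are made up for illustration; this is not how nbdev or any particular conversion tool does it.

```python
import ast
import json


def module_to_notebook(source: str) -> dict:
    """Build a notebook dict with one code cell per top-level statement.

    Hypothetical helper for illustration only. Comments that sit between
    top-level statements are dropped by this simple approach.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    cells = []
    for node in tree.body:
        # Decorators sit above the `def`/`class` line, so start from them.
        decorators = getattr(node, "decorator_list", [])
        first = min([node.lineno] + [d.lineno for d in decorators])
        text = "\n".join(lines[first - 1 : node.end_lineno])
        cells.append(
            {
                "cell_type": "code",  # the other common value is "markdown"
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text,
            }
        )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


example_source = '''import math

def area(r):
    return math.pi * r ** 2

class Circle:
    def __init__(self, r):
        self.r = r
'''

notebook = module_to_notebook(example_source)
notebook_json = json.dumps(notebook, indent=1)  # what would be saved as .ipynb
```

A converted notebook like this is only a starting point. The markdown cells, examples, and tests that make it readable still have to be added by hand, one piece at a time.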

And another interesting point is how nbdev

integrates with the rest of the Python ecosystem.

I'm thinking in particular about things like dependency management, whether you wanna use pip or Poetry or pip-tools, and how it fits with things like linting

and just the overall

integration points that are available

for using nbdev for actually

building the project, but also taking advantage of the rest of the developer tooling that exists for people using Python and building Python projects?

The integration is pretty good with, you mentioned pip, for example. So nbdev automatically generates standard setuptools packages. One of the nice things about it is that you have a single configuration file that your version number and description and so forth are in. So for something like your PyPI package, when it's uploaded, that'll all be used. It'll automatically use your index.ipynb notebook file to create the description that will appear on PyPI.

You know, things like Poetry and stuff are neither here nor there. You can use whatever

environment

you like.

Most of the developers of nbdev

generally use conda environments,

but you can do whatever you like there.

For

linting,

that's pretty orthogonal to nbdev. You can use whatever linter you like.

JupyterLab

has

extensions that let you plug into whatever linter

you prefer,

or you could do it as part of a Git hook or GitHub Actions, again, working on the JSON file.

It's not

opinionated at all about

what the rest of your environment should look like and what other tools you might use.
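The single configuration file idea can be sketched roughly as follows, using Python's standard `configparser`. The key names (`lib_name`, `version`, `description`), the sample values, and both helper functions are assumptions made up for this example, and the long description is passed in as plain text rather than being generated from a notebook. It shows the general pattern of feeding packaging metadata from one file, not nbdev's actual implementation.

```python
import configparser

# Hypothetical settings file contents; the key names are assumptions.
SETTINGS_TEXT = """
[DEFAULT]
lib_name = mylib
version = 0.0.1
description = A hypothetical example library
"""


def read_settings(text: str) -> dict:
    """Parse INI-style text and return the DEFAULT section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser.defaults())


def setup_kwargs(cfg: dict, long_description: str) -> dict:
    """Collect the metadata a setuptools `setup()` call would typically take."""
    return {
        "name": cfg["lib_name"],
        "version": cfg["version"],
        "description": cfg["description"],
        "long_description": long_description,
    }


cfg = read_settings(SETTINGS_TEXT)
kwargs = setup_kwargs(cfg, long_description="Text rendered from the index notebook")
```

Keeping the version number and description in one place means the package metadata and the documentation cannot drift apart.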

In terms of people who are using nbdev and building things with it, what are some of the most interesting or innovative or unexpected ways that you've seen it used?

There are a lot of cool things that I've seen. So one example that sticks out in my mind is what Jeremy was describing earlier about the Python client for GitHub's API that uses the OpenAPI spec. You know, if you go through GitHub's documentation,

you have to click on 20 different pages to see all the endpoints.

But because he's generating things from the OpenAPI spec, there's, like, a one-pager of all the endpoints,

and that's linked to all the various

things that you need to know about using that endpoint.

And that's integrated deeply into the documentation

itself

for the Python client also.

And so when you try to use the Python client, which is called ghapi, and you call help on an endpoint, you get a link in the docs that takes you to the GitHub documentation for the endpoint.

I've seen some really cool things people have done with documentation

to make the documentation richer

with regards to linking to other relevant sources automatically.

I think that's really cool.

In terms of your experience

of building and

working with nbdev, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?

I mean, the interesting lesson for me overall

is the power of

this type of software development.

It's fairly under the radar. People

don't know about it, but it certainly

is a really good insight into

how powerful this technique is for writing software.

And it also

gives you a window into, like, maybe how these tools could improve,

you know, in the future.

I guess one thing that's been a challenge, and slightly surprising, is that some people kind of push back against the very idea of using anything other than a standard text editor to create code.

I find that

very new programmers

and extremely experienced programmers

are very interested in nbdev

and really wanna try it out.

But

there's a group of people who are, like, kind of at the 3 to 8 years of experience mark, or kind of intermediate level programmers, who seem to find it almost threatening, the idea that people might wanna use something other than Emacs or Vim or VS Code or something. And people sometimes get quite emotional about it, like, no, that's not how real software engineers write real code. And this kind of emotional response

from some people is not something I expected.

I'm actually not surprised by that

dynamic,

I suppose.

I find that with many tools,

there often is a resistance to change,

especially something like developer tools. So people have just gotten used to the idea that they've

taken developer tools for granted, and they haven't changed, and people are resistant to the idea that

it could be an order of magnitude better.

So

I was skeptical too.

When I first got into it, I said, okay. I mean, certainly, you know, developer tools are kind of a staple. They would have been improved already if they could have been. Like, you know, how is it that much better? You know, I tried it, and I was really surprised.

Another interesting phenomenon: there's this book called Working in Public: The Making and Maintenance of Open Source Software by Nadia Eghbal. She's a former GitHub employee. She's studied a lot of open source projects and kind of the dynamics of them. One thing that she documents in her book is the deluge that maintainers face in terms of low quality pull requests that they have to deal with.

And I've talked with Jeremy about this before. Like, we don't really see that across fast AI. And fast AI has tons of projects and is extremely popular on GitHub,

and it has a lot of activity.

And I think the reason for that is nbdev

kind of forces you to write high quality PRs.

And so I think it saved Jeremy's sanity

as a side effect,

which is really interesting from a maintainer's perspective

and the open source

economy perspective.

Yeah. And speaking as somebody who uses Emacs and has become very comfortable there, you know, the thought of editing in my browser

is painful in the regard that I've gotten so used to the keyboard commands,

but I'm also very attracted

by the possibility

of

weaving together the code and the documentation and the tests, because it can be all too easy to be working in a text editor and write the main body of the functionality of the code and then say, okay, well, I'll get back to the tests another time.

You know, you can have your tests open in one window or in one buffer and your code in another and bounce between them. But, you know, I'm definitely interested in experimenting with nbdev to see how it works. But I'm also curious

what level of support there is for people who are very comfortable in their text editing environment, but still want to be able to take advantage of what nbdev has to offer.

That describes me very much. I've been coding for many decades, and

as you can imagine, I'm in love with tooling

since I invest so much in tooling.

So I, yeah, I know every keyboard shortcut pretty much of every,

you know, piece of software I use. So I certainly

love to jump into

Vim and, you know, do some stuff with a quick macro

or some motion commands or whatever.

And, yeah, that's fine. You can do that with nbdev. You can edit the modules, the text files directly,

and sync back into the notebooks automatically.

I will say though that the more I use

nbdev and notebooks, the less I find myself

doing that.

I used to do it a little bit, but it was mainly kind of habit.

It's very nice to be able to

jump around to cells

rather than code and to kind of jump through hierarchies.

You can kind of, like,

hack together hierarchies in

Emacs and Vim and so forth, but I really like the true

hierarchical nature of notebooks that you can create actual headings and

stuff like that. So, yeah, you can certainly use your own editor if you want to, but I find I do it less and less.

For people who are interested

in experimenting with nbdev,

are there any problem domains or

integrations with existing libraries or workflows where you see nbdev as being the wrong choice or something that is incompatible

with the existing environment?

Hamil and I are both working on something that doesn't lend itself very well to nbdev, which is we're working on build tools.

So we're doing a lot of stuff with, you know, Makefiles and conda packages and automatic build systems running on GitHub Actions.

And so there's basically almost no Python involved, and, yeah, it doesn't lend itself particularly well to nbdev. We were just saying to each other this morning that we wished it did, because we

aren't really enjoying being outside of the notebook environment.

This is something I'd like to improve, actually, because Jupyter can run kernels other than Python. There's a bash kernel, for example, which is kind of cool, and I've written some nice documentation using the bash kernel.

To my surprise, I found that nbdev and notebooks work very well for creating servers. I didn't really expect that at first, but, actually, I found I could write servers with nbdev very nicely.

So, yeah, generally, I mean, I haven't found too much stuff that is largely Python based which isn't suited to nbdev. I don't know if you, Hamil, have thought of other things like that.

To be quite honest, at this point, nbdev is like crack to me. Like, it's hard not to use it. It's very painful not to use it.

As you continue to work on the project, what are some of the plans that you have in store for the near to medium future?

We are doing a rewrite of nbdev at the moment. I tend to rewrite

my major pieces of software every year or 2, which I really like.

And

the new version's gonna be orders of magnitude faster.

We're also looking at replacing the Jekyll based

documentation

with Hugo based documentation.

Again, one of the reasons there is performance: Hugo is really fast, which is very nice. We kind of love working with tools that are fast enough that things feel almost instant. That definitely isn't the case with Jekyll. You know, one of the things I've been thinking about also is supporting directly building C-based extensions by integrating Cython with nbdev.

So those are some of the big things that we're hoping to implement in the coming months.

Are there any other aspects of the nbdev project or working in the notebook environment that we didn't discuss yet that you'd like to cover before we close out the show?

I think, you know, one thing that we may not have covered is fastcore. So, you know, you can think of fastcore as almost an extension to the Python programming language. I mean, don't take those words literally, but, you know, it adds a lot of functionality that's easy to access. That's important for nbdev because there are a lot of utilities in fastcore that make using Python and nbdev a lot easier.

So, for example, let's say you have a really big class that has tons of methods in it, and you wanna write prose that surrounds your code. You might want to define, like, a method in a different cell. You might not want one giant cell for your class.

Well, fastcore gives you easy ways to kind of break up that class so you can just pull the methods out into a different cell. It's all tested and works with the integration testing in nbdev. So there are a lot of utilities that just make your life a lot easier. I would recommend checking that out. It's a very interesting library.
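The idea of pulling methods out of a class into separate cells can be illustrated with a small plain-Python sketch. fastcore's own decorator for this is `patch`; the `add_method` helper below is a simplified, hypothetical stand-in written for this example, not fastcore's implementation. It attaches a function to whichever class is named in the type annotation of the function's first parameter.

```python
import inspect


def add_method(func):
    """Attach `func` to the class annotated on its first parameter.

    Hypothetical helper, shown only to illustrate the technique.
    """
    first_param = next(iter(inspect.signature(func).parameters.values()))
    cls = first_param.annotation
    if cls is inspect.Parameter.empty or not isinstance(cls, type):
        raise TypeError("annotate the first parameter with the target class")
    setattr(cls, func.__name__, func)
    return func


# Imagine this class lives in one notebook cell...
class Counter:
    def __init__(self):
        self.count = 0


# ...and this method lives in a later cell, after some explanatory prose.
@add_method
def increment(self: Counter, by: int = 1):
    self.count += by
    return self.count


counter = Counter()
counter.increment()
counter.increment(by=2)
```

Each method can then sit next to the prose and tests that explain it, instead of being buried in one giant class cell.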

For anybody who wants to get in touch with either of you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks.

And this week, I'm going to choose an audiobook I listened to with the family recently called Rivals, Frenemies Who Changed the World. It's just a really fun production of some short pieces from history about different people who were at one point friends and then ended up creating these historic rivalries, and the impact that that's had on our modern world. So a lot of fun stories there and just fun production value to keep kids interested in learning some history. So I definitely recommend checking that out. And so with that, I'll pass it to you, Jeremy. Do you have any picks this week?

My pick is the game of chess,

which I

always assumed was

really boring until

my 5 year old daughter started getting into it. And so we started playing a bit together, and I suddenly discovered it is actually

really deep and much more fun than I expected.

Yeah. I'll definitely second that one. And, Hamil, how about you? Any picks this week?

Actually, Jeremy recommended this book to me, which I've been reading with great interest and surprise.

It's called Moonwalking with Einstein

by Joshua

Foer.

Before reading this book, I thought

not having a good memory was a sign of stupidity.

But, actually,

this book goes into great detail about, like, how memory works, common misconceptions about memory, what techniques people who have good memories often use, and what it means. So it's really fascinating.

Well, thank you both for taking the time today to join me and share the work that you've been doing with nbdev. It's definitely a very interesting project and one that I'll have to experiment with myself to try and understand the benefits that it can provide to my own development. So thank you for the time and effort you've put into that, and I hope you enjoy the rest of your day.

Thank you very much.

Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.

And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.

And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story.

To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.

The Python Podcast.init

Summary

Announcements

Interview

Keep In Touch

Picks

Closing Announcements

Links
