You can find past episodes and other information about the show at podcastinit.com
Brief Introduction
- Date of recording – June 3rd, 2015
- Hosts – Tobias Macey and Chris Patti
- Overview – Interview with Fernando Perez and Brian Granger, core developers of IPython/Project Jupyter
- Follow us on iTunes, Stitcher or TuneIn
- Give us feedback! (iTunes, Twitter, email, Disqus comments)
- You can donate (if you want)!
Interview with Brian Granger and Fernando Perez
- Introductions
- How did you get introduced to Python? – Chris
- For anyone who may not have heard of or used IPython, can you describe what it is?
- How challenging was it to port IPython to Python 3?
- What prompted the name change from IPython to Project Jupyter and were there any associated changes in the project itself?
- Name inspired by Julia, Python and R – the three programming languages of data science
- Data scientists have adopted the use of IPython notebooks in their work on a large scale. What is it about notebooks that lends itself to this particular problem domain?
- IPython Notebook seems like an incredible tool for educators in advanced fields. Have you seen widespread adoption in this area and is it a focus for the project?
- Github recently added the ability to render notebooks in a repo. Did you work with them to build that integration?
- What are some of the most interesting uses of IPython notebooks that you have seen?
- Gallery of interesting notebooks on the wiki
- Reproducible academic publications
- Couple of dozen scientific papers, some very high profile
- Educational notebooks on various subjects
- Great learning resource, as well as entertaining
- MOOC taught between distributed team on Open EdX using IPython notebooks about numerical computing with Python
- Peter Norvig collection of IPython notebooks
- notebooks.codeneuro.org– time series data analysis <- Couldn’t get this to work. -Chris
- Are there any notable projects that use IPython as one of their components?
- KBase for computational biology
- Sage – Open source mathematics project written in Python
- Created by number theorist William Stein
- Custom parser to allow for non-python syntax
- Quantopian – Collaborative platform for financial modeling. Runs on top of IPython
- Wakari from Continuum Analytics – hosted IPython with computing environment
- Rackspace hosts TempNB and other IPython services
- Where do you see Project Jupyter going in the future? Are there any particular new features you’d like to see added? – Tobias
- One of the biggest targeted features is real-time collaboration
- Prototyped by engineers from Google
- More modular UI and architecture
- Multi-user deployments with Jupyter Hub
- A few weeks ago we interviewed Jonathan Slenders who wrote ptpython, which brings IDE like capabilities to interactive Python. Have you ever considered including this in IPython?
- What are some of the features that an average user might not know about?
- Is there anything in particular that you would like to ask our listeners for help with?
- Pitch in with the development effort
- Organize community events on behalf of IPython/Jupyter
- Be patient while documentation improves
Picks
- Tobias
- Chris
- Brian Granger
- Fernando Perez
Keep in Touch
- Twitter @projectjupyter, @ipythondev, @ellisonbg, @fperez_org
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
[00:00:14]
Unknown:
Hello, and welcome to podcast.init. We're recording today on June 3, 2015. Your hosts as usual are Tobias Macey and Chris Patti. Today, we're interviewing Fernando Perez and Brian Granger about IPython. You can follow us on iTunes, Stitcher, or TuneIn Radio, and please give us feedback. You can leave us a review on iTunes or Stitcher, find us on Twitter, email us at hosts@podcastinit.com, or leave a comment on our show notes. Brian, Fernando, could you please introduce yourselves?
[00:00:48] Unknown:
Sure. Thank you folks for having us. My name is Fernando Perez. I am a staff scientist at Lawrence Berkeley National Laboratory, researcher at UC Berkeley, and I've been programming in Python since 2001. I'm originally a particle physicist and applied mathematician, and a lot of what I do is really scientific open source software development around high content source tools.
[00:01:12] Unknown:
And I'm Brian Granger. I'm a physics professor at Cal Poly State University. It's a California state school on the central coast of California about 3 hours south of the Bay Area and I too am a physicist by training. Actually, Fernando and I were classmates in graduate school at the University of Colorado and originally I studied theoretical, atomic, molecular, and optical physics and I started with Python about I guess it was around 2000, late 2002, 2003, a little bit after Fernando got into it.
[00:01:50] Unknown:
Very cool. It's kind of amazing how strong the connection is between physics and computer science. I mean, it goes just kind of beyond, well, physicists use computers. I cannot tell you how many people in my career that I've encountered who have now basically given up physics and gone full time computer science in 1 stripe or other and really just kinda never looked back. It's there seems to be some sort of kindred souls thing going on between people who study physics and people who end up working in the tech industry.
[00:02:24] Unknown:
It's it's kind of interesting. Yeah. And I'll say too that of the number of physicists who I've spoken with who did make the transition or at least who do programming on a regular basis, a lot of them tend to use Python for whatever reason.
[00:02:39] Unknown:
Yeah. I I think it's got the amazing numerical capabilities that Python brings to the table that other languages in this sphere kinda don't. So how did you folks get introduced to Python?
[00:02:50] Unknown:
In my case, it was a suggestion from my office mate, another grad student. I had a complicated stack of many programming languages that I used to run my simulations when I was in grad school: a lot of Perl to babysit supercomputing runs, a lot of Awk, Bash, Sed, big C codes, gnuplot, IDL, Mathematica, and it was just a complicated pile of codes. My office mate told me, hey, you should look into this Python language. It's kinda like Perl, but a lot simpler, and it can actually do a lot of numerics. It's also interactive, and you might be able to replace many of those tools with something easier. And one day, I looked at the stack of books I had on my desk, and I realized I was probably spending more time context switching between 6 or 7 languages than actually getting any work done. And I realized maybe if I could replace 5 or 6 of those with just one and reduce that stack maybe to C for the high performance stuff and maybe Mathematica, I might get more work done. I mean, little did I think that I would just spend all my time running Python tools, but it was true that I could use these tools. And when I started learning Python, I realized after about 24 hours of reading the Python tutorial that I was able to do things in Python that, after a couple of years of writing heavy duty Perl, I had no idea whether they were even possible or not. I realized that I would never write another line of Perl in my life.
So after that, I just stopped, and I I switched to Python, and I haven't stopped writing Python ever since.
[00:04:26] Unknown:
And, for me, I was actually as a postdoc at the time. I came back to visit Boulder for a conference, and Fernando was still in Boulder. We met and had dinner, and I think it was probably about a year after you started working in Python, Fernando. And I remember Fernando telling me about Python with this great excitement, and I went back to my post doc and 1 afternoon sat down and, as an exercise, ported over 1 of my codes that I wrote as a grad student looking at phase transitions and traffic flows. And I used that to learn the language. And I remember within a few hour period, it was all in 1 afternoon, getting everything working with the visualization and just being blown away at the ease of both learning it and also how the flexibility of the language allowed me to explore physical questions that wouldn't have even occurred to me because of the limitations of previously I was doing everything in c plus plus.
And I think seeing seeing how that flexibility of the language had a direct impact on my ability to do science was a huge selling point and over the next year, year and a half, I transitioned everything to, Python.
[00:05:53] Unknown:
It's very cool. It's it's really interesting how many scientists made that transition, made that leap from using Perl and, you know, being productive with it, but to moving to Python and finding that their productivity just kind of exploded. I certainly saw that happening at the Human Genome Project when I was there a number of years back, and people definitely regarded Perl as the sort of legacy code base that they were trying to root out wherever they had the opportunity to, and everything new was being written in Python, and they were much happier with it.
[00:06:25] Unknown:
Yeah.
[00:06:28] Unknown:
So it sounds like the 2 of you knew each other before you got involved in the IPython project. Can you tell us a little bit about how you guys met?
[00:06:36] Unknown:
Yeah. First day of grad school, actually. We were both grad students, as Brian mentioned, in physics at CU Boulder, sitting outside of the induction, first year grad students in Boulder, Colorado in 1996, and we've been friends ever since.
[00:06:54] Unknown:
Great. So for anyone who may not have heard of or used IPython, can you describe what it is?
[00:07:01] Unknown:
Sure. Back in the day, IPython started its life as an improved interactive shell for Python. I was a grad student in 2001, finishing my dissertation in particle physics in Boulder. And when, at the behest of my office mate, I made the switch from Perl to Python, I realized, oh, wait a minute. I can actually use this thing and port a lot of my workflow to this tool, and I can add numerics, and I can add plotting, and I can actually run a lot of my codes in this tool, but I need a better interactive environment. If I'm going to actually use this tool, I wanna call my codes, I wanna access the file system, I want tab completion, I wanna run my scripts, I want to call up my figures, I want to actually run this as my environment. And I was used to tools like IDL, like Mathematica, like the Unix shell, and the interactive Python prompt, compared to those, is a very, very primitive tool. You can't do ls. You can't easily execute scripts and get really good tracebacks. You can't easily do good debugging.
It just isn't a very sophisticated tool for that. And so, obviously, for a grad student facing the option of either finishing a dissertation and writing it up or playing with code, the choice was easy. And so I dug into just building a better interactive shell, and thus was born IPython. Eventually, my advisor kind of called me to order, and I went back to actually getting work done, and graduated. But I put the project out there. Initially, it was just, as all open source projects are born, kind of a one man show for a while. It was just hosted on my web page at the university.
Then a company from Austin called Enthought, the company behind the SciPy library, offered to host it on their CVS repositories and to host the mailing list, and they continue to host the mailing list for the project today. They were the people behind the nascent SciPy conferences and the scientific Python community. And so at that point, IPython became one piece in the puzzle of scientific Python, and that's really what gave it life as a project and not just as a one-off exercise. If it had been just that, it probably would have died as just a one-off little interactive exercise.
But because it became part of the workflow of other scientists, and other scientists began to contribute, and there was a conference that at the time was held every year at Caltech for scientists building SciPy and then Matplotlib and numerical computing tools and other tools for scientific computing, where we all congregated every year to build these tools, a community of users began to grow around not just IPython but all of these tools, where we collaborated, where we exchanged code, where we kind of worked together on trying to build an entire ecosystem of similarly developed open source tools for scientific computing around the Python language. And today, this is now a large effort. And in fact, it was, I think, at the second or third of those conferences where Brian and I started collaborating on IPython. Brian and I, as we mentioned, knew each other well, but we didn't start working on IPython together at the very beginning. It didn't take very long, though, and it was precisely after spending some time at one of these scientific Python conferences that we joined forces on the project.
[00:10:27] Unknown:
Cool. That's very interesting to think of, particularly for anybody who uses IPython as part of their daily workflow, that if it weren't for Enthought offering to host the project, it may not have continued to exist as we know it.
[00:10:41] Unknown:
No. They were extremely generous, and we owe them a huge debt of gratitude because they provided support for the project at the very beginning and throughout the project, and they supported us at critical junctures. They actually funded Brian and me in 2009, 2010 at a critical point when we were developing the network protocols that allowed us to build the machinery that eventually led to the Qt console and the modern network architecture that supports the notebook and then the multi-language support, all of the modern features that really made possible the renaissance of the project and its modern incarnation. All of that was thanks to Enthought's support and financial contributions, and they've been a fantastic supporter of the community. They've supported the SciPy conference for many years, and so we owe them a huge debt of gratitude.
[00:11:34] Unknown:
Another part of the history of the project that I was gonna add is that Fernando and I started talking about some of the general ideas around interactive computing in 2005, and almost immediately, we started to think about a notebook-like interface. And from 2005 until 2011, we actually made multiple attempts to build something like this. I don't know, what was it? 4 or 5 attempts, Fernando?
[00:12:08] Unknown:
Something like 5. I mean, in version 0.0.1 of the system, there's even references And
[00:12:24] Unknown:
And it really took us all those different iterations of trying to build this, trying to understand the problem, before we could do a good job of it. The other part was that a lot of the tools that we rely on, such as ZeroMQ and WebSockets, just didn't exist. And the notebook came about sort of over 3 summers in 2009, 'ten, and 'eleven. We had funding in 2009 and 'ten to refactor some of the old code base to get it ready to build this more network based model. That was in 2009. And then in 2010, Fernando and I had the funding from Enthought to build the network architecture that enabled the Qt console.
And then in 2011, we built on top of that and actually created the current iteration of the notebook that continues to exist today.
[00:13:19] Unknown:
Very cool. It struck me as you were as you were talking about its genesis, how how you as scientists had this need for a really great interactive environment more than just sort of a prompt with better editing capabilities and and things like that because a lot of languages have that. But IPython, from my perspective, is really 1 of the crown jewels of the Python ecosystem. It's something that's and IPython and IPython Notebook are really kind of unique. And I realize project Jupyter now encompasses other languages, but prior to that, there really was kinda nothing else like it that I've ever seen. And it really struck me that when when other people may ask in other programming language communities, well, why should we?
What's what's the the gain for us in in sort of doing the the work to make it more make our programming language more accessible or more desirable for science work. And I would point to that and say, this is something that grew out of a need that only scientists would really have to that extent, and now the rest of the community can benefit from it because I'm not a scientist by any stretch of the imagination. I'm a I'm a tech worker and, an information worker, I guess you'd say. And the things that I can do with IPython Notebook are are just amazing from my, you know, limited perspective.
So, that's really cool. Thank you for for telling us that story. How challenging was it to port IPython to Python 3?
[00:14:51] Unknown:
So the simple answer for Fernando and me is that it was completely trivial, but that's because he and I didn't do it. In the summer of 2011, when we were building the current version of the notebook, Thomas Kluyver, who's now one of the core developers on the project, came and started to work on the Python 3 port. And he basically did the initial version of that single-handedly and did an absolutely fantastic job with that. I would say it was not a trivial port in the sense that most of the weeds in porting to Python 3 have to do with strings and bytes and Unicode, and a lot of what IPython deals with are strings.
Maybe it's a string of code or a file name or output and so getting all of that working right was fairly non trivial in terms of the technical details though. Thomas is our resident expert on that.
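The string, bytes, and Unicode issue described in this answer can be sketched in a few lines. This is only an illustration of the general Python 3 behaviour, not code from the IPython port; the helper name `to_text` is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str, decoding bytes explicitly (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes, e.g. what a subprocess pipe or a socket would hand back.
raw_output = "café".encode("utf-8")
# Text, e.g. a line of code typed by the user.
code = "print('café')"

# Python 3 refuses to mix the two types implicitly.
try:
    raw_output + code
    mixed_ok = True
except TypeError:
    mixed_ok = False

text = to_text(raw_output)
print(text, mixed_ok)
```

In Python 2 the concatenation above would often succeed silently, which is why a tool that passes code, file names, and output around as strings needs explicit decoding at each boundary when ported.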
[00:15:58] Unknown:
So what prompted the, I guess it wasn't a name change from IPython to Project Jupyter, but what prompted the creation of Project Jupyter as an extension to IPython, and were there any large changes associated with that in the IPython project itself?
[00:16:15] Unknown:
Well, the prompting was kind of just a natural gradual realization as we were building the machinery for the system. When we looked at the protocol, when we looked at how the notebook and the console and the graphical console worked, we said, well, wait a minute. We're sending code, it's executed on the other side by a kernel, and the results come back. There's nothing in here that really is very specific about Python other than at the other end, something takes that and happens to interpret that string as a blob of Python.
It could be Perl or it could be Ruby or anything else for all it cares. Right? What if what if we just abstract that over, and then all of a sudden we have basically a generic protocol to do interactive computing in any language. Right? And what if we just clean we if we if we do that, we'll have to do a little bit of work because we've probably let, we've allowed kind of some hidden assumptions about there being Python creep into the system. But if we do that, we a unique system that will be that will allow us to reuse our architecture for And so once we kind of accepted to pay that price, it took some work. Yes.
Once we accepted to pay to pay to pay the price of of abstracting over programming languages and to clean up any assumptions that we had about it being specific to the Python language, then we we saw the benefit because now all of a sudden people began actually implementing the back end, for other languages. And so we actually went through the our implementation. We cleaned we cleaned things up, and we defined how to implement what we call the kernel for other for other languages. And people began implementing kernel for other kernels for other programming languages, and we began seeing a Haskell kernel and then a Ruby kernel and then, kernels for other languages. But then, obviously, the questions began appearing. Why is it called IPython if I'm running the Julia kernel? And, obviously, the Julia community wanted to call it IJulia, not IPython.
Why do I have to run at the command line, IPython dash dash Julia? Why do I have to run IPython dash dash profile Haskell? Right? And so we we wanted to come up with a name that would really indicate that the project is language agnostic. We may have a reference implementation of the kernel that may still be in Python. Much like in the Python language, CPython happens to be the reference implementation of the language, that happens to be written in c, but maybe at some point in the future. Who knows? PyPy may become the reference implementation in a few years if it if at some point it becomes better. So for us, IPython now is the reference implementation of our protocols. It is our default kernel. It is the 1 that we develop, and a lot of our tools are developed in Python. But Jupyter represents this larger project, which is about interactive computing. The name is inspired by Julia, Python, and R being the 3 what we consider to be the 3 open languages of data science. That name came to us thinking actually about a conversation that that I was having 1 evening at GitHub, with folks who were arguing about basically the those languages. And and at 1 point, I said, look. These languages are not enemies, of each other. We shouldn't be arguing amongst amongst ourselves. Julia, Python, and R are not enemies. The enemy is closed source science. These 3 languages are all partners in having open open source tools for scientific research.
But it's not an acronym. It really is it really is meant to simply represent 3 languages that that that are part of an open source ecosystem for scientific for for scientific computing and for computing, in general. It doesn't have to be scientific research. It really is about interactive computing in general. So
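The idea described in this answer, a frontend that sends a string of code to a kernel and gets results back without caring about the language, can be sketched as a toy. This is not the real Jupyter messaging protocol (which runs over ZeroMQ with a richer message format); the class names, the message fields, and the "shout" pseudo-language are all assumptions made up for illustration.

```python
import contextlib
import io
import json


class ToyPythonKernel:
    """Toy kernel that interprets the incoming string as Python."""

    language = "python"

    def __init__(self):
        self.namespace = {}  # state shared between executed cells

    def handle(self, raw_message):
        request = json.loads(raw_message)
        stdout = io.StringIO()
        status = "ok"
        try:
            with contextlib.redirect_stdout(stdout):
                exec(request["code"], self.namespace)
        except Exception as exc:
            status = "error"
            stdout.write(f"{type(exc).__name__}: {exc}")
        return json.dumps(
            {"status": status, "language": self.language, "output": stdout.getvalue()}
        )


class ToyShoutKernel:
    """Stand-in for some other language: it just upper-cases the text."""

    language = "shout"

    def handle(self, raw_message):
        request = json.loads(raw_message)
        return json.dumps(
            {"status": "ok", "language": self.language, "output": request["code"].upper()}
        )


def run_cell(kernel, code):
    """Frontend side: knows only the message shape, not the language."""
    return json.loads(kernel.handle(json.dumps({"code": code})))


py_reply = run_cell(ToyPythonKernel(), "print(1 + 1)")
other_reply = run_cell(ToyShoutKernel(), "hello")
print(py_reply["output"], other_reply["output"])
```

The point of the sketch is that `run_cell` never changes when a new kernel is added, which is the property that made a language-agnostic name attractive.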
[00:20:05] Unknown:
I know that there's a project that I came across recently called the Beaker Notebook, which seems to be based largely on IPython and Project Jupyter, and it seems that it has the capability of executing multiple languages within the same notebook. Does Project Jupyter itself have that capability?
[00:20:27] Unknown:
Well, the IPython kernel can actually execute multiple languages if you prefix individual cells with double percent. For example, if you prefix a cell with double percent perl, that cell can execute in Perl. If you prefix a cell with double percent R, that cell can be in R. But that is specific to the IPython kernel. Our design is that the basic protocol refers to one kernel at a time. That kernel can define whatever semantics it wants. The IPython kernel happens to define very sophisticated semantics that can include more than one language. And in fact, the IPython kernel is capable of juggling in one process Python, Fortran loaded code, Julia, R, Bash, Ruby, you name it. You can have 20 languages juggling back and forth in the same process. So we do have that capability within the IPython kernel. But the design of the protocol is that the notebook speaks to one kernel at a time.
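The double-percent prefix mentioned in this answer amounts to a dispatch on the first line of a cell. The sketch below is a toy under stated assumptions, not IPython's implementation: `split_cell`, `run_cell`, and the handler table are hypothetical, and the handlers only describe what would run instead of executing anything.

```python
def split_cell(cell):
    """Return (language, body) for a cell; unprefixed cells count as Python."""
    first_line, _, rest = cell.partition("\n")
    words = first_line[2:].split() if first_line.startswith("%%") else []
    if words:
        return words[0], rest
    return "python", cell


def describe(language):
    """Build a hypothetical handler that reports instead of executing."""
    return lambda body: f"{language} would run {len(body.splitlines())} line(s)"


HANDLERS = {name: describe(name) for name in ("python", "bash", "ruby")}


def run_cell(cell):
    """Route a cell to the handler named by its %% prefix, if any."""
    language, body = split_cell(cell)
    handler = HANDLERS.get(language)
    if handler is None:
        return f"no handler registered for {language!r}"
    return handler(body)


print(run_cell("%%bash\nls -l\npwd"))
print(run_cell("x = 1"))
```

All of this routing lives inside one kernel, which matches the design point made above: the notebook still talks to a single kernel, and that kernel decides what a prefixed cell means.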
[00:21:30] Unknown:
Data scientists have adopted the use of IPython notebooks in their work at a large scale. What is it about notebooks that lend themselves to this particular problem domain?
[00:21:41] Unknown:
I think it's the fact that in, in the context of data science, a lot of what you're trying to do is to combine you're not trying to develop a software application. You're trying to understand a specific problem typically centered around a specific dataset, and you're trying to extract insight and communicate your understanding of that problem, your understanding of that insight as a narrative. You're trying to extract that something out of it to communicate, and that communication may be to yourself because you're gonna try to read that tomorrow or to someone else, whether it's your partner, whether it's a client, whether it's scientific research that you're going to publish. The point is you're executing code, but that code is not to build software. That code is to grind through your data using your tools, your tool your Python tools, your R tools, your Julia tools, to visualize your data, to apply an algorithm, to reduce your data, and then you will likely produce tools, produce, visualizations, summarize plots, produce summary statistics. And then, typically, you will actually introduce a narrative in English and or in whatever language you speak. You will introduce human narrative, and it's the combination of that. It's the combination of the code, the results of the code, and the human narrative that put together constitutes the output that you care about. And that combination together really is it's sort of the the perfect format for that. Right? And the ability to have all of those things together and to share that as a compact unit that you can share with anyone, that you can send to someone, and they can directly jump into it. They can re execute, and that someone can be yourself 3 weeks from now or your colleague or your audience. It's sort of the perfect atomic unit. And I think that's why it has sort of caught on like wildfire. 
And and as we said earlier, as Brian alluded to earlier, this is something that we we fully acknowledge we did not invent and we wanted it because we, as grad students, had used these tools in our own research and we knew how powerful these ideas were.
[00:23:39] Unknown:
So, yeah, as you mentioned, the capability of including a narrative in line with the processing and display of the data that you're talking about is very powerful, and I've actually seen a few instances, but one in particular, of somebody using IPython as a platform for writing a book. The one I'm thinking of in particular is Bayesian Methods for Hackers, I believe, is the title, by Cameron Davidson-Pilon.
[00:24:07] Unknown:
Yeah. And there's been a number of books or book-like compilations of notebooks. Another example of that is José Unpingco. He started out writing a series of blog posts on doing signal processing with Python, and he wrote those blog posts as notebooks and then worked with Springer to turn those notebooks into a traditional Springer book on signal processing. And we're starting to see a number of other books developed in that way. Another exciting development along these lines is that in the last month, O'Reilly Media has built support for notebooks into their publishing platform, which is Atlas, and so O'Reilly authors can now author their content that ends up in O'Reilly books and online posts as notebooks and then also distribute that content to users as notebooks.
And so that's another example of these computational narratives being authored in different contexts and distributed in in different contexts.
[00:25:27] Unknown:
That's really very cool. I mean, I think that's that's something that's been missing. You know, when I when when you look at how younger people just aren't consuming books like a lot of us became used to sort of coming up in the field now. They're they're using online resources like blogs and and the like. I think 1 of the things that that is causing that, or at least contributing to it, is that lack of immediacy. And I think that having them be able to open up an ebook and bring up an IPython notebook with, if they're trying to learn functional programming or something, you know, a lesson in what are monads and how do I use them that's interactive that they can actually play with. I think that could really sort of that could save technical books.
[00:26:16] Unknown:
That that's something that we're very excited about.
[00:26:19] Unknown:
Yeah.
[00:26:20] Unknown:
So the IPython Notebook seems like a really great tool for educators in advanced fields. Have you seen widespread adoption in this area? And is it a focus for the project?
[00:26:30] Unknown:
Yeah. It's definitely a focus for the project, partially because we ourselves are educators, and we end up using the notebook a lot ourselves, to teach. So this is 1 area that we definitely dog food a lot. Just this quarter, actually today, I had the last class for this spring quarter of a computational physics course that I'm teaching here to undergrads, but we're seeing the notebook used for teaching across a wide range of fields ranging from bioinformatics to to physics to data science to computational fluid dynamics. And honestly, at this point, we're having trouble even keeping track of all the different people who are doing this now, and it it definitely is a focus for us in terms of development as well.
What we're finding is that if you're writing a notebook as a single individual, the workflow is fairly decent now, and and the user experience, when you start to work with a large group of people, such as in a classroom, in a collaborative context where you want to distribute materials to the students, you want to gather materials back from the students, there's grading involved. There's still a lot of significant pain points associated with that and we're we actually have a a new project called nbgrader, as in notebookgrader, that is designed to help address some of those pain points.
And, the lead developer of that is a grad student at Berkeley, Jess Hamrick, and she has been working on a course at Berkeley that's been using this since January, and then I've been using it for the last few months. And it definitely helps, but we still have a long ways to go and there's a lot of interesting work that I think in terms of really making the notebook practical for teaching, you know, we really need to address these points.
[00:28:36] Unknown:
That's very cool. I can I can totally see that as someone who, to be honest, I'm a really crappy student, to be just to be perfectly frank about it? And I think that, for me, 1 of the things that that made my classroom experience very difficult is that I definitely found there was a a real gulf when I sat there in class and just tried to absorb what was being taught to me and what I was being tested on. And so to have the opportunity to have this kind of interactive thing that I can noodle on and really sort of, like for me, I feel like I have a learning cliff, you know. I have to play with something and poke at it and prod at it and do it, do it, do it. And then all of a sudden, I really get it. And it sounds like using a notebook in the classroom context, and even in in terms of grading notebooks in a test environment sounds like something that I could really get behind. That's really exciting.
Yep. So, GitHub recently added the ability to render notebooks in a repo. Did you work with them to build that integration?
[00:29:49] Unknown:
Yeah. Though, to be fair, the the bulk of the hard work was done by the GitHub team, but we did coordinate with them for a long time, and we discussed we we discussed extensively sort of what what would be required for it and how we could help, with it. But the vast majority of the work was obviously on on their internal architecture. But, yes, it was something that we were very keen on seeing happen, and it was wonderful to collaborate with the GitHub team. And they were they were great, they were great about it.
[00:30:18] Unknown:
Yeah. It definitely seems like something that will really drive even more awareness of the presence and capabilities of IPython to people outside of Python who may have never come across it before.
[00:30:32] Unknown:
Yeah. We're really excited about that. For a long time now, we have had the nbviewer site, which allows users to see a static HTML rendering of a notebook. The way nbviewer works is you pass it a URL, or a GitHub org and repo, and view notebooks in that context, and that's been a very popular platform for people to share notebooks. But I think part of the awkwardness is that a lot of people are already on GitHub, browsing through repos and notebooks in that context, and it gets a little difficult when you click on a notebook on GitHub and, at least this is how it used to be, all you see is raw JSON data. So to be able to see the rendered notebook on GitHub is a huge improvement in the user experience.
And I think part of what we're hoping is that this drives the discussion about open and reproducible science and journalism. We're starting to see more and more people who are using GitHub and the notebook as a platform for sharing their content associated with various types of publications, in a way that allows users to come and look at exactly what they've done and reproduce it if they want to.
[00:32:07] Unknown:
Very cool. I actually was just at the Open Data Science Conference here in Boston this past weekend, and there was a lot of discussion around open data, open access to data, and the exchange of ideas around that. It's definitely a narrative that I've been seeing pop up in a lot of places: people, even the average citizen who may not have any interest or facility in technology, are becoming more aware of the data that they produce and what sort of an impact that has on their lives, and also of the general need for them to become a little bit more well versed in how that data is used and how algorithms are present in everything that they do, from when they wake up to when they go to bed, if they live in any sort of a modern and developed nation.
So it's very cool to see IPython being used as a tool to help facilitate some of that transparency. So what are some of the most interesting uses of IPython and IPython Notebooks that you have seen?
[00:33:24] Unknown:
Well, we've tried to enable things with some of our infrastructure, and we have a gallery of interesting notebooks where we keep track of some of the interesting things that we see. It's on our wiki. It obviously doesn't have absolutely everything, because it's limited by our own bandwidth, and it's a manually curated directory. But there are things like reproducible academic publications, and this is obviously biased by our scientific interests, which is a section that, as Brian was mentioning already, is of particular interest to us. There are already a couple of dozen scientific papers, some of which are very high profile, that you can click on and that are accompanied by an entire GitHub repository with carefully curated collections of notebooks. Some of them have 10 IPython notebooks that you can click through, following every single step of the process to reproduce every table and figure in the paper. It's really, really beautiful to see scientists saying, this is how I did all of the work in that paper, follow through and get all of my results.
You can also see notebooks that have purely educational value, notebooks on topics like signal processing, engineering, linguistics, machine learning, physics, chemistry, math, visualization, and whimsical stuff like analyzing soccer data or Wikipedia data, all kinds of topics. It's an interesting collection for the public to peruse because you can learn about all kinds of things, and it's a very useful way of seeing the kinds of stuff that people are doing. There are also very notable projects that people have carried out with highly dedicated effort. For example, Lorena Barba, who is an educator that Brian mentioned earlier, recently put together a MOOC that was taught collaboratively.
She's a professor in mechanical engineering at George Washington University, and she put together a MOOC taught between herself, a colleague of hers at the University of Southampton, and another colleague of hers in Chile. The three of them put together a MOOC taught on the Open edX platform that is a large collection of IPython notebooks on numerical computing, all of it available online, all of it presented on the Open edX platform, and you can work through and execute the entire set to teach yourself numerical computing with Python end to end. Another highlight that has been extremely popular is Peter Norvig's collection of IPython notebooks. Peter Norvig, who's the director of research at Google, periodically will tackle interesting problems that he's thinking about.
And he will often post his solutions to challenging problems as notebooks. Those notebooks tend to generate a lot of traffic online. Several of his posts are listed in our gallery, and they're very, very interesting. Recently, he posted one on the traveling salesman problem, for example, which is a fascinating discussion on dynamic programming and how to tackle the traveling salesman problem, where he goes through and analyzes the problem very, very carefully. It's a great example showing precisely why the notebook is a good platform for this, because he builds these little snippets of code where he explains what he's about to do. He builds a tiny little function, then he runs it. He shows the result and he builds another small function. The code is never very large, and you can follow the discussion and understand his train of thought. He tackles really, really complex logic and builds it gradually. And then finally, there's a very recent project that just came online a few days ago from a neuroscientist at Janelia Farm, a rising star in neuroscience called Jeremy Freeman, at notebooks.codeneuro.org.
It is a demonstration that uses some of our technology, some of the Docker images that we built, and a project that comes out of IPython called tmpnb that allows you to spawn ephemeral notebooks in Docker containers. Jeremy has put the tmpnb infrastructure together with libraries that he has built using Spark, the distributed processing framework, and with his code for the analysis of time series data. And he's put up this beautiful system called notebooks.codeneuro.org to demonstrate the analysis of time series data; in this case, you have some examples from mice and from zebrafish.
And it's an entire demonstration that you can just open and click to load up these Spark containers, with code and data and the entire Spark machinery ready to go, and learn how to use these libraries and run the demonstrations with real datasets, running this code on very interesting scientific datasets. So it's a very relevant and very timely example of using both our tools and the Spark framework, as well as Jeremy's research libraries, to demonstrate it online.
[00:38:56] Unknown:
Yeah. That's one of the really incredibly compelling things about IPython Notebook, even, kind of dovetailing on what Tobias said, for members of the general public. Right? Because lately, there has been an increasing fascination with consuming data in terms of these infographics that you see in places like Wired and floating around the Internet. And what's kind of amazing about IPython Notebook, even for a layperson, is that it's like going from a photo to virtual reality. It's that kind of a transition, going from these static infographics into an IPython notebook, because the data is live. You can actually slice it, dice it, manipulate it, view it differently, change it.
It's a living, breathing thing that you can actually interact with and move through and, you know, make your own inferences about. Really, just to say amazing is not giving it enough praise. It's kinda mind boggling, actually. So are there any notable projects that use IPython as one of their components?
[00:40:10] Unknown:
Yeah. There are actually quite a few, and we have a list on the website. We probably just wanna highlight a few right now. One, I wanna plug my day job, because it's a very interesting scientific research project funded by the Department of Energy. It's a large platform called KBase, for computational biology. It's a project led by Lawrence Berkeley National Laboratory, and it's a large collaboration with Argonne National Lab, Oak Ridge National Laboratory, and Brookhaven National Lab to build a platform for bioinformatics at scale. It's meant to allow biologists to analyze large scale datasets and actually collaborate on large scale analysis, from single organisms to microbial communities all the way to plants and modeling plants in their environments, and to do complex and predictive modeling across data that lives in the entire system and collaborate on that in the entire interface. The whole system runs as a highly customized system through IPython. What is called the narrative interface is really a very complex and customized IPython interface, and that actually is my main research project in my day job as a scientist at the Department of Energy. This is a project that has roughly 50 people working on it at the Department of Energy.
So, it's currently in its 4th year. Another project that I wanna mention, which is a project that we've collaborated with for a long time, is called Sage. It's an open source mathematics project written in Python. It was started by a mathematician who was a professor at UCSD when he started it. He's now a professor at the University of Washington in Seattle. Sage is a project that tries to unify many open source math projects as well as providing a lot of new code of its own in Python, to provide a single unified environment for math. It's not so much for numerical computing, like NumPy, SciPy, and Matplotlib. Even though it uses NumPy, SciPy, and Matplotlib, it was originally designed more for pure mathematics. The creator of Sage is William Stein. He's a number theorist.
And it's a very rich project. It has used IPython since the beginning. William contacted me back in 2005. The terminal console for Sage has always been a very customized version of IPython, because Sage has its own custom syntax, which isn't quite Python. He actually preparses the input to allow syntax which is invalid Python. And so for him, the fact that IPython allowed a custom parser was perfect, because that's exactly what he wanted to do. Sage has had a notebook since the beginning, and there's been a very playful back and forth between the Sage notebook and the IPython notebook, a kind of competitive collaboration where we've learned from Sage and Sage has learned from us over the years. We've collaborated for a long time. It's been a very fruitful relationship, and we're very good friends with the Sage team.
Quantopian is a startup in the Boston area that has built a collaborative platform for financial modeling, where the environment allows people to design trading algorithms that run on the Quantopian platform. Quantopian offers people the data and the compute environment. And if people have algorithms that perform well, then they can expose those algorithms to others and market them. That platform is actually built on top of the IPython machinery, running IPython kernels on top of their customized runtime, and IPython notebooks. And Quantopian developers actually contribute to the project. They have made fantastic contributions, and they've developed new infrastructure and new machinery for the project, and we have a great relationship with them. Wakari, which is a hosted platform from a company called Continuum Analytics, another big player in the scientific computing ecosystem, is effectively a completely hosted IPython notebook system that they've offered for a while. We should mention that a lot of the hosted stuff that we've mentioned, such as tmpnb before, many of our services, and a lot of the hosted notebooks that you see online are actually hosted by Rackspace. Rackspace is a company that develops a lot of Python code. One of our core developers is a full time IPython developer and a full time Rackspace engineer.
And this ephemeral IPython service called tmpnb is hosted by Rackspace, and a lot of this infrastructure has been built with Rackspace resources. So these are some of the open source, government, and commercial projects that in one form or another are building on top of the IPython machinery. I hope that answers the question.
[00:45:30] Unknown:
Very much so. It does. That is a pretty impressive list. So definitely a lot of very cool projects for people to check out. There are a couple I've heard of, but a number that I haven't heard of as well. So I guess I know what I'm doing for the rest of the evening. So where do you see Project Jupyter going in the future, and are there any particular new features you'd like to see added?
[00:45:53] Unknown:
Yeah. So we have quite ambitious plans for the future. It's been a huge transition for us over the last 12 months, moving from the more narrow focus of just Python to the broader focus of many different languages, and I think one of the biggest challenges we have right now is scaling up the organization from a human perspective. Part of that is attracting new developers, but honestly part of it is just finding good ways of working with the developers we have. We have enough people now working on the project that it's really challenging to just get all of us working in a productive way.
Part of that is working on getting more sustainable funding for the project and also building up the user community in useful ways. In terms of new features that we're looking at, one of the big areas is collaboration, and the biggest feature that we're targeting right now on that front is real time collaboration similar to what you get in the context of Google Drive, where multiple people can open up a document at the same time, edit it at the same time, and see the edits. About a year and a half ago now, I guess, some engineers at Google approached us with a very nice prototype of this capability that integrated the Google Cloud APIs for collaboration and document storage with the IPython notebook. We currently have funding from Google for a postdoc at Berkeley who's helping us integrate the work that they did into the Jupyter notebook more efficiently, and we have some other funding that's going to be coming online to help us with that work as well.
A lot of the work that we're looking at over the next, I would say, 2 to 3 years has to do with making our user interface and architecture more modular. We're seeing a lot of people who want to take different components of the notebook and plug them into different contexts. For example, what O'Reilly Media has done is take the back end, the architecture for running code, but their front end is not the traditional notebook user interface. They have a custom user interface and they're using part of our JavaScript code, but right now it's a little bit wonky to do that. So we're spending a lot of time redesigning our user interface to make it more modular, and part of this gets back to the user experience that people have in the notebook right now.
The latest version of the notebook also includes a text editor and a terminal. And if you've used that, you'll quickly realize that you might want to have a notebook next to a text editor or next to a tabbed set of terminals. So we're really pushing the limits on what's possible in the web browser in terms of paneled and tabbed layouts, and we're going to be spending a lot of time working on the user interface. A lot of the things we're thinking about have to do with just the general usability and design of the notebook. While the current notebook definitely, I think, eases the pain of a lot of use cases, there are still a lot of pain points that our users experience every day with our current notebook.
And so we're gonna spend a lot of time thinking about the usability of it. I don't know, Fernando. I'm probably missing a lot here. You wanna jump in on some of this?
[00:50:02] Unknown:
No. No. I think that's pretty good. That's a pretty good summary of the highlights.
[00:50:10] Unknown:
Well, if we get nothing else, I'm sure people will be absolutely ecstatic with just the features that you listed above, because those are pretty remarkable. Even just the collaborative real time editing of a notebook would be pretty amazing for a lot of people, I'm sure.
[00:50:27] Unknown:
Actually, one other general area that we see a lot of work on is multi user deployments of the notebook. We have a new subproject called JupyterHub that adds a multi user deployment system for the notebook, which you can deploy to multiple users on a centralized server or servers, and there are different people using that in different ways. tmpnb, which we host, is one example of that, but there are also a lot of other companies, startups, scientific projects, and universities leveraging that in different ways, and that's a very active area of development for us.
[00:51:14] Unknown:
Maybe two very quick comments. One is that the collaboration actually has multiple angles. The real time live collaboration is obviously a super important one. But in the context of what Brian just said about JupyterHub and these multi user things, it's actually a much more nuanced problem, because it really is about what it means to share projects that involve notebooks when notebooks are executable things. When you share things on, say, Google Docs, you're just sharing static documents. When you share projects on something like, say, GitHub, you're effectively sharing repos, where there's the forking model for repository contributions.
Neither of those models quite works well when you actually have executable state. And so there's a lot of nuance that comes into play when you're talking about sharing and collaborating on things that effectively have live executable state in memory. We haven't completely figured out exactly what that should look like. And so there are a lot of really interesting questions in that, some of which have to do with the version which is about live collaboration, but others go beyond that. There are a lot of open questions there that we'll be hammering on in the next few years. And the other point that I wanted to mention is that we're also exploring the questions that go in the direction of what we call software engineering in the notebook, which is kind of a catchphrase for asking: if I'm developing code interactively, but I want to extract maybe something reusable out of it, what would that look like if I don't simply want to copy it out into a file and turn it into a library file that I edit in a text editor? If I want to keep it in a notebook, because I want to be able to go back and still test it interactively, or keep it with live examples, but I still want to be able to use it as a library, how could we make the notebook environment better support those use cases? We're not quite sure what that should look like, but we feel that it's our responsibility to explore where the notebook creeps a little bit into an IDE. That may be anathema to some people, but we feel that it's our job to at least push that boundary a little bit.
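One hypothetical way to picture the "notebook as a library" question raised here: because a notebook file is JSON, its code cells can be executed into a module object. This is only a sketch under assumptions, not something the project is described as shipping; `module_from_notebook` is a made-up helper name, the tiny notebook dict is invented, and cells containing IPython magics or shell escapes would not run under plain `exec`.

```python
import types


def module_from_notebook(notebook, name):
    """Run every code cell of a notebook dict inside a fresh module object."""
    module = types.ModuleType(name)
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # The notebook format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        exec(compile(source, f"<{name} cell>", "exec"), module.__dict__)
    return module


# Invented example notebook: one prose cell, one cell defining a function.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Helpers"},
        {"cell_type": "code", "source": ["def double(x):\n", "    return 2 * x\n"]},
    ]
}

helpers = module_from_notebook(demo_notebook, "helpers")
print(helpers.double(21))
```

Note that this runs every code cell, including any exploratory ones, which is part of why the design question is described as open.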
[00:53:42] Unknown:
Yeah. I guess one other area is that the Jupyter notebook supports interactive widgets that allow you to quickly build user interfaces in notebook documents that are bound to data and objects in the back end kernels. Right now those user interfaces can be embedded in these notebook documents, and there's a very strong demand for people to be able to develop those user interfaces in a notebook document but then deploy them in more of a dashboard context, where the consumers, the viewers of that content, may not want to see any code at all. Maybe all they want to see is a visualization widget that has some sliders and a text box and some check boxes to specify what algorithms are being run or the options passed to various functions.
And so viewing a notebook as a sort of web application slash dashboard type entity that can be deployed and used and reused in different contexts is something that we're thinking a lot about, and there's a lot of interest in it.
[00:55:01] Unknown:
That is so cool. I mean, just the thought of it. I'm thinking about industry: in my day to day work, I'm relatively new to Python; I've only been using it for the last 6 or 7 months. And to be able to go to one of the more senior Python devs and say I'm having trouble with this piece of code, and, if they're working from home or something, not have to deal with Skype or whatever the case may be, but be able to share an IPython notebook with them with the code that we can then collaboratively edit and play with and evolve.
The opportunities with that are just amazing to me. A few weeks ago, we interviewed Jonathan Slenders, who wrote ptpython. I don't know if you folks have encountered that project, but it brings IDE-like capabilities to interactive Python. So if you're at the prompt and you have an object defined, or even just, let's say, a string, you can type string dot and hit tab, and ptpython will give you, in your text prompt, a list of completions from which you can choose the method that you wanna invoke. Have you ever considered including this kind of capability in IPython?
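A rough sketch of the kind of "name dot, then tab" completion described here, using nothing but introspection with `dir()`. This is not how ptpython or IPython implement completion; `complete` is a hypothetical helper, and it only handles a bare name or a single `name.prefix` lookup.

```python
def complete(namespace, text):
    """Suggest completions for `text` from objects living in `namespace`."""
    if "." not in text:
        # Plain name: offer matching variable names.
        return sorted(n for n in namespace if n.startswith(text))
    name, _, prefix = text.rpartition(".")
    if name not in namespace:
        return []  # unknown object (or a nested a.b.c path, unsupported here)
    obj = namespace[name]
    return sorted(
        f"{name}.{attr}"
        for attr in dir(obj)
        # Hide underscore-prefixed attributes unless the user asked for them.
        if attr.startswith(prefix)
        and (prefix.startswith("_") or not attr.startswith("_"))
    )


session = {"greeting": "hello"}
print(complete(session, "greeting.up"))
```

Because the lookup inspects the live object rather than parsing source text, the suggestions reflect whatever the name is currently bound to in the session.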
[00:56:19] Unknown:
There's a thread already with the author on the mailing list this week. So if you check our archives from, like, 2 days ago, there's a thread with him already.
[00:56:27] Unknown:
That's great. Perhaps conceitedly, I'm wondering if we had some small part in that, because when we interviewed him a few weeks ago, he said he had done some work to allow IPython to be used from within ptpython, but the two kernels were incompatible. And I said to him at that point in time, well, it sounds to me like it might be worth talking to those guys, because if you could add this capability to IPython, it would be a really incredible opportunity to raise the bar once again. That's great. I'm glad it's being thought about and potentially worked on.
[00:57:13] Unknown:
Now we're looking into it, partly because there seem to be really fundamental incompatibilities between Python 3.5 and PyReadline, the library that we need on Windows to support the terminal. Apparently, it doesn't work at all on Python 3.5. And apparently the thing that this guy is using may work on Windows out of the box, and it's a pure Python dependency. So we may swap out PyReadline with his thing, and it may actually give us better capabilities than readline and support a richer experience. Readline, even though it seems to be a simpler dependency because it apparently ships with Python, actually turns out to be a really thorny one, because on OS X it's actually not shipped by default. It's a fake readline. You get libedit, which has its own set of problems. And on Windows, it doesn't even exist. What we have is this fake thing called PyReadline, which apparently on 3.5 doesn't even run because of changes that were made to Python. And it's not clear at all that it will ever be fixed. And so we're actually looking seriously at this.
We don't know what'll happen, but it is a possibility, even for pretty serious technical reasons.
[00:58:21] Unknown:
Very cool. I wouldn't normally recommend that one set of guests go back and listen to an interview with another set of guests, but you might actually consider giving a listen to our interview with Jonathan, because he explains how some of it works, and it really seems like a very elegant, well thought through implementation. Honestly, talking with him really impressed me. So I'm glad to hear that you folks are looking at that. And, honestly, it has moved me to go browse some of his code, because his implementation is very, very interesting. I think it's, in a way, a nice, simple use case for asyncio, a very understandable use case. So it makes for some really interesting code reading.
[00:59:08] Unknown:
Martin here Martin here.
[00:59:10] Unknown:
Yeah. It's very cool having this connection between guests that we have interviewed and are interviewing. So just very cool to see how that comes about in such a broad and welcoming community as Python. So very cool. So what are some of the features of IPython and IPython Notebooks that an average user might not know about?
[00:59:34] Unknown:
So I'm gonna take a little bit of liberty and stretch the definition of the word average.
[00:59:41] Unknown:
Absolutely.
[00:59:42] Unknown:
And I think the most important point on this front is that the architecture that all of these tools are built on is a very open, flexible architecture. Of the main building blocks, one is the notebook format. The notebook documents are just a JSON data structure that is stored on the file system, and there's a lot of flexibility around that notebook document. For example, it's really easy to read and write these notebook documents from just about any programming language you could imagine. So for example, you may have some system that's not the Jupyter notebook where there's content.
It would be, I mean, a few lines of code to import and export data from this other system to notebooks. Another example is the network protocol that we have for communicating with kernels where kernels are the separate process that we start that runs code in a particular language and there's a lot of people thinking about and working on various, I would say, instantiations of interactive computing systems and what we're seeing is that the the people who take the time to understand our architecture end up getting, like, superpowers in terms of what they can do in a really short period of time.
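To illustrate the point about the notebook format being plain JSON, here is a minimal sketch that writes and re-reads a notebook using only the standard `json` module. The layout follows the version 4 notebook format as commonly documented (`cells`, `metadata`, `nbformat`, `nbformat_minor`); the helper names, cell contents, and file name are invented for the example, and a real notebook usually carries more metadata than this.

```python
import json
import os
import tempfile


def make_notebook(cells):
    """Build a minimal v4-style notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry an execution counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


def code_sources(notebook):
    """Return the source text of every code cell, in document order."""
    return [c["source"] for c in notebook["cells"] if c["cell_type"] == "code"]


notebook = make_notebook(
    [("markdown", "# Demo"), ("code", "x = 1 + 1"), ("code", "print(x)")]
)

# Export to disk, then import it again as any other program could.
path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(code_sources(loaded))
```

The same few steps translate directly to any language with a JSON library, which is the flexibility being described.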
One example of that is the Atom text editor that came out of GitHub. Recently someone came up with an Atom plugin called Hydrogen that can run code that you have in the Atom text editor using any of the Jupyter kernels. The person who developed this took the time to understand our network protocol and how it worked, and what they got from that is that their plugin could all of a sudden run code in any of the approximately 40 languages that we support, with a very minimal amount of work compared to what it would take to get something talking to 40 different languages from scratch.
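To give a feel for what a front end like that has to produce, here is a sketch of an `execute_request` message modeled on the published Jupyter messaging specification: four JSON dictionaries (header, parent header, metadata, content) plus an HMAC signature over them. Treat the field details as assumptions to verify against the specification; the helper names, the key, and the session id are invented, and actually sending the message to a kernel over ZeroMQ is left out.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def execute_request(code, session, username="demo"):
    """Assemble the four dictionaries that make up an execute_request."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "username": username,
        "session": session,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.0",
    }
    content = {
        "code": code,  # source text for the kernel to run, in its own language
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign(message, key):
    """HMAC-SHA256 over the serialized dictionaries, in wire order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in ("header", "parent_header", "metadata", "content"):
        mac.update(json.dumps(message[part]).encode("utf-8"))
    return mac.hexdigest()


message = execute_request("1 + 1", session="example-session")
signature = sign(message, key=b"not-a-real-key")
print(message["header"]["msg_type"], len(signature))
```

Nothing in the message depends on the kernel's language except the `code` string, which is why one client can drive many kernels.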
And so I think the thing that I would encourage people to do is this: if you're interested in taking what we've done and using it in different ways, it will be really worth your time to understand the different building blocks that we have, the network architecture, and all of this. There's a lot of potential for reassembling these building blocks in new and interesting ways.
[01:02:39] Unknown:
So is there anything in particular that you would like to ask our listeners to help with?
[01:02:46] Unknown:
Yeah. I mean, obviously, first of all, as with every open source project: join us as developers. We're an open source project. There's lots of work to do in multiple areas. There's front end work to do in the notebook. On the UI, there's lots of JavaScript work to do. There's also work in the back end. Brian already mentioned the document publishing pipeline, the idea of writing notebooks and turning them into documents, into websites, into blog posts, into papers, into books. It's a complex pipeline, but it's a really interesting problem that has many different aspects and many targets. And that can be a pretty fascinating one that is somewhat isolated from other aspects of the project.
And so it doesn't necessarily intersect with everything else. There's a piece of the project that is somewhat technical but also well isolated, which we haven't talked too much about today, which is all of the parallel computing stuff: a little bit more technical, a little bit more on the scientific end, but also something where people can contribute. But an important and completely different one is that, as the community grows and as the project grows, one part that doesn't scale is us. We cannot be everywhere at once. We cannot fly everywhere. We can't be in multiple locations. And so organizing meetups, organizing community events in other places, helps. And one thing that we do have a little bit of is resources. If folks in the community want to organize a meetup and run a small community event, we can probably send some resources your way, even some financial resources to support those events, because we are getting some funding that will allow us to support the community in that fashion. And that actually helps us, because it means we don't have to be the ones who get on a plane to be everywhere.
And also be a little bit patient with us, in the sense that we know that our documentation in some aspects of the project is not ideal. We are working hard to improve it. We'll get there.
[01:05:00] Unknown:
So is there anything that we didn't ask you about or didn't talk about that you would like to bring up?
[01:05:06] Unknown:
Yeah. I think we do wanna make sure that we credit two aspects of the project. One is the rest of our core developers. Even though Brian and I are here and have spoken a lot today, the project doesn't exist because of us. It really exists because of the rest of our core developers. We have a number of folks who work either full time or nearly full time on the project and who are absolutely fantastic. And they're really the ones who make all the magic happen. Jonathan Frederic, Kyle Kelley, Min Ragan-Kelley, Thomas Kluyver, Matthias Bussonnier, Jess Hamrick, Damián Avila, Sylvain Corlay, Jason Grout. Brian, am I forgetting someone?
[01:05:52] Unknown:
Some of the folks at Google?
[01:05:54] Unknown:
Yes. Kester Tong, in particular, and Kayur Patel. And I may be forgetting Paul Ivanov, who is a little bit less active lately but was a very active core developer for a long time in the project. And also our institutional sponsors, not just the institutions that happen to employ Brian and myself, Cal Poly and UC Berkeley/LBL, but the fact that we have institutions who support our work by funding it: the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, Rackspace in particular has supported us enormously, Microsoft, Google, the Simons Foundation, Continuum Analytics, Enthought.
If it weren't for these entities, the project just wouldn't exist. Without their resources, it wouldn't exist. We also wanna give a shout out to NumFOCUS. NumFOCUS is a 501(c)(3). It is a foundation that actually supports the project in a slightly different manner. It's a foundation where Brian is on the board of directors. I'm an ex-member of the board of directors. I'm actually one of its founders. And it's kind of a parallel entity, if you will, to the Python Software Foundation, but one that was created by the scientific computing community to really serve as an umbrella foundation for the open source scientific computing community, to serve the needs of scientific projects that are open source. And it hosts the IPython/Jupyter project as a 501(c)(3).
And NumFOCUS is our fiscal sponsor. It's our fiscal home. It's what allows us to receive tax deductible donations. And it's a very important project in the open science community. It's not just a Python project. It hosts the Julia project. It hosts the rOpenSci community. It hosts the Software Carpentry project and the Data Carpentry project. And without all of these people and these projects, we wouldn't be anywhere.
[01:08:08] Unknown:
Yeah.
[01:08:10] Unknown:
That's great. It's it's really heartening to see the amount of community support and contribution that goes into a project like this and just the the amount of time and effort and dedication that everybody puts into it.
[01:08:24] Unknown:
That's great. At this point in time, we usually do the picks. So, Tobias, why don't you get us started?
[01:08:31] Unknown:
Sure. So my first pick today is the Dayworld trilogy by Philip José Farmer. It's a sci-fi trilogy set in a future where the population has grown to the point where it's not possible for everyone to be awake and active at the same time. So people are divided up into specific days of the week during which they're active, and they're put into suspended animation during the rest of the week. And so houses are outfitted with a basement with the suspended animation pods for the various families who live in that household during the different days of the week. And in this world, the most heinous crime is to be what they call a daybreaker, which is somebody who is awake all days of the week.
And the protagonist of the story happens to be a daybreaker, and it is just a very compelling and thought provoking exploration of the effects of population as it expands. It's very well written, very entertaining, and I highly recommend it. My next pick is a site called readruler.com, which is a site that will link up to your Pocket account and analyze all of the articles, and it allows you to tag them with the length of time that it will take you to read them. And in the settings, it has a reading sample so that you can calculate your reading speed and it can more accurately tag the various articles. So I have a large number of articles in my Pocket account that I have been meaning to get to, and I found this is a really interesting way of being able to pick up the low hanging fruit and see, okay, these are the very short articles, without necessarily having to go through and look at them individually.
And I am going to leave it at that for today. So, Chris, please take it away.
[01:10:31] Unknown:
Cool. I'm gonna commit a little bit of heresy here. On a Python podcast, I'm going to recommend a Ruby thing. I'm gonna recommend it because it is so good that I think you can learn something about programming by watching these. It doesn't matter what programming language you're in. Yes, they're oriented towards Ruby, but the fundamental concepts that Avdi puts forth are really interesting and could be useful to anyone, I think. There's a series of screencasts called RubyTapas. And what's amazing about them, and I know everybody and their uncle has started producing paid screencasts these days, is they are the most perfect exemplar, from my perspective, of exactly what they say: tapas, little tidbits, little 3 to 5 minute videos that show you one concept and only one concept, and do it really clearly, in a really straightforward way, with a little bit of humor. They're just phenomenal. I still subscribe to them even though I don't use Ruby much anymore.
I've kinda transitioned to Python in my day job. I still write Ruby with Chef, but I still keep giving Avdi Grimm money because they're just that good. So there. The next pick that I have, we've mentioned her, or rather I've mentioned her, a bunch in the podcast, but one of our readers mentioned that I had not actually picked her work yet and so it hadn't been in our show notes. I just wanted to pick codenewbies.com. It's a site and associated gaggle of communities put together by Saron Yitbarek, who we hope to have on our show at some point in the future. She is a one-woman dynamo when it comes to helping people learn how to code and get their first job in the technology field.
Really, it's an amazing resource, both obviously for people who are trying to learn how to code and, from my perspective, even if you're an old hand, it's a really great community. Her podcast is amazing. Her forums are great. You can help new people out and probably learn something while you're doing it. If not worth looking into for your own sake, it's definitely worth being aware of, just because it's amazing work that she's doing and it deserves recognition. And for my last pick, I'm going to pick a Twitter client for Macs called Tweetbot.
And it's one of those pieces of software that really just hits the spot. The UI is beautiful. It is featureful. It does everything that I need it to do and more. It's very intuitive. The authors are very responsive. It works great on the Mac and on my iOS devices and syncs my tweets in between so that, you know, I can stop reading at work or at home and then start reading on the subway, and it keeps my place. It's well worth its weight in gold, worth its price. I forget how much it is, but it's pretty cheap. So, yeah. That.
And I think that's about it for me this week. Brian, why don't you give us your picks?
[01:13:46] Unknown:
Yeah. So there's actually two books that I'm really enjoying right now. The first is a new book out by Joel Grus. I think that's how you say his last name; apologies if not. It's an O'Reilly book called Data Science from Scratch. Part of what's fantastic about Python is that there are all of these libraries for doing data science, such as pandas, statsmodels, and scikit-learn, and those libraries are great, but I think they have a weakness in that they end up being very easy to use black boxes that don't necessarily help someone who's learning about data science to understand what the algorithms are actually doing.
And I think getting that deeper knowledge of what's actually going on is really important. And the philosophy of Data Science from Scratch is that he walks through how to implement a lot of the core algorithms in data science from scratch, without relying on scikit-learn, for example, or pandas. And it's at a really nice level, where someone who knows the basics of Python and NumPy and matplotlib is going to be able to pick this up and learn a lot about the internal details, and so I've been really enjoying that, especially as I think about how to teach data science to undergrads. And then the second book is William Cleveland's The Elements of Graphing Data.
I got this a few months ago and I've been spending time with it, and it's just an extremely practical, nice book about visualizations and plotting. It's the type of book where, you know, I spend 5 minutes with it and I feel like a lot of things that I had fuzzy thinking about in terms of visualization all of a sudden become crystal clear, like simple questions of whether the tick marks on your axes should be pointing inwards or outwards. And he says, well, you have data on the inside of the frame, and if your tick marks point inwards, there's a good chance that the data is going to overlap the tick marks, making it difficult to see them, so they should point outwards.
And obviously it's not a universal rule that you always must follow, but it's a very practical way of thinking about choices like that, which in the past I think I would have approached from more of a visual aesthetic perspective. And so it's a wonderful book.
[01:16:39] Unknown:
Yeah. I guess I'm in a little bit less pragmatic of a mindset right now. In terms of books, a book that has resonated a lot with me lately is Republic, Lost by Lawrence Lessig. It's really a fantastic read, I think, for today's world. Lessig kind of took a step back; a lot of his work had been about code and the culture of the Internet. And this is really a book about Congress and about the modern world of politics. And what I find fascinating about it is that it's a book about how humans with good intentions, who are fundamentally in many ways trying to do the right thing, can do things that are so problematic to society.
And I think it's a book that has very enlightening lessons to teach many of us, and it asks really important questions about society. And the other one is actually an author more than a specific book, from Colombia, because I'm Colombian. In the geek world, I hear a lot of people in the US actually like a Latin American author called Jorge Luis Borges, because a lot of his poetry and a lot of his short stories have a very logical, mathematical bent to them.
And Colombia has an author that in my mind is much less well known than Borges, but who I would recommend, because there are echoes of Borges in him, but with a little bit of a García Márquez twist. His name is Álvaro Mutis, and he has both short stories and poetry that have some echoes of Borges, with a really crazy kind of character who is a wanderer, a traveler through the jungles of Colombia, a very interesting character called Maqroll el Gaviero. And I would highly recommend his novels to people who have a love for Borges' logical mindset but, at the same time, are interested in adventure. So, again, Álvaro Mutis. He is actually available in English, but he's not very well known. And so I figured I would make some slightly oddball picks.
[01:19:10] Unknown:
Great. Well, thank you very much, and thank you both for taking the time to come on the show with us today. So for anybody who is interested in following you and the things that you guys are doing, what would be the best way to do that?
[01:19:26] Unknown:
Twitter or email in my case?
[01:19:28] Unknown:
Yeah. Honestly, Twitter these days has become an invaluable way where we keep in touch with users, we hear about things,
[01:19:37] Unknown:
so that that's a great place to to go for that. Okay. So what Twitter handle should should users be following to to do that?
[01:19:46] Unknown:
So, the Twitter handle for Jupyter is @ProjectJupyter and the one for IPython is @IPythonDev. And then my Twitter handle is @ellisonbg, and Fernando's is...
[01:20:08] Unknown:
@fperez_org. Excellent.
[01:20:16] Unknown:
So, again, thank you very much for coming on the show. It has been a wonderful discussion, and I learned a whole lot about the IPython and Jupyter projects that I never knew before. And I definitely think I have a lot of things to read up about. So
[01:20:30] Unknown:
thank you very much. Me too. And I and I and I also just wanna really quickly, in addition to thanking Brian and Fernando, wanna thank Travis Oliphant for actually hooking us up with these 2 guys because the IPython notebook project is 1 of the first things that when Tobias and I were saying, hey, we should do a Python podcast, we both sort of batted around as, you know, in my opinion, this is 1 of the coolest things in the Python ecosystem. So we really wanted to talk to you from the very beginning. And, once again, thank you, Travis, for for for hooking us up.
[01:21:05] Unknown:
Thanks for having us. We really appreciate it. And, obviously, Travis is another 1 of our long standing friends, supporters, colleagues, and sort of heroes of our community. So thanks, Travis.
[01:21:16] Unknown:
Yeah. Thank thanks so much for having us on. It's been a pleasure.
Hello, and welcome to podcast.init. We're recording today on June 3, 2015. Your hosts as usual are Tobias Macey and Chris Patti. Today, we're interviewing Fernando Perez and Brian Granger about IPython. You can follow us on iTunes, Stitcher, or TuneIn Radio, and please give us feedback. You can leave us a review on iTunes or Stitcher, find us on Twitter, email us at hosts@podcastinit.com, or leave a comment on our show notes. Brian, Fernando, could you please introduce yourselves?
[00:00:48] Unknown:
Sure. Thank you folks for having us. My name is Fernando Perez. I am a staff scientist at Lawrence Berkeley National Laboratory and a researcher at UC Berkeley, and I've been programming in Python since 2001. I'm originally a particle physicist and applied mathematician. And a lot of what I do is really scientific open source software development around Python open source tools.
[00:01:12] Unknown:
And I'm Brian Granger. I'm a physics professor at Cal Poly State University. It's a California state school on the central coast of California about 3 hours south of the Bay Area and I too am a physicist by training. Actually, Fernando and I were classmates in graduate school at the University of Colorado and originally I studied theoretical, atomic, molecular, and optical physics and I started with Python about I guess it was around 2000, late 2002, 2003, a little bit after Fernando got into it.
[00:01:50] Unknown:
Very cool. It's kind of amazing how strong the connection is between physics and computer science. I mean, it goes just kind of beyond, well, physicists use computers. I cannot tell you how many people in my career that I've encountered who have now basically given up physics and gone full time computer science in 1 stripe or other and really just kinda never looked back. It's there seems to be some sort of kindred souls thing going on between people who study physics and people who end up working in the tech industry.
[00:02:24] Unknown:
It's it's kind of interesting. Yeah. And I'll say too that of the number of physicists who I've spoken with who did make the transition or at least who do programming on a regular basis, a lot of them tend to use Python for whatever reason.
[00:02:39] Unknown:
Yeah. I I think it's got the amazing numerical capabilities that Python brings to the table that other languages in this sphere kinda don't. So how did you folks get introduced to Python?
[00:02:50] Unknown:
In my case, it was a suggestion, actually, from my office mate, another grad student. I had a complicated stack of many programming languages that I used to run my simulations when I was in grad school. A lot of Perl to babysit supercomputing runs, a lot of Awk, Bash, sed, big C codes, gnuplot, IDL, Mathematica, and it was just a complicated pile of codes. My office mate told me, hey, you should look into this Python language. It's kinda like Perl, but a lot simpler, and it can actually do a lot of numerics. It's also interactive. It's a lot simpler, and you might be able to replace many of those tools with something easier. And one day, I looked at the stack of books I had on my desk, and I realized I was probably spending more time context switching between 6 or 7 languages than actually getting any work done. And I realized maybe if I could replace 5 or 6 of those with just one, and reduce that stack maybe to C for the high performance stuff and maybe Mathematica because of all this, I might get more work done. I mean, little did I think that I would just spend all my time writing Python tools, but it was true that I can't use these tools. And so when I started learning Python, I realized after about 24 hours reading the Python tutorial that I was able to do things in Python that, after a couple of years of writing heavy duty Perl, I had no idea whether they were even possible or not. I realized that I would never write another line of Perl in my life.
So after that, I just stopped, and I I switched to Python, and I haven't stopped writing Python ever since.
[00:04:26] Unknown:
And for me, I was actually a postdoc at the time. I came back to visit Boulder for a conference, and Fernando was still in Boulder. We met and had dinner, and I think it was probably about a year after you started working in Python, Fernando. And I remember Fernando telling me about Python with this great excitement, and I went back to my postdoc and one afternoon sat down and, as an exercise, ported over one of the codes that I wrote as a grad student looking at phase transitions in traffic flows. And I used that to learn the language. And I remember within a few hours, it was all in one afternoon, getting everything working with the visualization and just being blown away at the ease of learning it, and also at how the flexibility of the language allowed me to explore physical questions that wouldn't have even occurred to me before, because of the limitations of doing everything in C++ as I had previously.
And I think seeing seeing how that flexibility of the language had a direct impact on my ability to do science was a huge selling point and over the next year, year and a half, I transitioned everything to, Python.
[00:05:53] Unknown:
It's very cool. It's it's really interesting how many scientists made that transition, made that leap from using Perl and, you know, being productive with it, but to moving to Python and finding that their productivity just kind of exploded. I certainly saw that happening at the Human Genome Project when I was there a number of years back, and people definitely regarded Perl as the sort of legacy code base that they were trying to root out wherever they had the opportunity to, and everything new was being written in Python, and they were much happier with it.
[00:06:25] Unknown:
Yeah.
[00:06:28] Unknown:
So it sounds like the 2 of you knew each other before you got involved in the IPython project. Can you tell us a little bit about how you guys
[00:06:36] Unknown:
met? Yeah. Yeah. First day of grad school, actually. I remember, yeah, we were both grad students, as Brian mentioned, in physics at CU Boulder, sitting outside of the induction for first-year grad students in Boulder, Colorado in 1996, and we've been friends ever since.
[00:06:54] Unknown:
Great. So for anyone who may not have heard of or used IPython, can you describe what it is?
[00:07:01] Unknown:
Sure. Well, back in the day, IPython started its life as an improved interactive shell for Python. I was a grad student in 2001, finishing my dissertation in particle physics in Boulder. And when, at the behest of my office mate, I made the switch from Perl to Python, I realized, oh, wait a minute. I can actually use this thing and port a lot of my workflow to this tool, and I can add numerics, and I can add plotting, and I can actually run a lot of my codes in this tool, but I need a better interactive environment. If I'm going to actually use this tool, I wanna call my codes, I wanna access the file system, I want tab completion, I wanna run my scripts, I want to call out my figures, I want to actually run this as my environment. And I was used to tools like IDL, like Mathematica, like the Unix shell, and the interactive Python prompt, compared to those, is a very, very primitive tool. You can't do ls. You can't easily execute scripts and get really good tracebacks. You can't easily do good debugging.
It just isn't a very sophisticated tool for that. And so, obviously, for a grad student facing the option of either finishing a dissertation and writing it up or playing with code, the choice was easy. And so I dug into just building a better interactive shell, and thus was born IPython. Eventually, my advisor kind of called me to order, and I went back to actually getting work done, and graduated. But I put the project out there. Initially it was, as all open source projects are born, kind of a one man show for a while. It was just hosted on my web page at the university.
Then a company from Austin called Enthought offered to host it. They were the company behind the SciPy library, and they offered to host it on their CVS repositories and to host the mailing list, and today they continue to host the mailing list for the project. And they were the people behind the nascent SciPy conferences and the scientific Python community. And so at that point, IPython sort of became one piece in the puzzle of scientific Python, and that's really what gave it life as a project and not just as a one-off exercise. If it had been just that, it probably would have died as just a one-off little interactive exercise.
But because it became part of the workflow of other scientists, and other scientists began to contribute, and there was a conference that at the time was held every year at Caltech for scientists building SciPy and then Matplotlib and numerical computing tools and other tools for scientific computing, where we all congregated every year to build these tools, a community of users began to grow around not just IPython but all of these tools, where we collaborated, where we exchanged code, where we worked together on trying to build an entire ecosystem of similarly developed open source tools for scientific computing around the Python language. And today, this is now a large effort. And in fact, it was, I think, at the second or third of those conferences where Brian and I started collaborating on IPython because, I mean, Brian and I, obviously, as we mentioned, knew each other well, but we didn't start working on IPython together at the very beginning. It didn't take very long before we joined forces on the project, and it was precisely after me spending some time at one of these scientific Python conferences where we're doing courses on the project.
[00:10:27] Unknown:
Cool. That's very interesting to think of, particularly for anybody who uses IPython as part of their daily workflow, that if it weren't for Enthought offering to host the project, it may not have continued to exist as we know it.
[00:10:41] Unknown:
No. They were extremely generous, and we owe them a huge debt of gratitude because they provided support for the project at the very beginning and throughout the project, and they supported us at critical junctures. They actually funded Brian and me in 2009 and 2010, at a critical point when we were developing the network protocols that allowed us to build the machinery that eventually led to the Qt console and the modern network architecture that supports the notebook and then the multi-language support, all of the modern features that really made possible the renaissance of the project and its modern incarnation. All of that was thanks to Enthought's support and financial contributions, and they've been a fantastic supporter of the community. They've supported the SciPy conference for many years, and so we owe them a huge debt of gratitude.
[00:11:34] Unknown:
Another part of the history of the project that I was gonna add is that Fernando and I started talking about some of the general ideas around interactive computing in 2005, and almost immediately we started to think about a notebook-like interface. And from 2005 until 2011, we actually made multiple attempts to build something like this. What was it? 4 or 5 attempts, Fernando?
[00:12:08] Unknown:
Something like 5. I mean, in version 0.0.1 of the system, there's even references And
[00:12:24] Unknown:
And it really took us all those different iterations of trying to build this, trying to understand the problem, before we could do a good job of it. The other part was that a lot of the tools that we rely on, such as ZeroMQ and WebSockets, just didn't exist. And the notebook came about over three summers, in 2009, '10, and '11. We had funding in 2009 and '10 to refactor some of the old code base to get it ready to build this more network-based model. That was in 2009. And then in 2010, Fernando and I had the funding from Enthought to build the network architecture that enabled the Qt console.
And then in 2011, we built on top of that and actually created the current iteration of the notebook that continues to exist today.
[00:13:19] Unknown:
Very cool. It struck me as you were as you were talking about its genesis, how how you as scientists had this need for a really great interactive environment more than just sort of a prompt with better editing capabilities and and things like that because a lot of languages have that. But IPython, from my perspective, is really 1 of the crown jewels of the Python ecosystem. It's something that's and IPython and IPython Notebook are really kind of unique. And I realize project Jupyter now encompasses other languages, but prior to that, there really was kinda nothing else like it that I've ever seen. And it really struck me that when when other people may ask in other programming language communities, well, why should we?
What's what's the the gain for us in in sort of doing the the work to make it more make our programming language more accessible or more desirable for science work. And I would point to that and say, this is something that grew out of a need that only scientists would really have to that extent, and now the rest of the community can benefit from it because I'm not a scientist by any stretch of the imagination. I'm a I'm a tech worker and, an information worker, I guess you'd say. And the things that I can do with IPython Notebook are are just amazing from my, you know, limited perspective.
So, that's really cool. Thank you for for telling us that story. How challenging was it to port IPython to Python 3?
[00:14:51] Unknown:
So the simple answer for Fernando and me is that it was completely trivial, but that's because he and I didn't do it. In the summer of 2011, when we were building the current version of the notebook, Thomas Kluyver, who's now one of the core developers on the project, came and started to work on the Python 3 port. And he basically did the initial version of that single-handed and did an absolutely fantastic job with that. I would say it was not a trivial port, in the sense that most of the weeds in porting to Python 3 have to do with strings and bytes and Unicode, and a lot of what IPython deals with are strings.
Maybe it's a string of code or a file name or output, and so getting all of that working right was fairly nontrivial. In terms of the technical details, though, Thomas is our resident expert on that.
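To make the strings-versus-bytes point concrete, here is a minimal sketch in plain Python 3. It is not taken from the IPython code base; the variable names and the sample cell are made up for illustration. It only shows the general rule the port had to respect: text and raw bytes are separate types, and conversions between them must be explicit.

```python
# Python 3 keeps text (str) and raw data (bytes) strictly separate.
source_text = "print('café')"               # a cell of code is text: str
wire_bytes = source_text.encode("utf-8")    # data on a socket or on disk: bytes

# Decoding with the matching encoding recovers the original text.
round_trip = wire_bytes.decode("utf-8")


def mixing_fails():
    """Return True if concatenating str and bytes raises TypeError."""
    try:
        source_text + wire_bytes  # implicit mixing is not allowed in Python 3
    except TypeError:
        return True
    return False
```

In Python 2 the equivalent concatenation would often appear to work and then fail later on non-ASCII input, which is one reason a string-heavy tool needs care at every boundary where text is read or written.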
[00:15:58] Unknown:
So what prompted, I guess it wasn't a name change from IPython to Project Jupyter, but what prompted the creation of Project Jupyter as an extension to IPython, and were there any large changes associated with that in the IPython project
[00:16:15] Unknown:
itself? Well, the the, the the prompting was kind of just a a natural gradual realization as we were building the machinery for the system where we said, well, wait a minute. Everything that we're doing here when we looked at the protocol, when we looked at how the notebook and the console and the and the graphical console worked, and we said, wow, we're sending code, it's executed on the other side by a kernel and comes back, the results come back. There's nothing in here that really is very specific about Python other than at the other end, something takes that and happens to interpret that string as a blob of Python.
It could be Perl or it could be Ruby or anything else for all it cares. Right? What if what if we just abstract that over, and then all of a sudden we have basically a generic protocol to do interactive computing in any language. Right? And what if we just clean we if we if we do that, we'll have to do a little bit of work because we've probably let, we've allowed kind of some hidden assumptions about there being Python creep into the system. But if we do that, we a unique system that will be that will allow us to reuse our architecture for And so once we kind of accepted to pay that price, it took some work. Yes.
Once we accepted to pay to pay to pay the price of of abstracting over programming languages and to clean up any assumptions that we had about it being specific to the Python language, then we we saw the benefit because now all of a sudden people began actually implementing the back end, for other languages. And so we actually went through the our implementation. We cleaned we cleaned things up, and we defined how to implement what we call the kernel for other for other languages. And people began implementing kernel for other kernels for other programming languages, and we began seeing a Haskell kernel and then a Ruby kernel and then, kernels for other languages. But then, obviously, the questions began appearing. Why is it called IPython if I'm running the Julia kernel? And, obviously, the Julia community wanted to call it IJulia, not IPython.
Why do I have to run at the command line, IPython dash dash Julia? Why do I have to run IPython dash dash profile Haskell? Right? And so we we wanted to come up with a name that would really indicate that the project is language agnostic. We may have a reference implementation of the kernel that may still be in Python. Much like in the Python language, CPython happens to be the reference implementation of the language, that happens to be written in c, but maybe at some point in the future. Who knows? PyPy may become the reference implementation in a few years if it if at some point it becomes better. So for us, IPython now is the reference implementation of our protocols. It is our default kernel. It is the 1 that we develop, and a lot of our tools are developed in Python. But Jupyter represents this larger project, which is about interactive computing. The name is inspired by Julia, Python, and R being the 3 what we consider to be the 3 open languages of data science. That name came to us thinking actually about a conversation that that I was having 1 evening at GitHub, with folks who were arguing about basically the those languages. And and at 1 point, I said, look. These languages are not enemies, of each other. We shouldn't be arguing amongst amongst ourselves. Julia, Python, and R are not enemies. The enemy is closed source science. These 3 languages are all partners in having open open source tools for scientific research.
But it's not an acronym. It really is meant to simply represent three languages that are part of an open source ecosystem for scientific computing and for computing in general. It doesn't have to be scientific research. It really is about interactive computing in general.
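To make the language-agnostic protocol described above more concrete, here is a minimal sketch of the kind of message a front end sends to whichever kernel is attached. The field names follow the Jupyter messaging specification as best understood here and should be checked against the current spec; the helper name `make_execute_request` and the username are hypothetical, and the signing and ZeroMQ transport that real clients use are left out.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper for illustration only; real clients also sign the
    message and send it to the kernel over a ZeroMQ socket.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
    }


session = uuid.uuid4().hex
python_msg = make_execute_request("print(1 + 1)", session)
julia_msg = make_execute_request("println(1 + 1)", session)

# The two messages have the same shape; only the code string differs.
print(json.dumps(python_msg["content"], indent=2))
```

Nothing in the message names a programming language. The code is an opaque string, and whichever kernel receives it decides what it means, which is why the same front end can talk to a Python, Julia, or Haskell kernel.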
[00:20:05] Unknown:
I know that there's a project that I came across recently called the Beaker Notebook, which seems to be based largely on IPython and Project Jupyter, and it seems that it has the capability of executing multiple languages within the same notebook. Does Project Jupyter itself have that capability?
[00:20:27] Unknown:
Well, the IPython kernel can actually execute multiple languages. If you prefix individual cells with double percent, for example, if you prefix a cell with double percent Perl, that cell can execute in Perl. If you prefix a cell with double percent R, that cell can be in R. But that is specific to the IPython kernel. Our design is that the basic protocol refers to one kernel at a time. That kernel can define whatever semantics it wants. The IPython kernel happens to define very sophisticated semantics that can include more than one language. And in fact, the IPython kernel is capable of juggling in one process Python, Fortran loaded code, Julia, R, Bash, Ruby, you name it. You can have 20 languages juggling back and forth in the same process. So we do have that capability within the IPython kernel. But the design of the protocol is that the notebook speaks to one kernel at a time.
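The double percent prefix described above can be pictured with a toy dispatcher. This is only an illustration of the idea that a single kernel inspects the first line of a cell and decides which language should handle it; it is not how IPython implements its magics, and the function name `split_cell_magic` is made up for this sketch.

```python
def split_cell_magic(cell_source, default_language="python"):
    """Return (language, body) for one notebook cell.

    A first line such as '%%bash' selects a language; any other cell is
    treated as code in the kernel's default language.
    """
    first_line, _, rest = cell_source.partition("\n")
    if first_line.startswith("%%"):
        parts = first_line[2:].split()
        if parts:
            return parts[0], rest
    return default_language, cell_source


cells = [
    "x = 1 + 1",
    "%%bash\necho hello",
    "%%R\nsummary(c(1, 2, 3))",
]
for cell in cells:
    language, body = split_cell_magic(cell)
    print(f"{language!r} handles {body!r}")
```

The notebook still sends every cell to the same kernel. The routing to another language happens inside that kernel, which matches the point that the protocol itself only knows about one kernel at a time.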
[00:21:30] Unknown:
Data scientists have adopted the use of IPython notebooks in their work at a large scale. What is it about notebooks that lend themselves to this particular problem domain?
[00:21:41] Unknown:
I think it's the fact that, in the context of data science, you're not trying to develop a software application. You're trying to understand a specific problem, typically centered around a specific dataset, and you're trying to extract insight and communicate your understanding of that problem and of that insight as a narrative. You're trying to extract something out of it to communicate, and that communication may be to yourself, because you're gonna try to read it tomorrow, or to someone else, whether it's your partner, a client, or scientific research that you're going to publish. The point is you're executing code, but that code is not to build software. That code is to grind through your data using your tools, your Python tools, your R tools, your Julia tools, to visualize your data, to apply an algorithm, to reduce your data, and then you will likely produce visualizations, summary plots, and summary statistics. And then, typically, you will introduce a narrative in English or in whatever language you speak. You will introduce human narrative, and it's the combination of the code, the results of the code, and the human narrative that, put together, constitutes the output that you care about. And for that combination, the notebook really is sort of the perfect format. Right? The ability to have all of those things together and to share that as a compact unit that you can send to anyone, and they can directly jump into it and re-execute, and that someone can be yourself 3 weeks from now or your colleague or your audience. It's sort of the perfect atomic unit. And I think that's why it has sort of caught on like wildfire.
And as Brian alluded to earlier, this is something that we fully acknowledge we did not invent, and we wanted it because we, as grad students, had used these tools in our own research and we knew how powerful these ideas were.
[00:23:39] Unknown:
So, yeah, as you mentioned, the capability of including a narrative in line with the processing and display of the data that you're talking about is very powerful, and I've actually seen a few instances, but one in particular, of somebody actually using IPython as a platform for writing a book. The one I'm thinking of is Bayesian Methods for Hackers, I believe, is the title, by Cameron Davidson-Pilon.
[00:24:07] Unknown:
Yeah. And there's been a number of books or book like compilations of notebooks. Another example of that is José Unpingco, who started out writing a series of blog posts on doing signal processing with Python. He wrote those blog posts as notebooks and then worked with Springer to turn those notebooks into a traditional Springer book on signal processing. And we're starting to see a number of other books developed in that way. Another exciting development along these lines is that in the last month, O'Reilly Media has built support for notebooks into their publishing platform, which is Atlas, and so O'Reilly authors can now author the content that ends up in O'Reilly books and online posts as notebooks and then also distribute that content to users as notebooks.
And so that's another example of these computational narratives being authored in different contexts and distributed in different contexts.
[00:25:27] Unknown:
That's really very cool. I mean, I think that's something that's been missing. You know, when you look at how younger people just aren't consuming books like a lot of us became used to coming up in the field. Now they're using online resources like blogs and the like. I think one of the things that is causing that, or at least contributing to it, is that lack of immediacy. And I think having them be able to open up an ebook and bring up an IPython notebook with, if they're trying to learn functional programming or something, you know, a lesson in what are monads and how do I use them, that's interactive, that they can actually play with, I think that could really save technical books.
[00:26:16] Unknown:
That's something that we're very excited about.
[00:26:19] Unknown:
Yeah.
[00:26:20] Unknown:
So the IPython Notebook seems like a really great tool for educators in advanced fields. Have you seen widespread adoption in this area? And is it a focus for the project?
[00:26:30] Unknown:
Yeah. It's definitely a focus for the project, partially because we ourselves are educators, and we end up using the notebook a lot ourselves to teach. So this is one area where we definitely dogfood a lot. Just this quarter, actually today, I had the last class for this spring quarter of a computational physics course that I'm teaching here to undergrads. But we're seeing the notebook used for teaching across a wide range of fields, ranging from bioinformatics to physics to data science to computational fluid dynamics. And honestly, at this point, we're having trouble even keeping track of all the different people who are doing this now, and it definitely is a focus for us in terms of development as well.
What we're finding is that if you're writing a notebook as a single individual, the workflow and the user experience are fairly decent now. When you start to work with a large group of people, such as in a classroom, in a collaborative context where you want to distribute materials to the students and gather materials back from the students, and there's grading involved, there's still a lot of significant pain points associated with that. We actually have a new project called nbgrader, as in notebook grader, that is designed to help address some of those pain points.
And the lead developer of that is a grad student at Berkeley, Jess Hamrick. She has been working on a course at Berkeley that's been using this since January, and I've been using it for the last few months. It definitely helps, but we still have a long way to go, and there's a lot of interesting work to do. I think in terms of really making the notebook practical for teaching, you know, we really need to address these points.
[00:28:36] Unknown:
That's very cool. I can totally see that as someone who, to be perfectly frank about it, is a really crappy student. And I think that, for me, one of the things that made my classroom experience very difficult is that I definitely found there was a real gulf between when I sat there in class and just tried to absorb what was being taught to me and what I was being tested on. And so to have the opportunity to have this kind of interactive thing that I can noodle on, like, for me, I feel like I have a learning cliff, you know. I have to play with something and poke at it and prod at it and do it, do it, do it. And then all of a sudden, I really get it. And using a notebook in the classroom context, and even grading notebooks in a test environment, sounds like something that I could really get behind. That's really exciting.
Yep. So, GitHub recently added the ability to render notebooks in a repo. Did you work with them to build that integration?
[00:29:49] Unknown:
Yeah. Though, to be fair, the bulk of the hard work was done by the GitHub team, but we did coordinate with them for a long time, and we discussed extensively what would be required for it and how we could help with it. But the vast majority of the work was obviously on their internal architecture. But, yes, it was something that we were very keen on seeing happen, and it was wonderful to collaborate with the GitHub team. They were great about it.
[00:30:18] Unknown:
Yeah. It definitely seems like something that will really drive even more awareness of the presence and capabilities of IPython to people outside of Python who may have never come across it before.
[00:30:32] Unknown:
Yeah. We're really excited about that. For a long time now, we have had the nbviewer site, which allows users to see a static HTML rendering of a notebook. The way nbviewer works is you pass it a URL or a GitHub org and repo and view notebooks in that context, and that's been a very popular platform for people to share notebooks. But I think part of the awkwardness is that a lot of people are already on GitHub, browsing through repos and notebooks in that context, and it does get to be a little difficult when you click on a notebook on GitHub, at least this is how it used to be, and all you see is raw JSON data. So to be able to see the rendered notebook on GitHub is a huge improvement in the user experience.
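The raw JSON mentioned above is simply how a notebook is stored on disk, and a renderer such as nbviewer or GitHub turns it into something readable. Below is a minimal sketch under stated assumptions: the notebook is hand-written in the version 4 layout as understood here, the function name `render_plain_text` is hypothetical, and real renderers handle many more output types (images, HTML, errors) than this one does.

```python
import json

# A tiny hand-written notebook in the version 4 JSON layout.
RAW_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Some narrative text."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result",
                      "execution_count": 1, "metadata": {},
                      "data": {"text/plain": ["2"]}}]},
    ],
})


def render_plain_text(raw_json):
    """Turn notebook JSON into a readable plain-text listing."""
    notebook = json.loads(raw_json)
    lines = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            lines.append(source)
        elif cell["cell_type"] == "code":
            count = cell["execution_count"]
            lines.append(f"In [{count}]: {source}")
            for output in cell.get("outputs", []):
                text = output.get("data", {}).get("text/plain", [])
                if text:
                    lines.append(f"Out[{count}]: {''.join(text)}")
    return "\n".join(lines)


rendered = render_plain_text(RAW_NOTEBOOK)
print(rendered)
```

Because the narrative, the code, and the stored results all live in one JSON document, a static rendering like this needs no running kernel, which is what makes notebooks easy to display on a site that only serves files.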
And I think part of what we're hoping is that this drives the discussion about open and reproducible science and journalism. We're starting to see more and more people who are using GitHub and the notebook as a platform for sharing the content associated with various types of publications, in a way that allows users to come and look at exactly what they've done and reproduce it if they want to.
[00:32:07] Unknown:
Very cool. Yeah, I actually was just at the Open Data Science Conference here in Boston this past weekend, and there was a lot of discussion around open data and open access to data and the exchange of ideas around that. It's definitely a narrative that I've been seeing pop up in a lot of places: people, even just the average citizen who may not have any interest or facility in technology, are becoming more aware of the data that they produce and what sort of an impact that has on their lives, and also of the fact that there's a general need for them to become a little more well versed in how that data is used and how algorithms are present in everything that they do, from when they wake up to when they go to bed, if they live in any sort of a modern and developed nation.
So it's very cool to see IPython being used as a tool to help facilitate some of that transparency. So what are some of the most interesting uses of IPython and IPython Notebooks that you have seen?
[00:33:24] Unknown:
Well, we've tried to enable things with some of our infrastructure, and we have a gallery of interesting notebooks where we keep track of some of the interesting things that we see. It's on our wiki. It obviously doesn't have absolutely everything, because it's limited by our own bandwidth, and it's a manually curated directory. And this is obviously biased by our scientific interests, but there are things like reproducible academic publications, a section that, as Brian was mentioning already, is of particular interest to us. There are already a couple of dozen scientific papers, some of which are very high profile, that you can click on and that are accompanied by an entire GitHub repository with carefully curated collections of notebooks. Some of them have 10 IPython notebooks that you can click through, following every single step of the process to reproduce every table and figure in the paper. It's really, really beautiful to see scientists sort of saying, this is how I did all of the work in that paper, follow through, and get all of my results.
You can also see notebooks that have purely educational value, notebooks on topics like signal processing, engineering, linguistics, machine learning, physics, chemistry, math, visualization, whimsical stuff like analyzing soccer data, analyzing Wikipedia data, all kinds of topics. It's kind of an interesting collection for the public to peruse because you can learn about all kinds of things, and it's a very useful way of seeing the kinds of stuff that people are doing. There are also very notable projects that people have carried out with highly dedicated effort. For example, Lorena Barba, who is an educator that Brian mentioned earlier, recently put together a MOOC that was taught collaboratively.
She's a professor in mechanical engineering at George Washington University, and she put together a MOOC taught between herself, a colleague of hers at the University of Southampton, and another colleague of hers in Chile. The three of them put together a MOOC taught on the Open edX platform that is a large collection of IPython notebooks on numerical computing, all of it available online, all of it presented on the Open edX platform, and you can work through and execute the entire set to teach yourself numerical computing with Python end to end. Another highlight that has been extremely popular is Peter Norvig's collection of IPython notebooks. Peter Norvig, who's the director of research at Google, periodically will tackle interesting problems that he's thinking about.
And he will often post his solutions to challenging problems as notebooks. Those notebooks tend to generate a lot of traffic online. Several of his posts are listed on our gallery, and they're very, very interesting. Recently, he posted one on the traveling salesman problem, for example, which is a fascinating discussion on dynamic programming and how to tackle the traveling salesman problem, where he goes through and analyzes the problem very, very carefully. It's a great example showing precisely why the notebook is a good platform for this, because he builds these little snippets of code where he explains what he's about to do. He builds a tiny little function, then he runs it. He shows the result and he builds another small function. The code is never very large, and you can follow the discussion and understand his train of thought. He tackles really, really complex logic and builds it gradually. And then finally, a very recent project that just came online a few days ago, from a neuroscientist at Janelia Farm, a rising star in neuroscience called Jeremy Freeman, is at notebooks.codeneuro.org.
It is a demonstration that uses some of our technology, some of the Docker images that we built, and a project that comes out of IPython called tmpnb that allows you to basically spawn ephemeral notebooks in Docker containers. Jeremy has put together the tmpnb infrastructure with libraries that he has built using Spark, the distributed processing framework, together with his code for the analysis of time series data. And he's put up this beautiful system called notebooks.codeneuro.org to demonstrate the analysis of time series data taken from, in this case, you have some examples from mice and from zebrafish.
And it's an entire demonstration that you can just open and click and load up these Spark containers, with code and data and the entire Spark machinery ready to go, and learn how to use these libraries and run the demonstrations with real datasets, running this code on very interesting scientific datasets. So it's a very relevant and very timely example of using both our tools and the Spark framework, as well as Jeremy's research libraries, to demonstrate it online.
[00:38:56] Unknown:
Yeah. That's one of the really kinda incredibly compelling things about IPython Notebook, even, kinda dovetailing on what Tobias said, for members of the general public. Right? Because lately, there has been kind of an increasing fascination with consuming data in terms of these infographics that you see in places like Wired and floating around the Internet and things like that. And what's kind of amazing about IPython Notebook, even for a layperson, is it's like, you know, going from a photo to virtual reality. It's that kind of a transition going from these static infographics into an IPython notebook, because the data is live. You can actually sort of slice it, dice it, manipulate it, view it differently, change it.
It's a living, breathing thing that you can actually really sort of interact with and move through and, you know, make your own inferences about. It's really, just to say amazing is not giving it enough praise. It's kinda mind boggling, actually. So are there any notable projects that use IPython as one of their components?
[00:40:10] Unknown:
Yeah. There's actually quite a few, and we have a list on the website. We probably just wanna highlight a few right now. One, I wanna plug my day job, because it's a very interesting scientific research project funded by the Department of Energy. It's a large platform called KBase for computational biology. It's a project led by Lawrence Berkeley National Laboratory, which is a large collaboration also with Argonne National Lab, Oak Ridge National Laboratory, and Brookhaven National Lab, to build a platform for bioinformatics at scale that is meant to allow biologists to analyze large scale datasets and actually collaborate on large scale analysis, from single organisms to microbial communities all the way to plants, modeling plants in their environments, and do complex and predictive modeling across data that lives in the entire system and collaborate on that in the entire interface. The whole system runs as a highly customized system through IPython. So what is called the narrative interface is really a very complex and customized IPython interface, and that actually is my main research project in my day job as a scientist at the Department of Energy. And this is a project that has roughly 50 people working on it at the Department of Energy.
So, it's currently in its 4th year. Another project that I wanna mention, which is a project that we've collaborated with for a long time, is called Sage. It's an open source mathematics project, which is written in Python. It was started by a mathematician who was a professor at UCSD when he started it. He's now a professor at the University of Washington in Seattle. Sage is a project that basically tries to unify many open source math projects as well as writing its own code, providing a lot of new code in Python, to provide a single unified environment for math. It's not so much for numerical computing, like NumPy and SciPy and Matplotlib. Even though it uses NumPy, SciPy, and Matplotlib, it was originally designed more for pure mathematics. The creator of Sage is William Stein. He's a number theorist.
And it's a very rich project. It has used IPython since the beginning. William originally contacted me back in 2005. The terminal console for Sage has always been a very customized version of IPython, because Sage has its own custom syntax, which isn't quite Python. He actually preparses Python to allow syntax which is invalid Python. And so for him, the fact that IPython allowed a custom parser was perfect, because that's exactly what he wanted to do. Sage has had a notebook since the beginning, and there's been a very playful back and forth between the Sage notebook and the IPython notebook, where we've learned from Sage and Sage has learned from us. There's been kind of a competitive collaboration where we've learned from each other over the years. But we've collaborated for a long time. It's been a very fruitful relationship, and we're very good friends with the Sage team.
Quantopian is a startup in the Boston area that has built a collaborative platform for financial modeling, where the environment allows people to design trading algorithms that run on the Quantopian platform. Quantopian offers people the data and the compute environment. And if people have algorithms that perform well, then they can expose those algorithms to others and market them. That platform is actually built on top of the IPython machinery, running IPython kernels on top of their customized runtime and IPython notebooks. And Quantopian developers actually contribute to the project. They have made fantastic contributions, and they've developed new infrastructure and new machinery for the project, and we have a great relationship with them. Wakari, which is a hosted platform from a company called Continuum Analytics, another big player in the scientific computing ecosystem, is effectively a completely hosted IPython notebook system that they've offered for a while. We should mention that a lot of the hosted stuff that we've mentioned, such as tmpnb before, and many of our services and a lot of the hosted notebooks that you see online, are actually hosted by Rackspace. Rackspace is a company that develops a lot of Python code. One of our core developers is a full time IPython developer and a full time Rackspace engineer.
And this ephemeral service called tmpnb is hosted by Rackspace, and a lot of this infrastructure has been built with Rackspace resources. So these are some of the open source, government, and commercial projects that in one form or another are building on top of the IPython machinery. I hope that answers the question.
[00:45:30] Unknown:
Very much so. It does. That is a pretty impressive list. So definitely a lot of very cool projects for people to check out. There are a couple I've heard of, but a number that I haven't heard of as well. So I guess I know what I'm doing for the rest of the evening. So where do you see Project Jupyter going in the future, and are there any particular new features you'd like to see added?
[00:45:53] Unknown:
Yeah. So we have quite ambitious plans for the future. It's been a huge transition for us over the last 12 months, moving from the more narrow focus of just Python to the broader focus of many different languages, and I think one of the biggest challenges we have right now is scaling up the organization from a human perspective. Part of that is attracting new developers, but honestly part of it is just finding good ways of working with the developers we have. We have enough people now working on the project that it's really challenging to just get all of us working in a productive way.
Part of that is working on getting more sustainable funding for the project and also building up the user community in useful ways. In terms of new features that we're looking at, one of the big areas is collaboration, and the biggest feature that we're targeting right now on that front is real time collaboration similar to what you get in the context of Google Drive, where multiple people can open up a document at the same time, edit it at the same time, and see the edits. About a year and a half ago now, I guess, some engineers at Google approached us with a very nice prototype of this capability that integrated the Google Cloud APIs for collaboration and document storage with the IPython notebook. We currently have funding from Google for a postdoc at Berkeley who's helping us integrate the work that they did into the Jupyter notebook more efficiently, and we have some other funding that's going to be coming online to help us with that work as well.
A lot of the work that we're looking at over the next, I would say, 2 to 3 years has to do with making our user interface and architecture more modular. We're seeing a lot of people who want to take different components of the notebook and plug them into different contexts. For example, what O'Reilly Media has done is taken the back end, the architecture for running code, but their front end is not the traditional notebook user interface. They have a custom user interface and they're using part of our JavaScript code, but right now it's a little bit wonky to do that. So we're spending a lot of time redesigning our user interface to make it more modular, and part of this gets back to the user experience that people have in the notebook right now.
The latest version of the notebook also includes a text editor and a terminal. And if you've used that, you'll quickly realize that you might want to have a notebook next to a text editor or next to a tabbed set of terminals. And so we're really pushing the limits on what's possible in the web browser in terms of paneled and tabbed layouts, and we're going to be spending a lot of time working on the user interface. A lot of the things we're thinking about have to do with just the general usability and design of the notebook. While the current notebook definitely, I think, eases the pain of a lot of use cases, there's still a lot of pain points that our users experience every day with our current notebook.
And so we're gonna spend a lot of time thinking about the usability of it. I don't know, Fernando, I'm probably missing a lot here. You wanna jump in on some of this?
[00:50:02] Unknown:
No. No. I think that's a pretty good summary of the highlights.
[00:50:10] Unknown:
Well, if we get nothing else, I'm sure people will be absolutely ecstatic with just the features that you listed above, because those are pretty remarkable. Even just the collaborative real time editing of a notebook would be pretty amazing for a lot of people, I'm sure.
[00:50:27] Unknown:
Actually, one other general area that we see a lot of work on is multi user deployments of the notebook. We have a new subproject called JupyterHub that adds a multi user deployment system for the notebook that you can deploy to multiple users on a centralized server or servers, and there are different people using that in different ways. tmpnb, which we host, is one example of that, but there's also a lot of other companies, startups, scientific projects, and universities leveraging that in different ways, and that's a very active area of development for us.
[00:51:14] Unknown:
Maybe two very quick comments. One is that collaboration actually has multiple angles. The real time live collaboration is obviously a super important one. But in the context of what Brian just said about JupyterHub and these multi user things, it's actually a much more nuanced problem, because it really is about what it means to share projects that involve notebooks when notebooks are executable things. When you share things on, say, Google Docs, you're just sharing static documents. When you share projects on something like, say, GitHub, you're effectively sharing repos, where there's the forking model for repository contributions.
Neither of those models quite works well when you actually have executable state. And so there's a lot of nuance that comes into play when you're talking about sharing and collaborating on things that effectively have live executable state in memory. And we haven't completely figured out exactly what that should look like. So there's a lot of really interesting questions in that, some of which have to do with the version which is about live collaboration, but others go beyond that. There's a lot of open questions there that we'll be hammering on in the next few years. And the other point that I wanted to mention is that we're also exploring the questions that go in the direction of what we call software engineering in the notebook, which is kind of a catchphrase for asking: if I'm developing code interactively, but I want to extract maybe something reusable out of it, what would that look like if I don't simply want to copy it out into a file and turn it into a library file that I edit in a text editor? If I actually want to keep it in a notebook, because I want to be able to go back and still test it interactively, if I want to keep something which remains live because I want to keep it with live examples or something, but I still want to be able to use it as a library, how could we make the notebook environment better support those use cases? We're not quite sure what that should look like, but we feel that it's our responsibility to explore where the notebook creeps a little bit into an IDE. And that may be anathema to some people, but we feel that it's our job to at least push that boundary a little bit.
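One naive way to picture using a notebook as a library, as raised above, is to run its code cells into a namespace and pull names out of it, leaving the notebook itself as the single source of the code. This is only a sketch of the question being asked, not an approach the project has adopted: the function name `load_notebook_namespace` is hypothetical, the notebook is a hand-written example in the version 4 JSON layout, and the filter for magics is a crude heuristic.

```python
import json


def load_notebook_namespace(raw_json):
    """Execute a notebook's code cells and return the resulting names.

    Crude heuristic: lines starting with '%' or '!' (IPython magics and
    shell escapes) are skipped because they are not plain Python.
    """
    notebook = json.loads(raw_json)
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        plain_lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        exec("\n".join(plain_lines), namespace)
    return namespace


raw = json.dumps({
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["A helper, kept next to its explanation."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["%matplotlib inline\n",
                    "def double(x):\n",
                    "    return 2 * x"]},
    ],
})

names = load_notebook_namespace(raw)
print(names["double"](21))
```

Since this runs every code cell with `exec`, it should only ever be pointed at notebooks you trust, and it re-runs any expensive or side-effecting cells on every load, which hints at why the speakers describe this as an open design question.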
[00:53:42] Unknown:
Yeah. I guess one other area is that the Jupyter notebook supports interactive widgets that allow you to quickly build user interfaces in the notebook documents that are bound to data and objects in the back end kernels. Right now those user interfaces can be embedded in these notebook documents, and there's a very strong demand for people to be able to develop those user interfaces in a notebook document but then deploy them in more of a dashboard context where the consumers, the viewers of that content, may not want to see any code at all. Maybe all they want to see is a visualization widget that has some sliders and a text box and some check boxes to specify what algorithms are being run or the options passed to various functions.
And so the sort of viewing a notebook as a sort of web application slash dashboard type entity that can be deployed and used and reused in different contexts is something that we're we're thinking a lot about and there's a lot of interest in.
[00:55:01] Unknown:
That is so cool. I mean, just the thought of being able to, like, I'm thinking about in industry, in my day-to-day work. I'm relatively new to Python still; I've only been using it for the last 6 or 7 months. And to be able to go to one of the more senior Python devs and say I'm having trouble with this piece of code, and not having to, if they're working from home or something, deal with Skype or whatever the case may be, to be able to share an IPython notebook with them with the code that we can then collaboratively edit and play with and evolve.
The opportunities with that are just amazing to me. A few weeks ago, we interviewed Jonathan Slenders, who wrote ptpython. I don't know if you folks have encountered that project, but it brings IDE-like capabilities to interactive Python. So if you're at the prompt and you have an object defined, or even just, let's say, a string, you can say string dot and hit tab, and ptpython will give you, in your text prompt, a list of completions that you can then choose from to invoke the method that you wanna invoke. Have you ever considered including this kind of capability in IPython?
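For readers following along, the "string dot, then Tab" behavior described here can be approximated with nothing but the standard library. The sketch below uses Python's `rlcompleter` module; it is not how ptpython or IPython implement completion, and the helper name `complete_attributes` and the sample `greeting` variable are made up for illustration.

```python
import rlcompleter


def complete_attributes(namespace, text):
    """Collect every completion for `text`, the way a REPL does on Tab.

    `namespace` is a dict of the names visible at the prompt, and `text`
    is what has been typed so far, for example "greeting.up".
    """
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        # The completer protocol returns one match per call and
        # None once the matches are exhausted.
        match = completer.complete(text, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches


# Hypothetical prompt state: one string variable named `greeting`.
for candidate in complete_attributes({"greeting": "hello"}, "greeting.up"):
    print(candidate)
```

A richer front end would show these candidates in a menu rather than printing them, which is the kind of experience being discussed.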
[00:56:19] Unknown:
There's a thread already with the author on the mailing list this week. So if you check our archives, like, 2 days ago, there's a thread with him already.
[00:56:27] Unknown:
That's great. I wonder if, you know, perhaps conceitedly, I'm wondering if we had some small part in that, because when we interviewed him a few weeks ago, he said he had done a shell, or, you know, he had done some work to allow IPython to be used from within ptpython, but the 2 kernels were incompatible. And I said to him at that point in time, well, you know, it sounds to me like it might be worth talking to those guys, because if you could add this capability to IPython, it would be a really incredible opportunity to improve, you know, to raise the bar once again. That's great. I'm glad it's being thought about and potentially worked on.
[00:57:13] Unknown:
Now we're looking into it, and partly because apparently there seem to be really fundamental incompatibilities between Python 3.5 and the library that we need on Windows to support the terminal, called pyreadline. Apparently, it doesn't work at all on Python 3.5. And apparently the thing that this guy is using may work on Windows out of the box, and it's a pure Python dependency. So we may swap out pyreadline with his thing, and it may actually give us better capabilities than readline and actually support a richer experience. Readline, even though it seems to be a simpler dependency because it is apparently shipped, actually turns out to be a really thorny one, because on OS X it's actually not shipped by default: it's a fake readline. You get libedit, which has its own set of problems. And on Windows, it doesn't even exist, and what we have is this fake thing called pyreadline, which apparently on 3.5 doesn't even run because of changes that were made to Python. And it's not clear at all that it will ever be fixed. And so we're actually looking seriously at it.
We don't know what'll happen, but it is a possibility, even for pretty serious technical reasons.
[00:58:21] Unknown:
Very cool. I wouldn't normally recommend that one set of guests go back and listen to an interview with another set of guests, but you might actually consider giving a listen to our interview with Jonathan, because he kind of explains how some of it works, and it really seems like a very elegant, well-thought-through implementation. Honestly, talking with him really kind of impressed me. So I'm glad to hear that you folks are looking at that. And, honestly, it has moved me to go really browse some of his code, because his implementation is very, very interesting. I think it's a nice, in a way kind of simple, use case for asyncio, a very understandable use case. So it makes for some really interesting code reading.
[00:59:08] Unknown:
Martin here Martin here.
[00:59:10] Unknown:
Yeah. It's very cool having this connection between guests that we have interviewed and are interviewing. So just very cool to see how that comes about in such a broad and welcoming community as Python. So very cool. So what are some of the features of IPython and IPython Notebooks that an average user might not know about?
[00:59:34] Unknown:
So I'm gonna take a little bit of liberty and stretch the definition of the word average.
[00:59:41] Unknown:
Absolutely.
[00:59:42] Unknown:
And I think the most important point on this front is that the architecture that all of these tools are built on is a very open, flexible architecture. One of the main building blocks of this is the notebook format. So the notebook documents are just a JSON data structure that is stored on the file system, and there's a lot of flexibility around that notebook document. For example, it's really easy to read and write these notebook documents from just about any programming language you could imagine. So for example, you may have some system that's not the Jupyter notebook where there's content.
It would be, I mean, a few lines of code to import and export data from this other system to notebooks. Another example is the network protocol that we have for communicating with kernels, where kernels are the separate processes that we start that run code in a particular language. There's a lot of people thinking about and working on various, I would say, instantiations of interactive computing systems, and what we're seeing is that the people who take the time to understand our architecture end up getting, like, superpowers in terms of what they can do in a really short period of time.
One example of that is the Atom text editor that came out of GitHub. Recently someone came up with an Atom plugin called Hydrogen that can run code that you have in the Atom text editor using any of the Jupyter kernels. The person who developed this took the time to understand our network protocol and how this worked, and what they got from that is that their plugin could all of a sudden run code in any of the approximately 40 languages that we support, with a very minimal amount of work compared to what it would take to get something talking to 40 different languages from scratch.
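To give a feel for what "understanding the network protocol" involves, here is a minimal sketch of building and signing one `execute_request` message, following the message layout in the published Jupyter messaging specification as best understood here (a header, parent header, metadata, and content, each serialized as JSON and covered by an HMAC signature). Nothing is sent anywhere: a real client would deliver these frames to a kernel over ZeroMQ using the key from the kernel's connection file. The function names, the `"listener"` username, and the `b"secret"` key are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="listener"):
    """Build the four dictionaries that make up an execute_request."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,  # placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.0",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


def sign_and_serialize(message, key):
    """Return the wire frames: delimiter, signature, then the JSON parts."""
    names = ("header", "parent_header", "metadata", "content")
    parts = [json.dumps(message[name]).encode("utf-8") for name in names]
    signer = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        signer.update(part)
    signature = signer.hexdigest().encode("ascii")
    return [b"<IDS|MSG>", signature] + parts


# Hypothetical key; a real one comes from the kernel's connection file.
frames = sign_and_serialize(make_execute_request("1 + 1", "demo-session"), b"secret")
print(len(frames), "frames ready to send")
```

Because every kernel speaks this same message format, a client written once against it does not need to know which language the kernel runs.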
And so I think the thing that I would encourage people is that, you know, if you're interested in taking what we've done and using it in different ways, it will be really worth your time to spend the time understanding the different building blocks that we have, the network architecture, and all of this. And there's a lot of potential for reassembling these building blocks in new and interesting ways.
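As a rough illustration of the "few lines of code to import and export" point about the notebook format, the sketch below converts a list of text snippets into a version 4 notebook structure and back using only the `json` module. The function names and the `(kind, text)` snippet shape are invented for this example; the cell fields shown are the minimal ones from the nbformat 4 schema as understood here, and the `nbformat` library is the safer choice for real work because it validates documents.

```python
import json


def snippets_to_notebook(snippets):
    """Turn (kind, text) pairs into a minimal nbformat 4 notebook dict."""
    cells = []
    for kind, text in snippets:
        if kind == "code":
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text,
            })
        else:
            cells.append({
                "cell_type": "markdown",
                "metadata": {},
                "source": text,
            })
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


def notebook_to_snippets(notebook):
    """Recover (kind, text) pairs from a notebook dict."""
    snippets = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # The format allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        snippets.append((cell["cell_type"], source))
    return snippets


# Hypothetical content from "some other system".
notes = [("markdown", "# Sine table"), ("code", "import math\nmath.sin(0.5)")]
document = json.dumps(snippets_to_notebook(notes), indent=1)
print(notebook_to_snippets(json.loads(document)))
```

Writing `document` to a file with an `.ipynb` extension is what would let the notebook application open it.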
[01:02:39] Unknown:
So is there anything in particular that you would like to ask our listeners to help with?
[01:02:46] Unknown:
Yeah. I mean, obviously, first of all, as with every open source project, join us as developers. We're an open source project. There's lots of work to do in multiple areas. There's front-end work to do in the notebook. On the UI, there's lots of JavaScript work to do. There's also work in the back end. Brian already mentioned the document publishing pipeline, the idea of writing notebooks and turning them into documents, into websites, into blog posts, into papers, into books. It's a complex pipeline, but it's a really interesting problem that has many different aspects and many targets. And that can be actually a pretty fascinating one, and one that is somewhat isolated from other aspects of the project.
And so it doesn't necessarily intersect with everything else. There's a piece of the project that is somewhat technical but also well isolated, which we haven't talked too much about today, which is all of the parallel computing stuff: a little bit more technical, a little bit more on the scientific end, but also something where people can contribute. But an important and completely different one is that, as the community grows and as the project grows, one part that doesn't scale is us. We cannot be everywhere at once. We cannot fly everywhere. We can't be in multiple locations. And so organizing meetups, organizing community events in other places, helps. And one thing that we do have a little bit of is resources. If folks in the community want to organize a meetup and they want to basically run a small community event, we can probably send your way some resources, even some financial resources, to support those events, because we are getting some funding that will allow us to support the community in that fashion. And that is one way that actually helps us, because it means we don't have to be the ones who have to get on a plane to be everywhere.
And also be a little bit patient with us, in the sense that we know that our documentation in some aspects of the project is not ideal. We are working hard to improve it. We'll get there.
[01:05:00] Unknown:
So is there anything that we didn't ask you about or didn't talk about that you would like to bring up?
[01:05:06] Unknown:
Yeah. I think we do wanna make sure that we credit two aspects of the project. One is the rest of our core developers. Even though Brian and I are here and have spoken a lot today, the project doesn't exist because of us. It really exists because of the rest of our core developers. We have a number of folks who work either full time or nearly full time on the project and who are absolutely fantastic. And they're really the ones who make all the magic happen. Jonathan Frederic, Kyle Kelley, Min Ragan-Kelley, Thomas Kluyver, Matthias Bussonnier, Jess Hamrick, Damian Avila, Sylvain Corlay, Jason Grout. Brian, am I forgetting someone?
[01:05:52] Unknown:
Some of the folks at Google?
[01:05:54] Unknown:
Yes. Kester Tong in particular, Kayur Patel. And I may be forgetting Paul Ivanov, who is a little bit less active lately but was a very active core developer for a long time in the project. And also our institutional sponsors, not just the institutions that happen to employ Brian and myself, Cal Poly and UC Berkeley slash LBL, but the fact that we have institutions who support our work by funding it: the Alfred P. Sloan Foundation, the Gordon and Betty Moore Foundation, Rackspace in particular has supported us enormously, Microsoft, Google, the Simons Foundation, Continuum Analytics, Enthought.
If it weren't for these entities, the project just wouldn't exist. Without their resources, it wouldn't exist. We also wanna give a shout out to NumFOCUS. NumFOCUS is a 501(c)(3). It is a foundation that actually supports the project in a slightly different manner. It's a foundation where Brian is on the board of directors. I'm an ex-member of the board of directors. I'm actually one of its founders. And it's kind of a parallel entity, if you will, to the Python Software Foundation, but one that was created by the scientific computing community to really serve as an umbrella foundation for the open source scientific computing community, to serve the needs of scientific projects that are open source projects. And it is the entity that hosts the IPython slash Jupyter project as a 501(c)(3).
And NumFOCUS is our fiscal sponsor. It's our fiscal home. It's what allows us to receive tax deductible donations. And it's a very important project in the open science community. It's not just a Python project. It hosts the Julia project. It hosts the rOpenSci community. It hosts the Software Carpentry project and the Data Carpentry project. And without all of these people and these projects, we wouldn't be anywhere.
[01:08:08] Unknown:
Yeah.
[01:08:10] Unknown:
That's great. It's really heartening to see the amount of community support and contribution that goes into a project like this, and just the amount of time and effort and dedication that everybody puts into it.
[01:08:24] Unknown:
That's great. At this point in time, we usually do the picks. So, Tobias, why don't you get us started?
[01:08:31] Unknown:
Sure. So my first pick today is the Dayworld trilogy by Philip José Farmer. It's a sci-fi trilogy set in the future where population has grown to the point where it's not possible for everyone to be awake and active at the same time. So people are divided up into specific days of the week during which they're active, and they're put into suspended animation during the rest of the week. And so houses are outfitted with a basement with the suspended animation pods for the various families who live in that household during the different days of the week. And in this world, the most heinous crime is to be what they call a daybreaker, which is somebody who is awake all days of the week.
And the protagonist of the story happens to be a daybreaker, and it is just a very compelling and thought-provoking exploration of the effects of population as it expands. It's just very well written, very entertaining; I highly recommend it. My next pick is a site called readruler.com, which is a site that will link up to your Pocket account and analyze all of the articles, and it allows you to tag them with the length of time that it will take you to read them. And in the settings, it has a reading sample so that you can calculate your reading speed, so that it can more accurately tag the various articles. So I have a large number of articles in my Pocket account that I have been meaning to get to, and I found this is a really interesting way of being able to pick up the low-hanging fruit and see, okay, these are the very short articles, without necessarily having to go through and look at them individually.
And I am going to leave it at that for today. So, Chris, please take it away.
[01:10:31] Unknown:
Cool. I'm gonna commit a little bit of heresy here. On a Python podcast, I'm going to recommend a Ruby thing. I'm gonna recommend it because it is so good that I think that you can learn something about programming by watching these. It doesn't matter what programming language you're in. Yes, they're oriented towards Ruby, but the fundamental concepts that Avdi puts forth are really interesting and could be useful to anyone, I think. There's a series of screencasts called RubyTapas. And what's amazing about them, and I know everybody and their uncle has started producing paid screencasts these days, is they are the most perfect exemplar, from my perspective, of just little itsy bitsy, exactly what they say, tapas, little tidbits, little, like, 3 to 5 minute videos that show you one concept and only one concept, and do it really clearly, in a really straightforward way, with a little bit of humor. They're just phenomenal. I still subscribe to them even though I don't use Ruby much anymore.
I've kinda transitioned to Python in my day job. I still write Ruby with Chef, but I still keep giving Avdi my money because they're just that good. So there. The next pick that I have, we've mentioned her, or rather I've mentioned her, a bunch in the podcast, but one of our readers mentioned that I had not actually picked her work yet, and so it hadn't been in our show notes. I just wanted to pick codenewbies.com. It's a site and associated gaggle of communities put together by Saron Yitbarek, who we hope to have on our show at some point in the future. She is a one-woman dynamo when it comes to helping people learn how to code and get their first job in the technology field.
Really, it's an amazing resource, both obviously for people who are trying to learn how to code and, from my perspective, even if you're an old hand, it's a really great community. Her podcast is amazing. Her forums are great. You can help new people out and probably learn something while you're doing it. It's definitely worth, if not looking into for your own sake, being aware of, just because it's amazing work that she's doing and it deserves recognition. And for my last pick, I'm going to pick a Twitter client for Macs called Tweetbot.
And it's one of those pieces of software that really just, like, hits the spot. The UI is beautiful. It is featureful. It does everything that I need it to do and more. It's very intuitive. The authors are very responsive. It works great on the Mac and on my iOS devices and syncs my tweets in between, so that, you know, I can stop reading at work or at home and then start reading on the subway, and it keeps my place. It's, you know, well worth its weight in gold, worth its price. I forget how much it is, but it's pretty cheap. So, yeah. That.
And I think that's about it for me this week. Brian, why don't you give us your picks?
[01:13:46] Unknown:
Yeah. So there's actually 2 books that I'm really enjoying right now. The first is a new book out by Joel Grus. I think that's how you say his last name; apologies if not. It's an O'Reilly book called Data Science from Scratch, and part of what's fantastic about Python is that there's all of these libraries for doing data science, such as pandas, statsmodels, scikit-learn, and those libraries are great, but I think they have a weakness in that they end up being very easy to use black boxes that don't necessarily help someone who's learning about data science to understand what the algorithms are actually doing.
And I think getting that deeper knowledge of what's actually going on is really important. And the philosophy of Data Science from Scratch is that he walks through how to implement a lot of the core algorithms in data science from scratch, without relying on scikit-learn, for example, or pandas. And it's at a really nice level where someone who knows the basics of Python and NumPy and matplotlib is going to be able to pick this up and learn a lot about the internal details, and so I've been really enjoying that, especially as I think about how to teach data science to undergrads. And then the second book is William Cleveland's Elements of Graphing Data.
I got this a few months ago and I've been spending time with it, and it's just an extremely practical, nice book about visualizations and plotting. It's the type of book where, you know, I spend 5 minutes with it and I feel like a lot of things that I had fuzzy thinking about in terms of visualization all of a sudden become crystal clear, like simple questions of should the tick marks on your axes be pointing inwards or outwards? And he says, well, you have data on the inside of the frame, and if your tick marks point inwards, there's a good chance that the data is going to overlap the tick marks, making it difficult to see them, so they should point outwards.
And obviously it's not a universal rule that you always must follow, but it's a very practical way of thinking about choices like that, which in the past I think I would have approached from more of a visual aesthetic perspective. And so it's a wonderful book.
[01:16:39] Unknown:
Yeah. I guess I'm in kind of a little bit less pragmatic of a mindset right now. In terms of books, a book that has resonated a lot lately with me is Republic, Lost by Lawrence Lessig. It's really a fantastic read, I think, for today's world. Lessig kind of took a step back; a lot of his work had been about code and the culture of the Internet, and this is really a book about Congress and about the modern world of politics. And what I find fascinating about it is that it's a book about how humans with good intentions, fundamentally trying in many ways to do the right thing, can do things that are so problematic to society.
And I think it's a book that has very enlightening lessons to teach many of us, and it asks really important questions about society. And the other one is actually an author more than a specific book, from Colombia. I've heard, in the geek world, because I'm Colombian, that a lot of people in the US like a Latin American author called Jorge Luis Borges, because a lot of his poetry and a lot of his short stories have a very kind of logical, mathematical bent to them.
And Colombia has an author that in my mind is much less well known than Borges, but who I would recommend, because there's echoes of Borges in him, but with a little bit of a García Márquez twist. His name is Álvaro Mutis, and he has both short stories and poetry that have some echoes of Borges, with a really crazy kind of character who is a wanderer, a traveler through the jungles of Colombia, a very interesting character called Maqroll el Gaviero. And I would highly recommend his novels to people who have a love for Borges' logical mindset but at the same time are interested in adventure. So, again, Álvaro Mutis. He is available actually in English, but he's not very well known. And so I figured I would make some kind of slightly oddball picks.
[01:19:10] Unknown:
Great. Well, thank you very much, and thank you both for taking the time to come on the show with us today. So for anybody who is interested in following you and the things that you guys are doing, what would be the best way to do that?
[01:19:26] Unknown:
Twitter or email in my case?
[01:19:28] Unknown:
Yeah. Honestly, Twitter these days has become an invaluable way where we keep in touch with users, we hear about things, so that's a great place to go for that.
[01:19:37] Unknown:
Okay. So what Twitter handle should users be following to do that?
[01:19:46] Unknown:
So, the Twitter handle for Jupyter is @ProjectJupyter and the one for IPython is @IPythonDev. And then my Twitter handle is @ellisonbg, and Fernando's is...
[01:20:08] Unknown:
@fperez_org.
[01:20:16] Unknown:
Excellent. So, again, thank you very much for coming on the show. It has been a wonderful discussion, and I learned a whole lot about the IPython and Jupyter projects that I never knew before. And I definitely think I have a lot of things to read up about. So thank you very much.
[01:20:30] Unknown:
Me too. And I also just wanna really quickly, in addition to thanking Brian and Fernando, thank Travis Oliphant for actually hooking us up with these 2 guys, because the IPython notebook project is one of the first things that, when Tobias and I were saying, hey, we should do a Python podcast, we both sort of batted around as, you know, in my opinion, one of the coolest things in the Python ecosystem. So we really wanted to talk to you from the very beginning. And, once again, thank you, Travis, for hooking us up.
[01:21:05] Unknown:
Thanks for having us. We really appreciate it. And, obviously, Travis is another one of our long-standing friends, supporters, colleagues, and sort of heroes of our community. So thanks, Travis.
[01:21:16] Unknown:
Yeah. Thanks so much for having us on. It's been a pleasure.
Introduction and Host Details
Interview with Fernando Perez and Brian Granger
Guest Introductions
Physics and Computer Science Connection
Introduction to Python
Genesis of IPython
Project Jupyter
Multi-Language Support
Adoption in Data Science
Notable Projects Using IPython
Future of Project Jupyter
Community Involvement and Support
Acknowledgements
Picks and Recommendations