For show notes and other content, visit our site at http://www.pythonpodcast.com
Brief Introduction
- Date of recording – Apr 28th 2015
- Hosts – Tobias Macey and Chris Patti
- Overview – Interview with Travis Oliphant
Interview with Travis Oliphant
- Introductions
- How did you get introduced to Python?
- I’m curious what inspired you to create NumPy and SciPy?
- Why did you choose Python for those libraries?
- Numeric, Jim Hugunin
- Morphology library in NumArray
- For those of us who aren’t in the know, can you provide a brief definition of what data science is and how you got involved in it?
- Term coined by DJ Patil
- Answer: Anybody who takes data and tries to derive insights from it
- Nobody really knows what this means
- Can you tell us the story of how Continuum Analytics came to be?
- What are some interesting projects that you have worked on with Continuum Analytics?
- Can you explain a bit about what NumFocus is and how it got started?
- How can our audience get involved with NumFocus?
- For someone just starting out in the data science and data analytics space, what advice would you give?
- Download Anaconda, learn as much Python as you can
- Google search “Data Analysis in Python”
- IPython Notebooks in data analysis
- R community
- Meetups
- Online classes
- R Community can be helpful
- Of your myriad achievements, what are you most proud of?
Picks
- Tobias
- Used bookstores
- Cloudy with a Chance of Meatballs
- Kickin’ it Old School
- Chris
- Travis Oliphant
- Data Carpentry
- Tracy Teal (@tracykteal)
- Patterned on Software Carpentry
- Brain Science Podcast – Ginger Campbell, MD
- Money, Bank Credit and Economic Cycles
- Travis Contacts
- Twitter:
- Travis – @teoliphant
- NumFocus – @numfocus
- Continuum Analytics – @ContinuumIO
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
[00:00:00] Unknown:
How did you guys get started with this? What's driving you?
[00:00:03] Unknown:
We both listen to a lot of podcasts, and I've been using Python for a while. Chris has been interested and has recently started using it, and there just haven't been any Python podcasts out in the wild for a while now, because all the ones that had been around stopped producing new content. So we decided, hey, why not us? Cool.
[00:00:25] Unknown:
Yeah. It was definitely one of those things where I feel like Python has such a thriving community. I listen to podcasts for other stacks, like Ruby Rogues, and I just thought to myself, it is a crying shame that Python doesn't have something in this space to do outreach, because there are a lot of people who listen to podcasts to keep informed while they're working out or driving or whatever the case may be.
[00:01:07] Unknown:
Hello, and welcome to Podcast.__init__. Thank you for joining us. Today we are recording on April 28, 2015. Your hosts, as usual, are Tobias Macey and Chris Patti, and tonight we're interviewing Travis Oliphant.
[00:01:23] Unknown:
Travis, why don't you introduce yourself?
[00:01:26] Unknown:
Oh, thanks. I appreciate being here, Tobias and Chris. It's a pleasure. My name is Travis Oliphant. I've been working with Python since about 1998, so for a long time. I was a scientist, really, or someone passionate about engineering and science, and I came across Python as a graduate student. I really loved it, ended up getting pulled into and addicted to the open source community and the ability to change the world through interactions with that community, and have been a part of the scientific ecosystem ever since.
[00:02:00] Unknown:
So I'm curious what inspired you to create NumPy and SciPy, and why you chose Python for creating those libraries?
[00:02:08] Unknown:
Yeah. SciPy was my real passion when I was a graduate student. I was studying electrical engineering and then biomedical imaging at the Mayo Clinic. I had five-dimensional derivatives I needed to take, and I wanted to do it at a high level and not have to write C code all the time. I could write C, I could write Pascal, but I really liked the expressivity of Python. I started to use it, and then a year later I came back to code I'd written previously and I could still read it. That was the opposite of the experience I'd had with Perl three years earlier, when I'd written some Perl code to do some high-level manipulation of scatterometer data coming off the tape. When I went back and tried to look at that code, I didn't understand it, my own code.
I remember that moment when, a year later, I looked at the same code and said, I still get this. And I had this really unusual feeling of loving to program and just having fun with it. Maybe it was because it felt powerful; it felt like I could do things quickly. People have expressed this in many ways: it fits in your head, it doesn't get in your way, it leaves room for you to think about your problem rather than the programming. I don't know all the reasons, but I was hooked, and as a student I got excited to be able to create more. So I looked around and said, I want to be able to do ordinary differential equation solving. I wanted to simulate an MRI machine, and the fundamentals of MRI are the Bloch equations, which describe the magnetization vector. It's a fairly straightforward simulation, but I needed an ordinary differential equation solver, and one didn't exist in Python.
So I went out and found one on the Internet. It was written in Fortran, and I connected it to the Python interpreter by writing a C extension to Python by hand, so I could specify the equations in Python at a high level but have the Fortran solver ultimately do the work. That was sort of the start of SciPy. I ended up connecting integration solvers and some optimization libraries, and I released them as Multipack in roughly 1999. I was just excited about the possibility of creating this cool library that lets you do high-level coding in a language that was open, so you could share it and other people would use it. That was the start of my excitement about SciPy.
Then, as I graduated and got a job as an assistant professor at Brigham Young University, there was some energy in the community around a new array object being promoted called Numarray. When I had joined the Python ecosystem, Numeric, written by Jim Hugunin, was the array object that I used and built SciPy around. Numarray was this other array object that was emerging, promoted by the Space Telescope Science Institute, and it had some improved features they needed. Then one day I saw a library come out that I'd always wanted: a morphology library. I was a medical imaging student and I really wanted a morphology library in SciPy, but it came out for Numarray.
At that point I thought, oh, we're building two communities of libraries on different array objects. I became frustrated; this was not good. A few months later, that happened to coincide with a class I was teaching being canceled, so I had four months with no class and just my research to work on. Couple that with this itch and passion to merge these communities, and the feeling that I'd been around long enough to be able to do something about it. So I felt an obligation, a duty, and then a desire to work on it.
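As a rough illustration of the workflow Travis describes (equations written in Python, integration done by compiled Fortran), here is a minimal sketch of the Bloch equations solved with SciPy's `odeint`, which wraps the Fortran LSODA solver from ODEPACK. This is not Travis's original code; the field strength, relaxation times, and initial state are arbitrary assumptions chosen only for the example.

```python
import numpy as np
from scipy.integrate import odeint  # SciPy wrapper around the Fortran ODEPACK solver LSODA


def bloch_rhs(M, t, gamma, B, M0, T1, T2):
    """Right-hand side of the Bloch equations for a static field B.

    Precession: gamma * (M x B). Relaxation: transverse decay with time
    constant T2 and longitudinal recovery toward M0 with time constant T1.
    """
    precession = gamma * np.cross(M, B)
    relaxation = np.array([-M[0] / T2, -M[1] / T2, -(M[2] - M0) / T1])
    return precession + relaxation


# Illustrative parameters in arbitrary units (assumptions for this sketch only).
gamma, M0, T1, T2 = 1.0, 1.0, 1.0, 0.5
B = np.array([0.0, 0.0, 2.0 * np.pi])  # static field along z
M_start = np.array([1.0, 0.0, 0.0])    # magnetization tipped into the xy-plane

t = np.linspace(0.0, 10.0, 2001)       # time step of 0.005
M = odeint(bloch_rhs, M_start, t, args=(gamma, B, M0, T1, T2))

print(M[200])  # state at t = 1.0
print(M[-1])   # by t = 10, Mz has recovered to about M0 and Mx, My have decayed
```

With the field along z, the longitudinal component decouples and should follow Mz(t) = M0(1 - exp(-t/T1)), while the transverse magnitude decays as exp(-t/T2); those closed forms make a convenient check on the numerical solution.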
[00:06:00] Unknown:
That's really great. It's interesting: I worked on the Human Genome Project quite a number of years back, and at that point everyone there was moving on from this crusty old Perl, and everyone was really excited about the potential for using Python in that sphere. Based on the timeline you're mentioning, it seems like a whole bunch of things were coalescing to provide for a Python explosion in the sciences, all around the same time.
[00:06:40] Unknown:
That's right. That's exactly right. It was about 2006 that I wrote NumPy, which built on Numeric and pulled in the features of Numarray. To me, that was a critical moment, because it was a challenging thing to do. I was not trained as a developer or a computer scientist; I was trained as a scientist and cared about what people did with the code. I had learned enough C and seen enough of the Python C API to be dangerous. I became a core Python developer at the time to try to promote some of the NumPy array structure into Python itself, to avoid any future problems of array incompatibility.
A lot of the work I did around that time was a real learning experience for me in understanding what software engineering and architecture are like. I made mistakes; knowing what I know now, I would have done things a bit differently. But that's the great thing about the Python community: people dive in where they can, support what they can, and then move on from there and iterate. Around that time, John Hunter had also created Matplotlib. In 2001, we put together SciPy out of the early work I'd done in '99.
I organized a library of tools, so you started to get this critical mass, and the IPython interactive environment was also emerging at the time. Creating NumPy in 2006 really helped solidify which array object everyone agreed on. I did a lot of work to organize the community, and that work was a lot of writing code, a lot of writing examples, a lot of writing documentation, and a lot of writing emails to encourage everybody to use it. I remember when John Hunter removed support for Numarray and Numeric in Matplotlib, leaving just NumPy support. That was about 2007, and it was a big deal. After that, everybody followed suit, and the explosion occurred; a lot of things came together, as you said. One of the great things about the Python ecosystem is how many people have participated in making it great, and the scientific computing ecosystem is no different.
It does take individuals taking risks, though. It was not good for my academic career to spend time on NumPy: none of the folks who would eventually vote on my tenure application cared about NumPy. But I felt it was the right thing, and the important thing, to do for the world.
[00:09:01] Unknown:
We're certainly glad you did.
[00:09:03] Unknown:
We're definitely all happy that you did, because it has become the basis of a large amount of scientific and numerical code in Python, and even people who aren't doing any hard science or complicated maths have found uses for it.
[00:09:21] Unknown:
Yeah. And pandas is a similar story. If you talk to Wes, you'll hear a DataFrame story: a lot of other folks were using data frames, but he wrote one and got his employer to let him open source what he had worked on, and now it's become the standard for people doing data analysis. And there was John Hunter before him with Matplotlib, and Fernando Perez and Brian Granger on IPython. The scikit-learn team is another one that has just exploded since 2009. It's fantastic to see all this, but it does take effort. That's the thing I definitely want to emphasize: somebody has to have the courage to act, even in the face of uncertainty.
And not everything you do succeeds. Sometimes you write something that goes in the wrong direction, but if you're open to feedback, you can usually correct course. With NumPy, everybody wanted it and was excited about it. They were unsure it could be done, but once I started making progress and people could see I was going to make it, I got a lot of support.
[00:10:23] Unknown:
I'll bet. So for those of us who aren't in the know, can you provide a brief description of what data science is and how you got involved in it?
[00:10:34] Unknown:
Data science, man. That's a term that was coined by DJ Patil. It's being used as a coalescing point for a general concept: anybody who takes data and tries to get insight from it. Most people who are data scientists do a little bit of applied mathematics, a little bit of system administration, and a little bit of coding, and they have to put all three together. Data science is a popular term, but I think a lot of people don't really know what it means. Looking back, in graduate school I was taking scatterometry data from satellites and estimating wind speed. That was data science to a degree, so science has been doing data science for a long time.
But today it's more popular because business professionals are doing it too. The marketing folks are doing it too. People look at their logs and say, we've got to get information from this to know what to sell to people. So there's more money around it now, and so people use the term. The cool thing is that the tools we built for scientists, like NumPy, now apply to a much larger group of people. And Python is accessible to people besides scientists and programmers; with the right tools around it, it's accessible even to business analysts.
So this whole ecosystem is now available to a lot more people. That's exciting.
[00:11:57] Unknown:
Absolutely. I have two things to say. The first is, it's interesting that you talk about data science becoming this almost supercharged term that the marketers have picked up. It's funny how many of those there are; they've become more and more common in recent years, things like Cloud or DevOps. You have this core group of people like yourself who build a thing that does a thing, and it fills a niche, and suddenly a term gets applied to it and it takes on a life of its own and takes off.
[00:12:24] Unknown:
Right. And for data science, pandas has been really helpful in orienting things. Pandas is built on top of NumPy and orients it toward a particular audience. NumPy is a multidimensional array, and I'm very excited about the multidimensional aspect of it; I was doing five-dimensional derivative calculations in grad school, so that was in my head at the time. But pandas is just a two-dimensional table.
It's a very simple structure, and you could do it in NumPy, but pandas added an API on top and a few operations. The operations are very simple, and they made it more accessible. I think Python generally makes it easy to create objects and structures that make coding and calculations accessible to others, and there's still a rich opportunity in a lot of spaces to do that. That's what's exciting: it's not done, not by any means. I've always been passionate about creating technology that lets even more get built around it: looking at the fundamental blockers that cause disconnects and trying to resolve some of those, so that we get a cooperative explosion of additional ideas on top.
That opportunity is still available right now, and Python is still the right language to do it in. It's got its challenges, and there are lots of things I'd love to see different in Python, but it has critical mass, it's accessible, people can understand it, and it's still a great language for all of that.
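To make the point about pandas concrete, here is a small sketch of a hypothetical two-column table, grouped and summed first with plain NumPy and then with pandas. The column names and values are invented for the illustration; both approaches give the same totals, but pandas exposes the operation as a single named method on a labeled table.

```python
import numpy as np
import pandas as pd

# A tiny two-dimensional table: one categorical column and one numeric column.
region = np.array(["east", "west", "east", "north", "west"])
sales = np.array([10.0, 4.0, 6.0, 3.0, 1.0])

# Plain NumPy: map each label to an integer code, then sum the values per code.
labels, codes = np.unique(region, return_inverse=True)  # labels come back sorted
totals_np = np.bincount(codes, weights=sales)
print(dict(zip(labels.tolist(), totals_np.tolist())))

# pandas: the same data as a DataFrame, with grouping built into the API.
df = pd.DataFrame({"region": region, "sales": sales})
totals_pd = df.groupby("region")["sales"].sum()
print(totals_pd.to_dict())
```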
[00:13:57] Unknown:
Absolutely. And dovetailing with what you were just saying, and with the point you made about these tools having broader and broader applicability: I work in the infrastructure-as-code space, with tools like Chef, Ansible, and SaltStack, and over the last few years it's been really interesting watching the people who care for and feed infrastructure start to leverage these amazing data science tools for things like visualization, because it's really easy to end up with huge masses of data that become very ungainly. Scientists have been dealing with this stuff for longer than we've been alive, and now we have computational tools that make it easier than falling off a log. It's really cool to see what happens when people from all kinds of disciplines, fields, and industries start to leverage this stuff; only goodness can result.
[00:15:07] Unknown:
I agree. Python has allowed a lot of people to cooperate in building interesting tools that otherwise wouldn't talk to each other. That's a very interesting aspect of Python.
[00:15:19] Unknown:
Yeah. It definitely seems to have become the lingua franca of anybody who's trying to do integrations across a large variety of problem domains, because of the myriad libraries available for it and the power and flexibility of those libraries.
[00:15:35] Unknown:
Yes. Python got its strength as glue. I think we've only just started to see how strong a glue it will be. It's a glue that becomes like Legos.
[00:15:48] Unknown:
It's funny that you mention Legos, because one of the architects where I work likes to use that term when talking about core components that get built out in more general ways. And it's a really good term, especially for things like this that people combine in interesting ways. You look at a little kid with a pile of Legos, and it's kind of amazing what they come up with sometimes.
[00:16:10] Unknown:
Yes, just like with Legos. I've heard the word Pythonic for a lot of years, and one angle of being Pythonic is having the right structure in your Legos, so they can connect to each other. You want to make sure you can build on top and have different layers of abstraction. That's powerful. So I'm still really excited to be a part of the Python community. It's been a tremendous ride, and I'm still pretty passionate about it.
[00:16:47] Unknown:
Great. So can you tell us some of the story of how Continuum Analytics came to be, and your position with that company?
[00:16:57] Unknown:
Yeah. I started in academia and really loved open source, but I also love markets. In my mind, open source is too valuable to the world to be left to volunteerism alone, so I want to figure out and support efforts in the marketplace that connect what we buy to what gets built in open source. I left academia about seven years ago, went to industry, and started working as a consultant. In the process I saw a lot of opportunities where open source was solving problems that businesses needed solved, and over that time I built up some ideas; in particular, I saw an opportunity in bringing these tools to the data analytics world. So Peter and I created Continuum to connect the scientific codes we were very aware of, and had been a part of, to larger data analysis problems, in particular to help experts everywhere build tools faster and more easily.
Our motto, our mission, our goal is to connect expertise to data. A lot of people believe, or hope, that there's going to be some magic predictor that could take your data and just give you results: input data goes in, and out comes exactly what you need to do next, as action statements. I would love that too, but we're a long way from being there. In some cases you can get close, but generally what's needed is the ability for people who understand the domain to quickly put together solutions they can iterate on, because the data changes and the problems change, and you need to come to an answer as fast as possible. Python's ideal for that. So our mission is to use Python and build on Python to make it easy to build solutions, including dashboards, visualizations, and even full applications, very quickly from the large data people have.
That was the idea around Continuum, and we wanted to support open source. One of our metrics of success is how many open source projects we're contributing to and releasing, because one of our reasons for existing is to keep building this community that I find so compelling.
[00:19:07] Unknown:
Well, I think by that metric you can be considered a raging success. I'm not as well versed in this area as my cohost, and Tobias was filling me in. I'd heard of Numba and NumPy, but I had no idea you folks have a whole mini universe of projects that you've brought into the community. It's impressive and really cool.
[00:19:38] Unknown:
I appreciate that. We do try to focus a bit on visualization. Bokeh is our visualization tool, and the goal there is that you don't have to write JavaScript: you can write Python and get interactive visualization in the browser. And then there's Blaze. I'm really excited about Blaze, though it's a topic for another podcast. It's all about helping people write expressions at a high level and translating them to the back end where the data sits. It solves a problem: today, when people have data problems, the first question they ask is where is my data, because that defines how they talk about their data problems. If it's in a database, they write a SQL statement. If it's in CSV files, they write some parsing code. If it's in HDFS, they either use Spark or write some Hive, or do something different depending on where it is. We don't think it should be that way. We think you should understand your data as a high-level object, a data frame or an array, and then write your code. Where the data sits on the back end is an implementation detail, and you can map your expression to wherever it sits.
So much like SQLAlchemy does that for data and databases, Blaze does this for data everywhere.
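Blaze's actual API is not reproduced here. As a stdlib-only sketch of the idea Travis describes, a backend-neutral expression evaluated by different engines, the hypothetical `SumWhere` expression below runs once against in-memory Python rows and once after translation into SQL for SQLite. All names (`SumWhere`, `run_on_rows`, `run_on_sqlite`, the `orders` table) are invented for the illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SumWhere:
    """Hypothetical backend-neutral expression: sum(column) where key == value."""
    table: str
    column: str
    key: str
    value: str


def run_on_rows(expr, rows):
    # Backend 1: in-memory Python rows (for example, parsed from CSV files).
    return sum(r[expr.column] for r in rows if r[expr.key] == expr.value)


def run_on_sqlite(expr, conn):
    # Backend 2: translate the same expression into a SQL query.
    # Identifiers are interpolated, so they must come from trusted code, not user input.
    sql = f"SELECT COALESCE(SUM({expr.column}), 0) FROM {expr.table} WHERE {expr.key} = ?"
    return conn.execute(sql, (expr.value,)).fetchone()[0]


rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 4},
    {"region": "east", "sales": 6},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:region, :sales)", rows)

expr = SumWhere(table="orders", column="sales", key="region", value="east")
print(run_on_rows(expr, rows), run_on_sqlite(expr, conn))  # 16 16
```

The expression object carries no knowledge of where the data lives; only the two `run_on_*` functions do, which is the separation the SQLAlchemy comparison points at.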
[00:20:42] Unknown:
That's really neat. I like your analogy of likening it to SQLAlchemy. What I really need to do is go look it up after the show and get some more detail, but can you give me a potential use case? What is a real-world problem that someone might have that Blaze would elegantly solve?
[00:21:12] Unknown:
There are a couple it elegantly solves today. One is, I want to translate data from one form to another. A lot of people don't try out the tools that are available and don't know the difference between Cassandra, Hive, Spark, a SQLite implementation, or using bcolz and PyTables. They typically get stuck with the data format they have, and they don't realize they could get huge performance gains if they used a slightly different back end and adjusted their approach. Blaze makes it easy to try them. You take your data, write a high-level expression for your query, and then quickly port it. Say I'm doing this with SQLite, or in my table in Oracle, or with a bunch of CSV files. What if I just converted all this to HDF5 files, which is a scientific data format, and then ran my query? Could it be faster?
It lets you test that out very quickly. Within a few lines of code, the back end is switched and your same query runs on that new set, so you can switch back ends and migrate data quickly. There's a subproject in the Blaze ecosystem called Odo that does the copying of data. It's almost like saying copy, but you don't have to do all the work: you say, here's my URL to the CSV file collection in this directory, and here's my URL to a bunch of tables in Hive, and it copies them and does the transforms for you. Then you can run a query on it, and a query is a table expression. So if you're used to the pandas API, you can write a pandas-like expression and now it runs on the data in Hive. You don't have to learn Spark, you don't have to learn MapReduce, you don't have to learn a new system. You can just use the same high-level interface.
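Odo's own interface is not shown here. This hypothetical `copy_csv_to_sqlite` helper only illustrates, for a single pair of formats and with deliberately crude type handling, the kind of extract-and-transform step that Odo is described as automating; the table and column names are assumptions for the example.

```python
import csv
import io
import sqlite3


def copy_csv_to_sqlite(csv_file, conn, table):
    """Hypothetical helper: copy a CSV stream into a new SQLite table.

    The header row supplies the column names. Values that parse as numbers
    are converted to float before insertion; everything else stays text.
    This is a deliberately crude stand-in for real type inference.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(reader)

    def convert(value):
        try:
            return float(value)
        except ValueError:
            return value

    # Identifiers are quoted but not otherwise sanitized: trusted input only.
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [[convert(value) for value in row] for row in rows],
    )
    return len(rows)


csv_text = "region,sales\neast,10\nwest,4\neast,6\n"
conn = sqlite3.connect(":memory:")
n = copy_csv_to_sqlite(io.StringIO(csv_text), conn, "orders")

# Once copied, the data can be queried with the target system's own tools.
total = conn.execute(
    "SELECT SUM(sales) FROM orders WHERE region = ?", ("east",)
).fetchone()[0]
print(n, total)  # 3 16.0
```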
[00:22:53] Unknown:
That is super cool, and that's definitely a problem a lot of people in a lot of fields are having: they realize, gee, our performance stinks, and we're using a SQL database for this, but the way this data is being used isn't relational at all, so let's go explore a NoSQL store. But that's not as trivial as saying, hey, change out the back end; there's a lot of work involved. So to be able to flip a toggle and try something else is potentially life changing for people. That's really amazing.
[00:23:20] Unknown:
Definitely. That's our goal, and in many cases we reach it. I'm not going to say it's free when you're mapping to a different, larger dataset: you do have to transfer it. Pulling it all out of the database is a simple one-liner, but it might take three hours if your dataset's really big. The idea, though, is that you can separate the operational effort from the coding involved and from the mindset of the person, because a lot of times finding people who can do all of that is really hard.
[00:23:58] Unknown:
Absolutely. It definitely reduces the amount of friction and inertia that your data has.
[00:24:07] Unknown:
Yes. Exactly. We have this mindset of trying to reduce silos. People don't want data silos; they want to connect data from multiple sources, and to get insights quickly, you need to be able to write ideas at a high level. Our big goal with Blaze is to turn the Internet into a single database. That's going to take some effort, but the idea is that any data anywhere, any URL, should essentially be like a table in a universal database that you can access from a pandas-like API in Python.
[00:24:41] Unknown:
That's definitely really cool. I wonder if you might say a few words about Wakari, because one of the things that really struck me when I came to the Python community was the IPython Notebook. Just by the sound of your voice, I think you might be old enough to remember this: years and years ago there was a cartoon, Tennessee Tuxedo, with Mr. Whoopee's 3D BB, a magic blackboard he could use to sketch out anything in the animations and explain how things worked, the whole nine yards. The IPython Notebook really reminded me of that, and the idea of adding an additional dimension to it in the realm of science, data analysis, and experimentation seems to fit the metaphor even more fully.
[00:25:27] Unknown:
Yeah. It's an amazing phenomenon, and it's been awesome to watch it emerge. Fernando Perez and Brian Granger are both good friends of mine; we've been in this together, and we often talk about how we're kind of the old guard, in this fight for 18 years. IPython actually started with Janko Hauser's IPP, which Fernando grabbed and used as the interface. Fernando's key superpower is that he's constantly trying to figure out how to improve the user experience, and Brian Granger joined him with a similar desire.
And there's a guy named William Stein who actually built an early version of a JavaScript front end to the Python kernel, and it currently exists in SageMathCloud. He has this whole Sage interface that's really geared toward mathematicians, as a kind of replacement for Mathematica. He's an amazing guy, and he has since adopted the IPython notebook as well, but he had that early thing in 2007, and that inspired Fernando and Brian to say, let's build this. They architected it around a kernel: you have a running kernel and then an interface that can be swapped out. That interface has captured the attention of a lot of people, who recognize that it's a way to communicate information quickly.
At Continuum, we've constantly been looking at the bottlenecks to enabling collaboration and shared understanding across a large number of people, because to me that's the secret to getting insight from data. You're not going to get insight from data by just getting lucky. It really is about taking what you know, interacting with the data, exploring it, and talking to somebody else about it, and you want that whole workflow to be as seamless as possible. We knew about IPython Notebooks and have been very interested in empowering people to use them more effectively, and in working with the IPython team to help them. So Wakari really was the notion that the notebook's great, but it's not enough.
You have to have not only the interface but your environment, your code environment. All the dependencies needed to run that workflow have to be available and installed for you. The data the work depends on also has to be available, and you want it quickly available. We started working with those ideas in mind and came up with an initial cloud-based solution, and we also have an on-premise solution that we install. We're still iterating; we have a lot of things we're improving on that basic idea.
Mostly, right now, we're resource constrained. We're looking to hire people because we don't have enough people to support the success we're having in some of these initiatives. So that's been a constraint for us: just getting the right people who can help. But yeah. So Wakari is all about, again, collaboration, helping people leverage the IPython notebook. That phenomenon is a real one, and a lot of people have seen that it can change their approach. For some people, it's replacing Excel workflows, Excel workbook flows. Instead of having a bunch of Excel sheets, they'll use an IPython notebook to express their work.
I'm super excited about its future potential.
[00:28:31] Unknown:
Great. And I have one last question in the realm of projects that you folks have come out with. Numba is one of the things that I had actually learned about before I even actively started using Python, just because it made such a splash in the general computing news. The idea of a JIT-compiled, hyper-optimized Python for numeric computation is really, really cool. And you talk about wide applications; it's being used in everything from, obviously, data science all the way out to games and all kinds of areas.
[00:29:07] Unknown:
That really is quite an achievement. So what led you folks to build it, what are some of the challenges, and where do you see it going? It's a lot of fun. What led to it, honestly, was a desire I had from when I wrote NumPy. Remember I talked about SciPy starting with a bunch of modules I wrote? One of the first ones I wrote, in 1998, was something called cephes, which is a bunch of special functions: things like Airy functions and Bessel functions and a whole host of these scientific functions that nobody cares about unless you're in physics, but that show up in various ways. I wanted all of these available to Python users. NumPy has this thing called the universal function, but to build a universal function you had to write C code. And I always wanted to be able to have a Python expression of the function and then create a NumPy ufunc just by writing Python code. I've always wanted that, and you could never do it because, frankly, it needs a compiler: a compiler that can take Python code and produce machine code. So with that in mind, I came across the LLVM library, and it really helps.
It's a great library system. There are issues in terms of compatibility with older versions and so forth, but I saw a lot of companies like Apple and NVIDIA using LLVM as a common compiler framework and realized there was an opportunity to use it to make the process of compiling simpler. I did a compiler course after I wrote the first version of Numba. Fortunately, Numba's had three revisions since then, with a larger team and people with more compiler expertise than I had. But I was just crazy enough to think that I could do something once I had LLVM.
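The wish that motivated Numba, building a NumPy ufunc from a plain Python function, can be sketched two ways. `np.frompyfunc` gives a real ufunc with no compiler involved, but each element is still a Python call and the result has `dtype=object`; Numba's `vectorize` decorator, given an explicit signature, compiles the same source into a native ufunc. The `logistic` function is a hypothetical stand-in for a special function, and the Numba branch only runs if Numba is installed.

```python
import math

import numpy as np


def logistic(x):
    """A scalar function written in ordinary Python."""
    return 1.0 / (1.0 + math.exp(-x))


# No compiler: a genuine ufunc (broadcasting works), but it calls the Python
# function once per element and returns an object-dtype array.
logistic_ufunc = np.frompyfunc(logistic, 1, 1)

# With a compiler: Numba turns the same Python source into a compiled float64 ufunc.
try:
    from numba import vectorize

    logistic_fast = vectorize(["float64(float64)"])(logistic)
except ImportError:
    logistic_fast = None

x = np.linspace(-2.0, 2.0, 5)
slow = logistic_ufunc(x).astype(np.float64)
print(slow)
if logistic_fast is not None:
    print(np.allclose(slow, logistic_fast(x)))
```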
Like I joke, a compiler is easy to write if you don't have to write the parser or the code generator. And with Python, I don't have to write the parser, because I get bytecode out of Python, and I don't have to write the code generator, because LLVM does it for me. So it's truly just translating bytecode to LLVM intermediate representation. There are still a lot of challenges, and most of them are semantic and definitional, in terms of what we are really doing, because taking arbitrary Python code and making it faster is a really, really hard problem. You can't do it in general.
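The "bytecode in, LLVM IR out" point can be made concrete with the standard library's `dis` module, which shows the CPython bytecode a function compiles to. This is only the input side of the pipeline; the translation to LLVM IR is Numba's own machinery and is not shown. Opcode names vary between CPython versions, so no output is given.

```python
import dis


def axpy(a, x, y):
    """A tiny numeric kernel: a * x + y."""
    return a * x + y


def opnames(func):
    """List the bytecode instruction names CPython produced for `func`."""
    return [instr.opname for instr in dis.get_instructions(func)]


print(opnames(axpy))
dis.dis(axpy)  # human-readable listing with argument names
```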
What you have to do is take subsets of code, particularly the kind of code that uses NumPy arrays and scalars and just if statements and so forth. You can make that fast. And there's no reason to write Fortran or C if that's the kind of code you're writing. So it was recognizing that and asking, what can we do? Let's make progress here. It was inspired a little bit by a conversation with the PyPy community. Some people think there's a conflict. There isn't a conflict; there are just different ways of looking at the world. I actually see a way forward to working with those folks now that maybe I could talk about at a later point. I'm pretty excited about it, actually. I wrote a blog post because PyPy was exciting to a lot of people: Wow, future of Python, PyPy, that's awesome. But they were unaware of the deep roots of the numeric Python scientific computing ecosystem and how connected it is to C code and Fortran code and all the C extensions it required.
To really move that community over to PyPy would take an enormous effort, and they seemed unaware of that challenge. And so my approach was to say, look, we'll start from the other direction, and maybe we'll meet in the middle somewhere. We'll start with the NumPy and SciPy community, add JIT compilation, and add those features in that direction. So it's a different approach to the same problem. It's all possible, but how much effort is it gonna take? With the right millions of dollars, we can do everything. But how do we do this as a community, in a way that we can meet in the middle? And I think there are actually some really powerful solutions that could be accomplished if we work together.
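As a sketch of the subset of Python described in this answer: a function that touches only a NumPy array, scalars, a loop and `if` statements is the kind of code Numba's `@njit` is designed to compile. The function `clipped_sum` is hypothetical, and if Numba isn't installed the decorator falls back to a no-op, so the same code runs as ordinary (slower) Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def clipped_sum(values, lo, hi):
    """Sum `values` after clamping each one into [lo, hi]: loops, scalars, ifs only."""
    total = 0.0
    for i in range(values.shape[0]):
        v = values[i]
        if v < lo:
            v = lo
        elif v > hi:
            v = hi
        total += v
    return total


print(clipped_sum(np.array([-5.0, 0.5, 3.0]), 0.0, 1.0))  # 1.5
```

Code that leans on arbitrary Python objects is generally outside what Numba's nopython mode can compile, which is why the subset matters.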
[00:32:50] Unknown:
I'll bet. It's really interesting that you say that. It occurs to me that so often in technology we get this, like, I'm gonna do it this way, I'm gonna do it that way. And whether it is that way or not, people assume that it's pistols at 20 paces, when in reality it's just a different approach to the problem, and it benefits no one to have this bizarre idea that everyone is in constant competition. I mean, healthy competition, like "I can do better than that," is good, but not collaborating for the sake of partisanship makes no sense. I came from many years in Ruby, I learned Python, and all of a sudden I'm realizing you guys have some really cool toys on this side of the pool. I mean,
[00:33:36] Unknown:
it just makes sense. There's no reason to block yourself off from what other people are doing. It may not be what you would choose. It may not even be the right answer for you, but at least be aware of it and be open to it. Totally agree. So actually, right now I see a lot of our future in integrating with the Java stack and with the R community. A lot of people are trying to solve the same problems, so how do we do it in a way that's cooperative, so we can share each other's successes? That, to me, would be the dream. It's difficult. There are some real challenges there, but there are solutions at times, especially if you're looking for them.
And very, very transformative solutions if you keep searching. So that drives me. I'm excited about that. Maybe all computer programmers are lazy, some people say. We all wanna just take advantage of other people's work. And certainly SciPy was about making old Fortran code available, bringing it back to life and connecting it to the modern user. That was a big part of what made SciPy, and it's still a big part of SciPy. And you gotta keep doing that. Instead of having to reinvent the wheel, let's figure out how to connect with each other.
[00:34:49] Unknown:
So switching gears a bit. Can you explain a bit about what NumFOCUS is and how that got started? Oh, yeah. I'd love to, actually. So NumFOCUS... when I worked as a consultant,
[00:34:58] Unknown:
I recognized the challenges of being in a company and also supporting the community. I'm passionate about community, but I'm also passionate about markets, and I have to feed my family; I have to have a job. And sometimes those pressures cause people to be a little suspicious too, like, "Their company is doing this. What does it mean?" So I wanted to create an organization that was fully community run, funded, and supported. If you wanna support companies, I think some companies are doing great things: go buy their stuff, help them, make them successful. But there are other people who just wanna support the community. So I wanted to make sure there was a place where scientific Python could be community supported, and people who had any concerns could focus just on the community side. It also becomes a place where people with different ideas in the marketplace can work together. My company, your company, we have different ideas, but we can cooperate through an organization. So at the same time as we founded Continuum, I also founded NumFOCUS, and got together with Fernando Pérez, John Hunter, Perry Greenfield, Jarrod Millman, and Anthony Scopatz. And we created NumFOCUS as a community-centric organization, much like the PSF, much like the Apache Software Foundation, much like the Linux Foundation.
Continuum basically hired an executive director and gave her full time to work on NumFOCUS. That's kind of our commitment to the community, and that's been a successful approach. She's been able to really rally a community around PyData, around the John Hunter Technical Fellowship, around women in science and technology events, and around the fiscal sponsorships for SymPy and for IPython and for several other projects, and really help that grow. And eventually, we're finally getting fiscal sponsorship for NumPy itself. One of the challenges is that as I stepped back from NumPy, I left a little bit of a vacuum, and there's a committee of people who really make it work and keep the NumPy and SciPy ecosystems running, so we're helping them get a fiscal sponsorship together so we can fund them. So the mission of NumFOCUS is to fund the tools everybody uses and to be a rallying point for raising money to help these tools keep going.
It's a 501(c)(3) nonprofit. You can jump in and participate as a member and donate. You can give your time. You can participate in one of the PyData events. There are a lot of ways to participate and become a supporter of the community.
[00:37:28] Unknown:
That's really very cool. How should I say this? I look at so many technology stacks, Linux on the desktop as an example, and I think to myself, it is such a shame that these folks can't come together and just agree on some standards, some interfaces. You can have a completely different way of doing things, but being able to come together and support the infrastructure that you both use, and figure out a way to not inconvenience and fragment your user base, is definitely a wonderful thing. It sounds like that's part of what NumFOCUS, in addition to offering fiscal support,
[00:38:14] Unknown:
is trying to do. Is that right? Oh, absolutely. It's definitely a place where conversations between projects can take place, where people know there's an interesting body of folks who care about the overall experience and wanna make sure that scikit-learn and IPython and NumPy and SciPy are all talking together, so that even though they're independent projects, they have standard interfaces they agree on. In fact, when we organized NumFOCUS, Fernando Pérez had the thought of a confederation of federated groups coming together and having a place in a common organization. He referred to the founding of the country, actually, the United States of America. Oh, wow. Different states that need to come together as a group. So that's been a part of it. Open source definitely has a community spirit and an individual spirit, and it needs to retain that, but it helps to have organizations that support it, a place for people to come together and take action together.
And it's great to have a legal entity that can take money, a legal entity that can employ people, a legal entity that can hold patents and trademarks and all those things that are important under the laws of a particular country, but that can represent the community. There are challenges associated with it. Volunteer-run is hard. That's why I've been grateful that we can continue to support an executive director. Leah Silen is the executive director, and she's a driving force in the community, someone who can continually move things along. Other companies are starting to give resources as well. So it's exciting to see.
So it's really starting to take off, and I'm really pleased with the community's response in organizing it and helping it sustain itself. Again, my passion has always been open source. I love to see it work. I love to see it sustained. I have six kids, and three of them are in undergraduate school. So from the very beginning of my involvement with open source, I've been aware of the need to provide for my family. And so I've had to figure out how to keep doing this and still put shoes on the kids' feet and, now, put them through college. And I want that for everybody. How do we make this work? Sometimes it's through community and sometimes it's through companies. I think both participate and work together to make it happen.
[00:40:35] Unknown:
Absolutely. So for someone just starting out in the data science and, pardon me, analytics space, what advice would you give?
[00:40:43] Unknown:
Yeah. I would say learn as much Python as you can. I would say download Anaconda. I would say do a Google search on data analysis in Python, and learn basic data analysis in Python. There's an enormous number of resources in IPython notebooks that'll just teach you and that you can walk through. I would say find a community, find a group, find a local meetup, either PyData or a data science group; even the R community can be helpful. You might go there and find a few like-minded, interested folks. You can take an online class, but get involved, and then have something you care about. If you wanna be doing data science, it means you wanna transform information into a form that answers a question.
So think about what you wanna do. Find a problem you're interested in, do the analysis, and use that to guide your studies.
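For a first analysis exercise of the kind suggested here, even the standard library goes a long way. The toy data and the `mean_by` helper below are hypothetical; in pandas the same grouping would be `df.groupby("sensor")["reading"].mean()`.

```python
import csv
import io
import statistics
from collections import defaultdict

# Toy CSV standing in for real data you care about.
RAW = """sensor,reading
a,1
a,3
b,10
b,20
"""


def mean_by(rows, key, value):
    """Group rows by the column `key` and average the numeric column `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {name: statistics.mean(values) for name, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(RAW)))
print(mean_by(rows, "sensor", "reading"))  # {'a': 2.0, 'b': 15.0}
```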
[00:41:36] Unknown:
That's great. I think that'll definitely be very useful advice to a lot of people, particularly because data analysis and data science are such big industries now. I've been reading a lot about how many companies are hiring for it and don't have enough people to fill the positions they're trying to fill, so it's definitely a very lucrative direction for people to go in, whether they're just starting off in technology, just getting out of college, or have been in the industry for years and are just looking for something new to do.
[00:42:09] Unknown:
It is. One reason Python's an excellent choice is... and don't be afraid to learn some programming. I think a lot of people find that folks want data scientists, but they want data scientists who can program. Yep. We put together a solution, and we had a lot of applicants for data science roles. The number of needs where we want someone just to be analyzing data is small. Mostly we want people to be able to take the problem somebody's trying to solve, because, again, the tools right now are not where they could be and will be in 10 years. Today, you still gotta do some integration.
You gotta do some gluing together of legacy solutions and legacy data, and that's gonna take some programming. Python can do it all for you. So just learn Python, become good at it, and don't be afraid to steer around. Maybe you're just interested in pandas and NumPy, but don't be afraid to learn a little bit about urllib or the requests library or other libraries, a parsing library, a scraping library. Don't be afraid to learn a little more and expand your knowledge.
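As a taste of the "parsing library, scraping library" side, here is a minimal link extractor built only on the standard library's `html.parser` and `urllib.parse`. The class name and the sample page are hypothetical; in practice the HTML would come from something like `requests.get(url).text` from the requests library, and a dedicated parser such as Beautiful Soup is often more convenient.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> tag it is fed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


page = '<p><a href="/data.csv">data</a> <a href="https://example.org/x">x</a></p>'
collector = LinkCollector("https://example.com/reports/")
collector.feed(page)
print(collector.links)  # ['https://example.com/data.csv', 'https://example.org/x']
```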
[00:43:08] Unknown:
So of your myriad achievements and projects that you've been involved with over your life, what are you most proud of? That's a tough question. I mean, honestly,
[00:43:19] Unknown:
honestly, it's my kids I'm most proud of. That's a tough thing to say as a dad of six kids. My oldest is 20 and my youngest is 7. That tells you how old I am. So besides my family, I would say the jury's still out a bit. I'm certainly proud of what NumPy has become, mostly because of the effort it took to create it. It was one of those situations where I did not know the end from the beginning. I had to take a leap of faith, feel this urge, feel this passion, feel this need, and take a step in the dark without a lot of support, but have the confidence, and then have it emerge and be a real success. Definitely proud of that. But there are other things I'd like to be proud of in the future, so I'm hoping to replicate that in other projects.
[00:44:09] Unknown:
That's great. So at the end of our episodes, we like to provide listeners with some picks. This can be anything you find interesting enough to wanna share with other people: technology, a movie, a board game, whatever it happens to be. So we'll get it started. My first pick is going to be used bookstores. They are amazing. There's one down in East Lyme, Connecticut, called the Book Barn, and I took my kids and my wife there recently, and we walked out with a giant box of books for about $100, which would otherwise probably have cost us about five times that much. So used bookstores are amazing for getting a lot of really good and interesting reading material, whether it's fiction, nonfiction, or kids' books. It's a great thing to go out and do, and a good way to spend the day. My next pick is going to be the movie Cloudy with a Chance of Meatballs.
The book has been a kids' classic for years. The movie came out a couple of years ago, and it is hilarious. My 4-year-old loves it and can watch it repeatedly, and I've seen it a few times and still think it's one of the funniest kids' movies I've ever seen. And continuing the theme of funny movies, another really great one is Kickin' It Old School with Jamie Kennedy. I've watched that movie probably 8 or 9 times, and I still love watching it. It is one of the funniest movies I've ever seen. So for anybody who has even a tangential experience with the eighties, it's well worth a watch. Mhmm.
[00:45:48] Unknown:
Yeah. Awesome.
[00:45:50] Unknown:
Chris, why don't you go ahead? Alrighty.
[00:45:53] Unknown:
So this week, the first thing I'd like to pick, to continue along Tobias' theme of comedy, is The Kids in the Hall. For anybody who went to college around the same time I did, in the late eighties and early nineties, these guys are a Canadian comedy troupe, and they are just so funny, so bizarre, really great stuff. They've gone on to do lots of other great things; if you don't know them, you probably know some of the work they've done in other venues. They still make me laugh out loud 20 years later, and that says something, I think.
The next pick I have is the Museum of Fine Arts here in Boston. Every year they do this really cool event called Art in Bloom. They call in floral designers to make floral designs and pair them with pieces of art all around the museum. So over the course of a weekend, you get to wander around the Museum of Fine Arts and see all these amazing, creative floral designs paired with great art, and the Museum of Fine Arts has some really timeless pieces. It's a really great experience, a great evening or day out, and I highly recommend it.
Then my next pick is Saron Yitbarek and the CodeNewbie community. Speaking of bringing people in and enabling them to do good work, there's been so much discussion in our field right now about diversity and being welcoming to newcomers, and she, more than anybody else I can think of, has really walked the walk. She's created a whole little empire of communities, sub-communities, I guess. She does a weekly Twitter chat, and they have a Slack team, channel, room, whatever it's called.
They have a Discourse forum, and it's all oriented toward helping people get started with programming and, once they do, get their foot in the door and land that first job or that dream job in software development. She also does a podcast that's totally amazing, and whether you're new or have been coding for 20 years like I have, it's really neat and worth a listen. So kudos to her and all the work she does. And my last pick, because I've been blathering on long enough, is the Apple 27-inch Retina iMac 5K, which I'm sitting in front of as we speak.
I realize that this is gonna out me as a total Apple fanboy, and that's okay. I love this machine. It is fast. The display is just as gorgeous as you might think. It's beautifully engineered, well put together, and a real pleasure to set up and use. It has been a delight since I got it, and I can't recommend it highly enough for anybody who needs a machine with a fair bit of horsepower under the hood but doesn't necessarily wanna go through the pain of building their own supercharged PC. It's a really impressive machine.
Travis, why don't you give us your picks?
[00:49:20] Unknown:
Awesome. Wow. Okay. I'll start with Data Carpentry. Data Carpentry is basically an organization run by Tracy Teal, patterned after Software Carpentry, which was very successful at training scientists how to program. Data Carpentry is about training people in various industries how to deal with data: how to manipulate it and how to use it. They're just getting off the ground, but check them out at datacarpentry.org. Tracy is an amazing participant in the Python community. She was at PyCon this year with her family and has been a longtime supporter of Python.
So that's one. The second one is the Brain Science Podcast by Ginger Campbell, MD. She was an emergency room physician who decided that she really loved science and wanted to go back to her roots and start a science podcast. I love it. It's been out for a while, but you can still go there and get a lot of great book ideas about how the brain works, and I've really enjoyed listening to her. And finally, something a little bit different: a favorite book of mine is Money, Bank Credit, and Economic Cycles. It's a big book, pretty thick. It's not light reading; it's definitely for someone who's serious about trying to understand.
I feel like I understand the world much better, and I understand financial systems and banking much better, because I've read this book. It takes you through the history of money from Roman days to today and how it works at a fundamental level. The author is a Spanish professor at Rey Juan Carlos University in Madrid. I really appreciate that book. So that's it.
[00:50:58] Unknown:
Great. Well, we really appreciate you taking the time out of your evening to speak with us. It's been a lot of fun and really interesting. So for anybody who wants to follow you and Continuum Analytics and the work you guys are doing, what would be the best way to find you and keep track of your work? Yeah. You can follow me on Twitter. I'm @teoliphant,
[00:51:19] Unknown:
t-e-o-l-i-p-h-a-n-t. You can follow PyData at @pydata, p-y-d-a-t-a. You can follow Continuum at @ContinuumIO. Twitter is easy. We post on Facebook occasionally too. You can come to our website. We're at every PyData event, so look for PyData events in an area near you, and that's a way to follow both the company and the community.
[00:51:40] Unknown:
Great. Great.
[00:51:42] Unknown:
Great. Alright. Well, it's been a real pleasure. I appreciate everything you're doing. Thanks for inviting me. Thank you for coming.
Did you guys get started with this? What's the
[00:00:03] Unknown:
what's driving you? We both listen to a lot of podcasts, and I've been using Python for a while. Chris has been interested; he's recently started using it. And there just haven't been any Python podcasts out in the wild for a while now, because all the ones that had been around stopped producing new content. So we decided, hey, why not us? Cool.
[00:00:25] Unknown:
Yeah. It was definitely one of those things where I feel like Python has such a thriving community. I listen to podcasts for other stacks, like Ruby Rogues and things like that. And I just thought to myself, it is a crying shame that Python doesn't have something in this space to do outreach to people, because there are a lot of people who listen to podcasts to keep informed while they're working out or driving or whatever the case may be.
[00:01:07] Unknown:
Hello, and welcome to Podcast.__init__. Thank you for joining us. Today we are recording on April 28, 2015. Your hosts, as usual, are Tobias Macey and Chris Patti. And tonight we're interviewing Travis Oliphant.
[00:01:23] Unknown:
Travis, why don't you introduce yourself?
[00:01:26] Unknown:
Oh, thanks. I appreciate being here, Tobias and Chris. It's a pleasure. My name is Travis Oliphant. I've been working with Python since about 1998, so for a long time. I was a scientist, really, someone passionate about engineering and science, and came across Python as a graduate student. I really loved it. I ended up getting pulled into, and addicted to, the open source community and the ability to change the world through interactions with the community, and I have been a part of the scientific ecosystem ever since.
[00:02:00] Unknown:
So I'm curious what inspired you to create NumPy and SciPy, and why you chose Python for creating those libraries?
[00:02:08] Unknown:
Yeah. So SciPy was my real passion when I was a graduate student. I was studying electrical engineering and then biomedical imaging at the Mayo Clinic. I had 5-dimensional derivatives I needed to take, and I wanted to do it at a high level and not have to write C code all the time. I could write C, I could write Pascal, but I really liked the expressivity of Python. I had an experience: I started to use it, and then a year later I came back to code I'd written previously and I could still read it. That was the opposite of the experience I'd had with Perl 3 years earlier, when I'd written some Perl code to do some high-level manipulation of scatterometer data coming off tape. I went back and tried to look at that code, and I didn't understand it, my own code.
So I remember that moment when, a year later, I looked at the same Python code and said, I still get this. And I had this really unusual feeling of loving to program and just having fun with it. Maybe because it felt powerful. It felt like I could do things quickly and I could connect. People have expressed this in many ways: it fit in your head, it didn't get in your way, it left room for you to think about your problem rather than the programming. I don't know all the reasons, but I was hooked. And I got excited as a student to be able to create more. So I looked around and said, I wanna be able to do ordinary differential equation solving. I wanted to simulate an MRI machine, and the fundamentals of MRI are the Bloch equations, which describe the magnetization vector. It's a fairly straightforward simulation, but I needed an ordinary differential equation solver, and one didn't exist in Python.
So I went out and found one on the Internet. It was written in Fortran, and I connected it to the Python interpreter by writing a C extension to Python by hand, so I could call it at a high level and specify the equations in Python but have this Fortran solver ultimately do the work. That was sort of the start of SciPy. I ended up connecting an integration solver and some optimization libraries, and I released them as Multipack in 1999, roughly. So I was just excited about the possibility of creating this cool library that lets you do high-level coding and programming, but in a language that was open, so you could share it and other people would use it. So that was the start; I got excited about SciPy.
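To make the MRI example concrete: the Bloch equations for the magnetization M in a field B are dM/dt = γ M × B, plus relaxation terms -Mx/T2 and -My/T2 on the transverse components and -(Mz - M0)/T1 on the longitudinal one. The sketch below is not the Fortran solver Travis wrapped; it's a self-contained fixed-step fourth-order Runge-Kutta in plain Python, with arbitrary units and made-up parameters. Today, `scipy.integrate.odeint`, which wraps the Fortran ODEPACK solver LSODA, plays the role of that Fortran solver.

```python
import math


def bloch_rhs(m, b, gamma, t1, t2, m0):
    """dM/dt for M = (mx, my, mz): precession gamma * (M x B) plus T1/T2 relaxation."""
    mx, my, mz = m
    bx, by, bz = b
    cross = (my * bz - mz * by, mz * bx - mx * bz, mx * by - my * bx)
    return (
        gamma * cross[0] - mx / t2,
        gamma * cross[1] - my / t2,
        gamma * cross[2] - (mz - m0) / t1,
    )


def rk4(f, y, t_end, dt):
    """Fixed-step classic Runge-Kutta for an autonomous system y' = f(y)."""
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k1)))
        k3 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k2)))
        k4 = f(tuple(yi + dt * ki for yi, ki in zip(y, k3)))
        y = tuple(
            yi + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)
        )
    return y


# After a 90-degree pulse: M starts in the transverse plane, with B along z.
params = dict(b=(0.0, 0.0, 1.0), gamma=2.0 * math.pi, t1=1.0, t2=0.1, m0=1.0)
final = rk4(lambda m: bloch_rhs(m, **params), (1.0, 0.0, 0.0), t_end=5.0, dt=1e-3)
print(final)  # mz recovers as m0 * (1 - exp(-t / t1)); mx, my decay as exp(-t / t2)
```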
Then, as I graduated and got a job as an assistant professor at Brigham Young University, there was some energy in the community around a new array object being promoted, called numarray. When I had joined the Python ecosystem, Numeric, written by Jim Hugunin, was the array object that I used and built SciPy around. So numarray was this other array object that was emerging, promoted by the Space Telescope Science Institute. It had some improved features they needed. And then one day I saw a library come out that I'd always wanted. It was a morphology library. I was a medical imaging student and I really wanted a morphology library in SciPy, but it came out for numarray.
And at that point, I went, oh, we're building two communities of different libraries on different array objects. And I became frustrated; this is not good. That happened to coincide, a month or two later, with a class I was teaching being canceled, so I had 4 months with no class and just my research to work on. Couple that with this itch and this passion to merge these communities, and the feeling that I'd been around long enough to be able to do something about it, and I felt an obligation, a duty, and then a desire to work on it.
[00:06:00] Unknown:
That's really great. You know, it's interesting. I worked for the Human Genome Project quite a number of years back. At that point in time, everyone there was graduating from this crusty old Perl, and everyone was really excited about the potential of using Python in that sphere. So based on the timeline you're mentioning, it seems like a whole bunch of things were coalescing to provide for a Python explosion in the sciences, all around the same time. That's right. It's exactly right. There was. This is about 2006
[00:06:40] Unknown:
that I wrote NumPy, which built on Numeric and pulled in the features of numarray. To me, that was a very critical moment, because it was a challenging thing to do. I was not trained as a developer or as a computer scientist; I was a scientist. I cared about what people did with the code. I'd learned enough C, and written and seen enough of the Python C API, to be dangerous. I became a core Python developer at the time to try to promote some of the NumPy array structure into Python itself, to avoid any future problem of array incompatibility.
So a lot of the work I did around that time was a real learning experience for me, to understand what software engineering and architecture are like. I made mistakes; knowing what I know now, I would have done things a bit differently. But that's the great thing about the Python community: people dive in where they can, support what they can, and then move from there and iterate. Around that time, also, John Hunter had created Matplotlib. In 2001, we put together SciPy out of the early work I'd done in '99.
I kind of organized a library of tools. So you started to get this critical mass, and the IPython interactive environment was also emerging at the time. After 2006, creating NumPy really helped solidify: this is the array object everyone agrees on. I did a lot of work to organize the community, and that work was a lot of writing code, a lot of writing examples, a lot of writing documentation, and a lot of writing emails to encourage everybody to use it. I remember when John Hunter removed his support for numarray and Numeric, and it was just NumPy support in Matplotlib. That was about 2007, and that was a big deal. Then everybody followed suit, and the explosion occurred; a lot of things came together, as you said. Definitely 1 of the great things about the Python ecosystem is how many people have participated in making it great, and the scientific computing ecosystem is no different.
Although it does take individuals taking risks. It was not good for my academic career to spend time on NumPy. None of the folks who would eventually vote on my tenure application cared about NumPy. But I felt like it was the right thing to do, and the important thing to do for the world. So
[00:09:01] Unknown:
We're certainly glad you did.
[00:09:03] Unknown:
We're definitely all happy that you did that because it has certainly become the basis of a large amount of scientific and numerical code in Python, and even people who aren't necessarily doing any hard science or incredibly complicated maths have definitely found some uses for it.
[00:09:21] Unknown:
Yeah. And you look at pandas; it's a similar story. Right? You've talked to Wes. You'll hear the data frame story: a lot of other folks were using data frames, but he wrote 1 and got his employer to let him open source what he had worked on. And now it's become the standard for people doing data analysis. It's the same story with John Hunter before him with Matplotlib, and with Fernando Perez and Brian Granger on IPython. The scikit-learn team is another 1 that has just exploded since 2009. It's fantastic to see all this, but it does take effort. That's the thing I definitely want to emphasize: somebody has to have the courage to act, even in the face of uncertainty.
And once you do that... not everything you do succeeds. Right? Sometimes you write something that's the wrong direction, but if you're listening and open to feedback, you can usually get it right. With NumPy, everybody wanted it. Everybody was excited about it. They were unsure it could be done, but once I started making progress and people could see I was going to make it, I got a lot of support.
[00:10:23] Unknown:
I'll bet. So for those of us who aren't in the know, can you provide a brief description of what data science is and how you got involved in it? Data science, man. That's a term that was coined by DJ Patil.
[00:10:34] Unknown:
And so it's being used as a coalescing point for a general concept: anybody who takes data and tries to get insight from it. Most people who are data scientists do a little bit of applied mathematics, a little bit of system administration, and a little bit of coding and programming. They put it all together and have to do a little bit of all 3. So data science is a popular term, but I think a lot of people don't really know what it means or what it is. Looking back to graduate school, I was taking scatterometry data from satellites and estimating wind speed. That was data science, to a degree; science has been doing data science for a long time.
But today it's more popular because business professionals are doing it too. The marketing folks are doing it too. The people looking at their logs are saying, oh, we've got to get information from this, to know what to sell to people. So there's more money around it now, and so people use the term. The cool thing is that these tools we built for scientists, like NumPy, now have application to a much larger group of people. And Python, because it's so accessible to people besides just scientists and programmers, is accessible even to business analysts, with the right tools around it.
So now this whole ecosystem is available to a lot more people. That's exciting.
[00:11:57] Unknown:
Absolutely. And I have 2 things to say. The first is, it's interesting: you talk about data science becoming this almost supercharged term that the marketers have picked up. It's funny how many of those there are. They've become more and more common in recent years, things like Cloud or DevOps, and it's like Yep. You have this core group of people like yourself who built this thing that does a thing
[00:12:24] Unknown:
you know, and it fills a niche, and suddenly a term gets applied to it, and it takes on a life of its own and takes off. So Right. That's, you know And, you know, for data science, pandas has been really helpful in orienting people. pandas is built on top of NumPy and orients it towards a particular audience. NumPy is a multidimensional array. Right? And I'm very excited about the multidimensional aspect of it; I was doing 5-dimensional derivative calculations in grad school. That's why it was in my head at the time. But a pandas data frame is just a 2-dimensional table.
It's a very simple structure, and you could do it in NumPy, but pandas added an API on top and a few operations. They're very simple, and they made it more accessible. I think Python generally makes it easy to create objects and structures that make coding and calculations accessible to others, and there's still a rich opportunity in a lot of spaces to do that. That's what's exciting. It's not done, not by any means. I've always been passionate about creating technology solutions that let even more get built around them: looking at the fundamental blockers that cause disconnects, and then trying to resolve some of those so that we can get a cooperative explosion of additional ideas on top.
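To illustrate the point that a pandas data frame is "just a 2-dimensional table" you could manage in NumPy, but with a friendlier API on top, here is a small sketch. The store and sales numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Plain NumPy: a 2-D array whose columns have meaning only by convention
# (column 0 = store id, column 1 = sales). Grouping is done by hand.
raw = np.array([[1, 10.0], [2, 20.0], [1, 30.0]])
store_1_mean = raw[raw[:, 0] == 1, 1].mean()

# pandas: the same table with named columns and a group-by operation on top.
df = pd.DataFrame({"store": [1, 2, 1], "sales": [10.0, 20.0, 30.0]})
mean_by_store = df.groupby("store")["sales"].mean()

# The column data is still a NumPy array underneath.
sales_array = df["sales"].to_numpy()

print(store_1_mean)
print(mean_by_store)
```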
So that's still available right now, and Python is still the right language to do it in. It's got its challenges; there are lots of things I'd love to see different in Python. But it's got a critical mass, it's accessible, people can understand it, and it's still a great language for all of that.
[00:13:57] Unknown:
Absolutely. And I really think this dovetails with what you were just saying and the point you made previously about these tools having broader and broader applicability. I work in the infrastructure-as-code space, and I do a lot of work with things like Chef and Ansible and SaltStack. Over the last few years, it's been really interesting watching people who build, care for, and feed infrastructure start to leverage some of these amazing data science tools for visualization, because it's really easy to end up with huge masses of data that become very ungainly. Scientists have been dealing with this stuff for longer than we've been alive, and we now have these amazing computational tools that make it easier than falling off a log. It's really cool to see what happens when people from all kinds of different disciplines, fields, and industries start to leverage this stuff, and only goodness can result.
[00:15:07] Unknown:
I agree. Python has allowed a lot of people to cooperate in building interesting tools that otherwise wouldn't talk to each other. That's a very interesting aspect of Python. Yeah. It definitely seems to have become sort of the,
[00:15:19] Unknown:
lingua franca of anybody who's trying to do integrations between a large variety of problem domains because of the, you know, myriad different libraries that are available for it and the power and flexibility of those libraries.
[00:15:35] Unknown:
Yes. Python got its strength as glue. I think we've just started to see how super the glue will be. It's a glue that becomes like Legos.
[00:15:48] Unknown:
It's funny that you mention Legos, because 1 of the architects where I work likes to use that term when talking about leveraging core components that can get built out in more general ways. And it's a really good term, especially for things like this that people combine in really interesting ways. You look at a little kid with a pile of Legos, and it's kind of amazing what they come up with sometimes.
[00:16:10] Unknown:
Yes. Yes. Yeah. Just like with Legos... 1 thing about being Pythonic. Right? I've heard that word for a lot of years. 1 angle of being Pythonic is having the right structure in your Legos. Right? They can connect to each other. Right? Right. Right? You want to make sure you can build on top and have different layers of abstraction. So, yeah, it's powerful. I'm still really excited to be a part of the Python community; it's been a tremendous ride, and I'm still pretty passionate about it.
[00:16:47] Unknown:
Great. Great. So can you tell us some of the story of how Continuum Analytics came to be, and your position with that company? Yeah. So I started in academia
[00:16:57] Unknown:
and really loved open source. But I also love markets. In my mind, open source is too valuable to the world to be left to just volunteerism, so I want to figure out and support efforts in the marketplace that connect what we buy to what gets built in open source. So I left academia about 7 years ago, went to industry, and started working as a consultant. In the process, I saw a lot of opportunities where open source was solving problems that businesses needed solved. Over that time I built up some ideas; in particular, I saw an opportunity in bringing these tools to the data analytics world. So Peter Wang and I created Continuum to connect the scientific codes we know well and have been a part of to larger data analysis problems, and in particular to help experts everywhere build tools faster and more easily.
Our motto, our mission, our goal is to connect expertise to data. A lot of people believe there's going to be magic; I don't know if they believe it or hope it. I would love it too if there were some magic predictor that could take your data and just give you results: here's the input data, and out comes exactly what I need to do next, as action statements. We're a long way from being there. In some cases you can get close, but generally what's needed is the ability for people who understand the domain to quickly put together solutions they can iterate on, because the data changes, the problems change, and you need to come to an answer as fast as possible. Python's ideal for that. So our mission is to use Python, and build on Python, to make it easy to build solutions, including dashboards, visualizations, and even full applications, very quickly from the large data people have.
So that was the idea behind Continuum, and we wanted to support open source. 1 of our metrics of success is how many open source projects we're contributing to and releasing, because 1 of our reasons for existing is to keep building this community that I find so compelling.
[00:19:07] Unknown:
Well, I think by that metric you guys can be considered a raging success. I'm not as well versed in this area as my cohost here, and Tobias was filling me in. I'd heard of Numba and NumPy, but I had no idea you folks had a whole little mini universe of projects that you've brought into the community. It's impressive and really cool. I appreciate that. We do try to focus a little bit around visualization.
[00:19:38] Unknown:
Bokeh is our visualization tool, and the goal there is that you don't have to write JavaScript. You can write Python and then have interactive visualization in the browser. And then Blaze; I'm really excited about Blaze. Blaze is a topic for another podcast, though. It's all about helping people write expressions at a high level and translating them to the back end where the data sits. It solves a problem people have today: when people have data problems, the first question they ask is where is my data, because that defines how they talk about their data problems. If it's in a database, they write a SQL statement. If it's in CSV files, they write some parsing code. If it's in HDFS, they either use Spark or write some Hive, or do something different depending on where it is. We don't think it should be that way. We think you should have an understanding of your data as a high-level object, a data frame or an array, and then write your code. Where the data sits is an implementation detail of the back end, and your expression can be mapped to wherever it sits.
So, much like SQLAlchemy does that for data in databases, Blaze does it for data everywhere.
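Blaze's actual API is not shown here. As a rough, hypothetical illustration of the idea that "where the data sits is an implementation detail", here is a stdlib-only sketch: 1 expression object (`SumWhere`, our invention, not part of Blaze) that can be evaluated either against in-memory rows or compiled to SQL and run by SQLite.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SumWhere:
    """Hypothetical expression: SUM(column) over rows where filter_col > threshold."""
    table: str
    column: str
    filter_col: str
    threshold: float


def run_in_memory(expr, rows):
    """Back end 1: evaluate the expression over a list of dicts."""
    return sum(row[expr.column] for row in rows if row[expr.filter_col] > expr.threshold)


def run_sqlite(expr, conn):
    """Back end 2: compile the same expression to SQL and let SQLite run it.

    Identifiers cannot be bound as SQL parameters, so this sketch assumes the
    table and column names are trusted; only the threshold is parameterized.
    """
    sql = (
        f"SELECT COALESCE(SUM({expr.column}), 0) FROM {expr.table} "
        f"WHERE {expr.filter_col} > ?"
    )
    return conn.execute(sql, (expr.threshold,)).fetchone()[0]


rows = [{"amount": 5, "qty": 1}, {"amount": 15, "qty": 2}, {"amount": 25, "qty": 3}]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (:amount, :qty)", rows)

expr = SumWhere(table="orders", column="amount", filter_col="qty", threshold=1)
print(run_in_memory(expr, rows), run_sqlite(expr, conn))  # prints: 40 40
```

The design choice being illustrated is that the expression is plain data; each back end decides how to execute it, so swapping storage does not change the user's query.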
[00:20:42] Unknown:
That's really neat. I like your analogy of likening it to SQLAlchemy. I guess what I really need to do is go look it up after the show and get some more detail. But can you give me a potential use case? What is a real-world problem that someone might have that Blaze would elegantly solve?
[00:21:12] Unknown:
There are a couple that it elegantly solves today. 1 is: I want to translate data from 1 form to another. A lot of people don't try out the tools that are available, and don't know the difference between Cassandra or Hive or Spark or a SQLite implementation, or using bcolz and PyTables. They typically get stuck with the current data format they have, and they don't realize they could get huge performance gains if they used a slightly different back end and maybe adjusted their approach. Blaze makes it easy to try them. You take your data, write a high-level expression for your query, and then you can quickly port it. Say: okay, I'm doing this with SQLite, or in my table in Oracle, or with a bunch of CSV files. What if I converted all this to HDF5 files, which is a scientific data format, and then ran my query? Could I be faster?
And it lets you test that out very quickly. Within a few lines of code, the back end is switched and the same query runs on the new dataset. So it lets you switch back ends and migrate data very quickly. There's a subproject in the Blaze ecosystem called Odo. Odo does the copying of data. It's almost like just saying copy, but you don't have to do all the work. You say, here's my URL to the CSV file collection in this directory, and here's my URL to a bunch of tables in Hive, and it'll copy them and do the transforms for you. Then you can run a query on it, and a query is a table expression. So if you're used to the pandas API, you can write a pandas-like expression, and now it runs on the data in Hive. You don't have to learn Spark. You don't have to learn MapReduce. You don't have to learn a new system. You can just use the same high-level interface.
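Odo's own interface is a single call from a source to a target; the sketch below does not use Odo. It hand-rolls the simplest case described above, copying CSV data into a SQL table, with the standard library only. `copy_csv_to_sqlite` is a hypothetical helper name, and every column is created without a declared type (so values arrive as text), unlike a real converter, which would infer or accept a schema.

```python
import csv
import io
import sqlite3


def copy_csv_to_sqlite(csv_file, conn, table):
    """Hypothetical helper: copy a CSV (header row + data rows) into a new SQLite table."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The remaining CSV rows stream straight into the INSERT.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


source = io.StringIO("region,amount\nwest,5\neast,15\nwest,25\n")
conn = sqlite3.connect(":memory:")
copy_csv_to_sqlite(source, conn, "sales")
print(conn.execute('SELECT region, amount FROM "sales" ORDER BY rowid').fetchall())
```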
[00:22:53] Unknown:
That is super cool. And that's definitely a problem a lot of people in a lot of fields are having, where they realize, gee, our performance stinks, and we're using a SQL database for this, but really the way this data is being used isn't relational at all. Let's go explore a NoSQL store. But that's not as trivial as saying, hey, swap the back end out. There's a lot of work involved there. And so to be able to sort of, like,
[00:23:20] Unknown:
flip a toggle and try something else, that's potentially life changing for people. Yeah. That's really amazing. Definitely our goal, and in many cases we reach it. I'm not going to say mapping to a different dataset is free. If you have a larger dataset, you do have to transfer it; pulling it all out of the database is a simple 1-liner, but it might take 3 hours if your dataset's really big. The idea is that you can separate the operational effort from the coding involved and the mindset of the person, because a lot of times just finding people who can do all of that is really hard. Absolutely.
[00:23:58] Unknown:
It definitely reduces the amount of friction and inertia that your data has. Yes. Exactly.
[00:24:07] Unknown:
Exactly. We kind of have this mindset of reducing silos. A lot of data sits in silos, and people don't want silos; they want to connect data from multiple sources. And to get insights quickly, you need to be able to write ideas at a high level. Our big goal with Blaze is to turn the Internet into a single database. Right? Even that is going to take some effort. But the idea is that any data anywhere, any URL, should essentially be like a table in a universal database, and you can access it from a pandas-like API in Python.
[00:24:41] Unknown:
That's definitely really cool. I wonder if you might say a few words about Wakari, because 1 of the things that really struck me when I came to the Python community was the IPython Notebook. I don't know if you remember this (just by the sound of your voice, you might be old enough): there was a cartoon years and years ago, Tennessee Tuxedo, where Mr. Whoopee had his 3DBB, a three-dimensional blackboard, a kind of magic blackboard he could use to sketch out anything in animation and explain how things worked, the whole 9 yards. The IPython Notebook really reminded me of that, and the idea of adding an additional dimension to it in the realm of science, data analysis, and experimentation
[00:25:27] Unknown:
seems like it fits the metaphor even more fully. Yeah. It's an amazing phenomenon. It's been awesome to watch it emerge. Fernando Perez and Brian Granger are both good friends of mine; we've been in this together. We often talk about how we're kind of the old guard here, in this fight for 18 years. IPython actually started with Janko Hauser's IPP, which Fernando grabbed and used as the interface. Fernando's key superpower is that he's constantly trying to figure out how to improve the user experience, and Brian Granger joined him with a similar desire.
And there's a guy named William Stein who built an early version of a JavaScript front end to the Python kernel, and it exists today in SageMathCloud. He has this whole Sage interface that's really geared towards mathematicians, a kind of replacement for Mathematica. He's an amazing guy, and he has since adopted the IPython notebook as well, but he had that early thing in 2007, and that inspired Fernando and Brian to go, let's build this. But they architected it around a kernel: you have a running kernel and then an interface that can be swapped out. That interface has captured the attention of a lot of people. People recognize that it's a way to communicate information quickly.
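To make the "running kernel plus swappable interface" architecture concrete, here is a toy sketch. `ToyKernel`, `terminal_frontend`, and `html_frontend` are hypothetical names; the real IPython/Jupyter kernel talks to front ends over ZeroMQ using a documented message protocol, none of which is modeled here.

```python
import contextlib
import html
import io


class ToyKernel:
    """Hypothetical kernel: owns the user namespace and executes code strings."""

    def __init__(self):
        self.namespace = {}
        self.execution_count = 0

    def execute(self, code):
        """Run code in the persistent namespace and capture anything printed."""
        self.execution_count += 1
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(code, self.namespace)
        return {"execution_count": self.execution_count, "stdout": captured.getvalue()}


def terminal_frontend(kernel, code):
    """1 interface: render the kernel's reply as console text."""
    reply = kernel.execute(code)
    return f"In [{reply['execution_count']}]: {code}\n{reply['stdout']}"


def html_frontend(kernel, code):
    """A second, swappable interface over the same kernel: render as HTML."""
    reply = kernel.execute(code)
    return f"<pre>{html.escape(reply['stdout'])}</pre>"


kernel = ToyKernel()
terminal_frontend(kernel, "x = 21")
# State created through 1 front end is visible through the other.
print(html_frontend(kernel, "print(x * 2)"))
```

The point of the split is that the state and execution live in the kernel, so a console, a browser notebook, or some other interface can be swapped in without touching it.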
At Continuum, we've constantly been looking at the bottlenecks to enabling collaboration and shared understanding across a large number of people, because to me, that's the secret to getting insight from data. You're not going to get insight from data because you just happen to get lucky. It really is about taking what you know, interacting with the data, exploring it, and talking to somebody else about it, and you want that whole workflow to be as seamless as possible. We knew about IPython Notebooks and have been very interested in empowering people to use them more effectively, and in working with the IPython team to help them. So Wakari really was the notion that the notebook's great, but it's not enough.
You have to have not only the interface; you have to have your code environment. Yep. All the dependencies needed to run that workflow have to be available and installed for you. The data that the work depends on also has to be available, and you want it quickly available. So we started working with those ideas in mind and came up with an initial cloud-based solution, and we also have an on-premises solution that we install. We're still iterating; there are a lot of things we're improving on that basic idea.
Mostly, right now, we're resource constrained. We're looking to hire, because we don't have enough people to support the success we're having in some of these initiatives. So that's been a constraint for us: just getting the right people who can help. But yeah, Wakari is all about, again, collaboration, helping people leverage the IPython notebook. That phenomenon is a real 1, and a lot of people have seen that it can change their approach. For some people, it's replacing Excel workbook workflows: instead of having a bunch of Excel sheets, they'll use an IPython notebook to express their work.
Super excited about the future potential of it.
[00:28:31] Unknown:
Great. And I have 1 last question in the realm of projects you folks have come out with. Numba is 1 of the things I had actually learned about before I even actively started using Python, just because it made such a splash in the general computing news. The idea of a JIT-compiled, hyper-optimized Python for numeric computation is really super cool. And speaking of wide applications, it's being used in everything from data science all the way out to games and all kinds of other areas.
[00:29:07] Unknown:
That really is quite an achievement. So what led you folks to build it, what are some of the challenges, and where do you see it going? It's a lot of fun. What led to it, honestly, was a desire I had when I wrote NumPy. Remember I talked about SciPy starting with a bunch of modules I wrote? 1 of the first ones I wrote, in 1998, was something called cephes, which is a bunch of special functions: things like the Airy function and Bessel functions, a whole host of scientific functions that nobody cares about unless you're in physics, but that show up in various ways. I wanted all of these available to Python users. NumPy has this thing called the universal function, but to build a universal function you had to write C code. I always wanted to be able to write a Python expression of the function and create a NumPy ufunc just by writing Python code. You could never do it because, frankly, it needs a compiler: one that can take Python code and produce machine code. So with that in mind, I came across the LLVM library, and it really helps.
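NumPy does let you wrap a Python function as a ufunc without writing C, via `numpy.frompyfunc`, but the result still calls back into the interpreter for every element and returns an object array; closing that gap is what a compiler is for. `sin_over_x` is an illustrative function name. Numba's `@vectorize` decorator is the compiled route Travis is describing and is not used in this sketch.

```python
import math

import numpy as np


def sin_over_x(x):
    """Scalar Python function: sin(x)/x, with the removable singularity at 0 filled in."""
    return 1.0 if x == 0 else math.sin(x) / x


# frompyfunc(func, nin, nout) builds a genuine np.ufunc that broadcasts like
# any other, but every element still runs through the Python interpreter,
# and the result array has dtype=object.
sin_over_x_ufunc = np.frompyfunc(sin_over_x, 1, 1)

x = np.array([0.0, np.pi / 2, np.pi])
y_obj = sin_over_x_ufunc(x)
y = y_obj.astype(np.float64)
print(y_obj.dtype, y)
```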
It's a great library system. There are issues in terms of compatibility with older versions and so forth, but I saw a lot of companies like Apple and NVIDIA using LLVM as a common compiler framework, and realized there was an opportunity to use it to make the process of compiling simpler. I took a compiler course after I wrote the first version of Numba. Fortunately, Numba has had 3 revisions since then, with a larger team and people with more compiler expertise than I had. But I was just crazy enough to think that I could do something once I had LLVM.
As I joke, a compiler is easy to write if you don't have to write the parser or the code generator. With Python, I don't have to write the parser, because I get bytecode out of Python, and I don't have to write the code generator, because LLVM does it for me. So it's truly just translating bytecode to LLVM intermediate representation. There are still a lot of challenges, and most of them are semantic and definitional, in terms of what we're really doing, because taking arbitrary Python code and making it faster is a really, really hard problem. You can't do it in general.
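The "free parser" Travis mentions is the bytecode CPython already produces, which Numba analyzes on its way to LLVM IR. The standard `dis` module shows it. The exact opcode names differ between Python versions (for example, 3.11 and later use a generic `BINARY_OP`), so treat the printed listing as version-dependent; `scale_and_shift` is an illustrative function.

```python
import dis


def scale_and_shift(x, a, b):
    return a * x + b


# Each instruction below is the kind of input a bytecode-to-IR translator consumes.
for instr in dis.get_instructions(scale_and_shift):
    print(instr.opname, instr.argrepr)

opnames = [instr.opname for instr in dis.get_instructions(scale_and_shift)]
```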
Right? What you can do is take subsets of code, particularly the kind of code that uses NumPy arrays and scalars and just if statements and so forth, and make that fast. And there's no reason to write Fortran or C if that's the kind of code you're writing. So it was recognizing that and asking, what can we do? Let's make progress here. It was inspired a little bit by a conversation with the PyPy community. Some people think there's a conflict. There isn't a conflict; there are just different ways of looking at the world. I actually see a way forward to working with those folks now, which maybe I could talk about at a later point. I'm pretty excited about it, actually. I wrote a blog post, because PyPy was exciting to a lot of people: wow, the future of Python, PyPy, that's awesome. But they were unaware of the deep roots of the Numeric Python scientific computing ecosystem, how connected it is to C code and Fortran code, and all the C extensions it required.
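Here is the kind of "NumPy arrays, scalars, loops, and if statements" code that Numba targets. The sketch uses Numba's real `numba.njit` decorator when Numba is installed and falls back to plain Python otherwise, so it runs either way; `clipped_sum` is an illustrative name, not a Numba API.

```python
import numpy as np

try:
    from numba import njit  # compiles the function to machine code on first call
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def clipped_sum(values, limit):
    """Sum a 1-D float array, capping each element at `limit`."""
    total = 0.0
    for i in range(values.shape[0]):
        v = values[i]
        if v > limit:
            total += limit
        else:
            total += v
    return total


print(clipped_sum(np.array([1.0, 5.0, 10.0]), 4.0))  # 1 + 4 + 4
```

Written this way, with explicit loops and scalar branches, the code is slow in plain CPython but is exactly the shape of code a JIT compiler can turn into tight machine code.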
To really move that community over to PyPy would take an enormous effort, and they seemed unaware of that challenge. So my approach was to say, look, we'll start from the other direction, and maybe we'll meet in the middle somewhere. Right? We'll take the NumPy and SciPy community, start adding JIT compilation, and add features in that direction. It's a different approach to the same problem. It's all possible, but again, how much effort is it going to take? With the right millions of dollars, we can do everything. But how do we do this as a community, in a way that lets us meet in the middle? I think there are actually some really powerful solutions that could be accomplished if we work together.
[00:32:50] Unknown:
I'll bet. It's really interesting that you say that. It occurs to me that so often in technology we get this, like, I'm going to do it this way, I'm going to do it that way. And whether it is that way or not, people assume it's pistols at 20 paces, when in reality it's just a different approach to the problem, and it benefits no 1 to have this bizarre mentality where everyone is in constant competition. Healthy competition, like, I can do better than that, is good, but not collaborating for the sake of partisanship makes no sense. I came from many years in Ruby, I learned Python, and all of a sudden I realized you guys have some really cool toys on this side of the pool. I mean,
[00:33:36] Unknown:
it just makes sense. There's no reason to block yourself off from what other people are doing. It may not be what you would choose. It may not even be the right answer for you, but at least be aware of it and be open to it. Totally agree. Actually, right now I see a lot in our future around integrating with the Java stack and with the R community. A lot of people are trying to solve the same problems, and how do we do it in a way that's cooperative, so we can share each other's successes? That to me would be the dream. It's difficult; there are some real challenges there, but there are also good solutions at times, especially if you're looking for them.
And very transformative solutions, if you keep searching. So that drives me. I'm excited about that. Maybe all computer programmers are lazy, as some people say. We all want to take advantage of other people's work. And certainly SciPy was about making old Fortran code available, bringing it back to life, and connecting it to the modern user. That was a big part of what made SciPy, and it's still a big part of SciPy. You've got to keep doing that. Instead of reinventing the wheel, let's figure out how to connect with each other.
[00:34:49] Unknown:
So switching gears a bit. Can you explain a bit about what NumFOCUS is and how that got started? Oh, yeah. I'd love to, actually. So, NumFOCUS: when I worked as a consultant,
[00:34:58] Unknown:
I recognized the challenges of being in a company and also supporting a community. I'm passionate about community, but I'm also passionate about markets, and I have to feed my family; I have to have a job. And sometimes those pressures cause people to be a little suspicious too: our company is doing this, so what does it mean? So I wanted to create an organization that was fully community run, funded, and supported. If you wanna support companies, great. I think some companies are doing great things. Go buy their stuff, help them, make them successful. But there are other people who just wanna support the community. So I wanted to make sure there was a place where scientific Python could be community supported, and if people had any concerns, they could just focus on the community side. It also becomes a place where people with different ideas in the marketplace can work together. My company, your company, we have different ideas, but we can cooperate through an organization. So at the same time as I founded Continuum, I also founded NumFOCUS, together with Fernando Perez, John Hunter, Perry Greenfield, Jarrod Millman, and Anthony Scopatz. We created NumFOCUS as a community-centric organization, much like the PSF, the Apache Software Foundation, or the Linux Foundation.
Continuum basically hired an executive director and gave her full time to work on NumFOCUS. That's kind of our commitment to the community, and it's been a successful approach. She's been able to really rally a community around PyData, the John Hunter Technical Fellowship, women in science and technology events, and the fiscal sponsorships for SymPy, IPython, and several other projects, and really help that grow. And eventually, we're finally getting fiscal sponsorship for NumPy itself. One of the challenges is that as I step back from NumPy, I leave a bit of a vacuum. There's a committee of people who really make it work and keep the NumPy and SciPy ecosystems running, and we're helping them get a fiscal sponsorship together so we can fund them. So the mission of NumFOCUS is to fund the tools everybody uses and to be a rallying point for raising money to keep these tools going.
It's a 501(c)(3) nonprofit. You can jump in and participate as a member and donate. You can contribute your time. You can participate in one of the PyData events. There are a lot of ways to participate and become a supporter of the community.
[00:37:28] Unknown:
That's really very cool. How should I say this? I look at so many technology stacks, Linux on the desktop as an example, and I think to myself, it is such a shame that these folks can't come together and just agree on some standards, some interfaces. You can have a completely different way of doing things, but being able to come together, support the infrastructure you both use, and figure out a way to not inconvenience and fragment your user base is definitely a wonderful thing. It sounds like that's part of what NumFOCUS, in addition to offering fiscal support,
[00:38:14] Unknown:
is trying to do. Is that right? Oh, absolutely. Yeah. It's definitely a place where conversations between projects can take place, where people know there's an interesting body of folks who care about the overall experience and wanna make sure that scikit-learn, IPython, NumPy, and SciPy are all talking together, so that even though they're independent projects, they have standard interfaces they agree on. In fact, when we organized NumFOCUS, Fernando Perez had the idea of a confederation of federated groups coming together in a common organization. He made references to the founding of the United States of America, actually. Oh, wow. Different states that needed to come together as a group. So that's been a part of it: figuring out how to do that. Open source definitely has both a community spirit and an individual spirit, and it needs to retain that, but it helps to have organizations that support it and give people a place to come together and take action together.
And it's great to have a legal entity that can take money, a legal entity that can employ people, a legal entity that can hold patents, trademarks, and all those things that have to be tied to the laws of a particular country, but that can represent the community. There are challenges associated with it. Volunteer-run is hard. That's why I've been grateful that we can continue to support an executive director. Leah Silen is the executive director, and she's a driving force in the community, so we can continually move things along. Other companies are starting to give resources as well. So it's exciting to see.
So it's really starting to take off, and I'm really pleased with the community's response in organizing and helping to sustain itself. Again, my passion has always been this: I love open source. I love to see it work. I love to see it sustained. I have 6 kids, and 3 of them are in undergraduate school. So from the very beginning of my involvement with open source, I've been aware of the need to provide for my family. I've had to figure out how to keep doing this and still put shoes on the kids' feet, and now put them through college. I want that for everybody. How do we make this work? Sometimes it's through community and sometimes it's through companies. I think both participate and work together to make it happen.
[00:40:35] Unknown:
Absolutely. So for someone just starting out in the data science and, pardon me, analytics space, what advice would you give?
[00:40:43] Unknown:
Yeah. I would say learn as much Python as you can. I would say download Anaconda. I would say go do a Google search on data analysis in Python and learn basic data analysis in Python. There's an enormous number of resources in IPython Notebooks that'll teach you and that you can walk through. I would say find a community, find a group: a local meetup group, a local PyData or data science group, or even the R community can be helpful. You might go there and find a few like-minded, interested folks. You can take an online class, but get involved, and have something you care about. If you wanna be doing data science, it means you wanna transform information into a form that answers a question.
So think: what do you wanna do? Find a problem you're interested in, do the analysis, and use that to guide your studies.
[00:41:36] Unknown:
That's great. I think that'll definitely be very useful advice for a lot of people, particularly because data analysis and data science are such big industries now. I've been reading a lot about how many companies are hiring for it and don't have enough people to fill the positions they're trying to fill, so it's definitely a very lucrative direction for people to go in, whether they're just starting off in technology, just getting out of college, or have been in the industry for years and are looking for something new to do.
[00:42:09] Unknown:
It is. One reason Python's an excellent choice is... and don't be afraid to learn some programming. I think a lot of people find that companies want data scientists, but they want data scientists who can program. Yep. We put together a solution. We had a lot of applicants for data science roles. The number of needs where we want someone just to be analyzing data is small. We want people to be able to take on the problem somebody's trying to solve, because the tools right now are not where they could be and will be in 10 years. Today, you still gotta do some integration.
You gotta do some gluing together of legacy solutions with legacy data, and that's gonna take some programming. Python can do it all for you. So just learn Python, become good at it, and don't be afraid to wander around. Maybe you're just interested in pandas and NumPy, but don't be afraid to learn a little bit about the URL library, the Requests library, or other libraries, like a parsing library or a scraping library. Don't be afraid to learn a little more and expand your knowledge.
[00:43:08] Unknown:
So of your myriad achievements and projects that you've been involved with over your life, what are you most proud of? That's a tough question. I mean, honestly,
[00:43:19] Unknown:
honestly, it's my kids I'm most proud of. That's a tough question to ask a dad of 6 kids. My oldest is 20 and my youngest is 7. That tells you how old I am. So besides my family, I would say the jury's still out a bit. I'm certainly proud of what NumPy has become, mostly because of the effort it took to create it. It was one of those situations where I did not know the end from the beginning. I had to take a leap of faith: feel this urge, feel this passion, feel this need, and take a step in the dark without a lot of support, but have the confidence. And then have it emerge and be a real success. Definitely proud of that. But there are other things I'd like to be proud of in the future, so I'm hoping to replicate that in other projects.
[00:44:09] Unknown:
That's great. So at the end of our episodes, we like to provide listeners with some picks. This can be anything you find interesting enough to wanna share with other people: technology, a movie, a board game, whatever it happens to be. So we'll get it started. My first pick is going to be used bookstores. They are amazing. There's one down in East Lyme, Connecticut called the Book Barn, and I took my kids and my wife there recently, and we walked out with a giant box of books for about $100, which would otherwise probably have cost us about 5 times that much. So used bookstores are amazing for getting a lot of really good and interesting reading material, whether it's fiction, nonfiction, or kids' books. It's a great thing to go out and do and a good way to spend the day. My next pick is going to be the movie Cloudy with a Chance of Meatballs.
The book is a kids' classic and has been for years. The movie came out a couple of years ago, and it is hilarious. My 4-year-old loves watching it. He can watch it repeatedly, and I've seen it a few times, and I still think it's one of the funniest kids' movies I've ever seen. And continuing the theme of funny movies, another really great one is Kickin' It Old School with Jamie Kennedy. I've watched that movie probably 8 or 9 times, and I still love watching it. It is one of the funniest movies I've ever seen. So for anybody who has even a tangential experience with the eighties, it's well worth a watch. Mhmm.
[00:45:48] Unknown:
Yeah. Awesome.
[00:45:50] Unknown:
Chris, why don't you go ahead? Alrighty.
[00:45:53] Unknown:
So this week, the first thing I'd like to pick, to continue along Tobias' theme of comedy, is The Kids in the Hall. For anybody who went to college around the same time I did, in the late eighties or early nineties: these guys are a Canadian comedy troupe, and they are just so funny, so bizarre, really great stuff. They've gone on to do lots of other great things. You probably know, if not them, then some of the work they've done in other venues, but they still make me laugh out loud 20 years later. That says something, I think.
The next pick I have is the Museum of Fine Arts here in Boston. Every year they do this really cool event called Art in Bloom. They call in floral designers to create floral designs and pair them with pieces of art all around the museum. So over the course of a weekend, you get to wander around the Museum of Fine Arts and see all these amazing, creative floral designs paired with great art, and the Museum of Fine Arts has some really timeless pieces. It's a really great experience, a great evening or day out, and I highly recommend it.
Then my next pick is Saron Yitbarek and the CodeNewbie community. Speaking of bringing people in and enabling them to do good work: there's been so much discussion in our field right now about diversity and being welcoming to newcomers, and she, more than anybody else I can think of, has really walked the walk. She's created a whole little empire of communities, or sub-communities, I guess. She does a weekly Twitter chat, and they have a Slack team, channel, room, whatever it's called.
They have a Discourse forum, and it's all oriented towards helping people get started with programming and, once they get their foot in the door, get that first job or that dream job in software development. She also does a podcast that's totally amazing, and whether you're new or have been coding for 20 years like I have, it's really neat and worth a listen. So kudos to her and all the work she does. And my last pick, because I've been blathering on long enough, is the Apple 27-inch iMac with Retina 5K display, which I'm sitting in front of as we speak.
I realize this is gonna out me as a total Apple fanboy, and that's okay. I love this machine. It is fast. The display is just as gorgeous as you might think. It's beautifully engineered, well put together, and a real pleasure to set up and use. It has been a delight since I got it, and I can't recommend it highly enough for anybody who needs a machine with a fair bit of horsepower under the hood but doesn't necessarily wanna go through the pain of building their own supercharged PC or something like that. It's a really impressive machine.
Travis, why don't you give us your picks?
[00:49:20] Unknown:
Awesome. Wow. Okay. I'll start with Data Carpentry. Data Carpentry is basically an organization run by Tracy Teal, just getting off the ground, patterned after Software Carpentry, which was very successful at teaching scientists how to program. Data Carpentry is about training people in various industries how to deal with data: how to manipulate it and how to use it. They're just getting off the ground, but check them out at datacarpentry.org. Tracy is an amazing participant in the Python community. She was at PyCon this year with her family and has been a longtime supporter of Python.
So that's one. The second one is the Brain Science Podcast by Ginger Campbell, MD. She was an emergency room physician who decided she really loved science and wanted to go back to her roots and start a science podcast. I love it. It's been out for a while, but you can still go there and get a lot of great book ideas about how the brain works, and I've really enjoyed listening to her. And finally, something maybe a little different: a favorite book of mine is Money, Bank Credit and Economic Cycles. It's a big book. It's pretty thick. It's not light reading. It's definitely for someone who's serious about trying to understand.
I feel like I understand the world much better. I understand financial systems and banking much better because I've read this book. It takes you through the history of money from Roman days to today and how it works at a fundamental level. The author is from Spain; he's a professor at Rey Juan Carlos University in Madrid. I really appreciate that book. So that's it.
[00:50:58] Unknown:
Great. Well, we really appreciate you taking the time out of your evening to speak with us. It's been a lot of fun and really interesting. So for anybody who wants to follow you, Continuum Analytics, and the work you guys are doing, what would be the best way to find you and keep track of your... Yeah. You can follow me on Twitter. I'm @teoliphant,
[00:51:19] Unknown:
t-e-o-l-i-p-h-a-n-t. You can follow PyData at @pydata, p-y-d-a-t-a. You can follow us at @ContinuumIO. Twitter is easy. We post on Facebook occasionally too. You can come to our website. We're at every PyData event, so look for PyData events in an area near you; that's a way to follow both the company and the community.
[00:51:40] Unknown:
Great. Great.
[00:51:42] Unknown:
Great. Alright. Well, it's been a real pleasure. I appreciate everything you're doing. Thanks for inviting me. Thank you for coming.
Introduction and Motivation for the Podcast
Interview with Travis Oliphant: Early Days with Python
Creating NumPy and SciPy
The Rise of Python in Scientific Computing
Understanding Data Science
Founding Continuum Analytics
Blaze: Simplifying Data Analysis
NumFOCUS: Supporting the Community
Advice for Aspiring Data Scientists
Travis' Proud Achievements
Picks and Recommendations