Visit our site to listen to past episodes, support the show, join our community, and sign up for our mailing list.
Summary
Ian Ozsvald and Emlyn Clay are co-chairs of the London chapter of the PyData organization. In this episode we talked to them about their experience managing the PyData conference and meetup, what the PyData organization does, and their thoughts on using Python for data analytics in their work.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- Subscribe on iTunes, Stitcher, TuneIn or RSS
- Follow us on Twitter or Google+
- Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
- Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com
- Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project
- I would also like to thank Hired, a job marketplace for developers and designers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus.
- Your hosts as usual are Tobias Macey and Chris Patti
- Today we are interviewing Ian Ozsvald and Emlyn Clay about their work with PyData London, a group within the PyData organization. PyData London is the largest Python group in London at ~2,850 members. They hold regular monthly meetups for ~200 members at AHL near Bank and a yearly conference for ~300 members. Last year, they and their sponsors raised over £26,000 to sponsor the development of core numerical libraries in Python.
On Hired, software engineers & designers can get 5+ interview requests in a week, and each offer has salary and equity upfront. With full time and contract opportunities available, users can view the offers and accept or reject them before talking to any company. Work with over 2,500 companies from startups to large public companies hailing from 12 major tech hubs in North America and Europe. Hired is totally free for users, and if you get a job you’ll get a $2,000 “thank you” bonus. If you use our special link to sign up, then that bonus will double to $4,000 when you accept a job. If you’re not looking for a job but know someone who is, you can refer them to Hired and get a $1,337 bonus when they accept a job.
Interview
- Introductions
- How did you get introduced to Python? – Chris
- What is the PyData organization, how does PyData London fit into it and what is your relationship with it? – Tobias
- In what ways does a PyData conference differ from a PyCon? – Tobias
- Does PyData do anything in particular to encourage users from disciplines that might not be aware of how much our community has to offer to choose the Python suite of data analysis tools? – Chris
- You have both spent a good portion of your careers using Python for working with and analyzing data from various domains. How has that experience evolved over the past several years as newer tools have become available? – Tobias
- For someone who is just getting started in the data analytics space, what advice can you give? – Tobias
- How can conferences like PyData help strengthen the bonds and synergies between the Python software community and the sciences? – Chris
- There are a number of different subtopics within the blanket categorization of data science. Is it difficult to balance the subject matter in PyData conferences and meetups to keep members of the audience from being alienated? – Tobias
- Data science is a young field and we’ve yet to see lots of examples of the successful use of data. How are London-based companies using data with Python? – Ian
- Is there a Python data science library you think needs a little love? – Emlyn
Keep In Touch
Picks
- Tobias
- Chris
- Ian
- Emlyn
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. You can subscribe to our show on iTunes, Stitcher, or TuneIn Radio, or you can add our RSS feed to your podcatcher of choice. You can follow us on Twitter or Google Plus, and please give us feedback. Leave us a review on iTunes to help other people find the show. Send us a tweet or an email. Leave us a message on Google Plus or in our show notes, or you can also join our new community. Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com. Linode is sponsoring us this week. You can check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project. I would also like to thank Hired, a job marketplace for developers and designers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus.
Your hosts as usual are Tobias Macey and Chris Patti. Today, we are interviewing Ian Ozsvald and Emlyn Clay about their work with PyData London, a group within the PyData organization. PyData London represents the largest Python group in London at about 2,850 members. They hold regular monthly meetups for 200 members at AHL, near Bank, and a yearly conference for around 300 members. Last year, they and their sponsors raised over £26,000 to sponsor the development of core numerical libraries in Python.
[00:01:40] Unknown:
So, Ian and Emlyn, could you please introduce yourselves? Ian, why don't you go first? I'll be happy to introduce myself, but I'm just gonna correct you there. So, PyData London is not just London's largest Python user group, but it's the UK's largest
[00:01:53] Unknown:
Python user group. And I think, Emlyn, tell me, I think we're Europe's largest Python user group as well. I'm not quite sure. I gave Tobias this little intro so that he could finish it, and I didn't wanna blow our horn too much. It's been a crazy 2 years building this meetup, building this group.
[00:02:12] Unknown:
And yeah. No. We are. We're Europe's biggest. We're certainly the UK's, and I think Europe's. Yeah. I'm gonna claim Europe until somebody tells me I'm wrong. We're not the US's; the New York Python user group is definitely a lot larger than us. But on the world stage, we're not doing badly. So, who am I? My name is Ian Ozsvald. I've been working in data science for 15 years. I'm an O'Reilly author with High Performance Python, and with Emlyn, we co-chaired the first conference 3 years ago, PyData London, and then we got the meetup out of that. I'm a long-time Pythonista, a C++ programmer before that, and an international speaker.
[00:02:53] Unknown:
Emlyn, how about you?
[00:02:54] Unknown:
Right. So, yeah. My name is Emlyn Clay. I'm a bit of a chameleon in that my background is in pharmacology, which is a lot of wet science, doing drug discovery. I got into software, sort of tinkering, about 10 years ago. I've been doing it professionally for about 6 years. I co-chair PyData London with Ian here, and I use Python in anger. I have many production systems up there now. As well as Python, I write in a number of other languages, but it's certainly one that's close to my heart.
[00:03:30] Unknown:
So how were you each introduced to Python?
[00:03:33] Unknown:
Hey, Chris. So I started using Python in my first real job back around 1999. It was somewhere around 2001 or so. At the time I was senior programmer for a C++ group in an artificial intelligence research company, and I was really proud of the speed of our C++ code doing things like logistics optimization work. One day I was given some Python code, a SAX parser written in Python for some HTML, and asked to improve this parser for one of the NLP, the natural language processing, teams in our French office. I remember looking at it disdainfully and thinking, why would I need to write in this ridiculous scripting language when I've got the power of C++ behind me and a team of 5 working with me?
And then, learning Python in the space of a day, learning how to improve the SAX parser, I suddenly realized I was more productive after a day's learning than I was after 5 years as a senior programmer with C++, for text processing. That was a bit of a wake-up call, and since then I've never looked back. I use Python for almost all of my work, almost exclusively in the last 5 years for data science, falling back every now and again to C++ when necessary, hardly touching any other languages. Really, it's like an over-10-year love affair with the Python language.
[00:05:00] Unknown:
And, Emlyn?
[00:05:02] Unknown:
Right. So, I like to say the first time I got introduced to Python was the day that Excel broke. At the time, I was doing some signal processing on ECG data. Like most life scientists, I didn't really have any formal training in programming, so I was trying to find anything to get it working well. I'd been toying around with MATLAB for a little while, but didn't have access to the computer down in the computer labs at King's College in London. Python was available, so I could put it on my computer and play with it there, and that kind of, you know, spawned the interest.
So I started using it to do biomedical analysis, statistical analysis on various datasets I had access to. I started off with scripting, just in that sort of way, and then from there, you know, the web got more and more impressive, so I spun out into those sorts of things. But Python's always been particularly good as a sort of scientist's toolbox because all the numerical libraries are particularly strong. And it feels a lot like MATLAB, so you can almost forget that you are using it for that purpose.
So yeah, that's how I got introduced.
[00:06:20] Unknown:
And what is the PyData Organization? And how does PyData London fit into it? And what is your relationship with it?
[00:06:28] Unknown:
Okay. So that's an interesting question, and I'm not entirely sure what the PyData organization is. I know that it started several years ago. It's a relatively young organization. It started back in 2012 in the USA, and out of that it spawned a number of conferences. There have been 14 to date, I believe, and the first one that was run in Europe was run by Emlyn and myself and several of our colleagues 3 years ago. So we hosted the first in Europe, last year there were a couple in Europe, and this year there will be 5 in Europe and 5 in America. So it's growing rapidly. Emlyn, how would you define the PyData organization?
[00:07:09] Unknown:
Yeah. So I mean, you and I were there, I think, at the second-ever one, and that was attached to PyCon US in 2013. This is where you and I met. I think they had one before that at the Google campus. As far as we're aware, it was started up as a counterpoint to PyCon, which was more general, all sorts of topics that touch on Python. PyData was focused on people who were using Python for data analysis, signal processing, statistics, things of that nature. Things that were much more of a data-driven problem, and all of it, I guess, built around NumPy, which in some ways is kind of its own entire ecosystem inside of Python.
So, yeah. I think that's kind of what it was. It was trying to be very much about the numerical computation parts of Python.
[00:08:06] Unknown:
And it sits alongside the older SciPy conference and the European EuroSciPy series, which are, I would argue, a little bit more academically focused, while PyData sits a little bit more on the industrial side. Both have industrial and academic speakers, but I think they have a slightly different focus. It's quite nice to be able to mix the 2 sides, while accepting that perhaps it's more interesting for one series to have a lot of academic contribution, and for another to let industrialists share their achievements, their problems, and the success stories of actually getting things shipped. Because that alone is really quite a hard technical challenge in a changing data world.
[00:08:51] Unknown:
And taking a brief diversion, you guys mentioned that the meetup has grown to a pretty significant number of members. And I'm just wondering if you can give some insight into how you managed to build and expand that community and how you manage to sort of keep it together and keep people interested.
[00:09:08] Unknown:
So we have, I'm just looking at the screen now, 2,872 members. I'm really proud of that number. I'm proud every time it goes up, and I get very excited. I scraped the growth numbers, and we generated a graph for the meetup. I'll have to describe this: there are 2 lines. One goes up at a certain rate until Christmas a year and a bit back, and then it grows linearly, but at a faster rate. We don't know what changed. It just grew faster. Now exactly how
[00:09:41] Unknown:
we've helped make it grow, I'm not quite sure. Emlyn, what are your thoughts on why the audience is growing so well? We know exactly how we've made this grow. Well, I think one of the big things that attracts people to PyData London is that it's unashamedly intermediate- and advanced-level stuff. So it's really interesting talks where, you know, the speakers are completely at liberty to just go, here's the stuff I'm doing and here's how it all fits together, which I think creates a fantastic driving force that has helped bring all these excellent speakers in. We've always made it light and fun. We've had excellent sponsors. They've always been great at providing, you know, pizza and beer. It's very social, and I think that's kind of what pushed it along. Around us, though, the general macro environment is that Python has just become more and more important for data scientists.
And data science as a job has become furiously more in demand, especially in London over the past, you know, 18 months,
[00:10:48] Unknown:
or so. There is certainly something interesting around the timing. Around 6 years ago at a Hacker News event in London, I stood up on stage at the end and said to the roughly 500 people in the audience, hey, I want to organize some kind of data-related meetup. Who's with me? And about 30 hands gingerly went up. I looked around and thought, wow, that's not a huge number of people given the 500 who are coming to this Hacker News event. Then a couple of years later some data science groups started in London, and ours started about a year later. I think there are 12 related data science events now, from machine learning and deep learning through to visualization and Kaggle competitions. So there's a whole range of events that have all sprung up and grown in the last 3 years. There's definitely a local timing effect in London. I don't know quite how that works. I know with the meetup, we're very keen to have a meetup every month sponsored by a good host, and so we had Pivotal at the beginning.
So they got us started, and then Lyst, the fashion recommender, hosted us for another 6 months, and then we moved to AHL, a hedge fund, who've been sponsoring us for over a year now. And as Emlyn says, we only have good speakers. We have no product sales pitches. It's all about good technical data stories, the highs and the lows. We have a good, diverse speaker set and audience. And, yeah, we try to encourage interesting stories and a nice environment facilitated by beer and pizza, and it seems to work terribly well.
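The membership graph Ian describes earlier in this answer, linear growth at one rate followed by linear growth at a faster rate, is a change-point pattern. A minimal sketch of how such a change point could be located is below, using only the Python standard library. The data is synthetic and the function names (`fit_line`, `find_change_point`) are illustrative assumptions; this is not the analysis Ian actually ran on the meetup numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def find_change_point(xs, ys, min_points=3):
    """Try every split index and keep the one with the lowest combined error."""
    best = None
    for split in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:split], ys[:split])
        right = fit_line(xs[split:], ys[split:])
        total_error = left[2] + right[2]
        if best is None or total_error < best[0]:
            best = (total_error, split, left[0], right[0])
    _, split, slope_before, slope_after = best
    return split, slope_before, slope_after


# Hypothetical data: month index vs. member count (NOT the real PyData London figures).
# Growth is 50 members/month for the first 12 months, then 150 members/month.
months = list(range(24))
members = [100 + 50 * m if m < 12 else 750 + 150 * (m - 12) for m in months]

split, slope_before, slope_after = find_change_point(months, members)
print(f"growth rate changes at month {split}: "
      f"{slope_before:.0f}/month before, {slope_after:.0f}/month after")
```

The brute-force search over split points is fine for a couple of years of monthly data; real scraped counts would be noisy, so the recovered slopes would only approximate the underlying rates.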
[00:12:23] Unknown:
Yeah. I've definitely seen the rise in data science as a particular discipline happening in a lot of different places, and I can see how that would contribute to the growth of a data-oriented meetup group. And, also, as you were mentioning, there are a number of somewhat related, more niche meetups, but I think that having a sort of unifying group where people can get a cross-discipline view of what other people are working on is definitely very interesting. I know that whenever I go to any meetups, it's very interesting seeing what people in other domains and other subcategories of my work are up to.
So you mentioned that you are co-chairs of the PyData London conference. And I'm wondering, in what ways does a PyData conference differ from a PyCon?
[00:13:19] Unknown:
Oh, that's quite good. So generally, yeah, a lot of it focuses more on "here is the particular data problem I tried to solve and here is the implementation I used." I know that there are a lot of PyCon talks that are about that as well, but I guess PyCon focuses on so many broad things that generally it's a bigger mixed bag, whereas with PyData there's generally a common narrative. Do you reckon, Ian, there's a common sort of pattern with the speakers? They come along and it's, you know, here's my problem, here's how we've implemented it, and here's what we're going on to in future, which is kind of a common framework to all the talks that are done with PyData.
[00:13:59] Unknown:
Right. But the focus at PyData is definitely simply around the data rather than, for example, web development or back-end system support. I certainly remember when I attended a EuroPython in 2010. I remember going to all the different tracks and listening to a lot of the talks there, and then getting to the end of the conference and being a little bit grumpy. I spoke to a colleague and said, you know, hey, it's great that we've got quite so many talks on Django and general web development, but surely people are doing stuff with the data in their databases, not just putting it on the screen in pretty boxes. And then my friend said, well, you know, this is a community conference; the way you fix this is that you do something technical next year on data. Oh, crikey. Yeah. Actually, it was probably time that I did something and gave back rather than just consuming these tools that are provided for me. So the next year I proposed a high performance Python tutorial.
I didn't know how to pitch it, and so I crammed in as much as I could. It turns out that, where I thought I had 3 hours, I actually had 2 and a half hours with breaks, and so I crammed about 6 hours of material into 3 hours. I didn't let my students out of their room, and I did my bit to make sure that everyone learned lots of high performance computing as a precursor to dealing with data. After that, I sat back and thought, oh, I wonder if there are other Python conferences that deal with data rather than just everything, like the PyCons and the EuroPythons. And that's when I discovered EuroSciPy. So EuroSciPy is a bit more academically focused, and that's been running for, I think, 8 years now, moving throughout Europe.
And then the PyData series sprung up in the last 3 or 4 years. All of the PyData videos are online. You can find them at pydata.org. All the videos are there, and they all focus on data processing. I'd say that's the core difference to PyCons.
[00:16:00] Unknown:
And have you noticed a difference in terms of the feel of the audience, meaning the demographics or just the general populace of people who attend a PyData conference versus a PyCon?
[00:16:14] Unknown:
Oh, that's an interesting one. Yeah. Right. So Emlyn, what do you think? Right. So I guess, certainly we don't feel like the audience has changed inside the group, and that's been quite nice. One of the things that is the continued strength of PyData is that the mix of people there is fantastic. Compared to a PyCon, I guess, what do you reckon? It's almost the job spec of a person. Of course, there's a lot more data science people. Invariably, we have a lot of people who are former scientists who have come into programming. That, I think, is more common than having sort of career programmers,
[00:17:00] Unknown:
I think, as our sort of mix. Yeah. I'd agree. And we certainly have maybe a third of our audience here who are data engineers. So people who deal with the data plumbing, getting it in, storing it, and serving it up in efficient ways. I'm pretty sure our audience is sort of 40% PhD, 40% master's degree, and the remainder probably have some level of higher education. Comparing that to general Python conferences, you will have a wider mix of academic backgrounds and a far wider mix of industrial backgrounds. I know at one of the PyCons I went to in the US several years ago,
I was incredibly impressed just to see lots of parents walking around with their children. They brought their children because there were Raspberry Pi hackathons. So you could turn up with your child, and the child would start to learn programming in Python on a Raspberry Pi, and then they'd take it home afterwards. So not scientific at all, but educating the next generation. And I thought that was just, you know, blimey, that was very different to a typical scientifically led conference.
[00:18:11] Unknown:
And do you think that having a bit more shared background in industry causes any difference in the social dynamics? Like, do you see people at PyData conferences having an easier time picking up conversations with their peers versus somebody at a PyCon because of the difference in background? Or
[00:18:33] Unknown:
I think so. Yeah. I definitely know that's the case. I think because we're, in many ways, a more specialized Python conference, geared towards data analysis and data problems, it's generally a good assumption that most of the people you're talking to have a familiarity with the whole mix of things: statistics, machine learning. I mean, you can roughly assume that anyone you talk to at a PyData knows a bit about machine learning, which is nice. So yeah, it has certainly been very easy and fluid to get talking to people. We encourage a very conversational, you know, discourse-driven style.
You know, we even have heckling in the meetings, right? It's all about the heckling. I don't know whether this is just a UK thing, but there's something about having it really interactive. It's not just, you know, turn up, sit down, watch the show. It's, you know, thank you for coming along, have a pizza and a beer, but get in the conversation, right? It's all about giving back, and that, I guess, is,
[00:19:40] Unknown:
yeah, that, I would say, is kind of the focus of what we've got there. Certainly one of the more difficult things at a general Python conference will be that you turn around, you say, hello, my name is Ian, who are you? You start a conversation with somebody and then you discover that you have a very, very different background. So, say, in a PyCon US with 2,500 people, it's quite possible you've got no shared background at all, no shared work interests. You're both at the conference for your own reasons, and it's quite hard to get some kind of overlap. That's lovely and it's interesting, but it's hard to find people who really care passionately about your core subject. And we have a similar problem with our PyData meetups in that we have 200 people turning up every month. So our regular meetups are the size of a small conference, every month, for free. That's a heck of an achievement. But we do have people turning up with a high performance background or a natural language processing background or a time series econometric background, or they're data engineers, and they don't really care about what you do with the data. They just want to store it. So one thing we started to experiment with is sections of the room during the breaks being designated.
That corner, that's for visualization. And that corner, that's for machine learning. And then the bar, that's for general conversation. That way, the 200 people have a better chance of finding people who want to talk about their particular subject right now. I think that seems to be working really quite nicely. So I think it's interesting that the same problem happens at a different scale even at a conference like PyCon.
[00:21:12] Unknown:
So does PyData do anything in particular to encourage users from disciplines that might not be aware of how much our community has to offer to choose the Python suite of data analysis tools?
[00:21:24] Unknown:
I guess we sort of do. Yeah. So there are plenty of people who come along to PyData who are part of the R user groups. I think we did a plug for one of their conferences, didn't we? Yes. Absolutely.
[00:21:37] Unknown:
The EARL conference. Yeah. We plugged the last couple of them. And they've plugged us as well.
[00:21:43] Unknown:
You know, we chat to the Big O people, who use a fair chunk of Python, but then again they tend to also be more high performance focused. Weirdly, we sometimes get comments on the meetup. A few meetups ago, someone wrote that there wasn't any Python discussed in a particular talk. And yeah, while we are focused on Python because it's a popular language in this space and it's a good language to put up on a slide and read and understand, it's very interesting, because once you get to a certain level, if you need extra performance, then yeah, you need to drop down into C or maybe Cythonize your code or something. Right?
Or for others, there are compatibility things, like you have to get it running on the JVM because that's, you know, the setup they've got. So there is a fair bit of overlap that occurs naturally anyway with the people who come along. I'd like to think that we pivot around both the Python bit and the data bit of our group. So,
[00:22:43] Unknown:
we don't have anything in particular. Sorry. Well, one of the points we made for the conference is that Python is definitely the hook, and so we want the majority of speakers to be talking about the use of Python. But it's not just about Python. We need people who are using R or Julia or SPSS or MATLAB, or Java, I guess, for the big data side. People coming into the community with other ideas, other impressions, other backgrounds, other use cases, so that we can have a nice cross-discipline sharing of experience at the conference. Otherwise, you end up in danger of being in a little echo chamber in your own closed ecosystem, everyone patting each other on the back and saying, oh, yes, well, you've got the best techniques, and kind of ignoring everybody else. And, you know, that's not how the rest of the world works. So we do our best to be very inclusive there.
And, yeah, sure, we have had people complain that, 3 months in a row, we've had other languages being discussed alongside Python. How about just an all-Python night? And yeah, sure, we normally just have Python nights for our monthly meetups. But I'm very keen to get other people in from other backgrounds. Spread the word and see who comes in with a contrarian opinion. Hopefully that upsets some of the natural order of things and changes some of the normal thinking,
[00:23:54] Unknown:
because that's how new ideas get tested, and that's what science is all about. I think that's fantastic. If you've been listening to this podcast, you'll know I've been a huge proponent of the idea that people need to break out of their sandbox and look at what other communities are doing. I feel very strongly about that. You know, I came from Ruby and learned Python and said, oh my god, you guys have built some amazing toys over here. And vice versa. So it's great to be a Python fan, but be aware of what else is happening outside the boundaries of the Python community. It's the only way we're gonna continue to learn and evolve.
[00:24:37] Unknown:
Yeah. Oh, definitely. Definitely. I mean, I don't know whether we're just fortunate with the people we have, or maybe it's the focus on data, or that a lot of our members are sort of ex-scientists. But everyone is very critical of everything: everything they use, all the tools they use, and things of that nature. So, yeah, I think the reason why Ian and I try to encourage that in our group is simply because that's the background we come from. But you're quite right. I mean, there are no winners in a flame war.
So it makes a lot of sense to be able to discuss which thing is better and what makes it objectively better. I don't know whether you can make it entirely as a pure Python guy. I think there is something to be said for being a little bit polyglot: have one language you're really strong in and then know a smattering of the others. You can be a lot more dangerous than just knowing one language.
[00:25:37] Unknown:
Yeah. And it's very rare that you come across a code base that has only one language in it, or not necessarily a single code base, but a particular application environment. Even if you're just doing web development, you're guaranteed to be running up against JavaScript. Or if you're doing big data analytics, you're bound to come across Java at some point. Or if you're doing a lot of distributed computing, chances are you're gonna come across something in either Java or Erlang. So being able to at least understand how to parse those languages, to figure out what's going on when you come across an edge case, is definitely important. Even if you don't necessarily write anything in other languages, having some familiarity with what other languages are capable of, why they're used in those cases, and how they operate under the covers a little bit is definitely valuable and makes you much more valuable as an engineer.
[00:26:28] Unknown:
Sure. Sure. I mean, to Chris's point about the crossover with something like Ruby, I think there's a fair chunk of that involved with things like Chef. Right? You'll have Python developers who themselves probably aren't using Ruby as their main scripting language for data analysis, but they're probably using something like Chef or Puppet to orchestrate all the servers, because that becomes a concern when you're plumbing things together.
[00:26:56] Unknown:
Yeah. Absolutely. That's my day job actually, writing Python and Chef.
[00:27:01] Unknown:
Very good. Very good. I'm more of an Ansible man myself. But I have had occasion to play with Chef and Puppet.
[00:27:08] Unknown:
Yeah. Need to look at Ansible. Sorry, Tobias. I was just gonna say I'm a SaltStack user whenever I can be. Oh. So you've both spent a good portion of your careers using Python for working with and analyzing data from various domains. And I'm wondering how that experience has evolved over the past several years as newer tools have become available.
[00:27:29] Unknown:
Oh, well, so I've been using Python for,
[00:27:33] Unknown:
I think, 14 years, something like that now. So what would you what would you have started on here? What version is you oh, it's it's and, oh,
[00:27:42] Unknown:
I don't know. What was Python 14 years ago? I have no idea. Wasn't even a major number, mate. It was probably like 0... No. No. No. It was well established. There was at least one book in the local Waterstones bookstore. I remember because I just thought, well, it must be legitimate because there's at least one book out for it. Now, I know that that was sometime around the unification of the math libraries into NumPy by Travis Oliphant. So that was the beginning of the scientific community coming together with a shared base layer of numeric tools. And, you know, that was way back before pandas and scikit-learn and the like. I guess one thing I've seen evolve is the stack massively improving. So we had NumPy with homogeneous arrays of data, and then pandas allowing heterogeneous arrays to be put together in an Excel-like spreadsheet.
And scikit-learn making it super easy to do your machine learning. And matplotlib growing up quite nicely into a very powerful, a little bit fiddly at times, but a very powerful plotting library. I think, to me, one of the most interesting things has been the disconnect with some of the clients. So I run my own consultancy, and I've been consulting for over 10 years. So I've spoken with a lot of clients. And when they see data science and Python on the rise, they kinda match that with a lot of the publicity, partly from the big data world and partly from companies like IBM.
And they begin to assume that because it's AI and data science, probably because, say, Google conversational level, freely available, and they can predict human buying habits, and you could knock out a prototype in the space of a week to improve spending patterns and advertising. And I find it kind of interesting. It's a bit of a tricky conversation to calm clients down there, but it's lovely that clients have bought into the fact that Python and the data science community is so well established that this kind of thing must be possible by now, surely. So that's an interesting evolution, I think, from tools used by scientists through to clients assuming that some of these things are just kind of solved problems now, when they're not yet.
[00:30:12] Unknown:
I would say I have forgotten the question initially,
[00:30:16] Unknown:
positive. Yeah. Was I monologuing?
[00:30:18] Unknown:
Oh, dear. So I'm just wondering how in your in the length of your career of using Python for data analysis, I'm wondering how your experience has evolved over the past several years as newer tools have become available.
[00:30:31] Unknown:
So one of the big experiences that I've had is that my continued use of R decreases over time. I think about the time when, you know, pandas came in and gave us a really decent data frame object, that was a big use case for things that I was doing. We were just organizing clinical trial data sets, things of that nature. If you're doing a sort of small trial where, you know, you've got a couple-gigabyte file, that kind of thing, it can all fit in memory. If you can do it with something like pandas, it's very, very quick and very expressive to get stuff done. Matplotlib has become easier and easier to use, and, let's see, Seaborn came out a few years ago, and just by importing Seaborn, all of your graphs got prettier, which was a really nice effect, I found.
So generally I find myself relying less and less on that. I, for all of my sins, have done lots and lots of MATLAB, because of signal processing of, you know, biomedical signals and things of that nature. And I've become less and less attached to it, but, you know, I'm still stuck on there. So I see a fair chunk of room where I can move on in future. But broadly, what's happened is I've been homogenizing onto the Python platform, because it's been expanding to fill my needs, which has been lovely to see. And I've only been using it for 6 years or so, so a much shorter time, but in that time, you know, it's become very strong indeed for what I need it for. I guess that point about homogenizing onto the Python stack is one of the key ones here. So
[00:32:07] Unknown:
rather than having languages that are specialized at certain points of the scientific problem spectrum, with Python we get a language and a toolset that means we can go from taking data kind of from anywhere and then doing the necessary munging on that data to turn it into something useful, modeling it, visualizing it, exporting it, and putting it into production, under monitoring, deployed wherever you need it, and you can do it all in one language. And that means that although we may not have the best language and the best libraries for any one particular part of that process, you've only got one language to keep in mind. So you've got a bunch of developers who are just dealing with one conceptual model, one language, one syntax, and that really eases development and deployment and debugging and support and all of those things. And I think really that's the main strength that I see of Python evolving in the data science world. You can kind of do all of it just in the one ecosystem.
[00:33:08] Unknown:
And so for somebody who's just getting started in the data analytics space, what advice can you each give about, you know, how to you know, what what are the best things to learn first, how to establish themselves, or, just any general advice as far as, you know, doing data analytics
[00:33:23] Unknown:
itself. Oh, sure. Well, I'm gonna dive in there because I've given a couple of keynotes talking about this kind of thing. So it was my honor to give a couple of keynotes, at PyCon Ireland, PyCon Sweden, and the Budapest Business Intelligence Conference, over the last year and a half. And one of the things I talked about was helping people, particularly engineers, move on with developing data science products and getting successful products shipped. And so one of the things that I think people get a bit hung up about, and forget, is that you can go an awful long way with some really simple data science work. And by really simple, I mean drawing graphs and doing a bit of filtering on your data. There's so much you could do. That's partly why Tableau is so powerful. You can do an awful lot just by drawing your data and cleaning it up. So you want clean data that you can visualize and explain, and then maybe you can do a bit of statistical modeling or some machine learning on it using tools like pandas and scikit-learn. Then you do some more visualization.
And then when you've got something that's really robust, then maybe you want to get it deployed and shipped out there. And if you want to practice, then something like Kaggle, the machine learning competition site, that's an ideal place because there are lots of competitions from simple to quite complex, with an open forum full of solutions and discussions about how people have improved things. So there's quite a wealth of material out there which helps people get started. But one of the biggest things I've seen people get hung up on is the need to do something really, really clever and cutting edge, like a deep learning, Spark-distributed solution, when actually you can get away with putting it in a small data frame in RAM and running a logistic regression classifier on it and visualizing it in 2 dimensions. And maybe it turns out that's just as good. And if that's a robust solution, then great. Just do that.
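As a rough illustration of the "small data in RAM plus logistic regression" approach described above, here is a minimal sketch using only the Python standard library. The toy data, the function names, and the learning-rate and epoch settings are all assumptions made for illustration; in practice a library implementation such as scikit-learn's LogisticRegression would be the more usual choice.

```python
import math
import random


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(points, labels, lr=0.1, epochs=500):
    """Fit weights (w1, w2) and a bias b by batch gradient descent on log loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model, point):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    w1, w2, b = model
    return 1 if sigmoid(w1 * point[0] + w2 * point[1] + b) >= 0.5 else 0


def make_toy_data(n_per_class=100, seed=0):
    """Two made-up 2D clusters, one per class (purely illustrative data)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in ((0, (-2.0, -2.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            labels.append(label)
    return points, labels


points, labels = make_toy_data()
model = fit_logistic(points, labels)
accuracy = sum(predict(model, p) == y for p, y in zip(points, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Because the data has only two features, the points and the fitted decision line can be drawn on a single 2D scatter plot, which is the kind of simple, explainable result the speaker is recommending.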
[00:35:15] Unknown:
Yeah. I've definitely, particularly in the big data space, seen a lot of mention of how, you know, people are far too likely to get carried away in the approach to a given problem, where they see a data problem and they say, oh, I'm gonna throw Hadoop at it, when all you really need is pandas and your laptop. So
[00:35:36] Unknown:
Well, it's definitely worth remembering that, I mean, my laptop has 4 cores and 16 gigabytes. But I can go to Amazon, and for a couple of bucks an hour, I can rent a machine with 50 or 60 cores and hundreds of gigabytes of RAM. And as long as my problem fits into that, and it turns out you can fit an awful lot of data into a couple of hundred gigabytes, then I have no deployment problems. I don't need to hire someone to run a Spark deployed environment for me. I can iterate really quickly. And because data is not being thrown around amongst a number of machines, I get a solution back really quickly. So I, as an individual or in a small team, can iterate really quickly on my ideas. And I'm a huge proponent of small to medium data, just keeping it on one machine and going as far as you can with that before having to worry about a big data solution and the complexity that that involves.
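A back-of-the-envelope check of the "does it fit on one machine" point above might look like the sketch below. The 8 bytes per value (one float64 per cell), the 50% headroom for working copies, and the example table size are assumptions chosen for illustration, not figures from the conversation; only the 16 GB laptop and the "couple of hundred gigabytes" come from the speaker.

```python
def estimated_gib(rows, columns, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table, in GiB."""
    return rows * columns * bytes_per_value / 2**30


def fits_in_ram(rows, columns, ram_gib, headroom=0.5):
    """True if the table fits while leaving a fraction of RAM free for working copies."""
    return estimated_gib(rows, columns) <= ram_gib * (1 - headroom)


laptop_gib = 16    # the laptop mentioned above
rented_gib = 200   # "a couple of hundred gigabytes" on a rented machine

# Hypothetical table: 100 million rows by 20 float64 columns,
# which is roughly 15 GiB of raw values.
print(round(estimated_gib(100_000_000, 20), 1))
print(fits_in_ram(100_000_000, 20, laptop_gib))
print(fits_in_ram(100_000_000, 20, rented_gib))
```

Under these assumptions the hypothetical table is too big to work with comfortably on the laptop but fits easily on the rented machine, with no cluster involved.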
[00:36:27] Unknown:
Well, I think this is a subproblem of the general problem that we all face as technologists, in that we see shiny toys and we wanna play with them. Right? I mean, it really is that simple, isn't it?
[00:36:39] Unknown:
Yeah. I can actually agree on that point. Well, right. No one got fired for suggesting a big data solution from one of the big providers, but, you know, pragmatically as an engineer, it doesn't mean that it's the right solution.
[00:36:52] Unknown:
So how can conferences like PyData help strengthen the bonds and synergies between the Python software community and the sciences?
[00:37:01] Unknown:
I think it gives them that niche to work in. I mean, speaking to the fact that such a large proportion of the PyData members are either scientists or ex-scientists who are working in industry, it gives them an opportunity to focus on their particular problem domain, with Python just being, you know, the tool that we all sort of group around. And I think out of any of the languages, Python has, I think, the strongest, especially for your casual programmer. Speaking as someone who comes from life sciences, where there are very few people who actually do a lot of programming: a lot of our problems aren't, you know, heavily computational. They're more sort of wet work stuff, you know, getting the actual sensor on properly and observing a, you know, very large sort of macro change.
Then, yeah, Python's helped a great deal in that respect. And before that, I mean, Perl did a great job. I mean, if you're looking at the bioinformatics space, Perl's still very much king in that area. But I'd argue that for anyone who's got a casual interest in doing some programming, there isn't a much better language to advise than Python. So consequently, yeah, PyData is very much that bridge to help it go two ways. You have people who are career programmers who wanna learn more about problem domains, and you've got people with problem domains who wanna learn more about programming.
[00:38:27] Unknown:
Certainly, we see at the conference a lot of industrial users of Python who kind of exist under the term data engineer at the moment. So regular engineers who know how to move data around, and then they're trying to do more interesting things with their data. But if they haven't got a data science background, so they haven't got a PhD or masters in those kind of numerate subjects, then maybe they're wondering, how do they go and do this? But at the conference, they can meet other industrialists who are doing this, and certainly they can meet academics who are working on these kind of problems. So the conference gives a great ground for cross-pollinating different ideas and different solutions.
But rather than just having problems that are presented as being interesting problems that could be solved, instead you've got a driver of, hey, we're failing. We've got this huge dataset. We wanna do something with it, and it's not working. Could someone help us? We've got some money. We've got a strong desire to do something. Can someone come and help us with the problem? I think that's a really interesting driver to bring people together. And certainly via PyData, we've been reaching out to other groups here in the UK. One of my particular pushes this year is into the Royal Statistical Society.
So the RSS is a couple of hundred years old. We have a couple of members of the RSS in our PyData group, but not many. And I know that the Royal Statistical Society have been trying in the last year to get more involved in the big data community regardless of language, just large datasets in general, to see if they can find interesting ways to apply some of their knowledge in these newer industrial areas that are growing up. And so I'm reaching out into the RSS and trying to get members to come along and speak and attend the conference, as another way of just bringing in people with another diverse set of backgrounds, a different set of opinions, to come and join the conversation, see if we can get some idea sharing
[00:40:14] Unknown:
going on. And so there are a number of different subtopics within the blanket categorization of data science. Is it difficult for conferences and meetups to keep members of the audience from being alienated?
[00:40:26] Unknown:
Yeah. No. That is a really tricky area. So we did a survey. Well, I say we: Ian sent out the survey to all of our members to get a, you know, handle on the kind of things that they wanna see more of or what they're involved in. And why? Because we're data geeks. We want surveys. We want all the answers. So, I mean, we know something like a third of the people there... It's London. Right? There's a lot of people involved in financial services. So people who are in quant finance or doing the data plumbing behind it, all that kind of jazz.
But almost a third of the responses were sort of "other". So there's a really long tail to all these different, you know, domains that they're in. So, yeah, it's kinda tricky. I guess what we keep doing is we keep relying on the fact that the meetup board gives us, you know, commentary back as to how they think we're doing. They rate the meeting to see how they felt it went. And we do our best to, you know, reply to that and adjust accordingly. I guess because we are fairly data driven, we can also sort of categorize, and the rest of it, the talks that we've done. And that gives us a fairly good idea as to whether we're sort of, you know, hitting all the various areas. But I guess even if someone isn't in a particular problem area, they can still take something home from it. Like... Yeah. Yeah. Emlyn, you're making this sound terribly principled. I don't think we're being that principled about it. We just keep changing the tune every month. We just keep trying to get different people in different topics talking. And I think it's that diversity in the talks that matters. Right. But what we don't do, like, for instance, I do a fair chunk of stuff in the biomedical area, but I don't deliberately try and get lots of, like, biomedical speakers up on stage. And similarly, you do a lot of machine learning. Right? Mhmm. So, you know, we don't see it dominated by any one particular force.
So,
[00:42:16] Unknown:
yeah. No. Right. But part of the reason for that is there are other groups in London. I mean, there's 12 data related machine learning type groups right now. There's one specialized in deep learning. There's another specialized in visualization. There's 2 or 3 in visualization. There's another one on text analytics. And so we've got groups that really go deep into each of those areas, and I think part of the role of PyData is to say that with Python and some other languages, you can do an awful lot of work with data, and here's a wide variety of examples of how data's being used. And, hey, all the speakers are intelligent. The audience is intelligent. Everyone's gonna pick up something new every month, so come along and learn something new. And at worst, you have a beer with some friendly, interesting people, and you go to the pub afterwards. So I think that's the main thing that happens.
[00:43:04] Unknown:
So data science is still a young field, and we've yet to see lots of examples of the successful use of data. How are London based companies using data with Python?
[00:43:13] Unknown:
So I've definitely got ideas there. But, Emlyn, do you have anything you wanna throw in? So, particular companies that have been using it? I mean, I guess the speakers who've been up sort of recently give us a good idea of how they've been using it. So, you know, Lyst: there's often, you know, 6 or 8 of those engineers coming along to the meetups every single time. One of our committee members works at Lyst. They use it for doing all sorts of really fancy image processing analysis to try and match up, you know, if this blouse is on this website, is it the same blouse as the one on that website, and those sorts of things. So that's fascinating.
The chaps at Deliveroo, which is basically like a UK version of a number of sort of fast... I say fast food, restaurant food delivery services: they use it to plan their fleet so that they can, you know, get optimized delivery for, you know, products which spoil in a very short period of time. Yeah, there are so many who are in London and doing this that,
[00:44:13] Unknown:
yeah, could reel off more and more and more. But even... So I think it's worth maybe mentioning our current host for the meetup, AHL, the hedge fund. They're an interesting story in that several years ago, they had a heterogeneous environment. They had people using R and Python and MATLAB and a whole mix of research tools. And they centralized on Python, and they hosted the London Financial Python User Group years ago, after they'd switched into this pure Python mode. And the reason they've cited for switching to this Python mode is simply that by having the researchers and the engineering team working with the same language, they can quickly ship and iterate and improve upon their trading approaches.
So by losing a little bit of flexibility in all of the possible ways they could be developing new code, they get rid of a lot of friction around reimplementation of ideas that will then go out to engineering. And so they can quickly get things deployed, test, keep it if it works. If it breaks, then take it out again and improve upon it. So I really like the fact that they're a traditional quant hedge fund, but they've centralized on just Python as their main tool. And the other example I'll cite is Channel 4. So I cite this as a bit of an unusual example.
So we wouldn't normally think of a British television broadcaster being data driven. And so Channel 4, it's a public service broadcaster. So it's got a remit from the government to provide certain requirements. It's not a purely commercially driven outfit, but it is commercially driven, so it's ad driven. And so I'm working with these guys at the moment. And there's some interesting projects going on there. One is using classifiers to better understand the audience to improve ad targeting. It's a bit like the Google model of having a good understanding of your audience to target the right kind of ads. But as a result, they're able to increase their ad sale premiums by 30 to 50% because they're targeting people and they're able to tell the advertisers more about the audience that they're going to be advertising against.
They're using unsupervised classification methods to better understand the diversity of the audience and the viewing habits and when they watch, where they watch, what kind of devices they're using, and also personalization. So recommending the right kind of shows. So rather than just having the video on demand being this kind of catch-up service that you go to if you miss a program, instead maybe it can turn into more of a destination. So a bit more like a Netflix-driven site. So in London, as Emlyn says, we have a lot of finance companies, and I love the fact that media companies and the fashion companies Emlyn mentioned are getting more into the use of data to drive all of their decisions.
And one thing we encourage with the PyData meetups and the conferences is to have companies stand up and say, hey. We're in this niche over here. You haven't heard of us. You probably didn't think we've got a data problem, but we have got a data problem. Here's how we're solving it, and these are our problems. This is where we'd like some feedback too. And I think that really helps to knit the ecosystem together in London, and it makes it much friendlier for companies to realize that they can use their data productively, take lessons from other domains, and encourage their management that they can see that in other domains, people are solving these hard problems. So why don't we solve those problems too and use our data more productively?
[00:47:35] Unknown:
So is there a particular Python data science library that you think needs a little extra love and attention, or one that doesn't exist that you think should?
[00:47:47] Unknown:
Oh. Now let's see. I mean, this always comes down to a personal bugbear, but I always think scipy.signal could learn a lot, basically, from the MATLAB-like function APIs. Just because it does a lot of the complicated signal processing tasks and it's pretty good at that, but it doesn't make sort of easier, more convenience functions available. So, oh, man, I should just go and do it, right? I mean, I always say this to people, and I never actually step up and do it, so it looks like I shall get my pull request on and get involved. But that would definitely be one for me.
Ian, can you think of any that you just
[00:48:29] Unknown:
Right. Something I've asked about in talks and the keynotes in the last couple of years is I want to see more tools that help us clean data, particularly text data, because text data is often broken, hard to mark up. It comes badly encoded. And then most people who deal with it don't have strong training in natural language processing. And they just want a tool where they can chuck in some unstructured text, and it comes out in some kind of marked up, useful, clean, JSONified, dictionary-like way. And those tools don't exist. So I would love to see some tools that make it easier to work with unstructured text.
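To make the wish above concrete, here is a very small sketch of the "chuck in messy text, get a clean dictionary out" idea, using only the Python standard library. The function names, the Windows-1252 fallback encoding, the crude tag-stripping pattern, and the sample input are all assumptions for illustration; this is nowhere near the full tool the speaker is asking for.

```python
import json
import re
import unicodedata


def decode_bytes(raw):
    """Decode bytes as UTF-8, falling back to Windows-1252 (a guess at a common legacy encoding)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def clean_text(raw):
    """Return a JSON-serialisable dict holding normalised text and simple word tokens."""
    text = decode_bytes(raw) if isinstance(raw, bytes) else raw
    text = unicodedata.normalize("NFKC", text)   # fold compatibility forms such as non-breaking spaces
    text = re.sub(r"<[^>]+>", " ", text)         # crude: drop anything that looks like an HTML tag
    text = re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace
    tokens = re.findall(r"\w+", text.lower())
    return {"text": text, "tokens": tokens, "n_tokens": len(tokens)}


# Made-up sample: Latin-1 style bytes, stray markup, and irregular spacing.
record = clean_text(b"<p>Caf\xe9  prices\xa0rose   by 10%</p>")
print(json.dumps(record, ensure_ascii=False))
```

Real-world text needs far more than this (language detection, entity mark-up, smarter encoding repair), which is exactly the gap being described.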
It's a rapidly evolving area, and Word2Vec in Gensim, that's helping a lot. But, yeah, certainly, there's room there. And in general, I think one thing that all the projects could do, and I will use as examples statsmodels and PyMC3, is better documentation. So these projects have documentation, but documentation is always something in the open source world in general that people forget about. We consume the documentation. It's quite hard to write good documentation. People don't really go and put the time into that. And so I'd say this: if there's anything that anyone out there listening wants to improve, go and put in a pull request, or at least go and file a bug report and say, hey, this documentation's a bit wrong, or it's just lacking.
There's one line, and it doesn't explain anything. That's unhelpful. But maybe if you said this instead, this would be helpful, and maybe someone could turn this into an improvement against the documentation. All the projects are crying out for documenters. And if you've never contributed to an open source project ever, then one of the easiest things you can do is go and improve the documentation. You get loads of feedback on it, and everyone will love you for it. Because normally, people like writing the code, not documentation. So please go and help improve the documentation.
[00:50:19] Unknown:
So before we move on, is there anything that we didn't cover that you think we should have? Or anything that you wanna bring up?
[00:50:26] Unknown:
I think that covers a great deal of it. I'm trying to think. I mean, it's, yeah. No. I think that covers a great deal about what PyData does and sort of the data science state of play in London. And I guess it's a, you know, microcosm for the other centers of the world. Yeah.
[00:50:47] Unknown:
Yeah. I'm pretty happy with that. Alright.
[00:50:50] Unknown:
So for anybody who wants to get in touch with either of you or follow what you're up to, what would be the best way for them to do that? Ian, how about you go first?
[00:50:58] Unknown:
So for me, I blog and I've got a Twitter account, and both are my name. So that's ianozsvald, that's my Twitter account, and ianozsvald.com is my blog. And if you just Google for my name you'll find my past talks and the like. I'm reasonably well represented in Google now.
[00:51:20] Unknown:
And what about you, Emlyn?
[00:51:21] Unknown:
I think the easiest way to get hold of me is probably on Twitter. My Twitter handle is @emlynclay. One of those lucky ones that has a rare enough name that no one had my handle.
[00:51:32] Unknown:
Yeah. No one has the Ozsvalds.
[00:51:34] Unknown:
Well, that too. And I own emlynclay.co.uk, but there's nothing on it. So, no use going there.
[00:51:44] Unknown:
Alright. So we will move on to the picks. For my picks today, my first one is a tool called xcape. And what it does is it's a Linux utility that lets you remap your keys to have different functionality. So you just write out a config file for it, and it will remap whatever key you want to whatever other key you want. I personally use it for being able to let my caps lock key be an escape key.
[00:52:11] Unknown:
So I use Emacs and so... I was gonna ask. That smelled like an Emacs thing to do. Yeah. Because I've done that back in the day, and you train your fingers to do it, and then as soon as you use somebody else's computer, everything gets hard. Yes. So,
[00:52:27] Unknown:
Yeah. So I use KDE for my desktop, and so I've remapped the caps lock key to be control when it's used as a modifier. But with xcape, I can also use it as an escape key when I'm not using it as a modifier. So it does double duty. And so my other pick for today is going to be the Keybase filesystem. So they recently put out a brief blog post and announcement about the new tool that they added in. So with the newest version of the Keybase utility, it will actually map a network drive onto your computer that is encrypted using your GPG keys.
And so what that does is it lets you, you know, put files into directories based on the IDs of other users of Keybase. And even if they're not a user of Keybase yet, you can, for instance, you know, put it at a file path that includes maybe their Twitter username, so that when they do sign up for Keybase and associate their Twitter ID with their account, they will then automatically gain access to those files that you put on the Keybase filesystem. So it's very, very cool. I'd recommend people check that out and take a look. Keybase is still invite only, but that being said, I've got something like 100 invites remaining.
So for anybody who wants to get an account and check it out, just get in touch with us via Twitter, or sign up for our Discourse forum and leave a post there. Just send us a message about what you like most about the show, and I will follow up and get you an invite. So, Chris, what about you?
[00:54:11] Unknown:
Let's see. So my first pick... first of all, I just wanna say Keybase.io is awesome. Those folks are doing such a great job making crypto tools accessible to the everyman. I think it's fantastic, and I can't wait till they go public so I can trumpet them from the highest mountaintop. So my first pick is a book by Iain M. Banks called The Player of Games. It's the second in his Culture series of novels. And all I have to say about it is I love these stories. They're amazing. And this is one of the only science fiction futures that I want to live in. Like, if I could push a button and beam myself into that future, being a member of the Culture, I would sign up for it in a heartbeat. I wanna live there and do that. It's just a really interesting world having to do with the evolution of humankind and the evolution of machines in a very sort of non-dark, non, you know, doom, they're-taking-over kinda way. It's great stuff.
[00:55:13] Unknown:
My next... The Culture universe is fabulous. I just recently reread Excession. So, yeah, I'd double up your recommendation there. Anything by Banks and the Culture universe is amazing. It's a really good utopia.
[00:55:26] Unknown:
I need to read some of his other stuff that's not Culture, but I'm enjoying the Culture books so much that I'm just starting there. Yeah. Just hoover them up. My next pick is a game called Undertale. This game is not what it appears to be. It deserves a second look. It looks like your standard kind of RPG-ish kind of thing with an odd combat mechanic, but it really is not. It is an exploration in morality, and it is totally worth a look. Is that a knowing laugh that I hear there? No. No. No. An exploration of morality. Yeah. Through a computer game. I love the idea of that. Yeah. Yeah. It's definitely worth checking out. You guys should definitely give it a look. And my third and final pick is a movie called The Big Short. I was really impressed with this film. It's beautifully crafted. The acting is great. The writing is phenomenal.
It it takes it does a really excellent job of explaining some fairly complex concepts in a way that is both incredibly accessible,
[00:56:26] Unknown:
incredibly funny, and it just sort of shocks your brain into being receptive. It's a really good movie. I highly, highly recommend it. Oh, I agree. I mean, I watched that a little while ago. My wife and I devoured all of the things that were coming up for nomination. And, no, I thought The Big Short was one of the better ones. I very much enjoy a good technical film. I mean, I don't work in the financial sector and I know a little bit about the stock market and how it works, but this felt really quite meaty as to what it was doing. And then of course the thing that gets you towards the end is that nothing's been learned. It's just amazing how, you know, all these things have occurred.
And, yeah, they did an exceptional job at explaining, you know, how the system got itself into such a bender. So, yeah. No. I'd I'd simply echo that. That's a great film to go see.
[00:57:15] Unknown:
So Ian, what picks do you have for us?
[00:57:17] Unknown:
Right. I'm gonna go for four, and I'll try and keep them brief. So Emlyn mentioned Seaborn, the visualization library. If you're using Python for visualization and you're using pandas to represent your data, then you have to look at Seaborn, s-e-a-b-o-r-n. It can consume a pandas DataFrame, and it will do things like spit out box plots and strip plots of your columns of data with neat labeling and sensible color schemes, and heat maps and violin plots and kernel density estimates and all sorts of things really, really easily, in one or two lines. Definitely go for Seaborn for visualization. I'm working on a project to try to understand allergic reactions through machine learning, and it's something I'm covering on my blog. So it's a pick because I'm interested in this, and I'll be talking about this at conferences.
If anyone out there has a background around understanding allergic rhinitis and allergies in general and is interested in the idea of machine learning to help citizen science your way through to solving this, then I would love your feedback, and you'll find that mentioned on my blog. I've got a book which is a slightly contentious pick. This is Rui Miguel Forte's Mastering Predictive Analytics with R. So it's deliberately an R book, not a Python book. I'm looking at it as kind of a secret hidden manual to Python's statsmodels. So, R and statsmodels: statsmodels actually is a subset of all of these things available in R.
In this book, Miguel is covering a statistician's view upon machine learning using R. And you can just take it sideways and go straight across into statsmodels, which I don't believe has a book available at the moment, and you can use the same techniques in scikit-learn. So if you want a statistician's view upon doing predictive analytics, I really recommend Mastering Predictive Analytics with R. And finally, if you're visiting London, I recommend you take a walking tour of London with Unreal City Audio. This is a small niche group of actors who will take you on an interactive tour of London with props.
Props will include people leaping out from behind buildings to give you a discourse, and they will include things like, on the coffee house tour, you will have original 17th century bitter coffee to drink. There's a chocolate house tour where you get original 17th century hot chocolate. And at the moment, we've just missed it, and we're gonna be going for one of the ones coming up, a medieval wine tour, where one of the goals is to get you a little bit tipsy whilst teaching you the history of the center of London. So Unreal City Audio, a massive, massive shout-out to those guys. If you come to London and you want a really good tour, go to Unreal City Audio. I love them.
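For listeners who want to try the Seaborn pick, here is a minimal sketch of the "one or two lines" workflow Ian describes. The DataFrame contents and the column names `group` and `value` are made up purely for illustration; only the plotting calls come from Seaborn itself.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative data: two groups with a handful of measurements each.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.5, 3.5, 5.0],
})

fig, (ax_box, ax_strip) = plt.subplots(1, 2, figsize=(8, 3))

# One line each: Seaborn reads the DataFrame and labels the axes from the column names.
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
sns.stripplot(data=df, x="group", y="value", ax=ax_strip)

# To keep the figure, call fig.savefig("picks.png") with a file name of your choice.
```

The same `data=`, `x=`, `y=` pattern carries over to the other plot types mentioned, such as `sns.violinplot` and `sns.kdeplot`.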
[01:00:07] Unknown:
Alright, and what about you, Emlyn?
[01:00:10] Unknown:
Right, well, yeah. I've only got the one pick. This is based on some stuff I was playing with today. So I love making slideshows in the IPython notebook. Even if they're not really necessarily much about code, I just like making them inside of that notebook. And I was playing around today with the template flag so I could customise what was going on. And because it uses the lovely Jinja templating syntax, you can do phenomenal things. Like if you've ever wanted to get really involved in making sort of slightly different animations, really exciting slides where, you know, the graph animates slightly, or, you know, being able to blend bits of, like, D3 into a presentation.
I'm up for doing a pitch to some private equity people. And I was doing it so that the whole application that we've recently done is inside the IPython notebook that you can play with. And so I found the template flag to be just fantastic for dealing with bits around that. So yeah. I mean, definitely have a look. If you haven't ever made a slideshow in anything other than PowerPoint, put down the PowerPoint right now and get IPython and use the nbconvert command line tool to turn it into a reveal.js slideshow and profit. Everyone will think you're the coolest guy in the world.
Fact.
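As a rough sketch of the workflow Emlyn describes, the snippet below assembles the nbconvert command that turns a notebook into a reveal.js slideshow. The helper name `build_slides_command` and the file names `pitch.ipynb` and `my_slides.tpl` are hypothetical. It also assumes the current `jupyter nbconvert` entry point (at the time of recording the command was `ipython nbconvert`), and what `--template` accepts differs between nbconvert versions, so check the documentation for the version you have installed.

```python
import shlex


def build_slides_command(notebook, template=None):
    """Return the argument list for converting a notebook to reveal.js slides.

    `template` is passed through to nbconvert's --template flag unchanged;
    whether it should be a file path or a template name depends on the
    installed nbconvert version (an assumption to verify locally).
    """
    cmd = ["jupyter", "nbconvert", notebook, "--to", "slides"]
    if template is not None:
        cmd += ["--template", template]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_slides_command("pitch.ipynb", template="my_slides.tpl")
print(shlex.join(cmd))

# To actually run the conversion (requires nbconvert to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the notebook name contains spaces.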
[01:01:38] Unknown:
Emlyn, do you have a blog post or something?
[01:01:46] Unknown:
Right. So I really should. I really should have a blog. I do not. So I will do, but if you do need somebody to look at, I was looking at Damián Avila's blog, and he's one of the core contributors to IPython. Before I was doing a conference talk at PyData London a few years ago, I tweeted out, "Oh, no. This thing is broken." And Damián Avila got on the case and fixed it for me just before the conference. So he's an absolute hero. He's got a whole bunch of good stuff there. So, yeah, I'll put it in your show notes for all the lovely listeners.
[01:02:22] Unknown:
Alright. Well, we definitely appreciate both of you joining us and taking time out of your day to tell us more about the PyData organization in London and both of your experiences in using Python for your own work. So I definitely
[01:02:36] Unknown:
learned a lot, and I appreciate your time. Oh, great. Tobias, Chris, thank you very much for having us. This has been a lot of fun. It has been. Thank you.
Introduction and Podcast Details
Interview with Ian Ozsvald and Emlyn Clay
Ian Ozsvald's Background and Introduction to Python
Emlyn Clay's Background and Introduction to Python
Overview of PyData Organization
Growth and Management of PyData London Meetup
Differences Between PyData and PyCon Conferences
Audience and Social Dynamics at PyData Conferences
Encouraging Cross-Discipline Participation
Evolution of Python in Data Science
Advice for Newcomers in Data Analytics
Strengthening Bonds Between Python Community and Sciences
Use of Data in London-Based Companies
Python Data Science Libraries Needing Attention
Final Thoughts and Contact Information
Picks and Recommendations