Visit our site to listen to past episodes, support the show, and sign up for our mailing list.
Summary
Trent Nelson is a software engineer working with Continuum Analytics and a core contributor to CPython. He started experimenting with a way to sidestep the restrictions of the Global Interpreter Lock without discarding its benefits and that has become the PyParallel project. We had the privilege of discussing the details around this innovative experiment with Trent and learning more about the challenges he has experienced, what motivated him to start the project, and what it can offer to the community.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- Subscribe on iTunes, Stitcher, TuneIn or RSS
- Follow us on Twitter or Google+
- Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at
- I would also like to thank Hired, a job marketplace for developers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus.
- We are recording today on September 7th, 2015 and your hosts as usual are Tobias Macey and Chris Patti
- Today we are interviewing Trent Nelson about PyParallel
Interview with Trent Nelson
- Introductions
- How did you get introduced to Python?
- For our listeners who may not be aware, can you give us an overview of what PyParallel is and what makes it different from other Python implementations?
- How did PyParallel come about?
- What are some of the biggest technical hurdles that you have been faced with during your work on PyParallel?
- I understand that PyParallel currently only works on Windows. What was the motivation for that and what would be required for enabling PyParallel to run on a Linux or BSD style operating system?
- How does PyParallel get around the limitations of the global interpreter lock without removing it?
- Is there any special syntax required to take advantage of the parallelism offered by PyParallel? How does it interact with the threading module in the standard library?
- In the abstract for the PyParallel paper, you cite a simple rule – “Don’t persist parallel objects” – how easy is this to do with currently available concurrency paradigms and APIs, and would it make sense to add such support?
- For instance, how would one be sure to follow this rule when using Twisted or asyncio?
- Are there any operations that are not supported in parallel threads?
- What drove the decision to fork Python 3.3 as opposed to the 2.X series?
- In the documentation you mention that the long term goal for PyParallel is to merge it back into Python mainline, possibly within 5 years. Has anything changed with that goal or timeline? What milestones do you need to hit before that becomes a realistic possibility?
- Can you compare PyParallel to PyPy-STM and Go with Goroutines in terms of performance and user implementation?
- What are some particular problem areas that you are looking for help with?
- Assuming that it does get merged in as Python 4, how do you think that would affect the features and experiments that went into Python 5?
- To be continued…
Picks
- Tobias
- Chris
- Trent
- Show Stopper by G. Pascal Zachary
Keep In Touch
- GitHub
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. You can subscribe to our show on iTunes, Stitcher, or TuneIn Radio, and you can also add our RSS feed to your podcatcher of choice. You can also follow us on Twitter or Google+ with links in the show notes. Please give us feedback: you can leave a review on iTunes, send us a tweet, send us an email, or leave us a message on Google+. And I'd like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com.
I would also like to thank Hired, a job marketplace for developers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus. We are recording today on September 7, 2015, and your hosts as usual are Tobias Macey and Chris Patti. Today, we're interviewing Trent Nelson about PyParallel. Trent, could you please introduce yourself?
[00:01:08] Unknown:
Yeah, hi guys. My name is Trent Nelson. I'm a software engineer by trade, 33 years old, living in New York, working for Continuum Analytics. My role at Continuum is essentially a lot of on-site consultancy, so various clients in finance and other industries engage with Continuum for help in all kinds of fields. I very much enjoy working in Python, but we also do a lot of work in C and C++. I have a background in Java, and the last project I did had a lot of PL/SQL and Oracle work in it as well. I generally enjoy solving interesting problems, especially on client sites, where they can be tied to a specific business need. And, I guess, I'm also the founder of PyParallel, which is a,
[00:01:57] Unknown:
How did you get introduced to Python?
[00:01:59] Unknown:
So that's an interesting one. Python's such a fascinating language because it's just got this simple, beautiful elegance about it. I did computer science at university for about a year and a half. I basically went through the roll of subjects you could take, picked all the interesting ones, and tried to do them all in the first 18 months. So I did Unix and C and basically all the ones that seemed really interesting and seemed like they'd be the necessary foundation for a computer science education. This was around 2000, so the dot-com boom was in full effect, and I had people left, right, and center around me getting these amazing job offers. That happened enough that I ended up thinking: you know what, I kind of want an awesome job as well. So halfway through my degree I put together a resume, sent it to a recruiter, and ended up getting a job offer from the first place I interviewed at. It was an interesting company, a very engineering-oriented company that did essentially railway control systems.
So it was a really Unix, C/C++ type of environment: Digital Unix and the Alpha processor, which I was obsessed with at the time. That's how I got into that Unix and C arena. A couple of years later I got into the more financial arena, and I actually specialized in a couple of tools that somewhat show my age: ClearCase and ClearQuest. They were configuration management tools. ClearCase is basically like a clunky nineties version of Git, done really slowly.
[00:03:51] Unknown:
Sorry — I was a release engineer, so I have some experience with that.
[00:03:56] Unknown:
Okay, such fond memories — and ClearQuest as well. I mean, they served a very useful purpose at the time, and for a lot of companies they're still integral parts of their infrastructure. So I ended up specializing in a lot of ClearQuest stuff, specifically within the release engineering and configuration management teams. That was good because it allowed me to continue to write things like Java and C without necessarily being constrained by a particular domain. I wasn't doing, say, Java in a finance environment; I could experiment with a lot of languages, and I think that gave me an appreciation for all the different ways there are to solve a particular problem. Especially in that role, the main thing we did on a day-to-day basis was work with other development teams. You'd have teams that were Java-based, teams that were C-based, entire projects in PL/SQL, data warehouse stuff. So it was a really good mix of all kinds of industries.
Language-wise, I actually ended up specializing quite a lot in Perl, of all things. This is early-2000s stuff. I remember reading an article by Eric S. Raymond about Python — I think this was circa 2003 or 2004. To this day, I still think that article is really well written; it gives this very nice introduction to Python as a language and shows its appeal to those who hadn't otherwise heard of it. So I just sort of started learning it in my own time. Then, a couple of years later, I had an opportunity to do a project where we had to do a lot of really peculiar database migrations from a lot of disparate bug tracking systems. I think there were 10 or 20 different configuration management systems that we had to interface to and then migrate into a new ClearQuest system that I was designing.
I did a review with some of these teams, and it became very apparent that a very dynamic, flexible language was going to be much easier to do this in than something more heavyweight, like a Java JDBC thing at the time. I didn't actually know Python; I was just convinced of the merits of the language before starting. We actually hired someone in on this particular project to take care of that, and once his contract ended I picked up from his work, spending a bit more time rewriting it and learning the language. I got a lot more opportunities as the years went on — 2006, 2007. I was doing a lot of Java and C++ on the side, but I was always looking for projects where I could use Python, because it was just such a fun language to program in.
I did one particular project that was a database migration. There were these two identical databases that I'd actually designed a couple of years before, and they decided they wanted them in a single global database. The problem was that it was a vendor's database. When these two databases started out, they both started from the same sequence IDs and essentially the same schema and metadata, so to merge them into one you were going to get all sorts of duplicate keys, and there just wasn't really any way to merge them cleanly with the vendor's API tools. So I ended up using Python to interrogate the API that ClearQuest provided to build up a model of the database, and then used Python to generate raw SQL statements that I'd hand-tuned to do the migration efficiently and deal with some of the duplicate-ID issues. That was around 2007, 2008, and I loved the project that much — I loved working with Python that much — that I ended up getting involved in the Python community. It was a lucrative contract, and I wanted to give back to the community, so I ended up offering my services for the build bots, especially the Windows build bots, and attempting to help get Python ported to the 64-bit versions of Windows that were coming out. Then, around 2008, I ended up going to PyCon, and it was fantastic. It was my first PyCon ever. It was in Chicago.
I ended up meeting Will and Martin and Mark — I remember Mark Hammond as well; I don't think he's made an appearance since then — all these people that I'd seen and interacted with on the mailing lists. The thing I liked most was the sprints after the PyCon, where you could actually sit down with the core developers and just start hacking on stuff and trying to fix things. I ended up getting commit privileges during that PyCon, which was great because it was a great way to continue my involvement in the project. And for whatever reason, I was really interested in the build bots.
I see the same pattern in a lot of new committers now. With the build bot infrastructure Python has, basically any time we make a commit to Python, there are these donated machines that get notified, build from scratch, and give the committer a way to say: okay, look, this program worked on my machine, but I broke it on Windows. I mean, it's not great, but it was definitely better than nothing. It was really the best thing we had for ensuring a reasonable level of sanity at the release level and also at the individual commit level.
There were issues, though, like when a build was failing. There was someone who had an Alpha system running Tru64 Unix at the time — you know, I had such a fascination with that stuff and had spent a lot of time with it in the past. And I think someone had an Itanium system, and it got the gears turning in my brain: wouldn't it be great if we had access to these systems, and rather than just getting a compiler log, you could actually log in, have access to your own compile environment, and try to fix it directly? That led to Snakebite, which ended up consuming — it wasn't a particularly well planned out project from my perspective, looking back on it, although it still exists, this sort of thing. Because it was so infrastructure-based, I basically ended up building a server room at Michigan State University.
I filled three 42U racks with an array of different sorts of hardware. Basically, I wanted to have coverage of every sort of mainstream Unix, at the very least, that there was. There was an SGI IRIX box. There was an AlphaServer from back in the day with Tru64 Unix on it. There was a huge Itanium system — a quad-Itanium system that HP donated — running HP-UX. I bought a whole bunch of PowerPC machines running AIX, Solaris machines, SPARCs, and AMD64 stuff. It was a lot of Unix. I was partly reliving the childhood fascination I had with Unix as an operating system.
The problem was that that's an infrastructure project, and I didn't have any governance set up to help continue providing funding for it. It also became very apparent that I much prefer building things to doing project governance. Every opportunity to set up a community and ensure people could continue to maintain it always took second place to bringing up a new system, that sort of thing. So Snakebite's a work in progress. Funnily enough — and this is a perfect example of not having any governance — the FreeBSD server that I've got is currently down. I think it's just not responding to anything, so I can't actually get into Snakebite, nor is the website up. It's something I need to give a bit of TLC to. But the one thing Snakebite was able to do towards the end was have all of these platforms where we could have a build bot running on, you know, AIX and Tru64 and IRIX — all platforms that aren't particularly important anymore. I built some infrastructure where we synced the Python committers' SSH keys, so that once they were granted commit access to hg.python.org, they were able to SSH into this Snakebite system that I set up and essentially have access to any one of these machines, with all the logins synced. It was really fun because this was obviously around the time that Amazon was spinning up their stuff, but that's all Linux x86 stuff. None of the problems Python was having were on Linux, because that tends to be the major development platform used by most developers.
So, essentially, Snakebite was a very fun experiment. It was around 2012, though, that I came up with the idea for PyParallel, which is obviously the topic for this podcast. So that was my long-winded explanation of how I got into Python, and it sets the scene for some of the stuff I'm sure we're going to talk about later. Back when Unix meant workstations and hardware hadn't been commoditized, right? Yes. Yeah, I'm just so fascinated by it all. I can kind of understand the appreciation that people have, once they get into their fifties and sixties, for restoring old muscle cars: what they're actually restoring is the car they fantasized over when they were a teenager. In those formative years, the amount of passion you have for something isn't particularly connected to any form of rational thought, but it's something that stays with you, and you build upon it as the years go on. So, yeah, I guess my muscle car was my interest in the Unix operating system.
[00:14:33] Unknown:
So for our listeners who may not be aware, can you give us an overview of what PyParallel is and what makes it different from other Python implementations?
[00:14:41] Unknown:
Right. So Python's biggest issue to date — or maybe a better way of phrasing this, Python's most complained-about problem — is the GIL, the global interpreter lock, which prevents Python from optimally using multiple cores on contemporary hardware. Essentially, PyParallel is a project where I took a step back and basically said: what can we do, how can we change Python, so that it can exploit contemporary hardware? And it's not just multiple cores; it's also the type of new memory architectures that are coming out and their memory latencies. How do you also exploit the new networking — 10-gigabit Ethernet, 40-gigabit Ethernet, 100-gigabit Ethernet, Thunderbolt, USB 3, those sorts of things? Device speeds are increasing just as core counts and frequencies are. So I really wanted to take a step back and look at the constraints Python has and what we can do to address them. Essentially, the project is a way of staying within the confines of the Python ecosystem while also being able to leverage multiple cores, these fast Ethernet cards, and that sort of thing.
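The GIL constraint Trent describes can be seen in a few lines. This is a minimal sketch (not from PyParallel itself) showing that CPython threads produce the same answers as serial code but, because only one thread executes bytecode at a time, give little or no speedup for CPU-bound work:

```python
import threading

# CPU-bound work: sum a range of integers in pure Python.
def count(n, out, idx):
    total = 0
    for i in range(n):
        total += i
    out[idx] = total

N = 200_000

# Serial version: one thread of execution.
serial = [None]
count(N, serial, 0)

# Threaded version: two threads each do the same work. Under CPython's
# GIL only one thread runs bytecode at a time, so on a multi-core
# machine this takes roughly as long as running them back to back --
# the exact constraint PyParallel set out to lift.
parts = [None, None]
t1 = threading.Thread(target=count, args=(N, parts, 0))
t2 = threading.Thread(target=count, args=(N, parts, 1))
t1.start(); t2.start()
t1.join(); t2.join()

print(serial[0] == parts[0] == parts[1])  # True: same results either way
```

Timing the two versions with `time.perf_counter` on a multi-core box is an easy way to observe that the threaded variant doesn't scale.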
[00:16:06] Unknown:
And how did PyParallel come about? What was the inspiration that got you thinking along those lines?
[00:16:12] Unknown:
As much as I'd like to say PyParallel was a thing I sat down and planned really well — it's always nice in hindsight to say, yeah, it was designed for fast multiple cores and that sort of thing — how it actually came about was me working at Snakebite going: man, I need to get a job. I've been doing this for, like, three years. I'm running out of my savings. What am I doing with my life? That was at Michigan State at the time. I had some friends in Chicago and some friends in New York, and I was contemplating which I wanted to move to: Chicago or New York?
I came up with the idea that I wanted to move to New York, and I kind of wanted to get into things like high-frequency trading, which is essentially the equivalent of the gaming world, where you're getting the ultimate performance out of the underlying hardware, like graphics cards. HFT is a very raw, low-level field where the fastest algorithm and the fastest code wins. If you can shave off microseconds or nanoseconds, it actually provides immediate financial gain. I mean, it provides market liquidity — I won't get into comments on its value to society in general — but it's a very interesting technical problem.
So I'd made up my mind that I wanted to relocate to New York and get into high-frequency trading. The only problem was I had no background in it. I had a couple of months left, especially with my research base, and I thought: what's a project I can put in my portfolio? I envisioned sitting down with a recruiter and them saying, well, why would we put you in HFT? You don't have any experience in it. What I was envisioning was having a project where I could say: look, I didn't know anything about this problem, and this is what I got done in three or four months. At that time, there was an email sent to the python-ideas mailing list called "asyncore: batteries not included". It was a very innocuous email, but it started off this huge thread that ended up being the catalyst for both PyParallel as we know it today and also what turned out to be asyncio.
The thread basically dealt with: okay, how are we going to address the need for this new sort of concurrent-style asynchronous I/O — which in this context essentially meant non-blocking network socket I/O? What sort of mechanisms could we have in place to do something like Twisted, which has been really good at setting the standard for that completion-oriented, protocol-oriented, asynchronous I/O, single-threaded event loop paradigm? I think Twisted did that really well. So the idea behind the whole thread was: what could the standard library potentially do? For whatever reason, I decided to pick that as my pet project. I'd done a lot of work with Windows recently, and I had been reviewing the new APIs they introduced in Windows Vista with the thread pools and how they linked into all of the I/O infrastructure, essentially.
So I actually had this idea that, hey, look — the asynchronous I/O and the solving of the GIL — we could probably kill two birds with one stone there. It was ridiculously ambitious; that's a crazy thing to try and solve at one time. There were a lot of discussions on the mailing list, and if you go back through them, I pitched a couple of ideas. I kept the crazy under a certain level, so I didn't actually pitch the idea of solving the GIL problem as well, but I did pitch some ideas that were essentially geared towards: okay, look, how can we leverage each operating system's facilities in the most optimal manner? At that time, I felt that Windows actually had a better base to build some of these asyncio primitives off.
So it was interesting. I basically decided that was going to be the project. It was funny — Guido emailed me directly, and we had a chat sort of offline. I think he emailed me the weekend that I'd decided to sit down and write an async PEP, which was, I guess, my version of everything I had in mind. It's actually out there — I can put the link in the footnotes — but I came up with this crazy PEP that kind of merged the notion of asynchronous I/O and parallel computation. I hadn't really done much CPython stuff recently, so I was making a lot of claims that would be very hard to back up without any proof.
And Guido's reaction — he'd sent me an email asking whether I had any thoughts on the IOCP stuff, so the timing was perfect, and I sent him that PEP as a response. Bless him, he was like, oh my, whoa. He said something like: I don't know if you've stopped taking medication or started taking new medication, but this is just crazy — this is not achievable within a reasonable time frame. And he was completely right. It was a really wacky set of ideas.
The other side effect of that is that there's nothing like having Guido say he doesn't think you can do it as a motivator to try and figure out a way to do it. I think if he'd said, yeah, this sounds great, you probably can do that, I would have been like, yeah, I probably can — and then nothing would ever have happened. So it definitely provided the challenge and set the scene for that sort of project. Once I had the motivation, I decided to figure out a way: okay, how can I get Python to use all my cores, and how can I get it to essentially solve this GIL problem?
I think I spent about a week tracing through the core Python code, just trying to figure out how I would go about it. I had some wacky ideas about inserting opcodes and coming up with new language semantics, and none of them appeared to be particularly workable — but I kind of had to set the scene to get some ideas.
[00:22:54] Unknown:
The next question was: what are some of the biggest technical hurdles you've been faced with? It seems like you're sort of leading into that, so just carry on.
[00:23:02] Unknown:
The approach that I used to address the problem was basically to start from scratch: I literally fired up Visual Studio, set a breakpoint at the first point I could find within Python, attached the debugger, and walked through how it ran simple pieces of code — just to figure out how it was currently working and to set the scene for ideas about at what point I could interject this new functionality I had in mind. I had the way in mind to leverage the Windows thread pool facilities; that was quite clear. What wasn't clear was the mechanism. Basically, these Windows facilities provide a way to supply a C function and some per-user callback context.
For example, with TrySubmitThreadpoolWork, you can call it with a C function and some context, and it will manage the thread pool behind the scenes: it figures out how many threads it needs to create, it can tell if it hasn't spawned enough threads and will create more, and it will kill threads when they're not needed anymore. It manages all of these complexities, leaving you in the context where: okay, I'm still within the Python process, but I'm a separate thread.
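The submission model Trent describes — hand the pool a callback plus per-work context and let it manage thread creation and teardown — has a rough portable analogue in Python's standard library. This is a conceptual sketch using `concurrent.futures.ThreadPoolExecutor`, not the Windows API itself:

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptually analogous to TrySubmitThreadpoolWork: you submit a
# callback plus some per-work context, and the pool decides how many
# OS threads to create, reuse, or retire behind the scenes.
def callback(context):
    return f"processed {context!r}"

with ThreadPoolExecutor() as pool:  # thread count managed for us
    futures = [pool.submit(callback, ctx) for ctx in ("a", "b", "c")]
    results = [f.result() for f in futures]

print(results)
```

The key parallel is that the caller never creates or joins threads directly; in both APIs, the pool owns the thread lifecycle and the caller only supplies work items.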
So I basically started with that, and used that facility to see: okay, what happened — where did it crash? That really set the scene for figuring out the approach I came up with: being able to quickly detect whether or not we're in one of these parallel contexts, or parallel threads. At the same time, I'd been reading Intel assembly language manuals, hoping there'd be some magic instruction that would do exactly what I wanted — and it turns out there actually was. The way it works is that at any time, I can tell what the thread ID of my current thread is and compare it to my main thread ID, and I can make that a really, really simple comparison between two 32-bit integers by using the CPU segment registers. There's an instruction for reading the FS base register, and I use that with a particular offset to get the thread ID.
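The detection step can be illustrated at a high level in Python itself. This is a sketch of the idea only: PyParallel does the equivalent comparison in a couple of machine instructions by reading the thread ID out of a CPU segment register, whereas here we use the portable `threading.get_ident` API:

```python
import threading

# Record the main thread's ID once, at startup.
_MAIN_THREAD_ID = threading.get_ident()

def in_parallel_context():
    # Conceptual equivalent of PyParallel's fast check: compare the
    # current thread's ID against the main thread's ID. PyParallel
    # reads the thread ID straight from a segment register instead of
    # making an API call.
    return threading.get_ident() != _MAIN_THREAD_ID

flags = []
flags.append(in_parallel_context())  # main thread -> False

t = threading.Thread(target=lambda: flags.append(in_parallel_context()))
t.start()
t.join()                             # worker thread -> True

print(flags)  # [False, True]
```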
So the actual mechanism for PyParallel was basically to intercept all of the thread-sensitive parts of the interpreter and put in this test to say: okay, if we're a parallel thread, let's divert to a thread-safe alternative. That really was the key breakthrough, and it came within a couple of days. I got a prototype working — I think the first commit I made was around December 21st, and there was a four-day period where I made a couple of commits. All of these were initially done against a sandbox repository on the hg.python.org Mercurial repository, though I've since backported them.
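The interception pattern — check the context at the top of each thread-sensitive operation and divert parallel threads to a safe alternative — can be sketched like this. The function names here are hypothetical illustrations, not PyParallel's actual internals, and the real checks live in the C interpreter, not in Python code:

```python
import threading

_MAIN_THREAD_ID = threading.get_ident()

# Hypothetical stand-ins for a thread-sensitive interpreter operation
# and its thread-safe alternative.
def _default_alloc(size):
    return ("main-thread allocator", size)

def _parallel_safe_alloc(size):
    return ("thread-safe allocator", size)

def alloc(size):
    # The test PyParallel inserts into each thread-sensitive path:
    # parallel thread -> divert to the thread-safe alternative;
    # main thread -> take the usual path.
    if threading.get_ident() != _MAIN_THREAD_ID:
        return _parallel_safe_alloc(size)
    return _default_alloc(size)

results = {}
results["main"] = alloc(16)

t = threading.Thread(target=lambda: results.update(worker=alloc(16)))
t.start()
t.join()

print(results["main"][0], results["worker"][0])
```

The appeal of the design is that single-threaded code never pays for locks: the main thread keeps its fast path, and only parallel threads take the diverted route.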
Once I'd gotten that working — there were, like, four attempts, and on the fourth attempt, on the fourth day, I got it working well enough to be a proof of concept. So, okay: there is some merit to this idea of intercepting these thread-sensitive things, and it didn't appear to incur much overhead. Then three months passed, and I ended up working on it as much as I could. There was a good couple of weeks there of twelve-hours-a-day type work, just trying to figure out how to get everything in place. I was essentially working towards presenting this at the PyCon 2013 conference in Santa Clara, California — Silicon Valley, Santa Clara, one of the Santas.
And so, yeah, I presented a little 25-page presentation to the Python committers. At every PyCon there's a day at the start, typically a Wednesday, where all the Python committers who can attend get into a room and just discuss language things. That's great — it's the only time each year that really happens — so it was a good place to present the work. Again, my intent for all of this stuff was still the idea of moving to New York. Then — it was really fun — I get this email from Guido, and it doesn't have a body; the subject is just "you guys should talk", and it's addressed to this guy called Peter Wang, whom I had no idea about, and myself. There's literally nothing else in the email, just the subject line. So I replied: oh, hey Guido, hey Peter — any idea what the context is here? And it was about the PyParallel work. Peter had been chatting to Guido earlier about some of the parallel ideas that Continuum were exploring.
So, yeah, I ended up presenting the same presentation to Peter and a couple of Continuum people, and then Peter and I were just chatting for about four hours about everything and anything. For anyone that's ever met Peter Wang: there's no shortage of things to talk about with him. He's a very entertaining guy. I ended up meeting Travis as well — Travis Oliphant, who is the creator of NumPy and the CEO of Continuum. So we had this great chat, and as we were leaving I kind of made this cursory comment that, you know, I was available for some freelance consultancy.
And they were like, "Oh, that's interesting — how do you feel about New York?" I was like, that's actually where I'm planning on moving; that's why I did all this stuff. And, "How do you feel about the finance space?" That's exactly what I was wanting to get into. So it was really weird that the plan — let's do PyParallel, not because I wanted to solve the Python parallel problem, but because I wanted to relocate to New York — actually worked. Thinking back on it, it's just kind of quirky that it happened like that. So we ended up making that happen, and I got hired by Continuum. That was a long process because I had to switch visas and go back to Australia; there's just a lot of time involved when you're relocating. I think I started with Continuum around August 1st, 2013, and I went straight onto a client site. It was a huge project — I was on it for about 18 months — and it was the first project where I interacted with data scientists. It was a financially oriented project with a pretty large data science component. One of the key things we were doing — not to get too deep into the project — was essentially entity and location disambiguation and resolution. Given some unstructured text in a raw text field, be able to discern what that field is referring to. It might be an address or a customer name or something like that — who is this talking about, and where is it talking about?
And the idea was taking this unstructured text and figuring out a way to go from that to, potentially, the record of the customer being referred to, or the lat/long of the location being referred to. That was interesting because it opened my eyes to NumPy, for one thing. I hadn't done any scientific-oriented computing or math-oriented stuff, or even really used NumPy — I didn't really know what NumPy was or what it was used for. So it was a great experience to see it through the eyes of the data scientists. And that's actually one of the areas where Python is flourishing now, just because the simplicity of the language makes it so attractive for getting out of the way and letting you focus on the problem at hand. For data scientists, the problem isn't writing software for the sake of writing software; they're essentially trying to solve problems that are important to the business, or important to whatever they happen to be working on. It was also great because there was quite a large Oracle component on the project, and I ended up writing a lot of PL/SQL. We had these really beefy servers — something like 128 cores and 512 GB of RAM, with really ridiculous disk arrays attached — so a lot of my time was spent essentially figuring out ways to get Oracle to exploit the underlying hardware.
That was great because Oracle has some really robust and mature parallel options for, essentially: here is my terabyte dataset — how can you chunk it and do parallel computation against it? Which is obviously within a similar domain to PyParallel, in terms of the problems they're trying to solve. I didn't do much work on PyParallel during that period, mainly because I was engaged on the client side.
But it was a very good breeding ground for ideas, and good exposure to how PyParallel could potentially succeed and how it could be used in these environments. I did a couple of presentations — there's a 153-page presentation that I did at a PyData conference. Continuum holds a PyData conference in New York every year, around November-ish. I had this Sunday slot, and I started preparing a deck on the Friday. I already had a deck that I'd given at Bank of America a couple of weeks earlier, which was 80 slides, and I started with the aim of culling it down to about 50 slides.
But I think by the end of Friday I had about 102. By the end of Saturday I'd gone up to about 130. And then by Sunday it was 153. The takeaway from that is that I'm very bad at culling decks. But that was sort of the first real public projection of PyParallel, and the first real communication of how it all tied together with the asynchronous IO parts and that sort of thing. Since then I was still working on client stuff. The next sprint on it was around PyData Gotham, which was around — what are we in now? 2014?
Nope, 2015. So it was summer last year when I put another sprint in. One thing I found was that I had some of these parallel primitives in place, but PyParallel wasn't doing very well as a web server when you loaded it under stress-testing situations. So I spent some time addressing that and got it to a point where it could actually run for the duration of the benchmark and do really well — and then subsequently crash. But hey, it was progress. That was summer last year, and then I really got back into it towards the end of this year — sorry, the end of last year. Over the Christmas break I spent about two weeks in the depths of PyParallel, trying to bring all of these parallel socket IO primitives back into play.
The contract I was working on finished in February, and — much thanks to Continuum — I was able to work on PyParallel pretty much until now. I'm going to need to find clients soon, but essentially I was able to make a lot of progress on the performance, and I was able to set up a website and that sort of thing. So that's, again, the long-winded story of how PyParallel came about.
[00:35:56] Unknown:
So what are some of the biggest technical hurdles that you've been faced with during your work on PyParallel?
[00:36:02] Unknown:
Yeah, so the biggest one was definitely figuring out how to address that issue of: okay, once we're in these parallel contexts, how do we deal with parallel objects and memory allocation and that sort of thing? It turns out that was actually quite easy to address once we had a mechanism in place to detect whether we were in a parallel thread. This particular macro that I came up with could easily divert to other behaviour. Identifying whether or not something was a parallel object was the next hurdle, which we eventually found a way to do. The next big thing was probably tying together all of these pieces with the TCP/IP client-server socket paradigm — that definitely consumed the majority of my attention this year, and it's still a work in progress. Most of these technical hurdles are a problem until you keep looking at them and eventually come up with a way to solve them. There are always going to be some challenges in the way, but the key part was figuring out that we don't actually need to do reference counting and garbage collection in these parallel contexts — we can just rely on the object semantics to control the lifetime of objects.
That was kind of a key breakthrough that really set the scene for it being able to gain some traction and be a useful thing — something you could actually write useful programs in.
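The thread-detection mechanism he describes can be sketched conceptually in Python — with the caveat that the real check is a C macro reading a per-thread identifier, not a Python-level call, and `maybe_incref` is a made-up name for illustration:

```python
import threading

# Record the main thread's identity once, at startup. The real PyParallel
# check is a C macro that reads a per-thread ID from a segment register;
# threading.get_ident() is a slow Python stand-in for the same idea.
MAIN_THREAD_ID = threading.get_ident()

def in_parallel_context():
    """Return True when called from any thread other than the main one."""
    return threading.get_ident() != MAIN_THREAD_ID

def maybe_incref(obj, refcounts):
    # Illustrative: in the main thread, behave like normal CPython
    # refcounting; in a parallel context, do nothing, because object
    # lifetime is tied to the context rather than to reference counts.
    if not in_parallel_context():
        refcounts[id(obj)] = refcounts.get(id(obj), 0) + 1

rc = {}
x = object()
maybe_incref(x, rc)   # main thread: refcount is tracked
```

The point is that one cheap branch lets the same code path do full refcounting on the main thread and nothing at all in a parallel context.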
[00:37:33] Unknown:
And so you mentioned earlier how you were digging through the assembly code and the CPU instructions for the Intel x86-64 architecture. I'm wondering if that means that this particular implementation of PyParallel is currently tied to both Windows and a particular CPU architecture.
[00:37:52] Unknown:
So it is — it's definitely still tied to the AMD64 architecture. The thing that I exploit is the ability to get access to the current thread ID in as little time as possible. There is a much more portable way to do that, but I exploited the mechanism for getting this per-thread unique identifier from these segment registers. ARM has a similar instruction that achieves essentially the same thing. The operating system needs a way to do this efficiently anyway — when it services an interrupt, it has to be able to figure out which user thread or process block contains the relevant information. So it tends to be something that comes with modern CPUs. As for the architectural tie, it would be possible to port it to ARM in the future.
[00:39:04] Unknown:
And what would be required for enabling PyParallel to run on Linux or BSD-style operating systems?
[00:39:12] Unknown:
Yeah, so that's an interesting one. The work that I did basically falls into two parts. One was the changes to the interpreter to allow running essentially simultaneous threads without causing mayhem, and without the need to continually lock and unlock objects and that sort of thing. Those changes are essentially platform agnostic — I did all of this with the eventual aim of getting it back into the Python mainline. The work that's going to be required on Linux and BSD basically boils down to this: there are a couple of facilities that Windows provides that are really useful. For example, the way it provides memory management — the ability to allocate individual heaps. Rather than calling malloc, once you've got a heap created, you actually allocate memory through that particular heap. So a lot of it is just going to be providing similar support or functionality on Linux or BSD.
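The per-heap allocation scheme he describes — one dedicated heap per parallel context, destroyed wholesale when the context ends — can be sketched with a hypothetical Python analogue (the real thing uses the Windows HeapCreate/HeapAlloc/HeapDestroy APIs from C; `ContextHeap` here is illustrative only):

```python
# Hypothetical sketch of the per-context heap idea. Every allocation made
# while servicing a parallel context comes from that context's own heap;
# when the context finishes, the whole heap goes away at once, so no
# per-object free or refcount bookkeeping is needed.

class ContextHeap:
    def __init__(self):
        self._allocations = []

    def alloc(self, nbytes):
        block = bytearray(nbytes)      # stand-in for HeapAlloc
        self._allocations.append(block)
        return block

    def destroy(self):
        # Stand-in for HeapDestroy: every block is released together.
        count = len(self._allocations)
        self._allocations.clear()
        return count

heap = ContextHeap()                   # stand-in for HeapCreate
a = heap.alloc(64)
b = heap.alloc(128)
freed = heap.destroy()                 # both blocks dropped in one shot
```

The design point is that allocation lifetime follows the context, not the individual object — which is what lets the parallel threads skip refcounting entirely.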
One of the key reasons I chose Windows was because it really does provide such good scaffolding for writing these high-performance, parallel-oriented programs. Windows has had threads from day one, whereas on Unix there was always this notion of the process and the process abstraction, and threads were kind of bolted on as an afterthought — they weren't really a key part of the design. You could see that with a lot of vendors, where they struggled to get good threading implementations through the nineties and 2000s. I think Linux only got proper threading support in — I can't remember the version — but it was work that was funded by IBM, because it's not an easy problem to solve in the Unix paradigm of doing IO and processes.
[00:41:12] Unknown:
How does PyParallel get around the limitations of the global interpreter lock without removing it?
[00:41:20] Unknown:
So that's a great question. That phrasing is definitely one that I've used, and I keep reiterating it. A lot of the complaints that people make about Python and the GIL are all centered around: okay, how do we remove the GIL? But the GIL is actually great — it serves a very useful purpose. It ensures that at any one time, the interpreter only has one thread of execution running within it. The way that I addressed the problem was to ask: what problem are we actually trying to solve? And that is the problem of: I want to optimally use my underlying hardware. So the PyParallel solution is to leave the GIL alone. It has very well known semantics; it's relied upon. You've got all this C API compatibility that you're going to need to maintain, and the ability to pause all parallel threads and single-step through something, ensuring that nothing's releasing and acquiring the GIL, is quite useful.
So the PyParallel approach is simply to sidestep it. And it's a great outcome that that actually worked — I could basically augment all of the thread-sensitive parts of the interpreter, including GIL acquisition and GIL release, and essentially do that. So that was a very nice surprise that came out of all of this work.
[00:42:44] Unknown:
And is there any special syntax required to take advantage of the parallelism offered by PyParallel? And how does that interact with the threading module in the standard library?
[00:42:53] Unknown:
Right. So one of the key design decisions that I made up front was that I wasn't going to support free threading. Taking an existing program written using the threading.Thread classes or subclasses — I had no intention of suddenly being able to magically make that run on all of your cores. There are a couple of reasons; the key one is that threads in Python have never facilitated parallel computation. They've always been a means to achieve concurrency. If you need to do a blocking call — connect to a database or something like that — you could palm it off to a separate thread.
And, I mean, they weren't a great way to achieve a lot of things, especially in a high-performance way, but they got the job done and served a very useful purpose. There's no code out there that was written such that it would suddenly be able to leverage the fact that it was now running in parallel — that's really a new paradigm. So I basically set the path to say: look, in order to use these new parallel facilities, you're going to have to use the module that we provide. And I essentially centered it around this notion of defining completion-oriented protocols. This is where there's a lot of overlap and commonality between PyParallel and things like asyncio and Twisted. One of the things I most liked about Twisted in particular was its notion of the completion-oriented protocol. You define these classes with methods like connection made, data received, and connection lost, and you implement how you want your program to behave when those protocol events happen.
And then Twisted was responsible for calling you when that happened. PyParallel leverages the exact same concept — it's very much a "don't call us, we'll call you" type of paradigm. That's really the key piece. Once you're using these new parallel modules, you can essentially have Python call back into your code in these parallel threads, and you don't have to create any thread pools or do any of the stuff that's always made threaded programming hard. And again, that's another key thing I wanted to address, because I think threads are really just a vessel — a way of allowing a process to execute something while it would otherwise be waiting on some file IO or some network IO. Threads are really just the mechanism, and I wanted to draw focus away from the threads themselves and actually focus on structuring your program so that the work can live independently of the worker — the worker being the thread, and the work being the processing of incoming requests. There's no real reason why that can't be done in parallel. Just assume that everything's going to be in parallel until we need to serialize it. It's an inversion of the way we've approached programming to date, where everything is sequential and then you attempt to farm things out in parallel where possible.
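A completion-oriented protocol in this style might look like the following sketch. The method names follow the Twisted convention described above; PyParallel's actual module and class names may differ, and `FakeTransport` is a stand-in so the example runs without a real server:

```python
# "Don't call us, we'll call you": you define what happens on each event,
# and the runtime invokes your methods -- potentially on any parallel
# thread. Nothing created while handling data outlives the request.

class EchoProtocol:
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Called by the runtime when bytes arrive; we just echo them.
        self.transport.write(b'echo: ' + data)

    def connection_lost(self):
        self.transport = None

class FakeTransport:
    """Minimal stand-in for a real socket transport."""
    def __init__(self):
        self.sent = []
    def write(self, data):
        self.sent.append(data)

t = FakeTransport()
p = EchoProtocol()
p.connection_made(t)
p.data_received(b'hello')
```

Because the runtime owns the dispatch, the application code never touches a thread pool or a lock — it only describes reactions to events.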
[00:46:07] Unknown:
And so how does that particular paradigm or pattern relate to the idea of functional programming — of having immutable objects and computations
[00:46:19] Unknown:
without side effects? Yeah, very similar. There's a lot of overlap with concepts you see in functional programming and in languages like Rust and Go. It all really comes down to object ownership and object lifetime: when can you safely deallocate an object, and how do you prevent races to set objects? What became very apparent — just to use HTTP request processing as the example — is that when a web server gets a request, of all of the objects that are allocated to process that request, none of them are actually required once the response has been sent back to the user. PyParallel leverages that, and essentially — I wouldn't say it forces you to write in a functional way, but it allows you to structure your program with similar sorts of constructs.
[00:47:19] Unknown:
So in the abstract for the PyParallel paper, you said a simple rule: don't persist parallel objects. How easy is this to do with currently available concurrency paradigms and APIs, and would it make sense to add such support?
[00:47:35] Unknown:
So that's a really good question. That rule was motivated by a couple of bits of advice I'd been given — essentially, always ask: what are my constraints? What are my restrictions? It became very apparent that the simpler I could make the new mental model required for this to take off — and to not be an absolute disaster for most people to adopt — the better. In sitting down and writing the documentation that I did recently, the thing that came out was that I could tie it all down to this single rule: just don't persist parallel objects. And it turns out that when writing programs that leverage these parallel facilities, that's actually not that hard to do. I've written two semi-real-life examples: an instantaneous Wikipedia search server, which essentially has a single Python process serving all of Wikipedia in parallel, and an implementation of one of the TechEmpower Framework Benchmarks suite.
None of the issues that I had getting those working had to do with me accidentally persisting parallel objects. A lot of it had to do with finding places where code that I was calling was persisting parallel objects. And really, it boiled down to things like: say you've got a library that handles regular expressions — the first time it comes across a particular pattern, it compiles it and then saves it in a dict so it can be looked up quickly later. That concept is what I'm referring to when I say you can't persist parallel objects. And it ties into the way that memory is currently allocated.
But the rule will outlive the current reasons for it, which are more implementation based. As soon as you get into anything other than a rule like that, you start having to deal with who owns what, object ownership, how you safely deallocate, and lots of other things. So the fact that I've centered on this one rule — well, time will tell. Either you'll be able to write useful programs with PyParallel or you won't. I'm hoping that plays out. That's something we'll be seeing.
[00:49:54] Unknown:
So for instance, how would one be sure to follow this rule when using Twisted or asyncio?
[00:49:59] Unknown:
So the other thing that clicked recently, in the past couple of weeks, is that PyParallel and asyncio are actually complementary — asyncio and concurrent.futures and all sorts of things. I haven't actually changed anything in Python that prevents the single-threaded event model. The way that I would envision the two working together is that asyncio could essentially be used to drive PyParallel. For example, if you want to process a directory and all of the contents within it, you could wire up asyncio so that you've got this main thread, with the ability to take local state and that sort of thing, and what the asyncio side does is break a request down into — well, I've got 8 cores, so I can dispatch this in 8 parts — and that's picked up by PyParallel. It's all within the same process; it's just a quick copy from one buffer to another, basically.
So I actually see the two living together in a sort of symbiotic relationship. The original intent was to have PyParallel basically be asyncio, but I think that was unrealistic. Now that I've spent some time working with both, it's clear that when paired together you could get some really cool new paradigms. And look at things like games: the single-threaded paradigm of one main event loop is quite powerful, it's something that's existed for as many years as we've been programming, and it's something that will probably continue to exist. Typically you would start off with that — it's much easier to reason about your program's logic in that state — and then there are going to be opportunities: okay, this particular piece of work needs to be done against a chunk of data; that data can be chunked into 8 different pieces; those can be executed in parallel.
So that's kind of how I see all of that tying together, down the track.
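That symbiotic arrangement — a single coordinating event loop chunking work across cores — can be sketched with standard-library pieces, using concurrent.futures threads as a stand-in for PyParallel's parallel threads (this is an analogy, not PyParallel's actual API):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # The per-core unit of work (here: sum of squares over the chunk).
    return sum(x * x for x in chunk)

async def main(data, ncores=4):
    # The single-threaded event loop owns the local state and decides
    # how to split the job; the workers just compute and return.
    loop = asyncio.get_running_loop()
    step = max(1, len(data) // ncores)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=ncores) as pool:
        partials = await asyncio.gather(
            *(loop.run_in_executor(pool, process_chunk, c) for c in chunks))
    return sum(partials)

total = asyncio.run(main(list(range(100))))
```

The shape matches what he describes: sequential logic stays in one easy-to-reason-about loop, and only the embarrassingly parallel inner step is farmed out.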
[00:52:01] Unknown:
Yeah, it seems like your approach of having more of a functional paradigm for handling the parallel processing will greatly simplify a lot of the trouble that has historically been associated with threaded programming — trying to figure out what order things are being returned in, and how that mutates the global state of the program. And by not persisting anything in those parallel threads, you can remove a lot of that contention and the need for
[00:52:31] Unknown:
semaphores and mutexes and things like that. Yeah — I didn't write a single mutex in all of it. I rely on the operating system to manage the scheduling of callbacks and that sort of thing, and it actually works really well. The old ways — the Java style of locks against everything, concurrent hash maps, and all that sort of thing — just none of that was necessary. It was strange. I did all of that type of stuff back in the 2000s, and it was nice not to have to deal with threads and locks directly. That said, I did use a couple of the new Windows slim reader/writer locks, so there is some locking in some little parts, but it's very minimal, and it's not something that I exposed to the Python user — it just happens when it needs to, behind the scenes.
[00:53:23] Unknown:
So are there any operations that are not supported in parallel threads, or anything that a user needs to be aware of when they're trying to create a program that takes advantage of those parallel threads?
[00:53:35] Unknown:
Right. So, other than the obvious: you couldn't have, say, a dict in the main thread and just assign to it from parallel threads. It either crashes, or — I put some code in; it's actually pretty easy to detect when that's happening and raise an exception, so the parallel code can catch that exception and handle it. The other restriction at the moment is that importing is disabled in parallel contexts — you can't import a module — which is a reasonable restriction, and not one that I've been butting my head against.
It kind of makes sense to do your importing in the main thread. The other restriction: I think I disabled tracing — the sort of single-stepping through the bytecode — which is a really specific change. I haven't done anything to support debugging of these parallel threads yet; that's going to require a bit more attention. I did add preliminary support such that one of these parallel threads can actually acquire the GIL and become the main thread, but that would be used only for debugging — there's actually a way to efficiently, synchronously call back into the main thread, which is what should normally be used. But for debugging it could be quite useful: if you're debugging code that works fine in the main thread but isn't working in a parallel thread, to provide an option for debugging that. Other than that, you can't use any code that violates the don't-persist-parallel-objects rule — that's really the main takeaway. The other restrictions are very small and very unlikely to be run into.
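The main-thread-dict restriction he mentions can be sketched like this. The real check happens in C inside the interpreter; `ParallelAccessError` and `GuardedDict` are made-up names for illustration:

```python
import threading

MAIN_THREAD_ID = threading.get_ident()

class ParallelAccessError(Exception):
    """Illustrative: raised when a parallel thread mutates main-thread state."""

class GuardedDict(dict):
    # Assignment from any thread other than the main one is detected
    # and raised as an exception, instead of silently corrupting state.
    def __setitem__(self, key, value):
        if threading.get_ident() != MAIN_THREAD_ID:
            raise ParallelAccessError(
                "cannot assign to a main-thread object from a parallel thread")
        super().__setitem__(key, value)

shared = GuardedDict()
shared['ok'] = 1            # fine: we are the main thread

errors = []
def worker():
    try:
        shared['bad'] = 2   # detected and rejected
    except ParallelAccessError as exc:
        errors.append(exc)

t = threading.Thread(target=worker)
t.start(); t.join()
```

As he says, the parallel code gets a catchable exception rather than a crash, so it can handle the violation itself.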
[00:55:23] Unknown:
Yeah, so going back to your point about debugging — it seems like some of the projects that have been coming up recently that enable remote debuggers might be a good place to start with pursuing that, in terms of being able to, when you hit an exception in one of the parallel threads, have it stop and enter the debugger. I know that in Celery, it will print out a port that you can then telnet to and step through with PDB at that point.
[00:55:50] Unknown:
Yep, there's definitely a lot of opportunity to explore those sorts of things.
[00:55:55] Unknown:
So what drove the decision to fork Python 3.5 as opposed to the 2.x series?
[00:56:01] Unknown:
Oh — so it was 3.3, but the tag was 3.3.5. As much as I would like to say it was a calculated decision, I literally just started hacking on CPython with whatever the state of my repository was. I was a core Python committer, so I was working on what turned out to be 3.4 at the time; it was essentially a checkout of the default branch — the master branch — of the Python source tree, which happened to be Python 3.3 at the time. Doing it on 2 wasn't ever really something in my mind. At the time I was motivated to essentially race Guido — to try and get something on the plate where it could be considered next to the asyncio work and the yield from syntax and that sort of stuff. There just wasn't any real viability of doing that in 2. And I wanted to support Python 3 — ironically, if PyParallel does come into the mainline, it increases Python 3's adoption, because PyParallel is going to track Python 3 as it goes through its next versions.
Hey, that's all better for Python, really.
[00:57:19] Unknown:
And so tying into that, in the documentation you mentioned that the long-term goal for PyParallel is to merge it back into the Python mainline, possibly within 5 years. I'm wondering if anything has changed with that goal or the timeline, and what milestones you need to hit before that becomes a possibility?
[00:57:35] Unknown:
Yeah. So the thing that's never going to happen is a big-bang merge. There's never going to be a point where I email python-dev and say: alright, here it is, it's ready, I've finished all the patches, apply them. It's really going to need to be gradual, just because this is the first time we've been able to have Python code run in parallel threads. And it is experimental; it is proof of concept. We're not Microsoft or Apple — we don't have the ability to figure out all the problems ahead of time. We kind of need to leverage the open source nature and the community participation. So I really want PyParallel to live separately and independently of the main Python tree, because it's also a quality thing. People have come to rely on Python — it powers millions of websites and businesses, and it's the language of choice for so many developers.
It would be a huge breach of the community's trust in us if we just started putting in PyParallel changes that crash or start to reduce that level of quality. That being said, PyParallel does need to track CPython, and the way that I envision it playing out is exactly that: tracking. Python 3.5 is due to be tagged really soon — I think we're in the second release candidate. The idea I have in mind is that once it's tagged, I will essentially attempt to rebase PyParallel against it. It's going to be a bit hairy, because there were a lot of changes between 3.3, when I forked, and where it is now — especially with all the memory allocators and that sort of thing, which is one of the areas where I do all the interception.
If there's sufficient interest, I wanna use that opportunity to try and get it working again — get the build working again with these new primitives in place on Linux and OS X so that, you know, we can actually build it. And from there, people can start implementing the platform specific stuff that's going to be required in order to achieve this. So I really do envision it tracking the state of the CPython repository over the next, you know, few years. And then depending on what the level of change is, I think Python 4 is realistic.
There's never been a release scheduled for Python 4 at the moment, but because of the nature of the change, I think it would make sense. I actually do see PyParallel going in lockstep with core Python development over the next few years until we can say, okay, look, we've ironed out all the bugs, it's matured to a point where we're happy basing this off it. The other key thing is that we're attempting to solve how Python is going to be able to continue to flourish in the next 25 years without undoing all of the benefit and progress we've made in the past 25. So that means ensuring there's as much source compatibility as possible, so that people don't have to continually rewrite their modules and rewrite all of this Python code that they've been using. So, yeah, it'll be interesting. It's obviously going to need some involvement from the community on the Linux and OS X sides.
I think once people see the advantages of the implementation, it'll scratch an itch and someone will wanna get involved. So
[01:00:59] Unknown:
So what are some particular problem areas that you're looking for help with, or particular skills people could bring to the table in terms of moving the project forward? Yeah. So it's an interesting one. So I
[01:01:13] Unknown:
So the GIL problem is constantly discussed on the python-ideas list and the python-dev list. I think the biggest issue that Python's gonna have — you know, it was designed in 1990, I think, when Guido started making his first commits, and released in 1993, I think. There aren't really many other projects out there of that age that are attempting to do something that's, you know, kinda radical. So there's a question of, like, do open source languages need to rock the boat and try and come up with these aggressive ways of staying relevant as hardware matures and as technology evolves? That's more of an open ended problem, and it's gonna be interesting to see what the reception is like to PyParallel.
So the GIL isn't removed, but the limitations of the GIL are removed, and it essentially allows you to effectively exploit your underlying hardware. But it's a new concept. Everything that I had to come up with to solve it, I had to essentially invent. So I can't really tie it to a known paper and say, you know, we implemented this particular technique to solve XYZ. It all just kind of got fitted together as it came along. So it's gonna take time for people to just understand how it works and what some of the limitations on it are. You know, maybe it doesn't end up being the way that Python progresses.
It could quite easily just live on as its own particular project. I think that's the biggest area: just getting people's awareness of the project and getting them willing to play around with it. But especially if you've got Linux and Unix skills, when we do start the rebasing against 3.5, start getting involved and seeing if we can get parity between the Linux and POSIX stuff and the current Windows implementation.
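For context on the limitation being discussed: in standard CPython, the GIL means only one thread executes Python bytecode at a time, so threads don't speed up CPU-bound work — that's the restriction PyParallel set out to sidestep. A minimal sketch (ours, not from the episode; the function name and workload size are arbitrary) that demonstrates this on stock CPython:

```python
# Sketch: CPU-bound work gains nothing from threads under the GIL,
# because only one thread runs Python bytecode at any instant.
import threading
import time

def cpu_bound(n):
    # Pure-Python arithmetic loop; holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000  # arbitrary workload size

# Run the workload twice serially.
start = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
serial = time.perf_counter() - start

# Run the same two workloads in two threads.
threads = [threading.Thread(target=cpu_bound, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# Under the GIL, the threaded version is typically no faster than the
# serial one, despite multiple cores being available.
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```

The same structure with I/O-bound work (where the GIL is released during blocking calls) would show a speedup, which is why the GIL's cost is felt most acutely in compute-heavy code.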
[01:03:08] Unknown:
For anybody who's interested in checking out the source code for this or checking out any issues that might be filed against it, where can they find that?
[01:03:16] Unknown:
So github.com/pyparallel/pyparallel. The website, pyparallel.org, also links to the GitHub page as well. So that's a good starting point.
[01:03:28] Unknown:
And for people who do dig into the repository, are there any particular open issues that you have on your high priority list?
[01:03:38] Unknown:
So I only just recently migrated it to GitHub. There's definitely some work that I need to do to structure the things that need to be done in such a way that people can start contributing, so I anticipate that taking place over the next couple months. There's one issue in there at the moment where someone's asked whether or not it supports Linux. So I said not yet, and patches are welcome. But, yeah, there's no list of work items yet. That's something I'm looking to work on going forward.
[01:04:14] Unknown:
So before we move into the picks, is there anything else that you wanted to bring up or mention to our audience?
[01:04:21] Unknown:
No. I think there's definitely a lot more to talk about. So, if you'll have me back on the show, I definitely look forward to jumping into some other details and continuing to discuss some of the ideas and, you know, topics
[01:04:36] Unknown:
that we're dealing with here. Yeah, definitely. We would love to have you back on the show to dig a little further into this, so we'll work on getting that scheduled — and anybody who's listening, keep an eye out for that and we'll continue this conversation. So for picks, I will go ahead and get us started. My first pick today is a project called Testinfra, an infrastructure testing library that is a plugin for pytest. It can be used as an alternative to the Serverspec project, which is written in Ruby. So for everybody doing systems engineering with a Python based — or actually any language based — configuration management toolset, it lets you use Python as your language for implementing those tests.
It's still kind of in the early stages, it looks like, but it seems that it's got a lot of really good potential and a good basis to build on going forward. And my next pick is a new podcast I started listening to recently called Software Engineering Daily, where the host organizes weeks around a particular theme. For instance, he had a big data week where he interviewed people in the big data community about various projects such as Storm and Kafka, and the episodes I'm listening through right now are all around Bitcoin and the blockchain. Really well done show, definitely worth checking out.
[01:06:05] Unknown:
Excellent. Thanks. First of all, you scooped one of my picks, because that Software Engineering Daily podcast is phenomenal. I've been really, really enjoying it. It runs the gamut from talking to some really big luminaries in the field — Schneier and, you know, all kinds of other great folks — to dealing with projects that I just hadn't heard of but really wanted to know about. It's great stuff. And it kind of amazes me that he can put that on daily. I mean, it's a lot of work to do this weekly; I can't imagine cranking out a new podcast every day. That said, my first pick is the Hello Web App Intermediate Concepts Kickstarter by Tracy Osborne.
The Hello Web App book that she wrote initially is excellent. I'm a, you know, non front end, back end infrastructure kinda guy, and web programming has never really made sense to me. This is one of the first books I've ever read that actually explained how to design a web app in really accessible terms. I really like it, I'm looking forward to the intermediate one, and everyone should go back that Kickstarter. My second pick is a beer. It's been a few weeks since I picked a beer, and I encountered one yesterday that I really liked. It's called Rainbow Dome, oddly enough, from Grimm Brewing again. It's a really interesting beer — it's a sour, but it's got some really interesting flavors going on, and it's once again sort of almost luminescent yellow. Grimm seems to have a penchant for brewing beers that look really novel in the glass.
So there's that. And my third and last pick is the PBS Idea Channel podcast — a YouTube channel, rather, excuse me. It's a great show. Every episode he goes through some concept, whether it's, you know, something from current events or whatever. It's brilliantly produced with all sorts of great little video clips, and it's just incredibly intelligent and definitely worth watching. And that's it. Trent, what kind of picks do you have for us? I have one pick, and it is a book called Showstopper,
[01:08:26] Unknown:
and it is by the author G. Pascal Zachary. The tagline on the book is "The Breakneck Race to Create Windows NT and the Next Generation at Microsoft." This book was written in 1994, and I bought it recently — I think last year — and only ended up reading it over the Christmas break. It was a fascinating account of the development of the NT kernel, which is essentially what all modern Windows machines are now based upon. It tracks the development project and David Cutler, who was the main architect on Windows NT. His heritage was that he came from Digital, and he had a background in VMS.
In the late eighties he got a call, basically, from Bill Gates, who had heard he was, you know, looking potentially to leave Digital, and he basically got free rein to build the next generation of operating system at Microsoft. It's strange, because I didn't actually know that connection until much later. In my initial interaction with Windows as a teenager, the only thing I really remembered about Windows 3.1 and Windows 95 and stuff was how often it, like, lost my homework and how often it would crash. That stays in your mind. It wasn't until I got into the enterprise, where there was actually a Windows 2000 machine — which is obviously where the NT kernel came in — and SQL Server 2000, that I'm like, oh, maybe there's more to this Microsoft I'd always had such a negative opinion of.
And that really just started the gradual change into an interest in Windows. All of these primitives that I use within PyParallel can be traced back to fundamental design decisions that go back as far as VMS. And VMS — you couldn't get anything more different to Windows than VMS. This was a system that was designed to reliably and safely run, you know, business problems in the late seventies and eighties and that sort of thing. So this book in particular is just fascinating to me, because it gave a really close account of how Windows NT was developed from sort of the late eighties to its release in 1993.
So, yeah, I've got, like, 30 or 40 little tabs poking out of it where I found interesting little tidbits and comments and stuff. And it's interesting: I'm buying a lot of older books now as I get older, which is strange. I bought a VMS book which was published in, like, the late eighties, I think. It's interesting to see that transition as I, I guess, continue to work as a software developer — the average age of the books I buy, going by when they were first published, is actually getting older as well.
[01:11:34] Unknown:
It's it's definitely a known thing that the longer you work in our industry, the more you realize that what's new is old, really, when you peel back the covers. Definitely.
[01:11:45] Unknown:
Alright. Well, we really appreciate you coming on to the show to talk to us about PyParallel and all the interesting work that you've done on it. And for anybody who wants to keep in touch with you and follow what you're up to, what would be the best way for them to do that? Yeah. Sure. Visit
[01:11:59] Unknown:
github.com/pyparallel/pyparallel, and, you know, favorite it, star it. Actually, PyParallel on Twitter is also a good way to get in touch, and I can be reached on Twitter at Trent Nelson. Great. Good talking to you guys. Have a good night.
Introduction and Host Information
Interview with Trent Nelson
Trent Nelson's Background and Role at Continuum Analytics
Founding PyParallel
Introduction to Python
Early Projects and Python Community Involvement
Overview of PyParallel
Challenges and Technical Hurdles
Approach to Solving the GIL Problem
Technical Hurdles in PyParallel Development
Functional Programming and Parallelism
Debugging and Limitations
Forking Python 3.3 and Future Goals
Community Involvement and Contributions
Closing Remarks and Future Discussions