Visit our site to listen to past episodes, support the show, join our community, and sign up for our mailing list.
Summary
More and more of our applications are running in the cloud and there are increasingly more providers to choose from. The LibCloud project is a Python library to help us manage the complexity of our environments from a uniform and pleasant API. In this episode Anthony Shaw joins us to explain how LibCloud works, the community that builds and supports it, and the myriad ways in which it can be used. We also got a peek at some of the plans for the future of the project.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- Subscribe on iTunes, TuneIn or RSS
- Follow us on Twitter or Google+
- Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
- Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com
- Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project
- The Open Data Science Conference in Boston is happening on May 21st and 22nd. If you use the code EP during registration you will save 20% off of the ticket price. If you decide to attend then let us know, we’ll see you there!
- Your hosts as usual are Tobias Macey and Chris Patti
- Today we are interviewing Anthony Shaw about the Apache LibCloud project
Interview with Anthony Shaw
- Introductions
- How did you get introduced to Python? – Chris
- What is LibCloud and how did it get started? – Tobias
- How much overhead does using libcloud impose versus native SDKs for performance sensitive APIs like block storage? – Chris
- What are some of the design patterns and abstractions in the library that allow for supporting such a large number of cloud providers with a mostly uniform API? – Tobias
- Given that there are such differing services provided by the different cloud platforms, do you face any difficulties in exposing those capabilities? – Tobias
- How does LibCloud compare to similar projects such as the Fog gem in Ruby? – Tobias
- What inspired the choice of Python as the language for creating the LibCloud project? Would you make the same choice again? – Tobias
- Which versions of Python are supported and what challenges has that created? – Tobias
- What is your opinion on the state of PyPI as a package maintainer? What statistics are most useful to you and what else do you wish you could track? – Tobias
- Could you walk our listeners through the under-the-covers process details of instantiating a compute instance in, say, Azure using libcloud? – Chris
- Does LibCloud have any native support for parallelization, such as for the purpose of launching a large number of compute instances simultaneously? – Tobias
- What does it mean to be an Apache project and what benefits does it provide? – Tobias
- What are some of the most notable projects that leverage LibCloud for interacting with platform and infrastructure service providers? – Tobias
- Could you describe how libcloud could be extended to abstract away a new type of service that’s not yet supported – e.g. a database? – Chris
- Would you suggest that libcloud users extend libcloud to cover ‘native’ services they might use like AWS Lambda, or should they mix libcloud and ‘native’ SDKs in cases like this? – Chris
- Could you talk a little bit about the cloud oriented network services that libcloud supports? Is it possible to create AWS VPCs, subnets, etc using libcloud? – Chris
- Do you know if people use LibCloud for abstracting the APIs of a single cloud provider, even if they don’t have any intention of using a different platform? – Tobias
- Do you think that people are more likely to use LibCloud for bridging across multiple public cloud platforms, or is it more commonly used in a hybrid cloud type of environment? – Tobias
- What is on the roadmap for LibCloud that people should keep an eye out for? – Tobias
Keep In Touch
Picks
- Tobias
- Chris
- Anthony
- Hidden Brain Podcast
- PyKwalify
- Doing Nothing
Links
- Dimension Data
- Austin Bingham and Robert Smallshire Pluralsight Python Training
- CloudKick
- PyPI Ranking website
- Apache JClouds
- SaltStack
- Scalr
- Apache Software Foundation
- Mist.io
- StackStorm
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. You can join our community at discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, propose show ideas, and follow up with past guests. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com. Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project.
We also have a new sponsor this week. Rollbar is a service for tracking and aggregating your application errors so that you can find and fix the bugs in your application before your users notice that they even exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan. You can subscribe to our show on iTunes, Stitcher, TuneIn Radio, and now Google Play Music, which just launched support for podcasts. So you can check us out there and subscribe to the show. Please give us feedback. You can leave a review on iTunes or Google Play Music to help other people find the show.
And don't forget that we have 2 tickets to the Open Data Science Conference in Boston, which is happening on May 21st and 22nd, to give away. In order to enter, just sign up for our newsletter at pythonpodcast.com. Also, if you use the code EP, you will save 20% off of the ticket price. If you decide to attend, then let us know. We'll see you there. Your hosts as usual are Tobias Macey and Chris Patti, though Chris is gonna be joining us a little later on today. Today, we're interviewing Anthony Shaw about the Apache Libcloud project. So, Anthony, could you please introduce yourself?
[00:01:54] Unknown:
Hey, everybody. My name is Anthony Shaw. I'm calling from Sydney, Australia. It's a little bit later in the evening here. I'm on the Apache Libcloud project management committee, so that's basically one of the governing boards for the project. And, yeah, I'm here to talk about Apache Libcloud and have a good conversation about Python. So Great. And how did you get introduced to Python? It was a little abruptly, actually. I come from more of a C/C++ background. I've been a developer for about 10 years now, spent most of my time recently with C#. Mid last year, I was on a trip to Seattle to visit Microsoft, and I was supposed to fly to New York for the weekend.
And I was actually ill, so I ended up being stuck at the hotel for the whole weekend with nothing to do, and there's not a whole lot to do around Eastlake. So, basically, I thought, okay, what am I gonna get up to? So I figured I'd sit down and learn Python because it'd been on my to-do list for a while. And I picked up the Apache Libcloud project, because that looked like a great place to start. And the company I work for, Dimension Data, we have our own public cloud, and we didn't have a driver. So what I set about doing over the weekend was trying to learn the language, and to do that by trying to put together a driver for the cloud and see how that was done. So that's kinda how I got introduced to it, and I sort of hacked around with it over the weekend. It's a pretty straightforward language to get used to, actually. I've worked with a number of different languages in the past, but it was a bit of a shock going to something that's whitespace-significant and losing all the braces and stuff like that. But once you get over that, it's very easy to learn.
And then when I got back to Sydney, I went through a series of training modules on Pluralsight. If anyone uses Pluralsight, I really recommend Austin Bingham and Robert Smallshire's courses on Python. There's about 8 hours' worth of training content on Pluralsight by both of those guys. And they were excellent because they taught me not just syntax, because that's important to get started, but how Python developers do things. And that's what I really wanted to know. You know, I don't wanna write Python like a C# programmer, I wanna write Python like a Python programmer. So what kind of conventions are different? What do they do? And that was a good place to start for that.
[00:04:34] Unknown:
For anybody who's not familiar with it, can you explain what the libcloud project is, and if you're familiar with the history, how it got started?
[00:04:42] Unknown:
Yeah, sure. So Libcloud is an Apache top-level project. It's written in Python, and it's a combination of things, actually. It's a client library for multiple clouds. On the compute side, you've got the big players like Amazon and Azure, and providers like Joyent and DigitalOcean. In total, it's about 52 different cloud providers. So it's a Python package that has the client libraries for those, and we don't depend on each provider's own Python library. We've basically written our own implementations of an AWS client, an Azure client, a Joyent client, and all of those. There are zero dependencies.
And it also has an abstraction system where you can basically ask the library to give you a class implementation to get information about compute and provision compute. So the focus of the project is to try and make all clouds look the same, as hard as that sounds, and to provide that in a way that's fairly seamless and dependable, without too many surprises. That's kind of what Libcloud does. It was formed fairly organically by a startup called CloudKick, which I think was in 2008 or 2009. There's an Apache member called Paul Querna who kicked that off. And CloudKick was acquired by Rackspace.
And just before the acquisition, the team that was originally working on it, they were working in Python, which is why it was originally done in Python, and they decided that actually, this bit of IP we've got around this generic cloud abstraction is pretty cool. Maybe we should separate that off and make it a separate project, because the monitoring thing and the cloud abstraction, sure they go together, but actually, you know, there's a single responsibility principle on the abstraction side. So they separated that off and put it into the Apache Incubator, which is Apache's space before you become a big project, before you sort of graduate to the big boy school, they put you in the incubation space. And it sat there for, I think, about a year and a half, where different mentors from the Apache Foundation basically helped them progress through that.
[00:07:05] Unknown:
How much overhead does using Libcloud impose versus native SDKs for performance sensitive APIs like block storage?
[00:07:13] Unknown:
Yeah, that's a tricky one. I'd say it depends. So we have a storage abstraction driver. Compared with something like Boto, if you're looking at Amazon, it would probably be similar in terms of performance. So we use a similar principle of streaming data from the file objects and writing them to the APIs. We did have our own HTTP library that was written originally back in 2009. We're just working on replacing that with the Requests library at the moment. Back in 2009, it didn't exist, so the guys that were developing it had to write their own. But now we've got the option of using Requests for that functionality. And that basically gives us a more direct stream to read and write raw file streams.
So if you are using object storage like S3, for example, then it would increase performance significantly in the new versions. In the older versions, we're still using Python data streams, so we're not sticking anything in the middle.
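The streaming approach described here, reading from a source file object and writing onward in fixed-size chunks so nothing large is buffered in the middle, can be sketched in plain Python. This is an illustration of the principle, not libcloud's actual implementation:

```python
import io

def stream_copy(src, dst, chunk_size=8192):
    """Copy from one file-like object to another in fixed-size chunks,
    so memory use stays constant regardless of the object's size."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# Simulate uploading a 1 MB object: `dst` stands in for the
# provider's upload stream (e.g. an S3 upload endpoint).
src = io.BytesIO(b"x" * 1_000_000)
dst = io.BytesIO()
print(stream_copy(src, dst))  # 1000000
```

Because the loop only ever holds one chunk in memory, the same code works for a 10 GB object as for a 10 KB one.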
[00:08:21] Unknown:
And what are some of the design patterns and abstractions in the library that allow for supporting such a large number of cloud providers with a mostly uniform API?
[00:08:30] Unknown:
It's a really hard problem to solve, actually. Each provider has a very unique API. Each provider, obviously, wants to differentiate themselves from one another. So if you just compare simple things like OpenStack and VMware vCloud, for example, then, you know, they want to differentiate themselves against each other. Therefore, they have different functionality, and therefore, they have different APIs. So I think it still remains a challenge for us to figure out how to abstract these things away. Luckily, Python makes that pretty easy compared to a more strongly typed language. There are a lot of benefits to being able to rapidly construct and interact with dictionaries in Python, which you would otherwise find quite hard to do in a language where you don't have dynamic types.
I think we'll talk about JClouds maybe later, but that's another project written in Java, and it's really interesting to see two languages try and solve the same problem. Something like Java, where there's a lack of dynamic typing, whereas in Python, it's a bit more native. So what we really have the ability to do in Libcloud is provide a base class, a driver base class. For compute, that would give you methods like list nodes, which lists the VMs you have, list networks, deploy network, deploy node, start node, stop node, delete node. So sort of basic functionality.
Each driver then inherits from the base class. And if they have any additional arguments, then we sort of advise against using **kwargs because it's quite difficult to document those. In terms of how we generate our documentation, it's much preferred to have named keyword arguments rather than **kwargs. Also, we have a number of model classes that come back from the driver. So if you say list nodes in the Amazon driver, and you say list nodes in the Azure driver, then the list that you get back looks exactly the same. You get a list of node instances, and the node instances have the same properties: the name, the ID, the location, and the state it's in. So there are standard properties on the class, that's like a standard model, and then there's an extended dictionary called extra, which allows each driver to put in additional information that's unique to that driver. So you can still pick out the extra stuff, like which AMI it was deployed from, or, you know, do I have any additional EBS volumes, or more specific stuff for something like Amazon.
And then we basically have a factory method that sits on top of this. So you can say, give me a driver, and you can ask for a Rackspace driver, a Dimension Data driver, and an Amazon driver. You can put those in a collection, and then you can iterate through them and list nodes across multiple clouds. So that's kind of where it starts to become really powerful. It's not just having a consistent way and a consistent set of models for interacting with different cloud APIs, but being able to put all of those together and actually see the whole picture. So, what do I have in all of my clouds, instead of just targeting them individually?
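The shape Anthony describes, a base driver class, per-provider subclasses that return standard node models carrying an `extra` dict, and a factory method on top, can be sketched in plain Python. The driver classes below are made-up stand-ins for illustration, not libcloud's real drivers (those live under `libcloud.compute`):

```python
class Node:
    """Standard model every driver returns: fixed common fields,
    plus an `extra` dict for provider-specific details."""
    def __init__(self, id, name, state, extra=None):
        self.id = id
        self.name = name
        self.state = state
        self.extra = extra or {}

class NodeDriver:
    """Base class that every provider driver inherits from."""
    def list_nodes(self):
        raise NotImplementedError

class FakeAmazonDriver(NodeDriver):
    def list_nodes(self):
        # Provider-specific info goes into `extra`, not new fields
        return [Node("i-1a2b", "web-1", "running", extra={"ami": "ami-123"})]

class FakeRackspaceDriver(NodeDriver):
    def list_nodes(self):
        return [Node("42", "db-1", "running")]

DRIVERS = {"amazon": FakeAmazonDriver, "rackspace": FakeRackspaceDriver}

def get_driver(provider):
    """Factory method: look the driver class up by provider name."""
    return DRIVERS[provider]

# One uniform loop across several clouds
drivers = [get_driver(name)() for name in ("amazon", "rackspace")]
for driver in drivers:
    for node in driver.list_nodes():
        print(node.name, node.state, node.extra)
```

Because every driver returns the same Node model, the final loop never has to know which provider it is talking to, which is the "whole picture" view described above.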
[00:11:56] Unknown:
Given that there are such differing services provided by the different cloud platforms, do you face any difficulties in exposing any of those additional capabilities, such as Route 53 with Amazon or, you know, maybe something like the Google Container Engine?
[00:12:11] Unknown:
We've separated things into different base drivers. So at the moment, we have compute, and we focus on whatever is offered as a service. So anything that's provided as a service, we focus on that abstraction. Compute as a service, I think, is fairly well known, and that's what the library is really known for. We also have DNS as a service, so we support Route 53, we support providers like GoDaddy, I think it's about 20 different drivers in there, and the Azure DNS as well. Then we have storage. So that's for your object storage like S3 and Azure Blob Storage, the more sort of native stuff, as well as Ceph and OpenStack Swift, I think it's called. We also have container as a service, which is a new one. So that's for Amazon's Elastic Container Service, ECS, as well as Google Container Engine, or just Kubernetes directly. So we kind of have a container as a service abstraction driver.
And also, we have load balancer, which is a more specific one. And then there's anything that kind of fits outside of that, I guess, which is a bit more unique to a particular provider. Let's say Amazon's Simple Queue Service, for example. I think it would still be very hard for us to create a queuing as a service abstraction API, because at the moment, there are only maybe a couple of providers that have that, and their services are really different. So we've really focused on places where there are similarities and where we think there's value in providing an abstraction. That's kind of been the focus of the project.
[00:13:52] Unknown:
For those kinds of additional services, do you recommend to people that they just use the native cloud drivers, like, for instance, the Boto library?
[00:14:00] Unknown:
Yeah, and I think that is the case as well, just looking at the popularity of the two. However it's pronounced, Boto. Sorry, I'm English, and I'm living in Australia. My accent has got so confused over the last few years. I was looking at it today, actually. There's a PyPI ranking website, and it's the 10th most popular package on PyPI. So it's not just popular, it's one of the most popular Python packages. So, you know, we know that people are using it. And, yeah, if you want to orchestrate the queuing stuff, if you wanna orchestrate Lambda, for example, then, yeah, of course, you should use a native system. One of the things that I think is a bit different about Libcloud is that we try and treat all our cloud providers the same, and the reasons for that are various.
What happens is, when you're trying to write a portal, like a multi-cloud portal, and there are a few examples of this, what tends to happen is these multi-cloud portal development companies sit down and look at, okay, now we have to code against Amazon, and we have to code against Azure, and then we have to code against OpenStack. And I've seen this time and time again. They sit down and they develop these libraries. They use the native client libraries, and then they have to write a whole bunch of custom code to normalize the data that they get back from each of the APIs.
Then a customer comes to them and says, oh, we really need DigitalOcean as a provider. And they're like, great. Okay. Now we have to go and do the same thing again. And where Libcloud kind of comes in is to give them the abstraction out of the box, but also allow them to then just focus on the over-the-top stuff. So, you know, there's no value in these companies writing their own abstraction APIs, especially if they're using Python, because Libcloud has done this for a number of years. We support pretty much every cloud provider you can find. I mean, on the compute side, there's 52.
On the storage side, I think we have about 20, and DNS is about 18. So we cover pretty much every major provider. The APIs are kept very up to date. They're very well maintained. And like I said earlier, it's a client library, not just an abstraction library. So it includes implementations to talk to all of those different clouds. We don't depend on individual packages to do that. One of the risky moves in either building it yourself or depending on different libraries and packages is the versioning side, for example, and also just creating this massive dependency tree. If we had just decided early on in the project to say, okay, we'll use Amazon's Python library, we'll use Microsoft's Python library, we'll use x company's Python library, I worked out the other day that we'd have about 500 package dependencies in the project. Just the amount of maintenance to do that would be insane. Like I said earlier, I worked in C# for a long time and still do, and there's this thing they used to call DLL hell.
If you remember that. But we now have this thing called NuGet hell. NuGet is the .NET equivalent, I don't know if either of you guys work in .NET, but NuGet is the .NET equivalent of a package format, like an egg, basically. And once you create the bindings, it's only for a specific version. In .NET, we get these implementations where we've got all these different packages, and then eventually, somehow, we've got 500 dependencies. And one dependency needs a different version to another one, and they argue about which version is the right one. Yeah, it just gets into such a mess. So our approach is not hugely popular, to be honest, when new people come on board to the project, and they say, oh, I don't wanna write my own client library from scratch, why can't I just use this?
But in the long term, for the health of the project, it is really important, because we can't guarantee that these Python libraries are gonna get maintained by people. So, yeah, that's kind of been our focus.
[00:18:16] Unknown:
Right. And I can imagine that there are also a number of cloud providers that don't even necessarily have their own native driver for their particular platform. So something like libcloud might be the only option in the Python space.
[00:18:28] Unknown:
Yeah. And I don't wanna name names, but sometimes they have them, but they're not written by Python programmers. Like I said earlier about wanting to write Python like a Python programmer and not like a C# developer. Yes, there are a number of cloud providers who have auto-generated Python, or, you know, they sit down and basically port a Java library to Python, and it's just horrible. It's really nasty to use.
[00:18:57] Unknown:
It's definitely not too hard to write Java or C or some other language in Python. And for somebody who's actually used to writing Python, it's definitely pretty painful trying to interact with those libraries, because there's just a big cognitive gap in how you think the problem should be solved versus how it's actually being done, or the APIs that are exposed.
[00:19:13] Unknown:
Yeah. Absolutely. I think people who are new to the platform cringe a little bit at how often Python programmers use the term Pythonic, but it actually is there for a reason, and, as you say, it's kind of a way of thinking about things, and it shows in the code you write.
[00:19:29] Unknown:
One of the first things that hit me, actually, was the lack of a switch statement. So I kinda got to a point where I was like, where's the switch statement? What's the switch keyword? And then I sort of googled it and, you know, coded by Stack Overflow, which is always a bad idea. But then they said, oh, you can do if/elif/elif, but, you know, this is not a very Pythonic way of doing it. I'm like, what does that mean? And then you kind of see better examples and better patterns. And actually, this was probably quite a good Stack Overflow answer, instead of just a really lazy copy-paste snippet of code: the top-rated answer was, if you need to use a switch statement, you're probably doing something wrong.
Try and look at your patterns again because, yeah, I think you've ended up in the wrong place.
[00:20:17] Unknown:
Yeah. One of the more interesting approaches or patterns that I've seen and used for attacking the particular case of needing a switch statement is actually using a dictionary, where each of your cases in the switch statement is a key in the dictionary, and the function that needs to be executed is stored as an object as the value in the dictionary.
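That dictionary-dispatch pattern looks like this in practice (the handler names here are hypothetical, just for illustration):

```python
def handle_start(name):
    return f"starting {name}"

def handle_stop(name):
    return f"stopping {name}"

# Each "case" of the would-be switch statement is a key;
# each branch body is a function stored as the value.
HANDLERS = {
    "start": handle_start,
    "stop": handle_stop,
}

def dispatch(action, name):
    try:
        handler = HANDLERS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action}")
    return handler(name)

print(dispatch("start", "web-1"))  # starting web-1
```

Adding a new case is just adding a dictionary entry, with no growing if/elif chain to maintain.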
[00:20:37] Unknown:
Yeah, I think we actually use that pattern in a couple of places, which is so much tidier, and so much easier to read as well. In our package, we insist on PEP 8 compliance, and it's part of our CI as well. So, you know, we do enforce a shorter line length, and that helps stop some of the nesting, which, especially in C#, gets to crazy levels. All of the developers that I work with on C# have these huge widescreens, just because there are such crazy levels of nesting in all the functions.
[00:21:09] Unknown:
Yeah. One of the nice things that I've found when actually adhering to PEP 8 is that if you do find that you're running out of line length, it usually means that you're nesting too deeply and you need to break some things out into their own function. So you touched on it briefly with mention of the JClouds project, but how does Libcloud compare to similar projects in other languages, such as JClouds as you mentioned, or the Fog gem in Ruby?
[00:21:31] Unknown:
Yeah. JClouds is, again, an Apache project, and I think they have a very similar ethos to Libcloud. We actually started talking to them more recently and trying to start some collaboration exercises. There are a few differences between Libcloud and JClouds, and they probably speak to the differences between the languages. They have much stricter contracts in terms of the base drivers and the interfaces that are provided on the Java side. So it's a bit harder to, I guess, bend the rules a bit. On ours, for example, if you say list nodes, let's pick GCE, so Google Cloud.
When you list VMs there, you need to provide a project ID. If you're doing it for Amazon, it's a bit more generic, so you can just say list nodes. So for list nodes on our side, we can, you know, just have an extended argument on the method. And then if it is Google, you can include that as a keyword argument. That's easy to do in Python. In Java, however, if you're implementing a contracted interface, then you can't just add additional arguments to the method. So that's where they've kind of had to design things a bit differently.
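The flexibility being described comes from Python's keyword arguments: a subclass can accept an extra, provider-specific parameter without breaking the shared method signature. A minimal sketch with invented driver names (libcloud's convention, as I understand it, is to prefix provider-specific arguments with `ex_`):

```python
class NodeDriver:
    def list_nodes(self, **kwargs):
        raise NotImplementedError

class FakeEC2Driver(NodeDriver):
    def list_nodes(self, **kwargs):
        # No provider-specific arguments required
        return ["i-abc123"]

class FakeGCEDriver(NodeDriver):
    def list_nodes(self, ex_project=None, **kwargs):
        # A GCE-style driver can demand its project ID as a named
        # keyword argument without changing the base signature
        if ex_project is None:
            raise ValueError("a project ID is required")
        return [f"{ex_project}/instance-1"]

print(FakeEC2Driver().list_nodes())
print(FakeGCEDriver().list_nodes(ex_project="my-proj"))
```

A language with strictly contracted interfaces, like Java, would need a different method signature (or an options object) per provider, which is the design pressure JClouds faces.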
And I do think that makes it a bit harder for them to add new drivers. I know there's a bit more overhead. So, in terms of the two packages, theirs is probably stricter in terms of the contracts, but they have fewer providers, and less of the functionality that we support, like backups and containers. Fog is actually a bit different. It's more of a client library rather than an abstraction library. The Fog gem does include clients to talk to a number of different clouds, but it doesn't really provide a generic model or any contracts. So depending on which provider you talk to, you basically get different responses back.
Some of them are normalized, some aren't. So that's probably the difference between Fog and Libcloud. But Libcloud and JClouds are Apache projects as well, so that's another differentiator.
[00:23:58] Unknown:
And Libcloud has way better documentation than Fog. I realize that's maybe a small thing, but when I had to learn how to use Fog, I got told about 15 times to use the source, Luke. And, you know, Fog is a non-trivial software package, and sometimes using the source can be rather like pulling teeth, even when the source code is good. Right?
[00:24:20] Unknown:
Yeah. I think it was talked about in a past episode, actually, the differences between the Ruby packages and the Python packages in terms of documentation level. But, yeah, I found the same thing. I've tried to use Fog, I'm actually working at the moment on a client library for the provider that I work for. And, yeah, I found the same thing. The documentation was kind of like, you know, RTFC as I call it internally, but I'm not gonna spell that out. But with ours, I think we focus on the documentation, and as part of our PR process as well, you know, we do require that people include documentation, and that is pretty strict.
And obviously, our doc tools make that pretty straightforward, because we can test it. As part of our build, we actually do check the docs and also build the docs. So that's really cool.
[00:25:17] Unknown:
So you briefly touched on the inspiration for the choice of Python being the fact that the company who first created it was largely a Python shop. But if you were to make the same choice again, you know, if you're if you're starting the project today, do you think that you'd make the same choice of using Python as the language for it, or do you think you would do something else?
[00:25:36] Unknown:
Yeah. I do a number of other projects at the moment. I've had experience with PHP in quite a lot of detail, so some of the more weakly typed languages like JavaScript and PHP. I've just been introduced to Ruby recently. In terms of functional programming, I've stayed well away from that for now. I don't wanna cover it off unless I get ill again and end up locked in a hotel for 40 days. And then on the more strongly typed side, C# has been my main experience, and C++. And to be honest, I would still pick Python, because the task that we're doing isn't a particularly clean programming pattern. We're kind of acting as a facade in some places and a mediator in others, and, you know, we have to talk to these different APIs which all behave kind of differently.
Some of them have weird behaviors, some of them are consistent, and that's a tricky problem to solve. I wouldn't say there's a particularly clean way of doing it. You know, you're not gonna write the absolute perfect code when you're writing a project like this. And the thing I really love about Python is that doing dynamic modeling is refreshingly simple, compared to doing it in other languages. C# is actually a little bit easier now. They added the dynamic type in version 5 of C#, I think it was, which makes things a lot easier. And they mainly did that because, you know, people were starting to use JSON libraries.
You know, when you convert something from a JSON document to a local object, what do you get back? In C#, you can't define the fields at runtime, so they actually had to create a dynamic type. Java, I don't think, has done that yet. And C# is still complicated. Once you've got it in a dynamic type, it's hard to then convert that to something more concrete. And if you wanna do it as a solid type, then you have to put all these attributes around all the fields, and the matching has to work, and you have to include all the formatters and stuff. So, you know, I've done it in C#, and it's overhead, and I'd say it's hard work, and it was just refreshingly simple to do it in Python.
And that's one of the things that I was so surprised about with the language, because I'd had a fairly bad experience with PHP. I kind of came in, and I don't know, there must be a fair few PHP programmers listening, in the days when register_globals was considered a good idea, which I don't think should ever have been the case. I'll explain what register_globals is for some of the non-PHP developers: imagine you're writing a PHP page, and you can basically override any variable in the application just by putting that variable name in the GET parameters or in the POST body, and then basically setting the value.
So, basically, it's just a really lazy mechanism: anything that was in GET or POST would get turned into a global in your application scope. And if you forgot to initialize or override that global at some point in the scope, an attacker could basically override any of the variables in the local scope just by changing the URL. Anyway, I digress. I got burned in a number of places on PHP, mainly on the security side of things. I worked as a sysadmin for a while, just fixing really badly secured PHP apps, and that kind of burnt me and put me off weakly typed languages for a while.
Perl, I got put off for similar, though not quite the same, reasons; actually, I think the security side was a bit stronger there. It was just obfuscated code. Writing in Perl can be fun, but I found that I had to Google the same thing over and over and over again, to the point where I was thinking, is this really intuitive? If I've had to do it nine times and every single time I've had to go back to Google, why isn't this sinking in? And I'd like to think I'm not stupid, so is it just not intuitive? The conclusion I came to was that maybe I'm just not of the right mindset to be a Perl programmer, because I found it really hard, and it wasn't fun.
And it definitely wasn't enjoyable.
[00:30:19] Unknown:
It's definitely not fun. Any language that forces you to unpeel your own subroutine call parameters from a stack cannot be considered intuitive in my book.
[00:30:31] Unknown:
Yeah, and maintaining other people's Perl code, I wouldn't wish it on my worst enemies. It's like a game that they play, to try and make their code as obfuscated as possible. Maintainability and Perl, yeah, they don't really go together. I didn't come on here to bash other languages, but I'm doing it anyway.
[00:30:56] Unknown:
So bringing it back to Python, which versions are supported and what challenges has that created?
[00:31:02] Unknown:
Yeah. The project has been running since, I guess, back in the Python 2.5 era, then we got up to 2.7, and then the questions around Python 3 came up. Our project chair, Tomaž, who, if you go and look at the Git graphs, you'll see is the one who's written about 80% of the code for the library, does the work of about five people; it's amazing. He did the hard work quite early on, and we were one of the first big projects to port to Python 3. So we currently support, I'm gonna say, 2.6 up to 3.5.
I can't remember strictly whether we support 2.5. If it works, then good for you, but if it doesn't, please don't come and ask for help, because we're trying to phase it out. And the question at the moment is coming back around 2.6, because dealing with 2.7 is so much easier. For those who are listening and aren't too familiar with the differences, a lot of the nice functionality that got added in Python 3 then kind of got back-ported to 2.7, but that stuff wasn't really available in 2.6. So we kind of program the library like it's 2.6, and we also have a utility module that's part of the package, called py3.
And all of the drivers basically use this library, and it does the switching out. It discovers which version of Python you're running, and then we switch things out. So we have our own version of basestring, and depending on which version of Python you're using, we switch it out for either a tuple of unicode and str, or we just use the native string type. So we basically have a Python 2/3 abstraction utility module as part of the code, which is really cool; Tomaž put that together. And that makes it a lot easier for people who are used to Python 3 coming into our library and trying to develop on it, as well as people who are more used to Python 2. So, yeah, we support up to the latest and greatest; 3.5 is definitely a supported version.
We might consider dropping 3.0 and 3.1 in the future, just because there were some crazy issues with those.
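The version-switching utility described here follows a well-known pattern, the same idea behind the six library. A minimal sketch of that approach, with illustrative names (not the real libcloud.utils.py3 API), looks like this:

```python
import sys

# Minimal sketch of a Python 2/3 compatibility shim, in the spirit of
# libcloud's py3 utility module or the six library. Names are illustrative.
PY3 = sys.version_info[0] >= 3

if PY3:
    # On Python 3 there is a single text string type.
    string_types = (str,)
    text_type = str

    def ensure_text(value, encoding="utf-8"):
        # Decode bytes to text; pass text through unchanged.
        return value.decode(encoding) if isinstance(value, bytes) else value
else:
    # On Python 2, basestring covers both str and unicode.
    string_types = (basestring,)  # noqa: F821
    text_type = unicode  # noqa: F821

    def ensure_text(value, encoding="utf-8"):
        return value.decode(encoding) if isinstance(value, str) else value

# Driver code can then test against the shim instead of version-specific types.
print(isinstance("hello", string_types))  # True on both 2 and 3
print(ensure_text(b"raw bytes"))
```

Each driver imports names like `string_types` from the shim, so only one module in the codebase ever has to branch on the interpreter version.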
[00:33:31] Unknown:
So it sounds like you've implemented your own version of the six library inside of Libcloud, so as to reduce your dependencies even further?
[00:33:40] Unknown:
More or less, yeah. We try not to do too much string comparison, so it should be avoided if possible. And the APIs differ as well: some of them don't support unicode, and some of them do. So you also need to be careful about not just having unicode strings at runtime, but whether the API you're talking to even supports them. So that's another challenge.
[00:34:07] Unknown:
And as a Python project, I'm assuming that your main method of distribution is via the Python Package Index. So I'm wondering what your opinion is on the state of PyPI as a package index, what statistics are most useful to you, and whether there's anything else that you wish you could track.
[00:34:24] Unknown:
Yeah, sure. I think it's a great platform. Like I was saying, comparing it to .NET, they've only recently really got NuGet.org, which is their version of PyPI. I use it to keep track of download counts against particular versions, and we obviously use it for packaging and publishing. There are a few things, though. I think it's great as a consumer: if you're just pulling down packages, then it's a really useful tool. But if you're publishing packages, I think it still leaves a lot to be desired, compared to RubyGems, compared to NuGet.org, compared to npm, for example.
Like, I know that version 0.14 is really popular just by looking at the download count, but I don't know why. I don't know whether it's because a particular project has it in their requirements file. I'd really like to know more details about what versions of Python people are using. We talked about dropping 2.5 support, and I'd like to be able to say, we've only had 10 people download the package from Python 2.5 in the last year, so we're gonna maybe upset 10 people, but that's not the end of the world.
Maybe we can find out who they are and email them, or just put a note online. Whereas if we say we're gonna drop 2.6, and we find out that 30% of our users are using 2.6, then that would be a stupid decision. And we're just left to guess, really, or left to ask users. But the thing is that because it's a module, sometimes people don't even know that they're installing Libcloud, because it's part of another project. So they wouldn't necessarily want to give us feedback, or they're not on our mailing list, so it's hard to get in contact with them.
And for something like npm, for example, you can see, okay, this package depends on these packages at these specific versions. I'd like to know who's added us in their requirements file, and which versions they've picked, so I can get in contact with them and say, hey, do you guys fancy using the newer version? You get these features, and we can check we haven't broken compatibility, things like that. So what would be really, really helpful is download counts per Python version, just so we can get that metric. And I'm sure the PSF would value that as well, getting usage statistics on different Python versions.
And, you know, the reason people give for not switching to Python 3 is mainly that you end up coming across a dependency that doesn't yet work in Python 3. I think if we knew who depended on what, and which versions were being used, that picture would actually be a bit clearer. So I would love to see that, and hopefully the platform puts that in place soon.
[00:37:29] Unknown:
Tobias, you'll have to refresh my memory because it's too early in the morning here. What was the name of the gent that we spoke to in the last episode who works on PyPI and is working on Warehouse, the new version?
[00:37:40] Unknown:
I was just going to say that Donald Stufft was his name, and that this conversation is particularly relevant given our previous conversation. It certainly sounds like a good opportunity for some open source contribution, for anybody who is so inclined, to add that capability to the Warehouse project, and also potentially pip, depending on where those statistics would be pulled from.
[00:38:02] Unknown:
Yeah, definitely. I mean, for those who might not have caught the last episode, they are actively seeking contributors for the Warehouse project, which is the new version of the PyPI website. So if nothing else, even if that's not where this particular change would go, it might be a great way to engage with those folks and figure out how you can pitch in and maybe get this change made. Because I agree, I think that would be useful to a lot of people. So, could you walk our listeners through the under-the-covers process of instantiating a compute instance in, say, Asia, using Libcloud?
[00:38:39] Unknown:
Yeah, sure. The starting point is that you obviously install the package and import the module. We have a, I don't know if it's a Pythonic term to say, factory method. Is that a Pythonic word? I spent too much time in C#; it was really a Java thing, I think. We have a factory method called get_driver, and we have an enumeration called Provider, and that basically lists out all the different providers we support. So you call get_driver, and you'd say Provider.RACKSPACE, or Provider.EC2, for example, and it would give you back a class.
Sorry, a class object, and then we would instantiate that by calling the initializer. So you call get_driver, you get back a class object, you instantiate that, and then you know what methods you're gonna get on the driver. You're gonna have list nodes, create node, list networks, start and stop node; that functionality is gonna be there. For DNS, for storage, it's all consistent, it's all the same: you have a factory method, you call it with the provider, and it gives you back the relevant driver. So that's basically all you need to do. If you want to create a node, there's a method that's part of each driver called create_node, and it has a set of standard, ordered parameters. You can call those to give it a name and so on.
Depending on the driver, you might then have additional keyword arguments that you can include, for things like scheduling on GCE, for example, or what additional EBS volumes you wanna attach. We also support things like sizing: for Amazon, you can obviously do list sizes, and that notion works against other cloud providers that have a concept of sizes as well. And for SSH keys, with any of the providers that support keys, you can include a key in the create node call. So we've created consistency amongst the providers that have this notion of SSH keys, which is quite common, and the idea of sizing.
Not all of them do those; maybe about 30 or 40% of the drivers have a sizing idea, I think. So you do need to know a little bit about the cloud that you're using, or it will just default back to whatever it thinks is appropriate; if the driver has provided default values, we'll include those. Once you do create node, you basically get back a node class instance, and on that instance you can call methods like start and stop. You can then use that to add it to a load balancer: you'd use the load balancer module, and you create a load balancer. One of the cool things is that the classes are consistent across the drivers.
The structures that you get back are consistent in terms of their fields, so you can mix and match across the tools. We talked about Route 53 as a DNS provider, for example. If you wanted to use Route 53 for your DNS, but you wanted to use Azure for your VMs, you could actually cross-pollinate those. You could use the Azure driver to create the node, then pass the node to a Route 53 instance and say, create an A record for this. Or you can do the same for load balancers: with any load balancer provider that supports public addresses, you can give it a collection of nodes, and they could actually be from different clouds.
So that's another example where we don't just treat the clouds the same, but actually create a solution that allows you to span services across multiple cloud providers.
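The flow described above, a factory method returning a driver class that you then instantiate with credentials and call a consistent set of methods on, can be sketched as a simplified toy. This is not the real library: real code would import get_driver from libcloud.compute.providers and Provider constants from libcloud.compute.types, and the driver/node classes below are stand-ins.

```python
# Toy sketch of Libcloud's factory-method pattern (illustrative, not the
# real libcloud API). Every driver exposes the same core methods, so
# calling code is portable across providers.

class Node:
    def __init__(self, name, provider):
        self.name = name
        self.provider = provider
        self.state = "running"

class BaseDriver:
    provider = "base"

    def __init__(self, key, secret=None):
        self.key, self.secret = key, secret

    def list_nodes(self):
        return []

    def create_node(self, name, size=None, image=None, **kwargs):
        # Provider-specific options arrive as extra keyword arguments.
        return Node(name, self.provider)

class DummyEC2Driver(BaseDriver):
    provider = "ec2"

class DummyRackspaceDriver(BaseDriver):
    provider = "rackspace"

DRIVERS = {"EC2": DummyEC2Driver, "RACKSPACE": DummyRackspaceDriver}

def get_driver(provider):
    # Factory method: returns the driver *class*; the caller instantiates it.
    return DRIVERS[provider]

cls = get_driver("EC2")
driver = cls("access-key", "secret")     # instantiate with credentials
node = driver.create_node(name="web-1")  # same call shape for any provider
print(node.name, node.provider)
```

Because the returned node objects share a consistent shape, a node created by one driver can be handed to another service's driver, which is what enables the cross-provider mixing described above.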
[00:42:36] Unknown:
That sounds really powerful. I mean, I've definitely seen Libcloud used for the purpose of avoiding vendor lock-in, being able to migrate your workload between different cloud providers, but being able to so seamlessly orchestrate resources between the different providers just by passing around objects within the Libcloud library is pretty amazing.
[00:42:56] Unknown:
Absolutely. I mean, the idea of being able to take the best of breed from each service: use Route 53 because it's the most evolved DNS provider, and use compute instances from Google because they're cheapest. That's awfully cool. It's a very compelling value proposition.
[00:43:12] Unknown:
Yeah, I'm hoping that someone will put this together, and if they do, please send me a link to the code. There's been this concept of, basically, a storage broker. Our storage client works across all the major object storage providers, and it would be awesome if somebody would write an app that would take a directory and then, you know, sign up to every free tier around, and basically just spread your files across all the free tiers. So you can have a few gigs here, a few gigs there, and just keep making the most of those free tiers.
[00:43:54] Unknown:
That's really funny. I think you're not the first person to have that idea, but I think that Libcloud may be the right infrastructure at the right time to actually implement it. So that is kind of a neat idea, unless you're Dropbox or Google or all these other providers, because the idea of a utilize-people's-free-storage-everywhere abstraction is perhaps not so ideal for their bottom line. But for end users, I can certainly see it being a boon.
[00:44:24] Unknown:
Yeah, exactly. And there are other advantages as well, like you mentioned, lock-in. I talk about this quite a lot at the moment, because it's a concern of mine, not just from an open source point of view, but in terms of the importance of open platforms. We talk about vendor lock-in or cloud lock-in, but I think what's actually becoming a real issue is that people kind of default to AWS. I think it's a great platform, it's really awesome, it has a lot of functionality, but it's just become the default stance now.
And the thing that worries me more is that people's skill sets, or their applications, will be bound to the AWS architecture. Then let's say you leave that company and you go and work somewhere else, and you're working in a purely research environment, or for a not-for-profit, or you're trying to do everything off your own back and you wanna run things on your laptop. All of a sudden, you're not gonna have access to all those APIs, and your application stack's not gonna work. If you're gonna bind yourself to a paid service, then you get what you wish for.
No, I totally agree.
[00:45:39] Unknown:
So that actually brings up an interesting question. Does Libcloud have any sort of drivers that would support abstractions allowing you to target things on your local machine? So for instance, if you needed to provision DNS, maybe just have it edit your /etc/hosts file, or if you needed to allocate storage, just have it placed directly on your hard drive.
[00:46:02] Unknown:
I'm not sure about /etc/hosts as a BIND replacement; I'm not sure, actually, whether we do support that. In terms of compute, definitely. We have support for libvirt, which is a virtualization driver that can drive KVM, so you can use that. For OpenStack, if you somehow manage to run OpenStack on your laptop, then fair play to you, but you could use that. For containers, we support Docker, so if you're running Docker locally, which pretty much everybody does, you can use that. And on the compute side, there's KVM, the VMware one, I can't remember what it's called now, the virtual desktop one, and VirtualBox. You can run all those locally. So, yeah, I've used it for a number of things where I've got some spare hardware lying around, and I use Libcloud to orchestrate that. And then when I'm ready to promote it to a public cloud environment, I basically use the same code to do the same thing.
[00:47:10] Unknown:
Yeah, having the same abstractions and same setup for local development as well as production is definitely an important capability, and one of the larger goals of DevOps as a whole. So having Libcloud as an option to support that kind of workflow definitely makes it even more appealing than it already is. Does Libcloud have any native support for parallelization, such as for the purpose of launching a large number of compute instances simultaneously?
[00:47:37] Unknown:
Yeah, it does. I think there's a note somewhere in the documentation saying that the library is not guaranteed to be thread safe. I'm not sure who wrote that or what the reasoning was; I've never tried to use it in that context, so I don't know how badly it would blow up. But if you look at the cloud providers, each time you implement a driver, you pick a type of connection, and we have one called an async connection. For providers where you'd want to provision a number of nodes simultaneously, typically, if they have an asynchronous API you can call, you say provision node and it gives you back a job ID, and then you can use just a standard connection and keep polling that job ID. That's typically what people do.
And the other option is an async connection, which does a lot of that work for you. So you wouldn't necessarily want to call all of those in a thread. Normally, what you would do is create a job queue or a job list and provision the nodes, because the cloud provider does the work of provisioning multiple nodes at the same time for you. The other thing I've seen done in some drivers is basically a callback: you can say wait for state. So you could have a list of target nodes you wanna deploy, and then for each node you just say wait for state, and create a separate thread to do that. That'd be another way of doing it.
[00:49:20] Unknown:
Yeah, that was another question I had: does Libcloud expose any events within its lifecycle of communicating with the different services, or does it have support for callbacks, for being able to string together different operations without having to necessarily do the polling on your own?
[00:49:36] Unknown:
One of the drivers does have some callbacks, but I'd still say that the cloud providers are a bit behind in terms of streaming APIs. It's probably only recently that we're starting to step away from this idea of just repeatedly calling the API and checking for differences. I was working recently with the Slack APIs, and they have a streaming API, which is really cool: you can basically just subscribe to it and pull the events out of the remote service. So if the cloud providers were to start doing that, then I think that's something we could leverage.
But until then, we'd have to create some sort of polling mechanism, and each cloud provider is different. Most of them don't give you job IDs; you've just gotta keep asking whether your node exists yet, and then maybe it might say yes. So, yeah, it'd be tricky for us to do that.
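The wait-for-state pattern described above, polling each node in its own thread because most providers expose no push events, could be sketched like this. This is toy code: `fake_get_state` is a hypothetical stand-in for a real "describe node" API call, and the state sequences are canned.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of polling several nodes in parallel until each reaches a
# target state, in lieu of provider-side events or callbacks.

# Canned state transitions standing in for what a real cloud API would
# report on successive polls.
_STATES = {
    "web-1": iter(["pending", "pending", "running"]),
    "web-2": iter(["pending", "running"]),
}

def fake_get_state(node_id):
    # Hypothetical stand-in for a "describe node" API call; once the
    # canned sequence is exhausted, the node stays "running".
    return next(_STATES[node_id], "running")

def wait_for_state(node_id, target="running", interval=0.01, timeout=5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fake_get_state(node_id)
        if state == target:
            return node_id, state
        time.sleep(interval)  # poll, since most clouds give no push events
    raise TimeoutError(f"{node_id} never reached {target}")

# One polling worker per node; results come back in submission order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(wait_for_state, ["web-1", "web-2"]))
print(results)
```

With a real driver, the body of `wait_for_state` would call the provider API (or check a job ID, for providers that return one) instead of the canned poller.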
[00:50:37] Unknown:
And what does it mean to be an Apache project, and what benefits does it provide?
[00:50:42] Unknown:
So, yeah, there are a number of advantages to being an Apache project; it's really about being part of the Apache Software Foundation. If you look at the core principles of Apache, one of them is community over code, which I think is the most powerful of all the principles. It's this idea that the intellectual property of a project isn't just the code: when you look at the worth of a project, you actually have to think about how much the community is worth. And in our case, you know, we have contributions.
We close maybe 10 PRs a week, I think, at the moment, and pretty much all of them are from different people. Individuals come forward and they improve a particular driver, or they add a new driver; we actually added three drivers in the last week, two new DNS drivers and a new storage driver. So there's just the power of having a community out there that's constantly improving the project, and recognizing that, and protecting the brand of the project as well. There's a rule in Apache that if it didn't happen on list, then it didn't happen. So you don't have individuals in backrooms making decisions where everyone doesn't have a say, and I think that's quite important. It's not just about the brand, which is good, you know; having an Apache brand gives the project a bit of kudos.
But I think the majority of the Apache projects are Java projects, so I'd say the Apache brand isn't as well known in the Python community. There are a couple of major Python projects, but people mainly think of the web server and things like Solr and those kinds of tools. But there is this idea that the community is the most important thing, more important than the code. And also, you can't buy influence in Apache. There's no arrangement where, if you give the Apache Foundation $100,000, you get voting rights in a project.
It's a meritocracy, so you start off as a contributor, then you become a committer, and then you can become a member in future. And members get voted in by other members based on the good work they've done in the open source community. It's not like one of these political circles where it's who you know, not what you know; it's the other way around. So that is a really powerful thing, and I think that's what has enabled the project to run so consistently for so long, without somebody coming in and going, can we buy the company that developed this? Or, can we buy this as a piece of IP, and what do we do with it going forward?
And there aren't all these external parties trying to buy influence in the project. We've got this protection, similar to what you have with the PSF, actually, I think. But, you know, the ASF's been around a long time.
[00:53:51] Unknown:
What are some of the most notable projects that leverage Libcloud for interacting with platform and infrastructure service providers?
[00:53:58] Unknown:
Yeah. So I believe one of you is a big fan of SaltStack. Who is that? That would be me, Tobias. Okay, cool. Yeah, SaltStack has a feature called Salt Cloud. I don't know if you've used that.
[00:54:10] Unknown:
Yes. Rather extensively.
[00:54:12] Unknown:
Okay, excellent. So, Salt Cloud, well, I'd be able to tell you if PyPI would tell me; I don't know what percentage of our downloads are from SaltStack. I think it's a fairly significant amount, and I think they're the culprit for version 0.14, because somewhere on the SaltStack website it says this is the version you should use, but it doesn't say why. So SaltStack is a good case. In the Salt Cloud functionality, if you go and look at the driver implementations, I think all of them except for Amazon and OpenStack use Libcloud; in the OpenStack case, they used the Nova client library.
And for Amazon, they used Boto. But for everything else, it uses Libcloud. So SaltStack is a really cool one for us, because it's a nice implementation on top of our library. They followed a similar idea, that you can define the infrastructure, provision infrastructure, and pick a cloud. They do use it for the compute side of things, and I recently added a new driver to Salt Cloud, again for the company I work for. So I added support to SaltStack; actually, one of the other developers in our team did the work, and I handled the PR process.
But that was really easy, because we were already in Libcloud, and we just basically had to write some wrapper code and stick it in SaltStack. So this is the idea: if you've written a client library for Libcloud once, then you get to take advantage of all the packages and tools and the ecosystem that lives around our project. So SaltStack is the big one. Scalr is another one, which is a multi-cloud management UI; Scalr uses Libcloud for a number of different cloud providers. Another one, very similar, is mist.io, which is a sort of mobile application where you can sign up and manage all your different clouds. Another one is StackStorm, which was just acquired by Brocade last week, actually.
So, yeah, that's another one.
[00:56:27] Unknown:
Yeah, we actually interviewed Tomaž and, I'm blanking on the other gentleman's name; it wasn't Evan, I wanna say Patrick. We interviewed them about StackStorm, I believe, two or three episodes ago. So that was another good interview.
[00:56:42] Unknown:
So could you describe how Libcloud could be extended to abstract away a new type of service that's not yet supported, e.g., a database?
[00:56:52] Unknown:
Yeah, sure. I've had to go through this recently, so I've still got some of the scars from the process. We introduced a couple of new services recently; one is backup as a service. One of the things we recognized is that we have compute, load balancing, DNS, and storage, and backup is supported amongst some providers, but not all of them. And with some of the providers, let's pick Amazon, for example, people use EBS snapshots as a way of doing backup and recovery. That's fine, and you can orchestrate that, but what we wanted to provide was a backup-as-a-service abstraction API. So what you have to do is mainly just research to start with: looking at the different APIs from the different cloud providers, seeing how they model their service, seeing what consistency there is amongst the different APIs, which is a really fun exercise. I'd like to spend a bit more time sharing some of that with the community, because I'm sure there's probably some good feedback in there for the cloud providers, whether that's asking why they made a decision to be so different from everybody else and cause me so much pain, or, for example, just giving them credit and saying, this is a really awesome API, and thank you for making the documentation so wonderful.
A lot of the time, actually, it just ends up being emails to cloud providers pointing out the errors in their documentation, where they've got docs that have been written by hand and not generated from source, so you get all these mistakes in field names and things like that. Once you've done the research and you know what the abstraction really needs to look like, then you put together a core driver, and then you have to do implementations for, I think we dictate, four providers, just so that we're sure you haven't coupled your design too tightly to a particular provider, or a particular couple of providers.
So we did this for backup, and we've also added container as a service, which probably sounds a bit strange; I'll explain that in a second. That one was a bit tricky, because it's so new and the providers were so different.
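The process described, designing an abstract base driver from API research and then proving it against several providers, might look roughly like this for the backup example. The method names here are hypothetical illustrations, not the real libcloud backup API.

```python
from abc import ABC, abstractmethod

# Rough sketch of defining a new service abstraction (backup as a service).
# Method names are illustrative, not the real libcloud.backup interface.

class BaseBackupDriver(ABC):
    """Common interface every provider implementation must satisfy."""

    @abstractmethod
    def list_targets(self):
        """Return the resources this provider can back up."""

    @abstractmethod
    def create_backup(self, target):
        """Kick off a backup of the given target; return a job record."""

class SnapshotBackupDriver(BaseBackupDriver):
    # e.g. a provider where "backup" really means taking volume snapshots,
    # the way people use EBS snapshots on Amazon.
    def __init__(self):
        self._snapshots = []

    def list_targets(self):
        return ["vol-1", "vol-2"]

    def create_backup(self, target):
        job = {"target": target, "kind": "snapshot"}
        self._snapshots.append(job)
        return job

driver = SnapshotBackupDriver()
jobs = [driver.create_backup(t) for t in driver.list_targets()]
print(len(jobs), jobs[0]["kind"])
```

Requiring several concrete subclasses of the base driver before merging, as described above, is what keeps the abstract interface from quietly mirroring any single provider's API.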
[00:59:23] Unknown:
I'll bet. And with regards to why providers implement their APIs so differently, it just occurred to me as you were speaking: this is one of those instances where the rubber hits the road. We like to build technology for technology's sake, but ultimately, these things are businesses, right? They're designed to make money, and it's not clear to me that the cloud providers would make any more money by enabling people to not be locked in to a given vendor. If all the APIs were standardized, then you could take your entire stack from one cloud vendor to another, and it's not in their best interest to do that, I wouldn't think.
[01:00:05] Unknown:
No. And somebody, I think it might have been mist.io, tried to put together a standard API, like an API standard document, and it was something like 200 pages. And I think the cloud service providers, if they even looked at it, would have just shrugged it off and thought, we're not doing this. If we implemented it, we'd be the only ones, and everyone would laugh at us and ask why we'd waste the effort, because it would cost them millions of dollars. I work for one of them, and you can't underestimate the overhead of developing this stuff, because it's not just one guy who can hack away in a room. You've gotta make it absolutely rock solid, you've got to roll it out to data centers all over the world, and there are people to train and documentation to update, and it is so expensive.
So, yeah, it just doesn't make any sense. We have had that fun, and I think it's also been good, because we've learned a bit about how badly some of the APIs can be documented. When they provide example responses and example requests in the API documentation, I would say six times out of ten, maybe, they're wrong. We actually pick it up in PRs now: when people contribute tests to the project and they include fixtures, we look at the fixtures and say, this looks like something you got from the documentation website.
You know? Please can you include a real response, because chances are it's probably different. It's usually a case of someone updating the API and not updating the documentation, or the person who writes the documentation has never used the API, or somebody made a typo because they wrote the example JSON response by hand and put I before E but got it the wrong way around or something. It's happened so frequently, it's ridiculous.
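The fixture-driven testing Anthony describes can be sketched in a few lines of plain Python. This is an illustrative sketch, not Libcloud's actual test harness; the parser, fixture shape, and names here are all hypothetical. The point is that the parser is exercised against a response captured from the live API rather than copied from the docs:

```python
import json

# Hypothetical fixture captured from a *real* API response rather than the
# documentation -- the interview's point is that documented examples are
# often wrong.
REAL_FIXTURE = json.dumps({
    "instances": [
        {"id": "i-123", "state": "running"},
        {"id": "i-456", "state": "stopped"},
    ]
})

def parse_node_list(body):
    """Toy parser standing in for a driver's response-parsing method."""
    payload = json.loads(body)
    return [(item["id"], item["state"]) for item in payload["instances"]]

def test_parse_node_list():
    # Exercising the parser against the captured fixture means a provider
    # quietly changing its response shape shows up as a test failure.
    nodes = parse_node_list(REAL_FIXTURE)
    assert nodes == [("i-123", "running"), ("i-456", "stopped")]

test_parse_node_list()
print("fixture test passed")
```

Reviewing contributed fixtures for "looks copied from the docs" is then a human check layered on top of this mechanical one.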
[01:02:10] Unknown:
Right, right. You know, it's funny, I feel like I need to explain myself. You mentioned you work for one of the providers, and I'm just about to. I just signed an offer with, in fact, the big provider that you were talking about not wanting to get vendor lock-in with. So I can definitely see that standpoint. I think it's also true that AWS, I'm just gonna say it, they have services that other cloud vendors haven't caught up with yet. That's what makes it a tricky value proposition when companies are deciding what to go with and how to not get locked in. It's tricky because developers want to use the new services that exist in AWS but nowhere else yet. That's a tension that I think tools like Libcloud can help resolve.
[01:02:58] Unknown:
Yeah. And it's not that I think people shouldn't. And I'm not speaking for the project, by the way. This is just my personal opinion.
[01:03:06] Unknown:
We're all speaking for ourselves. This entire podcast is our personal opinions and does not represent the opinions of any of our employers.
[01:03:16] Unknown:
Excellent. So, yeah, I wouldn't say that you shouldn't use, I don't know, the simple messaging service or Lambda, for example, just because you'd be stuck with it. I think there's some really cool stuff in AWS, and if it saves your company time, if it saves your project time and money, and there's no value in writing that thing from scratch, then of course you should use it. But I do also think people need to think about what their backup plan is. If that service goes away or stops being cost effective, then what's your alternative? That's my main word of caution: make sure that you do have an alternative, because once you've been locked in, it becomes much more costly to change to something else.
[01:04:09] Unknown:
Absolutely. So could you talk a little bit about the cloud oriented network services that Libcloud supports? Like, is it possible to create AWS VPCs, subnets, etcetera, using Libcloud?
[01:04:20] Unknown:
Yeah. This is probably one area where we fall a little bit short still. We don't have a network-as-a-service base driver or any kind of networking abstraction at the moment. The focus on compute was around nodes. When the original design was done, I think Amazon was really the only one that had any kind of software-defined networking stack. OpenStack came along a bit later and introduced some of that functionality, and then they kind of rewrote and redesigned it. So I don't think anyone really had the time to catch up and put together a networking abstraction. Each of the drivers implements the set of base methods, so things like list nodes and create node, and then each driver defines its own additional methods, which are unique to each provider. So VPCs, for example: I know that VPC support is in the AWS driver, but it's not a method that's callable or abstracted in the base class. It's an extension method that's part of the driver instance.
You can call it and you can talk to your VPCs, but not cloud-agnostically, so you're kind of coupling yourself to Amazon. But to be honest, that's inevitable. I don't wanna sit here and say, you can use our driver and basically everything looks the same and you'll never have any kind of lock-in, because realistically each provider still has different ways of doing things. Authentication, for example: every cloud provider has a different authentication mechanism, none of them are consistent, and some of them are horrendously complicated.
It's like, oh, you've gotta generate this key, and then you go on the website and give us the key, and then we give you back a PEM file. And then you've gotta install the PEM file locally, and then you've gotta put this sort of key here, and, you know, you can only connect during a full moon. It's like, why does this have to be so hard? And there are always providers that end up with security vulnerabilities as well, regularly. But, yeah, that's the tricky thing, trying to make this stuff consistent. I think we're kind of fighting a battle to some extent. For the places where it doesn't make sense, like VPCs, you just set it up directly on the driver.
Or if you wanna use anything that we don't provide, then use Boto, or use novaclient if you're on OpenStack. You know, it makes sense.
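The base-methods-plus-extension-methods design Anthony describes can be sketched in plain Python. The class and method names below are hypothetical, not Libcloud's actual API, though by convention Libcloud does prefix provider-specific methods with `ex_`:

```python
class BaseNodeDriver:
    """Abstract interface every provider driver must implement."""

    def list_nodes(self):
        raise NotImplementedError

    def create_node(self, name):
        raise NotImplementedError


class FakeEC2Driver(BaseNodeDriver):
    """Toy driver: base methods are portable, ex_* methods are not."""

    def __init__(self):
        self._nodes = []

    def list_nodes(self):
        return list(self._nodes)

    def create_node(self, name):
        self._nodes.append(name)
        return name

    def ex_list_vpcs(self):
        # Extension method: only meaningful on this provider, so code
        # calling it is coupled to "EC2" rather than cloud-agnostic.
        return ["vpc-0abc"]


driver = FakeEC2Driver()
driver.create_node("web-1")
print(driver.list_nodes())     # portable call, works on any driver
print(driver.ex_list_vpcs())   # provider-specific call, couples you to EC2
```

Code written only against the base methods stays portable; the moment it calls an `ex_` method, it is knowingly tied to that one provider, which is exactly the trade-off described above.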
[01:06:48] Unknown:
So do you know if people use Libcloud for abstracting the APIs of a single cloud provider even if they don't have any intention of using a different platform? Maybe just because the APIs of their provider aren't as intuitive as they'd like, or maybe their provider doesn't actually have a Python library for using it?
[01:07:05] Unknown:
Yeah, definitely. This does come up. There are plenty of users where, like you say, the provider doesn't offer a Python library, or the Python library they created was a port from another language and it was auto-generated code or just really horribly written Python. Or, actually, more common is that they have a Python library but it only works with Python 2.6 or 2.7, and they wanna work with 3. So our package is actually a good way of talking to these different cloud providers in Python 3.
[01:07:38] Unknown:
And do you think that people are more likely to use Libcloud for bridging across multiple public cloud platforms, or do you think it's more commonly used in a hybrid cloud type of environment, maybe for people running a private and public OpenStack, or a private OpenStack and bursting into public AWS, or something like that?
[01:07:58] Unknown:
I think multiple public clouds is the most popular. We're going off a couple of data points for this. There's a survey that's done every year by a company called RightScale, which makes multi-cloud management software. They run a report called the State of the Cloud, and there's a lot of stuff in there about multi-cloud and abstractions, which is really useful for us to get a picture of how many people are using multiple clouds. And multiple public clouds seems to be quite popular. The hybrid idea, in theory, is a really cool use case: people have a private cloud locally, then they burst into the public cloud, and they've automated all of that. In reality, the way they define hybrid is that they've got this legacy vCenter installation and everyone logs in through the vCenter UI, and then they have, like, an AWS or an Azure, and they want to orchestrate and automate the public cloud stuff because they know how to do that.
We do support VMware vSphere, but actually the number of people who use that is quite small, because maintaining it and orchestrating VMware is still a bit of a challenge. Their APIs are pretty complicated, and they were kind of written in a pre-REST era. So
[01:09:12] Unknown:
I also think, and you kind of alluded to this, but it's worth just saying out loud, that hybrid cloud is one of those things where the idea is really attractive. It's like, okay, great, we get fault tolerance. But when the rubber hits the road, actually implementing it is very, very difficult. A lot of people say, yes, let's go do this thing, and then they actually dig into what's going to be required to make it happen, and the reality is not so rosy.
[01:09:39] Unknown:
Definitely. I'm hoping containers make it easier. I know containers are the answer to everything these days (it's like, this doesn't have enough containers, so we'll just put some in there), but I'm hoping it does make it easier, because you focus less on the infrastructure automation and you just focus on the application piece. So I think it'll make some problems
[01:10:00] Unknown:
more tractable. I think it'll help with the sort of how-do-you-define-a-software-image problem that you used to have. Like, you have to build an AMI, and then you have to build an Azure image, and then you have to build a GCE image. Those kinds of things are made easier by containers, because it's like, hey, you build a Docker container and you can deploy it everywhere. But it doesn't make things easier like, as you pointed out earlier, the radically different authentication schemes and the differences around how networking is handled and how various other aspects of security are handled. It's those devilish little details that I think can make this kind of project go from, oh, this should be easy, to, my goodness, maybe not so much. Yeah. And the,
[01:10:46] Unknown:
container ecosystem is still growing. Docker is the big player right now, but there's also the Rocket project from CoreOS that's starting to gain ground, and there's native LXC or BSD jails that have been around for quite a while now. So the choice of which one to use is still something that requires some consideration. There are also things like the security of the images that you're pulling down to layer on top of, and, this is something that's been solved a bit more now than when Docker first came out, but inter-container networking is also something to consider. So you're just trading off for a different set of problems. As with everything, it's trade-offs.
[01:11:29] Unknown:
Yeah. We did actually add container-as-a-service, like a base driver for container-as-a-service. It's still experimental at the moment, so we're just waiting for a bit more feedback, and also for some of the APIs that we're calling to stabilize. The idea with it was that people have these Docker images, these Docker containers that they've constructed, and they want to deploy them. They wanna test them locally, and then they want to either deploy them into a private Kubernetes deployment, or they wanna use one of the public clouds that supports containers.
So Amazon have ECS, their EC2 Container Service, and Google have their hosted version of Kubernetes, GKE it's called. What we do is provide an abstraction over those services. If you go on our Read the Docs site and click on the container piece, it's actually quite an interesting insight into the current state of the container APIs. It's a similar principle to what you get with compute: you can get a container-as-a-service driver, and then you can install images and deploy containers into that. And we have a set of base classes for things like a container image, a running container, and a container cluster.
So, Kubernetes and ECS support this notion of a group of containers, or a cluster of containers, so you can create, destroy, and list clusters and find out what containers are deployed in each cluster. And if you dig around on the site, you'll find some example code. I believe we're the first people to do this, where you can deploy Docker containers into Kubernetes, or you can deploy them into Amazon ECS, for example, and you can do that using the same code. So you can go and say, I wanna deploy these instances, and I wanna go and deploy them over here as well, and you can write that and abstract it across providers. You mentioned CoreOS and their Rocket engine. We looked into adding that, but digging into the REST API, there's a big comment in the API source code that says don't use this. So I sort of thought, yeah, I think I'll stay away from this for a while and wait till it's stabilized.
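The "same code against two container backends" idea can be sketched like this. Plain Python with hypothetical class names, not Libcloud's real container driver API:

```python
class ContainerDriver:
    """Minimal abstract container-as-a-service interface."""

    def deploy_container(self, image):
        raise NotImplementedError


class FakeECSDriver(ContainerDriver):
    """Stub standing in for an Amazon ECS backend."""

    def deploy_container(self, image):
        return f"ecs:{image}"


class FakeKubernetesDriver(ContainerDriver):
    """Stub standing in for a Kubernetes backend."""

    def deploy_container(self, image):
        return f"k8s:{image}"


def deploy_everywhere(drivers, image):
    # The calling code only knows the abstract interface, so adding a new
    # backend means adding a driver class, not changing this loop.
    return [driver.deploy_container(image) for driver in drivers]


ids = deploy_everywhere([FakeECSDriver(), FakeKubernetesDriver()], "nginx:1.9")
print(ids)
```

The value of the abstraction is in `deploy_everywhere`: the deployment logic is written once and reused across every backend that implements the interface.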
[01:13:48] Unknown:
So what is on the roadmap for Libcloud that people should keep an eye out for?
[01:13:52] Unknown:
Yeah. So I mentioned the container driver support. We're looking to stabilize that based on community feedback over the next few months, so there'll be a stable version of that coming out soon. We also had our own HTTP library that we wrote ourselves, I think back in 2009. This was before requests existed, and there were a number of reasons why it was put together, but for lack of a nice HTTP client that worked with both Python 2 and Python 3, we ended up having to write our own. We're replacing that with the requests library, and that work has just been finished, so there's a branch version of that on our GitHub page.
And there was a blog post about that actually published today, explaining what the differences are, the reasoning behind it, and what kind of advantage you get from having requests in play. So if you wanna go and try that out, that'd be great. Check that your code isn't broken, because when I put it together, we had just over 6,000 unit tests in the project, and I broke 4,500 of them in the first tranche of commits. It basically took me a week to fix all of the tests.
[01:15:17] Unknown:
Yeah. That's always fun when you, make the change and have to try and figure out, okay, what is the common thing among all of these broken tests?
[01:15:25] Unknown:
Yeah, it turns out to be about 20 different common things. So that was quite a painful exercise, but it's done now, so I know that all the tests pass. However, I don't know whether the tests are valid, because some of the underlying implementation has changed. So, really, what we need is proper testing.
[01:15:45] Unknown:
So are there any questions that we didn't ask that you think we should have or anything else that you wanna bring up before we move on? No. I think that's a great range of questions. Okay. So for anybody who wants to keep in touch with you and follow what you're up to, what would be the best way for them to do that?
[01:15:59] Unknown:
Yeah. You can follow me on Twitter, I'm anthonypjshaw, or on GitHub as tonybaloney. Baloney like the sausage. Or if you just wanna follow the project on GitHub, it's github.com/apache/libcloud. Oh, and I should also mention, if you're going to the SaltStack conference this month in Salt Lake City, I'll be there, and I'm speaking on one of the days, not just about Salt Cloud, I'm talking about ServiceNow integration. If you're going and you wanna have a chat, just hit me up on Twitter, and I'd love to have a drink and find out what you're doing. Or if you're going to ApacheCon in Vancouver in May, I'm also speaking at that event, so again, just drop me a line.
[01:16:46] Unknown:
So with that, we'll move on into the picks. My first pick today is going to be the Blue Yeti microphone, which I just picked up and am actually using to record this episode. We've had a few comments about our audio quality, so I decided to pick it up, and hopefully it's having a positive impact. For my second and last pick today, I'm going to choose the Diablo Swing Orchestra. This is a group that I just found on Spotify, among a few other things, so definitely worth taking a look at. They're pretty talented and pretty diverse, so a lot of fun, and hopefully you enjoy them. And with that, I'll pass it to you, Chris.
[01:17:32] Unknown:
Thanks, Tobias. My first pick is going to be something that is inglorious but oh so useful if you need it: the Rosewill RK series keycaps. I had the bright idea of buying a Das Keyboard that has no legends on the keycaps, which was mostly fine for me because I touch type, but it led to a couple of really awkward moments when my wife or other friends came over to the house and wanted to use my computer and couldn't, because they can't touch type. This set is really nice and very inexpensive. You can get it on Amazon or from Newegg; I think it's their house brand. So they have keycaps, and it comes with a keycap puller.
It's actually been relatively easy, and kind of therapeutic, to sit here and replace all the keycaps in this keyboard so that other people can use it. And I have to admit that even myself, as a touch typist, occasionally when it's like, okay, type Shift-F10, I end up having to count function keys to hit the right key, and that is just a drag. So this thing has been great. My next pick is going to be a relatively new mobile application called Enki, E-N-K-I, and obviously I'll put links in the show notes. This is kind of interesting. They're sort of in beta at the moment; you have to sign up and ask for an invitation, but I jumped on it early, so I got one. It's really kind of neat. It's basically an application for your phone that they refer to as developer training. You get these little daily doses of training exercises, really quick-hit kind of things, on topics like JavaScript, Git, CSS, Linux, etcetera.
And I found it to be really satisfying, and it really does only take a couple of minutes to do. I consider myself to be a fairly well-versed Git user, and I started out with the Git training session and I've learned a lot. So that's been really great. My next and last pick is going to be a novel that is a classic, and I have no idea how I had managed to not read it yet, being 47 years old: Catch-22. It's one of those things where you don't realize how many things tie into it until you read it, and then a bunch of cultural references come to light. It's a really great book. That's all I have for picks. Anthony, what do you have for us?
[01:20:06] Unknown:
Yeah, I'll vouch for Catch-22, mainly for the reason that whenever people say, oh, this is a catch-22, you can say, no, it's not. It's like the most misused thing ever. I have a few picks for you. On the nontechnical side, I'm a bit of a podcast junkie, and a particular favorite at the moment is an NPR podcast called Hidden Brain. If you can dig this one out, the episodes are about 20 to 30 minutes long. It's basically a social psychology podcast, which kind of sounds a bit bizarre, but they cover all sorts of ideas like risk and fear and ego, and it's just really interesting to hear the scientific explanations for the sort of day-to-day challenges and issues, and what might be affecting you in terms of relationships with other people. I've found it really useful just to understand what the current thinking is in the social sciences, and also to actually apply some of this stuff in my day-to-day life. So that's been quite fun. On the technical side, there's a Python package called pykwalify, K-W-A-L-I-F-Y.
If you're using YAML for, I don't know, configuration of your project, or defining things, or whatever, it's really useful, because YAML is so easy to write and it's human readable, but you've maybe lost some of that checking. So if you wanna do static checking of YAML files, for whatever reason, let's say you've got a config file in your project, and every time somebody edits it and checks it into the build, you might wanna run a check on the build and make sure it's actually valid. You can use pykwalify to define a schema, like a YAML schema, and you can validate your YAML files against the schema. So that's come in really helpful for me doing some testing. And my final pick is maybe slightly unusual. If you're anything like myself, you find that you fill every minute of every day doing something, whether that's checking your phone, looking at Twitter, emails, Facebook, or listening to podcasts, which is never a bad thing. But it does consume a lot of time. My pick is actually doing nothing. And this is quite a skill that you need to practice over a period of time: just trying to find time in the day to not be doing something and not be learning something, to actually try and focus on doing absolutely nothing.
There are a few ways you can do this, things like watching the kettle boil or watching the bath run. It's really soothing, and it gives you some time to reflect on things and some time to think. Another way you can do it is a technique called mindfulness, which is becoming a bit of a hipster thing at the moment, but it does actually have a more serious side. And there's an app you can get called Smiling Mind. It probably goes against my point of not doing anything, but it's like an audio guide that helps you through, you know, meditation techniques and learning to relax. So if you find yourself overeager and fidgety and you're struggling to sleep, then I strongly recommend that.
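The kind of check pykwalify performs on YAML configs can be illustrated with a hand-rolled sketch. This is not pykwalify's actual API (its real schemas are themselves YAML documents with keys like `type`, `required`, and `mapping`); here the schema is plain Python dicts and the data stands in for what `yaml.safe_load()` would return:

```python
# A kwalify-style schema expressed as plain Python dicts: each key of the
# mapping declares its expected type and whether it is required.
SCHEMA = {
    "name":  {"type": str,  "required": True},
    "port":  {"type": int,  "required": True},
    "debug": {"type": bool, "required": False},
}

def validate(data, schema):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    for key, rule in schema.items():
        if key not in data:
            if rule["required"]:
                errors.append(f"missing required key: {key}")
            continue
        if not isinstance(data[key], rule["type"]):
            errors.append(f"{key}: expected {rule['type'].__name__}")
    return errors

# Stand-ins for parsed YAML config files.
good = {"name": "api", "port": 8080}
bad = {"name": "api", "port": "8080"}

print(validate(good, SCHEMA))  # []
print(validate(bad, SCHEMA))   # ['port: expected int']
```

Running a check like this in the build, as described above, turns a malformed config file into a failed build instead of a runtime surprise.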
[01:23:36] Unknown:
Alright. Well, thank you very much for taking time out of your evening to join us and tell us more about the Libcloud project. It's been a lot of fun, and, I learned a few things about it that I never knew before. So appreciate that, and I'm sure our listeners will as well. So thank you. Enjoy the rest of your evening.
Introduction to the Episode and Hosts
Interview with Anthony Shaw
Anthony Shaw's Introduction to Python
Overview of Apache Libcloud
Performance and Design Patterns in Libcloud
Supporting Multiple Cloud Providers
Challenges of Cloud Provider APIs
Python Versions and Compatibility
Instantiating Compute Instances with Libcloud
Parallelization and Callbacks in Libcloud
Benefits of Being an Apache Project
Notable Projects Using Libcloud
Extending Libcloud for New Services
Cloud Oriented Network Services
Use Cases for Libcloud
Containers and Future Roadmap
Closing Remarks and Contact Information