You can find out more about us and view previous episodes at podcastinit.com.
Brief Introduction
- Date of recording – 2015-06-02
- Hosts – Tobias Macey and Chris Patti
- Follow us on – iTunes, Stitcher or TuneIn
- Give us feedback on iTunes, Twitter, email or Disqus
Interview with David Baumgold
- Introduction
- How did you get introduced to Python?
- What problem does Flask-Dance solve that wasn’t covered by other libraries?
- What were some of the technical issues that you encountered while building Flask-Dance?
- What are some of the design considerations that you had when building Flask-Dance?
- You also built webhookdb for replicating GitHub’s information to be queryable. What are some use cases for which you would want to do that?
- What is Open EdX and what is its intended audience?
- What are some of the challenges implementing a system like Open EdX, and what can Python developers learn from the implementation of the project?
Picks
- Tobias
- Chris
- David
Keep in touch
[00:00:14] Unknown:
Hello, and welcome to podcast dot init, the podcast about Python and the people who make it great. We are recording today on June 2, 2015. Your hosts, as usual, are Tobias Macey and Chris Patti. You can follow us on iTunes, Stitcher, or TuneIn Radio. And please give us feedback. You can leave us a comment on iTunes or Stitcher, send us an email at host@podcastinit.com, find us on Twitter, or leave a comment on our show notes. And if you'd like, you can give us a donation to keep the show going. There are links on our site at podcastinit.com. Today, we're interviewing David Baumgold.
David, could you please introduce yourself?
[00:00:57] Unknown:
Hi. My name is David. I go by DB with some people at work. I've been doing Python for probably about 8 or 9 years or so. I'm really into open source, really into community building and talking with other people. And I work for a company called edX as a developer advocate, helping people to understand the software that we build and to make it more awesome together because it's open source.
[00:01:21] Unknown:
David, how did you get introduced to Python?
[00:01:25] Unknown:
Well, I was in college and I started learning about computer science sort of on a whim because I had a friend back in high school who was interested in computers. And I decided I wanted to learn more about it so that I could chat with him more intelligently. At college, they were teaching me Java, which was okay. But I had a different friend suggest that I start learning Python, which was more fun and faster. And so while I was learning Java in my college classes, I started learning Python on the side and discovered that I liked it a whole lot better. And I was able to do things like building websites and stuff. So I started sort of rolling with that, and it's worked out a lot better for me.
[00:02:07] Unknown:
So I understand that you wrote a library called Flask-Dance. Could you describe what it does and what it solves that wasn't covered by some of the other libraries that may be similar to it?
[00:02:18] Unknown:
Sure. So Flask-Dance is a library that works with the Flask web framework. It's designed to make it easier to use OAuth as a web application. And OAuth is the protocol that you use to allow two websites to communicate information with each other about a user with that user's consent. So basically what that means is anytime you see that login with Facebook button or login with Google or login with Twitter or whatever, any of those situations where you're going to a website and asking it to work with one of these big providers like Facebook, they use this protocol called OAuth to allow the user to specify that it's okay to give that information to this random third party website.
And OAuth is a very complicated protocol, I discovered, when I was trying to use it in some websites that I was building. And I realized that I could abstract all of the complexity of OAuth and wrap it all into one little library that I could then make available to all of the Flask apps that I and other people want to build. Because this was something that I wanted to use multiple times, and I didn't want to have to go through all the complexity of OAuth more than once.
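To make the "login with ..." step a little more concrete, here is a minimal sketch of the first leg of an OAuth 2 flow: building the URL that the user's browser is sent to so they can give consent. It uses only the Python standard library. The client ID, redirect URI, and scope below are placeholder assumptions, and this is an illustration of the protocol step rather than how Flask-Dance itself is implemented.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# GitHub's OAuth 2 authorization endpoint.
AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorization_url(client_id, redirect_uri, scope, state):
    """Return the URL the user visits to approve (or deny) access."""
    query = urlencode({
        "client_id": client_id,        # issued when you register your app
        "redirect_uri": redirect_uri,  # where the provider sends the user back
        "scope": scope,                # what information you are asking for
        "state": state,                # random value to tie the response to this request
    })
    return f"{AUTHORIZE_URL}?{query}"


# Placeholder values, for illustration only.
state = secrets.token_urlsafe(16)
url = build_authorization_url(
    client_id="my-client-id",
    redirect_uri="https://example.com/login/github/authorized",
    scope="read:user",
    state=state,
)
print(url)
```

After the user approves, the provider redirects back with a temporary code, which the application exchanges for an access token. That exchange, plus checking the `state` value, is the part that is easy to get subtly wrong by hand.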
[00:03:30] Unknown:
Definitely sounds like a reasonable pursuit having looked at some of the specifications and also having to decide between version 1 or version 2. And okay, which type of implementation do I want to use?
[00:03:43] Unknown:
So you may have kinda partially covered this a little bit in the first question. But what were some of the technical issues that you encountered while building Flask-Dance?
[00:03:53] Unknown:
Well, I had some good resources to work with to start, which made it easier to get around some of the technical issues, because actually Flask-Dance is not the only library out there that does OAuth for the Flask web framework. There are actually two other libraries out there that do this. So there's Flask-OAuth and there's Flask-OAuthlib. So I was able to look at both of these libraries and get some ideas and inspiration from them. The reason why I decided to write my own rather than using one of those two is because I really didn't like the APIs that those libraries exposed. If you've used the requests library for making web requests, the author, Kenneth Reitz, talks about how important it is to have an API that is usable and understandable and human friendly.
And I took a lot of pains to try to make Flask-Dance follow those ideas of being a lot easier to understand and a lot more human friendly, I suppose. So I guess the biggest challenge that I had was trying to wrap my head around how the OAuth protocols worked and how the various different libraries that are underlying Flask-Dance work and how to wrap them all up in such a way that I could still provide all of the power and flexibility that something like OAuthlib provides to you while still making it very easy and straightforward to get started with. I'd say that was the biggest challenge.
[00:05:23] Unknown:
Okay. And what are some of the design considerations that you had while you were building Flask-Dance, more focusing on the internals, in terms of maybe some design patterns that you employed or particular pain points that you were trying to cover up?
[00:05:40] Unknown:
Well, the pain points that I saw in most other implementations of OAuth is that they expect you to write a couple of very specific views for your web application and to call very specific methods in very specific ways. And it's very easy to get all of those fiddly bits a little bit wrong or a lot wrong. So I guess I primarily tried to use the principle of encapsulation to abstract all that information out. Flask has this concept of something called a blueprint, which is basically a set of views that you can attach to your web application that are independent of everything else going on in your web application.
So I wrote up Flask-Dance to be a blueprint. The blueprint exposes two different views, or three in the case of OAuth 1, I believe, that allow you to handle the original authentication and the authorization views that are required by the OAuth protocol. And you don't have to worry about actually coding up those views. All you have to do is configure this blueprint to work with the provider that you want, which is to say, Facebook or Google or whoever. And give it the application ID and application secret that you get by registering your app on Facebook or Google or whatever you want, and then attach the blueprint to your application.
So it sort of abstracts all of that away from you.
[00:07:11] Unknown:
You also built webhookdb for replicating GitHub's information to be queryable. What are some use cases for which you'd wanna do that?
[00:07:20] Unknown:
So that was a case where I actually started off doing this for a project for my company. As I said, the software that edX makes is open source, and it's all hosted on GitHub. And I work on a team that works very closely with our open source community and tries to make sure that they're as happy and as productive as possible. So one of the things that we wanted to do at my job was to get an idea of how many pull requests were coming in over time and to see how long it took for them to be merged so that we could try to work towards getting them merged more quickly and more reliably. And we started using the GitHub APIs to try to determine that information, but quickly ran up against the problem that the APIs have a rate limit.
You can make at most 5,000 requests per hour, which sounds like plenty. But our main repository alone at this point has over 8,000 pull requests historically. And we want to be able to run reports that get information about all pull requests since we open sourced the project to be able to generate information like graphs of number of pull requests over time since the beginning. And so we hit that limit pretty quickly and discovered that the only way to work around that was to just do the first 5,000 and then wait an hour. And then the next 5,000 and wait an hour and so on, which aside from being very slow and unsatisfying also meant that the data changed in the course of trying to collect all of it, especially for the more recent pull requests.
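As a rough back-of-the-envelope sketch of the waiting described above: the 5,000 requests per hour limit and the 8,000 pull requests come from the interview, while the number of API requests needed per pull request is a made-up assumption for illustration.

```python
import math


def hours_of_waiting(total_requests, limit_per_hour=5000):
    """Full hours spent waiting if each batch of `limit_per_hour` requests
    is followed by an hour's pause before the next batch can start."""
    if total_requests <= 0:
        return 0
    batches = math.ceil(total_requests / limit_per_hour)
    return batches - 1  # no wait is needed after the final batch


pull_requests = 8000
requests_per_pull_request = 3  # hypothetical: e.g. details, comments, commits

one_request_each = hours_of_waiting(pull_requests)
several_requests_each = hours_of_waiting(pull_requests * requests_per_pull_request)
print(one_request_each, several_requests_each)
```

Even with a single request per pull request the report cannot finish inside one rate-limit window, and the wait grows with every extra request needed per pull request.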
An hour during the course of the workday can make for a lot of changes in our repository. So I thought about the problem from a different angle and realized that basically what we were trying to do was run reports like you might do in a database sort of fashion. And I also realized that getting information over the GitHub API could almost be looked at as making queries to a remote database. So I was wondering if there's a way that we could replicate that database to one that we could control so that we could just run normal SQL queries over that database and generate reports that way. And fortunately, GitHub has this concept of a webhook, which basically says, something can ask GitHub to notify a specific HTTP URL anytime an event happens.
So what webhookdb is, is it's basically just a very thin web application layer over a Postgres database. And it exposes a couple of HTTP API endpoints. And you can set up GitHub to say, every time an event happens on this repository that I care about, just send that information over to my webhookdb server. And all the server does is it takes that information and stores it into the local database. So basically what you've got set up is database replication over HTTP, which I thought was a kind of cool concept. And then once you've got that, you have a local database running that you have access to in order to run any sort of arbitrary SQL queries that you want, which is being constantly kept up to date dynamically by this HTTP replication.
And you can run any sort of queries that you want rather than being restricted to just the API endpoints that GitHub exposes.
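The "replication over HTTP" idea can be sketched in a few lines. To stay self-contained this uses an in-memory SQLite database rather than Postgres and leaves out the web layer entirely; the table layout and function names are hypothetical and are not taken from webhookdb. The payload fields used (`pull_request`, `id`, `number`, `state`, `created_at`, `merged_at`) follow the shape of GitHub's pull request webhook event.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical, minimal schema: one row per pull request.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pull_request (
               id INTEGER PRIMARY KEY,
               number INTEGER,
               state TEXT,
               created_at TEXT,
               merged_at TEXT
           )"""
    )


def store_pull_request_event(conn, body):
    """Store the pull request carried by a webhook body, replacing any older copy."""
    pr = json.loads(body)["pull_request"]
    conn.execute(
        "INSERT OR REPLACE INTO pull_request "
        "(id, number, state, created_at, merged_at) VALUES (?, ?, ?, ?, ?)",
        (pr["id"], pr["number"], pr["state"], pr["created_at"], pr.get("merged_at")),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)

# Two events for the same pull request: opened, then merged.
opened = json.dumps({"action": "opened", "pull_request": {
    "id": 1, "number": 42, "state": "open",
    "created_at": "2015-06-01T09:00:00Z", "merged_at": None}})
closed = json.dumps({"action": "closed", "pull_request": {
    "id": 1, "number": 42, "state": "closed",
    "created_at": "2015-06-01T09:00:00Z", "merged_at": "2015-06-02T10:30:00Z"}})
store_pull_request_event(conn, opened)
store_pull_request_event(conn, closed)

# Once the data is local, reports are ordinary SQL.
total = conn.execute("SELECT COUNT(*) FROM pull_request").fetchone()[0]
merged = conn.execute(
    "SELECT COUNT(*) FROM pull_request WHERE merged_at IS NOT NULL"
).fetchone()[0]
print(total, merged)
```

In a real deployment the `store_pull_request_event` call would sit behind an HTTP endpoint that GitHub posts to, which is the thin web layer described in the interview.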
[00:10:52] Unknown:
That sounds very cool. And it definitely sounds like it gives a lot of capability for doing some more in-depth data analysis than you might be able to do just by using the APIs directly. Have you thought about adding in some sort of a dashboard capability to webhookdb so that you can have some pregenerated reports that are easy to access for people who are just getting started using it?
[00:11:10] Unknown:
That would definitely be really cool. Even just to have that graph that we were originally interested in, something like pull requests over time. It's a project that I've sort of been building in fits and starts, trying to add a feature at a time as it becomes useful to me, as it's something that I think that we'll need. And so far, I haven't had to add that capability of an actual meaningful user interface because we can just get the data that we need by running SQL queries. But it definitely would be nice to do that. Another thing that I'd really like to be able to do is I'd love to be able to use some of the permissioning model built into Postgres so that you could have a server set up where you log in to the webhookdb server using your GitHub credentials, again, using Flask-Dance to handle that OAuth stuff.
It would be really nice if a webhookdb server could then create a new account for you on the Postgres database and set up permissions on that Postgres account so that you could see only your own repositories. And then just give you the username and password to that Postgres so that you could log in using the Postgres client and run your own queries that way as well. It's just a question of coding that up in such a way that it's secure and people don't see each other's private repositories.
[00:12:33] Unknown:
Yeah. Security and authorization and role access are always tricky things to have to deal with particularly when you have to try to conceptualize it from the ground up without having any sort of a framework already in place. Although, to some extent, GitHub gives you at least the roles already, so might simplify that a little bit.
[00:12:55] Unknown:
What is Open edX, and what is its intended audience?
[00:12:59] Unknown:
Okay. So Open edX is this open source software platform that I was talking about earlier. It is a MOOC, a Massive Open Online Course Provider. And basically what that means is if you've ever gone to edx.org or Coursera or Udacity, they're big platforms where you can sign up for university courses, and you can learn about just about any topic that you want. You can basically get a university education. And most of the time, those courses are available for free, which is awesome. So Open edX is the software platform that underlies edx.org.
It's actually exactly the same software that my company uses to run edx.org. And that means that anybody around the world can use the software, stand up their own Open edX server and can offer their own courses and can provide them to other people. Now I should also mention that there's a difference between the platform and the courses that are offered on top of that platform. So Open edX allows you to create your own courses, but it doesn't have the course content available that you would see on edx.org because those are owned by the universities that created those courses. But basically, it means that you can create your own courses, which are just as high quality or more so than the courses that you might find on edx.org.
And you can share those courses with other people as well so that you can collaboratively build interesting information and teach people new things. It's a really exciting project, and we're sort of taking a gamble by making it open because we're hoping that people are going to use the
[00:14:56] Unknown:
Yeah. I've definitely been seeing big growth in open source as a business model where even historically a number of companies have used open source and contributed back to open source, but they haven't necessarily opened up their entire platform. But I've seen and read about a few companies who have been doing that lately and have been seeing very good results as an effect of that because people are much more willing to engage with them as a company because they have a better idea of what they're actually interacting with. And also I was reading some of the tweet stream from, I think it's EmTech, the MIT sort of conference that's going on right now, where one of their presenters was talking about how particularly going into the future, your average citizen is going to need to have some amount of data literacy.
And as a result of that, they're going to need and desire to have a more transparent view of the data that is present in the platforms that they interact with and how that data is being used and what's being done with it. So having edX doing that with online courses definitely seems very beneficial both now and into the future as people become more aware of what's actually being done with the information that they provide to companies.
[00:16:18] Unknown:
Absolutely. And by the way, I should mention since it's topical for this podcast, edX is written in Python. So anybody who's interested in Python can check out the source code and hack away using that language.
[00:16:33] Unknown:
Definitely. And I've looked at the source code a little bit, and it looks like it's a fairly sizable project. So what are some of the complications or challenges that you've come across in using and building a large code base based entirely off of Python?
[00:16:49] Unknown:
There's been a fair amount of challenges to be honest. When edX started the software project, initially, it was closed source for the first year or so of development. And it was in the sort of startup mode of we need to move very, very quickly because we're struggling to survive. And so a lot of the code in the Open edX software project initially came out of sort of this frenzied burst of we just need to get something that will work. So as a result, some of the code is kind of difficult to follow. And we've accumulated some tech debt from that.
But we're working really hard to pay that down and to take the parts that are more difficult to follow and more difficult to run safely and performantly, and to rewrite them and make them work better. But because the software project is so large, as you said, another big problem that we've run into is trying to have many different software developers work on this big project and have consistency across the entire project as well. So for example, there's a big focus that we have nowadays on creating APIs so that other people and other software projects can utilize those APIs and integrate with an Open edX installation, whether it's edx.org or any other one around the world.
And lots of people were trying to build these new APIs within the company, and we discovered that there was a fair amount of duplication. So for example, an API to get information about a course running on Open edX. I think there ended up being six different APIs that could get that information by the time we discovered that this duplication was happening. So it's been a challenge to try to identify everything that's going on, to try to unite all the teams and say, okay, we need this information, but you need that information. Let's find a way to build another piece of the software that both of us can build on top of in order to provide that information in a way that's maintainable and workable for everyone.
[00:19:09] Unknown:
And as you mentioned, Open edX is available for anybody to install and run and build their own courses on top of. So are you aware of any large high profile universities or organizations using that or any particularly interesting uses of that platform?
[00:19:27] Unknown:
Well, there are plenty of big universities using it. And the first one that comes to mind actually is Stanford University. Stanford was the primary impetus for us going open source in the first place. We were always intending to open the code up. But Stanford basically said, we'd love to use your code, but only if you actually open it up and don't give it to us privately. And if you don't, then it's not gonna happen. So Stanford was the first user of the Open edX codebase. And they're still running tons and tons of classes on their website and contributing to the codebase, helping out with BME and such. There's also a number of international partners.
So I actually just came back from a massive hackathon that was held in France, actually, focused entirely on Open edX. Apparently, the French government is using Open edX in various different capacities for their national education system. And there are a number of organizations in France that are also using it in other ways, commercially or non-commercially. And this hackathon was, as I said, a national thing, uniting 9 different cities in France simultaneously working on the software and ending up with a big competition to judge what was the most interesting and valuable project to come out of the hackathon.
So we have local usage in the US. We have international usage. And if you wanna see more of the places where Open edX is being used, there's actually a list of sites on our GitHub, on one of the wiki pages, of places where people have self-reported that they're using Open edX. And the list is quite long. I think we're up to somewhere around 100 or so sites out there that we know about. There's probably a lot more. We have the picks. So, Tobias, why don't you kick it off?
[00:21:29] Unknown:
Sure. So for my first pick today, I'm going to choose Evil mode, which is an Emacs plugin that gives you some really phenomenal Vim emulation. So in my course of choosing and using different text editors and IDEs, I went from Sublime Text to Vim and have now settled on Emacs, but I was never quite happy with the movement keys in Emacs. And so I was very happy when I came across Evil mode because I really liked the ergonomics of Vim, but really liked the interactivity of Emacs. So with Evil mode and Emacs, you get the best of both worlds. For my second pick, I'm going to choose Forgotify, which is a project that somebody built where they use the Spotify APIs to retrieve a list of every song on Spotify that has never been played even once by anyone.
And you can go to the website and it will serve up a random song from that selection, and then you can click play and it will play in Spotify. So in an interesting way, it's sort of a self-destructing project because as more people become aware of it and listen to all of these songs, the pool of songs that have never been listened to shrinks. So it's a very interesting use of data analysis and sort of an interactive exploration of data. For my next pick, I'm going to choose The Wolf of Wall Street, which is a very bizarre movie, so much so that it can only be based on a true story, which it is. It's the story of a man named Jordan Belfort in the, you know, early to late eighties and I believe early nineties as well, and his entry into Wall Street and making a whole lot of money in that area. So I just can't even begin to do the movie justice by description. So it's worth at least watching the trailer if you don't watch the whole movie. It stars Leonardo DiCaprio, who does a wonderful job in the role, and just watch it if you're so inclined.
For my last pick, I'm going to choose pipreqs, which is a Python project that will analyze all the import statements in your Python code and auto-generate a requirements.txt for use with pip. So that if you either haven't generated a pip requirements text file yet or you're trying to clean up the one that you've got, it's a good starting point for that. So, Chris, take it away.
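To give a feel for the idea behind a tool like pipreqs, here is a small sketch that collects the top-level module names imported by a piece of Python source using the standard library's `ast` module. It is an illustration of the general technique, not pipreqs's actual implementation: a real tool also has to skip standard library modules and map import names to PyPI package names, which this sketch does not attempt. The function name and the sample source are made up for the example.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" counts as the top-level module "os".
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import helpers".
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


example = (
    "import os.path\n"
    "import requests\n"
    "from flask import Flask\n"
    "from . import helpers\n"
)
found = top_level_imports(example)
print(found)
```

Parsing the code instead of importing it means the scan works even when the project's dependencies are not installed.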
[00:24:12] Unknown:
So my first pick is a beer, and it's a beer with a rather unusual name. It is a beer called Smells Like a Safety Meeting from Dark Horse Brewing. And the name is even funnier when you find out, as I did, that the beer's original name was Smells Like Weed, but, apparently, various government regulators had kind of an issue with that, so they had to change the name. It's an imperial IPA, and it's very tasty. Because it's an imperial IPA, it doesn't have those really, really sort of intense, almost acrid hops notes. It's got some fruity character. I really like it.
So my next pick is going to be Medium. So, you know, obviously, there are a bunch of magazine sites around these days, Slate and the like. What I really like about Medium is that they report on a hugely wide variety of topics. Some of them kinda mainstream and current. Some of them really, really not, like dredging up things from history or going through various interesting discoveries from science and how they might relate or, you know, they'll have celebrity writers. You know, I use Pocket to capture my articles that I then read on the subway or whatever the case may be. And Medium is really making it hard to not let that bloat out of control because there's just a constant stream of good articles that they keep writing that I find really interesting. So my last pick is modern GNU Emacs. Now I realize it's kinda silly picking Emacs. If you're a geek, you know all about it. You've heard about it. You've probably had some oldster like me rant and roar about it. But the key thing is the experience of using and just getting set up with and learning Emacs is so different from when I last touched it, like, 15 years ago.
I reached a point where I was getting kinda frustrated with Vim in that I was trying to set up, you know, some really good text-based completion for languages like Python and Ruby for my Chef work and things like that. And some of the extensions that I was using for that were conflicting in odd ways, causing performance degradation, and just generally giving me a bad time. And I realized that all these things are kind of much more natively available in Emacs because its extension system is Elisp and has been around forever and is very mature and hardy and all that.
So it has been just a great experience, and the package system makes getting and installing extensions a total breeze, which it never was before. It used to be kinda like pulling teeth. You had to download the package yourself, configure a large chunk of Elisp in your .emacs, and pray. And now it's pretty much like, I want this thing, install it, maybe one or two lines to activate it, and away you go. So it's been great. And what really surprised me, I thought I was gonna end up using Tobias' pick, Evil, but apparently Emacs is like riding a bike. Despite the fact that it's been, like I say, over a decade since I last used it, all the key bindings have just come right back to me. So I'm really loving it.
David, why don't you give us your picks for this week?
[00:27:36] Unknown:
Sure. So the first thing that comes to mind, especially since you were just talking about package managers and Emacs right now, is Homebrew, which is a package manager for the Mac. Love it. Which allows you to easily install open source software from all around the world, various repositories, just by doing brew install name of package here. And it also has an extension called Homebrew Cask, which is optimized to work with the sort of binaries that you download from the Internet, where you open up the DMG file and drag the application to your applications directory. Homebrew Cask takes care of all that stuff for you as well. So you can install and update applications on your computer in a standard package manager workflow. So that's pretty cool.
The second pick that I have is actually a combo. There's Arrow, which is a Python date implementation, and Moment.js, which is the same sort of thing in JavaScript. And both of these libraries take the standard sort of date-based arithmetic, parsing, displaying information that you need to do all the time in programming, and just make those APIs way more simple, intuitive, friendly, usable, particularly for the JavaScript implementation because JavaScript's date API is sort of painful to use at times. But even in Python, allowing you to do things like easily parsing dates, allowing you to easily humanize them and output them in different formats and different languages. It's just really nice and really smooth, and I highly recommend it.
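For a quick taste of the parsing, arithmetic, formatting, and humanizing mentioned above, here is a short sketch using Arrow. It assumes the `arrow` package is installed and uses the method names from current releases (`shift`, `format`, `humanize`); the dates are just example values.

```python
import arrow

# Parse an ISO 8601 timestamp into a timezone-aware Arrow object.
recorded = arrow.get("2015-06-02T12:00:00+00:00")

# Date arithmetic: one week after the recording.
next_week = recorded.shift(weeks=+1)

# Formatting with readable tokens instead of strftime codes.
label = recorded.format("YYYY-MM-DD HH:mm")

# A human-friendly description of `recorded` relative to `next_week`.
relative = recorded.humanize(next_week)

print(label)
print(next_week.format("YYYY-MM-DD"))
print(relative)
```

The exact wording of the humanized string can differ between Arrow versions, so it is best treated as display text rather than something to parse.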
And I guess the third one that I'll say is a movie called The Imitation Game, which I just saw very recently. It's a movie about Alan Turing, and it's set during, I believe it was World War 2. And it was showing how Turing managed to crack the Enigma code that was used by the Germans for communication. But on another level, the movie is also about trying to communicate with other people, especially for a lot of people who are into software and find that it's difficult to really communicate with other people out there. Alan Turing had a lot of issues relating to other people. And this movie was a fascinating insight into sort of getting around that psyche and learning how to relate to other people, the pros and cons thereof.
[00:30:09] Unknown:
That's great. I'm a huge fan of The Imitation Game. I think Benedict Cumberbatch was amazing in that role, and I loved it. I can't recommend it to more people. My family and friends are rather sick of me gushing about it, so very cool. And Homebrew is one of those things. Homebrew and Homebrew Cask both are two things that are, in my opinion, fantastic examples of how the Mac, you know, despite the fact that there are parts of it that are closed source, just ends up creating this amazing, extensible technical platform for techie people like us to work with. It's such a joy to use. And I know there are package managers for other platforms, but Homebrew is just great. I really love it to bits. Absolutely.
So how can our listeners best keep in touch with you and follow your ongoing development and other things?
[00:31:04] Unknown:
Well, my handle on the Internet is singingwolfboy. So that's my handle on Twitter, on GitHub. You can also check out my website at davidbaumgold.com, which hasn't been updated in a while, but it's out there. And you're welcome to email me as well. It's just david@davidbaumgold.com. I'd love to hear from you.
[00:31:28] Unknown:
You know, I really quickly just wanna kind of address one thing Tobias mentioned in the beginning. Please do give us your feedback. We really do appreciate it. Two people have been kind enough to review us on iTunes, and I have to say reading those reviews just made me happy. So thank you to those two listeners and anybody else who enjoys what they're hearing. Please drop us a line or give us a review. We do really appreciate it.
Hello, and welcome to podcast dot init. The podcast about Python and the people who make it great. We are recording today on June 2, 2015. Your hosts, as usual, are Tobias Macy and Chris Patti. You can follow us on iTunes, Stitcher, or TuneIn Radio. And please give us feedback. You can leave us a comment on iTunes or Stitcher. Send us an email at host@podcastinit.com. Find us on Twitter or leave a comment on our show notes. And if you'd like, you can give us a donation to keep the show going. There are links on our site at podcastinit.com. Today, we're interviewing David Baumgold.
David, could you please introduce yourself? Hi.
[00:00:57] Unknown:
My name is David. I go by DB with some people at work. Been doing Python for probably about 8 or 9 years or so. I'm really into open source, really into community building and talk with other people. And I work for a company called edX as a developer advocate, helping people to understand the software that we build and to make it more awesome together because it's open source.
[00:01:21] Unknown:
David, how did you get introduced to Python?
[00:01:25] Unknown:
Well, I was in college and I started learning about computer science sort of on a whim because I had a friend back in high school who was interested in computers. And I decided I wanted to learn more about it, but I could chat with him more intelligently. At college, they were teaching me Java, which was okay. But I had a different friend suggest that I start learning Python, which was more fun and faster. And so while I was learning Java in my college classes, I started learning Python on the side and discovered that I liked it a whole lot better. And I was able to do things like building websites and stuff. So I started sort of rolling with that, and it's worked out a lot better for me.
[00:02:07] Unknown:
So I understand that you wrote a library called FlaskDance. Could you describe what it does and what it solves that wasn't covered by some of the other libraries that may be similar to it?
[00:02:18] Unknown:
Sure. So Flask-Dance is a library that works with the Flask web framework. It's designed to make it easier to use OAuth from a web application. And OAuth is the protocol that you use to allow two websites to communicate information with each other about a user with that user's consent. So basically what that means is anytime you see that login with Facebook button or login with Google or login with Twitter or whatever, any of those situations where you're going to a website and asking it to work with one of these big providers like Facebook, they use this protocol called OAuth to allow the user to specify that it's okay to give that information to this random third party website.
And OAuth is a very complicated protocol, I discovered, when I was trying to use it in some websites that I was building. And I realized that I could abstract all of the complexity of OAuth and wrap it all into one little library that I could then make available to all of the Flask apps that I and other people want to build. Because this was something that I wanted to use multiple times, and I didn't want to have to go through all the complexity of OAuth more than once.
[00:03:30] Unknown:
Definitely sounds like a reasonable pursuit having looked at some of the specifications and also having to decide between version 1 or version 2. And okay, which type of implementation do I want to use?
[00:03:43] Unknown:
So you may have kinda partially covered this a little bit in the first question. But what were some of the technical issues that you encountered while building Flask-Dance?
[00:03:53] Unknown:
Well, I had some good resources to work with to start, which made it easier to get around some of the technical issues, because actually Flask-Dance is not the only library out there that does OAuth for the Flask web framework. There are actually two other libraries out there that do this. So there's Flask-OAuth and there's Flask-OAuthlib. So I was able to look at both of these libraries and get some ideas and inspiration from them. The reason why I decided to write my own rather than using one of those two is because I really didn't like the APIs that those libraries exposed. If you've used the Requests library for making web requests, the author, Kenneth Reitz, talks about how important it is to have an API that is usable and understandable and human friendly.
And I took a lot of pains to try to make Flask-Dance follow those ideas of being a lot easier to understand and a lot more human friendly, I suppose. So I guess the biggest challenge that I had was trying to wrap my head around how the OAuth protocols worked and how the various different libraries that are underlying Flask-Dance work and how to wrap them all up in such a way that I could still provide all of the power and flexibility that something like OAuthlib provides to you while still making it very easy and straightforward to get started with. I'd say that was the biggest challenge.
[00:05:23] Unknown:
Okay. And what are some of the design considerations that you had while you were building Flask-Dance, more focusing on the internals in terms of maybe some design patterns that you employed or particular pain points that you were trying to cover up?
[00:05:40] Unknown:
Well, the pain point that I saw in most other implementations of OAuth is that they expect you to write a couple of very specific views for your web application and to call very specific methods in very specific ways. And it's very easy to get all of those fiddly bits a little bit wrong or a lot wrong. So I guess I primarily tried to use the principle of encapsulation to abstract all that information out. Flask has this concept of something called a blueprint, which is basically a set of views that you can attach to your web application that are independent of everything else going on in your web application.
So I wrote up Flask-Dance to be a blueprint. The blueprint exposes two different views, or three in the case of OAuth 1, I believe, that allow you to handle the original authentication and the authorization views that are required by the OAuth protocol. And you don't have to worry about actually coding up those views. All you have to do is configure this blueprint to work with the provider that you want, which is to say, Facebook or Google or whoever, give it the application ID and application secret that you get by registering your app on Facebook or Google or whatever you want, and then attach the blueprint to your application.
So it sort of abstracts all of that away from you.
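To make the "two views" concrete, here is a minimal standard-library sketch of what an OAuth 2 authorization-code flow asks of a web application. It is not Flask-Dance's code: the provider URL, client ID, redirect URI, and both function names are hypothetical, and a plain dict stands in for the user's session.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical provider endpoint and app registration, for illustration only.
AUTHORIZE_URL = "https://provider.example.com/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://myapp.example.com/login/provider/authorized"


def build_login_redirect(session):
    """First view: send the user to the provider, remembering a random state."""
    state = secrets.token_urlsafe(16)
    session["oauth_state"] = state
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def handle_authorized_callback(session, callback_url):
    """Second view: verify the state, then pull out the authorization code."""
    query = parse_qs(urlparse(callback_url).query)
    expected = session.pop("oauth_state", None)
    received = query.get("state", [None])[0]
    if expected is None or received is None:
        raise ValueError("missing OAuth state")
    if not secrets.compare_digest(expected.encode(), received.encode()):
        raise ValueError("state mismatch; possible CSRF")
    if "code" not in query:
        raise ValueError("provider did not return an authorization code")
    # A real app would now exchange this code (plus the client secret)
    # for an access token by calling the provider's token endpoint.
    return query["code"][0]
```

With Flask-Dance, both of these steps live inside the blueprint, so an application typically only creates one (for example with `make_github_blueprint(client_id=..., client_secret=...)` from `flask_dance.contrib.github`) and attaches it with `app.register_blueprint`.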
[00:07:11] Unknown:
You also built webhookdb for replicating GitHub's information to be queryable. What are some use cases for which you'd wanna do that?
[00:07:20] Unknown:
So that was a case where I started off doing this for a project for my company. As I said, the software that edX makes is open source, and it's all hosted on GitHub. And I work on a team that works very closely with our open source community and tries to make sure that they're as happy and as productive as possible. So one of the things that we wanted to do at my job was to get an idea of how many pull requests were coming in over time and to see how long it took for them to be merged so that we could try to work towards getting them merged more quickly and more reliably. And we started using the GitHub APIs to try to determine that information, but quickly ran up against the problem that the APIs have a rate limit.
You can make at most 5,000 requests per hour, which sounds like plenty. But our main repository alone at this point has over 8,000 pull requests historically. And we want to be able to run reports that get information about all pull requests since we open sourced the project to be able to generate information like graphs of number of pull requests over time since the beginning. And so we hit that limit pretty quickly and discovered that the only way to work around that was to just do the first 5,000 and then wait an hour, and then the next 5,000 and wait an hour and so on, which aside from being very slow and unsatisfying also meant that the data changed in the course of trying to collect all of it, especially for the more recent pull requests.
An hour during the course of the workday can make for a lot of changes in our repository. So I thought about the problem from a different angle and realized that basically what we were trying to do was run reports like you might do in a database sort of fashion. And I also realized that getting information over the GitHub API could almost be looked at as making queries to a remote database. So I was wondering if there's a way that we could replicate that database to one that we could control so that we could just run normal SQL queries over that database and generate reports that way. And fortunately, GitHub has this concept of a webhook, which basically says, something can ask GitHub to notify a specific HTTP URL anytime an event happens.
So what webhookdb is, is it's basically just a very thin web application layer over a Postgres database. And it exposes a couple of HTTP API endpoints. And you can set up GitHub to say, every time an event happens on this repository that I care about, just send that information over to my webhookdb server. And all the server does is it takes that information and stores it into the local database. So basically what you've got set up is database replication over HTTP, which I thought was a kind of cool concept. And then once you've got that, you have a local database running that you have access to in order to run any sort of arbitrary SQL queries that you want, which is being constantly kept up to date dynamically by this HTTP replication.
And you can run any sort of queries that you want rather than being restricted to just the API endpoints that GitHub exposes.
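The replication idea can be sketched in a few lines. This is an illustration under stated assumptions, not webhookdb's actual code: an in-memory SQLite database stands in for Postgres, the table layout and function names are made up, and only a handful of fields from GitHub's `pull_request` webhook payload are kept.

```python
import json
import sqlite3

# In-memory SQLite stands in for the Postgres database described above.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE pull_request (
        id INTEGER PRIMARY KEY,
        number INTEGER,
        state TEXT,
        created_at TEXT,
        merged_at TEXT
    )
    """
)


def handle_webhook(body):
    """Store (or update) the pull request carried by one webhook delivery."""
    pr = json.loads(body)["pull_request"]
    db.execute(
        "INSERT OR REPLACE INTO pull_request VALUES (?, ?, ?, ?, ?)",
        (pr["id"], pr["number"], pr["state"], pr["created_at"], pr["merged_at"]),
    )
    db.commit()


def average_days_to_merge():
    """Report query: mean days from creation to merge, over merged PRs."""
    row = db.execute(
        "SELECT AVG(julianday(merged_at) - julianday(created_at)) "
        "FROM pull_request WHERE merged_at IS NOT NULL"
    ).fetchone()
    return row[0]


# Two deliveries for the same (made-up) pull request: opened, then merged.
opened = {"action": "opened", "pull_request": {
    "id": 1, "number": 42, "state": "open",
    "created_at": "2015-06-01T12:00:00Z", "merged_at": None}}
merged = {"action": "closed", "pull_request": {
    "id": 1, "number": 42, "state": "closed",
    "created_at": "2015-06-01T12:00:00Z", "merged_at": "2015-06-03T12:00:00Z"}}
handle_webhook(json.dumps(opened))
handle_webhook(json.dumps(merged))
print(average_days_to_merge())
```

Because the second delivery overwrites the first row, the local table always reflects the latest state GitHub has sent, and the report is an ordinary SQL query with no API rate limit involved.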
[00:10:52] Unknown:
That sounds very cool. And it definitely sounds like it gives a lot of capability for doing some more in-depth data analysis than you might be able to do just by using the APIs directly. Have you thought about adding in some sort of a dashboard capability to webhookdb so that you can have some pregenerated reports that are easy to access for people who are just getting started using it?
[00:11:10] Unknown:
That would definitely be really cool, even just to have that graph that we were originally interested in, something like pull requests over time. It's a project that I've sort of been building in fits and starts, trying to add a feature at a time as it becomes useful to me, as it's something that I think that we'll need. And so far, I haven't had to add that capability of an actual meaningful user interface because we can just get the data that we need by running SQL queries. But it definitely would be nice to do that. Another thing that I'd really like to be able to do is use some of the permissioning model built into Postgres, so that you could have a server set up where you log in to the webhookdb server using your GitHub credentials, again using Flask-Dance to handle that OAuth stuff.
It would be really nice if a webhookdb server could then create a new account for you on the Postgres database and set up permissions on that Postgres account so that you could see only your own repositories, and then just give you the username and password to that Postgres account so that you could log in using the Postgres client and run your own queries that way as well. It's just a question of coding that up in such a way that it's secure and people don't see each other's private repositories.
[00:12:33] Unknown:
Yeah. Security and authorization and role access are always tricky things to have to deal with, particularly when you have to try to conceptualize it from the ground up without having any sort of a framework already in place. Although, to some extent, GitHub gives you at least the roles already, so that might simplify it a little bit.
[00:12:55] Unknown:
What is Open edX, and what is its intended audience?
[00:12:59] Unknown:
Okay. So Open edX is this open source software platform that I was talking about earlier. It is a MOOC, a Massive Open Online Course Provider. And basically what that means is if you've ever gone to edx.org or Coursera or Udacity, they're big platforms where you can sign up for university courses, and you can learn about just about any topic that you want. You can basically get a university education. And most of the time, those courses are available for free, which is awesome. So Open edX is the software platform that underlies edx.org.
It's actually exactly the same software that my company uses to run edx.org. And that means that anybody around the world can use the software, stand up their own Open edX server, offer their own courses, and provide them to other people. Now I should also mention that there's a difference between the platform and the courses that are offered on top of that platform. So Open edX allows you to create your own courses, but it doesn't have the course content available that you would see on edx.org because those are owned by the universities that created those courses. But basically, it means that you can create your own courses, which are just as high quality or more so than the courses that you might find on edx.org.
And you can share those courses with other people as well so that you can collaboratively build interesting information and teach people new things. It's a really exciting project, and we're sort of taking a gamble by making it open because we're hoping that people are going to use the
[00:14:56] Unknown:
Yeah. I've definitely been seeing big growth in open source as a business model where even historically a number of companies have used open source and contributed back to open source, but they haven't necessarily opened up their entire platform. But I've seen and read about a few companies who have been doing that lately and have been seeing very good results as an effect of that because people are much more willing to engage with them as a company because they have a better idea of what they're actually interacting with. And also I was reading some of the tweet stream from, I think it's EmTech, the MIT sort of conference that's going on right now, where one of their presenters was talking about how, particularly going into the future, your average citizen is going to need to have some amount of data literacy.
And as a result of that, they're going to need and desire to have a more transparent view of the data that is present in the platforms that they interact with and how that data is being used and what's being done with it. So having edX doing that with online courses definitely seems very beneficial both now and into the future as people become more aware of what's actually being done with the information that they provide to companies.
[00:16:18] Unknown:
Absolutely. And by the way, I should mention since it's topical for this podcast, edX is written in Python. So anybody who's interested in Python can check out the source code and hack away using that language.
[00:16:33] Unknown:
Definitely. And I've looked at the source code a little bit, and it looks like it's a fairly sizable project. So what are some of the complications or challenges that you've come across in using and building a large code base based entirely off of Python?
[00:16:49] Unknown:
There's been a fair amount of challenges, to be honest. When edX started the software project, initially, it was closed source for the first year or so of development. And it was in the sort of startup mode of we need to move very, very quickly because we're struggling to survive. And so a lot of the code in the Open edX software project initially came out of sort of this frenzied burst of we just need to get something that will work. So as a result, some of the code is kind of difficult to follow. And we've accumulated some tech debt from that.
But we're working really hard to pay that down and to take the parts that are more difficult to follow and more difficult to run safely and performantly, and to rewrite them and make them work better. But because the software project is so large, as you said, another big problem that we've run into is trying to have many different software developers work on this big project and have consistency across the entire project as well. So for example, there's a big focus that we have nowadays on creating APIs so that other people and other software projects can utilize those APIs and integrate with an Open edX installation, whether it's edx.org or any other one around the world.
And lots of people were trying to build these new APIs within the company, and we discovered that there was a fair amount of duplication. So for example, an API to get information about a course running on Open edX: I think there ended up being six different APIs that could get that information by the time we discovered that this duplication was happening. So it's been a challenge to try to identify everything that's going on, to try to unite all the teams and say, okay, we need this information, but you need that information. Let's find a way to build another piece of the software that both of us can build on top of in order to provide that information in a way that's maintainable and workable for everyone.
[00:19:09] Unknown:
And as you mentioned, Open edX is available for anybody to install and run and build their own courses on top of. So are you aware of any large high profile universities or organizations using that or any particularly interesting uses of that platform?
[00:19:27] Unknown:
Well, there are plenty of big universities using it. And the first one that comes to mind actually is Stanford University. Stanford was the primary impetus for us going open source in the first place. We were always intending to open the code up. But Stanford basically said, we'd love to use your code, but only if you actually open it up and don't give it to us privately. And if you don't, then it's not gonna happen. So Stanford was the first user of the Open edX codebase. And they're still running tons and tons of classes on their website and contributing to the codebase, helping out with BME and such. There's also a number of international partners.
So I actually just came back from a massive hackathon that was held in France, focused entirely on Open edX. Apparently, the French government is using Open edX in various different capacities for their national education system. And there are a number of organizations in France that are also using it in other ways, commercially or non-commercially. And this hackathon was, as I said, a national thing, uniting 9 different cities in France simultaneously working on the software and ending up with a big competition to judge what was the most interesting and valuable project to come out of the hackathon.
So we have local usage in the US. We have international usage. And if you wanna see more of the places where Open edX is being used, there's actually a list of sites on our GitHub, on one of the wiki pages, of places where people have self-reported that they're using Open edX. And the list is quite long. I think we're up to somewhere around 100 or so sites out there that we know about. There's probably a lot more.
We have the picks. So, Tobias, why don't you kick it off?
[00:21:29] Unknown:
Sure. So for my first pick today, I'm going to choose Evil mode, which is an Emacs plugin that gives you some really phenomenal Vim emulation. So in my course of choosing and using different text editors and IDEs, I went from Sublime Text to Vim and have now settled on Emacs, but I was never quite happy with the movement keys in Emacs. And so I was very happy when I came across Evil mode because I really liked the ergonomics of Vim, but really liked the interactivity of Emacs. So with Evil mode and Emacs, you get the best of both worlds. For my second pick, I'm going to choose Forgotify, which is a project that somebody built where they use the Spotify APIs to retrieve a list of every song on Spotify that has never been played even once by anyone.
And you can go to the website and it will serve up a random song from that selection, and then you can click play and it will play in Spotify. So in an interesting way, it's sort of a self-destructing project because as more people become aware of it and listen to all of these songs, the pool of songs that have never been listened to shrinks. So it's a very interesting use of data analysis and sort of an interactive exploration of data. For my next pick, I'm going to choose The Wolf of Wall Street, which is a very bizarre movie, so much so that it can only be based on a true story, which it is. It's the story of a man named Jordan Belfort in the, you know, early to late eighties and I believe early nineties as well, and his entry into Wall Street and making a whole lot of money in that area. So I just can't even begin to do the movie justice by description. So it's worth at least watching the trailer if you don't watch the whole movie. It stars Leonardo DiCaprio, who does a wonderful job in the role, and just watch it if you're so inclined.
For my last pick, I'm going to choose pipreqs, which is a Python project that will analyze all the import statements in your Python code and auto-generate a requirements.txt for use with pip. So if you either haven't generated a pip requirements.txt file yet or you're trying to clean up the one that you've got, it's a good starting point for that. So, Chris, take it away.
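The core idea of scanning import statements can be sketched with Python's standard `ast` module. This is not pipreqs' implementation, and both function names are made up for illustration; a real tool also has to map import names to PyPI distribution names and pin versions, which this sketch does not attempt.

```python
import ast
import sys


def top_level_imports(source):
    """Return the sorted top-level module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import (from . import x); skip those.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


def third_party_candidates(source):
    """Drop standard-library modules where Python can list them (3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return [name for name in top_level_imports(source) if name not in stdlib]
```

Running `third_party_candidates` over each file in a project and merging the results gives a rough first draft of the names that would go into a requirements file.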
[00:24:12] Unknown:
So my first pick is a beer, and it's a beer with a rather unusual name. It is a beer called Smells Like a Safety Meeting from Dark Horse Brewing. And the name is even funnier when you find out, as I did, that the beer's original name was Smells Like Weed, but, apparently, various government regulators had kind of an issue with that, so they had to change the name. It's an imperial IPA, and it's very tasty. Because it's an imperial IPA, it doesn't have those really, really sort of intense, almost acrid hops notes. It's got some fruity character. I really like it.
So my next pick is going to be Medium. You know, obviously, there are a bunch of magazine sites around these days, Slate and the like. What I really like about Medium is that they report on a hugely wide variety of topics. Some of them kinda mainstream and current. Some of them really, really not, like dredging up things from history or going through various interesting discoveries from science and how they might relate, or, you know, they'll have celebrity writers. You know, I use Pocket to capture my articles that I then read on the subway or whatever the case may be. And Medium is really making it hard to not let that bloat out of control because there's just a constant stream of good articles that they keep writing that I find really interesting. So my last pick is modern GNU Emacs. Now I realize it's kinda silly picking Emacs. If you're a geek, you know all about it. You've heard about it. You've probably had some oldster like me rant and roar about it. But the key thing is the experience of using and just getting set up with and learning Emacs is so different from when I last touched it, like, 15 years ago.
I reached a point where I was getting kinda frustrated with Vim in that I was trying to set up, you know, some really good text based completion for languages like Python and Ruby for my Chef work and things like that. And some of the extensions that I was using for that were conflicting in odd ways, causing performance degradation, and just generally giving me a bad time. And I realized that all these things are kind of much more natively available in Emacs because its extension system is Elisp and has been around forever and is very mature and hardy and all that.
So it has been just a great experience, and the package system makes getting and installing extensions a total breeze, which it never was before. It used to be kinda like pulling teeth. You had to download the package yourself, configure a large chunk of Elisp in your .emacs, and pray. And now it's pretty much like, I want this thing, install it, maybe 1 or 2 lines to activate it, and away you go. So it's been great. And what really surprised me, I thought I was gonna end up using Tobias' pick, Evil, but apparently Emacs is like riding a bike. You know, despite the fact that it's been, like I say, over a decade since I last used it, all the key bindings just have come right back to me. So I'm really loving it.
David, why don't you give us your picks for this week?
[00:27:36] Unknown:
Sure. So the first thing that comes to mind, especially since you were just talking about package managers and Emacs right now, is Homebrew, which is a package manager for the Mac. Love it. Which allows you to easily install open source software from all around the world, various repositories, just by doing brew install name of package here. And it also has an extension called Homebrew Cask, which is optimized to work with the sort of binaries that you download from the Internet, where you open up the DMG file and drag the application to your Applications directory. Homebrew Cask takes care of all that stuff for you as well. So you can install and update applications on your computer in a standard package manager workflow. So that's pretty cool.
The second pick that I have is actually a combo. There's Arrow, which is a Python date implementation, and Moment.js, which is the same sort of thing in JavaScript. And both of these libraries take the standard sort of date based arithmetic, parsing, and displaying of information that you need to do all the time in programming, and just make those APIs way more simple, intuitive, friendly, usable, particularly for the JavaScript implementation because JavaScript's date API is sort of painful to use at times. But even in Python, allowing you to do things like easily parsing dates, allowing you to easily humanize them and output them in different formats and different languages. It's just really nice and really smooth, and I highly recommend it.
And I guess the third one that I'll say is a movie called The Imitation Game, which I just saw very recently. It's a movie about Alan Turing, and it's set during, I believe it was World War 2. And it was showing how Turing managed to crack the Enigma code that was used by the Germans for communication. But on another level, the movie is also about trying to communicate with other people, especially for a lot of people who are into software and find that it's difficult to really communicate with other people out there. Alan Turing had a lot of issues relating to other people. And this movie was a fascinating insight into sort of getting around that psyche and learning how to relate to other people, the pros and cons thereof.
[00:30:09] Unknown:
That's great. I'm a huge fan of The Imitation Game. I think Benedict Cumberbatch was amazing in that role, and I loved it. I can't recommend it to more people. My family and friends are rather sick of me gushing about it, so very cool. And Homebrew and Homebrew Cask both are, in my opinion, fantastic examples of how the Mac, you know, despite the fact that there are parts of it that are closed source, can end up being this amazing, extensible technical platform for techie people like us to work with. It's such a joy to use. And I know there are package managers for other platforms, but Homebrew is just great. I really love it to bits. Absolutely.
So how can our listeners best keep in touch with you and follow your ongoing development and other things?
[00:31:04] Unknown:
Well, my handle on the Internet is singingwolfboy. So that's my handle on Twitter, on GitHub. You can also check out my website at davidbaumgold.com, which hasn't been updated in a while, but it's out there. And you're welcome to email me as well. It's just david@davidbaumgold.com. I'd love to hear from you.
[00:31:28] Unknown:
You know, I just really quickly wanna kind of address one thing Tobias mentioned in the beginning. Please do give us your feedback. We really do appreciate it. Two people have been kind enough to review us on iTunes, and I have to say reading those reviews just made me happy. So thank you to those two listeners and anybody else who enjoys what they're hearing. Please drop us a line or give us a review. We do really appreciate it.