David Baumgold on Flask-Dance, WebhookDB and Open EdX

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

We are recording today on June 2, 2015.

Your hosts, as usual, are Tobias Macey and Chris Patti.

You can follow us on iTunes, Stitcher, or TuneIn Radio.

And please give us feedback. You can leave us a comment on iTunes or Stitcher.

Send us an email at host@podcastinit.com.

Find us on Twitter or leave a comment on our show notes.

And if you'd like, you can give us a donation to keep the show going. There are links on our site at podcastinit.com.

Today, we're interviewing David Baumgold.

David, could you please introduce yourself?

Hi.

My name is David. I go by DB with some people at work. I've been doing Python for probably about eight or nine years or so.

I'm really into open source, really into community building and talking with other people.

And I work for a company called edX as a developer advocate, helping people to understand the software that we build and to make it more awesome together because it's open source.

David, how did you get introduced to Python?

Well, I was in college, and I started learning about computer science sort of on a whim because I had a friend back in high school who was interested in computers.

And I decided I wanted to learn more about it so I could chat with him more intelligently.

At college, they were teaching me Java, which was okay.

But I had a different friend suggest that I start learning Python, which was more fun and faster.

And so while I was learning Java in my college classes, I started learning Python on the side and discovered that I liked it a whole lot better. And I was able to do things like building websites and stuff.

So I started sort of rolling with that, and it's worked out a lot better for me.

So I understand that you wrote a library called Flask-Dance. Could you describe what it does and what it solves that wasn't covered by some of the other libraries that may be similar to it?

Sure.

So Flask-Dance is a library that works with the Flask web framework.

It's designed to make it easier to use OAuth in a web application.

And OAuth is the protocol that you use to allow two websites to communicate information with each other about a user, with that user's consent.

So basically what that means is any time you see that "Log in with Facebook" button, or log in with Google, or log in with Twitter, or whatever: any of those situations where you're going to a website and asking it to work with one of these big providers like Facebook.

They use this protocol called OAuth to allow the user to specify that it's okay to give that information to this random third-party website.

And OAuth is a very complicated protocol, I discovered, when I was trying to use it in some websites that I was building.

And I realized that I could abstract all of the complexity of OAuth and wrap it all into one little library that I could then make available to all of the Flask apps that I and other people want to build. Because this was something that I wanted to use multiple times, and I didn't want to have to go through all the complexity of OAuth more than once.
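To give a feel for the kind of detail being abstracted away, here is a minimal sketch of only the first step of an OAuth 2 authorization code flow: building the URL that sends the user to the provider to give consent. The provider endpoint, client ID, and redirect URI are hypothetical placeholders, and this is not code from Flask-Dance itself.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical provider endpoint and app details, for illustration only.
AUTHORIZE_URL = "https://provider.example.com/oauth/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://myapp.example.com/login/authorized"


def build_authorization_url(state, scope="profile"):
    """Return the URL the user's browser is sent to in order to grant consent."""
    params = {
        "response_type": "code",       # ask the provider for an authorization code
        "client_id": CLIENT_ID,        # identifies the third-party application
        "redirect_uri": REDIRECT_URI,  # where the provider sends the user back
        "scope": scope,                # what information the application asks for
        "state": state,                # random value echoed back, guards against CSRF
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


state = secrets.token_urlsafe(16)
url = build_authorization_url(state)
query = parse_qs(urlsplit(url).query)
print(url)
```

After the user approves, the provider redirects back with a code that still has to be exchanged for a token, and the returned state has to be checked; those are the further steps a library can hide.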

That definitely sounds like a reasonable pursuit, having looked at some of the specifications and also having to decide between version 1 or version 2, and, okay, which type of implementation do I want to use?

So you may have kind of partially covered this in the first question, but what were some of the technical issues that you encountered while building Flask-Dance?

Well, I had some good resources to work with to start, which made it easier to get around some of the technical issues, because actually Flask-Dance is not the only library out there that does OAuth for the Flask web framework.

There are actually two other libraries out there that do this: there's Flask-OAuth and there's Flask-OAuthlib.

So I was able to look at both of these libraries and get some ideas and inspiration from them. The reason why I decided to write my own rather than using one of those two is because I really didn't like the APIs that those libraries exposed.

If you've used the Requests library for making web requests, the author, Kenneth Reitz, talks about how important it is to have an API that is usable and understandable and human friendly.

And I took a lot of pains to try to make Flask-Dance follow those ideas of being a lot easier to understand and a lot more human friendly, I suppose.

So I guess the biggest challenge that I had was trying to wrap my head around how the OAuth protocols worked and how the various different libraries that underlie Flask-Dance work, and how to wrap them all up in such a way that I could still provide all of the power and flexibility that something like OAuthlib provides to you while still making it very easy and straightforward to get started with. I'd say that was the biggest challenge.

Okay. And what are some of the design considerations that you had while you were building Flask-Dance, focusing more on the internals, in terms of maybe some design patterns that you employed or particular pain points that you were trying to cover up?

Well, the pain point that I saw in most other implementations of OAuth is that they expect you to write a couple of very specific views for your web application and to call very specific methods in very specific ways. And it's very easy to get all of those fiddly bits a little bit wrong or a lot wrong.

So I guess I primarily tried to use the principle of encapsulation to abstract all that information out.

Flask has this concept of something called a blueprint, which is basically a set of views that you can attach to your web application that are independent of everything else going on in your web application.

So I wrote Flask-Dance to be a blueprint.

The blueprint exposes two different views, or three in the case of OAuth 1, I believe, that allow you to handle the original authentication and the authorization views that are required by the OAuth protocol.

And you don't have to worry about actually coding up those views.

All you have to do is configure this blueprint to work with the provider that you want, which is to say, Facebook or Google or whoever, give it the application ID and application secret that you get by registering your app on Facebook or Google or whatever you want, and then attach the blueprint to your application.

So it sort of abstracts all of that away from you.

You also built WebhookDB for replicating GitHub's information to be queryable. What are some use cases for which you'd want to do that?

So that was a case where I started off doing this for a project for my company.

As I said, the software that edX makes is open source, and it's all hosted on GitHub.

And I work on a team that works very closely with our open source community and tries to make sure that they're as happy and as productive as possible.

So one of the things that we wanted to do at my job was to get an idea of how many pull requests were coming in over time and to see how long it took for them to be merged so that we could try to work towards getting them merged more quickly and more reliably.

And we started using the GitHub APIs to try to determine that information,

but quickly ran up against the problem that the APIs have a rate limit.

You can make at most 5,000 requests per hour, which sounds like plenty.

But our main repository alone at this point has over 8,000 pull requests historically.

And we want to be able to run reports that get information about all pull requests since we open sourced the project, to be able to generate information like graphs of number of pull requests over time since the beginning.

And so we hit that limit pretty quickly and discovered that the only way to work around that was to just do the first 5,000 and then wait an hour, and then the next 5,000 and wait an hour, and so on, which aside from being very slow and unsatisfying also meant that the data changed in the course of trying to collect all of it, especially for the more recent pull requests.

An hour during the course of the workday can make for a lot of changes in our repository.
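The arithmetic behind that wait can be sketched in a few lines. The 5,000 requests per hour limit and the figure of just over 8,000 pull requests come from the interview; the number of API requests needed per pull request is an assumption made for illustration.

```python
import math

RATE_LIMIT_PER_HOUR = 5000  # GitHub API limit quoted in the interview


def hours_of_waiting(total_requests, limit=RATE_LIMIT_PER_HOUR):
    """Return how many one-hour waits are needed to issue total_requests."""
    if total_requests <= 0:
        return 0
    windows = math.ceil(total_requests / limit)  # number of hourly batches
    return windows - 1  # no wait is needed before the first batch


# Assumption: one request per pull request, with just over 8,000 pull requests.
one_per_pr = hours_of_waiting(8001)
# Assumption: three requests per pull request (say details, comments, commits).
three_per_pr = hours_of_waiting(8001 * 3)
print(one_per_pr, three_per_pr)
```

Even in the best case the report spans more than an hour, which is long enough for the earliest data to go stale.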

So I thought about the problem from a different angle and realized that basically what we were trying to do was run reports like you might do in a database sort of fashion.

And I also realized that getting information over the GitHub API could almost be looked at as making queries to a remote database.

So I was wondering if there's a way that we could replicate that database to one that we could control so that we could just run normal SQL queries over that database and generate reports that way.

And fortunately, GitHub has this concept of a webhook, which basically says, something can ask GitHub to notify a specific HTTP URL any time an event happens.

So what WebhookDB is, is basically just a very thin web application layer over a Postgres database. And it exposes a couple of HTTP API endpoints.

And you can set up GitHub to say, every time an event happens on this repository that I care about, just send that information over to my WebhookDB server.

And all the server does is take that information and store it into the local database.

So basically what you've got set up is database replication over HTTP, which I thought was a kind of cool concept.

And then once you've got that, you have a local database running that you have access to in order to run any sort of arbitrary SQL queries that you want, which is being constantly kept up to date dynamically by this HTTP replication.

And you can run any sort of queries that you want rather than being restricted to just the API endpoints that GitHub exposes.
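The idea of replicating webhook deliveries into a local database and then reporting with SQL can be sketched as follows. This is not WebhookDB's actual code: it swaps in an in-memory SQLite database for Postgres, leaves out the HTTP layer, and uses a hypothetical table and hand-written payloads that carry only a few fields of a GitHub pull request event.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Postgres database
conn.execute(
    """CREATE TABLE pull_request (
           number INTEGER PRIMARY KEY,
           title TEXT,
           state TEXT,
           created_at TEXT,
           merged_at TEXT
       )"""
)


def handle_pull_request_event(body):
    """Store the pull request carried by one webhook delivery (a JSON string)."""
    pr = json.loads(body)["pull_request"]
    # INSERT OR REPLACE keeps one row per pull request, so a later event
    # (closed, merged, edited) overwrites the earlier copy.
    conn.execute(
        "INSERT OR REPLACE INTO pull_request VALUES (?, ?, ?, ?, ?)",
        (pr["number"], pr["title"], pr["state"], pr["created_at"], pr["merged_at"]),
    )
    conn.commit()


# Made-up deliveries: pull request 1 is opened and later merged, 2 stays open.
deliveries = [
    {"action": "opened", "pull_request": {
        "number": 1, "title": "Fix typo", "state": "open",
        "created_at": "2015-05-01T12:00:00Z", "merged_at": None}},
    {"action": "closed", "pull_request": {
        "number": 1, "title": "Fix typo", "state": "closed",
        "created_at": "2015-05-01T12:00:00Z", "merged_at": "2015-05-03T12:00:00Z"}},
    {"action": "opened", "pull_request": {
        "number": 2, "title": "Add docs", "state": "open",
        "created_at": "2015-05-02T12:00:00Z", "merged_at": None}},
]
for delivery in deliveries:
    handle_pull_request_event(json.dumps(delivery))

# A report in plain SQL: average number of days from opening to merging.
avg_days_to_merge = conn.execute(
    "SELECT AVG(julianday(merged_at) - julianday(created_at)) "
    "FROM pull_request WHERE merged_at IS NOT NULL"
).fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM pull_request").fetchone()[0]
print(total_rows, avg_days_to_merge)
```

Because each delivery overwrites the row for its pull request, the local copy tracks the current state without ever polling the API.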

That sounds very cool. And it definitely sounds like it gives a lot of capability for doing some more in-depth data analysis than you might be able to do just by using the APIs directly.

Have you thought about adding some sort of dashboard capability to WebhookDB so that you can have some pregenerated reports that are easy to access for people who are just getting started using it?

That would definitely be really cool, even just to have that graph that we were originally interested in, something like pull requests over time.

It's a project that I've sort of been building in fits and starts, trying to add a feature at a time as it becomes useful to me, as it's something that I think that we'll need.

And so far, I haven't had to add that capability of an actual meaningful user interface because we can just get the data that we need by running SQL queries.

But it definitely would be nice to do that. Another thing that I'd really like to be able to do is use some of the permissioning model built into Postgres, so that you could have a server set up where you log in to the WebhookDB server using your GitHub credentials, again using Flask-Dance to handle that OAuth stuff.

It would be really nice if a WebhookDB server could then create a new account for you on the Postgres database and set up permissions on that Postgres account so that you could see only your own repositories.

And then just give you the username and password to that Postgres account so that you could log in using the Postgres client and run your own queries that way as well.

It's just a question of coding that up in such a way that it's secure and people don't see each other's private repositories.

Yeah. Security and authorization and role access are always tricky things to have to deal with, particularly when you have to try to conceptualize it from the ground up without having any sort of a framework already in place. Although, to some extent, GitHub gives you at least the roles already, so that might simplify that a little bit.

What is Open edX, and what is its intended audience?

Okay.

So Open edX is this open source software platform that I was talking about earlier.

It is a MOOC provider: a massive open online course provider.

And basically what that means is if you've ever gone to edx.org or Coursera or Udacity, they're big platforms where you can sign up for university courses, and you can learn about just about any topic that you want. You can basically get a university education.

And most of the time, those courses are available for free, which is awesome.

So Open edX is the software platform that underlies edx.org.

It's actually exactly the same software that my company uses to run edx.org.

And that means that anybody around the world can use the software, stand up their own Open edX server, and can offer their own courses and provide them to other people.

Now I should also mention that there's a difference between the platform and the courses that are offered on top of that platform.

So Open edX allows you to create your own courses, but it doesn't have the course content available that you would see on edx.org, because those are owned by the universities that created those courses.

But basically, it means that you can create your own courses, which are just as high quality or more so than the courses that you might find on edx.org.

And you can share those courses with other people as well so that you can collaboratively build interesting information and teach people new things.

It's a really exciting project, and

we're sort of taking a gamble by making it open because we're hoping that people are going to

use the

Yeah. I've definitely been seeing big growth in open source as a business model. Historically, a number of companies have used open source and contributed back to open source, but they haven't necessarily opened up their entire platform. But I've seen and read about a few companies who have been doing that lately and have been seeing very good results as an effect of that, because people are much more willing to engage with them as a company when they have a better idea of what they're actually interacting with.

And also I was reading some of the tweet stream from, I think it's EmTech, the MIT sort of conference that's going on right now, where one of their presenters was talking about how, particularly going into the future, your average citizen is going to need to have some amount of data literacy.

And as a result of that, they're going to need and desire to have a more transparent view of the data that is present in the platforms that they interact with, and how that data is being used and what's being done with it. So having edX doing that with online courses definitely seems very beneficial, both now and into the future, as people become more aware of what's actually being done with the information that they provide to companies.

Absolutely.

And by the way, I should mention, since it's topical for this podcast, edX is written in Python.

So anybody who's interested in Python can check out the source code and hack away using that language.

Definitely. And I've looked at the source code a little bit, and it looks like it's a fairly sizable project. So what are some of the complications or challenges that you've come across in using and building a large code base based entirely off of Python?

There's been a fair amount of challenges, to be honest.

When edX started the software project, initially, it was closed source for the first year or so of development.

And it was in the sort of startup mode of, we need to move very, very quickly because we're struggling to survive.

And so a lot of the code in the Open edX software project initially came out of sort of this frenzied burst of, we just need to get something that will work.

So as a result, some of the code is kind of difficult to follow.

And we've accumulated some tech debt from that.

But we're working really hard to pay that down and to take the parts that are more difficult to follow and more difficult to run safely and performantly, and to rewrite them and make them work better.

But because the software project is so large, as you said, another big problem that we've run into is trying to have many different software developers work on this big project and have consistency across the entire project as well.

So for example, there's a big focus that we have nowadays on creating APIs so that other people and other software projects can utilize those APIs and integrate with an Open edX installation, whether it's edx.org or any other one around the world.

And lots of people were trying to build these new APIs within the company, and we discovered that there was a fair amount of duplication.

So for example, an API to get information about a course running on Open edX: I think there ended up being six different APIs that could get that information by the time we discovered that this duplication was happening.

So it's been a challenge to try to identify everything that's going on, to try to unite all the teams and say, okay, we need this information, but you need that information. Let's find a way to build another piece of the software that both of us can build on top of in order to provide that information in a way that's maintainable and workable for everyone.

And as you mentioned, Open edX is available for anybody to install and run and build their own courses on top of. So are you aware of any large, high-profile universities or organizations using that, or any particularly interesting uses of that platform?

Well, there are plenty of big universities using it. And the first one that comes to mind actually is Stanford University.

Stanford was the primary impetus for us going open source in the first place. We were always intending to open the code up. But Stanford basically said, we'd love to use your code, but only if you actually open it up and don't give it to us privately. And if you don't, then it's not gonna happen.

So Stanford was the first user of the Open edX codebase.

And they're still running tons and tons of classes on their website and contributing to the codebase, helping out with BME and such.

There's also a number of international partners.

So I actually just came back from a massive hackathon that was held in France, actually, focused entirely on Open edX.

Apparently, the French government is using Open edX in various different capacities for their national education system.

And there are a number of organizations in France that are also using it in other ways, commercially or non-commercially.

And this hackathon was, as I said, a national thing, uniting nine different cities in France simultaneously working on the software and ending up with a big competition to judge what was the most interesting and valuable project to come out of the hackathon.

So we have local usage in the US. We have international usage.

And if you wanna see more of the places where Open edX is being used, there's actually a list of sites on our GitHub, on one of the wiki pages, of places where people have self-reported that they're using Open edX.

And the list is quite long. I think we're up to somewhere around 100 or so sites out there that we know about. There's probably a lot more.

We have the picks. So, Tobias, why don't you kick it off?

Sure.

So for my first pick today, I'm going to choose Evil mode, which is an Emacs plugin that gives you some really phenomenal Vim emulation.

So in my course of choosing and using different text editors and IDEs, I went from Sublime Text to Vim and have now settled on Emacs, but I was never quite happy with the movement keys in Emacs. And so I was very happy when I came across Evil mode, because I really liked the ergonomics of Vim but really liked the interactivity of Emacs. So with Evil mode and Emacs, you get the best of both worlds.

For my second pick, I'm going to choose Forgotify, which is a project that somebody built where they use the Spotify APIs to retrieve a list of every song on Spotify that has never been played even once by anyone.

And you can go to the website and it will serve up a random song from that selection, and then you can click play and it will play in Spotify.

So in an interesting way, it's sort of a self-destructing project, because as more people become aware of it and listen to all of these songs, the pool of songs that have never been listened to shrinks.

So it's a very interesting use of data analysis and sort of an interactive exploration of data.

For my next pick, I'm going to choose The Wolf of Wall Street, which is a very bizarre movie, so much so that it can only be based on a true story, which it is. It's the story of a man named Jordan Belfort in the, you know, early to late eighties and I believe early nineties as well, and his entry into Wall Street and making a whole lot of money in that area.

So I just can't even begin to do the movie justice by description. So it's worth at least watching the trailer if you don't watch the whole movie. It stars Leonardo DiCaprio, who does a wonderful job in the role, and just watch it if you're so inclined.

For my last pick, I'm going to choose pipreqs, which is a Python project that will analyze all the import statements in your Python code and auto-generate a requirements.txt for use with pip.

So if you either haven't generated a pip requirements.txt file yet or you're trying to clean up the one that you've got, it's a good starting point for that.

So, Chris, take it away.
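The core idea of scanning import statements can be sketched with the standard library's ast module. This is an illustration of the general technique, not pipreqs's actual implementation, and the function name is made up for the example.

```python
import ast


def imported_top_level_names(source):
    """Return the sorted top-level module names imported by some Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import (from . import x), which is local code.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)


example = """
import os
import requests
from flask import Flask
from flask_dance.contrib.github import github
from . import settings
"""
found = imported_top_level_names(example)
print(found)
```

A complete tool would still need to drop standard library modules such as os and map import names to the names packages are published under, which this sketch does not attempt.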

So my first pick is a beer, and it's a beer with a rather unusual name. It is a beer called Smells Like a Safety Meeting, from Dark Horse Brewing.

And the name is even funnier when you find out, as I did, that the beer's original name was Smells Like Weed, but, apparently, various government regulators had kind of an issue with that, so they had to change the name. It's an imperial IPA, and it's very tasty. Because it's an imperial IPA, it doesn't have those really, really sort of intense, almost acrid hop notes.

It's got some fruity character. I really like it.

So my next pick is going to be Medium.

So, you know, obviously, there are a bunch of magazine sites around these days, Slate and the like. What I really like about Medium is that they report on a hugely wide variety of topics.

Some of them kind of mainstream and current, some of them really, really not, like dredging up things from history or going through various interesting discoveries from science and how they might relate, or, you know, they'll have celebrity writers.

You know, I use Pocket to capture my articles that I then read on the subway or whatever the case may be. And Medium is really making it hard to not let that bloat out of control, because there's just a constant stream of good articles that they keep writing that I find really interesting.

So my last pick is modern GNU Emacs. Now I realize it's kind of silly picking Emacs. If you're a geek, you know all about it. You've heard about it. You've probably had some oldster like me rant and roar about it. But the key thing is the experience of using and just getting set up with and learning Emacs is so different from when I last touched it, like, 15 years ago.

I reached a point where I was getting kind of frustrated with Vim, in that I was trying to set up, you know, some really good text-based completion for languages like Python and Ruby for my Chef work and things like that.

And some of the extensions that I was using for that were conflicting in odd ways, causing performance degradation, and just generally giving me a bad time.

And I realized that all these things are kind of much more natively available in Emacs, because its extension system is Elisp and has been around forever and is very mature and hardy and all that.

So it has been just a great experience, and the package system makes getting and installing extensions a total breeze, which it never was before. It used to be kind of like pulling teeth. You had to download the package yourself, configure a large chunk of Elisp in your .emacs, and pray. And now it's pretty much like, I want this thing, install it, maybe one or two lines to activate it, and away you go. So it's been great. And what really surprised me: I thought I was gonna end up using Tobias' pick, Evil, but, apparently, Emacs is like riding a bike. Despite the fact that it's been, like I say, over a decade since I last used it, all the key bindings have just come right back to me. So I'm really loving it.

David, why don't you give us your picks for this week?

Sure. So the first thing that comes to mind, especially since you were just talking about package managers and Emacs right now, is Homebrew, which is a package manager for the Mac.

Love it.

Which allows you to easily install open source software from all around the world, various repositories, just by doing "brew install name-of-package-here". And it also has an extension called Homebrew Cask, which is optimized to work with the sort of binaries where you download from the Internet, open up the DMG file, and drag the application to your Applications directory. Homebrew Cask takes care of all that stuff for you as well. So you can install and update applications on your computer in a standard package manager workflow. So that's pretty cool.

The second pick that I have is actually a combo.

There's Arrow, which is a Python date implementation, and Moment.js, which is the same sort of thing in JavaScript.

And both of these libraries take the standard sort of date-based arithmetic, parsing, and displaying of information that you need to do all the time in programming, and just make those APIs way more simple, intuitive, friendly, usable, particularly for the JavaScript implementation, because JavaScript's date API is sort of painful to use at times. But even in Python, allowing you to do things like easily parsing dates, allowing you to easily humanize them and output them in different formats and different languages: it's just really nice and really smooth, and I highly recommend it.
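For a sense of what that looks like in Python, here is a small sketch of parsing, arithmetic, formatting, and humanizing with Arrow. It assumes the arrow package is installed in a version that provides shift; the timestamp uses the recording date of this episode with a made-up time of day.

```python
import arrow

# Parse an ISO 8601 timestamp into a timezone-aware Arrow object.
recorded = arrow.get("2015-06-02T15:30:00+00:00")

# Date arithmetic: shift returns a new Arrow object one day earlier.
day_before = recorded.shift(days=-1)

# Output in a chosen format.
formatted = day_before.format("YYYY-MM-DD")

# Humanize relative to another moment instead of "now", so the result is stable.
relative = day_before.humanize(recorded)

print(formatted, "|", relative)
```

Passing a second moment to humanize is a choice made here to keep the example deterministic; called without arguments it compares against the current time.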

And I guess the third one that I'll say is a movie called The Imitation Game, which I just saw very recently.

It's a movie about Alan Turing, and it's set during, I believe it was, World War II.

And it was showing how Turing managed to crack the Enigma code that was used by the Germans for communication.

But on another level, the movie is also about trying to communicate with other people, especially for a lot of people who are into software and find that it's difficult to really communicate with other people out there.

Alan Turing had a lot of issues relating to other people. And this movie was a fascinating insight into sort of getting around that psyche and learning how to relate to other people, the pros and cons thereof.

That's great. I'm a huge fan of The Imitation Game. I think Benedict Cumberbatch was amazing in that role, and I loved it. I can't recommend it to more people. My family and friends are rather sick of me gushing about it, so very cool. And Homebrew is one of those things. Homebrew and Homebrew Cask both are two things that are, in my opinion, fantastic examples of how the Mac, you know, despite the fact that there are parts of it that are closed source, just ends up creating this amazing, extensible technical platform for techie people like us to work with. It's such a joy to use. And I know there are package managers for other platforms, but Homebrew is just great. I really love it to bits.

Absolutely.

So how can our listeners best keep in touch with you and follow your ongoing development and other things?

Well, my handle on the Internet is singingwolfboy.

So that's my handle on Twitter, on GitHub.

You can also check out my website at davidbaumgold.com, which hasn't been updated in a while, but it's out there. And you're welcome to email me as well. It's just david@davidbaumgold.com.

I'd love to hear from you.

You know, I just really quickly wanna kind of address one thing Tobias mentioned in the beginning.

Please do give us your feedback. We really do appreciate it. Two people have been kind enough to review us on iTunes, and I have to say reading those reviews just made me happy. So thank you to those two listeners and anybody else who enjoys what they're hearing. Please drop us a line or give us a review. We do really appreciate it.

The Python Podcast.__init__

Brief Introduction

Interview with David Baumgold

Picks

Keep in touch
