Summary
As a developer and user of open source code, you interact with software and digital media every day. What is often overlooked are the rights and responsibilities conveyed by the intellectual property that is implicit in all creative works. Software licenses are a complicated legal domain in their own right, and they can often conflict with each other when you factor in the web of dependencies that your project relies on. In this episode Luis Villa, Co-Founder of Tidelift, explains the categories of software licenses, how to select the right one for your project, and what to be aware of when you contribute to someone else’s code.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Luis Villa about software licensing and intellectual property rules that developers need to know
Interview
- Introductions
- How did you get started as a programmer?
- Intellectual property law and licensing of software, data, and media are complicated topics that are often poorly understood by developers. Can you start off by giving an overview of categories of intellectual property that we should be thinking of?
- Most of us who have created or used software, whether it is open or closed source, have at some point come across various licenses. What may not be immediately obvious is that there are degrees of compatibility between these licenses. What are some guiding principles for determining which licenses are in conflict?
- In an organization, who is responsible for ensuring compliance with software and content licensing within a given project?
- When introducing new dependencies into a project or system what steps should be taken to evaluate license compatibility and compliance?
- When creating a new project, one of the steps in the process is to select a license. What are some useful guidelines or questions to determine which license to use?
- Another aspect of software licensing that developers might run into is when contributing to an open source project where a contributor license agreement might be necessary. What should we be thinking about when deciding whether to sign such an agreement?
- In addition to software libraries, developers might need to use content such as images, audio, or video in their projects which have their own copyright and licensing considerations. What are some of the things that we should be looking for in those situations?
- Another component of our systems that has grown in its importance with the rise of advanced analytics is data. We may need to use open data sources, pay for access to data repositories, or provide access to data that is under our control. What are some common approaches to licensing or terms of use for these contexts?
- What should we be wary of when using or providing data in our applications?
- How much of the work that you do at Tidelift is spent on educating developers and customers on the finer points of intellectual property management?
- What are some of the most common difficulties or points of confusion that you encounter?
- What are some useful resources that you would recommend to anyone who is interested in learning more about intellectual property and software licensing?
Keep In Touch
- Website
- @luis_in_140 on Twitter
Picks
- Tobias
- Luis
- The Good Place
- Twitter and Tear Gas by Zeynep Tufekci
Links
- Intellectual Property and Open Source: A Practical Guide To Protecting Code by Van Lindberg
- Tidelift
- BASIC
- Apple //e
- Copyright
- Trademark
- Patent
- Copyleft
- OSI Approved Licenses
- Permissive Licenses
- Strong and Weak Copyleft
- SSPL (Server Side Public License)
- OSI (Open Source Initiative)
- Contributor License Agreement
- FSF (Free Software Foundation)
- DCO (Developer Certificate of Origin)
- Creative Commons
- Noun Project
- Free Music Archive
- Wikimedia Commons
- TL;DR Legal
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So say hi to our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. And if you're like me, then you need a simple and easy to use tool to keep track of all of your projects.
Some project management platforms are too flexible, leading to confusion of workflows and days' worth of setup, and others are so minimal that they aren't worth the cost of admission. After using Clubhouse for a few days, I was impressed by the intuitive flow. Going from adding the various projects that I work on to defining the high level epics I need to stay on top of and creating the various tasks that need to happen only took a few minutes. I was also pleased by the presence of subtasks, seamless navigation, and the ability to create issue and bug templates to ensure that you never miss capturing essential details. Listeners of this show will get a full 2 months for free on any plan when you sign up at pythonpodcast.com/clubhouse.
So help support the show and help yourself get organized today. And don't forget to visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And don't forget to keep the conversation going at pythonpodcast.com/chat. Registration for PyCon US, the largest annual gathering across the community, is open now. So don't forget to get your ticket, and I'll see you there. Your host as usual is Tobias Macey. And today, I'm interviewing Luis Villa about software licensing and intellectual property rules that developers need to know. So, Luis, could you start by introducing yourself?
[00:02:05] Unknown:
Yeah. Sure. So I'm the cofounder of Tidelift, which is a software company we'll talk about later. I used to be a programmer. You would not want me to wield any code in your general direction anymore. These days, I'm an attorney, and I've been one for Mozilla and Wikipedia, as well as large law firms representing really huge companies and small startups.
[00:02:27] Unknown:
And do you remember how you first got started as a programmer?
[00:02:30] Unknown:
How I first got started as a programmer involves, I think, BASIC on an Apple //e, and then I didn't really think about it again for many years. I went into college interested in building things and interested in politics, so I started off as a political science and mechanical engineering double major. As part of the mechanical engineering major, I was required to take some programming courses, and it very quickly became clear that it was way more fun to write code than it was to figure out how to become the junior door handle engineer at Ford.
So I switched majors. At the time, I still didn't think political science and computer science really had anything to do with each other. I guess I started the computer science major in, like, 1997 maybe, and the Microsoft trial was just going on. That was the first light bulb moment of, like, wait. Actually, power and politics and government and computers: you're not going to be able to separate them in the future. And so that was something that worked out really well for me. It wasn't deliberate. Somebody once complimented me, like, oh, such vision that you saw these two were coming together. It's like, nope. I was just interested in both of them and thought they had nothing to do with each other. So yeah. Then I worked as a programmer for a while out of college and had fun doing that, but also realized that, ultimately, I wasn't all that great at it. And the company I was at, which was a Linux desktop open source startup, got acquired, and I had to deal with the attorneys in that deal.
They really didn't know what was going on with open source, so we had to spend a lot of time explaining to them what an open source license was, why it mattered, why this wasn't crazy. And in my, like, hubris, my naivete, I was like, oh, I can do it so much better than these folks can. So yeah. After the acquisition completed, I ended up going to law school and getting a lot of humility knocked into me about what lawyers do.
[00:04:52] Unknown:
And so basically, for as long as software has existed, but particularly in recent years as open source has become so much more prevalent and more of a general topic of conversation, the concepts behind intellectual property law and licensing of software, data, and media have become more complicated and more relevant for developers to be able to understand while still, in general, being fairly misunderstood or only partly understood. So can you start by giving a bit of an overview of the general categories of intellectual property that we should be thinking of as developers and engineers?
[00:05:32] Unknown:
Yeah. Sure. So there's basically three big categories. There's copyright, there's trademark, and there's patents. They're related, but they're definitely very distinct. Let me start with patents. Patents allow you to protect an idea and any different implementation of that idea. This gets a little nuanced, but if you figured out, say, a new way to do machine learning, you could potentially get a patent on that machine learning process, and then anybody who implements that process in any different system could potentially be liable to you under the patent. So the patent protects a general idea.
A copyright protects a specific implementation of an idea. Right? So if I write a novel that features the characters Romeo and Juliet, and let's say I wrote that now in 2019, I'd be able to protect that specific expression of an idea: how did Romeo and Juliet interact, what specific sentences did they use. But if you, operating independently, then wrote a different novel that happened to be about two young people in love, my copyright on Romeo and Juliet would protect only those words, those sentences, and not the general idea of two young people being in love.
In the software world, you can think of that as being equivalent to, hey, I wrote a database in C, and somebody else on the opposite side of the planet wrote a database in Scala. Right? They never looked at my code. They weren't aware of my code, and so they didn't copy the specifics of my code, so copyright isn't involved. If they independently invented it, copyright doesn't come into play. And by the way, as we go through the discussion today, hopefully we'll get to some more specific examples that help make this more clear, because I realize it's a little vague to start with. The third category, trademark, is different from the other two. The first two categories, patent and copyright, are about protecting the author and encouraging innovation.
Trademark is actually about protecting users and making sure that they don't get confused. What trademark protects is a brand, a mark. So if you name your software Postgres and I name mine Postgresy, then people might be confused. Like, is this part of Postgres? Is this something that's written by the original authors of Postgres? What kind of assumptions should people have about the quality and the sourcing of that material? And so trademark protects brands and names of things so that users don't get confused. So those are the three big categories, and open source interacts with all of these in different ways.
So when you write a program, US copyright law talks about a writing. Right? It has the idea that as soon as you've written something down that is new and original, copyright inheres in this thing. So as soon as you write a program that's new and original, whether you publish it to GitHub or keep it on your own hard drive, you have a copyright in that thing. You don't have to call up the government. You don't have to file for a copyright. You just have copyright from day one. And so as soon as you write that first piece of open source code that you publish to GitHub, congratulations. You have a copyright in it, and you can protect yourself and your code from being copied by someone else.
And so the good news is, yay, you've got protection. The bad news is that if you don't put a license on it, and we'll get to that in a bit, then other people are potentially infringing, and they have to be careful and aware of that. Patents in open source operate differently. With patents, you have to prove to the government that it was a new and original idea because, remember, a patent is stronger. To go back to that book analogy, a patent is not just about protecting specifically Romeo and Juliet. It's about protecting the entire idea of two young people being in love. So there's a much higher bar. You have to apply to the US government, the EU government. You have to prove that it's a new and original idea: nobody's ever had the idea of two young people being in love before.
And then once you get it, you have protection over the whole class. Right? So you could imagine, for example, to get back to that machine learning example, if you write some machine learning code, it's entirely possible that somebody out there has a patent on machine learning that even if you've never heard of that patent, you've never heard of that company, you could potentially still be infringing it by virtue of having written some machine learning code in Python. Yeah. I'll stop there because trademarks come in in a different way, and I think we can talk about those separately.
[00:10:53] Unknown:
Yeah. And to just look at it a different way, to put it maybe into sort of object oriented terms, the patent can be thought of as the abstract base class that covers the general category of a particular domain, and then the copyright is protecting the specific concrete implementation of that base class.
[00:11:09] Unknown:
Yeah. That's exactly right. And that's a great analogy. I'm gonna steal that one. I wanna be a little careful there because it turns out that, unfortunately, thanks to our friends at Oracle, there's all kinds of fun confusion about whether APIs are copyrightable, but that's, I don't know. We might have to do a second podcast involving stronger beverages. I've only got tea in front of me right now, so API copyright is maybe a different thing for a different day. We'll see how we go today.
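The object oriented analogy in this exchange can be sketched in Python. This is only an illustration of the analogy, not a statement of how the law works; the class names `KeyValueStore`, `DictStore`, and `ListStore` are hypothetical and do not come from the conversation.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """The general idea. In the analogy, this is the level a patent covers."""

    @abstractmethod
    def put(self, key, value):
        """Store a value under a key."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under a key."""


class DictStore(KeyValueStore):
    """One concrete expression of the idea. Copyright covers this text."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class ListStore(KeyValueStore):
    """An independently written expression: same idea, different code."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        # Drop any earlier entry for this key, then append the new pair.
        self._pairs = [(k, v) for k, v in self._pairs if k != key]
        self._pairs.append((key, value))

    def get(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)
```

Both subclasses realize the same abstract idea, which is the level the patent half of the analogy points at, while the copyright half attaches to the source text of each subclass separately.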
But, yeah, that's right. This sort of general ideas versus specific implementations is exactly the way to think about this.
[00:11:52] Unknown:
And the point that you made too about copyright being inherent in the act of creation is something that comes up a lot in conversations that I've heard, where just publishing it to GitHub doesn't necessarily make it open source. It makes it source available. But as you said, somebody who then tries to make use of it is potentially infringing, and that's why it is important to ensure that if you want other people to be able to take advantage of it, you specifically select a license that provides whatever allowances you want to afford to people. And so on that note, anybody who has used open source or closed source or worked in either of those environments has most likely come across some of these various licenses.
And what may not necessarily be immediately obvious is that there are general degrees of compatibility between those licenses. So I'm wondering if you can give some guiding principles for determining which licenses are in conflict with each other, and things that developers should be thinking about as they work with differently licensed software, either as an application or project or as dependencies within their own projects.
[00:12:59] Unknown:
Right. Well, actually, before I go into that one, Tobias, I think you said something really important there that's worth calling out and repeating, because I think it's actually the single most common misconception amongst developers today, which is: hey, I put this on the Internet, of course anybody can use it. Or on the flip side, I found it on the Internet, of course I can use it. The default of all copyright systems is that you cannot use it. Right? So just because you found it on the Internet does not mean that that default has changed.
Nobody loses their rights as the authors just because they put it on the Internet. And as a sort of legal baseline rule, that's not just code. That's also images, audio, video. The default is, hey, just because it's on the Internet, somebody may still have rights in it. And confusion over that, surprise over that, is a source of all kinds of pain. Right? It's a source of confusion for developers who wrote code because they're like, oh, I thought, like you said, hey, I put it on GitHub. That's all I need to do. Right? And until recently, GitHub sorta encouraged that. These days, if you start a new project on GitHub, thankfully, because of some great people on their team, it will tell you, hey, let's pick a license.
You know, but there's still plenty of old code out there that predates that and doesn't have a license. And if you don't see a license, you do need to think about, like, hey. Can I grab this? If so, under what terms?
[00:14:32] Unknown:
And it's probably worth calling out too at this point, before we go much further, that while you are a lawyer, you are not necessarily the specific lawyer of anybody who's listening to this podcast. So if you have specific legal questions, you should seek legal counsel on your own terms.
[00:14:52] Unknown:
You know, my team gave me a lawyer hat that hangs behind me in my office. It's a Krispy Kreme doughnut hat. I am not wearing the lawyer hat right now. So, Tobias, you're definitely correct. I'm not your lawyer. This is not legal advice. But I think that's actually an interesting topic, again, for another day. Lawyering as a profession has not really caught up with the Internet, and this is one of those ways in which it has not done that, which is a little too bad. Right? Yeah. So, I mean, let's jump right in. Actually, you know, I made that point about how GitHub now gives you a selection of licenses that you can choose, and that's a great jumping off point for talking about what are the different types of licenses, why would you choose one type of license over another, and how do they work together? I think the best way to think about it is that there are essentially four categories of licenses.
The first and probably most common 1 is what are sometimes called permissive or academic licenses, and those are basically you can do just about anything as long as you give us credit. Right? So that's, licenses like BSD, MIT, Apache, lots and lots of projects licensed under these licenses. And, the bottom line for those is, hey. You need to continue to tell people that parts of this code are under a BSD, MIT, or Apache license. You need to say who the authors were if the authors pass along that information to you. And, otherwise, go to town. Have fun. Do whatever you want.
In the Apache case, there's the slight caveat of please don't sue us for patents, but I would hope that none of your listeners are running around suing people for patents. If you are, you might have some Apache issues. Keep an eye on it. So that's one category, permissive. The next category is sometimes called mild copyleft or weak copyleft. Copyleft as a category essentially says you can do anything you want as long as you share your modifications with the people you're giving it to. So, historically, that was like, hey. You used my code in, like, a printer driver.
You have to give people who bought the printer a copy of the code so they can modify it and improve the printer driver themselves. So that was the goal of copyleft: to spread software freedom so that anybody could modify anything. And within that big category of copyleft, there's different degrees, which basically boil down to how much code do you have to share and under what circumstances do you have to share it. The weak or mild copyleft licenses are most famously the Library GPL, or Library General Public License (LGPL). But also included in that category are the Eclipse license and the Mozilla license, two other big popular licenses.
They basically say, and I think this is the easiest way to think of them in the modern context, that if you're modifying the core software, you have to share modifications to that. But if you're doing something like building a plugin, then you don't have to share that. You can think about that in the Mozilla context. It was originally written for the web browser. The idea was, hey, if you want a proprietary web browser plugin, that's fine. No restrictions on that. But if you're modifying the core browser, you have to be ready to give that back to the Mozilla community. And so that's a mild copyleft that says under some circumstances you have to give back, and under other circumstances you don't. The third category is the strong, traditional copylefts.
The most popular example of that is the General Public License. That's what the Linux kernel is under. That's what MySQL is under. A whole slew of different projects are under that license. And that basically says that for any modifications or additions that you make to this code, when you distribute the resulting binaries to somebody, you've also got to give them access to that code. And that copyleft was designed in the sort of early to mid nineties and was really popular and really impactful when you were giving everybody binaries.
Right? So, like, my TV, I think, was the first time I'd ever seen the GPL in the wild. I bought it in 2004, and I was flipping through the manual, and there was a big Linux kernel penguin in my manual. And there was a copy of the General Public License. Panasonic was legally obliged under the terms of the GPL to tell me that there was a copy of the Linux kernel inside my TV, which was a great surprise in 2004, less so these days. And they were legally obliged to tell me that I could call them up and say, please send me a copy of all the source code that's inside my TV.
And that was a result of the copyleft license on the Linux kernel. But as you may have noticed here in 2019, we really don't go around sending binaries to people very much anymore. Right? The most interesting stuff lives on servers, and that's where the fourth category of licenses comes in, and that's what's often called network copylefts. The idea there is that if you're offering people the functionality over the Internet, then you have to make them aware of your changes and give them access to source code under certain circumstances. I guess the canonical example of that is MongoDB, which was under what's called the Affero General Public License, the AGPL, which is really the only widely used network copyleft. There are other ones, but it's the most common example.
And so Mongo for a long time was like, hey. Mongo is available under the AGPL. If you don't modify it, that's fine. You can use it however you want. But if you start to modify it, then you have to tell people who interact with MongoDB over the network that MongoDB is part of the system, and you have to make available to those users the modifications that you made to Mongo. You know, the generous interpretation of those kinds of licenses is, hey. Again, software freedom. We're empowering users to make modifications to the software that they use. The less generous interpretation is that Mongo and others wanted to make it a bit of a pain in the neck for people to use the software so that they would pay for a different license.
And that's been a not super common business model in open source, but it's definitely been becoming a more prevalent one as services are becoming more dominant in the software landscape. Right? So, again, to tie that back together, there are basically four categories. Permissive: do whatever you want. That's Apache, MIT, BSD. Mild copyleft: only share under certain conditions. That's Eclipse, Mozilla, and the Lesser (Library) GPL. Traditional copyleft, the GPL: share when you share a binary. And then network copyleft, most prominently the Affero GPL: share when anybody connects over a network.
And those are the 4 big classes.
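The four categories summarized above can be written down as a small lookup table. This is a minimal sketch, not legal advice. The use of SPDX-style identifiers and the specific license versions listed are assumptions made for illustration, since the conversation only names license families, and the names `Category`, `SHARING_TRIGGER`, `LICENSE_CATEGORY`, and `describe` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    PERMISSIVE = "permissive"
    WEAK_COPYLEFT = "weak copyleft"
    STRONG_COPYLEFT = "strong copyleft"
    NETWORK_COPYLEFT = "network copyleft"


# When each category asks you to share source, paraphrasing the discussion.
SHARING_TRIGGER = {
    Category.PERMISSIVE: "no source sharing required; keep credit and notices",
    Category.WEAK_COPYLEFT: "share changes to the licensed code itself",
    Category.STRONG_COPYLEFT: "share source when you distribute binaries",
    Category.NETWORK_COPYLEFT: "share source when users connect over a network",
}

# License families named in the conversation. The identifiers and versions
# are assumptions chosen for illustration.
LICENSE_CATEGORY = {
    "MIT": Category.PERMISSIVE,
    "BSD-3-Clause": Category.PERMISSIVE,
    "Apache-2.0": Category.PERMISSIVE,
    "LGPL-3.0-only": Category.WEAK_COPYLEFT,
    "MPL-2.0": Category.WEAK_COPYLEFT,
    "EPL-2.0": Category.WEAK_COPYLEFT,
    "GPL-2.0-only": Category.STRONG_COPYLEFT,
    "AGPL-3.0-only": Category.NETWORK_COPYLEFT,
}


def describe(license_id):
    """Return a one-line summary for a license identifier."""
    category = LICENSE_CATEGORY.get(license_id)
    if category is None:
        return f"{license_id}: not in this table; read the license text"
    return f"{license_id}: {category.value}; {SHARING_TRIGGER[category]}"
```

Calling `describe("GPL-2.0-only")` returns the strong copyleft summary, while an identifier missing from the table falls through to the reminder to read the license itself.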
[00:22:45] Unknown:
And this is probably a good time to call out some of the recent activity that's been going on with the advent of cloud providers and them providing services that are based on some of these open source projects where, again, MongoDB has introduced, as well as Redis, the idea of the server side public license, which tries to add some additional restrictions on service providers to try and protect some of the interests of the companies behind these open source projects. So I don't know if you want to talk to that a little bit before we move on to some of the other conversation.
[00:23:20] Unknown:
Yeah. I mean, I think that's really interesting. You know, I mentioned the GPL in the network context. The sort of traditional copylefts are battle tested. Right? There has been litigation over these. People have sued each other trying to figure out what the GPL means. There was even a case in China the other day about what the GPL means. So people have a pretty good understanding of what the GPL means. But we also know that one of the meanings is that if you try to use it to protect a network service, you basically don't get anything. In a lot of senses, it has exactly the same impact as a permissive license. And so there's been experimentation.
The AGPL was sort of the first experiment along those lines of, hey, we're offering a SaaS service. We still want people to share in the code. And the AGPL for various reasons has not gotten a lot of adoption, and it's not really well understood. You know, part of that is the politics behind it. Part of it is just that it's a very long, complex license, and so even people who are well intentioned about trying to understand it can get confused by it. And so there has been in the news these past few weeks, as you've said, past few months, a bunch of companies trying to make it harder for other people.
I mean, like I was saying earlier, there's sort of the generous interpretation: they're trying to get more people to share in their community and build together with them. The less generous interpretation is simply that they're trying to make it harder for other people to make a profit off of their code. And so there's been much discussion over whether or not these things are open in, like, a meaningful sense. One thing that your listeners may not be aware of is that in the late nineties, when open source was literally being defined, a group was started called the Open Source Initiative, which I'm a former board member of. And the Open Source Initiative created an open source definition. It's basically a set of 10 rules that you're supposed to be able to apply to a license and say, hey. Is this license an open source license? The answer is yes if it passes all 10 of these rules, and the answer is no if it doesn't pass those rules.
And I would say most of these new licenses do not pass those rules. And so it's pretty clear that they're not open source. You might call them source available. There's a couple other catchy phrases people are trying to get to stick with them. I think probably the most useful one is that they're source available. But if you see one of them, you need to be aware that they're not open source, and you may not have all the permissions that you would traditionally think you have under an open source license. They're so new. Like, normally, I hate to do the, you need to talk to a lawyer, but these are so new and the intentions behind them, I think, are opaque enough that if you see one in the wild and your company wants to use something that's under one of these licenses, you really need to talk to a lawyer sooner rather than later, because the last thing on earth you wanna do is use one of these things and then find out later, actually, it's not open. There's a whole bunch of restrictions here that my company can't deal with. It's interesting. I mean, I actually think it's a good thing that there's experimentation here because it's clear that the traditional copyleft model isn't working in 2019 because of all these services. So I'm glad that people are trying new things, but with the current crop, I think mostly it's just increasing friction, and it's not growing the pie. Right? And I think that's what we really wanna do, hopefully, in open source.
Why open source has been so successful, right, is that it tore down a lot of barriers and removed a lot of friction for people. And these new licenses are really all about adding in more friction and trying to claim a slice of a small pie, and I don't think that really works.
[00:27:43] Unknown:
And to complicate matters more, there are these general categories of open source licenses, with some different variations within the four subcategories that you mentioned. But then there's also commercial licensing, which can have its own variations. And, again, the interactions that these different licenses have might bring along different issues with compliance, and there are a number of companies that refuse to use some of the strong or weak copyleft licenses because of the virality that they bring along with them. So in an organization, as developers or project managers or team leaders, who is ultimately responsible for ensuring compliance with the various software and content licensing that might be used within a given project?
[00:28:38] Unknown:
I mean, you know, as when you're talking about it, there's there's 2 different sides you can look at that problem. Right? There's like, hey. I'm a maintainer publishing a piece of open source. Who's responsible for compliance? And, ultimately, it's really up to the maintainer. Right? You wanna think about we actually sort of skipped over the compliance, the compatibility question, and we can get back to that in a second. But the you know, that that's really a maintainer who wants to be a thoughtful maintainer and think about how their users are using things, does have to think about what's my license, what are my dependencies.
As with everything for better or for worse, that that burden falls on the maintainer when you're publishing software. When you're using software, boy, that's a complex answer because it it varies so much based on are you IBM, or are you a small start up? Right? If you're at a big company, you know, a lot of them have entire teams. The common name for it is an open source program office that will do things like scan your products, understand what's actually being used in them, and check and see the compliance is happening, And those open source, program offices will have different mechanisms for how they do that. Sometimes they'll have, like, a blessed list of open source. You can only use these things. Sometimes they'll have, as you hinted at, a blessed list of licenses.
You know? But if you're at a smaller company, one that's maybe not so regimented, a lot of that will, in practice, fall to the developers. Right? This is tough, because certainly in, like, a PyPI or npm kinda world, it's easy to, you know, pip install and all of a sudden you've got 30 new libraries that you didn't know about, or npm install and all of a sudden you've got 300 new libraries you didn't know about. That can be tough. Right? And, yeah, there are a variety of tools out there that can help your company say, hey, we don't want license x.
Please flag every time it goes into continuous integration, right, if if license x is there. I mean, so it's certainly if you're a developer who's concerned about this or say you're a startup cofounder, you can look at tools that help you understand what you're actually taking in. Right? Because that is otherwise I will say this used to be a lot easier when, essentially, if you were building a if you were building a website in 1998, it was like, okay, Linux kernel, pick my database, pick my p pick my language with a p in front of it. Right? Perl or Python, and suddenly and and those were the licenses you had to think about. Right? There wasn't a whole lot else. And now you do hello world, and it's like I think I did a test on a hello world in 1 of the, like, JavaScript libraries, and it had, like, 923 dependencies. You know, you can't reasonably ask individual developers to, like, read the licenses on all 923 of those. It's gotta be tool assisted.
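As a rough illustration of the tool-assisted check described above, here is a minimal Python sketch that flags installed packages whose self-declared license is on a deny list. The `DENIED` set and the function names `flag_denied` and `installed_licenses` are assumptions made up for this example, not anything named in the episode. The sketch only reads the free-text `License` metadata field, which many packages leave empty or fill in inconsistently, so a real scanner would need to do considerably more.

```python
from importlib import metadata

# Hypothetical policy: license strings this (imaginary) company will not accept.
DENIED = {"AGPL-3.0", "SSPL-1.0"}


def flag_denied(licenses, denied=DENIED):
    """Return the sorted package names whose declared license is on the deny list.

    `licenses` maps package name -> declared license string (or None).
    Matching is case-insensitive and ignores surrounding whitespace.
    """
    denied_lower = {d.lower() for d in denied}
    return sorted(
        name
        for name, lic in licenses.items()
        if lic and lic.strip().lower() in denied_lower
    )


def installed_licenses():
    """Collect the self-declared License field of every installed distribution."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name] = dist.metadata.get("License")
    return found
```

In a continuous integration job, one could call `flag_denied(installed_licenses())` and fail the build whenever the returned list is non-empty.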
And so there's a bunch of options out there these days that can help you with that.
[00:31:49] Unknown:
And when releasing a project as open source in particular, what are some of the general guidelines that you've found to be useful, or questions to be asked, for determining which license to release your project under?
[00:32:04] Unknown:
So the first one is, what's the business goal? Right? And I say business here in, like, the broadest sense. So, like, why am I releasing this as open source instead of keeping it to myself? And if the answer to that is, like, hey, I just want the world to have this, it's my gift to the world, use a permissive license. Right? And I'll get to in a second how you choose which permissive license, but I would say that's the basic answer. Right? If there's some more sophisticated business answer, like, for example, we think this is a loss leader for our SaaS product, or it's an API wrapper that helps you use our SaaS product, then you have to start being more careful. You have to pay more attention to detail. You have to think about, well, do I wanna take this proprietary? Like, maybe someday down the road, I wanna be able to take this proprietary, or I don't want competitors using this exact API.
So I want to protect from competitors. And when there's, like, a real business consideration around who might use it and how you think about your competitors, then you need to think about, especially in 2019, I tell people, look at the Affero GPL. Look at licenses that actually have some kind of copyleft effect, because those have the good feature of encouraging people to contribute, because people actually like it when there's not free riding, and they have the negative feature of driving away competitors. Right? People who are concerned about trying to compete with you are gonna be very careful about using a copyleft license that is published by a competitor.
You know, that's the that's the highest level. I would say the second level thing is simply look at what your community is using. Right? Like, if you're aiming to be a Java Eclipse project, you should probably use the the Eclipse license. If you're an Apache project, you should probably use it. You are gonna be required, mandated to use the Apache license. You know, Python, a lot of stuff in PyPI is BSD, so you probably wanna use BSD. You know, looking at who your peers are, because a lot of this is about expectations management. Right? 1 thing, if your goal is for people to use this stuff, you want them to not be surprised by the license.
And by the way, related to this idea of not being surprised, the Open Source Initiative has approved, like, 70 licenses, more or less, over the course of the two decades it's been in business. I should clarify, when I say in business: it's a nonprofit. The board is elected by the public. So, you know, "in business" might be misleading if you're not familiar with OSI. But the OSI has probably, like, 70 licenses. The vast majority of them, you should never touch. Right? All they're gonna do is create confusion and surprise in your users, and that's the last thing on earth you wanna do. The whole point of using an open source license is to simplify, reduce friction, and get people to join in.
And so that's why I'm really only talking about... I think I have said, besides SSPL, one, two, three, four, you know, six or seven licenses today, and that's really all you should ever think about using: one of those six or seven most popular ones.
[00:35:52] Unknown:
And another aspect of licensing and copyright that developers might encounter in the process of working with open source in particular is the idea of a contributor license agreement, where you are assigning copyright of your work as a contributor to the owner and maintainer of the project so that it's easier for them to possibly relicense the software if they so choose at some point down the road. So I'm curious, what are some of the things that developers should be looking at as they are reviewing and potentially signing these contributor license agreements? Or, as project maintainers, what are some of the cases where you want to implement this type of agreement?
[00:36:33] Unknown:
There's actually two things that are done by the average CLA. And, historically, those were in the same document, and at some point in the past 10 years or so, those got refactored into two different things. So let me split those out. One is, like you said, assign the copyright and give the central maintainer the opportunity to relicense. And that's a two-edged sword. We'll get to that in a second. The other thing that CLAs traditionally did was give the central maintainer reassurance that you actually have permission to contribute the things you were contributing.
Right? So let's say, for example, so, like, the Apache CLA has language in there where you you cross your fingers, hope to die, you say, yeah. I promise I didn't just copy and paste this from some random person. Right? In fact, I know I have permission to do it, and I am am promising to you that I have permission to do it. And so those 2 roles, historically, were both in the same document. Right? Like, that was how Apache did it, and that was how the Free Software Foundation did it was that you promised to the Apache Foundation, you promised the Free Software Foundation that you had the rights, and then you assign those rights to those groups so that they could make changes as they wanted as they needed to. And then at some point, the Linux kernel community realized, like, actually, the promising to change part and the promising I have these rights part don't necessarily have to live in the same document.
And so what the Linux kernel community did, which has now been more widely adopted, is they essentially refactored that. The Linux kernel knew they were never gonna change their license, so they didn't need that part of it. And so they refactored, threw that away, and came up with the Developer Certificate of Origin, or, as you might hear it abbreviated, the DCO. I think the website is developercertificate.org. And it basically just says, I have permission to contribute this code to the Linux kernel, either because I know I wrote it myself or because I got it from somebody who gave me the permission to do it. Right?
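In practice, projects that use the DCO typically ask contributors to add a `Signed-off-by:` line to each commit message, which `git commit -s` does automatically. A minimal Python sketch of how a project might check for that trailer follows. The function names and the simplified email pattern are illustrative assumptions, not any project's actual tooling.

```python
import re

# Trailer line in the form that `git commit -s` appends to a commit message.
# The email pattern is deliberately loose; it is an assumption, not a validator.
SIGNOFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>@\s]+@[^<>\s]+)>$",
    re.MULTILINE,
)


def signoffs(message):
    """Return (name, email) pairs for every Signed-off-by trailer in a commit message."""
    return [(m.group("name"), m.group("email")) for m in SIGNOFF.finditer(message)]


def has_signoff(message):
    """True if the commit message carries at least one well-formed sign-off."""
    return bool(signoffs(message))
```

A check like this only confirms that the line is present; the value of the DCO still comes from the contributor having actually thought about where the code came from.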
And so a developer certificate is a pretty nice idea because it gives, it gives people some certainty. They're like, oh, I people the the contributors to my project have thought about it a little bit. Like, not a ton. I mean, if you read it, it is it is the lightest weight quasi legal document you are ever gonna read in your entire life. Very simple, very straightforward, and it conveys some very basic information of, hey. I own this. It's mine. I can give it away. And if you read that and you're concerned, you're like, oh, maybe I'm not maybe I don't own this, then that's actually a great time for you to talk to, who did you get this code from?
Did I write it on company time? Did my company lawyers know I'm contributing? Right? And the DCO is designed it's less about legal, can I say ass covering on this podcast? Yeah. It the the DCO is less about ass covering and more, about prompting people to have those discussions. Right? So yeah. So start with if you're a developer publishing a new project and you think it's gonna get like, you you think it's gonna be really big, consider adding a DCO. If you're a company publishing a project and you think you might need to change a license at some point, then that's when a CLA comes in. And on the flip side, if you're a user and you see a CLA, you should be a little skeptical, I think, right, at this point in time. Like, that is a sign that the company is thinking about how can I use this in a way that might not be the way you originally intended? So for example, if you contributed to I I hate to single single them out because I think they're a good company in a lot of ways, but they're a a top of mind example, Mongo. Right? So if you contributed a patch to Mongo, after they open source MongoDB, you signed a CLA, and the CLA said, hey. You assigned us the copyright, so we reserve the right to change the license at any point.
And that if you really liked the AGPL, that should be a warning sign for you. Right? That maybe they're gonna wake up 1 day, decide they don't like it, and they're gonna change the license to maybe something proprietary or something like, as it turned out, the SSPL. And they're able to do that because contributors signed the CLA. And, you know, that's not necessarily a terrible thing. Right? So for example, let's say that Apache finds legal documents have bugs too. Right? And sometimes they have to be revised because of bugs. In fact, 1 of my favorite programming analogies for, for programmers and and and lawyers is that just like programs rely on libraries and sometimes the underlying behavior of the library can change in a way that you, the programmer, have to fix in your, you know, in your layer of the application.
Underlying laws are libraries that contract drafters depend on. Right? So if, like, Congress wakes up tomorrow (we're recording this on Friday, so they're gonna do even less tomorrow than they do on a normal day), but if they woke up tomorrow and decided to change copyright law, Apache might have to change its license. Right? Like, they might have to do something that counters how the license acts in response to that. Right? And, in fact, this is something we had to do at Mozilla, where there was some language that was maybe a little vague in MPL 1.1, and there was a court ruling that came out, actually in a case between Microsoft and Sun Microsystems.
And the court said, well, actually, this word that you think means this one thing, under certain circumstances, might mean a different thing. And we were using that word in MPL 1.1, and so for MPL 2, a lawyer pointed out, like, hey, you should look at this case, because it means that under certain circumstances this license may not mean quite what you thought it meant. And so we were able to make that change in MPL 2 to make sure that the situation was much more clear than it had been after that court case. Right? So a CLA can be useful in that case for a group like Apache or FSF or Mozilla.
Like, if they have to change the license, having a CLA can be really helpful for them. So the thing with the CLA at the end of the day is, who are you assigning that CLA to? What are their motivations? Why do you trust them? And the answer for that is gonna be very different for Apache or Eclipse than it is for a private company. And the topic
[00:43:39] Unknown:
of changes in copyright law, particularly with the US and the DMCA or Digital Millennium Copyright Act is particularly relevant for some of the other types of licensing concerns that developers might come across when using various images or audio or video media as part of the end result of their project. So wondering if you can talk a bit about some of the different types of licenses, both the Creative Commons and some of the different, levels of implied copyright for, creative works, and what types of things developers should be thinking of as they are incorporating these various media assets in their projects? You know,
[00:44:22] Unknown:
that's a great question. Boy, we could almost do an entire podcast on that 1, so I'll try to be briefer than I have been. There is, you know, as with as I said at the very beginning, right, copyright adheres from the moment the thing is created. So if somebody does a cute little sound sample and publishes that on the web, it's copyrighted. If somebody does a nice little stock photo and puts that on the web, it's copyrighted. So you do have to think about these things when you're incorporating, when when you're incorporating things. Right? The good news is is that a lot of the sort of basic things you would want to use already grant you a license. Right? So for example, if you're building an Android app, the Android SDK gives you a license to stock Android icons that you're gonna use in your Android app. Right?
So the things that come with the operating system, you typically don't have to think very hard about. They're usually licensed to you in a way that, is available for you to use. Then past that, you need to start thinking about things like, the most common set of licenses, as you mentioned, is the Creative Commons license. Creative Commons family of licenses. People often make the same mistake I just made and speak of Creative Commons as if it's 1 license. There's actually different licenses that go back to that permissive versus copy left, versus sort of noncommercial split that I was talking about. Right? So there's Creative Commons attribution where all you have to do is give attribution.
There's Creative Commons ShareAlike, where people have to be able to change and modify the photo or music or whatever it is. And there's even Creative Commons NonCommercial, which prohibits commercial use of the work. So there's definitely folks out there who publish icons or photos, for example, under Creative Commons NonCommercial, so that you can look at them and you could use them, for example, in a fun project for yourself, but you couldn't necessarily use them in a project for your business. So, you know, that's a big world. I'll try to be brief on that. I will say there's a lot of great resources out there. If you're trying to use icons, there's a thing called the Noun Project, which, at least for certain of their icons, has Creative Commons licenses on them.
There's a lot of freely licensed music out there on SoundCloud. You do have to check and see what the license is, because a lot of stuff on SoundCloud is not Creative Commons licensed, but some of it is. You know, Flickr and Wikimedia Commons, if you're looking for images, are both great sources. And Creative Commons these days even has a search engine that allows you to search for things, so that you know they're Creative Commons licensed when they come from the Creative Commons search engine.
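To make the family-of-licenses point concrete, here is a small Python sketch that breaks a Creative Commons code such as `CC BY-NC-SA` into the conditions it carries: BY requires attribution, NC rules out commercial use, SA requires adaptations to be shared under the same terms, and ND rules out sharing adaptations. The function name `cc_terms` and the returned dictionary keys are assumptions for illustration, and the sketch ignores license version numbers and jurisdiction details.

```python
def cc_terms(code):
    """Split a Creative Commons code like 'CC BY-NC-SA' into the conditions it imposes."""
    label = code.strip().upper()
    if label == "CC0":
        # CC0 is a public domain dedication and carries none of the elements.
        elements = set()
    else:
        if label.startswith("CC"):
            label = label[2:]
        elements = set(label.strip(" -").split("-"))
    unknown = elements - {"BY", "NC", "SA", "ND"}
    if unknown:
        raise ValueError(f"unrecognized license code: {code!r}")
    return {
        "attribution_required": "BY" in elements,
        "commercial_use_allowed": "NC" not in elements,
        "share_alike": "SA" in elements,
        "derivatives_allowed": "ND" not in elements,
    }
```

For an asset in a business project, the field to look at first would be `commercial_use_allowed`, followed by whatever attribution the license requires.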
[00:47:25] Unknown:
And I'll just say too that the intro and outro music for this and my other podcast both came from the free music archive, which is all creative commons licensed
[00:47:35] Unknown:
audio. So that's another good resource for anybody who's looking. Yeah. Yeah. There's a there's a growing amount of that stuff out there. And I will say the 1 thing that, you know, I I I love that you gave that credit. Right? Because it is a big difference between open culture and open source is that everybody really knows where to get open source. At the end of the day these days, it's GitHub, and there's there's no central sort of hub for finding information, for collaborating on open culture. And that fragmentation is on the 1 hand, it's a source of richness and diversity, but on the other hand, it's hard to just point somebody at, like, oh, just there's no equivalent of, oh, just go to GitHub. Right?
Wikipedia might be the closest to it, but, you you know, but that doesn't have a ton of, like, sound, for example, or icons.
[00:48:23] Unknown:
And then the other major category of resource that gets used in software systems is the data that we either generate via use of the systems that we build or that we might pull in via different APIs that could either be open or proprietary or paid or by licensing different datasets. So I don't know if you want to talk a bit about some of the considerations that developers and systems builders should be thinking about when they're interacting with and creating and curating and releasing these various data resources?
[00:48:59] Unknown:
Well, you need to think about a couple things there. So 1 is most data that you're gonna get well, there there's a couple different ways that you can get data. Right? 1 is you can scrape the web. And, there's it's interesting. There's actually not a ton, I would say, of case law these days on whether or not scraping the web is legal for use in things like machine learning, but a lot of people do it anyway. Your company if you're doing that for, like, a company, you would wanna talk to your company's lawyers about what their level of tolerance is for that. Again, you can't just assume that because stuff is on the Internet, you can just scrape it all. Though our friends at Google have done a lot of have spent a lot of lawyer money to make sure that that is relatively protected at least in the US. So scraping is is 1 way of doing that, and it's, you know, it's possible, but be careful.
Another way of getting it, as you mentioned, APIs. Unfortunately, there's not a ton of standardization out there. Right? 1 thing that's been interesting is the way open source has evolved, the vast majority of open source is covered by essentially those 6 or 7 licenses that I mentioned earlier. Whereas for data APIs, there's no real standardization. And so that means, unfortunately, you have to look carefully at the terms of every API that you're using to understand what you can and can't do, how you can and can't aggregate it. And that's important not just from a copyright perspective because a lot of the data that you may be getting out of an API may be protectable under copyright, so there may be things there. There's also your company probably has a contract with the API provider, so there's potentially problems with, did you violate the contract or not?
And there's also considerations for things like privacy. Right? So, you know, thankfully, when we're dealing with just code, there's typically not a whole lot of privacy considerations. But when you are dealing with something like a database of usernames or a database of emails, how you integrate that data with other data sources and how you use it may have really big privacy law implications. And so that's not something, you know, even the reading the contract, reading the license can't always tell you about those privacy implications and especially for, companies that are doing work with Europe these days because Europe is finally enforcing rules on data and privacy in ways that the US has not yet. That is, you know, that's a real challenge.
And unfortunately, there's no one size fits all. I suspect we will see more efforts to standardize that licensing in the future so that there's less overhead, but it's definitely still a wild west right now, and you just have to be careful about it. It's great for me as a lawyer. It means lots of people have questions they wanna pay you for, but it's also frustrating because I can't give you one clear answer.
[00:52:05] Unknown:
And as you mentioned at the beginning, you are working to help provide an avenue for sustainability for open source developers and maintainers. And in the process, I'm sure you come across a lot of questions and confusion about the finer points of intellectual property and software licensing. So I'm wondering what you have found to be some of the most common difficulties or points of confusion that you encounter in working both with developers on your platform and with customers and organizations who are being onboarded to contract for support for the different libraries that they use.
[00:52:51] Unknown:
You know, the single biggest, like, both fun and horrifying surprise to me is... so, there's a lot of companies out there that will go around and scan all of your source code for you and tell you, down to the per-file or in some cases even the per-function level, like, we have seen this function on Stack Overflow before; maybe you should get a license from Stack Overflow for this particular function. Right? Or, it looks like this file was copied and pasted out of some other program; the other program was under this license; the new file doesn't reflect that license. Right? And so for years you've been able to do that, and, unfortunately, because it's so fine grained, it ends up with a lot of false positives. I mean, again, when you literally get down to the function level, it turns out everybody's written that function before. Like, whatever function you think you're writing right now, you're like, oh, this function is the greatest, most unique, most individual function ever created. It turns out somebody else has already written that function, if you search over enough source code. Right? So you get a lot of false positives.
So we've tried to turn that on its head, and instead of looking down at the lowest, most fine grain level, we've been looking at the top level. So GitHub's API, GitHub will try to detect the license in a source code repository, and it and if it can find a license, it'll report that out through the API. Similarly, PyPI, NPM, etcetera, will, you know, have in their metadata, have a, a field that allows the package maintainer to specify what the license is. You would think that those 2 like, this is literally the highest level of metadata that you can get about, licensing.
You would think you would that the source code repo and the package repo would match most of the time. Nope. Like, depending on how you wanna count, like, 70, 80% of packages at most, they actually match up. So, like, even at that most basic level, people are making mistakes both large and small. And so talking through all of the different ways in which that can get screwed up has been a really interesting that's been an on ramp for conversations with maintainers about how to educate them so that they're building something that is more useful to corporate customers. Right? And that's the kind of thing we're enjoying paying maintainers to do where it's, like, something that is valuable for corporate customers because the corporate customers can then build tools that actually reliably use those APIs.
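The repo-versus-package comparison described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `ALIASES` table, the function names, and the idea that the two license strings have already been fetched (for example from a source host's API and from a package index's metadata). It is not Tidelift's actual method.

```python
# Hypothetical alias table mapping common free-text spellings to one identifier.
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache-2.0": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
}


def normalize(raw):
    """Map a free-text license string to a canonical identifier where one is known."""
    if not raw:
        return None
    key = " ".join(raw.lower().split())
    return ALIASES.get(key, raw.strip())


def license_mismatches(records):
    """Return names of packages whose repo license and package license disagree.

    `records` is an iterable of (package_name, repo_license, package_license).
    A missing license on either side counts as a mismatch.
    """
    mismatched = []
    for name, repo_license, package_license in records:
        repo_norm = normalize(repo_license)
        pkg_norm = normalize(package_license)
        if repo_norm is None or pkg_norm is None or repo_norm != pkg_norm:
            mismatched.append(name)
    return mismatched
```

Even this toy version shows why the numbers come out low: a good share of the disagreement is just spelling, and the rest is the genuinely interesting cases where the two sources name different licenses or one of them names none.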
And on the developer side, it's like, oh, you got paid to fix something that was actually pretty simple and basic once you knew what was going wrong, and you provided value to enterprises. Right? And so that's Tidelift's whole thing: we sort of sit in the middle and we aggregate, and we tell people, like, hey, we can get problems fixed for both sides and get some money to flow, so that people can get paid for core maintenance, not just for consulting or working for Google or Facebook.
And for anybody who is interested in learning more about intellectual property, particularly as it pertains to software licensing and open source projects, what are some of the most useful resources that you would recommend?
You know, at book length, if you really wanna get deep, it turns out in the Python community you've already got one of the best people in the world on this in Van Lindberg, who is on the PSF board and, you know, has done a lot of great speaking at PyCons.
His book, Intellectual Property and Open Source: A Practical Guide to Protecting Code, is about 10 years old now, but most of the material is still pretty timely, and it is a great resource for engineers, because Van, like me, was an engineer before he was a lawyer. And so he gets really deep into a lot of these programming analogies: how does legal code work in the same way as source code, and how and when are those analogies dangerous? His book is really great on that. You know, there's definitely a lot of other great resources out there. TLDRLegal is one, for example, that explains software licenses in relatively plain English. Wikipedia is actually another pretty good resource for a lot of licensing information. Because of the overlaps between the open source and open culture communities, the information on Wikipedia about those licenses is pretty reliable.
So yeah. So I think those are all good, all good resources.
[00:57:25] Unknown:
Alright. Are there any other aspects of software licensing and copyright and intellectual property law that we didn't cover yet that you think we should discuss before we close out the show?
[00:57:36] Unknown:
You know, I think it's a pretty rich field, so I could go on for hours. I don't think I'll bore you too much. Actually, I will say one thing that's really exciting, that everybody in the space should be really excited about, is that this year, for the first time in a couple of decades, new works were added to the public domain, because we did not pass a new copyright law that extended the length of copyright. And that's not super directly related to open source, but the idea that the commons is growing because things from our history are being added is something that I think should be exciting for everybody who has seen how valuable giving it away can be.
[00:58:19] Unknown:
Yeah. It definitely adds a lot more opportunity for people to incorporate the newly available creative works into their own projects, whether it's other creative projects or software or using some of the metadata from those works in creating various analyses. So, yeah, it's definitely great to see something like that become available so that more people can take advantage of it who wouldn't otherwise be able to, whether because of monetary or, you know, geographical restrictions, things like that.
[00:58:48] Unknown:
You know, and that actually reminds me of one other thing, Tobias, that I think is something that probably all your listeners should be thinking about and reading about. You know, we talked earlier about data gathering for machine learning and from APIs, and I think it's an increasingly important part of our role as computer scientists, as ethical computer scientists, to think about where that data comes from and what the nature of that data is. Right? Our algorithms, the things that we train our machine learning algorithms to decide, can have a lot of biases, both more and less obvious. And the data that we ingest to train those algorithms on can be a key source of bias. Right? So for example, and this is what reminded me, the talk about old books.
So there's a big there's a big machine learning corpus that tries to do essentially word analogies. Right? Like, words can be more or less associated, and this has been trained on a big corpus of old books. And it turns out old books, while nice because they're available in the public domain, only ever talk about this is just 1 example. Doctors are always men and women are always nurses. And so if you train your machine learning on the corpus of old books, your machine learning is gonna be sexist. And we won't even get into how racist your machine learning algorithm is gonna be if you train it only on old datasets. And so I think this is 1 of these big the intersection of intellectual property and what datasets we allow people to use and don't use, what are the contents and quality of those datasets is something that I think every programmer is gonna have to become more familiar with because we, as programmers, are gonna have to be the ones to raise the red flag and say, hey. I'm not comfortable with the quality of this dataset because I think it's got some biases in it. If we don't, if we don't do that, you know, not a lot of people are.
And I can give a link to a good paper on this topic, by the way, to your readers. I'll hand it along to you for the written material.
[01:00:51] Unknown:
Yeah. We'll add that to the show notes. For anyone who wants to follow along with the work that you're up to and get in touch, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose a movie that I watched over the weekend, Spider-Man: Into the Spider-Verse. It was a lot of fun, very interesting, very visually appealing, a slightly different take on some of the Spider-Man mythos. So it was definitely worth a watch, and pretty broadly accessible to different age groups. So if you're looking for a movie to watch, that's definitely one worth checking out. And so with that, I'll pass it to you, Luis. Do you have anything to pick this week?
[01:01:43] Unknown:
Well, you know, I could be the boring guy who says The Good Place just like everybody else. It feels like they're telling me I should watch The Good Place. It really is terrific. I'll throw out something that I read over Christmas break that was pretty great: Twitter and Tear Gas by, I'm gonna butcher her name, Zeynep Tufekci. She's originally Turkish, and now works at the University of North Carolina at Chapel Hill. And it's a book about tech and revolutions and how we talk to each other, and it's just an endlessly fascinating book that I think everybody in tech should read.
[01:02:08] Unknown:
Well, thank you very much for those recommendations and for taking the time today to help educate us all on some of the various aspects of software licensing and copyright and intellectual property that we should be thinking about. So I appreciate your time and effort on that, and I hope you enjoy the rest of your day.
Great. Thank you so much. It's been fun talking to folks.
Introduction and Sponsor Messages
Interview with Luis Villa: Introduction and Background
Overview of Intellectual Property Categories
Understanding Software Licenses and Compatibility
Recent Developments in Open Source Licensing
Ensuring Compliance with Software Licenses
Contributor License Agreements and Developer Certificates
Licensing for Creative Works and Media Assets
Considerations for Data Licensing and Privacy
Common Confusions in Intellectual Property and Licensing
Resources for Learning About Intellectual Property
Picks and Recommendations