Security, UX, and Sustainability For The Python Package Index

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, NodeBalancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computations, such as training machine learning models and running your CI/CD pipelines, they just launched dedicated CPU instances. They've also got worldwide data centers, including a new one in Toronto and one opening in Mumbai at the end of the year. So go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show.

For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference with upcoming events including the O'Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Your host as usual is Tobias Macey. And today I'm interviewing Nicole Harris and William Woodruff about the work they're doing on the PyPI service to improve the security and utility of the package repository that we all rely on. So, Nicole, can you start by introducing yourself?

Yeah. My name is Nicole Harris. I've been working on PyPI, or the Warehouse project, which is the code base that powers PyPI, for about three or four years now. In my day job, I manage a UX/UI team at a company called PeopleDoc. But in my spare time, I work on PyPI.

And, William, can you introduce yourself?

Sure. So my name is William Woodruff. I'm a security engineer with a small security consultancy called Trail of Bits. I've actually been working on Warehouse for only about five or six months now. We started the work back in March. During my day job, I sort of split my time between engineering and research. On the research side, I do program analysis research, mostly government funded. And on the engineering side, I work on mostly open source projects like Warehouse and osquery and things like that.

And going back to you, Nicole, do you remember how you first got introduced to Python?

So my background is in HTML, CSS, design, user interface. So Python wasn't the first technology that I was exposed to in terms of the web, but my husband is actually a Python developer. He started teaching himself programming by learning Django. So through him, basically, I got introduced to Python and also learned, you know, enough Python to be useful alongside my front end skills.

And, William, do you remember how you first got introduced to Python?

I think I used Python in a few university courses, but I didn't really start programming in earnest in it until I took this job. Before that, I mostly did C and Ruby, actually. So this has been sort of a nice turn for me.

And given the fact that you haven't been using it for your day to day, I'm curious how much effort it's been to get up to speed with the code base and be able to understand it and be effective with it, and how much of your experience with Ruby in particular

So I think, fortunately, the Warehouse code base, I'd like to say, is probably one of the nicest Python code bases I've worked on. It has, like, 100% unit test coverage, and the idioms of the frameworks that it uses are actually well preserved across the code base. So it was actually relatively easy to get up to speed. And, thankfully, I had both Nicole and everybody over on the PSF side, as well as Sumana, to answer my questions as they came up.

And so for both of you, I'm wondering if you can just start by sharing a bit about how you each got involved in working on the PyPI project and the main responsibilities that you have.

Yeah. So I can maybe start there. I think it must have been in 2015. Donald Stufft, who is the lead developer on Warehouse, which is the project powering PyPI, sent out, or I think he actually opened, a GitHub ticket that said, help, I need a designer. This is not something that I'm good at. You know, I'm rebuilding this thing and this is completely outside of my skill set, so, you know, please retweet.

And it was through one of my friends that I'd actually met at a Python conference that I kind of put my hand up and said, hi, I'm Nicole, this is what I do, and I think I can help you. So that's how I got involved, and my involvement has kind of extended from there. In terms of my responsibilities, I'm responsible for the UX and the UI, so the user experience and the user interface, as well as the HTML and the CSS code base for the Warehouse project. So a bit of both, a bit of coding and a bit of designing.

And, William, how about yourself?

Yeah. So on my side, I got involved via the current contract that I'm working on, which is the OTF-funded security improvements to Warehouse. My work has primarily revolved around four key changes to the Warehouse code base to improve both the ability of users to secure their accounts and the general security posture of the PyPI code base. I can talk about the specifics of those improvements as we go forward, but that was how I got started.

And particularly for you, Nicole, what was the state of the system at the time that you first began working on it and any of the notable issues that you were first faced with?

So I don't know if you're aware of the full history of PyPI. When I joined the project in, I think it was 2015 or 2016, pypi.org was still powered by an old code base that had been written, I think, before web frameworks even existed. I think Donald described it as before we even knew how to use Python to build great web experiences.

So in terms of the state of the ecosystem, there was this old code base that Donald really discouraged me from diving into. He was, like, look, don't look at it. It's not best practice. What we're going to do is rebuild this from scratch. So I had a fairly clean slate in terms of the user interface and, in fact, the HTML and the CSS code base.

Donald did have, I think, some Bootstrap templates that were working in the code base, but they weren't particularly finessed, let's say. They were basically just outputting data onto the screen. So I basically rebuilt that from scratch and made a whole lot of decisions about how we were going to structure, not so much the templates, but certainly the SCSS, because we're using Sass, so that the code base would be easy to maintain moving forward. If any of your listeners have experience working on large code bases with CSS, it can get out of control pretty quickly. So we needed to put that in from the beginning.

Yes. As I started working on Warehouse, one of the first things I looked at was the present security posture of the site and the various common weak points in package management, such as name squatting, project name reuse, or username reuse. Overall, as far as package managers or package indices go, Warehouse was in a pretty good state. For example, as I began work, it already supported preventing common typosquatting attacks on packages. And it had rate limiters and other mechanisms in place to prevent these really common low-level attacks against package indices.

The things that I ended up working on as part of the OTF-funded scope were things that are above and beyond the current norm for package indices. That would be two-factor authentication, API tokens, which surprisingly are not the norm for package indices, and the audit logging infrastructure.
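
As a rough illustration of the kind of typosquatting check mentioned above, the sketch below compares a new project name against existing names after collapsing separators and a few look-alike characters. The name normalization follows PEP 503; the look-alike table and the function names squat_key and conflicts_with_existing are assumptions made for this example, not Warehouse's actual implementation.

```python
import re
from typing import Iterable, List

# Hypothetical table of characters that are easy to confuse visually.
CONFUSABLES = str.maketrans({"l": "1", "i": "1", "o": "0"})


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def squat_key(name: str) -> str:
    """Collapse separators and look-alike characters into a comparison key."""
    collapsed = re.sub(r"[-_.]+", "", name).lower()
    return collapsed.translate(CONFUSABLES)


def conflicts_with_existing(candidate: str, existing: Iterable[str]) -> List[str]:
    """Return existing names that a new name could be mistaken for."""
    key = squat_key(candidate)
    return [
        name
        for name in existing
        # Same key but a different normalized name means a look-alike.
        if squat_key(name) == key and normalize(name) != normalize(candidate)
    ]


if __name__ == "__main__":
    print(conflicts_with_existing("djang0", ["django", "requests"]))
```

A real index would run a check like this at registration time and refuse or flag the new name when the returned list is not empty.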

And, Will, I understand that you have also worked on the Homebrew package manager. I'm wondering what your initial reactions were as you started digging into Warehouse, how it compared to your prior experience of working with other package managers, and some of the common security pitfalls that are germane to that particular type of application.

Mhmm. I will say I am probably Homebrew's current worst maintainer. I'm probably one of the least active ones. But the security issues that Homebrew has to deal with are, unfortunately, somewhat orthogonal to the traditional package management issues, primarily because Homebrew revolves around this central Git repository for all packages. So we actually have finer-grained control over both the integrity of packages and their origin, because we can see the Git committer as well as run CI checks iteratively as every package is updated. It all gets centralized in a way that, for example, PyPI can't necessarily do. But that being said, I think

And so for each of you, once you began working on the PyPI code base and working toward some of the initial issues, I'm curious if the problems that you were addressing were identified ahead of time, or what your overall approach was for determining the most critical and most important tasks to undertake to improve the overall security and user experience of the platform?

Yeah. So I can take this one. I think this relates to the way that this project has actually been funded. As well as being a contributing designer slash developer on PyPI, I'm also a member of the Python Packaging Working Group, which is a working group that works under the Python Software Foundation to raise money for packaging-related projects. It was through that working group that we got funding to make the security improvements that users are starting to see being rolled out on PyPI. So the scope of the work that Will and I have been undertaking is directly related to the application that we made to the Open Technology Fund, who have funded this work. We looked at their mission, their vision, and their values, we looked at the different grant streams, and we made an application for the items that we thought were relevant to their particular fund. That was what determined the scope of everything that has been funded through that particular initiative. So I think, Will, you'd probably agree that in coming into this project, we had fairly well-defined parameters around what was and what wasn't in scope, based on what was being funded and what we'd said we were going to do for the OTF.

Yeah. I think that's correct. We had a high-level idea of the individual goals we wanted to achieve based on the work that we scoped out with OTF. Then once we actually began work, we prioritized the individual tasks based on what we thought would have the highest user impact, as well as what we could roll out with minimal disruption to, I think, package upload and the user experience.

And given that you're both focusing on somewhat different areas of the platform, I'm wondering how often the issues that you're focusing on have had overlap, and what the cross section ends up being between user experience and security, particularly given that the interfaces you're dealing with aren't necessarily just the web UI that you see when you load up the web page?

So I'm actually of the opinion that UI is severely underrated in terms of user security. Users oftentimes don't really know how to engage with the security features that security engineers expose to them. This is an issue that I've run into on other platforms that I've worked on. And I think a huge boon of working with Nicole has been setting up a set of features and then seeing how to expose them correctly to users. That's something that I'm not personally equipped to do. Seeing her build this extremely pleasant to use and extremely intuitive setup has been really great.

And then in terms of the trade-offs that exist, I know that oftentimes there's a conflict between improving the overall security of a system and still making it usable. Because as you ratchet it down too tightly on making something ultimately secure, you start to encourage people to take shortcuts that ultimately reduce the effectiveness of your practices. I'm wondering how you try to balance that issue, and some of the common patterns that you have settled on to make sure that you're improving the security as much as possible while still making sure that people are adhering to the security practices?

Yeah. I think a huge challenge when designing secure systems is security fatigue. One of the last things you want to do is, like you said, ratchet down the system so much that users become frustrated and take shortcuts to achieve their ends. That's one of the issues you often see with two-factor implementations: an implementation will require a user to sign on or reauthenticate so frequently that users will just move their TOTP setup onto their host itself and just Ctrl-C, Ctrl-V, and thereby dissolve the second-factor component of the authentication scheme.
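
For readers unfamiliar with TOTP, here is a minimal sketch of how a time-based one-time password is derived, following RFC 6238 with its SHA-1 defaults. It is an illustration only, not the code PyPI runs, and the function names are made up for this example. It also shows why copying the secret onto the same machine weakens the second factor: anyone who holds the secret can compute the code.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # The counter is hashed as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, now: Optional[float] = None) -> bool:
    """Compare a submitted code against the expected one in constant time."""
    return hmac.compare_digest(totp(secret, now), submitted)


if __name__ == "__main__":
    # Shared secret from the RFC 6238 test vectors (ASCII, not base32).
    print(totp(b"12345678901234567890"))
```

Production verifiers usually also accept codes from an adjacent time step to tolerate clock drift, and reject reuse of a code that has already been accepted; both are left out here for brevity.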

And I'm wondering too if you can just enumerate the overall list of interfaces and the total surface area of the problems that you're each working with as far as the specifics of the PyPI project. With some projects, it might be limited to just the web UI. With others, it might be just an API. But with PyPI, there's the web interface, there are the APIs that users are using, there's the actual data integrity, and there are the actual interactions that people have when downloading and installing the packages, which is potentially another attack vector that isn't necessarily going to be present in other projects.

Yeah. So the work that I did primarily centered around the API and the web interface. Of the security features that we added, specifically two-factor authentication, API tokens, and an audit log, two-factor authentication is intended primarily for use with the web interface, and audit log visibility is provided via the web interface, although some audit log events are actually captured as the user hits the API for security-sensitive actions such as package uploads, file uploads, or removals. Also on the API side, there are the API tokens themselves, which the user will interact with via a tool like setuptools or Twine or any of the other clients that interact with the Warehouse API.

And, Nicole, for you as well, I'm wondering what surface areas you're dealing with as far as the user experience work, and some of the ways that manifests in the different trade-offs and the interactions between APIs and web UI and the overall package upload experience, etcetera?

Yeah. So, I mean, in terms of this current contract, my work has been limited to basically what we've just described. The first part was making sure that users find it easy to set up two-factor authentication and then to use it when logging into pypi.org. That's the first thing we worked on. Then we looked at the API keys. Sorry, API tokens. We're avoiding the word keys, and I can tell you why later. But that was about making it easy for users to set up those tokens. And then, obviously, as Will said as well, exposing the audit log to the end users.

In terms of my work with regards to the way that people interact with PyPI outside the browser, that's really limited to me making sure that the instructional text and the help text that we're showing on pypi.org is actually useful enough for people to be able to do what they need to do. For example, with the API tokens that we've just deployed, I've been running some user tests that have revealed that perhaps the way that we display the token and the instructions that we give to users are currently not good enough for them to understand what they need to do next, using whatever tool they're using. So that's where my sphere of influence sits: making sure that people have the information that they need to be able to then interact with PyPI,

however they need to do that.

That, I'm sure, is also complicated by the fact that there are any number of different tools that people might be using that would require access to that API token. I know that there's pip, and there's Twine for being able to upload things, and Flit, and there are, I'm sure, any number of different homegrown applications. I'm wondering how that plays into your efforts to make sure that the instructions are clear and accessible, and, I guess, how far you're willing to take the effort and when you decide that you've covered enough ground, that the majority of people are handled, and that anybody who is in some of these edge cases is there because of something that they've decided to do that isn't necessarily something that would be required to be supported by the people responsible for the PyPI infrastructure.

Yeah. I think there are two factors when thinking about, or at least in how I think about, designing for PyPI. It's that, yeah, people have different workflows, as you've just described, and also that you have people with really vastly different levels of knowledge as well. So, you know, Python is now being used a lot as a teaching language. So I'm really aware that PyPI could be the first package index that some people are using or experiencing. So they might not be familiar with all of the concepts that we present to them.

On the flip side, you have people who've been coding for, you know, decades and are really familiar with all the concepts. So it's a real challenge in terms of making sure that you're explaining things enough for beginners, whilst also not, you know, talking down to people who are really experienced. So that is a challenge. But I tend to lean on the side of, okay, let's give more information for beginners, because at the end of the day experienced users can ignore instructions that they don't need.

In terms of the weighing up of how much information to give, we tend to take a lot of feedback from the community. So, I mean, I've run user tests. I'm thinking less about the API tokens here and more about the two-factor authentication workflow that we worked on. I ran a whole lot of user tests when we were rolling out those interfaces, with people with different levels of experience who were using different tools. For example, for TOTP, some people were using a password manager to create a temporary one-time password. Some people were using a mobile phone, they were using Authy. There were all sorts of different ways that people were doing that. What we did in the end was put a whole lot of examples into our help text of, okay, these are the kinds of applications that you might choose to use. And we made sure that we had a good balance there. So things like Google Authenticator and Authy are floated to the top of the list, as things that people were mentioning frequently, but we also mention the less common use cases, making sure, for example, that we were listing nonproprietary solutions as well, because we know that there are members of the community out there who prefer not to use proprietary software. So it's really just about prioritizing the way that you present the information to cover the most common use case first and then give the information for the edge cases later.

Yeah. And I would say that's the same when we're talking about WebAuthn, which is two-factor authentication with some kind of device. Lots of people understand that as, oh, I authenticate with the YubiKey, because YubiKey is probably the most popular USB key that you can use with that particular standard, but we do have people out there in the community who are using other things. So what we ended up doing was writing the instructional and the help text in such a way as to emphasize USB keys, mentioning certain brand names so people were associating what we were talking about with the correct concept, and then also mentioning, hey, there are all these other ways that you can do this as well, by the way. So I think that balance is quite good, because generally if you're not using the most mainstream solution out there on the market, then you're probably a more familiar and more advanced user anyway,

in which case perhaps the help text or the instructional text is less required for you than it might be for someone who's a beginner who's using something that's fairly mainstream.

Yeah. And to add on to that, for the API tokens work we did, one thing that's pretty interesting about the Python package ecosystem as a whole is that there are a whole lot of third-party clients out there and a whole lot of third-party implementations that talk to these APIs. So as we were designing the initial API keys approach, we realized that we would probably have to make concessions in terms of authentication semantics to make them fit into all of these third-party clients that expect a username and a password instead of just a general-purpose key for authentication. As we were working on that, we also realized very quickly that the range of continuous integration setups, as well as other automated systems, constrained our ability to add certain token prefixes and certain stub usernames. So doing all that work was pretty interesting, because it involved community feedback as well as trying to guess at the happy paths and unhappy paths for common uses of tokens or, sorry, API keys.

One thing I'd also like to add to that,

is, I don't know when exactly this podcast is going to go out, but currently, in terms of those API tokens, I'm still working on improving the help text and the instructional text. I do need to seek feedback from members of the community as to what tools they are using their API tokens with, so that I can make sure that I am covering as many of those use cases as possible within the help and instructional text. So I suppose that's a bit of a call to action, and I know we'll probably get a chance to make another one by the end of this podcast. But if you're a community member out there, and particularly if you're using a continuous integration service to upload your package to PyPI, and you'd like to test out the API tokens, then I'd really like to speak to you, because understanding what your workflow is and how we can document that in the user interface and give brief but useful instructions would be very valuable.
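
The concession described above, fitting a token into clients that only know about a username and a password, can be sketched roughly as follows. The stub username __token__ and the pypi- prefix match what PyPI documents for API tokens at the time of writing; the function names are hypothetical and the parsing is a simplification for illustration, not Warehouse's code.

```python
import base64
from typing import Optional

TOKEN_USERNAME = "__token__"  # stub username that marks token authentication
TOKEN_PREFIX = "pypi-"        # prefix that makes a token recognizable


def basic_auth_header(username: str, password: str) -> str:
    """Build the HTTP Basic header an upload client would send."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def extract_api_token(authorization_header: str) -> Optional[str]:
    """Return the API token carried in a Basic header, or None."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    # Only the stub username plus the prefix selects the token code path;
    # anything else would fall through to ordinary password login.
    if username == TOKEN_USERNAME and password.startswith(TOKEN_PREFIX):
        return password
    return None


if __name__ == "__main__":
    header = basic_auth_header(TOKEN_USERNAME, "pypi-example-not-a-real-token")
    print(extract_api_token(header))
```

Because the token travels in the password field, an existing client needs no code change: the user supplies __token__ as the username and the token as the password, whether on the command line or as secrets in a continuous integration service.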

As we've been discussing here, there is a wide variety of people and patterns in terms of how the PyPI infrastructure is interacted with. I'm curious how that informs and affects your overall workflow and strategy for introducing changes to the platform, and how you validate and, I guess, control the rollout of those changes.

Yeah. So I can speak on that a little bit. In terms of releasing new features, a lot of this is actually handled by Sumana from Changeset Consulting, who's our project manager for this contract, and she's worked as a project manager for previous contracts as well. What she does is reach out to the community and do a lot of communication about what the upcoming features are going to be. Then when we release a new feature, it's marked as a beta, or beta, depending on your accent, feature. So it comes with the warning of, you know, this is something that we've shipped, but it's still not certified as perfect and production ready. So, obviously, set things up with the expectation that perhaps things might change. She does communication at that point as well, to reach out to the community to say, hey, we've released this new thing, please go and test it.

At that stage, I also do some outreach in terms of user testing with people, to see if they've got any problems working through the interfaces. But also, because of her work in communicating what's going on to the wider community, we do tend to get a lot of tickets opened up on GitHub where people have said, hey, I've tried out this thing and it's not quite working. You know, there's a bug, or I'm using a browser that you haven't tested it with, or whatever it is, and then we go and address those particular issues before we can move out of the beta period. So it's been quite smooth so far. Yeah, there are bugs, but we expect that to happen within that period, and we've been quite good at turning around and fixing those. And because we're labeling things as beta, people understand that that's, you know, part of the process of developing

community?

Yeah. So I think the big things that come to mind are what you mentioned earlier, with confusion about token versus key in the context of security token versus what I originally called API tokens, which we quickly realized confuses users because they associate token with a physical device. On more of the development side, I think I mentioned earlier that Warehouse has pretty comprehensive unit tests. So as we've been developing, we've been somewhat fortunate to catch things that otherwise probably would have blown up in production, both through unit tests and through smoke tests by either Sumana or the reviewers on the PSF side. That would be Ernest, Donald, and Dustin.

So we've mentioned the API keys and some of the two-factor auth features that have been introduced. I'm curious what have been some of the other notable features or improvements that you've each been involved with?

Well, I suppose I've been involved since very early, so I'm going to scope my answer to that question to this particular contract, which is the OTF contract. So, yeah, as you said, two-factor authentication, API tokens, and then the audit log. From my point of view, there are two sides to that audit log. We have an account audit log, so when you log in to your PyPI account, you can see, okay, when did I last change my password? When did I set up an API key? When did I enable two-factor authentication, etcetera. So we've got that exposed, and then we've also got project audit logs as well, so things that have happened on an individual project. For example, a new release has been made, or an API key has been created that has permissions on this project, things like that.

The other thing to mention is that the OTF grant doesn't just cover security. When we made the application through the Python Packaging Working Group, we also received funding to improve both the accessibility and the localization of pypi.org. So some of my work, well, I'm already working on this, but it's going to be my work moving forward as well, is to improve the accessibility of pypi.org for people who are using assistive technologies. For example, people who are using screen readers, people who are limited to just using their keyboard, people who are using high contrast mode, etcetera. Also, we're going to be implementing localization, so making it possible for us to translate at least the interface copy on pypi.org into other local languages, so French, Chinese, whatever community contributions we get for translations. Those things are within the scope of the OTF contract as well. So that's super exciting, because it's not just about thinking about how we can make the site more secure, but also how we can make it more universally accessible for people who have different needs and who are in different Python communities around the world.

And William, in terms of the attack vectors that you have considered for PyPI, I know that you said in general it was in a fairly good security stance, as far as already having some capacity for mitigating typosquatting attacks. But I'm wondering if there are some other attack vectors that you have looked at, or other things that you're concerned about for PyPI. I'm not asking you to do any sort of improper disclosure, but just in general, some of the thoughts that you have as far as security and attack vectors for a package repository.

Mhmm. Sure. So the really common attack vectors that you see on package indices and package managers are those typosquatting, package takeover, and phishing-based attacks, where someone will try to take over an account or add themselves as a contributor to a project and then push up a malicious version of that project that contains, you know, a malware dropper or whatever it needs to be. Like I said, fortunately, PyPI already had a few pretty good mitigations in place, including for typosquatting, and rate limiting to prevent credential brute forcing.

There are some things that are already well-known weaknesses in PyPI's setup. Those include the way that roles are currently structured. At the moment, any account can be added to any project as an owner without that targeted user's consent, and, prior to this audit logging, without a ton of history or logging to designate that change. So there are big issues with transparency in package ownership, as well as transparency in changes in package control. I'm actually not positive about this, but I believe, currently, if you delete your project on pypi.org, another user can claim that name. If that happens, you can then imagine a sort of package reuse attack where a popular package gets deleted by an attacker, and then they become, in a sense, the legitimate owner, because they've actually claimed the project rather than taking it over.

Yeah. That's correct to my knowledge, Will. However, they can't release any files that have previously been released, if that makes sense. So it would only be new versions moving forward, but you're right in the sense that, yeah, they would own the package and have the legitimacy of that package name. With regards to your first comment, I know that we do have a pull request in progress, so I'm hoping that we'll be able to address the issue with giving permission to add collaborators soon.

Yeah. There's also the more general problem of active scanning of projects, or rather packages, as they get uploaded. That's, as far as I know, an unsolved problem in the world of package maintenance, and I don't think that's something that PyPI could fairly be asked to solve.

Will, what do you mean by that? You said active scanning?

Yeah. So,

imagine, scanning for, like, common indicators of compromise or common, indicators that a package is is malicious,

for some for some, you know, fuzzy definition of of malicious. Because Because you can imagine, like, a research package that contains,

malware samples or,

what have you. And particularly

given the flexibility

non trivial and,

potentially NP complete problem to be able to actually definitively determine whether or not a package is malicious or has some, nefarious intent. Yeah. This is a problem that some of the most locked down,

platforms in the world struggle with. You know, Apple with their App Store struggle with static analysis

immensely. So I I think it would be completely unreasonable to expect a dynamic language on a community maintained index

to solve this problem.
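To make the "fuzzy definition of malicious" point a little more concrete, here is a minimal sketch of indicator-based scanning built on Python's standard `ast` module. The indicator sets and the `find_indicators` helper are hypothetical and chosen only for illustration; nothing here reflects what PyPI does or plans to do.

```python
import ast

# Hypothetical indicator lists, picked purely for illustration.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}
SUSPICIOUS_MODULES = {"subprocess", "socket", "ctypes"}


def find_indicators(source: str) -> list[tuple[int, str]]:
    """Return (line number, description) pairs for naive static indicators."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls by bare name, such as eval(...) or exec(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
        # "import x" statements.
        elif isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, f"import of {root}"))
        # "from x import y" statements.
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"import of {root}"))
    return sorted(findings)


if __name__ == "__main__":
    # A perfectly ordinary script that still trips the scanner.
    benign = "import subprocess\nsubprocess.run(['git', 'status'])\n"
    # A script that reaches eval() without ever naming it directly.
    evasive = "import builtins\ngetattr(builtins, 'ev' + 'al')('1 + 1')\n"
    print(find_indicators(benign))
    print(find_indicators(evasive))
```

In this sketch the ordinary `subprocess` script is flagged while the script that builds the name `eval` at runtime produces no findings at all, which is the false-positive and false-negative problem in miniature.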

So in terms of your overall experience of working on and with the PyPI platform and the community of users who rely on it, what have been some of the most interesting or challenging or unexpected aspects of that work?

I can try answering that. On my side, at least, I've done community management before, some of it in my role as a Homebrew maintainer, some of it on my own open source projects, as well as the open source work that Trail of Bits does. But it is different every time. Especially when dealing with future changes that affect potentially tens of thousands of people, it can be challenging to get people to see your side of things, especially when it comes to event logs. Very understandably, users are wary of any sort of feature that records their IP address or records security-salient events about their actions. It can be difficult to justify those events to users who don't necessarily see the value of those recordings from a security perspective. Coming up with a compromise where we're able to record enough information to take action while also preserving their privacy and mitigating their concerns can be a challenge, especially for countries where GDPR compliance is key.

I think on my side, one of the issues with doing design in the open on open source community projects is that the work is very, very visible.

And it is really hard to satisfy everybody. You know, everybody's using different browsers, everybody has different use cases, and we don't have any full-time resources looking at the user experience of PyPI. It's just me and the hours that I have, either in my spare time when I'm working as a volunteer or, on this contract, for my contracted hours. So it has been challenging to try to satisfy everyone and make everybody happy. That was probably more challenging when we had the transition from the old PyPI code base to pypi.org, when there were a lot of changes, which was disruptive to people's existing workflows. On the other hand, there were a lot of people who were like, yay, PyPI's moved into the modern era and it works on mobile. So there were kind of two sides to every coin.

What I've tried to do in terms of my work with PyPI is make sure that when decisions are made, they're really backed by either user research or user feedback or user testing. So it's not just a case of me saying, well, it's my opinion that it should be like this and therefore my opinion is most important, but actually being able to show people: hey, I looked into this, I looked at prior art, or I spoke to people within the community, and this is the reason that this decision has been made. When you actually articulate the reason and you show people that you've thought about this more than just, you know, this is my opinion, then people are really responsive to that. So I think that's been quite a positive experience for me in interacting with the Python community, who as a whole are a very friendly bunch of people.

In terms of the future work that you either have planned for your existing contract or that you have identified as potential improvements to the platform in general, what do you think are most interesting or most notable? And what are some of the ways that listeners and the broader community can get involved and help out with your efforts and just the overall work needed to keep the PyPI platform healthy and viable for the long run?

Yeah. So I can address that. In terms of the current contract, most of the security work is kind of done now. I mean, there's a few things that we need to wrap up.

And as I mentioned, I would really like to talk to anybody who's using CI to upload to PyPI, because that would be really helpful for me in terms of making sure that the interface is working for those use cases.

In terms of the rest of this contract, as I mentioned earlier, we have accessibility and localization, which are the last two subjects that we need to address.

In terms of accessibility, I've also put a call out recently. I'd really like to talk to any members of the Python community who are interacting with websites using assistive technologies. So if you're a user who's online using a screen reader, I would love to speak to you. Same if you're someone who's limited to using a keyboard, or if you're using high contrast mode, or if you're zooming in your browser a lot because of poor eyesight.

The reason that I would really like to speak to people who are using the web in those ways is because we're doing an audit against WCAG 2.0 standards, which is kind of the accessibility standard. But just being able to tick the box isn't, in my view, enough. I mean, obviously, we wanna check the box and say, yes, we're compliant. But actually being able to test the interface with people who are using assistive technology, and seeing that it's working for them in real life with real-life use cases, is super important as well. So it's not really enough just to check the boxes. We really need to talk to people about how they're using the site as well.

And on the localization side, and I think there'll be more communication that will come out about this later as we get into that milestone, we are going to be looking for people to help us actually translate the interface copy into different languages. So once we've got the technical implementation done, we're going to want to get people to translate it into whatever language they'd like to translate it into, barring Arabic and Hebrew and any right-to-left languages, because that is outside the scope of the current project.

Yeah.

Also, on the security side of things, there are things that are out of scope of the current contract but that I believe are planned for future iteration on the warehouse code base. And that would be things around API keys. The implementation that we went with is based on security tokens called macaroons. One of the interesting things about macaroons is that they have embedded in them something called a caveat language, which allows for a rich description of the permissions associated with each token. Currently, we have a version field in our caveat language that allows those permissions to be iterated on and modified to allow for really rich interactions with the authentication system. In the future, I think the plan is to add tokens that expire after exactly one use, or are only allowed between certain hours of the day, or can only be used from a certain domain or a certain authenticated IP, or things like that. So I think we've put out, on the warehouse issue tracker, a request for help with that.

Yeah. I should mention here as well that, if any of your listeners are interested in contributing to the warehouse project, the issue tracker is fairly well managed. We do tag issues with needs discussion or help required.
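Returning briefly to the macaroon caveats described above, here is a minimal sketch of the underlying idea, using only Python's standard `hmac` and `hashlib` modules. Each caveat is folded into the token's signature with a chained HMAC, so a holder can add restrictions but cannot remove them. The caveat strings (`version = 1`, `project = ...`, `not_after = ...`), the `Token` class, and the `mint`, `attenuate`, and `verify` helpers are all hypothetical; this is not warehouse's caveat format or the library it uses.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def _chain(key: bytes, data: str) -> bytes:
    """One HMAC step: the previous signature is the key for the next."""
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).digest()


@dataclass
class Token:
    identifier: str
    caveats: list[str] = field(default_factory=list)
    signature: bytes = b""


def mint(root_key: bytes, identifier: str) -> Token:
    """Server side: create an unrestricted token from the secret root key."""
    return Token(identifier, [], _chain(root_key, identifier))


def attenuate(token: Token, caveat: str) -> Token:
    """Anyone holding a token can narrow it; no root key is required."""
    return Token(
        token.identifier,
        token.caveats + [caveat],
        _chain(token.signature, caveat),
    )


def _satisfied(caveat: str, context: dict) -> bool:
    # Hypothetical "key = value" caveat language, for illustration only.
    key, _, value = caveat.partition(" = ")
    if key == "version":
        return value == "1"
    if key == "project":
        return context.get("project") == value
    if key == "not_after":
        # A missing timestamp fails closed.
        return value.isdigit() and context.get("now", float("inf")) <= int(value)
    return False  # Unknown caveats fail closed.


def verify(root_key: bytes, token: Token, context: dict) -> bool:
    """Server side: recompute the HMAC chain, then check every caveat."""
    signature = _chain(root_key, token.identifier)
    for caveat in token.caveats:
        signature = _chain(signature, caveat)
    if not hmac.compare_digest(signature, token.signature):
        return False
    return all(_satisfied(caveat, context) for caveat in token.caveats)


if __name__ == "__main__":
    root = b"server-side-secret"
    scoped = attenuate(attenuate(mint(root, "token-1"), "version = 1"), "project = example")
    print(verify(root, scoped, {"project": "example"}))
    print(verify(root, scoped, {"project": "other"}))
```

A version caveat like the one in this sketch is one way a server could tell which rules a token was issued under, so that new caveat types can be added later without silently changing the meaning of older tokens.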

So going on to the issue tracker and having a look at what discussions are happening is a useful way of finding out where you could help make PyPI more sustainable in terms of the feature development that we're currently working on. The other thing I'd like to mention as well, and I think what Will's already said today kind of reinforces this, is that it's a really nice code base to work on. It's pretty easy to set up with Docker and Docker Compose. It's got really great unit test coverage. It really is a very nice code base to work on. So if you're looking to make an open source contribution, I think it's a good candidate. And we also welcome people who are making their first contribution to open source. So it's not just your more experienced listeners who can make contributions to the warehouse code base. We have plenty of tickets tagged with good first issue, which are specifically for people who are looking to make more minor contributions or to ease their way into open source.

Yeah. I do wanna hammer that point. It really is a nice code base. I've worked on a lot of both open source and proprietary code bases written in a combination of Python 2 and Python 3, or, you know, now Python 3 but migrated from Python 2, with very bespoke setups and environments that were clearly developed from an engineer's desk somewhere inside of an office. And warehouse fortunately is not one of those code bases.

And is it worth digging more into the actual funding behind this work and how that's structured, and just some of the overall sustainability efforts to be able to maintain and upgrade the PyPI and warehouse platform?

Yeah. So I can talk about that. As I mentioned earlier, I'm a member of the Python Packaging Working Group, which raises money not just for PyPI but for any packaging-related project. It was through that that we got this grant from the Open Technology Fund, OTF, to actually be able to do that work. It's the second major grant that we've got for PyPI. You might be familiar with the fact that we were granted a MOSS grant, a Mozilla open source grant, last year, must have been last year, and that was to migrate from the old version of PyPI to the new warehouse code base and to retire that old code base.

So far, through the Packaging Working Group, we've had two fairly substantial grants, which have allowed us to really improve the package index. That working group continues to work on grant applications for different projects, not just PyPI but also many of the tools that interact with PyPI, such as pip. So we're hoping that in the next year or so we will have more money coming in from those applications that we make and be able to fund more sustainable development for the Python packaging ecosystem in general.

The other thing to mention is that we are very, very fortunate with PyPI to have a number of great sponsors who actually give us the infrastructure for free. I don't have the data right now in terms of how much that's worth, but it's certainly a million per year that it costs to actually run the Python Package Index. A lot of that is borne by our CDN, Fastly, whose donation to us is actually quite enormous. So in terms of sustainability, we have a mixture: the funding coming through from grant applications, and these different companies giving us their services to enable us to keep the service up. The other thing that we appreciate is that we have a donation page on pypi.org where members of the community can donate towards the Python Packaging Working Group, so that we can then have a budget to pay for maintenance and improvements to both PyPI and other projects.

An ideal scenario in the future is that we would have enough recurring donations from the community that we would be able to set up a more reliable, either part-time or full-time, situation where we had people working on packaging as their job. Because at the moment, we really have mostly just contracts that come and go depending on the money that comes in.

Are there any other aspects of your current efforts on the PyPI infrastructure, or any other aspects of the overall platform, that we didn't discuss yet that you'd like to cover before we close out the show?

Yeah. I can't think of anything. Can you think of anything, Will?

No. Not in particular. I mean, there are interesting things about WebAuthn and TOTP that I could go into, but that would be a bit in the weeds.

Well, for anybody who does want to dig deeper into that, if you have any specific references that you have found useful, I can add them to the show notes. And for anyone who wants to follow up with either of you or get in touch and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And so with that, I'll move us into the picks.

And this week, I'm going to choose the show The Expanse. I started watching that recently and have gotten through the first season and into the second, and it's just a very interesting and well-done sci-fi series chronicling some dramatic events set far into the future, where humans have gone beyond Earth and started populating other areas of the solar system. So it's an interesting and well put together show with a lot of good environmental aspects, such as the creole language that people speak further out in the asteroid belt. So if you're looking for something new to watch, I recommend that. And so with that, I'll pass it to you. Do you have any picks this week?

Sure. Yeah.

I don't know if I have a media pick. I'm not actually normally a big nonfiction person, but I've been reading a biography of Abraham Lincoln by Carl Sandburg, who's a somewhat well-known American poet. So it's a little bit out of, I think, not his expertise, but a little bit out of his field of renown. But it's been a pretty interesting read so far. It's actually a surprisingly nuanced biography in the sense that it goes through both the political and military failures that he encountered. And it's been interesting to read because, you know, you learn this stuff in 10th grade in American high schools, but then it gets dropped.

I do have an answer. So last week or the week before, I watched a documentary on Netflix called The Great Hack, which was particularly interesting to me because I live in the UK, and it talked about Brexit and Cambridge Analytica and what's been happening. I haven't followed that probably as closely as I should have. So, yeah, for anybody out there who's interested in documentaries, it's certainly very, very interesting and very topical at the moment with regards to the current political climate.

Well, thank you both very much for taking the time today to join me and discuss your work on the PyPI platform and the available workflows for people using it. So I appreciate all of your efforts on that front, and I hope you enjoy the rest of your day.

Thank you.

Thank you.
