Summary
Software development is a skill that can create value and reduce drudgery in a wide variety of contexts. Sometimes the causes that are most in need of software expertise are also the least able to pay for it. By volunteering our time and abilities to causes that we believe in, we can help make a tangible difference in the world. In this episode Eric Schles describes his experiences working on social justice initiatives and the types of work that proved to be the most helpful to the groups that he was working with.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
- To get worry-free releases download GoCD, the open source continuous delivery server built by Thoughtworks. You can use their pipeline modeling and value stream map to build, control and monitor every step from commit to deployment in one place. And with their new Kubernetes integration it’s even easier to deploy and scale your build agents. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Eric Schles about how to get involved with social justice causes as an engineer
Interview
- Introductions
- How did you get introduced to Python?
- What are some ways that engineers can create real-world impact with their skills?
- What are some of the common roadblocks to contribution that people should be aware of?
- What are some of the types of projects or tools that can provide the most value compared to the amount of effort?
- Do you have any advice for picking an organization or cause that will benefit the most from technical expertise?
- Many of the tools and systems that get built for public or non-profit organizations require some amount of data for them to be useful. Do you have any advice on methods for identifying, locating, or collecting the necessary information for feeding into these projects?
- What are some of the design factors that should be considered when building tools for these organizations to allow them to be maintainable and sustainable in the absence of an experienced engineer?
Keep In Touch
- EricSchles on GitHub
- @EricSchles on Twitter
Picks
- Tobias
- Eric
Links
- USDS
- 18F
- OCW
- SAS
- R
- Machine Learning
- Version Control
- GitHub
- Agile
- OCR (Optical Character Recognition)
- Eric Schles Interview On Podcast.__init__
- Excel
- ETL (Extract Transform Load)
- Automate The Boring Stuff
- Web Scraping
- Thomas Levine
- Elasticsearch
- Trello
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app, you'll need somewhere to deploy it, so you should check out Linode. With private networking, shared block storage, node balancers, and a 40 gigabit network, all controlled by a brand new API, you get everything you need to scale. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute. And to get worry-free releases, download GoCD, the open source continuous delivery server built by Thoughtworks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons. And visit podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. Your host as usual is Tobias Macey. And today, I'm interviewing Eric Schles about how to get involved with social justice causes as an engineer. So, Eric, could you start by introducing yourself?
[00:01:08] Unknown:
Sure. So I'm Eric. I've been doing data science and engineering for probably the last seven or eight years. I've been interested in social justice for a long time, and I've been very lucky in my opportunities. I've gotten to work in local government, within law enforcement, at the DA's office. I've worked as a contractor for the New York Police Department as well as other police departments. I've gotten to work with some DA's offices all over the country. I've gotten to work in the federal government. I was hired by the Obama White House, and I got to work for the United States Digital Service as well as 18F, which are two agencies Obama started. And I've helped out with a ton of projects, and I just generally love helping people and writing code that makes people's lives better.
Alright. And do you remember how you first got introduced to Python?
Yeah. So I took the MIT OpenCourseWare course on Python and just kinda fell in love. I had started off using SAS, and then I moved over to R back when I was doing economics, because those are standard tools for that space. And then I took a machine learning class, and I was like, oh my goodness, machine learning is the next thing. I need to, like, learn everything about it. And all the data mining books were coming from a computer science perspective. So I was like, okay, I need to study computer science.
And then I looked around for, like, the basic computer science book or computer science thing, because I decided to go to grad school at NYU in computer science after finishing out a bunch of stuff. And the Python intro class was the first thing that came up, and it was amazing. It was well taught. It was super well structured. And from that point on, I've fallen in love, and it's always been a love-love relationship with Python, especially having worked in other languages. And, yeah, if you look at my GitHub, it's pretty clear that Python is my favorite thing in the world, as far as programming languages go.
So as we mentioned, the main topic that we're here to discuss is ways that
[00:02:59] Unknown:
as engineers, we can create value and help out with various real world issues and social justice causes. So I'm wondering if you can just enumerate some of the ways that we can use our skills and our expertise to provide value to different organizations or groups who are working in that type of space?
[00:03:20] Unknown:
Yeah. So this is a really great question, and I'm gonna attack it from a couple of different ways. So the first thing I'll do is actually talk through some specific examples of things that I've done, and then I'll talk through the ways that you can sort of, like, sell yourself as someone who wants to do social justice work as an engineer. And then the last thing I'll do is talk about some other people's examples that I don't know as well, so I'll be a little bit briefer there. So, the examples that come mostly to mind are around applying software engineering best practice to really any problem. Because at the end of the day, software engineering is as much about processes as it is about actually writing code. And the processes for doing work in the software engineering space are super good. Version control is an amazing innovation, and just having the ability to use GitHub within an organization allows you to encapsulate and codify and put all your knowledge base in one central place that can be shared across an entire organization, is easy to get access to, and has no barriers to entry. And then I really think that agile processes can be super helpful. So those are some high level process things, and now I'll talk through some specific examples. My first job out of grad school was working for the DA's office in New York, the Manhattan DA's office. And there, we had this one process. Okay, so without going into too much detail, I worked in the anti-trafficking space. And so when we would start looking into a case about a given trafficker, what we would do is subpoena for all these records. And one of the things we would subpoena for is their banking information, to figure out if they were doing money laundering, because money laundering is the number one way that you find, catch, and put away traffickers.
And so we would get all these picture PDFs. Actually, it would be more like pieces of paper. So boxes and boxes of pieces of paper. And then these poor analysts would have to scan these documents in, so they'd be picture PDFs, and they'd have to OCR them. And then they would manually go through and turn all of that information into CSVs. And so the first thing I did is I wrote a super simple web interface, and I created a little parser that would take a PDF, OCR it automatically, and then strip out the information that we cared about. I don't wanna go into too much detail about what we cared about. I've got plenty of talks on that if you're interested. Go check those out. But, basically, automating that took a process that took months to sometimes years and turned it into minutes. I mean, yeah, they still have to scan in all the pieces of paper, but you can do that over the course of maybe two days. The hard part is translating all the information and just, like, doing the data entry stuff.
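The parsing step described above can be sketched in a few lines of Python. This is only an illustration, not the tool Eric built: it assumes the OCR step has already produced plain text, and the statement layout (date, description, amount on one line), the field names, and the sample text are all hypothetical.

```python
import csv
import io
import re

# Hypothetical layout: "MM/DD/YYYY <description> <amount>" on a single line.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_statement_text(ocr_text):
    """Return one dict per line that looks like a transaction; skip the rest."""
    rows = []
    for line in ocr_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # headers, page footers, and OCR noise are ignored
        rows.append({
            "date": match.group("date"),
            "description": match.group("description"),
            "amount": match.group("amount").replace(",", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["date", "description", "amount"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up OCR output, for illustration only.
SAMPLE_OCR_TEXT = """\
FIRST EXAMPLE BANK   Statement of Account
01/15/2016 WIRE TRANSFER ACME LLC 1,250.00
01/17/2016 ATM WITHDRAWAL -200.00
Page 1 of 3
"""

rows = parse_statement_text(SAMPLE_OCR_TEXT)
csv_text = rows_to_csv(rows)
```

Real OCR output is noisier than this sample, so a pattern like this would need tuning per document type, and rejected lines are worth logging so an analyst can review them.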
Also, doing the data entry this way reduced the error rate, and it meant that the analysts could focus on actually catching the bad guys. And so it frees up a ton of time. The second example I'll bring up is some work that I did for the federal government. When I worked for the federal government, one of the things that we ended up doing is, like, people would use these Excel documents. And Excel is prolific in the federal government. In the local government, PDFs are prolific, and in the federal government, Excel documents are. So they're, like, a little bit further along. And so these Excel documents would just have functions embedded in them, and it was extremely hard to understand what was going on in an Excel document, whether the calculations were correct, all this other stuff. And so I wrote a simple tool that would take these Excel documents, put them into a database, gather all the data from all the different Excel documents, and then put them together. And because this guy was running this stuff manually, this saved, you know, I don't even know how many hours, just a ton of hours over time. The point here, though, so generalizing out.
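A minimal sketch of that kind of consolidation, using only the Python standard library. To stay self-contained it assumes each spreadsheet has already been exported to CSV; the file names, the `category` and `amount` columns, and the `line_items` table are hypothetical, not details from the interview.

```python
import csv
import io
import sqlite3


def load_sheets(connection, sheets):
    """Load {source_name: csv_text} into one table, tagging rows by source."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS line_items "
        "(source TEXT, category TEXT, amount REAL)"
    )
    for source_name, csv_text in sheets.items():
        reader = csv.DictReader(io.StringIO(csv_text))
        connection.executemany(
            "INSERT INTO line_items (source, category, amount) VALUES (?, ?, ?)",
            [(source_name, row["category"], float(row["amount"])) for row in reader],
        )
    connection.commit()


def totals_by_category(connection):
    """Do in SQL the calculation that would otherwise hide in a cell formula."""
    cursor = connection.execute(
        "SELECT category, SUM(amount) FROM line_items "
        "GROUP BY category ORDER BY category"
    )
    return dict(cursor.fetchall())


# Made-up exports standing in for separate spreadsheets.
SHEETS = {
    "office_a.csv": "category,amount\ntravel,100.50\nsupplies,20.00\n",
    "office_b.csv": "category,amount\ntravel,50.25\n",
}

connection = sqlite3.connect(":memory:")
load_sheets(connection, SHEETS)
totals = totals_by_category(connection)
```

Keeping the `source` column means every total can be traced back to the file it came from, which is the part that is hard to audit when the logic lives in embedded spreadsheet functions.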
Usually, in the government space, or in this specific part of the social justice space, which does generalize, by the way, to a lot of the problems that happen across the federal space, across nonprofits, and across local government, which is where the social justice work is happening, it's about ETL. It's about taking something that should be automated and automating it. I talk about this in a lot more detail in other talks, but, basically, it's about going from a point where things are done in a manual way to automating away, essentially, the boring stuff. Python is really great for that. High level software engineering is great for that. So those are some case examples of the types of work that are usually low hanging fruit and easy to automate, and there's lots of good tools and processes for doing this. Oh, also, if you wanna do this, there's a third class of things that are typically associated with this, and that's web scraping. A lot of times, you'll have a lot of silos across government, but because of various mandates, they have to put things on the Internet or on the web or in a downloadable form. But these agencies internally won't talk to each other, so you can just scrape the stuff from the Internet, especially in the federal space, and then actually make the datasets locally for yourself. So I recommend Thomas Levine's blog. He's got a lot of great resources on how to do web scraping and other stuff. I also will put together a few links of some various web scraping things after the podcast if you all are interested, because he's been rewriting his blog recently, so it may or may not be available. And I have a talk on parsing PDFs that I can also send you, as well as some tools that I wrote that do this. Okay. So now generalizing and abstracting out.
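To make the web scraping idea concrete, here is a small sketch that pulls an HTML table into a list of records using Python's standard `html.parser` module. The page content, agency names, and column names are invented for illustration, and the HTML is an inline string so the example runs without network access; in practice the page would first be downloaded, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._current_cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th") and self._current_row is not None:
            self._current_cell = []

    def handle_data(self, data):
        if self._current_cell is not None:
            self._current_cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current_cell is not None:
            self._current_row.append("".join(self._current_cell).strip())
            self._current_cell = None
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None


def extract_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = """
<html><body><table>
<tr><th>agency</th><th>dataset</th></tr>
<tr><td>Housing</td><td>Shelter capacity</td></tr>
<tr><td>Education</td><td>School enrollment</td></tr>
</table></body></html>
"""

records = extract_table(SAMPLE_PAGE)
```

This sketch treats the first row as the header and assumes a single, well-formed table per page; real government pages often need more defensive handling, and it is worth checking a site's terms of use before scraping it.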
So once you have this sort of class of things that can be helpful and are easy to explain to people, so not the sophisticated machine learning, not the user research and design stuff, which takes a lot more leadership buy-in and other stuff, the next thing to do is just honestly start by building one of these tools, or maybe even all three, and then giving talks on it for that domain area. Maybe you found a dataset on the Internet somewhere and you cleaned it, and then you wanna share it back with that agency. Go give talks at a conference. I bet you, and this is the crazy part, someone will know someone at that agency, and they'll be able to introduce you, and then you can develop a working relationship, and either get hired by them or work pro bono. Those are two great ways, and sometimes you can just reach out and people will be receptive. I found that in the federal space, you tend to have to know someone. So usually getting to that connection, where you can actually do the work, takes a while.
But in the nonprofit space, you can usually just reach out to the head of the nonprofit and just be like, hey, I built this tool for the problem that you're kinda working on. I'd love to just give it to you for free and let your team run with it. Usually, once you've built the tool, it may or may not be the right thing, and you have to do some design, some research, and build it to their specifications and their needs. So, usually, it's best to do this work in a team setting. And then you can take that tool or that set of tooling and just give it to everyone in that space, and then they can automate their processes and, honestly, save themselves money and time.
Most of these organizations don't understand free and open source software. They don't understand best practice. And so giving them a little bit of that, as well as figuring out how to do handoff in a reasonable way, are the keys to success here. And you can make a ton of impact just by doing the basic industry standard type of things and just standard software engineering. I know it sounds a little boring, but it means that you probably already have the skills to be super helpful without having to go and research a ton about that specific organization and that specific problem domain. You can just go and be impactful. And, honestly, it doesn't take very much. These people don't need all the bells and whistles and all the optimizations. You can build these really simple high level products and then deliver a ton of value. One last thing that I'll touch on, actually, that I forgot about. Search is something that doesn't exist across a lot of these things. So just bringing data together and being able to make it searchable is also a huge value add. If you can put things in Elasticsearch, then that is just a game changer for everybody. There was a third thing I was gonna discuss. Oh, yeah. Other people's work. So this will be the last thing. If you're a user researcher or something, but you're in the tech space, you can also help out too. When I was with the White House, a ton of user researchers would go and just figure out when people were working on the same problem and then bring all that stuff together. So that's a good example of some other cases where things can be done in a more obvious way.
And then project managers just applying general project process in a standard way can be super helpful to a lot of these organizations, because they don't know how to manage software teams, and they don't know how to manage automated systems most of the time. Or if they do, their skills have degraded over the years or something. That's not to say that these people are not brilliant or capable. They're all really smart. It's just that government moves a lot slower, and so they haven't been able to keep up with all the technical revolutions that have happened in the private sector.
Yeah. As technologists, it's often easy to forget about how
[00:11:57] Unknown:
some of the simplest tooling that we use in our day to day is completely revolutionary for somebody who doesn't work in the space and who is used to doing those manual processes. And also, one thing that I forgot to call out when we first started the show is that this is actually your second time being on the podcast. And so if anybody is interested in hearing more about the work you're doing with the New York district attorney's office, then I'll add a link to the show notes for that episode as well.
That would be great. Yeah. That was a super fun podcast. I really enjoyed it.
And going back to your point too about version control, as engineers we're used to using it in the context of managing our software, but there's also a lot of value to be had in terms of just managing arbitrary text documents or even PDFs, just so that you can have an audit log. So it's an easy way, well, comparatively easy, for people to keep track of changes that are happening in documents. Even if you're not able to go into GitHub and view the diff directly, you at least have a way of going back in time to compare even Excel documents or PDFs.
[00:13:00] Unknown:
That's absolutely right. And being able to do that type of thing is really, really hard for all these organizations. I mean, there are a lot of great proprietary tools, and I guess GitHub is technically a proprietary tool. There are lots of tools that can do this well, but a generalized version control system is just gonna do it so much better, in such a centralized way. Like, yeah, you can definitely treat it as code, but if you treat it as just work product, then by being able to understand where institutional knowledge is, and not having it disappear when someone leaves or quits or gets fired or dies, you're just gonna be more sustainable as an organization.
[00:13:36] Unknown:
And too, it goes back to the whole 80/20 rule: 20% of the effort that you put in is going to produce 80% of the value necessary to make it easier for people to do their work and be more efficient, so that they can free up time spent doing manual tasks for more value-add projects.
Absolutely. That is 100% a thing. Yeah. I think, like, another place, just to call it out, versions of Excel documents
[00:14:02] Unknown:
are a really big deal. Google has, like, a versioning system for Excel documents, and I don't wanna say it's lacking, because it's not lacking. It's a great product. But it's a lot easier to do version control on a thing like GitHub or git, where you can just see the previous versions in a really obvious way. And I'm not trying to bash them. They do a lot of great work, and it's an awesome tool. And you can do a lot of crazy stuff in Google spreadsheets that you can't do in regular Excel documents that are local. But, yeah, I think having the ability to version control these things in an obvious way is super important. It's added a ton of value.
[00:14:39] Unknown:
And so on the topic of these small projects or even just raising awareness of tools that we take for granted to make people's jobs easier, Are there any particular types of projects or tools that you have found to provide the most value compared to the amount of effort required to get it up and running? I feel like Elasticsearch
[00:15:02] Unknown:
definitely provides, like, a ton of value. GitHub provides a ton of value. Trello is, like, a godsend. Like, I don't mean to, like, be, you know, pimping people's, like, stuff or whatever, but, like, Trello is so easy to use and, like, it's just so easy to explain scrum to people when you're just, like, oh, there are cards that you move around, and then you don't have to communicate about every single task someone's working on and you just kinda know. And, like, anything that really is gonna be a conversation piece is gonna be helpful. And Trello is just my personal favorite version of this. I think those those, make things a lot easier. I mean, you know, incorporating Python into the stack is gonna make things better because some of these shops will still be using, like, you know, like, older languages that aren't as good. Like, I've seen examples of Fortran and Assembly still in some places. I've seen weird proprietary languages that I don't even remember, like, the names of in some of these places.
And they're just crazy, and it's like, why aren't you using standard tools? I mean, that really is just gonna come down to some people's flavors and stuff. But let me think of anything else. Yeah. I think that's a good small list. There's definitely far more of these things, but I think something in each of those categories, something that makes it easy to understand what your data is doing, sort of like a dashboarding thing, a search tool, a process management tool, and a version control tool, is kinda like the small representative set that you need as a basis.
[00:16:41] Unknown:
And with Python in particular, because of the fact that the syntax is generally very easy for people to pick up, even for people who aren't necessarily [engineers], to tie things together and get things done quickly, I think that makes it a viable option for introducing into an organization. You know, building these tools in Python so that people who have a desire to go in and maybe modify or tweak certain aspects of it still have the possibility of doing that, versus if you were to build things in C or Haskell or some of these more complex languages?
[00:17:20] Unknown:
Absolutely. I mean, just the number of packages and the community that exists in the Python landscape makes it sort of a de facto standard if you have limited time and resources and you need to write something both quickly and inexpensively. Also, it's such an accessible language that you can usually get someone, like you said, up to speed really fast. So I think it is the most viable option for most of these smaller organizations, and even some of the larger ones that need the help.
And so in terms of
[00:17:50] Unknown:
picking where you spend your time or the types of organizations or causes that you want to get involved in, do you have any advice for picking out an organization or group that will be able to benefit the most from your technical expertise and provide the most real world value?
[00:18:07] Unknown:
Yeah. So this is an interesting question, because there's some level of subjectiveness here. I personally, either intelligently or not in some ways, always like to go where people's lives are on the line when I'm choosing a social justice project. So I'll try to work with either survivors of trafficking or people that are dealing with homelessness issues, or both. I try to put my effort personally into places where I know that I'm gonna be adding value and that it's important that things be correct, because my time is a limited resource. I'm not gonna live forever, nor do I have an unlimited number of hours in my day, and so I wanna be helping where the help is going to translate into real value. That being said, there's actually a lot of upstream processes and things that stop people from ending up in a situation where their lives are on the line, and only working on that part of the problem doesn't completely fix things. Then you're always in firefighting mode. Because it's intuitive. It makes sense. It's like, if this gets better, then fewer people will die, or their lives will be of a significantly higher quality. But, actually, it's usually a death by a thousand cuts that is how you end up in that bad situation in the first place. So I think mentorship organizations matter. Like, in New York City, there's a thing called iMentor. I don't know if it exists elsewhere; I imagine it does. All the various clubs for kids and things where you can receive mentorship, those add so much value to a lot of these people's lives, so that they don't end up in a bad situation in the first place. So I think mentorship is one of the places that honestly gets some help but could always use more. I think disaster relief is gonna be key for the next century, and it's gonna become increasingly a problem. So, just as a problem area, I think it's important to work on that.
The energy space has always been exciting to me, and I think it's something I don't spend enough time on. But figuring out how to fix sort of, like, archaic processes in the energy space is really interesting.
But generally speaking, you want an organization that's well functioning, that's gonna be capable of taking the help that you give. You can go to organizations that are best in breed in some sense. Right? They're really good at their problem area and domain expertise, and maybe they have a lot of clout, but they don't necessarily have a technical footprint or any appetite for technology, and you're gonna be fighting an uphill battle. It doesn't mean you shouldn't work with those organizations. It just means that it's gonna be a harder sell, and you're gonna have to prove more value at every turn, and there's gonna be a lot more teaching. So it really is this value proposition of how much time do you wanna spend, how much time do you actually have that you can spend given other constraints on your time, be that family, friends, work, whatever, and then budgeting appropriately or finding a team that you can distribute that work across. So when you're choosing your project, the key is you wanna really provide value for whatever client or organization you're gonna be working with. And if you don't have the time and resources, then you're gonna end up wasting their time. You're gonna waste your time, and nothing's gonna get done. So making sure that you can be successful before you move into a new project or start working with a new client is always important. But the way you figure that out is through this discovery. I would say just reach out to everyone who's working in the space that you're interested in. And then through conversation and starting that work, you should decide whether or not it makes sense from both perspectives to work on this, or if there's a resource thing, or how to communicate out when changes need to be made or whatever, and then allow yourself to fail fast.
Like, I've tried to have collaborations with many, many, many institutions over the years, and it hasn't always worked out. Sometimes it works out great and it's amazing. And sometimes it does not work out at all. And being honest about the failures is important. Right?
So other people don't make the same mistakes. And, I mean, I'm not gonna call out names, but I'll just say that failing fast is okay. Right? If you come into it trying to help people and you don't end up being able to accomplish that, freeing up the resources that were spent on the conversations you're having with that organization as fast as possible is really important, if you find out that there's just not gonna be a fit and it's not gonna be possible. So, good signs to look for, though, signs of success. If they already have any technologists on board, if they have a technical team, even if it's just IT people, someone that you can talk to who's technical, who can carry the torch forward internally in the organization, and who also can explain things to the managers because they're working there full time. So, really, this is somewhat staff augmentation, somewhat just trying to be helpful. Or an extreme appetite from the organization to grow in this direction. For local government, this can either be a mayoral initiative or something else, or maybe they throw a hackathon and you meet them through that. There are social justice hackathons in a lot of places, and if you can get access to one of those, that's usually a pretty strong indicator that the organization wants to move in a more technical direction. Then just follow up a ton, and that can lead to a ton of value added, because they're already seeking it out, they're already primed for this, and they're ready to have a successful collaboration. That's worked out pretty well. If they have technologists on board, that usually means that things are gonna go well, or at least there's the opportunity for things to go well. It doesn't always mean that. And then, if an organization has just gotten a ton of funding.
So, like, a lot of organizations that work on news literacy have gotten a ton of money since Trump was elected, and so working with them can be very effective nowadays. This is a little bit fad driven, but basically where people throw their money does determine what kind of resources they're gonna have available to them. So if their staff grows, you can contribute more time, because there's more resources available and stuff. And then usually technical requirements are more likely needed. Right? So if you've got more problems to solve because you've got more money and you're working on something bigger, then it's more likely you're gonna need something to be automated.
And also that type of opportunity is a good way to take advantage of the increased attention, so that you can build something that will be more long lasting and provide value after that attention starts to fade.
Yeah. So the key is, not everyone will throw money at a problem forever. And so in lieu of people leaving to go work on the next sexy thing, while they still have the resources, you build up that automation, and then it's okay if people leave. This isn't replacing jobs. It's just allowing jobs to not be needed at such a high quantity when volume goes away. That's exactly right. And so the company or organization continues to do the good work that it's doing already.
And a lot of the tools and systems that get built for various nonprofits
[00:24:24] Unknown:
or organizations focused on social justice or organizations that are helping to bring about effective change in the world will often require some measure of data being fed into the system to be able to provide useful outputs. So I'm wondering if you have any advice on methods that you can use for identifying or locating or collecting that information to be able to supply it into these products.
[00:24:52] Unknown:
Yeah. So this is sort of at the key, or the center, of a lot of data science for good work. This is, like, the fundamental question that I think everyone's figuring out. Now I obviously have an answer. Other organizations will have different answers. So, if you hear something different elsewhere, we just haven't all agreed that this is the right thing. So I'll talk through some of the ways that I think about this, and then I'll talk through what I've seen other organizations do that try to be helpful in the space. And, you know, you can sort of pick and choose what you think is best for you. So okay, so you talked about 2 things. You talked about public, and you talked about nonprofit. So let's talk about government data sources and things, and then let's talk about the nonprofit space and the data that's useful for them. Within the public sector, usually, it's being able to bring together other organizations within that same local or maybe even state or national level equivalent. So I'll be specific. So let's say that we're dealing with the Department of Education, but they need housing data. Now within New York City, for instance, there's this mandate where everyone needs to have open data, access to all their datasets, anonymized, of course, by some, like, deadline. I don't remember what it is or if it's passed or if it's been accomplished yet already. But, basically, the point is that they wanna have folks have access to government data, because they're citizens. Right? Like, they pay taxes for that work to be done. And so opening up that data means that the citizenry can actually take charge of their own data, do some statistical analysis or whatever. But also it's a natural mechanism for government agencies to talk to each other.
So some of the tasks that revolve around these sort of open data initiatives that you can do as, like, a citizen developer or data scientist, and it involves, so first and foremost, cleaning the data and then handing it back to the agency.
A lot of agencies have no idea what clean data looks like. They've never done it before. They don't really understand how data science works. They're just like, oh, I just use this Excel document, whatever. And that's kind of it. And so being able to clean and structure the data appropriately, if it's just free form text, which they sometimes will just drop into CSVs, or maybe there's a ton of missing values, maybe there's a ton of columns that aren't well labeled, so creating a data dictionary for these folks. Being able to normalize things or regularize things, so, like, sometimes addresses will be put in multiple ways, and creating a normalized way that addresses get put in your dataset can be a huge value add to these organizations because then they, like, don't just go from releasing this dataset. They're like, oh, well, now I actually, like, understand what's in my data because now I can search for this one street. Oh my god. Right? And then, you know, that can lead to a ton of insights, just doing that data cleaning for them.
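The address cleanup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from any project discussed in the episode: the `SUFFIXES` table and the `normalize_address` helper are hypothetical names, and the sample rows are made up.

```python
import re

# Hypothetical abbreviation table. A real project would build this from the
# agency's own data; note that "st" can also mean "saint", which this ignores.
SUFFIXES = {
    "st": "street",
    "st.": "street",
    "ave": "avenue",
    "ave.": "avenue",
    "blvd": "boulevard",
    "rd": "road",
}


def normalize_address(raw):
    """Return a lowercased, single-spaced address with common suffixes expanded."""
    # Drop commas and collapse runs of whitespace so spacing differences vanish.
    cleaned = re.sub(r"\s+", " ", raw.replace(",", " ")).strip().lower()
    words = [SUFFIXES.get(word, word) for word in cleaned.split(" ")]
    return " ".join(words)


# Three made-up spellings of the same address collapse to a single value.
rows = ["12 Main St.", "12  MAIN street", "12 main st,"]
normalized = {normalize_address(r) for r in rows}
print(normalized)
```

Once every spelling maps to one form, a plain text search for a single street finds all of its rows, which is the payoff described above.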
Not very sexy, though. The next step is usually bringing that data together. So maybe if you're a department of ed, going back to that case, and you need housing data and maybe you need public works data, maybe you need parks data, bringing it all together into one dataset that the department of ed might be able to use to figure out when likely truancies are happening or, I don't know, any number of things can be really, really cool, and creating a unified dataset that does all the sort of, like, integration work for them can be super valuable. Now as you can guess where I'm kind of going with this, the next step is to do actual analysis. So just starting with exploratory data analysis, you do that, people will get so excited. They'll be like, oh my god, look at that graph you made! It's got all the points! This is crazy! Because most of these datasets, right, like, they're either so big because they're over years or they're so, like, messy that they have no way of getting actual value out of these things. But if you can do actual, even exploratory data analysis or in some cases modeling, either a classification or regression analysis or some time series thing or some geospatial thing or whatever, if you can do some analysis on this dataset and actually show that it provides real value, that, like, it answers key questions that people actually care about, then they're gonna, one, take it far more seriously in the future, and then they'll be able to make better decisions, which is the point. A lot of local government agencies do not have the resources internally to do a lot of this stuff. And the problem when they pay contractors to do work is that the contractors usually provide a specific service that they specialize in. And they're not looking to provide value in a general way, they're looking to provide products. This way they can scale out the amount of money that they get. And so they'll solve, like, a specific use case.
Maybe they'll include a few datasets, but they're not really looking to answer interesting questions or help you think of new things, because they're being paid for a specific service, they don't want to take on too much work, etcetera, etcetera, they're resource constrained, all of these reasons. So, you know, really, the best way to add value to local government is from the citizens. And, honestly, nobody but the citizens care more about this. Right? Like, if you really wanna understand why a specific thing is happening, then you can add a ton of value just by bringing a bunch of data sources together and starting them on the path of doing data analysis.
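As a rough sketch of what "bringing a bunch of data sources together" can look like, the snippet below joins two small CSV exports on a shared key using only the Python standard library. Everything here is an assumption for illustration: the column names, the `district` key, the sample values, and the `read_by_key` and `left_join` helpers are hypothetical.

```python
import csv
import io

# Made-up sample data standing in for two agencies' open-data CSV exports.
EDUCATION_CSV = """district,absence_rate
01,0.08
02,0.12
"""
HOUSING_CSV = """district,evictions
01,40
03,15
"""


def read_by_key(text, key):
    """Index the rows of a CSV string by the value in column `key`."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def left_join(left, right):
    """Keep every left row; add right-hand columns when the key matches."""
    joined = []
    for key, row in left.items():
        merged = dict(row)
        # Rows without a match simply lack the right-hand columns.
        merged.update(right.get(key, {}))
        joined.append(merged)
    return joined


unified = left_join(
    read_by_key(EDUCATION_CSV, "district"),
    read_by_key(HOUSING_CSV, "district"),
)
for row in unified:
    print(row)
```

District 02 has no housing row, so it comes through without an `evictions` value. Surfacing gaps like that is often part of the value of the integration work.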
So I'll give, like, a really specific example, a specific use case of something that I did. We were looking at truancy rates across local government. It turns out that truancy gets split out amongst a whole bunch of metrics and there's, like, specific things that you can look for. Anyway, long story short, they were reporting truancy about once a month, and that wasn't enough to predict what truancy rates would be in the future. And so what I did is I just did, like, this super dumb time series analysis where I was like, oh, if you interpolate and, like, triple the data points, then you can actually build a predictive model. And they were like, oh my god. This is amazing. We can totally have people report more often, then we'll have the actual numbers instead of having to do this interpolation. I was like, great. There you go.
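One way to read "interpolate and triple the data points" is to insert two evenly spaced values between each pair of monthly readings. The sketch below assumes linear interpolation, which the episode does not specify; the `triple_by_interpolation` helper and the sample numbers are hypothetical.

```python
def triple_by_interpolation(values):
    """Insert two linearly interpolated points between each pair of readings."""
    if not values:
        return []
    dense = []
    for current, following in zip(values, values[1:]):
        step = (following - current) / 3
        dense.extend([current, current + step, current + 2 * step])
    # The final reading has no successor, so it is appended unchanged.
    dense.append(values[-1])
    return dense


monthly_rate = [9.0, 12.0, 6.0]  # made-up monthly truancy percentages
print(triple_by_interpolation(monthly_rate))
# prints [9.0, 10.0, 11.0, 12.0, 10.0, 8.0, 6.0]
```

The interpolated points carry no new information, which is why collecting real reports more often, as the agency decided to do, is the better long-term fix.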
And so this is, like, a specific example of how I brought, like, I didn't even use alternative data sources. I just had one data source, and I brought it together, and that was all it took. And it had a ton of value for the organization in, like, really, really short order, and that was just, like, at a hackathon. But speaking more generally, like, whenever you can bring public data and just provide, like, some specific value to it, then it's gonna make these people's lives better in a way that they probably never even thought about. Plus, a fresh pair of eyes is always good on data problems. So now that we've talked about public sector things at the local level, federal's a little trickier. Federal people, like, definitely are very collaborative, but, you know, most of them are in DC, so it's kinda hard to get them to, like, come out to your specific town. If you live in DC, there's a ton of stuff to do. If you don't, then it's kinda tough. But moving over to the nonprofit space, usually, you're gonna wanna scrape either government data because it's usually the best place to find information about a thing, or you'll wanna scrape, like, competitor data sources on, like, a specific domain area. And just kinda bring all that together, and then you share it out to all the organizations.
So not really competitors in the nonprofit space, but really just about having information sharing. Because a lot of these nonprofits, you know, are extremely siloed. Sometimes they'll talk to each other, and there are some specific subcases where the organizations are extremely collaborative. But most of the time, these organizations do not talk to each other. And, you know, they just solve these really niche problems. And that's fine because everyone deserves and needs, like, a ton of help. But, like, if you can't learn lessons from other people in the space, it becomes a real challenge. And so just being able to break down those silos can add a ton of value just by scraping their open website, you know, maybe their initiatives, their goals, all that stuff. So those are the 2 ways that I really see, and then, you know, you just go through the standard workflows again.
So one of them, you know, you're scraping CSVs. The other one, you're probably scraping HTML. Both ways, just to generalize, you're pretty much pulling in a bunch of data, and then you're pretty much just, like, showing them what that data can be useful for, and that's how you're able to provide value for a lot of these organizations from a data feeding perspective.
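For the HTML side, a minimal sketch using Python's standard-library `html.parser` is shown below. It pulls the `<h2>` headings out of a page, on the assumption that a nonprofit lists its initiatives that way; the `HeadingCollector` class, the `extract_headings` helper, and the sample page are all hypothetical. In practice the page text would come from an HTTP request, and the site's terms of use and robots.txt should be checked first.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h2> element on a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags (such as <em>) still arrives here.
        if self._in_h2:
            self.headings[-1] += data


# Made-up page standing in for a nonprofit's public "initiatives" page.
SAMPLE_PAGE = """
<html><body>
  <h1>Our work</h1>
  <h2>Tenant legal clinics</h2>
  <p>Weekly drop-in sessions.</p>
  <h2>News literacy <em>workshops</em></h2>
</body></html>
"""


def extract_headings(html):
    """Return the whitespace-normalized text of each <h2> in `html`."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [" ".join(heading.split()) for heading in parser.headings]


print(extract_headings(SAMPLE_PAGE))
```

Running the same extraction over several organizations' sites gives a shared list of who is working on what, which is one way to start breaking down the silos mentioned above.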
[00:32:46] Unknown:
And when you are building these projects or introducing these new practices to various organizations, are there any particular design factors that you should be considering when you're building these projects or teaching people about these things, particularly given the fact that once you're done with a given task, you're likely to move on and they'll be left without the assistance of an experienced engineer to be able to keep these things running and maintain the momentum?
[00:33:13] Unknown:
So this is the most important problem for people that work in the social justice space on the weekends or nights. And thank you for asking me this question. I think it's the most important one. So, like, I think I have a process that works pretty well. Maybe there are other people that have better processes. I'm sure there are. But, like, I'm reasonably happy at this point with the process that I have for doing this. And I'm gonna walk through it step by step. I think this is the hardest thing to figure out though for folks. But anyway, let's walk through it. Okay. So I'm gonna talk at a high level about the individual steps, then I'm gonna go into each node and talk through all those. But first, I'll start with, like, just the general, highest level possible thesis for working in this space. It's about handoff.
It's about building technical capacity at the organization so that you can move on because you're not gonna work with any given social justice organization forever. There's a ton of reasons for this, but, you know, some relationships will last a long time, but it's just not sustainable to think that you're always gonna be around or on call, especially if you're not getting paid. And so just being very upfront about it, like, you've gotta expect that handoff is coming. So let's talk through the high level nodes. So, the first one is project calling and preparation.
The second stage is software development and, like, updates and communication, and the third stage is handoff. Now within the first stage, what you're gonna wanna do is you're going to just really have a conversation with the people that you're building this new thing for. You're gonna either talk to them because you already built something, and you're gonna dispel the notions that you have and be like, okay. Is this actually useful for you? If not, what should we actually be building? And then you're gonna teach them. You're gonna give them, like, a Trello board. This is where Trello usually comes in for me. Or some sort of scrum management tool, and you're gonna write out features, and you're gonna write out timelines, and you're gonna write out schedules. Not, like, in a specific way, not in a waterfall-y, like, let's design this whole system before we build it. But in just, like, a high level, these are the sorts of things that you should expect to be in the product, and really getting them familiar with the vocabulary of what you're gonna be building. You don't establish hard deadlines. You don't establish hard timelines.
You establish sort of, like, loosey-goosey things, and you have to be very upfront with them about it. Be like, this is not gonna be done in 2 weeks. I have no idea how long it's gonna take right now. Right now, we're just trying to establish how long it typically takes to do these things and give you a sense of, like, what's possible, what's feasible, what requires more resources, all of this sort of stuff, and just giving them a sense of how product management kinda works. Then once you've started to talk through some of the high level ideas, essentially, like, the product board, you could say, for the product, and you've broken out some really high level features, then what you do next is you start actually developing things. Now this is probably the most important thing. Oh, actually, there's one more thing. At the first stage, you prepare them, saying, at some point, we're gonna hand this off. So you should either have a contractor on hand, or you should hire a software engineer, or you need to, you know, have, like, some contractor firm that's gonna maintain this system once I move on, or the team that I'm working with moves on. So you set that expectation of handoff upfront. So it's very clear to them that it's not gonna be forever. Then when you're actually building it, what I've been doing is I work with one colleague, and I work for a time-bound amount of time. For me, 2 hours is ideal. It's usually on a Sunday because you need Saturdays to recover from the work week. Do not do it during the work week. That is a recipe for failure. I can't do it on Saturdays personally, maybe other people can. But, like, if I code 6 days in a row, I die. I just can't do it. So Sundays are when I work with my colleague. And we just do things for 2 hours a week, and we get as far as we get. Some weeks, we make a ton of progress.
Some weeks, we cover almost no ground, and it's extremely variable. And that's fine because at the end of the day, you're providing a volunteer service, and so it's okay if things take longer. So there's 2 keys here. One is after you've done the work, make sure you communicate it out and be like, okay. Here's the status. Here's the update. Here's what we did this week, you know, and here's what we're gonna work on next week. And this gives them an opportunity to say, no. No. No. Don't do that thing next week. Do this other thing or whatever. But it furthers the conversation and shows that you're keeping up. And then the other thing is I think it's important to pair on everything you do because you're gonna burn out. So one week, you're programming. One week, you're just there as, like, moral support and to, like, you know, talk through problems and basically just be a rubber duck. Now this is not efficient from a time perspective in terms of, like, you're literally using 2 engineers to do 1 job, but it's the only way you're gonna not burn out on the specific project or product or thing you're working on. Because burnout is real in the social justice and volunteer space, and it's very easy to have it. I've burned out on so many projects. And so this is the only way I've been able to find to make it sustainable, especially given the rigors of how hard it is to be a software engineer. So once you've established that, you know, usually, pairs of people will work together for, like, whatever. And then you have one core person who's gonna act as sort of, like, the lead, or maybe a team of, like, 3 or 4 people at most who are gonna act as the leads and communicate directly with the clients. So this way, people doing the line tasks don't have to worry about, you know, client interfacing.
Also, clients, or nonprofits, or really anyone can only really handle working with, like, a couple people at a time, and they don't wanna see, like, a ton of new faces all the time. It just makes things too complicated for them. And so establishing a couple core people that are gonna be on the product from start to finish creates a seamless experience. And then as developers roll on and off as they have time, it's not as big a deal. And then the nonprofit or the local government agency doesn't lose faith in the product overall because they're consistently seeing the same person, you know, maybe like once a week or whatever. Then the last phase is handoff. So you should be documenting throughout. Good documentation is key, but this is the phase where you're handing over the documentation, and you explain everything that's in the documentation to the contractor if you haven't been already. Honestly, really, you should be, like, explaining things, and you should be working with them on the product board, on the Trello boards, every single week.
You know, you shouldn't just, like, go off into a corner and then come back in 3 months with something. Like, if you're not talking to them once or twice a week, or maybe once every 2 weeks for certain organizations, then, you know, chances are you're not gonna really be providing value. And then once you've done all that, then you can move on to this handoff phase where you help them, you know, vet and find contractors to take over the work or vet and find software engineers. So you teach them how to hire for that specific skill set. And, you know, you can usually get pretty decent people because they just wanna do the work. And, you know, you've already provided a ton of value, so you've created that culture where software engineering is seen as a valuable thing internally.
And they don't have to pay as much for contractors usually to be able to get the best value add possible. There's a ton of reasons to do this. And so that's really, I think, like, the right way to engage with these kinds of organizations and how to do the best practice. And this is, like, the typical, you know, this is how you work if you're a contractor and you're doing consulting things for a larger organization. But, like, it works super well for these nonprofits. And it's really just, you know, it's about hand holding a lot more, but it's the same fundamental idea [unclear]. And I know that for people who are interested in finding
[00:40:41] Unknown:
organizations to volunteer with or finding topic areas that they can contribute to, I know that you have a list that you have shown in some of your talks, so I'll add links to that in the show notes. And for anybody who wants to get in touch with you and follow the work that you're up to, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to keep it simple and say shoes without laces because being able to slip your shoes on and off is awesome. And so now I'll pass it on to you, Eric. Do you have any other picks for us this week? So we hung out at Open Data Science Conference. If you haven't checked out CatBoost,
[00:41:18] Unknown:
which is just the coolest package in the world, please do so. It is amazing. I got to meet one of the core engineers, this woman, Anna, at the conference. And it is just such an amazing tool. I've used it in past jobs, and it got better performance than some of the other really amazing packages just because it's, like, so well trained for categorical variables. So that's, like, what I'm gonna say is my first major pick. And then the second one I'll call out is pomegranate, which is a library for doing sort of like distribution fitting to your dataset.
And the author of pomegranate actually was able to show that he got better performance using Naive Bayes than scikit-learn's version of support vector machine on some specific problem because he prefit the data to the right distribution and then was able to figure out the, like, perfect boundary condition, which is just insane because this Naive Bayes algorithm is in no way better than support vector machines. So those are my 2 picks because they're amazing tools and they don't get anywhere near enough love, and you should go use them immediately if you haven't already. Alright. Well, thank you very much for taking the time to join me and discuss ways that people can get involved
[00:42:35] Unknown:
with real world initiatives and organizations that will help them become realized. So I wanna thank you for that, and I hope you enjoy the rest of your day. I hope you do too. Thank you so much. This was super fun.
Introduction and Guest Introduction
Eric Schles: Background and Experience
Discovering Python and Its Impact
Using Engineering Skills for Social Justice
Case Studies: DA's Office and Federal Government
Automating Manual Processes
Building Tools and Giving Talks
Version Control and Documentation
High-Value Tools and Projects
Choosing Organizations to Work With
Collecting and Using Data
Design Factors for Sustainable Projects
Finding Volunteer Opportunities
Picks and Recommendations