Summary
Analyzing and interpreting data is a large portion of the work involved in scientific research. Getting to that point can be a lot of work on its own because of all of the steps required to download, clean, and organize the data prior to analysis. This week Henry Senyondo talks about the work he is doing with Data Retriever to make data preparation as easy as retriever install.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Henry Senyondo about Data Retriever, the package manager for public data sets.
Interview
- Introductions
- How did you get introduced to Python?
- Can you explain what Data Retriever is and the problem that it was built to solve?
- Are there limitations as to the types of data that can be managed by Data Retriever?
- What kinds of datasets are currently available and who are the target users?
- What is involved in preparing a new dataset to be available for installation?
- How much of the logic for installing the data is shared between the R and Python implementations of Data Retriever and how do you ensure that the two packages evolve in parallel?
- How is the project designed and what are some of the most difficult technical aspects of building it?
- What is in store for the future of Data Retriever?
Keep In Touch
- Github
- @henrykironde on Twitter
Picks
- Tobias
- Henry
Links
- Weecology Lab
- University of Florida
- Data Retriever
- LG
- R
- Julia
- Open Knowledge Foundation
- Frictionless Data Format
- Data Weaver
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host as usual is Tobias Macey, and today I'm interviewing Henry Senyondo about Data Retriever, the package manager for public datasets. So, Henry, could you please introduce yourself?
[00:01:00] Unknown:
My name is Henry Senyondo. I'm a software developer at the Weecology Lab at the University of Florida. I maintain the Data Retriever and a few packages that wrap around it.
[00:01:13] Unknown:
And do you remember how you first got introduced to Python?
[00:01:16] Unknown:
I was first introduced to Python during graduate school at KAIST in South Korea. I was in the natural language processing lab, and we wrote a few scripts to clean up and annotate data; the best way was to use Python. After that, I worked at LG for a year on voice recognition for the smart TV, and I used a lot of Python for scripting.
[00:01:45] Unknown:
And could you briefly explain what Data Retriever is and how you first got involved with the project?
[00:01:51] Unknown:
The Data Retriever is a data management platform that enhances research productivity. Researchers spend a lot of time trying to clean up, standardize, collect, and sometimes even search for datasets. If we could reduce the amount of time scientists spend cleaning up and standardizing data, they could spend more time focused on research, so as to come up with solutions to the problems that we have. We came up with the idea of reducing that time by creating a platform that could take publicly available datasets, clean them, standardize them, and get them ready for analysis.
The Data Retriever is written in Python, and it has a few other libraries that wrap around it, like the R package, and we're building a Julia package.
[00:02:46] Unknown:
Most people are going to be familiar with package managers from their operating system, from the Python package repository, or from other languages. So what are the similarities, and in what ways does managing data diverge from a package management perspective?
[00:03:03] Unknown:
Package management is the kind of system that helps us reduce the amount of time it takes to install packages, right? The Data Retriever plays a similar role in data science, because a lot of data is supplied to the public in its raw form, and most of it has a lot of errors and abnormalities. With one line of code, the Data Retriever can act as a package manager by searching for the data for you. It looks up the information about a dataset so you can read through it and decide which dataset to use for your analysis. With one line of code, you can get that data onto your system. It supports various database management systems, because we know you may want to keep your data in Postgres or in SQLite; we support several database management systems, and you can easily install the data, in a clean and ready-to-analyze form, into any of them.
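To make the "one line of code" workflow concrete, here is a minimal sketch of installing a public dataset with the retriever's Python API. The dataset name and keyword arguments are illustrative assumptions rather than a verbatim transcription of the current API:

```python
# A minimal sketch, assuming the retriever package's Python API;
# the dataset name and keyword argument are illustrative.
import retriever

# List the datasets the retriever knows how to fetch and clean.
print(retriever.dataset_names())

# Download, clean, and load one dataset into a local SQLite file.
# Other engines (Postgres, MySQL, CSV, JSON, ...) follow the same pattern.
retriever.install_sqlite("iris", file="iris.db")
```

The equivalent command-line form (something like retriever install sqlite iris) is the single line being described here.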
[00:04:12] Unknown:
And one of the things that's often difficult with software packages is the secondary and tertiary dependencies that need to be installed. But from reading through the documentation, it looks like that's not really something that leaks into the Data Retriever situation, because you're just dealing with single sets of data and not necessarily having to pull in dependent information to enrich that data. Is that correct?
[00:04:38] Unknown:
That is correct up to a certain point, because the Data Retriever also uses other libraries. Python has got this awesome package management system whereby you can describe the required dependencies for the retriever that enable us to preprocess data. For example, the JSON engine, MySQL engine, and Postgres engine all have certain drivers that the retriever depends on, so we have to make sure that those dependencies are working perfectly.
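As an illustration of declaring per-engine drivers as optional dependencies, here is a hypothetical setup.py fragment using the setuptools "extras" mechanism; the package and driver names are invented for illustration, not taken from the retriever's actual packaging:

```python
# A hypothetical setup.py fragment; package and driver names are
# illustrative, showing how engine drivers can be optional "extras".
from setuptools import setup

setup(
    name="example-retriever",
    version="0.1",
    install_requires=["requests"],        # core dependency, always installed
    extras_require={
        "postgres": ["psycopg2-binary"],  # driver for a Postgres engine
        "mysql": ["pymysql"],             # driver for a MySQL engine
    },
)
```

With this layout, a user would opt into a backend with something like pip install example-retriever[postgres].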
[00:05:14] Unknown:
And are there limitations as to the kinds of datasets that can be packaged and installed by Data Retriever?
[00:05:22] Unknown:
Currently we handle tabular data, which is delimited data, and we also handle spatial data; those are the main kinds of datasets we're looking at. When it comes to tabular data, we've handled most of the processing needed for a clean, analyzable dataset. For spatial data, there is a lot of development happening in that environment, and we're trying to scale that up to a good amount of preprocessing. When it comes to domain, we handle datasets from various domains.
Initially we started with ecological datasets, but currently we support all datasets. As long as it's a dataset and it's defined clearly by a given standard, the retriever will install it for you in the specified engine, that is, in the specified database management system.
[00:06:27] Unknown:
And are the majority of the datasets that are currently packaged by Data Retriever within a certain bound in terms of their size? And what are some of the challenges with datasets as they start to grow beyond a certain point?
[00:06:40] Unknown:
The challenge is speed. We do have some engines that work well with larger data, but as datasets get larger, the time required to download and install them obviously grows. Currently, though, we are performing at a good, optimal standard.
[00:07:00] Unknown:
And are the target systems generally people's individual laptops or desktop computers, or do you also have datasets targeting a more distributed or larger compute cluster for analyzing the information?
[00:07:15] Unknown:
At this point, most datasets are not that huge, but the Data Retriever is size independent. It can handle any size, as long as your laptop or computing machine can handle it, because it's just a pipeline. We have designed the pipeline to have no limit, because we are not processing whole datasets in memory.
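As a concrete illustration of the "pipeline, not in memory" point, here is a minimal sketch of chunked loading, where rows stream from a CSV file into a database without the whole file ever being held in memory; the file names, table schema, and chunk size are arbitrary assumptions:

```python
# A minimal sketch of streaming CSV rows into SQLite in chunks;
# file names, schema, and chunk size are illustrative.
import csv
import sqlite3

def stream_rows(path, chunk_size=10_000):
    """Yield lists of rows so only one chunk is in memory at a time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

conn = sqlite3.connect("observations.db")
conn.execute("CREATE TABLE IF NOT EXISTS obs (site_id TEXT, species TEXT)")
for chunk in stream_rows("observations.csv"):
    conn.executemany("INSERT INTO obs VALUES (?, ?)", chunk)
conn.commit()
```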
[00:07:50] Unknown:
And for somebody who is working on building a set of scripts, or on the preparation involved in installing a dataset, what are the steps involved? From what I was reading, it looks like you're using a standardized data format to simplify that process.
[00:08:04] Unknown:
Yeah. The concept of standardization is very crucial when it comes to using datasets. If we have two researchers who use different standards for labeling, categorizing, or defining their datasets, then we have difficulty reusing those datasets. Luckily, the Open Knowledge Foundation has created a specification for data packages. This helps us standardize how you describe your data, and with this specification, people can plug your data into most of their products or software. The packages are in JSON format.
If you have described your JSON data package clearly, the Data Retriever will take in the data package in its raw form. However, there are datasets that need preprocessing, and for those we create different kinds of files to support that preprocessing. We create Python files from the descriptions specified in the data package, do some further preprocessing, and when that's done, the dataset is ready to be used.
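To illustrate the kind of JSON descriptor being discussed, here is a minimal, hypothetical data package in the spirit of the Frictionless Data specification; the dataset name, URL, and fields are invented for illustration and this is not an actual Data Retriever script:

```python
import json

# A minimal, hypothetical data package in the spirit of the
# Frictionless Data specification; dataset name, URL, and field
# names are invented for illustration.
data_package = {
    "name": "example-bird-counts",
    "title": "Example Bird Counts",
    "resources": [
        {
            "name": "counts",
            "url": "https://example.org/counts.csv",
            "schema": {
                "fields": [
                    {"name": "site_id", "type": "integer"},
                    {"name": "species", "type": "string"},
                    {"name": "count", "type": "integer"},
                ]
            },
        }
    ],
}

print(json.dumps(data_package, indent=2))
```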
[00:09:27] Unknown:
When I was reading through the documentation, and as you mentioned, there are the R and Julia wrappers around Data Retriever. So I'm curious how much of the logic is contained within the Python library, and how much of it needs to be shared with those additional language implementations?
[00:09:42] Unknown:
The core functionality is developed in the Data Retriever, the Python tool. We have the R Data Retriever, which wraps around the core library, the core API of the Data Retriever in Python, and we're also developing a Julia wrapper. All of these are wrappers around the core functionality of the Data Retriever.
[00:10:13] Unknown:
So I'm wondering if you can dig into how the project is actually designed, the way it applies the specification for how the data is intended to be installed, and how you're able to support multiple different destination targets for the source data?
[00:10:27] Unknown:
We basically use most of the common programming design patterns, like composition and inheritance. We have engines, which are specifications for how to preprocess the data for a given backend, and the engines are kind of like plugins: if there's an engine you want supported, we can easily plug it in by changing a few specifications. Then we have the core part, the main class that handles all these engines and describes their schemas. We also have the scripts: a script class is populated from the JSON data package, and it defines where to install the dataset and which engine to install it with. So if a user plugs in a new JSON data package, that package is read and initialized as a script. The script communicates with the core part of the retriever, which determines the engine that is supposed to be used, and at that moment it defines the kind of schema used for that engine; Postgres, MySQL, and SQLite all have different schemas. That's how the whole software is set up.
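To illustrate the composition-and-inheritance design Henry describes, here is a simplified, hypothetical sketch of the engine-as-plugin idea; the class and method names are invented for illustration and do not mirror the retriever's actual internals:

```python
# A simplified, hypothetical sketch of the engine-as-plugin design;
# class and method names are invented, not the retriever's internals.
import sqlite3

TYPE_MAP = {"integer": "INTEGER", "string": "TEXT", "number": "REAL"}

class Engine:
    """Base class: each engine maps a generic schema onto its backend."""
    def create_table(self, name, fields):
        raise NotImplementedError

    def insert_rows(self, name, rows):
        raise NotImplementedError

class SQLiteEngine(Engine):
    def __init__(self, path):
        self.conn = sqlite3.connect(path)

    def create_table(self, name, fields):
        cols = ", ".join(f"{col} {sql_type}" for col, sql_type in fields)
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({cols})")

    def insert_rows(self, name, rows):
        marks = ", ".join("?" for _ in rows[0])
        self.conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)
        self.conn.commit()

class Script:
    """Populated from a JSON data package; installs the dataset
    through whichever engine the user selected."""
    def __init__(self, package):
        resource = package["resources"][0]
        self.table = resource["name"]
        self.fields = [(f["name"], TYPE_MAP[f["type"]])
                       for f in resource["schema"]["fields"]]

    def install(self, engine, rows):
        engine.create_table(self.table, self.fields)
        engine.insert_rows(self.table, rows)
```

Under these assumptions, Script(data_package).install(SQLiteEngine("data.db"), rows) would load a dataset, and supporting a new backend means adding one Engine subclass.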
What have been some of the most difficult aspects of building and maintaining Data Retriever? Most of the complexity comes from supporting various platforms, because we have to support people who are using Mac, people who are using Windows, and people who are using Linux and other operating systems.
On top of that, we have system and Python dependencies, where somebody may be running a Python 3 program or a Python 2 program, and all these systems process data differently. We also have complexity in supporting backward compatibility, because when we release new software, many things have changed, and we need to keep users of the previous version functional. The other complexity we have always found is keeping up with dependencies, because people update their dependencies' versions and functionality, and this usually breaks the tool. I think those are the main problems that we face.
[00:13:56] Unknown:
And is there a sizable community around Data Retriever, or is it still in the growth phase, where you're trying to get people interested in it and engaged in contributing new datasets to grow its overall utility?
[00:14:08] Unknown:
Yeah, I have had quite a few people tell me they've tried the tool at a few of the conferences that I've been to. We're still developing, trying to enable more datasets, and trying to get users to see the benefits of using the Data Retriever. We're also enhancing it by adding more functionality and scaling up the spatial preprocessing. I think after some time, people are going to come on board and use the Data Retriever more often. What are some of the features that you have planned for the future that you're hoping to implement? We are trying to include what we call the Data Weaver. This is a tool that would help in integrating datasets, because most researchers don't use individual datasets on their own; they take datasets from other researchers and integrate them to come up with a new dataset, and we're developing the tool for that. Then we're also developing a Julia package for the retriever, so as to support people who are using other programming languages.
I think that's all that is going to be updated soon.
[00:15:29] Unknown:
When you were originally starting the project, was Python the only language that you considered implementing it in, or were you looking at other possibilities as well?
[00:15:40] Unknown:
Python is one of the most important languages when it comes to data science, and it's one of those languages that processes data very fast. There were other options for developing this tool, but given the environment that we're trying to serve, Python is really important. I think that's the best choice we made.
[00:16:10] Unknown:
Okay. For anybody who is interested in learning more about Data Retriever and following the work that you're up to, I'll have you add your preferred contact information to the show notes. And with that, I'll move us to the picks. For my pick, I'm going to choose a Bluetooth receiver that I picked up a little while ago for listening to podcasts while I'm commuting, and a set of headphones that I got to go with it. I was looking at some of the different Bluetooth receivers available, including some of the ones that are built to clip onto your shirt, and ended up finding this one, which was a little cheaper than most of those and supported newer versions of the spec. The only problem was that it was designed for use in cars, but I also found a little pocket clip that fits nicely onto it, so I can just clip it to my shirt while I'm commuting.
I'll add links to all of those in the show notes. And with that, I'll pass it to you. Do you have any picks for us this week, Henry?
[00:17:02] Unknown:
The pick for the listeners is a movie from India that I always watch. I watched it last weekend; it's called 3 Idiots. I hope you enjoy that movie.
[00:17:11] Unknown:
Okay, great. Well, I appreciate you taking the time out of your day to tell me about the work you're doing with Data Retriever. It definitely seems like a very valuable addition to the scientific community, and I look forward to seeing where you take it in the future. Thank you very much, and thank you for the time. Have a great night.
Introduction and Guest Introduction
Henry's Journey with Python
Overview of DataRetriever
Data Management and Package Management
Types of Datasets and Scalability
Target Systems and Data Size
Standardization and Data Preparation
Project Design and Architecture
Challenges in Development and Maintenance
Community and Future Features
Choice of Python and Final Thoughts
Picks and Recommendations