Summary
A perennial problem of doing data science is that it works great on your laptop, until it doesn’t. Another problem is being able to recreate your environment so you can collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy-to-use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signell, head of open source at Saturn Cloud, explains how she is working with the product team and the PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Julia Signell about building distributed processing workflows in Python through the power of Dask
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what you are building at Saturn Cloud?
- Who are your target users and how does that inform the features and priorities that you build into your platform?
- What are the roadblocks that data scientists typically encounter when working on their laptop/workstation?
- How does open source factor into the Saturn product?
- What are some of the projects that you are collaborating with/contributing to as part of your work at Saturn?
- How has your experience at Anaconda informed your work at Saturn?
- Can you describe how the Saturn Cloud platform is architected?
- How has it changed or evolved since it was first launched?
- Can you describe the learning curve that data scientists go through when adopting Dask?
- What are some examples of projects or workflows that Dask enables which are not possible/practical to do locally?
- How would you characterize the overall awareness/adoption of Dask in the Python data science community?
- What are the most interesting, innovative, or unexpected ways that you have seen Saturn Cloud used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Saturn Cloud?
- When is Saturn Cloud the wrong choice?
- What do you have planned for the future of Saturn Cloud?
Keep In Touch
Picks
- Tobias
- Julia
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Saturn Cloud
- Dask
- Pangeo
- XArray
- Conda
- Mamba
- Holoviz
- Dash
- Anaconda
- Kubernetes
- Tornado
- Prefect
- Dagster
- Airflow
- Ray
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Julia Signell about building distributed processing workflows in Python through the power of Dask and her work as the head of open source for Saturn Cloud. So, Julia, can you start by introducing yourself? Hi. I'm Julia.
[00:01:12] Unknown:
So I'm the head of open source at Saturn Cloud, as you mentioned. So what that means is I spend half my time working on, like, the engineering side of Saturn Cloud, building the platform that we have as a company, and then I spend the other half of my time working on open source projects. So I'm a maintainer on Dask, and then I also work on several other open source projects in a more minor role. And do you remember how you first got introduced to Python? Yeah. I was trying to think. I mean, I took, like, intro to comp sci in college, and that's probably when I first started using Python. But when I really started using it properly was when I got out of college and I was working in hydrology labs, and I was doing data management and visualization.
And the PyData ecosystem was just, like, huge for that. It was so helpful. Xarray, in particular, was a really helpful library that I ended up really relying on. And so, like, that's how I really started getting into Python in the context of visualization
[00:02:11] Unknown:
and management of data in those labs. As you mentioned, now you're working with Saturn Cloud, and I know that the overall intent of that platform is to simplify some of the work of doing data science in Python. I'm wondering if you can just give a bit of an overview about what it is that you're building there and some of the overarching goals for the platform and who your target users are? Yeah. So Saturn's a platform that runs on AWS,
[00:02:37] Unknown:
and it provides easy access to Dask and Jupyter, both on CPU and GPU. So the people who we're targeting are really people who deal with data. So engineers, analysts, scientists, whoever that may be, and people who are comfortable in Python. So people who are looking to use Python to deal with large datasets. So the platform is purposely not trying to be anything too fancy. We're just really trying to use the most used, the most loved tools to create a space where it's just really easy to do your work. So we use Conda and Mamba for environment management.
We just use regular Jupyter. We don't have any special variation or flavor of Jupyter. Regular Dask. You can deploy Flask apps, or Voila, Dash, or Panel dashboards. So it's meant to feel really familiar and easy to work with for these people who are already familiar with this whole Python data science world.
[00:03:38] Unknown:
And in terms of the sort of building blocks for the platform, you mentioned that you're leaning heavily into sort of the core elements, and I'm wondering what the sort of overall approach is to being able to onboard new tools or new workflows to the Saturn Cloud system. If somebody comes in and they've got some esoteric Python dependency that maybe isn't part of the out of the box Anaconda distribution, or somebody who's maybe trying to do something that is sort of not the most common approach and workflow, to be able to maybe do some sort of deep learning training or some complex aspect of their overall workflow, sort of how does that factor into what you're building at Saturn and sort of the product direction that you think about? Yeah. So
[00:04:27] Unknown:
Saturn is super flexible. Like, you just get a Jupyter notebook. You can fully customize your image. Right? So you can use our UI to put in a conda environment.yaml or a pip requirements.txt. But you can also just build your own image and point us to that, and then you can use that as a basis for your Jupyter. So you have total flexibility in terms of what's in your environment, and then we don't restrict at all, like, what types of things you can deploy. I just rattled off a list of, like, you know, different dashboarding solutions. But there's nothing specific about those that's built into Saturn. It's just we have the notion of deployments.
And to support that, we expose a port. So, basically, like, if you know the right arguments for whatever framework you're using, then you can deploy it in Saturn. And similarly, if you're trying to use a specific package, you can either create an image that has it installed or you can install it in a start script that runs right at runtime.
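As a rough sketch of what that deployment model looks like in practice: any web app that binds to the exposed port will do. The framework choice and the port number below are illustrative assumptions, not Saturn specifics.

```python
# Minimal Flask app of the kind a deployment could point at.
# The port (8000) is an assumed value; use whatever port the
# platform actually exposes for deployments.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from a deployed app"

if __name__ == "__main__":
    # Bind to all interfaces so the platform's proxy can reach it.
    app.run(host="0.0.0.0", port=8000)
```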
[00:05:25] Unknown:
And then given your focus on data scientists and the work that they're trying to do and trying to simplify some of their overall work and the deployment story there, I'm wondering if you can talk through some of the common roadblocks that data scientists and data professionals run into as they're trying to iterate on their experiments, and I'm sure most of them are running it locally on their laptop or workstation due to some of the complexities that arise from trying to either build out a complex machine learning model or deal with larger volumes of data and just some of the difficulties that they encounter in that process?
[00:06:00] Unknown:
Yeah. So there's a couple things that people run into that are specifically solved by distributed computing and, in this case, Dask. And that's running out of memory and just the time of computations. So just things taking a long time. And that's something that any distributed library will solve, right, because it's able to access a large number of nodes and spread the computation out. And that solves your memory issues and hopefully makes things go faster. So those are two of the big issues that people come to us for. The others are reproducibility.
So this environment management that I was just talking about, of bring your image or we'll build you an image, that's something that people really struggle with locally, especially when they're going from running things locally to sharing them with a small group of people, maybe, like, coworkers or something like that. It can be challenging for people to create those reproducible environments. The other big things are we give access to GPUs, which a lot of people don't have locally, and that speeds things up. And then the last thing is colocation of data. So if your data's on S3, we run on AWS, and that is often a better idea than running things locally and pulling the data from S3. So you can do your computation sitting right next to your data, and you don't have to do all that transfer and egress.
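A minimal sketch of that colocation point: with Dask you can read straight from S3 so the computation stays in the same region as the data. The bucket path here is hypothetical, and the s3fs package is assumed to be installed.

```python
import dask.dataframe as dd

# Lazily point at a set of CSVs on S3; nothing is downloaded yet.
df = dd.read_csv("s3://my-bucket/logs/2021-*.csv")  # hypothetical path

# Building the aggregation only constructs a task graph; .compute()
# runs it on the workers, right next to the data when they share a region.
daily_counts = df.groupby("date").size().compute()
```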
[00:07:22] Unknown:
And then as far as the scalability and parallelization aspects, I'm wondering what are some of the potential issues that people might run into as they're going from: I have this running on my local machine, everything works great, but now I'm starting to run into memory issues because the volume of data I'm processing is growing. And now I want to run this across a Dask cluster where, in aggregate, the overall memory is larger, but now I need to figure out how to chunk this up appropriately. And how much of that is something that they need to factor into their overall code and approach, and how much of it is the sort of magic of Dask that figures out how to distribute things for them? So the goal is that the magic of Dask takes care of it. Right?
[00:08:02] Unknown:
The goal is that you don't have to think about it much at all. And take the example of data frames. Right? If you're using pandas, the goal is that you can just replace import pandas with import dask.dataframe. That's true in some simple cases, but oftentimes, people find that they need to be a little more, so let me take a step back. So Dask DataFrame mimics the pandas API. It tries to recreate most of the surface area of the pandas API, but instead of doing eager computations like pandas, where you get the result immediately back as soon as you call the method, you get a task graph that can then be optimized and computed later on, with the whole computation distributed across a cluster.
So that's a different thing. Right? Like, the thing that Dask is doing is different than what pandas is doing. And so sometimes that becomes a problem for people, or it's something that people have to deal with. So like you mentioned, people might have to think about how to chunk up their data, or think about when they want to set the index of their data, or when they want to, like, maybe trigger a partial computation. As you get more into using Dask, there are best practices. The Dask docs have really good documentation about the best practices for the different APIs. I've been talking about the data frame API, but Dask also has a NumPy-like API, Dask Array, and there's several others as well. So there's sort of best practices for each of those. But by default, Dask will try to do, like, a best guess about what your chunks should be or, you know, what your partition structure is based on whatever file type you're reading from. It's a long winded answer, but ideally, Dask handles the magic. Sometimes you need to think a little bit about what's actually happening and dive in a little more deeply as you get more involved or if you're trying to do slightly more custom operations.
But as Dask is developed, it's getting closer and closer to doing the best thing by itself. Like, every time someone submits a little patch or something that makes things better for their workflow, we try to think if there's a way to, like, generalize that to make it better for everyone. And I think there have been some big improvements recently in memory management, and the goal is to make it more and more so that the magic just works.
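A minimal sketch of the swap described above, with hypothetical file paths: in simple cases only the import and the final .compute() call change, and every intermediate step stays lazy.

```python
import dask.dataframe as dd  # instead of: import pandas as pd

df = dd.read_csv("data/*.csv")  # lazy: returns a task graph, not data

# The same method calls as pandas, still lazy:
result = df[df.value > 0].groupby("category").value.mean()

# Only now is the graph optimized and executed, locally or on a cluster:
print(result.compute())
```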
[00:10:19] Unknown:
We've been discussing things like Dask and pandas and Jupyter, and all of those are very well established open source projects in the overall Python ecosystem. And I'm wondering if you can talk a bit about some of the ways that open source factors into the overall architecture and strategy of the Saturn Cloud platform, and some of the ways that you're engaging with the broader community to identify what are the useful components to bring in, what are some of the ways that you can collaborate with the overall community, and sort of how you can grow the ecosystem?
[00:10:51] Unknown:
The Saturn Cloud product is built around the idea that people already know and love these open source libraries, and we take that very seriously. So we want to make sure that we are giving back. So the way that we do that right now is that I spend half my time on Dask maintenance, and that's not tied to a specific Saturn Cloud agenda or anything. I just do bug triaging and issue handling. A lot of what I focus on is maintaining compatibility between the pandas API and the Dask DataFrame API, and just sort of, like, the low level grunt work of maintaining an older library. Or, you know, Dask isn't that old, but in this world, that's how we see our contribution to open source: we'll contribute back to the packages that are the most essential ones our users use. So I contribute to Dask, and a little bit of minor stuff, maybe, to pandas or NumPy.
And then, hopefully, we'll be able to expand that to maybe contributing to Jupyter or some of the other projects that are so essential to our users.
[00:12:00] Unknown:
When I was preparing for the interview, I noticed that you were previously working for Anaconda, which is also well known for being a very large contributor to the overall ecosystem of Python data tools. And I'm wondering if there are any particular lessons that you learned there that you've been able to take with you to your work at Saturn Cloud or some of the ways that your experience at Anaconda has influenced your overall thinking about open source as a community and some of the ways that you can be a good steward of that community as a corporate entity?
[00:12:30] Unknown:
So I worked at Anaconda in two very separate jobs. So I worked on Anaconda Enterprise, which is probably something that maybe a lot of your listeners haven't heard of, but it's a similar thing to Saturn Cloud in that it's a data science platform. I subsequently worked on HoloViz, which is a suite of, like, high level visualization libraries that build on top of other renderers and make it really easy to go from data to visualizations. So on a personal level, during that time, I learned that I like both of those things very much. And, like, they're two pretty separate worlds. Right? Like, building an open source tool and building a platform, an application in Python.
So that was a big learning moment for me, to learn that I like to build those things. But I definitely, in my role on the HoloViz team, definitely learned about what it means to make promises to users about the API that you are providing on these open source tools, and how to engage with the community, like you said, how to make decisions about what to include and what not to include, how to, you know, how to review PRs. Like, there's all this stuff that goes along with open source maintenance
[00:13:40] Unknown:
that takes a while to get familiar with. And then in terms of the actual architecture of Saturn Cloud, I'm wondering if you can talk through some of the ways that it is designed and deployed and managed and some of the ways that you are working to integrate the different pieces of the stack to be able to create a smooth experience for the end user?
[00:14:02] Unknown:
So I can't speak as much about how it's deployed and managed because I purposefully don't know about that part as much. But, essentially, it's built on Kubernetes, and it's a Tornado app. That's, like, the bare bones of it, and then it's deployed in AWS. We have two versions, actually. We have a version that people can purchase on the marketplace and deploy into their own AWS. So that's something that's good for companies. And then we have a hosted version that we deploy, where you can have a free account, where you get a certain limited number of compute hours and things, or you can be billed, you know, in a regular way. So that's the core of the functionality.
And from the user perspective, we're just trying to make it, like, as easy as possible to spin things up and to customize their environment to have what they need in it. So we're trying to take all the DevOps burden onto ourselves. And by ourselves, I mean, other people on my team, not my own self. So the goal is to really streamline that entirely out from the user's viewpoint, particularly on the hosted version that we manage. But even on the marketplace version, I think it's like you provide an IAM role and we go from there.
[00:15:20] Unknown:
And then in terms of the actual application itself, you mentioned that it is built on top of the Tornado framework. I'm wondering if you can talk through some of the aspects of building the application and the environment for being able to tie together things like Jupyter and Dask, and be able to manage some of the interaction there, and being able to pass the workflows from the user building their experiment or trying to build their model, and being able to handle how they sort of distribute the data, you know, upload the data or generate the data within the Saturn ecosystem?
[00:15:54] Unknown:
Yeah. So it's a Tornado app with a Vue front end. And the user data, in my mind, we don't really interact with the user's code or anything. But the way the Dask part works, which is the part that I'm most familiar with, is we use Dask Kubernetes, which is an open source Dask project. And we send requests, basically. We have a special client, a special little library called dask-saturn, that creates these Saturn cluster objects. It's a thin wrapper around the Dask cluster object that's available in Dask Distributed. And we just send requests to create a cluster using a little microservice.
And then we just rely on regular Dask protocols to move the data around. The scheduler is exposed at a particular endpoint, and then there's a proxy to, like, make that all work. And Jupyter works in a similar way. There's just, like, a proxy too, I think. This is maybe not the most specific answer, but it's not a very heavy system.
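A sketch of that client-side flow, assuming the dask-saturn package described above; the exact constructor arguments are an assumption and may differ between versions.

```python
from dask.distributed import Client
from dask_saturn import SaturnCluster  # thin wrapper over Dask's cluster object

# Creating the cluster object sends a request to Saturn's microservice,
# which spins up the scheduler and workers on Kubernetes.
cluster = SaturnCluster(n_workers=3)  # n_workers is an assumed argument
client = Client(cluster)

# From here on it's plain Dask: data moves over the usual Dask protocols.
print(client.dashboard_link)
```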
[00:17:02] Unknown:
In terms of Dask itself, as we were discussing earlier, there is some measure of sort of built in intelligence as to how to chunk up data frames or chunk up processing to be able to handle the distribution aspect. But there might be a situation where somebody has written their code in such a way that it is not parallelizable until they either adopt some of the APIs of pandas or Dask itself, or sort of restructure the way that their computation is being executed. And I'm wondering if there are any sort of learning curves that you've seen people go through of going from: here's a simple workflow, I have a Jupyter notebook and everything works on my machine, and now I want to be able to scale it up to handle larger data volumes or more complex workflows, and just being able to figure out how to best leverage the capabilities of Dask while still being able to, you know, run it locally for quick experimentation or debugging purposes?
[00:18:01] Unknown:
You can run Dask locally as well. So that's really the best way to get these workflows running, or it's a good way at least: just don't take it back out of Dask. Once you've started down the Dask path, you can just stay in that world. And you can use a local cluster, or you can just not specify a cluster at all, and that'll all just work. So Dask DataFrame is one of the higher level Dask APIs, but there's also lower level ones. So there's Dask Delayed, which allows you to wrap any object or function and make it lazy.
So, basically, if anyone has a workflow that has a for loop or something, that could benefit from Dask Delayed. So there's really simple interventions that you can take that can improve parallelization. If you don't already have anything that's parallelizable, like, yeah, that might be some work. But oftentimes, maybe there's ways you could read in partial pieces of your data, or you can, you know, find ways to partition up your workflow in a way that does allow it to be parallelizable. So, yeah, there's cases where you might just, like, need to do some work, but I don't know if there's, like, common patterns that every person who goes down that path will run into.
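A minimal sketch of that for-loop-to-Dask-Delayed intervention, run entirely locally; the process function and its inputs are stand-ins.

```python
from dask import delayed
from dask.distributed import Client, LocalCluster

def process(x):
    return x * 2  # stand-in for real per-item work

# A local cluster behaves the same as a remote one, so code can move
# between laptop and cluster unchanged.
cluster = LocalCluster()
client = Client(cluster)

# Wrapping each call builds a task graph; nothing runs yet.
tasks = [delayed(process)(i) for i in range(100)]
total = delayed(sum)(tasks)

# Execute the whole graph in parallel across the local workers.
print(total.compute())
```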
[00:19:24] Unknown:
The other motivation for somebody adopting Saturn is the reproducibility angle of: I have this execution environment running on my machine, everything works fine, but now when I try to send this code to my coworker, they have to go through all the same steps to get it set up. And I'm wondering if you can talk through some of the ways that Saturn is able to help with some of the collaboration aspects, and some of the other motivations for reproducibility in a given data workflow.
[00:19:49] Unknown:
A lot of them are pretty simple. Like, Saturn gives you the ability to share your Jupyter or your deployment or whatever with anyone else who's on Saturn. So what that means is that they can then create a clone that has the same image. So it has the same environment as yours, it has the same files in it, and so then they can see and reproduce and just carry on with your work. You can also attach Git repositories to Jupyter. So if you're working from the same Git repository, you can connect that up and send them a link to that Jupyter that you've shared with them, and your collaborator can create their version of it and, you know, set up their credentials and push and pull and do whatever.
So in that regard, it makes it easy. The most important thing about reproducibility is the environment. Yeah, you can email someone your notebook, but even if you have an environment.yaml, every time that resolves, it's gonna be slightly different unless you've got it pinned, like, all the way down. So having one image really helps cut down on the issues that you can run into with reproducibility.
[00:21:03] Unknown:
Digging a bit more into your experience of being a contributor to the Dask project and some of the ways that that plays into what you're building at Saturn, I'm wondering if you can just talk through some of the specific areas of the project that you've found yourself working within, or some of the sort of interesting edge cases that you've run into trying to run Dask as a service, and some of the weird ways that people are stressing that overall infrastructure.
[00:21:30] Unknown:
One of the common issues that I see is people thinking that Dask can be, like, even more magical than it purports to be. Maybe thinking that they can just install Dask and that will change things in their code, or that just by having Dask and a GPU, that will automatically distribute their computation across all the GPUs. And there's actually a couple more steps that you need to do to get that working properly. I don't take issues that our customers run into and go, like, file them and fix them on Dask necessarily. I spend my time more reading issues that people have already reported on Dask proper and addressing those as they're written.
So the issues that I encounter are much more, like I tried to do this merge on this data frame and turns out, like, this keyword argument isn't supported. Like, can we add that? Like, there's a lot of small compatibility issues that we run into. And trying to smooth out the wrinkles between the pandas experience and the Dask experience is something that I'm really interested in focusing on and trying to make that as smooth a process as possible. And when it fails, when we can't do what Pandas is going to do, trying to raise a warning or an error as soon as possible to give people somewhere to go from there, some understanding of what's going on, why it's going on, and what to do next. That's really what I spend my time doing.
There are, like, gnarly things that customers run into. There are some issues right now that a bunch of people are working on in Dask around really large Parquet files and how to read those efficiently. And that's super interesting work, and I think it's gonna be really beneficial to all sorts of people, especially people who have tabular data. But I haven't been as involved in that work. There's also an effort that's going on right now, which is a ways out yet, which is about higher level expressions and trying to figure out how to do better optimizations, essentially, on Dask DataFrames: trying to know more about what the outputs are gonna be, and what each specific task that you've chained together in your task graph is trying to do, so that we can do better optimizations.
[00:23:56] Unknown:
Bringing up the subject of a task graph reminds me too of the fact that, you know, a lot of the ways that people are interacting with Dask, particularly from a data science perspective, are, you know, through its compatibility layers with pandas and NumPy. But there's also just the underlying dask.distributed, which is being used by a lot of systems to do things like workflow orchestration, with projects such as Prefect or Dagster, and also, in some cases, just potentially replacing tools such as Celery for doing asynchronous task execution. I'm wondering if there are any interesting applications of Dask that you've seen people leveraging beyond just the pandas and NumPy layers.
[00:24:37] Unknown:
Yeah. I mean, you can basically do anything. Right? A task graph is just a dictionary. So there's a lot of cool things that you can do just by accessing that layer, and Prefect is a good example. We really encourage people to use Prefect. It has a similar goal to Airflow, right, of, like, just doing this workflow orchestration like you said, but it can use Dask as an executor to achieve parallelism. And you can basically access Dask at any level, and there's people doing it. Right? Like, it's not an uncommon thing for people to be working at that really low level. I think they tend to be people that we hear from less, like banks or people who are really, really focused on this high speed computation.
So I think it's a slightly less well understood group of users.
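A minimal sketch of that "a task graph is just a dictionary" point, using Dask's documented graph format with toy functions:

```python
from dask.threaded import get

def inc(x):
    return x + 1

def add(a, b):
    return a + b

# Keys name tasks; tuples are (function, *arguments); bare strings
# refer to other keys in the same graph.
dsk = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "x", "y"),
}

print(get(dsk, "z"))  # walks the graph and prints 3
```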
[00:25:30] Unknown:
Another angle on Dask that I've come across a couple of times is as a replacement for something like Spark, where Spark has gained a lot of ground and popularity in the overall data ecosystem as a successor to the types of things people were doing with Hadoop and MapReduce, because of its ability to do sort of, you know, microbatches and streaming style workflows, as well as its capabilities as a workflow orchestration layer for doing things like ETL processing, and then also some of the built in machine learning capabilities. But with Dask being so closely tied into the Python ecosystem, a lot of those machine learning aspects are easily filled with Python native tooling. And because a lot of the data processing ecosystem is moving into Python, Dask becomes a natural place for that to go as well. I'm wondering if you've seen any of that style of workflow being run in Saturn.
[00:26:25] Unknown:
Machine learning workflows in particular are really common. And, yeah, Dask has integrations with scikit-learn, and there's also some integrations with PyTorch and other machine learning libraries that make it easier to use these things together. I mean, the main thing that Dask has over Spark is really just Python, right, and the whole Python ecosystem. So, like, if people want that, then Dask is a great solution for them.
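A sketch of the scikit-learn integration mentioned above, using the joblib backend registered by Dask's distributed scheduler; the estimator and data are toy stand-ins.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

client = Client()  # local cluster here; could point at a remote scheduler

X, y = make_classification(n_samples=1000)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

# Fan the cross-validation fits out over the Dask workers instead of
# local processes.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
```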
[00:26:56] Unknown:
In terms of the Dask community and the sort of visibility in the overall Python ecosystem, I'm wondering what your sense has been as far as the level of awareness that it has gained for people who are building machine learning workflows, or doing work with data, or who would benefit from this sort of distribution and parallelism, and particularly if you have any sense of its sort of relationship or juxtaposition with the Ray project, which is another project that is aiming for some of those same goals of parallelism and easily going from local to distributed computation?
[00:27:35] Unknown:
So first off, it's really hard for me to tell because, like, I'm nestled deep within PyData land. And from my perspective, it seems like most people know about Dask and most people like Dask, but I understand that I'm in, like, this particular little bubble. I think that the people who work on Dask are a lot of the same people who work on pandas, on NumPy, on xarray, on whatever else in the PyData world. So that seems like an advantage to me. So there's other little universes, right, in Python data science. It's not all PyData. But within this PyData world, Dask seems to be the one that is most tied into that community.
[00:28:19] Unknown:
As far as the sort of future direction of the Dask project, you've mentioned some of the ongoing work and some of the major projects going on there. But what are some of the ways that you see the project and the community evolving, and some of the, I guess, untapped potential in the project and the community and some of the ways that it can be used?
[00:28:42] Unknown:
With another Dask maintainer, I'm working on a documentation refresh right now. So, hopefully, that will clarify where we're at and make it clear what is currently possible in Dask. But I think that there's definitely a lot of potential to expand in the machine learning space, to expand how Dask interoperates with other machine learning tools like TensorFlow. Or, within the Dask org, there is a Dask-ML module that could be expanded, which provides a lot of the more basic machine learning algorithms.
There's opportunities for expansion there, but from my perspective, I think there's a lot of opportunities to help people understand what exactly is going on when they call Dask operations, and trying to make it so that if people get confused when they first use Dask, they don't then, like, stop using Dask. Right? Like, I think a lot of people's experience is they try to use Dask as a replacement for pandas, and then it doesn't quite work the way that they expect, and that might trigger them to then stop using Dask. But I try and figure out ways to catch those people and help them, like, get back into the fold.
That's what I hope for the project. And I think that looks like documentation; that looks like earlier, better error messages; and little stuff like that that really helps people
[00:30:13] Unknown:
diagnose their own issues and gain understanding about what's going on. In your work, both in the Dask community and at Saturn, I'm wondering if you can share some of the most interesting or innovative or unexpected ways that you've seen them used?
[00:30:30] Unknown:
I'm not sure if you're familiar with Pangeo; it's hard to define. But Pangeo is a group of scientists, mostly geo related scientists, so meteorologists, oceanographers, and things like that. And they have been using Dask and xarray and JupyterHub. They have been, like, power users of those tools for a while now, and their goal is really to enable computation on these massive earth science datasets that are stored on S3 or on Google Cloud. And so they're really, like, where I first learned about colocation of data being an important thing, because Pangeo's ability to use these tools to process their data right next to where it's stored has really enabled this whole world of open source computation in the geosciences that was previously rare.
It also allows people who maybe don't live in countries that have, like, massive computing institutions to actually work on their own datasets and to access cloud compute to do that. That's one of the coolest applications of Dask that I know of, and it's definitely worth looking at.
[00:31:51] Unknown:
In your experience of working at Saturn and helping to grow the platform and contribute back to the open source tools that you're using and building on top of, what are some of the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:32:05] Unknown:
So I feel like I keep learning over and over again about API creep, and how to try and limit the scope of both tools like Dask and applications like Saturn. I feel like there's this ongoing challenge of how to expand functionality in a responsible way that doesn't then create this maintenance burden down the road, and doesn't make promises to users that you won't be able to keep. So that's a big part of what I think about, and that's something that I think people don't necessarily think about when they try to go do their first contribution to an open source project. But it's definitely
[00:32:46] Unknown:
something I think about. Yeah. There's the sort of old trope that open source, you know, can be free as in puppy, where anytime somebody adds a contribution to the project, it's great and wonderful until it makes a big mess.
[00:33:00] Unknown:
Right. Right. And it's hard to be protective of a project like that while still being excited and interested and curious about the enhancements and improvements that people are trying to make. I think that's a real challenge. Yeah. It's definitely one of the ongoing issues that a lot of projects have to relearn, either
[00:33:21] Unknown:
from other projects or multiple times within the same community, particularly if there are sort of generations or cycles of contributors and maintainers who are coming into the project. So as the community grows, the same lesson has to be kind of relearned about what are the overall goals and, you know, intended scope of the project, and what are the points at which we say: that is a good idea, but it needs to be implemented outside of this project and not as a sort of core component of it, which adds some layer of friction as well. Yeah. I think in Dask, this issue is particularly
[00:33:55] Unknown:
distilled, because it's particularly in the APIs that are mimicking other APIs: it doesn't have to be a discussion about what the API will look like or how it'll be spelled, because that's already, you know, pandas has already decided or NumPy has already decided. So the issues are much more, like, whether it should be done and then, like, how. It's an interesting project to work on. So, definitely, if people don't have large data problems at all and they don't have collaborators, then, you know, they really don't need Saturn Cloud.
I mean, you can always stand up a JupyterHub. It's really a personal choice. Right? If you want to administer your JupyterHub and stand up, like, a Dask Gateway, that's great. You know, that's fine. Do that. But I think that our goal is really, for people who don't want that DevOps burden, that they don't have to take that on. And that stuff's all getting easier and easier, and people should 100% go try to do that if that sounds like a fun challenge. But if not, then Saturn Cloud is good for people who don't want to do that.
[00:35:27] Unknown:
And we talked a little bit about some of the future direction for the Dask project, but what is in store for the near to medium term for Saturn Cloud?
[00:35:35] Unknown:
Yeah. So I talked about how we have a version that you can just buy on the AWS Marketplace, and it installs into your own AWS. And then we also have a hosted version that's really good for individuals. We'd like to have sort of an in between, like a hosted teams version that makes it easy for smaller companies, maybe, or groups of academics or something like that, to work together without having to go through the process of setting it up in their own AWS. That's something that we're
[00:36:05] Unknown:
gonna be working on in the short term. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose the movie Peter Rabbit 2. Watched that one recently with the family, and it was just a fun, hilarious, well done movie. Definitely worth a watch if you're looking for something to keep you entertained for a little while. And so with that, I'll pass it to you, Julia. Do you have any picks this week? Yeah. My pick is the fruit, the pawpaw.
[00:36:35] Unknown:
I had my first pawpaw recently. And also I just learned that they're indigenous to the East Coast of the United States, which I did not realize at all. And they're very good. They're in season right now. And if you're not familiar, they kind of look like a giant kiwi, but with no fur. And then they taste sort of like a persimmon, but custardy or
[00:36:57] Unknown:
so I recommend them if you can find them. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Saturn Cloud and helping to contribute back to the Dask project and some of the other components of the Python data ecosystem. I appreciate all of the time and effort you've put into both of those, and I hope you enjoy the rest of your day. Thank you. Thanks for having me. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Julia's Journey with Python
Overview of Saturn Cloud
Onboarding New Tools and Workflows
Common Roadblocks for Data Scientists
Scalability and Parallelization with Dask
Open Source and Community Engagement
Architecture of Saturn Cloud
Learning Curves and Best Practices
Challenges and Edge Cases in Dask
Dask Beyond Pandas and NumPy
Machine Learning Workflows
Dask vs. Ray
Future Directions for Dask
Interesting Applications of Dask
Lessons Learned at Saturn Cloud
Who Needs Saturn Cloud?
Future Plans for Saturn Cloud
Contact Information and Picks