Summary
Serverless computing is a recent category of cloud service that provides new options for how we build and deploy applications. In this episode Raghu Murthy, founder of DataCoral, explains how he has built his entire business on these platforms. He explains how he approaches system architecture in a serverless world, the challenges that it introduces for local development and continuous integration, and how the landscape has grown and matured in recent years. If you are wondering how to incorporate serverless platforms in your projects then this is definitely worth your time to listen to.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Podcast.init listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space they have the Beginners Guide To Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. To help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- The Python Software Foundation is the lifeblood of the community, supporting all of us who want to run workshops and conferences, run development sprints or meetups, and ensuring that PyCon is a success every year. They have extended the deadline for their 2019 fundraiser until June 30th and they need help to make sure they reach their goal. Go to pythonpodcast.com/psf2019 today to make a donation. If you’re listening to this after June 30th of 2019 then consider making a donation anyway!
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Raghu Murthy from DataCoral about his experience building and deploying a personalized SaaS platform on top of serverless technologies
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving a brief overview of DataCoral?
- Before we get too deep can you share your definition of what types of technologies fall under the umbrella of "serverless"?
- How are you using serverless technologies at DataCoral?
- How has your usage evolved as your business and the underlying technologies have evolved?
- How do serverless technologies impact your approach to application architecture?
- What are some of the main benefits for someone to target services such as Lambda?
- What is your litmus test for determining whether a given project would be a good fit for a Function as a Service platform?
- What are the most challenging aspects of running code on Lambda?
- What are some of the major design differences between running on Lambda vs the more familiar server-oriented paradigms?
- What are some of the other services that are most commonly used alongside Function as a Service (e.g. Lambda) to build full featured applications?
- With serverless function platforms there is the cold start problem, can you explain what that means and some application design patterns that can help mitigate it?
- When building on cloud-based technologies, especially proprietary ones, local development can be a challenge. How are you handling that issue at DataCoral?
- In addition to development this new deployment paradigm upends some of the traditional approaches to CI/CD. How are you approaching testing and deployment of your services?
- How do you identify and maintain dependency graphs between your various microservices?
- In addition to deployment, it is also necessary to track performance characteristics and error events across service boundaries. How are you managing observability and alerting in your product?
- What are you most excited for in the serverless space that listeners should know about?
Keep In Touch
Picks
- Tobias
- Raghu
Links
- DataCoral
- Perl
- Airflow
- Serverless Computing
- DynamoDB
- Aurora
- SNS
- SQS
- Lambda
- S3
- API Gateway
- EMR
- Apache Hive
- AWS Glue
- RedShift
- SnowflakeDB
- Hadoop
- Function As A Service
- Distributed Systems
- Conway’s Law
- SRE == Site Reliability Engineer
- Rollbar
- AWS Batch
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or you want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models or running your CI pipelines, they just launched dedicated CPU instances. In addition to that, they just launched a new data center in Toronto, and they've got one opening in Mumbai at the end of 2019.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of the show. And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system that can keep up with you, designed by software engineers for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of prebuilt integrations, and a simple API for crafting your own. With such an intuitive tool, it's easy to make sure that everyone in the business is on the same page.
Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a free trial. And bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news.
For newcomers to the space, they have the Beginner's Guide to Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. And to help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need, they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show. And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season.
We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have already been announced, and super early bird registration is available until July 26th, where you can get up to $300 off, or the early bird pricing for $200 off is available through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more about this and the other conferences and take advantage of our partner discounts when you register.
And, also, the Python Software Foundation is the lifeblood of the community, supporting all of us who want to run workshops and conferences, run development sprints or meetups, and they also ensure that PyCon is a success every year. They have extended the deadline for their 2019 fundraiser until June 30th, and they need help to make sure they reach their goal. Go to pythonpodcast.com/psf2019 today to make a donation. And if you're listening to this show after June 30, 2019, then consider making a donation anyway. And you can visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. And if you have any questions, comments, or suggestions, I'd love to hear them.
And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
[00:04:13] Unknown:
Your host as usual is Tobias Macey. And today, I'm interviewing Raghu Murthy from DataCoral about his experience building and deploying a personalized SaaS platform on top of serverless technologies.
[00:04:24] Unknown:
So, Raghu, could you start by introducing yourself? Hi, Tobias. Thanks for having me today. My name is Raghu Murthy. I'm the founder and CEO of DataCoral, and I'm glad to be here.
[00:04:35] Unknown:
And so I know that I've interviewed you about your work on DataCoral for the Data Engineering Podcast, but I'm curious if you can talk a bit about how you first got introduced to Python, and maybe just briefly talk about some of the ways that it gets used at DataCoral.
[00:04:53] Unknown:
Yeah. So, just a little bit about my background as well. I've been an engineer for quite a long time at companies like Yahoo and Facebook. For the first few years of my career, I was mostly working in C++ and Perl. I loved Perl because of how succinctly you could represent a lot of business logic, but it was also kind of a write-only language. If you wanted to write similar logic again, you'd just write it all over; you wouldn't try to reuse any code or anything like that. So that was kind of super interesting, but this was all at Yahoo, mostly. And then at Facebook, I got introduced to Python in the data infrastructure team, because a few of the tools that were being built on top of Hive and Hadoop, especially around the ETL side, were built in Python. I found the language to be, initially at least, cumbersome to work with. It was pretty opinionated about how code should look to the human versus just the computer, which was a pretty dramatic change.
Throughout my time at Facebook, of course, we were writing a lot of Java code for these big data systems, even C++ for some period. But Python was mostly around the tooling side. There was an ETL system called Databee, which was a Python configuration-driven ETL tool. The next one was called Dataswarm, which was also Python, and that was probably an inspiration for Airflow, which is a pretty popular open source ETL and DAG-management tool right now. At DataCoral, we use Python mostly for helping our data scientists write their own custom code to build out their transformations.
Frankly, at DataCoral we have just about gotten started using a lot more Python. Earlier, it was mostly simple tools. But once we decided that we wanted to support data scientists and allow them to do more than just SQL, Python was the obvious choice. And we're working on some pretty interesting and fun features that we hope to put out there soon. And
[00:07:04] Unknown:
so as I mentioned, we talked a bit about your work at DataCoral on the Data Engineering Podcast, and I'll add a link to the show notes for anybody who wants to go back and listen to that. But for anyone who hasn't listened to that conversation, can you give a brief overview of what it is that you do at DataCoral? Yeah. So at DataCoral, our goal is to make
[00:07:25] Unknown:
data scientists and data analysts, and even citizen data engineers, self-sufficient. We help automate data pipelines so people who know just SQL can actually build complex data flows using a SQL-like declarative language. These data flows end up getting compiled into serverless data pipelines, and those pipelines capture data quality metrics, data provenance, and things like that, which make it very easy for data scientists and data analysts to get a good sense of how their data is doing by just focusing on the business logic. And we've been around for a little over 3 years now. So that's, I guess, a good bit of overview.
[00:08:11] Unknown:
And so the majority of the technology stack that you have built at DataCoral is based on serverless technologies. And before we get too deep into the technical aspects of how you're leveraging those capabilities, I'm wondering if you can just share your definition of the term serverless, and the types of services and technologies that fall under that umbrella. Yeah. Definitely. So clearly, I mean, serverless does not mean that there are no servers. It just means that
[00:08:44] Unknown:
if you are using any serverless technology, you're not really thinking about anything that is related to servers: things like provisioning them to handle capacity, worrying about whether I'm paying for idle resources, worrying about whether whatever I provision is actually gonna work as my application scales, how do I deal with fault tolerance. If I'm not having to worry about any of those things by using a certain set of services, those services are what are serverless to me. In terms of the actual technologies themselves, if you think about it, a SaaS application is kind of the ultimate serverless technology for its user; you just use it. You don't worry about how it got deployed or how it got provisioned and so on. As a developer, if you're not having to worry about all the provisioning and stuff like that, then you're probably using a public cloud, or a cloud that offers a bunch of platform-as-a-service kind of services.
So if you're using a cloud like AWS, then you have databases like DynamoDB or Aurora. You may have streaming systems like Kinesis. You may have other services like SNS and SQS. Those are all, at least in my mind, serverless technologies for a developer. And, of course, the big one that we use a lot is AWS Lambda, and that is for compute. Right? The others offer different kinds of functionality beyond straight-up compute. And, of course, the other big one is S3, which is mainly for storage. And so you mentioned that Lambda
[00:10:19] Unknown:
gets used pretty heavily at DataCoral, but I'm wondering if you can talk through some more of the types of services and technologies that you're using to build your particular application stack, and how your usage of those technologies has evolved as they have matured and as more of them have become available over the past few years? Yeah. So when I first got started, I mainly looked at AWS Lambda and, of course, S3 for storage as the 2 main services. We also used things like API Gateway to be able to provide an events endpoint and things like that. And
[00:10:56] Unknown:
the goal, as I mentioned earlier, is for DataCoral to provide a way for people to specify end-to-end data flows: to collect data from different places, organize that data in different kinds of query engines, and even publish the transformed data into applications, into production databases, and so on. The way we have picked these serverless technologies has been by carefully looking at every single one of them. AWS, for example, keeps providing newer and newer of these PaaS services, so we have always looked at every single one of them to see what is the best possible way to use each of these serverless technologies to build a really robust end-to-end data infrastructure stack. I'll give you an example of how it has evolved. Earlier, we had stored data in S3, and we said, okay, we would like to make that data queryable directly from S3. Back then, there was EMR, and there was Hive on top of EMR. So we said, okay, we'll actually spin up a Hive metastore, stick all of the metadata into that Hive metastore, and then allow our customers to use EMR to query that data sitting in S3. But then about a year in, Amazon offered a service called the Glue Data Catalog, which is essentially the Hive metastore offered as a service by Amazon. The moment we looked at it, we said, why are we spinning up a database and a server to run this service? We might as well just use the Glue Data Catalog. So we have an opinionated view of how these data flows should be built out and what the interface should be for our users. And given the technologies that are available, we pick and choose the ones that we feel are the best way to provide a really robust service. Then, as and when newer ones come along, we are able to replace small parts of it while still providing the same service, and of course improving the robustness, or even the cost, of operating DataCoral itself. As you can imagine, there are several other examples, but I think this is a good one. And so
[00:13:10] Unknown:
for developers, they're typically used to thinking about their application design in the context of deploying to a server, or even something like a container, where there is some sort of resource through which they're able to gain access to different operating system primitives, or where you can have multiple services running adjacent to each other. And I'm curious how you have found your experience of building on serverless technologies to influence your overall approach to application architecture and systems design. Yeah. Definitely. So in terms of the
[00:13:49] Unknown:
application architecture itself, in general, what we are building is a distributed system. Right? There are huge amounts of data that we need to process, so you cannot do it all on one server. You have to split that computation, or that data storage and so on, across multiple machines. But we didn't wanna be in the business of managing those machines. When you think about data infrastructure, there are 3 distinct kinds of operations that we have identified in a data infrastructure stack, or in a data flow, if you will. You're collecting data from different places, and this operation of collecting data can, in some sense, be infinitely parallelized. What that means is I can break this piece of pulling data from different places into bite-sized chunks that can then be run inside of something like an AWS Lambda. But then you might end up in a situation where there's a transformation that is trying to process lots and lots of data, and the logic is pretty complicated. In those cases, we would just offload that into any of the data warehouses that are available. It could be a big-data query engine like Athena, or it could be Redshift, or it could be Snowflake. Those databases themselves do require some amount of cluster management, if you will, but that is necessary for complicated workloads. When you think about what an end-to-end data infrastructure stack does, a lot of it can be represented in terms of these bite-sized operations, and those are the ones that we have moved to serverless technologies.
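To make the bite-sized-chunk idea concrete, here is a minimal sketch, in Python, of how a collection job might be fanned out into small windows, each handed to its own Lambda invocation. The function and pipeline names here are hypothetical illustrations, not DataCoral's actual code:

```python
import json
from datetime import datetime, timedelta

import boto3

lambda_client = boto3.client("lambda")

def fan_out_collection(start: datetime, end: datetime, window_minutes: int = 5) -> None:
    """Split a collection job into bite-sized windows, each small enough to
    finish comfortably within a single Lambda invocation."""
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(minutes=window_minutes), end)
        # Asynchronous invoke: Lambda scales the workers out for us, so
        # there is no cluster to size or schedule.
        lambda_client.invoke(
            FunctionName="collect-slice",  # hypothetical worker function
            InvocationType="Event",
            Payload=json.dumps(
                {
                    "window_start": cursor.isoformat(),
                    "window_end": window_end.isoformat(),
                }
            ),
        )
        cursor = window_end
```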
And what that has also done is essentially freed us up from doing a significant amount of cluster management, which has pretty significant implications for the overall application architecture itself. If you're trying to provide a SaaS tool or a SaaS application, you're typically trying to build a multi-tenant architecture, where the same installation is able to handle multiple customers. When you think about building a SaaS application, in general, the way it is built out is you have one installation with all of your application code, and then you have multiple customers using that same installation. So all of your customer data is actually flowing into the same infrastructure.
This gives you things like the ability to scale across customers. It gives you economies of scale; you can reuse resources across different customers. Those are all the good things about a multi-tenant architecture. But that also means that you're having to deal with the vagaries of cluster management. Let me give you an example. Say you have a single installation of the application and multiple customers who are using it, and you can commingle all the data and all the processing. That's fine. And then your application starts blowing up, so you're getting more and more customers.
Now you cannot indefinitely increase the size of your installation, adding more compute and more storage all the time. At some point, you'll have to break it up into, let's say, 2 installations. And at that point, you have this decision of, okay, which customers are gonna be in one installation and which in the other? Play this forward and you just end up with a bin-packing problem: you have a small number of installations, and you have customers that you need to allocate to these different installations. And these customers may be growing or shrinking.
Now you're spending most of your time trying to organize customers within your installations, and there's a huge amount of automation that needs to get built out. I know this because this is literally the problem that I had to solve at Facebook back in the day, where we had one giant Hadoop cluster and multiple teams using it. We had to carve out pieces and move them into different data centers and so on. So for these kinds of SaaS applications, the ideal situation would be to have essentially one installation per customer, so that you can scale them up and down based on the customer's usage.
But that adds a huge amount of overhead around management and maintaining fault tolerance and so on. Now, when you think about a serverless architecture for your application, spinning up new installations is actually pretty straightforward. It doesn't take too much time. Each installation can do almost lights-out operations in terms of scaling. And you're not having to centralize all of your customer data into one place. So this allows somebody who has built an application in a purely serverless manner to, in fact, deploy isolated installations for each of their customers.
So you end up having multiple isolated installations instead of a single installation that is multi-tenant. Our belief is that serverless technologies have made it really possible to build these kinds of, if you will, private SaaS applications, where customers can decide to use different applications, but all of those applications run within their own environments, which means that no data has to leave their systems. So there's a pretty strong data security and data privacy argument to be made. And, yeah, as you mentioned, having multi-tenant services
[00:19:20] Unknown:
become increasingly difficult as you deal with customers that might be operating at orders of magnitude different scales from each other, because then you start dealing with issues of managing priorities for access to different resources within your environment, or ensuring that you have allocated enough capacity for certain customers and that they're not going to stomp on the customers that have lower requirements. But, as you mentioned, having everything delegated to serverless technologies, or being able to deploy directly to customer environments, removes having to even think about those different edge cases that are easy to overlook from the outside but incredibly painful once you start having to deal with them on your own.
[00:20:09] Unknown:
Yeah. Absolutely. One of the bigger challenges around building multi-tenant systems is this whole notion of a noisy neighbor. There could be spikes of activity by one or two customers, which then causes the entire cluster to get into a bad state. Then you have to spend a bunch of time nursing the cluster back to health. And, by the way, there'll be a lot of workloads that have piled up while the cluster was in a bad state, so you're then having to recover the whole thing.
But while you're recovering, there's potentially more data coming in. So you increase the size while you're recovering and then spin it back down, because you don't wanna pay for too many resources. This is literally the life of a really high-scaling
[00:20:57] Unknown:
kind of multi-tenant architecture application. And so you mentioned that a significant portion of your application relies on the AWS Lambda service, which has been referred to as a function-as-a-service platform, and there have been other incarnations of that paradigm in other cloud providers and in open source technologies. But I'm wondering what you have found to be some of the main benefits of targeting services such as Lambda and other function-as-a-service platforms,
[00:21:27] Unknown:
and what your litmus test is for determining whether a given project is a decent fit for something like that? Yeah. So when you think about a function as a service, let's define what that actually means. It just means that the application developer writes a piece of code, uploads it to the service, and instructs the service to run that function whenever certain events happen. The service is then able to figure out how many resources are needed in order to run that function. If the number of invocations increases dramatically, the service automatically scales up and down the resources needed to run those functions. Now, this is a very different philosophy for building distributed systems. The kind that I'm familiar with is these Hadoop-like systems, where you have a single distributed system, and that distributed system needs to handle workloads of very different types. You might have jobs that are processing multiple terabytes of data, so they run for a really long time.
Then you might have people coming in and wanting quick results from their jobs, so you have jobs with interactive latency requirements. There might also be jobs that have deadlines, as in, I want this job to finish by this time every day. Most of the time spent in building these kinds of distributed systems is in finding the most intelligent way of doing resource management and resource scheduling so that you can handle all of these different kinds of workloads.
That is what makes building such distributed systems incredibly hard, and also a lot of fun. But Lambda, and function as a service in general, took the opposite approach. What it told the developer was: I don't care what your workload looks like. Break it down into a shape and size that fits me, and if you do that, then I'll run it in a really robust manner, I'll make it super scalable, and it'll be cheap for you to run it that way. For me, coming from this background of doing cluster management and dealing with different kinds of workloads, this was a very refreshing change in, in some sense, the philosophy of a service. So what I started looking at, and this is also, I guess, the litmus test as you call it, is whether there are operations that can actually be broken down into these bite-sized chunks of work. If so, they are amenable to Lambda.
Even though Lambda offers this philosophy of representing any kind of workload in the shape that it wants, when you're building a distributed system there are certain things you have to handle that don't really go away. When you think about building a distributed system, very simplistically, you're thinking about 5 things. One is how you deal with configuration management: how you do provisioning and how you configure those jobs. Then there's resource allocation and resource scheduling.
And that is something that we have outsourced to Lambda. But once you run these jobs, you need to do a whole bunch of state management to make sure that the jobs are running in the right way, and that there is enough state maintained that orchestration becomes easy to build. And, of course, orchestration itself has to be built out: how do you make sure that you run one task after the next, and so on? And then finally, and this is something that most distributed systems don't really do a great job of, there is visibility. How exactly is the system behaving? How are workloads behaving inside of a given system? So if you're using Lambda, you know that the resource management and resource scheduling aspect is something you don't have to worry about. But the other 4 things, configuration management, orchestration, state management, and providing visibility, are things that you still have to take care of.
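As a rough illustration of those remaining concerns (a sketch, not DataCoral's actual framework), a handler might wrap its business logic with configuration, state, orchestration, and visibility hooks like this; the table name and helper function are hypothetical:

```python
import logging
import os

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# State management lives outside the function, since Lambda containers
# are ephemeral; the table name here is hypothetical.
state_table = boto3.resource("dynamodb").Table(os.environ["STATE_TABLE"])

def handler(event, context):
    # Configuration management: settings arrive via the event and the
    # environment, not via anything provisioned on a server.
    job_id = event["job_id"]
    state_table.put_item(Item={"job_id": job_id, "status": "running"})
    try:
        result = run_business_logic(event)  # the only part that is "ours"
    except Exception:
        # Visibility: structured logs flow to CloudWatch automatically.
        logger.exception("job %s failed", job_id)
        state_table.put_item(Item={"job_id": job_id, "status": "failed"})
        raise
    state_table.put_item(Item={"job_id": job_id, "status": "done"})
    # Orchestration: the recorded state tells the rest of the system what
    # to run next; resource allocation and scheduling stay Lambda's problem.
    return {"job_id": job_id, "result": result}

def run_business_logic(event):
    return "ok"  # placeholder for the real work
```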
[00:25:46] Unknown:
And Lambdas also, as you said, come with a certain number of constraints that require you to build your services to fit them. Some of those are things like the cold start problem, where the first time an event fires to load a particular function there might be some latency as that function gets warmed up and loaded into the cache. But there are also things like the execution time limits, and some of the other constraints placed on Lambda functions, particularly for cases where, as you were mentioning, there might be some task that needs to churn through a lot of data. I'm wondering what you have found to be some of the edge cases that you've run into with Lambda, some of the different ways that you've worked around them, and other services that you have found useful to leverage in conjunction with Lambda?
[00:26:40] Unknown:
Yeah. So like I mentioned, Lambda is very particular about how big a particular task can be. When it was first built, my understanding is that it was mainly meant for building applications where you send a request and then there's a response, and in order to produce that response there's a small piece of code that needs to run; that was Lambda. And typically you want responses there in, let's say, a few hundred milliseconds. Then the problems around cold start become really significant.
So cold start becomes a very big problem if you want your latencies to be on the order of milliseconds. But then it turns out that if it is not invoked through an API Gateway or something like that, Lambda itself can actually run for several minutes. That is what we have leveraged quite a lot. When we first got started, Lambda could run for up to 5 minutes, and my thinking was you can do some serious damage in 5 minutes in terms of processing. But, of course, you have to do a bunch of state management to figure out what is the next Lambda to run and so on. In fact, nowadays Lambda can run for up to, I think, 15 minutes, which is really good.
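As an aside, one widely used mitigation for the cold start problem in the low-latency request/response case (distinct from the micro-batch approach described next) is to keep containers warm with a scheduled ping. A minimal sketch, assuming a CloudWatch Events rule sends a marker payload every few minutes:

```python
def handler(event, context):
    # The scheduled rule invokes this function with {"keep_warm": true};
    # returning early keeps a container warm without doing real work.
    if isinstance(event, dict) and event.get("keep_warm"):
        return {"warmed": True}
    return do_real_work(event)

def do_real_work(event):
    return {"status": "ok"}  # placeholder for the real handler logic
```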
But at the same time, in terms of the overall architecture of our data flows, we chose a micro-batch processing model. A Lambda function that could run for a few minutes was a perfect shoo-in for what we needed to build: near-real-time, really robust data pipelines where you don't have to wait too long for your data. And if something fails, you don't have to go back several hours and restart everything; you're just redoing the last 5 minutes' worth of processing. So we actually leveraged the main limitation of Lambda, the amount of time it can run, as a core advantage for the overall architecture of the data pipelines we were building.
And then what we ended up doing was saying, okay, given that we have to do bookkeeping every 5 minutes, what is the right data model to have for that kind of processing? We've built out a whole way of thinking about this micro-batch processing model and how it impacts the way you wanna represent these data flows. I don't think we have much time to talk about that here; that's probably more for the data engineering side. But we have essentially leveraged both the strengths and the weaknesses, or I guess constraints, of Lambda to build something that works really well for our overall architecture.
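A minimal sketch of that kind of micro-batch bookkeeping, assuming a hypothetical DynamoDB watermark table; this illustrates the pattern rather than DataCoral's actual data model:

```python
import os
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical table holding one watermark per pipeline.
table = boto3.resource("dynamodb").Table(os.environ["WATERMARK_TABLE"])

def handler(event, context):
    """Process exactly one micro-batch per invocation, so a failure only
    ever replays the last window, never hours of work."""
    item = table.get_item(Key={"pipeline": "events"}).get("Item")
    if item:
        watermark = datetime.fromisoformat(item["watermark"])
    else:
        watermark = datetime.now(timezone.utc) - timedelta(minutes=5)
    window_end = watermark + timedelta(minutes=5)

    process_window(watermark, window_end)

    # Advance the watermark only after the window succeeds, so a failed
    # invocation gets retried from the same point.
    table.put_item(Item={"pipeline": "events", "watermark": window_end.isoformat()})

def process_window(start: datetime, end: datetime) -> None:
    pass  # placeholder: pull and process data for [start, end)
```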
[00:29:38] Unknown:
And so when you're building on cloud-specific technologies, so things like Lambda, or if you're relying on other services that Amazon might provide that are not something you can easily replicate in the local environment, it can be difficult to figure out how to set up your local development environment to make sure that what you're running locally for iterating on and testing your code is actually going to function as expected once you deploy it to the destination service. And so I'm wondering how you approach things like local development when you're building on these serverless technologies, and how you manage to ensure that you have a close enough approximation of what the functionality is going to be, rather than having to push to Lambda or wherever every time. If you're
[00:30:38] Unknown:
using a whole bunch of PaaS services, then trying to mock all of them to run locally in your environment, I mean, there are some libraries out there, but they're not really that robust. At least in our experience, we have found that the best way to know how your code is running is to actually deploy it in Amazon. But that said, we have layered our code in such a way that most of the business logic, which is what we actually need to build and where we spend most of our time, doesn't really require any Amazon services. Right? So let's just talk about Lambda.
Lambda gets invoked when an event triggers it. When you're actually writing that Lambda code, there is no reason for that code to know how that event actually triggered the function; in fact, that whole thing is abstracted away by Lambda itself. Of course, we have to deploy our software in AWS. But we have built a thin layer on top where we have defined an interface, and that thin interface is what we call our framework. The framework knows how to deal with all of the Amazon services, but it only provides the necessary information for the rest of the code to do the orchestration, the state management, and the actual business logic itself. And for the data aspects, like if we are reading and writing using S3, we end up just using local files to do a lot of that testing.
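A sketch of that kind of layering: the business logic depends only on a small storage interface, so it can run against local files in tests and against S3 once deployed. The class and function names here are hypothetical:

```python
from pathlib import Path
from typing import Protocol

import boto3

class ObjectStore(Protocol):
    def read(self, key: str) -> bytes: ...
    def write(self, key: str, data: bytes) -> None: ...

class S3Store:
    """Production implementation backed by S3."""
    def __init__(self, bucket: str):
        self.bucket = bucket
        self.s3 = boto3.client("s3")

    def read(self, key: str) -> bytes:
        return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

    def write(self, key: str, data: bytes) -> None:
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)

class LocalStore:
    """Local-files implementation for fast tests without AWS."""
    def __init__(self, root: str):
        self.root = Path(root)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

def transform(store: ObjectStore, in_key: str, out_key: str) -> None:
    # The business logic never imports boto3 itself, so this function can
    # be exercised locally and deployed unchanged.
    store.write(out_key, store.read(in_key).upper())
```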
But it has been a lot of trial and error. Our goal is to get to a point where 90 to 95% of the time that we are writing and testing code, we're able to test it out locally. The framework code itself, and the deploy-time code and all of that stuff, we do have to deploy in order to test. In those cases, we have broken up the entire platform into a bunch of microservices so that we're able to test each microservice individually.
Even though we are deploying stuff into AWS, that takes maybe tens of seconds instead of running immediately, and we typically try to restrict it to purely the framework code. But, again, this is not a solved problem for us; we are still learning how to make it much better. There are other tools that we looked at in the past, but they didn't offer the kind of flexibility that we wanted, especially around the deployment side, if you wanted to deal with all of the permissions and the roles that needed to get created and so on. So for those kinds of things, we have rolled our own. And in terms of the
[00:33:34] Unknown:
experience of building microservices, I know that one of the pieces of wisdom that I've come across as to whether or not it makes sense to use microservices is based on the structure of your organization, because in general software tends to replicate the communication patterns of your organization. And seeing as how DataCoral itself is still fairly early on, I'm sure you don't have hundreds of developers who each focus on one service at a time. I'm wondering what your experience has been in terms of building a microservice-style architecture, and how the serverless technologies might simplify that overall application paradigm given that you don't have to deal as much with the underlying infrastructure
[00:34:26] Unknown:
and deployment pains that go along with it? Yeah. So, I mean, at the overall architecture level, in terms of actually running this whole software, serverless clearly makes it a lot easier. But you have to think very carefully about what the interfaces are between these microservices, and whether those interfaces are clearly defined. The way we have tried to approach this is that we have a central, standardized way of doing state management and orchestration. That means all of these microservices talk the same language, if you will, in terms of how they communicate with each other. And for the microservices that we are building, we have standardized some interfaces. That's one of the main things that you have to do.
This is essentially the ultimate form of a service-oriented architecture, right, where you're saying that every little piece of functionality that you build can be invoked in a certain manner, and it publishes how it can be invoked and things like that. Does that make sense? Yeah. So when you think about building these microservices, there are 2 things that we always keep in mind. One is: can we standardize the interfaces of all of these microservices so that there's a shared metadata layer that has the complete state of your entire system, with all microservices communicating with that state store to figure out what they need to do? So that's number one, standardizing interfaces between microservices.
The second thing is this whole concept of separation of concerns. If you have standardized interfaces and you have said, okay, these types of microservices need to be focused on these types of functionality, then when you're trying to add an additional feature or build a new microservice that tries to combine the functionality of microservices that should actually be separated, we think really hard about where exactly that functionality should lie, so that we are not mixing up multiple pieces of functionality in a single microservice.
This is something that's done on a pretty case-by-case basis. But of late, we've been talking quite a bit about this notion of separation of concerns, where you're saying, why is this microservice having to talk to this other database, or whatever else, when you know that this other microservice owns all of the data that's going into that database? Those are the kinds of conversations that need to happen. As you mentioned, we are not at the scale where we need each team to build out their own services and not really have to worry about it. For now, we have tried to build out these frameworks on top of the serverless services, and then we're really thoughtful about how the services communicate with each other. So these microservices, as you can imagine, are kind of like the ultimate service-oriented architecture for your application.
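As an illustration of what standardized interfaces over a shared state store can look like, here is a sketch of a base class every microservice might conform to. The state-store methods are hypothetical stand-ins for whatever the shared metadata layer actually exposes:

```python
from abc import ABC, abstractmethod

class Microservice(ABC):
    """Every service speaks the same language: it reads pending work from
    the shared metadata store and writes results back, never reaching into
    another service's database (separation of concerns)."""

    def __init__(self, state_store):
        # `state_store` is a hypothetical client for the shared metadata
        # layer, assumed to expose pending_tasks() and complete_task().
        self.state = state_store

    @abstractmethod
    def handle(self, task: dict) -> dict:
        """Service-specific business logic for a single task."""

    def run_once(self) -> None:
        for task in self.state.pending_tasks(service=type(self).__name__):
            self.state.complete_task(task["id"], self.handle(task))

class Loader(Microservice):
    def handle(self, task: dict) -> dict:
        return {"loaded": task["id"]}  # placeholder logic
```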
But this is something that we continue to learn how to do better every day. And I'm also curious how you approach
[00:37:55] Unknown:
managing continuous integration, continuous deployment, and testing of the services that you're building in the serverless paradigm. Particularly given that you're deploying to customers' VPCs, I'm sure that compounds the level of difficulty as far as making sure that you have everybody synchronized across all the different versions, that you aren't potentially impacting customers at a point where they're not expecting an upgrade, and just managing that overall communication and expectation while trying to make sure that everybody is staying up to date with the latest version of what you're building and managing?
[00:38:36] Unknown:
Yeah. Absolutely. So when you have multiple isolated installations running into the tens or hundreds or thousands, SRE kind of takes on a new meaning. We in fact have an opening for a serverless reliability engineer, which is a way to think about not only how the CI/CD works, but also how you make sure that you are able to monitor all of these isolated installations without having to go to each one of them separately. Right now, as I mentioned, we try to build a lot of unit tests that can run locally, and those get run whenever new code becomes available for review and so on. On the integration side, we have built out a bunch of integration tests that can actually deploy into AWS. A lot of this is homegrown. We are actively working on making it more continuous and on getting much lower latencies for the deployments themselves.
But a lot of it has been homegrown. That's the reason we're looking for somebody who has done the SRE kind of role, but who can essentially rethink how it would work across multiple isolated installations of serverless services.
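One common way to keep the fast local unit-test loop separate from integration tests that actually deploy into AWS (a generic pytest pattern, not necessarily DataCoral's setup) is an opt-in marker:

```python
# conftest.py
import os

import pytest

def pytest_collection_modifyitems(config, items):
    # Integration tests create real AWS resources; skip them unless the
    # caller explicitly opts in. Register the "integration" marker in
    # pytest.ini to avoid unknown-marker warnings.
    if os.environ.get("RUN_INTEGRATION") == "1":
        return
    skip = pytest.mark.skip(reason="set RUN_INTEGRATION=1 to deploy to AWS")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)

# test_deploy.py
@pytest.mark.integration
def test_microservice_deploys_and_responds():
    ...  # deploy one microservice into a sandbox account and exercise it
```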
[00:40:01] Unknown:
And you mentioned the challenge of monitoring the capacity and status of all of these different customer deployments. So I'm wondering what your approach has been in terms of metrics, monitoring, and just overall observability and alerting for the product that you're managing for your customers?
[00:40:22] Unknown:
So yeah. One of the decisions that we made very early on was to have a shared metadata layer for a given installation. Right? Because it's all isolated. That shared metadata layer allows us to have quite a good view of what's going on in the system. In fact, we make all of this state available to the data scientists so that they can see how their own data quality is, how fresh their data is, and so on. But when you think across different installations, we have mostly been able to centralize errors and things like that into a separate SaaS tool that is then able to alert us when things are going awry.
But that is still something that's a work in progress for us. We use this tool called Rollbar, which has been pretty good for us; we've been able to send errors from across different installations into one place, which gives us a single pane of glass across all installations. But being able to automatically remediate when there are errors is something that's still going into our platform. I don't believe there are tools out there for that yet, but I'd love to know about tools that would actually help us do that.
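A minimal sketch of that kind of centralization using Rollbar's Python SDK, with each isolated installation tagged via its environment; the environment variable names here are hypothetical:

```python
import os

import rollbar

rollbar.init(
    os.environ["ROLLBAR_TOKEN"],
    # Tagging each isolated installation lets one Rollbar project act as a
    # single pane of glass across all customers.
    environment=os.environ.get("INSTALLATION_ID", "unknown"),
)

def handler(event, context):
    try:
        return do_work(event)
    except Exception:
        rollbar.report_exc_info(extra_data={"event": event})
        raise

def do_work(event):
    return {"status": "ok"}  # placeholder for the real handler logic
```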
[00:41:39] Unknown:
In terms of the overall technology landscape for serverless platforms and capabilities: Lambda was one of the forerunners in terms of functions as a service, but as you mentioned, it's not just functions as a service that encapsulate what it means to be serverless. There are also things like SNS and SQS that you mentioned, or BigQuery from Google. And so I'm curious what you personally are most excited about in the overall serverless space, that you're keeping an eye on, or things that you would like to see become available to simplify some of the challenges that you're facing?
[00:42:18] Unknown:
Yeah. I mean, ultimately, it's about getting to more and more complex workloads that can all be automatically provisioned pretty quickly and run. One of the technologies that we're actually excited about, and that we are playing around with quite a lot, is AWS Batch. It essentially takes the Elastic Container Service and makes it really easy to spin up jobs that all look the same, or at least have the same configuration, and you can spin up many, many jobs very quickly. We have actually built a layer on top of AWS Batch that makes it look like a Lambda. So if you want to run some complicated piece of code that might take longer than 15 minutes, we can automatically funnel that into Batch. That's a capability that we are actively building out. And I think there'll be more and more such things where you can spin up containers, or maybe a cluster of containers, do the work that you need to do, and then have it all be turned down. I think there are many companies trying to do this, and I'm always on the lookout for these kinds of services that allow us to spin up large amounts of compute in an easy way and spin it back down when it's not needed.
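A sketch of what such a Lambda-shaped layer over AWS Batch might look like, routing by expected runtime; the function, queue, and job definition names are hypothetical:

```python
import json

import boto3

lambda_client = boto3.client("lambda")
batch_client = boto3.client("batch")

LAMBDA_LIMIT_SECONDS = 15 * 60  # Lambda's maximum execution time

def dispatch(task: dict) -> None:
    """Present one uniform 'run this task' interface: bite-sized work goes
    to Lambda, anything that could exceed the limit goes to AWS Batch."""
    if task.get("expected_seconds", 0) < LAMBDA_LIMIT_SECONDS:
        lambda_client.invoke(
            FunctionName="run-task",  # hypothetical Lambda function
            InvocationType="Event",
            Payload=json.dumps(task),
        )
    else:
        batch_client.submit_job(
            jobName=str(task["id"]),
            jobQueue="long-running",         # hypothetical job queue
            jobDefinition="run-task-batch",  # hypothetical job definition
            containerOverrides={
                "environment": [{"name": "TASK", "value": json.dumps(task)}]
            },
        )
```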
I think those are the kinds of things that would really make it easy for application developers to build much more sophisticated business logic without having to worry about all of the stuff around provisioning, capacity management, resource scheduling, and things like that. And are there any other aspects
[00:43:59] Unknown:
of your experience of building with serverless technologies
[00:44:03] Unknown:
or any of the other work that you're doing with DataCoral that we didn't discuss yet that you'd like to cover before we close out the show? No. I mean, just in terms of DataCoral itself, our goal has been to leverage as many of these services that the clouds have to offer as we can, to provide a really robust end-to-end data infrastructure as a service within our customers' environments. The approach that we have taken is to not be cloud-agnostic, as in only using the common denominator that's offered by all clouds and building out the rest ourselves. Instead, we have chosen to be cloud-best, as in we leverage any and every piece of technology that a cloud has to offer.
And we'll build the stack that is best in class for that cloud. All of the abstractions that we build on top will remain consistent, but whenever we move to a particular cloud, or start providing our service in a given cloud, we wanna make sure that we leverage everything that cloud has to offer. As these clouds start offering similar kinds of services, of course we can just reuse the code, but we'll write layers that allow us to build the abstractions that we need and then leverage whatever the clouds have to offer. So we're super excited about everything that we plan to build out in the near future, as well as the kinds of problems that we believe we'll be solving
[00:45:30] Unknown:
in the near future. Alright. And for anybody who wants to follow along with the work that you're doing or get in touch, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the movie Avengers: Endgame. I finally got a chance to watch that the other week, and it was quite enjoyable. I think they did a really good job of tying all the story lines together and bringing a lot of them to a close. I'm gonna avoid giving any spoilers, but suffice it to say that I had a good time; it's definitely worth a watch. And so, with that, I'll pass it to you, Raghu. Do you have any picks this week?
[00:46:10] Unknown:
First of all, thanks for not giving away any spoilers for that movie. I hope to watch it sometime soon. My pick this week is for the Golden State Warriors to actually win the NBA championship.
[00:46:24] Unknown:
Alright. Well, thank you very much for taking the time today to talk about your experience of building on top of and using these different serverless technologies, and also for sharing your experiences at DataCoral. So thank you for that, and I hope you enjoy the rest of your evening. Yeah. Thanks for having me again. Yeah. Absolutely.
Introduction to Raghu Murthy and DataCoral
Raghu's Journey with Python
Overview of DataCoral
Understanding Serverless Technologies
DataCoral's Application Stack
Impact of Serverless on Application Architecture
Benefits and Challenges of AWS Lambda
Local Development and Testing with Serverless
Microservices and Serverless
Continuous Integration and Deployment
Monitoring and Observability
Future of Serverless Technologies
Closing Remarks