Summary
Every software project is subject to a series of decisions and tradeoffs. One of the first decisions to make is which programming language to use. For companies where their product is software, this is a decision that can have significant impact on their overall success. In this episode Sean Knapp discusses the languages that his team at Ascend use for building a service that powers complex and business critical data workflows. He also explains his motivation to standardize on Python for all layers of their system to improve developer productivity.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Sean Knapp about his motivations and experiences standardizing on Python for development at Ascend.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Ascend is and the story behind it?
- How many engineers work at Ascend?
- What are their different areas of focus?
- What are your policies for selecting which technologies (e.g. languages, frameworks, dev tooling, deployment, etc.) are supported at Ascend?
- What does it mean for a technology to be supported?
- You recently started standardizing on Python as the default language for development. How has Python been used up to now?
- What other languages are in common use at Ascend?
- What are some of the challenges/difficulties that motivated you to establish this policy?
- What are some of the tradeoffs that you have seen in the adoption of Python in place of your other adopted languages?
- How are you managing ongoing maintenance of projects/products that are not written in Python?
- What are some of the potential pitfalls/risks that you are guarding against in your investment in Python?
- What are the most interesting, innovative, or unexpected ways that you have seen Python used where it was previously a different technology?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on aligning all of your development on a single language?
- When is Python the wrong choice?
- What do you have planned for the future of engineering practices at Ascend?
Keep In Touch
- @seanknapp on Twitter
Picks
- Tobias
- Delver Lens app for scanning Magic: The Gathering cards
- Sean
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Ascend
- Perl
- Google Sawzall
- Technical Debt
- Ruby
- gRPC
- Go Language
- Java
- PySpark
- Apache Arrow
- Thrift
- SQL
- Scala
- Snowflake runtime for Python Snowpark
- Typer CLI framework
- Pydantic
- Pulumi
- PyInfra
- Packer
- Plot.ly Dash
- DuckDB
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, and dedicated CPU and GPU instances. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode today to get a $100 credit to try out their new database service, and don't forget to thank them for their continued support of this show.
The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star's data discovery platform solves that out of the box with a fully automated catalog that includes lineage from where the data originated all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your dbt, Snowflake, Tableau, Looker, or whatever you're using and Select Star will set everything up in just a few hours. Go to pythonpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
Your host as usual is Tobias Macey. And today, I'm interviewing Sean Knapp about his motivations and experiences standardizing on Python for development at Ascend. So, Sean, can you start by introducing yourself?
[00:01:42] Unknown:
Absolutely. Hi, Tobias. Thanks for having me. My background: I'm a software engineer by training. I started my career back at Google in 2004, which feels like forever ago now. But I've really spent the last 18 years as a software engineer, as a team lead. I founded my own company as a CTO with two others, built that up to about a 500 person company over 8 years, and then decided that I was a glutton for punishment and would do it all over again. And now I'm the founder and CEO of Ascend.io. We focus heavily on data automation for data pipelines on top of Snowflake, Databricks, and your favorite data infrastructure that you may be working with today.
And do you remember how you first got introduced to Python?
I do. When I first started at Google, coming out of college, I had spent a lot of my time really focused on Java, C++, and C. I'd even had internships that were predominantly writing in Perl, back at a company called Palm, like, way back in the day. This is, I think, 2002. I'd worked with many other languages, but I had never actually worked with Python. And it was the dominant scripting language at Google at the time. And so you get into Google and you learn C++, you learn Python, you learn Sawzall, which is their MapReduce language, and that was my first foray into it.
[00:03:05] Unknown:
So now you've come full circle back and you're running your own business, and you mentioned a little bit about what Ascend is. I'm wondering if you can dig a bit more into the types of work that you're doing and some of the story behind how you ended up focusing on this problem area and dedicating so much of your time and energy to it.
[00:03:28] Unknown:
We spend a lot of time focused really heavily on data pipelines, data engineering workloads, and in particular, our center of gravity and the problems that we solve for are really around how do we better automate data pipelines. Whether your frame of reference is ETL or ELT, whether you like writing in Python or SQL or Java or Scala, there's this really big complex domain when it comes to data pipelines. And what we've observed is the problem scope of storing and processing really large volumes of data is really well solved by companies like Snowflake and Databricks and the clouds, etcetera. But the construction of actual data pipelines can still be very hard. We could go down a really long rabbit hole. We won't do that. But when you construct a data pipeline, you are largely building a manually crafted query plan. Akin to what a query planner and query optimizer would do in a database, you're manually crafting and codifying it and hoping that it will run perfectly based off of all of your assumptions and nothing will ever change or break it. And for most of us in software engineering, you know that that is a very aspirational goal that never actually is achieved.
And so what we do at Ascend is we create an automation platform that layers on top of the underlying data infrastructure, takes higher level declarative models, and is a continuously running control plane that analyzes and optimizes your data pipelines, pushing work down to those underlying engines. The reason why I think that's so cool, outside of the sheer benefits that it provides in how you build data pipelines, is from an engineering perspective, we get to solve a lot of really neat problems. We run dozens, I think we're almost up to 100, deployments of Ascend.
We run on all 3 clouds. We run a very heavy Kubernetes infrastructure underneath. We run in private deployments. We run some air gapped deployments. We run hosted deployments. We connect to hundreds of different data systems. We have tons and tons of developers across our customers building complex pipelines, and we do everything all the way down to SDKs and APIs, all the way up to really elegant front end interfaces. And so the square footage from a product and technology perspective is really broad: it touches all the clouds, touches lots of distributed systems running on Kubernetes, touches lots of really big data oriented systems, and provides programmatic access as well as a UI out to users.
So huge square footage.
[00:05:49] Unknown:
In order to be able to support a product of that level of ambition, can you give a sense of the number of engineers that are working at Ascend and some of the ways that they're segmented in terms of focus?
[00:06:02] Unknown:
We have roughly 15 engineers inside of Ascend today, and it is a very tight, compact team for a very broad amount of square footage. The reason why we're able to achieve so much, and we'll talk about languages and developer methodology and philosophies in general, is that we tend to have what is an externally facing engineering team that spends a lot of time with our customers. Our customers and all of our users are developers. They're all cutting code on top of the Ascend platform, and so a subset interact with our customers. The other portion of engineering runs the core platform itself. In many ways, think of that core platform as, like, hosted, if we think about Kubernetes, as a control plane and automation framework. They're running and building the software and operating it on behalf of our customers, and the other half is helping the customers write cooler things on top of Ascend.
This is an important part of how Ascend has structured engineering. We are very deep believers in polymath engineering, which I would contend is that step beyond generalist engineer. We don't break our teams into front end, back end, infrastructure, database, etcetera. We actually push our teams really hard to be polymaths and to embrace and understand multiple parts of our infrastructure. You know, as we talked about, we're moving more and more and more to Python. We have Go. We have Scala. We have Node. We have React. We have other parts of our stack, and they learn those as well. And so we push folks really hard to become experts in all things of Ascend, which gives us really high speed and agility, which is also tied to one of those core reasons why we're moving more and more to Python: the speed and agility.
[00:07:50] Unknown:
In terms of the technology stack that you're running, as you mentioned, you're dealing with Kubernetes, you're dealing with distributed systems, big data platforms. A lot of those big data platforms in particular have been heavily biased towards Java. Some of that's shifting in recent years, but I'm wondering if you can just talk to some of the policies that you have for selecting which technologies you're going to use and support at Ascend and some of the ways that that has changed from where you are today to where you were when you first started.
[00:08:25] Unknown:
One of the things that we spend a lot of time on is how do we minimize the square footage of code, systems, etcetera, that we're responsible for? We draw a pretty tight correlation between the lines of code and the amount of debt that you have. There's good debt and there's bad debt. There's home mortgage debt and there's credit card loan debt. And we try to be really smart about the kind of debt that we're willing to accrue. And so when we make technology decisions, we're generally putting it against a backdrop of a couple of things. One is, is there anything else out there that we can either buy, install, license, something that we can sit on top of? Because if we don't have to build it and we don't have to be responsible for it, even better.
The next layer beyond that is: even if we could build the perfect thing for us today, is there something that's close? We do a lot of assessment of what the trajectory is of a particular technology, framework, tool, or SaaS platform. Do we think that this technology is going to likely continue to accelerate, continue to improve? Because if so, even if we could build the perfect thing that we want today, if that's likely to cross through our needs in a reasonable amount of time in the future, let's go build on that instead. And the reason why we make those decisions is that we consider the most precious resource we have to be our team's time.
On a previous podcast, we talked about some of our core values. One of our core values is the notion of evolve with intent. And it is the core notion that innovation is incredibly expensive. And you should save that energy for things that really matter. And you should put your time and energy into things that are highly differentiating for your company, your key, your technology, your product. And so as we pass things through those measures and those filters, what we're really looking for is, hey, per hour of hands on keyboard, what's gonna give us the highest leverage and allow us, as a development team, to have the highest impact for our time and effort. So that's where, you know, we spend a lot of time on which languages we pick, which frameworks we build upon, and we try and pass everything through that rubric.
[00:10:46] Unknown:
And so as you were first getting started with Ascend, what were the set of languages, technologies, frameworks that you decided you wanted to focus on, and what are some of the ways that maybe that started to kind of splinter or grow as you grew the company?
[00:11:05] Unknown:
Great question. So when we think about early on, it's funny, because at my last company I was the founder and CTO, and I introduced Ruby as our primary scripting language, and so I hadn't touched Python for quite some time. And there were lots of core drivers why, especially looking at communities and placing different bets on the trajectory of those communities and those languages. It made sense at the time, about 2007. And, you know, in 2015, when starting Ascend and looking at a lot of the technology landscape, we made a couple of bets. We did get really, I would say, lucky. I think it was an informed decision, but still off of very small slivers of information. We made a really big bet on Kubernetes in February of 2016, which seems like so, so, so long ago. I remember saying at one point, I was like, was Kubernetes even out then? I had to go back and check. I was like, yep. I've seen it in the Git repo. But we were really fortunate to make a bet like that that's paid off tremendously.
Other bets that we made that were quite valuable: you know, we standardized on gRPC as our communications layer between microservices. We built most of our back ends in Go. We built a lot of our core control plane scheduling infrastructure in Scala, and we chose Node and React, I think this was around the January, February time frame of 2016 as well, for our front end. We made a lot of bets there, and, in fact, had very, very little, if any, Python in our tool chain for quite some time. So those are our early bets. As for the core drivers behind why: Scala worked really well for a lot of our underlying data model and controlling infrastructure.
gRPC was fantastic and remains fantastic for communication between microservices. And then Go was really emerging as a strong contender for back end technologies. Great community, beautiful language, and really interesting and powerful concurrency models that for back end servers can make a lot of sense. And it's oriented a lot towards that as well.
[00:13:10] Unknown:
I was recently having a conversation with you for my other podcast, and you mentioned that you decided recently that you wanted to actually standardize everything on Python for developer productivity. So clearly, the benefits of Go's concurrency model have come to the point where they are outweighed by the developer productivity of Python. I'm wondering if you can talk to some of the challenges and pain points and points of friction that you and your engineers were running into with the set of languages that you were developing with and why you decided that changing the programming language you worked in was actually going to be the answer that, you know, unlocked some additional productivity benefits that you clearly needed.
[00:13:47] Unknown:
As we go through, as all engineering teams do, these architectural evolutions, you get a couple of really cool points where you get to revisit your designs. And especially for us in the data landscape, things are changing so quickly, and they're adapting, and there's new technologies and new frameworks and all new platforms all coming up. It really does afford you some incredible opportunities to take a step back. Ideally, especially in startup land, a very quick step back, to assess your technology decisions, what's worked really well, what hasn't, and start to make new bets on where you want to go. And for us, you know, one of the things that we were really looking at was we were introducing a whole new part of our technology stack.
We have the notion of data connectors and what we call data planes, the things that connect into all the systems that allow you to do really cool things, whether it's an MS SQL database or a Kafka queue or a Snowflake warehouse, a whole foundation and framework for how you interact with these. And at that point, you had to take a step back and look at, well, what languages and technologies do we have in the ecosystem today? What have we had other experience and exposure with? And that was really, you know, one of the bigger entry points for us in thinking through what language and framework we should use. And there, we swung pretty hard towards Python for 2 reasons. The first reason that actually enticed us in was in the data landscape, the pendulum's swinging pretty hard towards Python.
And, you know, as you highlighted, accurately so, there's so much in the data and especially the big data landscape that is in Java. But the linkages between Python and Java are increasingly strong. And moreover, there tends to be control path and data path. And having data path in Java is fantastic. But as we've seen with PySpark and other technologies, you can have control path be in Python, with the data path in Java or even in C or C++. I see you with your Arrow shirt that you're rocking there. Like, we find that you can get the best of both worlds by having the sort of performance bottlenecks handled in C, C++, or Java, and tackling the control mechanisms in Python.
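Editor's illustration: the control path / data path split described above can be sketched in a few lines. This is a minimal sketch and not Ascend's implementation. Python's standard-library `sqlite3` module (backed by a C library) stands in for a heavier engine such as Spark or Snowflake, and the table, the column names, and the `build_plan` and `run_plan` helpers are all hypothetical.

```python
import sqlite3


def build_plan(table: str, group_col: str, value_col: str) -> str:
    """Control path: plain Python decides *what* should run.

    The identifiers are hypothetical and must come from trusted code,
    because they are interpolated directly into the SQL text.
    """
    return (
        f"SELECT {group_col}, SUM({value_col}) AS total "
        f"FROM {table} GROUP BY {group_col} ORDER BY {group_col}"
    )


def run_plan(conn: sqlite3.Connection, plan: str) -> list[tuple]:
    """Data path: the compiled engine scans and aggregates the rows."""
    return conn.execute(plan).fetchall()


# Assumed sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("eu", 10), ("us", 5), ("eu", 7), ("us", 1)],
)

totals = run_plan(conn, build_plan("events", "region", "amount"))
print(totals)
```

The Python side never loops over rows; it only assembles the plan and hands it to the engine, which is the same division of labor PySpark uses at a much larger scale.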
So as we started to really dig deeper on that, you know, the thing that we found was we're just faster. We're faster at building, we're faster at debugging, we're faster at deploying, and it was a really good opportunity to take a huge step back and ask what matters most. For us as a startup, an engineering team, a product delivery vehicle for not just our company, but for tons of companies who have built mission critical systems on top of our product, what's most important for us is our speed and our agility. You know, this would be a slightly long thread, but bear with me on this because I think this is a really important topic. At least it was for us. A very defining moment was when we sat down as an engineering team and said, what defines great code? Like, we have to have the definition. What is it that's the defining theme for great code for us? And it's different for every company, it's different for every team, or can be. But for us, given our state, we're in a fast moving space as a startup with really, really exciting technologies, but with a lean team that needs to be able to deliver quickly. And we actually cycled through what were really the first 3 classic definitions.
You know, the first one was the ability of code to stand the test of time. And we ruled that out, as our industry is changing so fast that it's borderline impossible to predict where everything is going to end up. If you magically got something right 5 years ago and you just happened to nail that perfectly, either that was luck or you spent a profound amount of time trying to figure out where everything was going, and you probably missed opportunities along the way. So we ruled that out. The second one was performance. And we also ruled that out as the defining measure of code because as we looked at our systems, we looked and said, well, we're the control path. We're the control plane.
And for us, we can push down to these highly performant underlying engines. You know, our API is not having to answer thousands of requests a second, as we do isolated deployments for every customer. Nobody's making thousands of requests a second to their control plane. Well, okay. Every once in a while, they do, but not very often. And we can horizontally scale things up when we need to. Kubernetes is awesome. So we ruled out the performance part as the defining measure. The third, which was a classic one that we see with engineering teams, is the measure of how quickly a new member of the team can come up to speed. We spent, actually, a lot of time on that one, because we tend to gravitate towards that as a really strong measure as well. But we ultimately settled on the fourth measure, which I think is a really important one, which is the ability for the product, not the code, but the product that the code delivers, to rapidly and nimbly adapt to changing requirements in the landscape.
We adopted that as the primary measure of great code, just embracing the fact that we're in this early exciting space where things change so quickly. And for us, we must pick a language and a set of frameworks that allow us to just turn on a dime. When a customer needs something or we need to go fix some feature or add some capability, we can do it in a fraction of the time. And so when we got to that lens, we started to look at how fast are we in Scala, how fast are we in Go, how fast are we in Python, or even React and Node.
And as we started to look through all of those frameworks and languages, it was like, we're just faster in Python. We can move quicker. We can deliver features faster. We're confident in them, their performance, and that's what really led us down this path.
[00:19:38] Unknown:
One of the interesting challenges that comes up a lot when you're having this type of exploration and conversation is, oh, well, I can build that same feature in Python in a week, but it's gonna take me, you know, 2 weeks to do that in my existing code base. And sometimes it has nothing to do with the language or even your familiarity with the languages. So if I'm an expert in Python and an expert in Go, it's gonna take me 2 weeks to write it in Go. It might just be a factor of the architecture that I've built into Go and has nothing to do with the language itself. And so I'm wondering what your approach was there to be able to try and do an actual apples to apples comparison. And, you know, as you are maybe shifting some of your existing code bases into a new language, how are you either making sure that you can keep things, one, operating the same way, but also be able to make those effective comparisons where you're not also doing a rearchitecture and that's why you're doing things faster?
[00:20:30] Unknown:
Yeah. It's a really good question because you find people who will get stuck in that path. And I think it's a really great thing to be cognizant of, which is maybe we just have all these abstraction layers, which I'm sure my team's gonna be rolling their eyes if they listen to this. I hope they don't hear me say this too much, but, like, I hate abstraction layers, like, with a passion. Well, I shouldn't say I hate them. I think most organizations prematurely abstract. And in doing so, you incur all sorts of debt. And if I have to keep hopping between windows and screens to build up enough context to make simple changes, like, this is where you generally find impeded productivity. And so I always like pushing people towards, like, simplicity. And I think this is where you oftentimes end up with systems that have been around long enough: you've now started to go down abstraction layer patterns.
And you've started to architect more for scale, and that can slow you down. So I think you do need to be very cognizant of, hey, a greenfield, blue ocean sort of opportunity is very, very different from taking an existing framework and extending upon it. I do think there's also, I would contend, kinda going back to how it's different for every company, but it is important for companies and teams to sort of embrace, like, what is your developer pattern? What is your cadence? Like, for Ascend, we release a feature or enhancement, almost one per engineer per week, for our customers.
And we do this because we just have to move so quickly. And so against that backdrop, you know, we look at, like, hey, how long have these other systems been around for? Like, our API server was built 5, 6 years ago in Go. I mean, it's awesome that it stood the test of time, but it's also terrifying because I think of just how far along this community has gone. I think we could rewrite that thing in a quarter of the lines of code at this point. And so we try and be smart about when do we say, hey, let's keep evolving it and celebrate getting to delete a bunch of stuff and simplify systems, or when do we just build the next generation of something and just lop off an entire part of the architecture?
I'm a big believer we should actually celebrate deleting of systems more so than we should celebrate creation of new systems, because that keeps our world simple and gives us a tighter amount of square footage that we have to maintain, which frees us to do more innovation.
[00:22:43] Unknown:
In the work of migrating onto Python, another thing that you mentioned briefly is the question of frameworks and selecting a framework that gives you that agility to be able to quickly adapt to changing requirements and changing circumstances. So I'm wondering if you can talk to your selection process there and some of the choices that you made for being able to actually have that foundation to build from so that you didn't have to build all your own abstractions?
[00:23:10] Unknown:
We benefited from a couple of the earlier architectural decisions. So for us, you know, as I mentioned, we run everything on K8s. We do full gRPC comms for all the back end services, and that made it really easy. You know, we have Python services now that pop up. We never even had to change the protobuf structures. We didn't have to change any of the communication structures. And what we found was we could actually launch services. You know, one of ours is called a data plane manager. We're talking single digit thousands of lines of code that is actually able to represent what would be 4 or 5 different services that were written in Go and Scala and consume all of them with much more elegance and ease, probably a quarter as many lines of code. But the decision that we had made early on of having gRPC for comms made it very easy to be very language agnostic.
One of the old sayings I've had is debt tends to grow linearly across systems and exponentially within. And so if you can have strong refactored elements, ideally through interfaces like gRPC or Thrift, if that's more your jam, then it gives you the ability to not just swap out implementations, but actually have one service, as we've done, consume multiple services and their interfaces, which allows you to do an incremental sort of cleanup and absorption of local debt.
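Editor's illustration: a minimal sketch of how stable interfaces let one service take over from several, assuming "consume" here means that one new service answers the interfaces previously served by multiple older ones. Python's `typing.Protocol` stands in for gRPC service definitions; no real gRPC code is shown, and every class and method name below is hypothetical.

```python
from typing import Protocol


class ConnectorService(Protocol):
    """Stand-in for one service definition (hypothetical)."""

    def list_connections(self) -> list[str]: ...


class SchedulerService(Protocol):
    """Stand-in for a second service definition (hypothetical)."""

    def schedule(self, job: str) -> str: ...


class LegacyConnectorService:
    """Represents an older implementation, e.g. one written in Go."""

    def list_connections(self) -> list[str]:
        return ["warehouse", "queue"]


class DataPlaneManager:
    """One new service that satisfies both interfaces.

    Scheduling is implemented natively. Connection listing is still
    delegated to the legacy service until that logic is absorbed too.
    """

    def __init__(self, legacy_connectors: ConnectorService) -> None:
        self._legacy_connectors = legacy_connectors

    def list_connections(self) -> list[str]:
        return self._legacy_connectors.list_connections()

    def schedule(self, job: str) -> str:
        return f"scheduled:{job}"


def client(connectors: ConnectorService, scheduler: SchedulerService) -> list[str]:
    """Callers depend only on the interfaces, so they need no changes."""
    return [scheduler.schedule(name) for name in connectors.list_connections()]


manager = DataPlaneManager(LegacyConnectorService())
result = client(manager, manager)
print(result)
```

Because the caller is written against the interfaces only, the delegation inside `DataPlaneManager` can later be replaced with a native implementation without touching any client, which is the incremental cleanup described above.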
[00:24:31] Unknown:
So I'm assuming that this isn't the first time you've used Python at Ascend. And so I'm wondering, what are some of the ways that it was already being used and some of the lessons that your engineers have taken from their time working with those systems that they have been applying to building these new systems. And 1 of the things that is often a stumbling block as people start to work in new language ecosystems is the question of the developer tooling and the developer experience of being able to actually work with the code in their local environment.
[00:25:00] Unknown:
Yeah. We definitely have seen that. And there's also a lot of, I would say, cool things we're increasingly finding and utilizing in the Python ecosystem, as we built a lot before. You know, for us, we've been using a lot of Python in our own tooling, but think of, like, our own command line interfaces and tools and so on that we've been using just to deploy code and run CI/CD, etcetera, etcetera. What then started to happen was for our users, 65% of transformations on Ascend are written in SQL, 32% are written in Python.
Only about 3% of data pipeline logic is written in Java or Scala on our platform. And, obviously, we see Python continuing to climb. We even see folks like Snowflake add Python support via Snowpark, which is just awesome. And so what we were starting to see was more and more of our users were writing Python, and they were doing really cool things with it. We were building out more and more tools internally in Python. And then we really made a big push about a year and a half ago, I think it was now, around integrating to, I think, an order of magnitude more external data systems. And so many of them were building out Python packages that wrapped their own JARs and C libs.
It was just becoming clearer and clearer for us that, you know, we really should invest and embrace this trend. So as part of that too, what we also did then was, in going through, we started to look at how we build our own tools. We wrote out an SDK for our users that is very Python centric. It's one of the few systems out there that has bidirectional sync of Python objects in it. So you can download your entire DAG in Ascend as executable Python, and if you execute it, it will actually reapply it back up in a declarative manner. So we started to invest a lot more there. In doing so, you know, frankly for us, we just continue to find more and more momentum behind frameworks, behind libraries. We do weekly tech talks inside of Ascend, and we share a lot of these findings across the engineering teams as we go.
[00:27:14] Unknown:
Another interesting element to point to is what you were just saying with the weekly tech talks: the general engineering culture and some of the ways that you approach how you think about the design, and being able to popularize some of those design and architecture choices and helping people onboard into new code bases, either because they're new to the company or new to a particular project, and, you know, some of the ways that you reduce the growth of kind of programmer silos or the kind of resident expert in, you know, capability x of the software.
[00:27:40] Unknown:
Yeah. I agree. Like, we see, one, more and more folks coming out of college these days are versed in Python. As we were talking about before the show, not a whole lot of us are, like, learning, you know, C or C++ these days. Travesty. But it's the reality. And so we're finding more people with Python. I would contend, as an interpreted language, it's also faster and nimbler. At the command line, troubleshooting, triaging, debugging, we find it's much faster for folks as well. And so for people onboarding into the ecosystem, you know, we really like being able to stay pretty standardized on standard Python, standard frameworks, standard library, standard tools. It makes it much easier. We actually find, because that ecosystem is so rich, our need to build abstraction layers is much less as a result.
And in doing so, I think it reduces the square footage required for us to have to support and maintain.
[00:28:34] Unknown:
The other question is the kind of ongoing maintenance where you have decided, Okay, we're going to standardize on Python. We have these existing systems. What are some of the ways that you're thinking about the overall life cycle of those projects that aren't already written in Python and their ongoing maintenance. And what is the trigger point where you say, okay. Now we actually have to completely rewrite the service because keeping it running as is is too painful. Or in other cases, I'm sure there are systems where you say, no. This particular microservice is completely stable. We have no need to make any changes to it. It is feature complete, so we're just gonna leave that running as is for the foreseeable future. Just some of the ways that you think about that kind of life cycle and, you know, balancing priority of paying down tech debt and bringing everything into a standard approach so that you don't have to go through these, you know, context shifts of, oh, now I'm working in Python to now I'm working in Go, and this is the Go that we wrote 3 years ago, so now it's completely different than any Go that I might be writing today.
[00:29:38] Unknown:
What? That never happened? Yeah. Totally. So we do try and find a balance. And I think there's never, like, any perfect formula we can run through. A lot of it becomes judgment calls. You know, we definitely have systems where, like, there's a couple of places where there's over-microserviced design, where we have a separate authN and a separate authZ service. Maybe we shouldn't. Maybe we were just, you know, and when I say we, I also will use we in the royal me sense at times. So maybe I was just overeager on microservicing those. There's a great meme video on YouTube about the over-microservicing of architectures. For some of those, honestly, they're pretty simple. Like, don't mess with it. It's there. We run quarterly hackathons, so if somebody really has a bee in their bonnet about it, great, we'll replace it then. But if not, yeah, don't worry about it. There's others where, like, we feel pain and we want to remove it. We try and align that with specific product value. One of the measurements that we try and run is, if you're going to do a pure tech swap, the question we like to ask is, what is the breakeven?
Right? If it's like, hey, this thing's creating 4 hours of pain every week for me, but it's gonna take you, let's call it, 2 months to go build the next gen, so call it 40 days of work. Like, the breakeven is 80 to 1 on that. And so 80 weeks out, a year and a half to get a breakeven, probably not worth the while. As much as it's gonna irritate you, probably not worth the while, unless we can drive product benefit from it, so our customers see value. And so that's where we'll spend time and say, hey, can we get the two birds with one stone? Like, is there something where we really wanna go build out this new feature?
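The breakeven arithmetic described here can be sketched in a few lines of Python. This is only an illustration of the reasoning: the function name `breakeven_weeks` and the 8-hour working day are assumptions, not something stated in the conversation.

```python
def breakeven_weeks(pain_hours_per_week: float,
                    rewrite_days: float,
                    hours_per_day: float = 8.0) -> float:
    """Weeks until the time saved equals the time spent on the rewrite.

    hours_per_day is an assumed working-day length (hypothetical default).
    """
    if pain_hours_per_week <= 0:
        raise ValueError("pain_hours_per_week must be positive")
    rewrite_hours = rewrite_days * hours_per_day
    return rewrite_hours / pain_hours_per_week


# Figures from the conversation: 4 hours of pain per week, 40 days of work.
weeks = breakeven_weeks(pain_hours_per_week=4, rewrite_days=40)
print(f"Breakeven after {weeks:.0f} weeks (about {weeks / 52:.1f} years)")
```

With those figures, 40 days at 8 hours is 320 hours of work, which at 4 hours saved per week pays back after 80 weeks, matching the year and a half quoted above.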
And it's going to require a new service, and that service could actually consume this other service's functionality too. And in the process of doing so, we can add value. One of our other core values is 10 by 100: build for 10x, plan for 100x. You're saying, like, hey, at some point, we wanna kill all these old systems, but we need to build for the next horizon too. And the horizon is creating value for our customers and their developers and their users. And so, can we create value along the way while reducing our debt and modernizing our system? And we look for those win-win designs. And while it may not always feel like those are readily apparent, our experience has been, more often than not, something there is available if you think creatively about how you approach this, and in ways where you can usually create value on a very short horizon along the way, in flight of modernizing your architecture too.
[00:32:15] Unknown:
Another aspect of this type of undertaking is the grass is always greener problem where you say, oh, everything will be wonderful once we're in Python. And then as you start to make that transition, developers start to pine for the good old days of when I was developing in Go. And I'm wondering what are some of the things that people have noted that they miss from working in other language environments or some of the idioms that they're trying to cargo cult over into Python and some of the ways that you are trying to make sure that the Python that you're writing is idiomatic so that new people who come to the project don't have to try and figure out, okay, well, this says it's Python, but it looks like Go, so what is happening here?
[00:32:57] Unknown:
Go's concurrency model with channels and so on is pretty awesome. There's a couple of times where we've been trying to do some highly concurrent stuff in Python where you're like, ugh. Okay. Like, not quite the same. So every once in a while, you'll find a few of those use cases where you're like, oh, wow. Maybe Go would be better here. You try and keep it pretty contained. I would say, for the most part, we try and keep these very structured. We run heavy, heavy linters on all of our code. We do this pretty aggressively all the way up as part of our CI/CD process, so, like, builds don't even kick off if you don't pass linters. And so we try and keep at least some basic structures in place. And then I'd say for the most part, you know, folks have uniformly been on the sort of bandwagon of, like, wow, we really can go faster and things are easier.
Quarterly for us on the engineering team, we tend to be pretty outcome oriented. And so the faster we can create a really cool new feature and get it out to developers, the more motivated we are. And so that tends to be the north star for us, just how fast can we build out new capabilities, and we consistently have found that Python wins the day for us in that.
[00:34:07] Unknown:
As you have been going through this process of making the migration and investing further in a glue language, in a product domain that is explicitly focused on gluing together other systems, what are some of the most interesting or innovative or unexpected ways that you have seen Python being leveraged in Ascend where you were previously using other languages or tools or frameworks?
[00:34:41] Unknown:
We're definitely finding increased usage. It's really become the default approach. Like, now, when we pop up a new tool, it's in Python. It used to be even in Bash, and I love Bash. Not for good reason. Well, maybe one or two good reasons, but, like, it's probably just an age thing, but I love Bash. And so I'm the master of sending people really gnarly Bash scripts, like, oh, you just do it this way. And even for me, the thing that's broken me from that, you see random scripts now pop up for tooling, and we do it more in Python. I'd say the thing that actually tipped us over on that one, we found Typer. It's one of my delight things. I even ask this in my interviews with engineers, which is, like, what's one of the things, and it doesn't have to be, oh, we put, you know, rockets in orbit, but just one of the things that delights you as an engineer, where you see something and you're like, that's why we do this. And we came across Typer. One of the engineers found it. We're like, this is so cool.
It just makes writing command line interfaces so much easier, and it's just, like, caught on like wildfire. Probably not a good thing to use as an analogy these days. It's caught on a ton inside of Ascend just because it makes things so much easier, and just the sort of cognitive burden and hurdle to get over to create a new tool now is smaller. And so we find people doing it more and using it more frequently just because it makes life easy. Not worrying about argparse or, you know, even other more modern ones.
[00:36:10] Unknown:
As you have been going through that, what are some of the pain points and some of the risks and potential pitfalls that you are guarding against and anticipating as you continue down this path?
[00:36:22] Unknown:
SQL hand in hand. One of the things that we put in place as a philosophy too with engineering, as we were moving more Python-esque or Python-centric, attached to this notion that ability to change, our agility, and our nimbleness are the defining characteristics of what we want to deliver on and become as a team. You know, we also then said, hey, for example, we're not gonna do premature abstraction layers. We used to joke, you know, until you copy and paste something 3 or 4 times, don't go and create the abstraction layer. Like, copy and pasting something twice does not justify it. And so this is something that we are highly cognizant of. Hey, we know as we're creating these systems, and because we're moving so fast too, which is great, finding that right and perfect balance of, like, okay, so you copy that file to, you know, create another page for something. Right? X y z. And as you start to modify it, wait until you're on the 3rd or the 4th implementation of a related pattern.
Because at that point, you're not wasting cycles with abstraction layers too early, and you have enough data points to really figure out what is the healthy balance such that, you know, a new user coming in, just trying to make changes, doesn't have to ramp up on, like, 5 layers deep of abstraction. That will just slow them down. And so we're trying to be really careful about the when-to-abstract, when-not-to balance. And it's accelerated because of the Python impact.
[00:37:48] Unknown:
Yeah. The waiting until you've done something 2 or 3 times is definitely a valuable lesson that I have taken to heart as well where, as you said, if you haven't copied and pasted it and you just say, this is something that I might do, and so now I'm going to write a function, and I think these are the, you know, 5 arguments I'm going to need, you fall into the pit of, you know, you ain't gonna need it, where you build this function that can do all of these things, and then you realize 2 years later, oh, of these 5 things, I'm actually only doing 2, and then I actually had to add a 6th parameter because the other 3 didn't do what I wanted them to do. Yep. And at that point, like, you end up with this, like, Frankenstein
[00:38:29] Unknown:
with you 1,000%. I feel like so much of this goes back to this too. Like, as engineers, we're oftentimes, you know, trained. Like, we wanna make sure that the code is great and deleting code is bad because you, like, got something wrong. Deleting code is the best thing ever. It's so cathartic. Like, it's fantastic. You're like, oh my gosh, I now see the pattern. It's really clear. I can simplify. And your lines of code should be this, like, sawtooth style motion with both, like, micro teeth and macro teeth through your development cycle. And building on top of really great other external technologies should be some of those big macro bumps that let you delete a ton of stuff. Like, all of the comms inside of our Kubernetes clusters are SSL backed between every single node. Everything's super encrypted because we touch a bunch of health care data and so on.
And as Kubernetes versions evolve, all of a sudden, there start to be really cool new capabilities, and cool capabilities with things like Cilium as a technology, that networking layer inside of K8s. All of a sudden, you're like, oh my gosh. We may not have to worry about this anymore. That sounds so cool. And so for me, with the team, all I see in my head is, like, the big drop of code, like, oh my god, so we just delete all of this stuff and not have to worry about it anymore. Sounds fantastic. This is what we want to watch. Like, lines of code should not be an up and to the right thing. Like, this should have huge drops over its life cycle.
[00:39:56] Unknown:
Yeah. I am in the process of migrating off of a framework that I'd been using for a number of years that was great up to a point, and then we hit the kind of logical ceiling of complexity that it can handle. And at around the same time, its community started to hit a downward trend. So we rebuilt on top of a collection of different tools. So it was our systems automation layer, and so we ended up moving to Pulumi, Pyinfra, and Packer. And recently, I actually went in and started cleaning out old code that we're not using anymore, and I ended up deleting about half of the repository, which was great. It's so good.
And so in your work of going through this process, aligning your development teams and your engineering around a single primary language, refactoring and rebuilding some of your existing systems, what are some of the most interesting or unexpected or challenging lessons that you learned in the process?
[00:40:46] Unknown:
I think we've learned a couple. I'd say probably the most important thing we learned is to align on your north stars, your first principles. Trying to have a language conversation without a how do we measure great code and, more importantly, how do we measure success for us as a team? Very, very hard to do. If you just get into language concepts, like, we can debate forever, ad nauseam, Python versus Go versus Scala versus anything Lisp. And so anchoring first on, like, there is no perfect language. Python's, like, really, really freaking cool. But, like, let's assume there is no perfect language. And it may not actually even be a perfect language for any one company. There's, you know, really a best fit for particular jobs. And so what we did was ask, what's our north star that's going to direct us? And this was the big lesson learned: how do we define success? And for us, given our industry, our company size, our landscape, our customers, success for us was speed, agility, nimbleness, and obviously, in the data world, of course, also the ability to rapidly connect into all sorts of different systems. Like, there's not a lot of connectors written in Lisp or even Go by comparison to Python and Java in the data ecosystem. And so Python helps us be most successful there. I think without defining that, it is very hard to have a productive conversation with an engineering team around the rest.
Features, capabilities, very, very hard. And these are religious debates as much as they are technical debates.
[00:42:25] Unknown:
As you have been investing more in Python, what are the cases where you've decided that it's the wrong choice?
[00:42:32] Unknown:
When we can't use Python 3? There's a handful of places. You know, the one that I would pick is, like, at times, you have really deep, complex code and systems. We have some really incredible things written in Scala, for example. And it can interact with Python, but trying to rewrite all that in Python is low ROI, and in many ways, Scala for some of those data models is actually more effective. Similarly, I would say, like, really, really old or high concurrency systems, Go, Java. I think that argument, except for specific companies and teams and use cases, diminishes over time.
If you look at most engineering organizations, you take headcount cost versus infrastructure cost, and it's usually a 10 to 1 of headcount cost versus infrastructure cost. So I would generally say optimizing for your engineers' time is a more economical approach than optimizing for server time. But there's certain use cases where you need, like, ultra low latency, high assurances around deliverability, etcetera. Those are probably 2. And then, I mean, honestly, ultimately, at the end of the day, people need to be effective in it. So if you have a team and they just don't wanna do it, or you just don't like it, or you, like, really like compiled languages, it's just not a fit. It's personal preference at that point. Like, I like the belief structure of, like, time is precious.
It's one of the few resources we don't get to make more of individually, so we should figure out, like, we're all hopefully engineers because we like solving problems and we want to have macro impact on the world around us. That's one of the greatest benefits of technology. And so we subscribe to: whatever helps you put the most points up on the board and have the biggest impact out there, go find that tool. Use that.
[00:44:13] Unknown:
As you continue to direct and foster the engineering practices at Ascend, what are some of the things that you have planned as you move forward or some of the particular, you know, technologies or projects that you're excited to dig into and learn more about?
[00:44:29] Unknown:
So we're doing a launch right now. So we've been investing a lot in broader multi cloud systems and communications, especially when it comes to data pipelines. We're doing some really interesting work around multi tenant systems and shared Kubernetes clusters, which is really interesting when you have a developer platform and users can write and execute code. And so we have some really cool innovations there. And then I'd say, you know, we are starting to see some other really cool technologies pop up, in and around both the Python and data ecosystems as well. You know, we're doing some work with Dash as a technology right now, which, again, is Python based. Basically, toss it a data frame. It can be through Arrow. It can be through Pandas. It can be a Spark data frame. And you get Python controlled rendering and visualizations and React components.
Super cool. We're starting to work more with technologies like that, as well as DuckDB, actually, which we're doing some really cool work with, which is really fun.
[00:45:31] Unknown:
Are there any other aspects of the work that you're doing at Ascend or your experiences of aligning on Python or your overall practices of managing and supporting your engineering teams that we didn't discuss yet that you'd like to cover before we close out the show?
[00:45:47] Unknown:
I think the other huge benefit that we see that is really important, as I mentioned, we're all in technology, hopefully, for amplified and outsized impact on the world in which we live. You know, for us at Ascend, as an enterprise company, we provide our technology and platforms to other companies, and they build on top of that. And they get to do the cool stuff of changing the world around us in health care, finance, retail, media, you name it. And so one of the benefits we've also found in doing more and more with Python too is most of our customers, they're data engineers and they're analytics engineers. They speak Python and SQL. And so having tool chains that are aligned with customers makes it easier for us to share snippets, build modules, and have collaboration with our customers. We're all just a bunch of geeky engineers. Like, we're all just trying to build some really cool stuff on top of a really cool data platform. And so for that collaboration, interactivity, because our end users are also developers, Python is a really great fit. And so that was one of the other, I think, cool learnings for us: it brings us closer into the world of our customers as well.
[00:46:54] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. On the subject of being geeks and liking to explore cool technology, I'm going to choose an app that I've been using for a while called Delver Lens, and it's an app for Android phones only. They don't seem to have plans for support for iOS, so sorry for you Apple users. But it's an app for being able to scan your Magic: The Gathering card collection so that you can see what you have. It has a bit of a deck building capability built into it, and it'll also show you what the current value is of your cards so you can track that over time. So I've been using that for a while just to keep track of my collection and add new cards as I get them. And one of my plans is to then export that into one of the existing deck building apps, or my long term goal is to be able to actually use that as input to a machine learning project to generate deck embeddings so that I can feed it information about winning decks from different tournaments and then feed in my card collection and have it spit out a recommendation of what deck I should build. So
[00:48:00] Unknown:
Oh, alright. Well, this is getting advanced here. I like that. Well, I'm really excited about that because as a fellow Android user, I always feel like we get most apps last. Well, okay. 2nd last, and this is very cool. As we were talking about earlier, I have all of my cards sitting in a box. So I'm gonna go start pulling these out and go check back through all my decks. As far as picks, I think there's a couple. You know, I mentioned two that are really neat. Kinda work from multiplicity and elegant stuff all the way up into higher level. Like, I'll say, you know, one of my favorite picks, I love Typer. It's just like, the team loves it. Internally, we love it. I love working with it. It's been super, super fun.
DuckDB, as I alluded to, this is a cool technology. We find that the performance is just off the charts for small datasets. And so when you're looking for, like, really interactive work on Parquet files or CSVs or JSONs, but we mostly use Parquet. Super, super, super cool. And then to go to, like, a totally different end of the spectrum just to give other folks other stuff. I'll do a plug. I don't read very many books, but Amp It Up by Slootman, CEO of Snowflake, is a really great book. When we think about building teams, especially for impact and intensity, I really liked that book. I would highly recommend it to anybody wanting to lead a team.
[00:49:17] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share your experiences of reorienting the technology stack for your business and the process that was involved in the decision making that you had to go through. So definitely very useful, and I appreciate you taking the time to share your experiences and thoughts on that. So thank you again for taking the time, and I hope you enjoy the rest of your day. My pleasure. And thanks for having me, Tobias.
[00:49:44] Unknown:
Thank you for listening. Don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hosts@pythonpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Sean Knapp: Background and Career
Introduction to Python at Google
Ascend.io: Focus on Data Pipelines
Engineering Team Structure at Ascend.io
Technology Stack and Language Choices
Standardizing on Python for Developer Productivity
Frameworks and Tooling for Python
Engineering Culture and Onboarding
Life Cycle and Maintenance of Existing Systems
Interesting Uses of Python at Ascend.io
Lessons Learned from Migrating to Python
Future Plans and Exciting Technologies
Benefits of Python for Customer Collaboration
Closing Remarks and Picks