Summary
As software projects grow and change, it can become difficult to keep track of all of the logical flows. By visualizing the interconnections of function definitions, classes, and their invocations you can speed up the time to comprehension for newcomers to a project, or help yourself remember what you worked on last month. In this episode Scott Rogowski shares his work on Code2Flow as a way to generate a call graph of your programs. He explains how it got started, how it works, and how you can start using it to understand your Python, JavaScript, Ruby, and PHP projects.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Subsurface Live is the cloud data lake conference, a virtual conference where data engineers, data scientists, data architects, and data analysts can gather and hear about cloud data lakes and the data ecosystem. Subsurface Live Winter 2022 includes keynote talks from Bill Inmon, the father of the data warehouse, Author of Deep Work Cal Newport, and several more from companies such as Dremio, AWS, dbt, and more. Subsurface will also have many breakout sessions featuring Pandas creator Wes McKinney, Apache Superset & Airflow creator Maxime Beauchemin, and engineers from Apple, Uber, Adobe, Bloomberg, and more. Meet other data professionals and learn about the data technologies and practices helping companies meet their current and future data needs. Register today at pythonpodcast.com/subsurface
- Your host as usual is Tobias Macey and today I’m interviewing Scott Rogowski about Code2Flow, a utility for generating "pretty good" call graphs for dynamic languages
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Code2Flow is and the story behind it?
- What are some of the ways that a program’s call graph might be used?
- How does the visual representation generated by Code2Flow help with exploring the structure of a project?
- What are some of the alternative approaches/tools that might be used to gain similar insights?
- What do you see as the overlap in utility between Code2Flow and e.g. SourceGraph?
- Can you describe how the Code2Flow project is implemented?
- How have the design and goals of the project changed since you first began working on it?
- Given that Code2Flow is implemented in Python, how have you managed the parsing/processing of the other languages that you support?
- Visualizing a complex program can quickly become very messy. How have you approached the layout of the output to enhance comprehension?
- What are some of the situations where Code2Flow will be unable to provide a full picture of a program’s call graph?
- What are some of the pieces of information that are unavailable due to the static analysis approach that you have taken?
- Can you describe the process of applying Code2Flow to a project?
- Once the structure is on display, what are some next steps that an individual or team might take to analyze and act on the information?
- Given the static nature of the output, how might Code2Flow be incorporated in a CI/CD system to provide insight into the evolution of a project’s structure?
- What are the most interesting, innovative, or unexpected ways that you have seen Code2Flow used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Code2Flow?
- When is Code2Flow the wrong choice?
- What do you have planned for the future of Code2Flow?
Keep In Touch
- Website
- scottrogowski on GitHub
Picks
- Tobias
- Taking Vacation
- Universal Studios, Florida
- Scott
- Service work
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Subsurface Live is the cloud data lake conference, a virtual conference where data engineers, data scientists, data architects, and data analysts can gather and hear about cloud data lakes and the data ecosystem. Subsurface Live Winter 2022 includes keynote talks from Bill Inmon, the father of the data warehouse, Cal Newport, author of Deep Work, and several more from companies such as Dremio, AWS, dbt, and more.
Subsurface will also have many breakout sessions featuring Pandas creator Wes McKinney, Apache Superset and Airflow creator Maxime Beauchemin, and engineers from Apple, Uber, Adobe, Bloomberg, and more. Meet other data professionals and learn about the data technologies and practices helping companies meet their current and future data needs. Learn more and register for free today at pythonpodcast.com/subsurface. Your host as usual is Tobias Macey. And today, I'm interviewing Scott Rogowski about Code2Flow, a utility for generating "pretty good" call graphs for dynamic languages. So, Scott, can you start by introducing yourself?
[00:02:01] Unknown:
My name is Scott. In the past, I've been a full stack web engineer and a data engineer. Most recently, I led the analytics services team at a midsized accounting software company called Workiva. Currently, I'm on a break from programming, and I'm actually talking to you from far eastern Colombia near the border with Venezuela. I'm here with an organization called On the Ground International, and we're basically just working to address the Venezuelan refugee crisis. So I've been here for almost 2 months now, but I'll actually be leaving just next week. When I get back, I do plan to return to the software world, and I plan to split my time between Code2Flow, which is what we are talking about today, and Mongita, which is my other project of some popularity, just to briefly introduce that. So that's a small implementation of MongoDB, which is meant to be a local, dependency-free, drop-in replacement for when you just don't wanna install MongoDB or can't for some reason. But today, we are talking about Code2Flow.
[00:03:08] Unknown:
Yeah. Definitely some interesting work that you're doing there. I could definitely see the appeal of having a not-MongoDB implementation of MongoDB for local testing, but I'm also curious about your ventures in Colombia, if that's something where you're actually using some of your software development capabilities to help with that endeavor, or if it's some other skill set that you have that you're using down there, or if it's just an abundance of goodwill and energy?
[00:03:36] Unknown:
I think it's the last option there. I'm certainly not using any of my software skills here. I'm helping them a little bit with some basic web design. But, no, essentially, I just wanted to take a little break and find a way to connect with people on a one-to-one level. But I am the sort of person who does like dealing with systems and software. So it is a break, and sort of a career change, but I am thoroughly enjoying my time here, and I really like the work we're doing.
[00:04:10] Unknown:
Well, it's definitely always great to talk to people who have that kind of drive and energy to be able to dedicate to helping other folks in whatever fashion that might take. So definitely glad to be able to speak with you and glad that you're able to take the time away from your work down there to speak to me about Code2Flow.
[00:04:23] Unknown:
Yeah. Absolutely.
[00:04:24] Unknown:
And do you remember how you first got introduced to programming and how that led you to Python?
[00:04:29] Unknown:
Yeah. So I've been programming since a super young age. I remember actually starting on the TI-83 calculators back in 8th grade. Basically, we used to trade games on these calculators. Being what it was, you could actually go in and, like, view the source code of what was in these games. So I decided one day that I wanted to make my own game. So I looked at a program that was already on my calculator. I just kinda figured out how to do it through some horrid programming that was a lot of GOTOs and labels, but it worked. And that was my intro into the programming world. For me, Python came much later, near the end of college.
So I was the treasurer of the university hiking club, and we had this weird system at the time of getting payment for people who wanted to go on hikes that involved this complicated spreadsheet. So me being the person who was the CS major in our group, I was tasked with fixing this and making it better. So I had heard that Python was an easy language for these sorts of things, basically, just gluing different things together. So we had to connect a payment processor to a spreadsheet and also generate some sort of a confirmation page. So all these different APIs working together, and Python works great for it.
From there, in every job that I've had, I've been using Python ever since. I still write JavaScript here and there and C++ when I need something really, really fast, but I do maybe 90% of my programming in Python.
[00:06:01] Unknown:
At some point in that journey, that led you to building this Code2Flow program that we're talking about today. So I'm wondering if you can describe a bit about what it is that you're building there and some of the story behind how it came to be and why you decided that you wanted to put in the effort to bring it into reality.
[00:06:23] Unknown:
So Code2Flow is a way to generate call graphs, which you can also look at as flowcharts, from your source code. So when you have a program and you have functions calling other functions and you want a map of that to broadly understand the structure of your program, this is what you use Code2Flow for. Code2Flow works on dynamic languages and currently supports Python, JavaScript, Ruby, and PHP. The origin story is pretty interesting. So it was now 9 years ago, and I had to look back through the commit history to figure out exactly when I started on this. But at the time, I was about one year out of college, and I was working on this project called Event Tsunami.
Event Tsunami was one of these maybe 1,000 failed attempts at what I view now in retrospect as the most common and maybe worst idea in the history of startups, which is an event aggregator. So this is something that still doesn't exist today: if you're curious about what's happening downtown, what cool things there are to do, there's not really a place to find this sort of thing. So, anyways, I was young and just out of college, and I was writing this thing, and I had this very long JavaScript file. And there was a point where I looked at this one day and said, I don't understand this anymore. I would like to try to understand the structure, how everything is connecting together. So one day, I spent maybe 2 or 3 hours with a series of regular expressions writing this thing that pulled out all of the functions and all the function definitions and connected them together.
And I saw a graph at the end of this, and I thought that was really cool. So I showed my roommate. He thought it was really neat as well, and I decided to put it online. I didn't think that much would come of it, but I thought that it was worth a shot, and it proved to be a little more popular than I expected actually. But either way, that was 9 years ago. So fast forward to last year, actually, and I'm contacted out of the blue by a small Japanese startup called Sider, and they wanted to use Code2Flow for what they were doing. So I thought that this was a good idea. They offered to pay me money for this.
But at the time, everything was horrid with this. I was 1 year out of college when I wrote the original version, and there were really strange things that I was doing. Like, instead of using abstract syntax trees, I was counting the characters of indentation in order to try to determine how large a Python function was and try to get the scope of this. So I was doing all these weird things that you would only do when you're straight out of college, and I asked them if they would let me rewrite it as well. And they said okay. So I went ahead and rewrote it, and it's in a very good place now, I think. But I'm happy about it, and it's been growing pretty steadily ever since then.
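To make the contrast between counting indentation and using an abstract syntax tree concrete, here is a minimal sketch using Python's standard `ast` module. It is not Code2Flow's code; the sample source and the `calls_by_function` helper are assumptions made up for illustration, and only plain calls such as `foo()` are handled.

```python
import ast

# Hypothetical sample program, invented for this illustration.
SOURCE = '''
def load(path):
    return parse(read(path))

def parse(text):
    return text.split()

def read(path):
    return "a b c"
'''


def calls_by_function(source):
    """Map each top-level function name to the simple names it calls."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = set()
            # The parser already knows where the function body ends,
            # so no indentation counting is needed to find its scope.
            for child in ast.walk(node):
                # Attribute calls such as text.split() are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    called.add(child.func.id)
            result[node.name] = sorted(called)
    return result


CALLS = calls_by_function(SOURCE)
print(CALLS)
```

A real tool would also need to resolve methods, imports, and nested scopes, which is where most of the heuristics discussed later come in.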
[00:09:34] Unknown:
And so you mentioned that some of the original impetus for building this project was to be able to understand some of your own code. And I'm curious if you can talk to some of the ways that being able to generate a call graph will help the software development process and some of the types of insight that you can gain by being able to convert the raw source code into some visual representation to be able to explore it in that manner?
[00:10:00] Unknown:
Yeah. Absolutely. So to back up on that question a little bit, I think that humans are primarily visual people. And source code is this weird agreement that we've made with the computers that says that we will write this thing that you can kind of understand and we can kind of understand. And this is the agreement we have, and this is how we're gonna communicate together. But I think that we, as humans, really like the idea of being able to understand these complex programs through the use of flowcharts, for example, or other visual representations. So Code2Flow produces this, and I view it as being most useful for new developers.
That means people who are just getting started with a large codebase, when you're coming into a company and you have no idea what anything does in the codebase. I think that that's the time to use Code2Flow to try to get this broad, high level overview of how everything fits together before you dive in a little deeper.
[00:11:14] Unknown:
In terms of the visual representation of it, I'm wondering if you can talk through some of the ways that you're able to convert these semantic and structural components of the software project into visual elements to be able to help somebody tie together what the actual logical flow is through the program where you say you've got function definitions, you've got class definitions, you've got instantiations of classes, executions of functions, parameters passed to functions, and being able to convert all of that into some visual diagram to be able to say, okay. These are the different pieces. These are how they interact. Now I want to be able to follow the logical flow from this function definition through to where it's called, through to what the output might be, and just some of the ways that you've thought about that sort of mapping between the semantic and the visual components of it?
[00:12:09] Unknown:
So the basic unit of Code2Flow is the function. And within every function, you have a variety of function calls that map to different functions. There's also, on the outside of every function, usually some namespace, some class representation that you can kind of contain a function in. So the hope is that by looking at it from this level, from the sets of one function calling another function, you can get the structural representation.
[00:12:48] Unknown:
In terms of the sort of utility of a visual flowchart of this call graph for the purposes of being able to explore and understand the code base, I'm curious how you think about that in comparison to projects such as Sourcegraph that is taking more of an indexing style approach of being able to say, okay. I've got this function. Now I wanna see all the places that it's called and being able to jump around a project from the very sort of semantic and structural aspect of it versus this visual representation and just some of the differences in terms of the utilities or the use cases that each of those approaches might provide?
[00:13:27] Unknown:
On the high level, they're all developer tools. And I think that you wanna use the right tool for the right job. Sourcegraph has a job in terms of jumping around projects, being able to go quickly from lines of code to lines of code. And where Code2Flow fits in is understanding the high level representation.
[00:13:54] Unknown:
In terms of being able to understand that high level representation, once somebody has been able to grasp, okay, these are the different objects that exist in this semantic graph, What are some of the next steps that they might take to be able to say, okay, I now know that, you know, this is a function that is tied to 5 other places based on the lines that are connecting to it in this representation and then saying, okay. Now I'm going to dig into this function and understand more about the logic that's embedded within it. And maybe, you know, from this visual representation, I can identify some potential opportunities for refactoring or further exploration and just some of the sort of next steps somebody might take after they have that initial visual exploration of the project?
[00:14:39] Unknown:
So I am agnostic about how developers use Code2Flow specifically, but there are a couple of obvious use cases and next steps. Once they have this visual representation, they can then proceed onwards. The first, as we've talked about, is obvious: visualizing the internal structure, understanding that high level overview. The second, maybe a little less obvious, is finding orphan code. This is especially relevant from the perspective of a language like JavaScript, where you can often have code that is being bundled and being shipped to the end user, to the client side browser, but is not being used for anything.
And when you have code like this, it literally slows down your website. But when you have code in any language that's not being used by anything, this is extra cognitive load for the developers, and it's best to eliminate this when possible. So Code2Flow is actually useful in finding this orphan code and determining that, okay, nothing actually connects to this thing, and we can delete this code. The other use case that I'll mention is being able to analyze the risk of changes that you might make. So say you have a pull request that touches one function and you're a little bit concerned about it, as you might be if it's a system that's in some way business critical.
Being able to see the functions that are upstream and downstream of this function, and trying to understand what the risk to the overall business actually is based on this pull request, I think that that's also a useful use case of Code2Flow.
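The orphan-code idea can be sketched in a few lines once a call graph exists. The sketch below assumes the graph is a plain dictionary from caller name to callee names; the `find_orphans` helper, the `entry_points` parameter, and the sample graph are hypothetical and not part of Code2Flow.

```python
def find_orphans(call_graph, entry_points=()):
    """Return functions that nothing else in the graph calls.

    call_graph maps each function name to the names it calls.
    entry_points lists functions expected to have no callers (e.g. main).
    """
    called = {callee for callees in call_graph.values() for callee in callees}
    return sorted(
        name
        for name in call_graph
        if name not in called and name not in entry_points
    )


# Hypothetical call graph for illustration.
GRAPH = {
    "main": ["load", "render"],
    "load": ["parse"],
    "parse": [],
    "render": [],
    "old_export": ["parse"],  # defined, but never called by anything
}

ORPHANS = find_orphans(GRAPH, entry_points={"main"})
print(ORPHANS)
```

A function with no incoming edges in a statically derived graph is only a candidate for deletion, since it may still be reached dynamically or from outside the analyzed files.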
[00:16:43] Unknown:
And so in terms of the implementation of Code2Flow, I'm wondering if you can describe a bit about some of the architectural and design elements that you have built into it now that you've had the opportunity to learn from iterating on it for a few years with this regex focused approach and being able to revisit it with this paid engagement that has allowed you to rearchitect it from the ground up.
[00:17:07] Unknown:
The difference now is that, well, for one, the code is all better. And it's hard to describe what better is except that it's the result of 10 years of additional engineering experience. But to speak to the architectural elements specifically, I have it broadly divided into 3 files. So the first file is called engine.py, and that's what starts everything off and is universal amongst all the languages. And that file will call various functions, I believe that there's 5 of them, in each individual language file. So the tricky part has been managing the differences between each language because, as you've noted, all of these 4 languages are very different.
They all do their ASTs differently. They all have different rules for scope or doing the function calls, all of these things. So attempting to abstract away the interface so that you have this universal interface to each language, and then each language handles the individual details, has been a challenge. So to clarify, the second file that I was discussing in that 3 file hierarchy is really the individual files for each language: there's a python.py, javascript.py, ruby.py, php.py. The third primary file in this hierarchy is just called model.py, and that's where I handle all of the internal things related to what is a function call, what is a variable, what do all these things mean?
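The shape of that split, one language-agnostic engine calling a small, uniform set of functions implemented per language, might look roughly like the sketch below. Every name here (`Language`, `PythonLanguage`, `LANGUAGES`, `engine`) is hypothetical and chosen for illustration; the real project's interface has more functions and is not reproduced here.

```python
import ast
from abc import ABC, abstractmethod


class Language(ABC):
    """Hypothetical per-language interface, not Code2Flow's real one."""

    extension = ""

    @abstractmethod
    def parse(self, source):
        """Turn source text into this language's AST."""

    @abstractmethod
    def function_names(self, tree):
        """List the function definitions found in the AST."""


class PythonLanguage(Language):
    extension = ".py"

    def parse(self, source):
        return ast.parse(source)

    def function_names(self, tree):
        return [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]


# Other languages would register their own implementations here.
LANGUAGES = {lang.extension: lang for lang in (PythonLanguage(),)}


def engine(filename, source):
    """Language-agnostic driver: pick an implementation by file extension."""
    extension = filename[filename.rfind("."):]
    language = LANGUAGES[extension]
    return language.function_names(language.parse(source))


NAMES = engine("example.py", "def a():\n    pass\n\ndef b():\n    a()\n")
print(NAMES)
```

The design choice this illustrates is that the engine never inspects language-specific AST nodes itself; it only sees what the per-language object hands back.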
[00:19:00] Unknown:
In terms of the actual target languages, I'm wondering what have been the driving forces behind which language targets you decided to actually implement and focus on, and I'm sort of curious what types of projects have introduced the different requirements that led you to those different runtimes.
[00:19:20] Unknown:
Yeah. So the original languages were just JavaScript and Python. And the only reason for that was because I worked primarily in those 2 languages, and those were the ones that I was most interested in. Moving on from there, when Sider had contacted me to update this project, they had requested the 2 additional dynamic languages that we now have, which are Ruby and PHP. Someday, I would like to add more languages. But for the moment, these are essentially the most used dynamic languages in the world. And if anything, I think that I would move into static languages next and maybe try to pull in something like Java, Erlang, Rust.
[00:20:07] Unknown:
Given the fact that you are focused on being able to support generating these call graphs for these various dynamic languages, I'm wondering what are some of the edge cases that you've had to deal with across these language runtimes, how you've approached the actual implementation of being able to generate and parse the respective ASTs and how the respective ASTs structure themselves and sort of what that lowest common denominator ends up being across those runtimes?
[00:20:38] Unknown:
So I think that that's a couple of different questions. The first one I'll address is the ASTs. So every language's AST is different. And, essentially, how I get the ASTs from each individual language is to use their respective parsers. So I generate a system call to pull in the ASTs, and they're usually in some sort of a JSON or JSON-like representation. And then from there, you can handle them in very similar ways, though each AST is different and each AST requires a different approach. When I was working on this originally, my strategy, because there are a ton of edge cases when you're doing this sort of thing, was to pull in increasingly more complex features of a language or simply just increasingly more complex projects.
And then, one by one, I would handle the edge cases and check visually that the call graph matched what was actually happening in the individual projects.
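The pattern of shelling out to a language's own parser and reading back a JSON-like AST can be sketched as follows. To keep the example self-contained, the "external parser" is just a second Python process running a tiny script; for other languages the command would be that language's parser instead. The script, the `external_ast` helper, and the JSON shape are all assumptions for illustration.

```python
import json
import subprocess
import sys

# Stand-in for an external parser: a child process that reads source on
# stdin and prints a JSON list describing the function definitions it finds.
PARSER_SCRIPT = r'''
import ast, json, sys
tree = ast.parse(sys.stdin.read())
out = [{"type": type(n).__name__, "name": n.name, "lineno": n.lineno}
       for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
json.dump(out, sys.stdout)
'''


def external_ast(source):
    """Run the parser as a subprocess and decode its JSON output."""
    completed = subprocess.run(
        [sys.executable, "-c", PARSER_SCRIPT],
        input=source,
        capture_output=True,
        text=True,
        check=True,  # raise if the parser process fails
    )
    return json.loads(completed.stdout)


NODES = external_ast("def a():\n    pass\n\ndef b():\n    a()\n")
print(NODES)
```

Once the AST arrives as plain dictionaries and lists, the downstream code can treat every language's output in a broadly similar way, even though the node types differ.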
[00:21:58] Unknown:
And then as far as being able to turn the AST into a visual representation, what are some of the tweaks that you've had to make to be able to convert them into a common representation that you can then feed into that engine that will spit out the visualization from it?
[00:22:17] Unknown:
So that in itself, if I'm understanding your question correctly, has not been especially difficult. The common representation that I have is to have, I believe, 3 different kinds of units. So you have the node, which is essentially a function. You have the edge, which is a function call, and you have the group, which is a namespace or a class. So everything gets converted into these 3 elements. And from there, you can write that directly to this intermediate file format that Graphviz uses. And Graphviz is the tool to actually generate the visual graph. So once it's in this intermediate representation, you pull it into Graphviz. Graphviz does its calculations. It determines where it would make the most sense to put every individual node, every individual group in order to make the graph as readable and as clean as possible.
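The three kinds of units described here (node, edge, group) map naturally onto Graphviz's DOT format, where a group becomes a `subgraph` whose name starts with `cluster`. The following is a minimal sketch under that assumption; the `to_dot` helper and the sample data are hypothetical, and names containing double quotes are not escaped.

```python
def to_dot(groups, edges):
    """Render groups and edges as Graphviz DOT text.

    groups: {group_name: [function names]}  (namespaces or classes)
    edges:  [(caller, callee)]              (function calls)
    """
    lines = ["digraph calls {"]
    for index, (group, nodes) in enumerate(groups.items()):
        # Graphviz draws a box around subgraphs named "cluster...".
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{group}";')
        for node in nodes:
            lines.append(f'    "{node}";')
        lines.append("  }")
    for caller, callee in edges:
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


DOT = to_dot(
    {"Parser": ["Parser.load", "Parser.parse"], "utils": ["read"]},
    [("Parser.load", "Parser.parse"), ("Parser.load", "read")],
)
print(DOT)
```

Saving that text to a file such as `calls.gv` and running `dot -Tpng calls.gv -o calls.png` lets Graphviz choose the layout.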
[00:23:26] Unknown:
Because of the fact that you are using Graphviz for that visual element of it, I'm curious. What are some of the tweaks that you've had to make to be able to make that output useful where it's a static image and so you need to be able to make sure that it is uncluttered enough to be useful but informative enough to be representative of the program and some of the ways that you've thought about the constraints of that static representation of that sort of point in time representation of the program, how to structure that in a way that is relevant to the sort of tasks that somebody might be trying to achieve by virtue of using Code2Flow to generate that representation?
[00:24:13] Unknown:
So there's 2 parts to that question. The first is that Graphviz does a lot of work for you. So Graphviz does have this internal algorithm where you can feed it any sort of directed graph, and it's going to be able to place things in as uncluttered of a way as possible. But the second part of that is that no matter what, when you have projects of greater than trivial complexity, the output can get very messy very quickly. So what I've done with that is implement a couple of command line arguments so that you can zoom in a little bit more into exactly the parts of the code that you're interested in.
And I'll do one that's simple to explain and one that's a little more complex to explain. So one that's simple to explain is if you just want to exclude namespaces, if there are classes you don't care about, if there's functions you don't care about, there's command line arguments for that. So just to pull out the things that you are very sure you don't care about. But if you're sure that you actually care specifically about one thing, I have a command line argument that's called target function. So you can zoom in on this one very specific function. And then with that command line argument, there's 2 more arguments, which are downstream depth and upstream depth.
So if you zoom in to, say, my function a and you have upstream depth of 2, you will be able to see 2 levels up everything that is calling this function. And if you do downstream depth of 3, you can, in the same way, see 3 levels down what's happening there. And after I implemented that, the graphs became a whole lot cleaner and a whole lot easier to read.
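The target function idea with upstream and downstream depth amounts to a depth-limited traversal in both directions from one node. Here is a minimal sketch on a dictionary-based call graph; the `reachable` and `focus` helpers and the sample graph are hypothetical and are not how Code2Flow necessarily implements it.

```python
from collections import deque


def reachable(graph, start, depth):
    """Nodes within `depth` hops of `start`, following graph edges (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return seen


def focus(graph, target, upstream_depth, downstream_depth):
    """Keep only functions near `target`: its callers above, callees below."""
    reverse = {}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    upstream = reachable(reverse, target, upstream_depth)
    downstream = reachable(graph, target, downstream_depth)
    return upstream | downstream


# Hypothetical linear call chain for illustration.
GRAPH = {
    "main": ["run"],
    "run": ["load"],
    "load": ["parse"],
    "parse": ["tokenize"],
    "tokenize": [],
}

KEPT = focus(GRAPH, "load", upstream_depth=1, downstream_depth=1)
print(sorted(KEPT))
```

Everything outside the returned set would simply be left out of the rendered graph, which is what keeps the picture readable on larger projects.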
[00:26:10] Unknown:
And then as far as being able to actually apply Code2Flow to a given project, I'm curious if you can talk through some of the workflow of saying, I have this project. I want to apply Code2Flow to it to be able to visualize the representation. What are some of the limitations in terms of the scale and complexity of the project as far as the utility of Code2Flow and its applicability to it, and how you might scope the specific execution of Code2Flow to be able to componentize the different elements of that project if it is at a sufficient scale where running it across the entirety is infeasible?
[00:26:50] Unknown:
So Code2Flow by itself is easy to apply to a project. It's not a complex installation process. So just do pip install code2flow. If you're using one of the 3 languages that isn't Python, you might have to install some dependencies for those as well, and that's well documented. And then all you do is run code2flow with the path to either your project directory or your individual file. And it will run. And then you will be able to see the visual representation of your code. You can dig in a little bit more with the command line arguments that we talked about a little bit.
But in terms of applying it to individual parts of your code to try to get a more focused overview, I find that it is very helpful to apply it just to individual modules, just to individual files if you can, rather than to the project as a whole.
[00:27:53] Unknown:
As far as the dynamic nature of the languages that you're targeting, I'm curious. What are some of the limitations in terms of what Code2Flow is able to interpret and visualize, particularly when the program author starts delving into things like metaprogramming or monkey patching or some of these ways that dynamic languages can be very powerful, but also can very easily be abused and start to become too clever for their own good?
[00:28:24] Unknown:
Sure. So that's a good question, and it really does touch on a key point of what Code2Flow is. So the slogan of Code2Flow that I have on GitHub is "pretty good call graphs". And the important distinction there is that it is not perfect call graphs. And what Code2Flow is doing is using a lot of clever heuristics to get to where it's going. There are a couple places where not just the lack of heuristics, but just the nature of dynamic languages like Python makes it very difficult or impossible to generate a perfect call graph, and I'll go over 2 of those here. So the first is when you have factory functions. So the example would be if you have a factory that takes a single parameter and, depending on that single parameter, returns one of 2 functions, either function a or function b.
So let's just say for simplicity that parameter is a boolean. And if the boolean is true, it returns function a. If the boolean is false, it returns function b. You do not know until runtime what the value of that boolean is, and therefore, you don't know in what way that's gonna connect. During the design phase of this, I had the option of going in one of 2 directions, which was either put both functions in and say that this particular node connected to both function a and function b, or just say that it connected to neither. And I decided that the safe thing to do was to say that it connected to neither in this case.
The second example that I'd like to share is when you have a dictionary of functions. So in my view, this is a very good programming practice when depending on some string, you pull out a certain function to use for a certain case. I see it most often in very clean code. But in this case, again, you don't know what function is coming out when you're actually pulling it out until you're actually in runtime. So that is the limitation of this static approach. But, again, you get 90% of the structure of a project. And as far as generating this understanding and helping programmers along with their task, I believe that the 90% is very helpful.
And I believe that this is actually the reason why we don't see more things like Code2Flow, because a lot of programmers get scared away by this sort of thing and say, well, if it's not 100%, we shouldn't do it. But I do believe that having a little bit there, and it's not a little bit, it's the majority of the project, having the majority of the project there is incredibly helpful, especially if you're a new developer.
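Both limitations described above, the factory function and the dictionary of functions, can be shown in a few lines of Python. The function names below are hypothetical and exist only to illustrate why the callee cannot be known from the source text alone.

```python
def function_a():
    return "a"


def function_b():
    return "b"


def factory(flag):
    # Which function comes back depends on a value known only at run time.
    return function_a if flag else function_b


# Dictionary of functions: the callee is selected by a string key.
HANDLERS = {"a": function_a, "b": function_b}


def dispatch(key):
    # Static analysis sees a subscript followed by a call, not a function name.
    return HANDLERS[key]()


def caller(flag):
    chosen = factory(flag)
    # A static tool sees a call to "chosen", not to function_a or function_b.
    return chosen()


RESULTS = (caller(True), caller(False), dispatch("b"))
print(RESULTS)
```

In both `caller` and `dispatch`, a static tool has to choose between drawing edges to every candidate function or to none of them.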
[00:31:21] Unknown:
Given the fact that you're focused currently on a static analysis model, I'm curious, what are some of the additional types of information that you might be able to pull in and some of the ways that it might enhance the visual representation of the code structure if you move to a profiling based model or something similar to Pylint where you're actually executing the code to be able to analyze these runtime requirements, like what you were saying before about not being able to determine until runtime whether it's linked to function a or function b, just some of the additional pieces that are useful based on that runtime information versus just this static analysis approach.
[00:32:00] Unknown:
Yeah. Absolutely. And I think that you touched on a good point that I'd also mentioned, which is that without being able to see what is happening at run time, it's very hard to see these, I don't know if I'd call them edge cases, but I'll just call them edge cases for the sake of argument. And it's certainly something that I would like to do, to move to a runtime approach. I believe that there are still benefits to the static approach, specifically in showing the paths that are not traveled very often in addition to the so called happy paths. And when you are a developer and trying to figure out what is happening or trying to address a certain bug, that bug is probably there because somebody did not follow the happy path in a weird way. So being able to see the road less followed, just to channel Robert Frost there, I think is very important. I think it's not necessarily something that you get from the runtime analysis.
That being said, there's no reason why you can't do both at the same time and get a more complete picture in general.
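As a rough illustration of what a runtime approach could record, here is a minimal sketch that uses the standard library's `sys.setprofile` hook to count caller-to-callee pairs while a function runs. This is an assumption about how such a feature might be built, not how Code2Flow works; `record_calls` and `pick_and_run` are hypothetical names.

```python
import sys
from collections import Counter


def record_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs for Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; C calls are ignored.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return result, edges


def function_a():
    return "a"


def function_b():
    return "b"


def pick_and_run(flag):
    chosen = function_a if flag else function_b
    return chosen()


result, edges = record_calls(pick_and_run, True)
```

With `flag=True`, the recorded edges include `pick_and_run` to `function_a` but nothing for `function_b`. That is the trade-off discussed above: the runtime view resolves the ambiguous call, but it only shows the paths that were actually executed.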
[00:33:18] Unknown:
If you were to move to that runtime based approach where you're actually executing that code, I'm wondering what are some of the additional complexities that that would bring to you as a maintainer and some of the challenges that that might introduce for some of the end users where they're expecting that the code is just going to be analyzed statically and maybe they need to worry about some of the sort of reentrant characteristics of the program.
[00:33:45] Unknown:
Yeah. Sure. So, clearly, the addition of a runtime component to Code2Flow would make it more complex both from a maintenance perspective of Code2Flow and from a developer perspective, in that they would have to go in and actually run their program. And depending on whether what you're doing is actually some sort of a production system, it might be prohibitively difficult as well. But this is the sort of thing that you can add on incrementally and basically get increasingly better views of what's actually happening with your program. The more times you run it, the more corner cases you run it on.
And just to clarify, this is a fully hypothetical thing that we're talking about here. This is not the current behavior of Code2Flow, but I do think that it would be an interesting path to go down, and it's certainly a path that I intend to explore as I continue to develop it.
[00:34:48] Unknown:
Another interesting element of Code2Flow, particularly in its current state of being a static analysis, is that it gives you this snapshot in time of what the logic of the program looks like. And I'm curious what you have seen as either some existing applications or some potential implementations of ways that somebody might incorporate it into a CI/CD workflow to be able to view the evolution of the program structure over time as people make the changes and introduce new capabilities and features?
[00:35:21] Unknown:
Sure. So I've not seen any specific examples of this actually being done, but I do believe that by being able to view the evolution of a program over time, you get a couple of benefits. The first is that it's just kinda cool that you can see how what you're doing has gotten bigger and more complex. The second is you can see how things used to work and how things change. But I think that's the core of the question from the perspective of CI/CD, how that could be useful. And I think that it is this potential ability to see the delta between what was there before the commit that you're doing and what the commit has changed in terms of the structure.
And there's a couple ways that this can be used. 1 that I can think of is as essentially like an early warning alert that something big is happening in this pull request. It's changing the structure of the program in these very important, very key ways, and maybe you should take a closer look at this sort of thing. I assume that there's more use cases beyond that, but I've not thought about that in time.
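The "delta" idea above can be sketched as a comparison of two sets of call edges, one from before a commit and one from after. This is a hypothetical illustration: the edge lists, the `graph_delta` and `is_big_change` helpers, and the threshold are all assumptions, and no existing CI integration is implied.

```python
def graph_delta(before, after):
    """Compare two collections of (caller, callee) edges."""
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


def is_big_change(delta, threshold=10):
    # Hypothetical early-warning rule: flag a pull request when the number
    # of changed edges reaches the threshold.
    return len(delta["added"]) + len(delta["removed"]) >= threshold


before = [("main", "load"), ("main", "render"), ("render", "draw")]
after = [("main", "load"), ("main", "render"), ("render", "paint")]

delta = graph_delta(before, after)
```

Here `delta` reports one added edge (`render` to `paint`) and one removed edge (`render` to `draw`), which a CI job could print in a pull request comment or compare against a threshold.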
[00:36:42] Unknown:
And so in your experience of building Code2Flow and then reimplementing it and rearchitecting it, I'm curious what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:37:00] Unknown:
To actually put out the PNG and JPEG outputs of Code2Flow. And 1 interesting thing that I've come across with Graphviz is that depending on the size of the output that I'm giving it, it can run either in milliseconds or it can run until the end of time. So what I've had to do is, depending on the specific complexity of what I'm giving it, pass it different options in order to have it run a little bit faster at the cost of a slightly less pretty graph. And that's taken a lot of experimentation to get that balance correct. But I think that that's 1 thing that I was not anticipating that ended up becoming the case. Because with these sorts of things, you expect it to be all the hard problems that give you pause and, like, make the development process really difficult.
But, ultimately, it ended up being this third party dependency that I have
[00:38:03] Unknown:
that's as I'm just passing it too much stuff. I've needed to play a little bit more nicely with it. Yeah. It's always interesting how you expect some big complexity and, you know, difficult challenge, and that's actually the easy part. And it's really just these little niggling details that end up taking up all of your time.
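The trade-off described above, faster layout in exchange for a plainer graph, might look something like the sketch below. It only builds DOT text and switches on two Graphviz attributes, `splines` and `nslimit`, once the graph passes a size threshold. The threshold, the attribute values, and the `to_dot` helper are assumptions chosen for illustration; the options Code2Flow actually passes are not stated in this conversation.

```python
def to_dot(edges, large_threshold=500):
    """Render (caller, callee) pairs as DOT text, with cheaper layout for big graphs."""
    lines = ["digraph calls {"]
    if len(edges) > large_threshold:
        # Straight-line edges are cheaper to route than curved splines.
        lines.append("  splines=line;")
        # Bound the network simplex iterations dot uses when positioning nodes.
        lines.append("  nslimit=1;")
    for caller, callee in edges:
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


small = to_dot([("main", "load"), ("main", "render")])
large = to_dot([("main", f"f{i}") for i in range(3)], large_threshold=2)
```

The resulting text could be written to a file and rendered with the `dot` command line tool. The right threshold and attribute values would have to be found by experiment, as noted above.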
[00:38:23] Unknown:
Absolutely. No niggling details.
[00:38:26] Unknown:
In terms of the sort of utility of Code2Flow as this method of exploration and understanding of a program structure, what are the cases where Code2Flow is the wrong choice and you might be better suited with something like profiling or some other visualization element or something like Sourcegraph that has more of this sort of functional indexing capabilities?
[00:38:48] Unknown:
So I don't think that Code2Flow is ever the wrong choice. I think that it's 1 tool in your toolbox, and it is the first step in what you're doing. Maybe not even the first step. Maybe in the top 10 first steps. But it does what it does, which gives you the high level overview of what you're doing. And I think that its limitations as far as understanding what's actually happening in the program is that it doesn't actually capture business logic or, to your point, runtime. But for business logic, this is often a very important part of what a program actually does.
And I think that in order to dig into something like that, there's not really a way around actually looking at the source code, spending a lot of time with it, and understanding it. And I think that Code2Flow is not going to be able to do that. And I can't think off the top of my head of any tool that makes that particular step any easier.
[00:39:56] Unknown:
In terms of the near to medium term future of the project, I'm curious about some of the next steps that you have in mind or any project that you're particularly excited to dig into to help improve it or add new functionality or any ways that you are looking for help and feedback on the project?
[00:40:16] Unknown:
I'll talk about the future first. So what I would like to do with it in the immediate term is make Code2Flow interactive. So we've talked a lot about the static nature of the outputs, the limitations of that, and also the overall complexity of potentially having a large graph. And while there are ways to limit this large graph and to make it a little more manageable, I don't think that it's necessarily the right approach. So what I would like to do in the near term is build a GUI for Code2Flow so that instead of the static output, you have this interactive tool where you can click and scroll through your program and zoom in and out and just have it as more of an exploratory conversation rather than this generated artifact.
As far as where help might be needed, I certainly think that languages are the obvious answer. I think that I've cut across a large part of the programming world with the 4 languages it does support, but certainly not all of it. And I'm not even sure whether that would be half of it. So adding more languages, I talked a little bit about Rust that's becoming popular in many circles, but there's no reason that Code2Flow can't also manage static languages. Rust, C, C++, Java, things that people use every day.
[00:41:53] Unknown:
I'm curious what you see as some of the steps or modifications that would be needed to be able to add support for some of these static languages and just some of the ways that you have architected the project to allow for this extension into other language runtimes rather than being tied to the existing ones that you've already targeted?
[00:42:15] Unknown:
I don't think that it would be especially difficult to add a static language to Code2Flow. I think that going the other direction would be far, far more difficult, to have a tool that was designed for static languages and move to dynamic languages, because what you're looking at when you're looking at the structure of a static language is a dynamic language, but with a lot fewer gotchas, in my opinion. So I do believe that the same method that I've used for the 4 dynamic languages could be applied to static languages.
[00:42:51] Unknown:
Absolutely. Alright. Well, for anybody who wants to follow along with you and get in touch, I'll have you add your preferred contact information to the show notes. And so with that, I'll take us into the picks. This week, I'm going to pick taking a vacation because it's the first time in many years that I have actually taken a real vacation, and that's where I'm recording this episode from. So definitely recommend that for folks who have not taken the opportunity. Personally, I am currently taking the week at Universal Studios Florida, which has been quite a lot of fun. So if you have the time and the means to be able to do so, I recommend checking that out. And so with that, I'll pass it to you, Scott. Do you have any picks this week? Sure. I think that my pick and I thought about this for a little while after listening to a couple of your earlier versions.
[00:43:38] Unknown:
But what I'd like to add is that service work has been valuable in a way that I didn't initially anticipate. I'm really happy to be here, and I think that I'm growing a lot as a person by working with people. And I am excited to see how I'm going to be able to apply this back into the software world. And I'm not sure how that's going to work yet, whether it's gonna be in terms of a different way of thinking or a different way of communicating with people because, fundamentally, software is a collaborative business.
We do things like this all the time. So I think that it is undervalued as a thing to do to go on service projects.
[00:44:21] Unknown:
Absolutely. Well, thank you very much for taking the time today to join me and share the work that you're doing on Code2Flow and for the time that you're spending doing that service work where you are right now. So definitely excited to be able to speak to you and understand more about the project that you've built and some of the ways that it can be applied to some of the projects that I've been working on and other folks have been working on. So definitely glad that you have taken the time to share that with the world, and I hope you enjoy the rest of your day.
[00:44:49] Unknown:
Thank you. You as well.
[00:44:53] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends
[00:45:23] Unknown:
and coworkers.
Introduction and Sponsor Messages
Interview with Scott Rogowski: Introduction and Background
Scott's Journey into Programming and Python
The Genesis of Code2Flow
Benefits of Visualizing Code with Code2Flow
Technical Details and Challenges in Code2Flow
Handling Different Languages and ASTs
Applying Code2Flow to Projects
Potential Future Enhancements
Lessons Learned and Challenges Faced
Future Plans and Community Involvement
Conclusion and Picks