Summary
Every software project needs a tool for managing the repetitive tasks involved in building, running, and deploying the code. Frustrated with the limitations of tools like Make, SCons, and others, Eduardo Schettino created doit to handle task automation in his own work and released it as open source. In this episode he shares the story behind the project, how it is implemented under the hood, and how you can start using it in your own projects to save time and effort.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I'm interviewing Eduardo Schettino about doit, a flexible and low-overhead task automation tool
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what doit is and the story behind it?
- What are the main goals and use cases of doit?
- Can you describe how you approached the implementation of doit?
- How has the design changed or evolved since you first began working on it?
- The realm of task automation tools for developers is an exceedingly crowded one, with each tool prioritizing certain use cases. How would you characterize the position of doit in the current ecosystem?
- How does it compare to e.g. Click, Invoke, Typer, etc.?
- What is your guiding philosophy for when and how to add new features?
- You have been running the project for ~13 years now. How has the evolution of the Python language and ecosystem influenced your approach to the development and maintenance of doit?
- What is the workflow for getting started with doit and integrating it into your development process?
- For every project there are some tasks that are identical and some that are bespoke for that application. What are the options for maintaining a standard set of tasks across repositories and composing them with per-project activities?
- What are some of the useful patterns that you and the community have established for designing tasks and execution graphs?
- How do you use doit in your own work?
- What are the most interesting, innovative, or unexpected ways that you have seen doit used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on doit?
- When is doit the wrong choice?
- What do you have planned for the future of doit?
Keep In Touch
- schettino72 on GitHub
Picks
- Tobias
- The Matrix series
- Eduardo
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- doit
- Zope
- Twisted
- Django
- Pyflakes
- scons
- Make
- Nikola
- Nose
- Pytest
- Click
- Typer
- Invoke
- Puppet
- Ansible
- Chef
- Sphinx
- Snakemake
- Airflow
- Luigi
- pytest-incremental
- import-deps
- dbm
- MetalK8s
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. And do you want to get better at Python? Now is an excellent time to take an online course. Whether you're just learning Python or you're looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you're just getting started, be sure to check out the Python for Absolute Beginners course. It's like the first year of computer science that you never took, compressed into 10 fun hours of Python coding and problem solving.
Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That's pythonpodcast.com/talkpython.
[00:01:34] Unknown:
And don't forget to thank them for supporting the show. Your host as usual is Tobias Macey, and today I'm interviewing Eduardo Schettino about doit, a flexible and low-overhead task automation tool. So, Eduardo, can you start by introducing yourself? My name is Eduardo. I'm from Brazil. For the last 10 years, I've been living in China. And do you remember how you first got introduced to Python?
[00:01:56] Unknown:
My background is electrical engineering, and I was mostly a C/C++ programmer. At some point, I started to try out different languages, and I first heard about Zope. I tried it out, and then I started using Python for some small scripts. And then on a job, it was not related to Python, but I set up, like, Roundup, the bug tracker that Python uses today. And then I decided my next job must be Python. After that, I just applied for Python jobs, and I landed in web development with Twisted and Django. And I never looked back from there, just Python.
[00:02:34] Unknown:
That brings us now to the doit project, which you started a number of years ago. I'm wondering if you can just describe a bit about what it is and some of the story behind how it got started, why you decided to build it yourself, and what motivates you to keep maintaining it. Coming from, like, C/C++,
[00:02:52] Unknown:
I enjoyed Python very much. But in the beginning, I had this, oh, there is no compile step. And then I quickly found out about Pyflakes. It was very nice to run Pyflakes before you try to run your program, to find some basic syntax errors. And then I thought, oh, maybe I should put this in a build tool, because I don't want to run Pyflakes on my whole codebase all the time. But the problem with traditional build tools like make is that they compare the timestamp of your target build files, the object files, with the source. But when you're running Pyflakes, it's just a checker. There is no output file.
So just because of this small change, the traditional build tools could not handle it. And that was the basic idea that triggered it, like, I say, oh, I want to build something that can handle more generic tasks, not only build things. That's why I put the name doit. It's just like make, but it's more generic, for executing any kind of task that you want to do in an efficient way. At first, I tried SCons as well from Python, but it also couldn't solve the problem. And then I started with a very basic script and kept adding more and more features until doit came to life as a separate project.
[00:04:16] Unknown:
For people who are first coming across the doit project and trying to decide if it's something that will be useful for them, I'm wondering if you can just talk about sort of the main goals and use cases that it is aimed at and some of the ways that people might integrate it into their workflows.
[00:04:32] Unknown:
For example, make. If you talk to a C/C++ developer, compilation times can take an hour, but when you start using a build tool, the time goes down because it's more efficient. It knows what changed and what needs to be done. But most people know make just from installing programs, the whole ./configure, make, make install. So a problem that I had in the beginning describing doit is, it's kind of a build tool, but many people think, oh, but I have a build tool. They don't know about this concept of incremental builds, of executing only what's necessary, what's not up to date.
The other problem describing it is, oh, it's a build tool, but in Python you have no build step. The way that I tried to market it was quite hard, but now the concept of a task runner is more common. So, basically, over time it grew, and it can be used in three different ways. I would say the most basic way is similar to make: you have what we call a dodo file where you describe your tasks, and that's just to manage the life cycle of your project. For example, I have a web project; I can use it to run some lint checks, to execute the tests, or to build some static assets.
But over time, doit also grew to be usable as a framework. If you want to develop your own application, you can use doit as a framework so your application has all the powers of a build tool. One example of this is a project called Nikola. Nikola is a static site generator, and it's based on doit. The command line, everything, is built on doit tasks. So the doit goal is to scale down, to be usable as a simple task runner, but also to scale up to applications as well.
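As a rough sketch of that "framework" usage (not Nikola's actual code; class names are per my reading of doit's embedding docs, so verify against your doit version), an application can hand its command line over to doit's engine:

```python
# my_app.py -- a tiny CLI that reuses doit's task engine (illustrative)
import sys

from doit.cmd_base import ModuleTaskLoader  # assumed location per doit docs
from doit.doit_cmd import DoitMain

def task_render():
    """render site pages from templates"""
    return {
        'actions': ['python render.py'],        # hypothetical build script
        'file_dep': ['templates/base.html'],
        'targets': ['output/index.html'],
    }

if __name__ == '__main__':
    # 'python my_app.py render' now behaves like 'doit render',
    # with up-to-date checks, 'list', 'help', etc. for free
    sys.exit(DoitMain(ModuleTaskLoader(globals())).run(sys.argv[1:]))
```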
[00:06:48] Unknown:
As far as the actual implementation of doit and its ability to be used as a standalone build tool or as a component of a larger framework, I'm wondering if you can talk to some of the design elements and the architecture of the project and some of the ways that the current state has evolved from where you first began it?
[00:07:09] Unknown:
So when I started to design it, the main point was, as I said, I wanted to build a build tool that would not require targets, like you have input files and you have output files. But there were other things that I wanted to tackle. For example, make is not a proper language, so what you can do is quite limited. It's quite powerful, like the regular expressions for defining rules, but once it starts growing, it becomes very hard to debug and to create your own rules. One thing that you can see from that is you start to have programs that generate your makefile. So you kind of lose the purpose: you have a very compact format for defining your tasks, your rules, but then it's not powerful enough, and you have to create a different program to generate that format for you. So one goal that I have for doit is that this should never happen.
When you are defining your tasks, you should be able to do everything at once without resorting to generators or macros. Another thing that I had in mind was readability. There is a bunch of support tools that come with doit. It's not just for running: you can list the tasks without extra effort; once you define the tasks, you can generate a graph of the dependencies between them; and you can get information about the status of a task, whether it will run or not, and the reason it's not up to date.
Another aspect of the implementation is that I wanted to make it very easy to get started. At the time, I was writing tests with nose and pytest, so I decided to go with an approach similar to pytest, where in your Python module you define any function whose name starts with task underscore, and doit will understand that this is a task definition. Another important factor, I would say, is that doit is designed to be used by developers, so you define your tasks in Python code. Although you can write Python code to define your tasks programmatically, the tasks themselves are described by a simple dictionary.
That makes it much easier for introspection and for understanding the relations between the tasks. So when you write your first tasks, you don't need to import anything from doit. You just define some basic functions that return a dictionary with the task metadata.
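Concretely, a first dodo file under that convention can be as small as this (task names and commands are just examples):

```python
# dodo.py -- nothing imported from doit; the task_ prefix is the convention
def task_pyflakes():
    """check the source with pyflakes"""
    return {'actions': ['pyflakes myproject/']}   # shell-command action

def task_greet():
    """actions can also be plain Python callables"""
    def greet():
        print('hello from doit')
    return {'actions': [greet]}
```

Running `doit list` in the same directory should show both tasks, discovered purely by the `task_` prefix.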
[00:10:02] Unknown:
In terms of the sort of design philosophy, you mentioned that it's targeted for developers. The way to interact with it is just by writing arbitrary Python code. I'm wondering how you've thought about some of the sort of programmatic constructs that you expose and some of the ways that you have designed it to be composable and evolvable so that it is sort of powerful and easy to get started with while also being extendable and able to grow with the project that it's embedded within?
[00:10:32] Unknown:
I guess one of the easiest ways is just, like, I try to keep it as much plain Python as possible. In your code for the tasks, you don't really require anything from doit itself. Doit comes in when you attach metadata to your tasks. We have the concept of actions; an action can be Python code or a shell command. And then you attach some metadata about the file dependencies, or dependencies on another task, or how to pass information from one task to another, or how to get variables from the command line. Most of your code is just plain Python.
And then you can use all the basic Python constructs you use normally. There is not much difference.
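To make that metadata layer concrete, here is a hedged sketch (task and file names invented) of a plain function dressed up with a task dependency and a command line parameter via doit's `params` field:

```python
# dodo.py -- plain Python plus doit metadata (illustrative names)
def task_publish():
    """upload the built site; the target environment comes from the CLI"""
    def upload(env):
        print(f'uploading to {env}...')   # ordinary Python action
    return {
        'actions': [upload],
        'task_dep': ['build'],   # assumes a 'build' task defined elsewhere
        'params': [{'name': 'env', 'short': 'e', 'default': 'staging'}],
    }
```

With a definition like this, `doit publish -e production` would pass `env='production'` into the action.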
[00:11:22] Unknown:
In terms of the doit project, as you mentioned, it's a tool for task and build automation, and Make is the sort of canonical example of that. It's been around probably the longest, at least in terms of tools that are still being used today. But there have been a number of other projects that have started up within the Python community as well as outside of it. From Ruby, the most notable is Rake. Within Python, there are tools like Click and Typer, and then there's the Invoke project that grew out of the Fabric project. I'm just wondering if you can talk about some of the ways that doit compares to some of those other systems that are available, and if you can give your sense of its position in the overall market and some of the reasons that somebody might want to choose it over some of the other options that are out there? Click and Typer, they're basically
[00:12:15] Unknown:
frameworks or libraries to help you build command line tools. That's really not the goal of doit. Although it can be used in this way, it was never the goal, because doit exposes some other commands, and you won't have the same flexibility you have if you use Click, for example. Some people do write programs with doit and then wrap the user interface with Click or argparse or whatever. Regarding Invoke, doit has about the same target people, the same target usage. My opinion is that Invoke doesn't have this execute-something-only-if-it's-not-up-to-date, and this is the essential feature that doit has had from day one.
And I think for some people who don't need that, it might be okay at some point, but it doesn't let you scale. So that's the problem I see with it. Doit is also compared to other tools because it's used for a very wide range of applications. When I first released doit, many people who started using it were comparing doit to Puppet and Ansible and Chef for some basic setup of their machines. It's okay if you're doing that only on your own machine, but doit doesn't have the capability to bootstrap itself. For your own project, though, it might be reasonable.
Doit is also used a lot in the scientific community, and there are some other tools that people compare it to, like Snakemake. I'm not that familiar with Snakemake; it's more, like, completely programmatic, as I understand it. And finally, nowadays, some people compare doit to Luigi or Airflow. Doit is command line only, so you have advantages and disadvantages over Airflow in this aspect. I guess the original reason that I created doit is the capacity to check whether a task is up to date or not. This is still mostly neglected by other tools today. And I see a lot of people who use doit with stories like this: oh, I tried this tool, then I tried that tool, and after they try everything, they try doit and say it was the only one that could handle it. Users usually report it's the only tool that can handle anything you throw at it, because it was really built to handle any kind of task.
And that's how it evolved. I got people working with, like, circuit design or things that I have no idea what they're about, and it's like, oh, I need to make this work, and that's how it evolved. This is one thing that I always took pride in; that's my priority: if someone has whatever condition under which they want a task not to run, doit should have a way to figure that out.
[00:15:29] Unknown:
In terms of the history of the project, I was looking back to see when you started, and it's been running for about 13 years now. So there's definitely been a lot of evolution in the project and in the Python ecosystem. I'm curious if you have any thoughts about the overall philosophy as to how and when to add new features, when to push them into an extension to the doit project so that the core remains lean and evolvable, and some of the practices that you have brought into the maintenance of the project that have allowed it to survive and thrive for this long?
[00:16:09] Unknown:
The project was very much, I would say, community driven, not by direct development, but by requests. And one project that was very important in its history was Nikola. A lot of the development that happened was to support the Nikola project. In terms of Python features themselves, when I did my first release, I created my own website, and it was horrible. By the second release, Sphinx had come out, and the documentation makes all the difference. So even something that today might seem trivial: even me, I couldn't look at the website because it was so horrible, and Sphinx made it beautiful and easy to create the documentation. And then generators were a big change in my codebase.
After that, when multiprocessing came out, I added support for parallel running with multiprocessing. And I still want to integrate asyncio in some way. Yeah. But basically, that's it.
[00:17:15] Unknown:
As far as the overall ecosystem of the Python language and some of the new capabilities and features that have been added, particularly since the official deprecation of Python 2, and some of the shift in focus of the overall community to having a large contingent of data scientists and data-oriented workflows, I'm wondering how that has influenced your thinking about the ways that doit is applicable to people's usage, some of the language capabilities that you're able to incorporate into the project, and some of the ways that the overall ecosystem around packaging or builds and releases has impacted the work that you're doing on doit, the types of features that you want to build into it, or the use cases that you want to enable?
[00:18:04] Unknown:
That's interesting because I actually built doit with a focus on testing, because I wanted to run Pyflakes. And I remember my first project had Django and Twisted; they have, like, different sets of tests and different modules, and I had JavaScript tests, and I wanted something to drive them. But I think I'm the only person who really uses doit with this goal of making testing more efficient. And through the years, doit was picked up by different communities. When I created doit, there was nothing about data analysis and pipelines. I don't know how you heard about doit, but it seems that this community is picking it up. First it was people doing blog generators and documentation websites, then came the scientific community, and now data analysis and machine learning. I think it proves the tool is generic, even though I never thought about building data pipelines on it. New communities just come and discover it and apply it to their needs.
For me, it's very nice that doit can still compete with tools
[00:19:15] Unknown:
from each community that built a specific tool, and doit can still find some people who think it's worth it. Yeah, I actually first came across the doit project when I was first starting the podcast and was looking at what website I wanted to build. Initially, I used the Pelican static site generator and was also looking at Nikola, and I had the maintainer of that project on the show a while ago, which is how I first came across doit. But now I actually spend a decent amount of my focus on sort of the data engineering space. One of the main abstractions there is the idea of the directed acyclic graph, or the DAG, which I know is also what doit relies on internally. So it's a tool that definitely fits very well with that workflow. In terms of the evolution of doit, like, I think maybe you're gonna cover this, like,
[00:20:01] Unknown:
But one thing that I would like to build is a proper UI, like in a browser, to integrate with a continuous integration system or to drive pipelines, to be more like Airflow. But I just don't have the time and resources to dedicate to that. I know some people use doit as a task inside those kinds of pipelines. But I see no reason why it couldn't be something that you use both on your command line and also in a richer user interface in a distributed system.
[00:20:37] Unknown:
And for people who are using doit, I'm wondering if you can talk through the overall workflow for getting started with it, starting to integrate it into the development process, being able to define different tasks and their relations to each other, and some of the types of metadata that you're able to track to determine when or if you need to actually execute a particular stage in that execution graph?
[00:21:02] Unknown:
So for people who are not familiar at all with doit, I usually say, first, get your code working without doit. Doit is just something that you can use to cache your results or to parallelize your execution. So first, you need to have your business logic working by itself. After that, oh, maybe this calculation is expensive, and I don't need to do it all the time. And then you extract it into a task. Then you go on: whatever you think fits being executed independently, you go and extract into a task.
Or if you're starting more from the project-management side, you just create one task for each different thing you want to do. Getting started with doit itself is very easy; in a basic task definition, the only thing you need to define is actions. An action is just some Python function or a shell command. After that, you can just keep growing whatever you need. You can add file dependencies, which means whenever a file changes, you need to execute the task. A target says that a certain file must exist, and the basic check is whether a file was modified and the target exists. But you can also define custom functions to check if a task is up to date or not. You can pass values from one task to another.
You can have tasks be dynamically created, and it goes on.
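A hedged sketch of those pieces together; `file_dep`/`targets` drive the default check, and an `uptodate` callable replaces it when there are no files to compare (the callable signature follows doit's custom-uptodate examples, file names are invented):

```python
# dodo.py -- incremental execution sketch (file names invented)
import datetime

def already_ran_today(task, values):
    """custom up-to-date check: skip the task on repeat runs the same day"""
    today = str(datetime.date.today())
    task.value_savers.append(lambda: {'last_run': today})  # persisted by doit
    return values.get('last_run') == today

def task_minify():
    """rebuild app.min.js only when app.js changed"""
    return {
        'actions': ['uglifyjs app.js -o app.min.js'],
        'file_dep': ['app.js'],      # checked between runs
        'targets': ['app.min.js'],   # must exist, else the task runs
    }

def task_daily_report():
    """no files involved, so gate it with a custom callable instead"""
    return {
        'actions': ['python make_report.py'],
        'uptodate': [already_ran_today],
    }
```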
[00:22:42] Unknown:
Being able to dynamically generate the tasks is definitely particularly applicable to data-oriented workflows, where maybe you want to ingest some piece of information and then act based on some operation that you do on it. So maybe you take a CSV file and split it on the number of rows, and then for each row you want to do some other downstream task, being able to build sort of fan-out, fan-in topologies. I'm curious what are some of the other ways that you're seeing that kind of dynamic task graph building used, maybe in web or testing or build-and-release use cases?
[00:23:16] Unknown:
There are many different use cases. Sometimes, even before the build, you know what the tasks should be, but you have a very large number of tasks, and you don't want to build the whole DAG every time you invoke doit on the command line. So you can delay that. To give an example, maybe you get a file from the Internet; after you get the file, doit will check whatever other dependencies or calculations need to be done. I also use it for testing; for example, I first analyze your source code to define which order I should be executing things in.
Even though doit is usually not used for C/C++ compilation, once, just for fun, I built C/C++ compilation with it, and I was looking at the SCons implementation. SCons has a construct for you to build your own rules, but if you try to use their API, you will not be able to reproduce their C compiler, because under the hood they do some stuff that is not exposed in their API.
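For the delayed-DAG case he gives (fetch a file, then decide what else to do), doit provides delayed task creation; a sketch using the `create_after` decorator and a generator-style task, with the URL and file names made up:

```python
# dodo.py -- delayed task creation sketch (URL and paths invented)
import glob
from doit import create_after

def task_download():
    """fetch and unpack a dataset into data/"""
    return {
        'actions': ['curl -sL https://example.com/data.tgz | tar xz -C data'],
        'targets': ['data'],
    }

@create_after(executed='download')   # build these tasks only after 'download'
def task_clean_csv():
    """one subtask per downloaded file; the DAG grows at run time"""
    for path in glob.glob('data/*.csv'):
        yield {
            'name': path,                           # subtask name
            'actions': [f'python clean.py {path}'],
            'file_dep': [path],
        }
```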
[00:24:30] Unknown:
In terms of being able to build reusable task flows, I'm wondering what you and the community of users have been able to design or build for being able to say, this particular set of tasks I want to run in every Django project that I use, but I don't want to have to copy and paste this code everywhere, so I want it to be a pip-installable package, but I also want to have some bespoke tasks that are specific to this other project. What are some of the ways that people are managing those reusable and those custom elements in different projects?
[00:25:06] Unknown:
Usually, the main build part is just Python functions, so it doesn't matter. The task itself is just a dictionary that you can pass around, and a function that can create as many tasks as it wants is just a generator yielding many, many tasks. So for example, the Nikola project, again: it has a plugin architecture, and the plugins have their own database. A plugin can register a task that says, oh, this task will modify the home page, so we make sure that this task is executed before the home page one. Everything is just plain dictionaries, so it's easy to integrate even code that you're not aware of.
And the very basic case is trivial. It's just a function that returns a dictionary, just plain Python. You put it in a module, import it, and you execute it.
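One hedged sketch of that sharing pattern: common task factories live in a pip-installable module (the `company_tasks` name here is hypothetical), and each repository's dodo file mixes them with its bespoke tasks:

```python
# company_tasks.py -- shared across repos (hypothetical package)
def lint_task(paths):
    """factory returning a plain doit task dictionary"""
    return {'actions': ['flake8 ' + ' '.join(paths)]}

# dodo.py -- in one particular repository
from company_tasks import lint_task   # hypothetical import

def task_lint():
    return lint_task(['src/', 'tests/'])   # shared, parameterized

def task_sync_fixtures():
    """bespoke to this project only"""
    return {'actions': ['python scripts/sync_fixtures.py']}
```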
[00:26:02] Unknown:
In terms of the sort of discovery of tasks that are present in a code base: I know that for, like, Rake, you can use rake -T, and it'll say these are all the tasks that are available. I'm wondering if you have something analogous in doit, so that you can come into a new code base and see what are all the tasks that are available for me to run, maybe with some help text about what each one will do, what inputs it might need, or its relation to other tasks within the overall task graph. So that as a developer, I can say, okay, this is what I can do, but I now want to do something that is an additional stage after all these other steps, so I know where to hook in. Just some of the kind of developer conveniences
[00:26:44] Unknown:
that are available for people who are coming into an existing code base and want to extend it? Doit has, like, several commands. The default command is run, but it also has a list command that will list your tasks, and the help command works for each of the tasks. So you can write documentation for each of your tasks, about its parameters and how it's used. And you can get info on a task to show more information about its metadata and why it is or isn't up to date. Doit can also generate tab completion for your tasks; it works for bash and zsh.
And there are plugins as well. There is a plugin that generates a dot file with a graph. And just in this release, at the request of a user, you can now attach some metadata to your task that doit doesn't use itself. Because doit is also extensible through adding new commands to manipulate your tasks, something different than running them. This guy was building something where he wants to classify different tasks by some tags he's using. So it's also very easy to extend doit in this way, building new commands that get the task metadata.
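A sketch of the introspection hooks mentioned here; the docstring feeds `doit list`, and the free-form metadata field (named `meta` in recent releases, if I recall correctly, so treat that as an assumption) is ignored by doit itself but available to custom commands:

```python
# dodo.py -- discoverability sketch
def task_deploy():
    """push the current build to the staging server"""   # shown by `doit list`
    return {
        'actions': ['python deploy.py'],
        # ignored by doit, readable by plugins/custom commands (assumed field)
        'meta': {'tags': ['release', 'slow']},
    }

# doit list           -> task names with their doc strings
# doit help deploy    -> per-task parameters and documentation
# doit info deploy    -> metadata and why the task is (not) up to date
```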
[00:28:11] Unknown:
As far as the sort of debugging of tasks, where maybe you accidentally introduce a cycle into the task graph, or you're iterating on adding some new functionality to a task in the middle of the graph and it ends up breaking some of the downstream flows, what are some of the useful debugging techniques or design patterns that you've seen people use for having some level of resiliency, or some of the testing approaches for ensuring that tasks remain operational as people evolve the code base?
[00:28:49] Unknown:
So since doit is plain Python code, you can just put a debugger breakpoint there. You can even run doit in PDB mode, where whenever there's an exception, you drop into PDB. One user of doit mentioned that the graph plugin was useful for him: after he defined a new task, he noticed there was a problem in his definition just by looking at the image. But this is a hard problem; sometimes it's just pure debugging.
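Because actions are plain Python, the debugging story is literally the standard one; a minimal sketch:

```python
# dodo.py -- dropping into pdb from inside an action
def task_flaky():
    def action():
        rows = [1, 2, 3]   # stand-in for the real computation
        breakpoint()       # ordinary pdb breakpoint; nothing doit-specific
        return {'count': len(rows)}
    return {'actions': [action]}
```

And for the exception case, running doit in the PDB mode mentioned above drops you into the debugger post-mortem instead of just printing the traceback.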
[00:29:17] Unknown:
In terms of your own work, I'm wondering if you can talk to some of the ways that your usage of doit has evolved, or some of the most useful workflows that you have built out for your own applications of doit within the different projects that you have focused on over the years?
[00:29:37] Unknown:
In the beginning, when I created doit, my main focus was really testing. At some point, I actually created my own continuous integration system. Nowadays, I'm not a heavy user of doit myself. I have it in all my projects, but just for simple stuff; I know people do much more interesting things than me. One project that I like, done by myself, is a pytest plugin called pytest-incremental. The basic idea is that it analyzes your source code and all your imports, and it executes only the tests affected by changes since the last successful run.
Another thing that I think is very important in pytest is how it orders the tests it executes. Pytest by default uses alphabetical order, and I think that's not the best way. So it also tries to create a DAG from your source code and execute from the leaves, because if something is broken in the base of your code, you will usually start getting hundreds of errors in a big test suite, but you want the first failure to be the most significant one. If you're testing some code way down the tree, that test failure will probably be meaningless to you.
Not only that, you're also wasting your time testing that code. So this is something that I think almost no one does, but I think it's very important when you have a large test base to execute your tests in the right order. Doit will do the analysis of all your imports and say in which order the tests should be executed, so your first failure is fast and significant.
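The ordering idea stands on its own; here is a self-contained sketch of the technique (build an import graph, then run the base modules' tests first) using only the standard library. This is an illustration of the approach, not pytest-incremental's actual implementation:

```python
# import_order.py -- illustrative, not pytest-incremental's code
import ast
from pathlib import Path
from graphlib import TopologicalSorter  # stdlib since Python 3.9

def local_imports(path, project_modules):
    """names of project modules imported by the file at `path`"""
    found = set()
    for node in ast.walk(ast.parse(Path(path).read_text())):
        if isinstance(node, ast.Import):
            found |= {alias.name.split('.')[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split('.')[0])
    return found & project_modules

def test_order(src_dir='src'):
    files = {p.stem: p for p in Path(src_dir).glob('*.py')}
    graph = {name: local_imports(p, set(files)) for name, p in files.items()}
    # dependencies come out before dependents, so the base of the code
    # is tested first and the first failure is the most significant one
    return list(TopologicalSorter(graph).static_order())

if __name__ == '__main__':
    print(test_order())
```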
[00:31:37] Unknown:
As far as being able to enumerate all of the import sequences and the dependency graph there, I'm wondering if there are any particular libraries or tools that you found useful for that.
[00:31:48] Unknown:
At first, I was using, I forget the name, I think it was snakefood. But in the end, now the pytest plugin uses a library that I created myself. I think the name is import-deps; I might be wrong. When I migrated to Python 3, I had to write my own library.
[00:32:06] Unknown:
And then on the caching layer, because doit allows you to understand, okay, this task has already been executed, nothing has changed here, we can just skip it, I'm wondering what the options are for being able to plug into that layer. Particularly for the use case that you were discussing of maybe adding a UI on top of doit so that it can execute in somewhat of a distributed fashion, so that you can say, locally I'm just going to use the local file system for the caching, but if I'm running in a distributed context, maybe I want to use Redis or a database so that I can parallelize across multiple machines for executing these tasks.
[00:32:45] Unknown:
Doit by default uses dbm. Dbm is included in the standard library; it's fast, but it's local only. There is also the option to use a plain JSON file. It has some advantages, but usually disadvantages, because it's very slow at the end of the process to close the file. And doit also comes with support for SQLite; SQLite allows parallel execution from your command line, but it also has some disadvantages. There is an interface for this backend, and there is a plugin written for Redis. I myself never used it, but basically any key-value store can handle it. This database contains not only the timestamps and the hashes of the files, but also intermediate values. So for example, if you have one task computing whatever and you want to use this value in a different step of the process, you don't need to create intermediate files; you can just put it in the doit database.
This happens automatically when you return the value from your task.
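A hedged sketch of the computed-values mechanism described here: a Python action returns a dict, doit stores it in the backend database, and another task pulls it out through `getargs`; the backend is selectable in `DOIT_CONFIG` (backend names per my reading of the docs):

```python
# dodo.py -- passing values between tasks via the doit database
DOIT_CONFIG = {'backend': 'sqlite3'}   # 'dbm' is the default; 'json' also ships

def task_version():
    def compute():
        return {'version': '1.4.2'}    # returned dict is saved in the backend
    return {'actions': [compute]}

def task_tag():
    def tag(version):
        print(f'tagging release {version}')
    return {
        'actions': [tag],
        # fetch the 'version' key from the 'version' task's stored result;
        # this also makes that task an implicit dependency
        'getargs': {'version': ('version', 'version')},
    }
```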
[00:33:59] Unknown:
In your work on building doit and maintaining it and working with the community, what are some of the most interesting or innovative or unexpected ways that you've seen it used, or some of the most interesting projects that you've seen built on top of it?
[00:34:11] Unknown:
This is hard. People usually don't come back to tell you how they are using doit. Sometimes I see quite a few scientific articles mention that they used doit. Some big names that wrote back to me: I heard about a company called Atomwise; they do, like, cheminformatics, and it looks very cool. BMW uses it for designing the panels of their cars with their contractors and everything. And there is an open source project called MetalK8s; they build the ISO for a Kubernetes distribution, and it's based on doit.
Those are the ones I know of.
[00:34:53] Unknown:
And in your work on creating the doit project, maintaining it, and using it in your own work, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:35:03] Unknown:
Definitely documentation was really challenging. First, with the tool itself, many people have a problem getting, like, okay, it's called doit, but what does it do? And then you need to introduce its capabilities. And although the basic usage is very simple, it has grown to have a lot of features. So we had to write a tutorial where people can get started fast, while still hiding some complexity and showing the benefits. A lot of the documentation came from people asking the same thing many times, and me saying, oops, the documentation is lacking on this. And just marketing is very hard. I don't know how to do it, how to advertise it.
[00:35:47] Unknown:
For people who are considering doit, or who are looking for some tool to handle some of their task automation or build automation, or just running arbitrary steps in or out of sequence, what are the cases where doit is the wrong choice, and maybe they're better suited with a more custom-built framework?
[00:36:05] Unknown:
I would say, if what you want to do is very specific and there is already a build tool to handle that, use it. Like, if you're compiling C/C++ code, just go with SCons or CMake or whatever. Also, doit is a command line tool. I know some people run very long pipelines on it, but if you have some very long pipelines, something distributed or with a UI, I would probably go with Airflow or Luigi or others in that area.
[00:36:39] Unknown:
As you continue to maintain the doit project and use it for your own work, what are some of the things you have planned for the near to medium term, or any areas where you're looking for help or contribution?
[00:36:51] Unknown:
Myself, I want to revisit the idea of using doit in a continuous integration system. I'm trying to do a GitHub Actions integration, so you can do only what's necessary on GitHub Actions. That would be useful for teams. If you have a build that takes, like, three hours, it's not, oh, it doesn't matter because it's on my machine; it matters because it's a very long time even for CI. So if you get this doit support out of the box on GitHub Actions, I think it would be a big win.
[00:37:29] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the Matrix series of movies. I started revisiting them recently because there's a new one coming out, and it was rather hilarious watching the first one again now that it's a little over 20 years since it came out. Looking at some of the technology that's in there and some of their representations of the future, now that we're living in parts of it, is definitely humorous to see. So I definitely recommend revisiting that series or watching it for the first time; good fun. With that, I'll pass it to you, Eduardo. Do you have any picks this week? I've recently been watching John Pilger documentaries.
[00:38:15] Unknown:
They are all very good, very insightful, and surprising.
[00:38:20] Unknown:
Well, thank you very much for taking the time today to join me and share the work that you're doing on doit. It's definitely a very interesting tool, and one that is very useful and valuable for developers and people working in various aspects of software. So I appreciate all the time and energy that you've put into it, and I hope you enjoy the rest of your day. Thank you very much. Thank you for the opportunity to talk.
[00:38:45] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it; email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Eduardo Schettino Begins
Eduardo's Background and Introduction to Python
The Origin and Motivation Behind DoIT
Main Goals and Use Cases of DoIT
Design Elements and Architecture of DoIT
Comparison with Other Task Automation Tools
Community-Driven Development and Key Features
Impact of Python Ecosystem Evolution
Getting Started with DoIT
Dynamic Task Graph Building
Reusable Task Flows
Discovering and Extending Tasks
Debugging and Testing Tasks
Eduardo's Personal Use and Interesting Projects
Interesting Uses and Projects Built on DoIT
Challenges and Lessons Learned
When Not to Use DoIT
Future Plans and Contributions
Closing Remarks and Picks