Summary
Developers hate wasting effort on manual processes when we can write code to do it instead. Cog is a tool to manage the work of automating the creation of text inside another file by executing arbitrary Python code. In this episode Ned Batchelder shares the story of why he created Cog in the first place, some of the interesting ways that he uses it in his daily work, and the unique challenges of maintaining a project with a small audience and a well defined scope.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Ned Batchelder about Cog, a tool for generating files or text from embedded Python logic.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Cog is and the story behind it?
- What are the use cases that you initially created Cog to address?
- What were the shortcomings or extraneous overhead that you encountered in tools such as Jinja, Mako, Genshi, etc. that led you to create a new tool?
- What was your path from a quick and dirty script that suited your own purposes to turning it into a niche open source project that was general and stable enough for the broader community?
- One of your claims to fame is your role as the maintainer for coverage.py. How has your experience managing such a widely used project translated to the relatively small and low traffic project like Cog?
- Can you describe how Cog is implemented?
- How did you approach the design of the syntactic elements for embedding Python code into a host file?
- What is the workflow for someone using Cog to generate all or parts of a file?
- How does the introduction of third party dependencies impact the viability and utility of Cog as compared to other templating systems?
- What are the most interesting, innovative, or unexpected ways that you have seen Cog used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cog?
- When is Cog the wrong choice?
- What do you have planned for the future of Cog?
Keep In Touch
Picks
- Tobias
- Ned
- McFly Command Line History Tool
- Go for a walk
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Cog
- Boston Python
- Lotus
- Lotus Notes
- Zope
- Cheetah Template Engine
- Coverage.py
- Unix Philosophy
- Hungarian Notation
- Jupyter Notebooks
- GitHub Profile ReadMe
- Ned’s GitHub Profile
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host, as usual, is Tobias Macey. And today, I'm interviewing Ned Batchelder about Cog, a tool for generating files or text from embedded Python logic. So, Ned, I'm happy to have you back on the show. But for anybody who isn't familiar with you and your long history with Python, if you can just give a bit of an introduction.
[00:01:15] Unknown:
Sure. Long history, is that a way of saying that I'm old? No. Of course not. Because I am. But you're well seasoned. Well seasoned. That's right. Yeah. I have a long history with Python. I've been using it since 2000 or 1999 maybe. I should go and dig that up. And I've built a few Python packages. I've given talks at PyCon a number of times. I've been blogging about Python and other things since 2002. I'm an organizer of the Boston Python meetup, so I'm well embedded in the entire Python world.
[00:01:48] Unknown:
And do you remember how you first got introduced to it? You mentioned that you've been using it since its relatively early days, but how did you first come across it?
[00:01:56] Unknown:
Yeah. Well, so I was working at Lotus on Lotus Notes. And Lotus Notes was a groupware thing, and it had authentication systems and authorization systems. And someone said, hey, you know, there's this other thing that also does all that; it's called Zope. And I went and looked at Zope, and Zope was kind of amazing and complicated and was interesting, but I didn't need it. But the language it was implemented in seemed really interesting, and that was Python. And that's how I found Python. And pretty soon I, you know, turned to Python for anything that I needed to build by myself where Python was appropriate, which is most of the kinds of things I was doing at the time.
So I've been using it since those days.
[00:02:38] Unknown:
So one of the tools that you ended up creating in your history with Python is Cog. And I'm wondering if you can describe a bit about what it is and the story behind it and the problem that you were trying to solve when you created it.
[00:02:53] Unknown:
So Cog stands for code generation, which is what I needed it for at the time. We were working in C++ and had XML data, and we had a data schema, and we needed to be able to say in the data schema that there are these 15 pieces of data, and then we needed C++ header files that corresponded to those 15 things and XML schema that corresponded to those 15 things and probably SQL table descriptions that corresponded to those 15 things. And it felt like, sure, I could just hand edit all those files. And then when I add a 16th item, I have to go back to all those files and add the 16th item. But wouldn't it be great if we could generate that stuff from one central data source? Because it's all very rote. You know, if there's an integer column, then you know how to say it in C++ and then blah blah blah. And anyone who's dealt with these things knows this dynamic where you just feel like you're saying the same thing over and over again in a bunch of different places, and there's no trickiness to it. It's just very manual labor.
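As a rough illustration of generating several outputs from one central data source, here is a minimal Python sketch. The schema, the type mappings, and the function names are all hypothetical and not taken from the original project; the point is only that each target syntax is derived from the same list.

```python
# Hypothetical central schema: (column name, abstract type) pairs.
SCHEMA = [
    ("id", "int"),
    ("name", "string"),
    ("price", "float"),
]

# Assumed mappings from abstract types to each target syntax.
CPP_TYPES = {"int": "int", "string": "std::string", "float": "double"}
SQL_TYPES = {"int": "INTEGER", "string": "TEXT", "float": "REAL"}


def cpp_members(schema):
    """Return one C++ member declaration per schema entry."""
    return [f"{CPP_TYPES[kind]} m_{name};" for name, kind in schema]


def sql_columns(schema):
    """Return one SQL column definition per schema entry."""
    return [f"{name} {SQL_TYPES[kind]}" for name, kind in schema]
```

Adding a 16th item then means touching only `SCHEMA`; every generated file picks it up the next time the generator runs.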
So what I wanted to do was find a tool that would let me generate those files. And I actually started by using an existing templating system; it was called Cheetah at the time. I don't know if Cheetah is still around. I don't hear about it much. So I thought, I know, I'll use a templating engine. Cheetah was kind of like Jinja is today. I'll use a templating engine to generate these files, and I hacked around with that a bit, and it mostly turned into a painful exercise in getting the quoting right and somehow convincing this tool that thought it was gonna be generating HTML to generate C++ header files.
And it quickly became apparent that, yeah, maybe you could make it work, but this is crazy. It's just not the right thing for this. And then I thought, well, what I really want is I just wanna make a C++ header file, a .h file. And then for these 10 lines in the middle, I want Python: for every column in the data schema, generate this member declaration for this C++ class or whatever. And that seems not too hard to do. Like, it couldn't be that difficult to cobble that together, so that's where Cog came from. And it worked really well, and it's been kind of puttering along quietly in the background of a few different places ever since then in kind of a surprising way. So it's just a cog in the machine now? It's just a cog in the machine. Yeah. Unfortunately, cog as a PyPI package name was never available, so you have to install cogapp to get it. And then the word meant code generation, which was really not a great description of what it does, you know, in one sense. And then cog also means things in, like, Discord bots and other places and various science fiction movies. So there's a huge terminology conflict with the word cog. Yes.
[00:05:42] Unknown:
Yes. One of the hard things in computer science.
[00:05:45] Unknown:
Exactly. Exactly. Naming things. Exactly.
[00:05:49] Unknown:
And so you mentioned some of the use cases that you initially created Cog to address and that it has now been used in a few other contexts. And I'm wondering if you can just talk to some of the motivation for keeping it going as a project where now you mentioned that Cheetah was the initial templating engine that you tried to solve the problem that you were faced with, but now we also have things like Jinja and Mako and Genshi and many other templating systems. And I'm curious what the kind of trade-offs are between what all of those different template systems are capable of and the use cases that they target versus how Cog is used and some of the sort of limitations of those templating engines where Cog is a good fit and maybe some of the places where Cog would not be a good fit and you should just go ahead and use Jinja, etcetera?
[00:06:40] Unknown:
It's a really interesting question, in part because the original use case for Cog was at this company that no longer exists, and I had stopped working for them, I don't know, 15, 20 years ago. I forget now how long ago. That original problem evaporated. In the meantime, the main thing I've used Cog for is that when I give a presentation, I author my presentation in an HTML file that uses a JavaScript-based presentation engine for the slides. And in my presentations, I often want to have a little bit of generated content, even if it's only this: I have a code example, and rather than typing the code into a .html file, I wanna put the code in a .py file so that I can actually run it and see that it really works, and then include it into the HTML file.
My presentations, and like I mentioned at the top, I've got I don't know how many, maybe 10 different presentations I've done at PyCon. Those all are authored in HTML files and use Cog to include files or generate tables or diagrams, whatever it is I want in there. And the irony there is that Cheetah and Jinja were designed to template HTML files. And here I am using Cog, which is much stupider, to generate my HTML files. And I think that gets at an interesting discriminant between where those things are good and where Cog is good. One of the things I like about Cog is that it is very stupid. Jinja, and Cheetah before it, and Mako and all those other templating engines know that they're dealing with HTML for the most part. Right? They've got, like, escaping logic and syntax that fits into HTML tags really well, so they understand a lot about the HTML syntax that they're gonna be generating.
And if HTML is what you're generating, you know, if you need to do the thing that Jinja was designed to do, it's really good at it. Right? It's got special punctuation meaning slurp up the white space here, so that my HTML comes out right, and things like that. Cog, on the other hand, knows none of that. All Cog knows about is how to find the code that's in the middle of the static file, how to ignore the comment characters that protected that code from being the content in the static file, how to run the Python that it finds, and how to insert that content back into the file where it found the code. And that, I find, makes it easier for me to reason about what's going to happen when I put code into files, even if they're HTML files.
And that simplicity appeals to me, that I don't have to think too hard about what the cleverness of the tool might do to me and whether it's the right cleverness. I can just use this small amount of cleverness that Cog gives me to do the right thing, and that makes it broadly applicable. Any kind of static file could be cogged and has been, because it just cares about lines, basically. And it doesn't know what comment syntax you're using because it doesn't care, and so on and so forth. It's been designed to be kind of stupid, and that's partly what gives it its power.
[00:09:46] Unknown:
From looking at some of the examples on your blog and in the code and the ways that Jinja is used, you know, as you were saying, some of the challenges of dealing with Jinja and some of these other templating languages is figuring out how many layers of escaping I have to figure out before it gets to the destination. You know, one of the things that comes to mind is Ansible and SaltStack, where you're trying to do these very complex things and build these systems, and you have to, you know, pull a value from one place, load it through Jinja, write it out to another destination. It's like, okay, how many backslashes and quotes do I need before it gets to just, you know, a single quoted line?
[00:10:21] Unknown:
Yes. Exactly. And you mentioned Ansible and, of course, being a DevOps tool, as all DevOps tools must be now by federal law, it uses YAML as the input syntax. And YAML itself is so complicated. I don't understand the rules of YAML syntax. In my editor, I literally have a shortcut that means parse the current YAML file and show me the Python data structure that's going to result, so that I can quickly figure out, do I need to indent before I put in those dashes for that list that's nested under that thing? Or what is that gonna do? Because YAML is way too complicated. And then for Ansible, you layer in the curly braces that it's gonna use to do its thing, and it just gets too complicated.
I understand what people were doing. They were thinking, if I make this ad hoc and complicated in just the right way, then it'll be exactly what people want. But it never quite works out that way, and then you've gotta deal with all these weird rules. Cog tries to not do that. And for example, one of the challenges in building something like Cog is that, let's say you're writing a C++ header file. It's a .h file. Right? And you're gonna put Python code in the middle of it. Well, in order for it to still be a valid C++ file, you'd have to make sure the Python code is in a comment. Now in C++, that's pretty easy because you can put a /* on the line before the Python code and a */ below it. But let's say it was a SQL file. Right? The only way to put a comment in a SQL file is to precede the line with a --. Well, that means that all of your Python lines have to have a -- preceding them. So then how is it valid Python?
And what I decided to do with Cog is I said, look, there's this line that marks the beginning of the code. It's a special Cog line. And what I would do is I would take any characters up to the Cog marker on that line, and I don't care what characters those are. I'm gonna remove that same sequence of characters from all of the lines in the Cog code. Right? And that's a very simple rule and doesn't require any escaping and makes it possible for Cog to work in any file syntax. Because as long as you can comment out a line with some sequence of characters, then you can hide your Python code in your static file.
And that just keeps it simple, and that makes it easily applicable to other syntaxes and domains.
[00:12:40] Unknown:
In terms of your journey from "I have this problem" to now Cog being a package that you have maintained and I'm sure evolved over several years, I'm curious what the kind of initial implementation, quick and dirty proof of concept looked like and your path from that to, okay, this works mostly how I think it's supposed to, so now I'm actually going to clean it up and make sure that I can actually use it for a longer period of time. And just that journey from, you know, I have an idea to, okay, I'm going to actually keep this and maintain it and use it for other things going forward.
[00:13:12] Unknown:
That's an interesting question. I don't remember how dirty the first implementation was because the idea itself, you know, in its simplest form, is going to require dealing with, for instance, those different syntax characters. Right? Like I mentioned, my original use case was a C++ .h file and an XML file and a SQL file. And so right off the bat, I had to come up with an idea of how to deal with the different syntax. Right? I couldn't just say this tool is for embedding Python in C++ and hard code the syntax or whatever. Right? So I had to come up with some way of dealing with that. There have been features added to Cog over time as people asked for things.
So it's gotten more support for other needs in that sense, but I don't know that it was too grungy to begin with. It must have been somewhat grungy. I didn't have, you know, documentation or anything. But once I saw that it was really nicely solving the problem I had, I think it was in pretty good shape. It's been interesting to see how people have adopted it. One of the things that I still don't quite understand is that the Cog documentation is the only documentation I've ever written that someone else has offered to translate into another language. If you go and look on my site, you can read the Cog documentation in Russian. And that's just because some guy said, hey, I'd like to translate this for you.
Meanwhile, you know, there's coverage.py, which is way more broadly used. No one's ever offered to translate that documentation, maybe because it's longer. I don't know. So Cog has been interesting in that. It seems to have sort of a cult kind of following. Like, not many people like it, but the people who like it really like it, which is very appealing in its own way.
[00:14:58] Unknown:
And, you know, as you mentioned, you're also the maintainer of coverage.py and had another episode where we talked about your work there. And I'm wondering what were some of the kind of lessons that you've learned working in each of these projects as a maintainer that have been applicable, where coverage.py is very widely used, has a, you know, large install base. And Cog, as you mentioned, is, you know, much smaller in terms of scope and usage, and I'm sure it has different requirements as far as the actual maintenance burden and the amount of communication overhead that's required, and just some of the lessons that have been shared across those and some of the ways that the maintenance of those projects is widely divergent.
[00:15:43] Unknown:
One thing that I've tried to do in both is to not get too far afield. Right? Each tool was designed to do a thing, and I've resisted growing them until they do more than that thing. I think that the original UNIX philosophy, make tools that do one thing and do it well, is a really good philosophy. And it's certainly been easier with Cog because it's just smaller to begin with. There's a dozen or so different options on the command line for Cog even so, with different tweaks to how you might want to generate content or options for where the output should go or whether you wanna check that everything is still right or whatever.
So even while doing just one thing, Cog is a little bit complex. With coverage, it's been a little harder to hold the line. People often want to add things to coverage, and I actually have a draft blog post I've been working on about, like, well, how do I think about what should go into coverage.py? Why isn't it a third party tool? Why is it a feature? Why isn't it a plug-in? You know, etcetera, etcetera. They're similar in the sense that I like things to be broadly applicable, and one way to do that is to make them not know too much about what they're gonna be doing so that you can apply them in different places. It's harder to see in coverage.py, I guess. But for instance, and we don't need to get too far into coverage.py, typically coverage.py is used while you're running your test suite, but it really doesn't know anything about tests, and that's kind of surprising to people, that it's not a test runner.
Likewise, you know, Cog only does its thing of running Python code and outputting stuff into your static files. I haven't gotten too many requests that are really bizarre beyond that for Cog, but it's good to keep in mind what it's supposed to be doing so that if you do get a strange request, you don't go and just add some weird bell and whistle on the side. A big difference between the two is that, because coverage.py is broadly used and Python is a very broad ecosystem, I get bug reports there about all sorts of things I don't understand. You know, I've got a bug report that says it doesn't work for TensorFlow. And my feeling is, well, I can spell TensorFlow, but that's about all I know about it. And that looks super intricate, and I don't know how I'm gonna figure out what's going on there.
So, you know, it's kind of unfortunate, but there it sits. Again, Cog is just manipulating text. You know, no one's ever said to me, here's the thing Cog is doing, and I look at it and I think I have no idea what's going on. I can see what's going on. It's nice and simple. So it's a nice break from the complexities of code coverage to deal with something like Cog. Of course, at the same time, I hardly ever touch Cog. It doesn't change much from year to year.
[00:18:32] Unknown:
I could definitely see it as being one of those projects where it actually does have a point where it's basically just complete. Like, it's done. There's nothing else to do there. I mean, obviously, there was the Python 2 to 3 transition, and I'm interested in digging into that. But, you know, as you said, its scope is fairly narrow. So at a certain point, it's like, it's done. It does the thing. Moving on.
[00:18:52] Unknown:
It does the thing. Well, it's interesting because that is the way I felt. But ironically, I just recently added a feature to Cog because it seemed really dumb that it didn't have it, and I'm not sure why it took so long. And that is that when you write the Python code that you put into your file, the code that's gonna generate your content, the way you put content into your static file from Python is you use a method called cog.outl, which is very Java-like. I think one of the unfortunate things about Cog is that it really shows my early Python style, which wasn't very Pythonic in a lot of ways.
If you look at the code for Cog, it actually uses Hungarian notation on Python variables, like a little s prefix for strings and a b prefix for booleans and things like that. Like I said, the way you got output into the file was you have the cog module available to you and there's a function in there called outl, for "output a line", which is a lot like Java's println. And I was actually applying Cog in a new place just like last month or the month before, and I thought, why do I have to say outl? Why can't I use print? So the feature I added to Cog was you can give it an option on the command line that means standard out goes into the file, which is kind of obvious.
And in fact, I'm not sure what I thought standard out would ever do anyway other than go into the file. So, yeah, Cog is kind of done, but it's actually getting better anyway. And maybe the thing to do is to just make a Cog 4 that breaks everything and says only print goes anywhere and get rid of that other thing and just be done with it. But then all my PyCon talks would break, and I don't wanna have to port my PyCon talks from Cog 3 to Cog 4. So the dilemma there, that's just my dilemma, but that's a dilemma.
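One common way to let plain `print` feed generated output is to redirect standard output while the embedded code runs. The sketch below shows that general technique with the standard library; it is an assumption about how such a feature could work, not a description of Cog's implementation, and `run_and_capture` is a hypothetical name.

```python
import contextlib
import io


def run_and_capture(code):
    """Execute `code` and return everything it printed to standard output."""
    buffer = io.StringIO()
    # While the block is active, print() writes into `buffer` instead of
    # the real standard output.
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()
```

The captured string can then be written into the host file in the same place an explicit output call would have put it.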
[00:20:37] Unknown:
Yeah. As far as the Python 2 to 3 transition and just the overall evolution of the language and its capabilities, I'm curious how that has played out in terms of the work that you've done on Cog and the ways that it's being used.
[00:20:53] Unknown:
It's funny you mentioned that, and I don't remember it being a big deal. And maybe that's because, well, I was adopting Python 3 pretty early. I first worked on Python 3, I think, in 2009 as part of the coverage.py work. I always tried there to be a very early adopter of releases. Again, I mentioned PyCon talks, and one of my PyCon talks from 2012 was about Unicode, and there I wanted to show both Python 2 and Python 3 examples. And that's probably the time that I made Cog work on Python 3 because I really needed it for that talk. It's been on Python 3 for a very long time. It does deal with text, so I'm sure I had to do something kind of intricate for bytes and text and those sorts of things, but that's a long time ago now. I'd have to dig back through GitHub to see what happened there.
[00:21:41] Unknown:
Another interesting conjunction is how you have applied coverage.py to Cog, either the package or the code that you've written in Cog and embedded, to make sure that you are executing everything that's, you know, embedded in the host file or that you're writing tests for the code that lives in that host file.
[00:21:57] Unknown:
The projects go both ways. So Cog, of course, has a good test suite, and I use coverage on Cog to make sure that it's a well tested project. And Cog isn't that many lines, so that wasn't that tricky. But it goes the other way too in some kind of surprising ways. So for instance, when you look at the coverage.py docs, I link to a sample HTML report of the kind of report you can get from your project, and that report is a report about Cog. But to make it look interesting, it's only a subset of the Cog test suite because I didn't wanna give you an HTML report that was 100%, in which case, you know, none of your lines are colored red, there's nothing to see. So the release process for coverage.py includes running part of the Cog test suite to get to, like, 34% coverage or something, and then making an HTML report out of that and publishing that as the sample HTML report.
The more surprising thing for me about how those two projects intersect is that today, I use Cog on the coverage docs because I had some code that I wanted to include into the coverage documentation, in particular, the SQL schema for the data file that coverage writes. And that SQL schema is in the Python code that actually creates the data file, and I wanted a copy of it in the coverage docs. And what I originally did was I literally copied the SQL code into the .rst, the reStructuredText file that makes the docs. And then I wrote a little Python program that just checked that the two copies were the same. And that was years ago. I don't know. Five years ago, let's say. I don't know what it was. A few years ago. And it's been like that for a while. And then again, just this fall, someone tweeted at me that they used Cog to make sure that the code in their docs was the right code, because Cog literally goes and grabs the lines from the code and interpolates them into the docs.
And I thought, duh, why didn't I do that? That's exactly the right way to solve this problem. Instead of writing some weird little dozen-line ad hoc Python thing to check that a copy is correct, just cog the docs and pull the right text into the docs dynamically, and then you don't have a copy that you have to check. So here I was being schooled. I forget who it was. I should have looked this up before we started this. I forget who it was that schooled me on how best to use my project on my project. But yeah. So the two projects definitely feed into each other.
Again, it's a good sign that they're both useful. I've written lots of little things that are either on my blog or on PyPI that, you know, during a feverish weekend seemed like the best idea ever, and then I just don't use them anymore. No one cared. It didn't matter. Cog isn't like that. Cog solved a problem for me once and continues to solve problems for me and other people. So that's really, really kind of fun.
[00:24:58] Unknown:
I'm wondering if you can dig a bit more into the actual internals of Cog and some of the ways that you have written it to be maintainable and extensible and just some of the evolution of the project and, in particular, the design of the syntactic elements that you used for being able to embed that Python code into arbitrary host files, which you've mentioned a little bit, but I'm curious to explore that further.
[00:25:15] Unknown:
Yeah. So there's a bunch of sort of white space handling and indent and dedent that has to happen to make sure that your Python code is gonna be runnable Python code. Like I said earlier, there's a line before your Python code that says this is the beginning of some embedded Python code, and there's a line at the end that says here's the end of the code. And then there's a third line, which is where the interpolated output ends, so that if you run it again, we can get rid of the interpolated output, run the Python again, and put the output back into the file. And because we strip off the initial characters from those three marker lines and then those same initial characters come off of all the Python lines, it's very nice because it means you can indent your Python to where it belongs naturally in the static file. There's not much tricky going on inside the internals of Cog to make that happen, but there is a lot of attention to that and unit testing of those sorts of textual utilities.
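The three-marker scheme can be sketched as a small regeneration loop. The marker strings `[[[cog`, `]]]`, and `[[[end]]]` are the ones Cog documents; everything else here (the `regenerate` name, capturing `print` output, the lack of error handling for malformed input) is a simplifying assumption rather than Cog's real code.

```python
import contextlib
import io

BEGIN, END_CODE, END_OUTPUT = "[[[cog", "]]]", "[[[end]]]"


def regenerate(text):
    """Re-run each embedded chunk and replace its previously generated output."""
    lines = text.splitlines()
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        result.append(line)
        i += 1
        if BEGIN not in line:
            continue
        # Characters before the marker are the comment prefix to strip.
        prefix = line[: line.index(BEGIN)]
        code = []
        while END_CODE not in lines[i]:
            code.append(lines[i][len(prefix):])
            result.append(lines[i])  # the code itself stays in the file
            i += 1
        result.append(lines[i])  # keep the end-of-code marker line
        i += 1
        # Discard stale output from the previous run.
        while END_OUTPUT not in lines[i]:
            i += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec("\n".join(code), {})
        result.extend(buffer.getvalue().splitlines())
        result.append(lines[i])  # keep the end-of-output marker line
        i += 1
    return "\n".join(result) + "\n"
```

Because the old output is dropped before the new output is inserted, running the function on its own result leaves the text unchanged, which is what makes it safe to re-run on a file over and over.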
The surprising things that are inside Cog are, well, it has to generate a module synthetically so that it can run the code. It generates a cog module synthetically and makes it available so that you can say cog dot whatever you need to, you know, outl or whatever you need to do. There was some trickiness to, if you had an error in your Python code, how I could give you a traceback that gave you the line number of where the Python code started in your static file, plus how many lines down into your Python code the error is. Right? If you have a 10-line Python chunk inside a 1,000-line SQL file, you don't wanna get an error message that says line 8 because you don't know where it is, and you don't wanna get an error message that says line 990 because you don't know where in the Python it is. So you need sort of two line numbers. So there's a little trickiness to kind of synthetically create those markers for tracebacks.
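To make the "two line numbers" idea concrete, here is one possible way to report both the line within the chunk and the line within the host file. It relies only on the standard `compile`, `exec`, and `traceback` facilities; the function name, the synthetic filename format, and the return convention are assumptions for illustration, not how Cog itself does it.

```python
import traceback


def run_chunk(code, host_name, first_line):
    """Run `code`, whose first line sits at line `first_line` of `host_name`.

    Returns None on success, or a (line_in_chunk, line_in_host) pair when
    the chunk raises an exception at run time.
    """
    chunk_name = f"<chunk in {host_name}>"
    # Compiling under a synthetic filename lets us recognize the chunk's
    # own frames in the traceback.
    compiled = compile(code, chunk_name, "exec")
    try:
        exec(compiled, {})
    except Exception as exc:
        frames = [
            frame
            for frame in traceback.extract_tb(exc.__traceback__)
            if frame.filename == chunk_name
        ]
        line_in_chunk = frames[-1].lineno
        return line_in_chunk, first_line + line_in_chunk - 1
    return None
```

With a chunk that starts at line 983 of the host file and fails on its 8th line, this reports both 8 and 990, so the reader can locate the problem either way.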
I mean, ironically, you know, coverage gets me into lots of internal details of Python execution and code objects and stack traces and things like that. But even something like Cog, because it's fooling around sort of with Python execution, gets you into those sorts of internals. The thing that I keep overlooking as I build tools that I think of as command line tools is making them also be libraries that are useful. There's actually an open issue in the Cog tracker asking me to make Cog usable as a library, which it is not. Right? Right now, it's just a command line tool, and there's no way to, you know, programmatically call Cog to operate on a bunch of files, for instance, or something like that. There's another project I built a year or two ago where I made that same mistake. Oh, it's a command line tool. I'll just make a command line tool. When really what I should have said was, let me make a library that does these things, and then I'll put a command line wrapper around it.
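The "library first, command line wrapper second" shape might look like the following sketch. The transformation is a deliberately trivial placeholder and both function names are hypothetical; only the layering is the point.

```python
import argparse


def process_text(text):
    """Library entry point (hypothetical): return the transformed text.

    The transformation is a stand-in: it strips trailing whitespace
    from every line.
    """
    return "".join(line.rstrip() + "\n" for line in text.splitlines())


def main(argv=None):
    """Thin command line wrapper that only parses arguments and does I/O."""
    parser = argparse.ArgumentParser(description="Process text files.")
    parser.add_argument("paths", nargs="+", help="files to process")
    args = parser.parse_args(argv)
    for path in args.paths:
        with open(path, encoding="utf-8") as handle:
            print(process_text(handle.read()), end="")
    return 0
```

Other programs can import `process_text` directly, tests can exercise it without touching the file system, and `main` can be wired to a console entry point without containing any logic of its own.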
I have to try to internalize that lesson better, because it really is a good idea to make a library first. It's more testable to begin with, but it also broadens the ways that people can use what you've written, which is always a good thing.
[00:28:19] Unknown:
Absolutely. Yeah. There's another project that I was taking a look at a while ago, and it's a very useful tool. And I wanted to take advantage of it, but not strictly in the out of the box way that they support it. I wanted to actually embed it in another project to manage its execution, but, you know, there wasn't a core library that exposed the necessary interfaces that I wanted. I would have had to actually go in and write that wrapper library. Like, they did actually have things extracted into a core module, but that wasn't designed for sort of external consumption. It was digging too deep into the guts of the project. And so I started down the path of saying, okay. Well, I'll write a library to wrap around that and then switch all the calls from those, you know, external pieces to this, but I never actually ended up finding the time to get it done.
[00:29:05] Unknown:
Right. Right. And it's frustrating if you want to programmatically call something that's a command line tool. Well, yeah, you can use subprocess to shell out to it, but now you've really sort of turned things inside out. If it were a library to begin with, it'd be much better. And, you know, if I were to go and look today at cog inside, it, you know, probably does kind of have a library in there. Right? I've made some classes and things. I just never thought of them as an externally visible interface to everything. It was all meant to be internal.
[00:29:35] Unknown:
That's one of the interesting things about the Go language and ecosystem is that you kind of don't have a choice except to make things be exposed as interfaces and libraries, just the way that the language is designed. Although the way that it's exposed is a little odd when you're coming from other languages where basically any method or object that's capitalized in a file is automatically exported as public.
[00:30:06] Unknown:
Other languages' conventions always seem really weird. Yes. And we can't throw stones because, you know, I see people all the time coming to Python and thinking, wow. These conventions are really weird. Absolutely.
[00:30:19] Unknown:
Yeah. Yeah. Another one of those cases where learning other programming languages is a valuable way to understand more about what you're doing in your language of choice. Mhmm. And then as far as the actual workflow of using cog for generating parts of or an entire file, I'm wondering if you can talk to some of the kind of thought process that goes into designing the code that you're embedding into the host language and maybe some of the challenges that you've run into or seen people face in trying to either eliminate or avoid side effects of that execution beyond just outputting the desired text into the host file?
[00:31:03] Unknown:
Yeah. So it's interesting because in some ways, Cog shares some of the same challenges as Jupyter Notebooks. I won't claim that I invented Jupyter Notebooks 15 years ago in an unrecognizable form, but they both have the same idea of, you know, here's this thing and we'd like to put Python code into the thing and then run some of that Python code in small chunks, but actually the chunks share a global context because you want to sort of flow from thing to thing. One of the challenges with Jupyter Notebooks is that they do share a context, but the cells might not execute in the order you read them in because you can go to the top of the file and execute a cell again.
And so it can be sometimes tricky to track what cell is affecting what other cell. Cog doesn't suffer from that problem because it's not interactive, so they're always gonna run from the top to the bottom. They do have that same dynamic that Jupyter notebooks have, which is that all the cogs are operating in the same global context. So you do have to be careful that if you have a number of different chunks of Python in your file that they have the interactions you meant for them to have. It's very handy for them to share a context because, for instance, I'll have a chunk of Python code at the top of the file, which generates no output at all. It's just there to define some functions for, you know, the rest of the file to use, and that will only work if they all share the same global context.
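The shared global context, together with the synthetic `cog` module mentioned earlier in the conversation, can be sketched like this. It is an illustration only: `run_chunks` is a hypothetical helper, and the stand-in module provides just an `outl` function that collects lines.

```python
import sys
import types


def run_chunks(chunks):
    """Run chunks top to bottom in one namespace; return each chunk's output."""
    outputs = []
    current = []

    # Stand-in for the synthetic module; only `outl` is provided here.
    fake_cog = types.ModuleType("cog")
    fake_cog.outl = lambda text="": current.append(text + "\n")

    shared_globals = {"__name__": "__cog_sketch__"}
    saved = sys.modules.get("cog")
    sys.modules["cog"] = fake_cog  # makes `import cog` work inside chunks
    try:
        for code in chunks:
            current.clear()
            exec(code, shared_globals)  # same dict every time: shared context
            outputs.append("".join(current))
    finally:
        if saved is None:
            del sys.modules["cog"]
        else:
            sys.modules["cog"] = saved
    return outputs


chunks = [
    # First chunk produces no output; it only defines a helper.
    "import cog\ndef shout(word):\n    cog.outl(word.upper() + '!')\n",
    # Later chunk relies on the helper defined above.
    "shout('hello')\n",
]
outputs = run_chunks(chunks)
```

The second chunk only works because both chunks are executed against the same `shared_globals` dictionary, which is also why one chunk can accidentally disturb a later one.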
So I think the main tricky question for both Jupyter Notebooks and for cog is when is my code big enough, complex enough, useful enough that it shouldn't be in here anymore at all. It should be in a .py file someplace. And like all of the step functions, all of the steps up in modularity, it's hard to know when you've crossed that line. Sometimes it's easy to know when you crossed the line a long time ago. It's not always easy to know that you're crossing it right now and should make a change. So, you know, you see this in all sorts of places, right, when you have a function that returns one thing and then it returns 2 things and then it returns 3 things and then you want to turn it into a dict and then I should have made it a class. Well, I should have made it a class a week ago is what I should have done. But so you've got that same problem in cog where I've seen this happen, for instance, like I mentioned, my PyCon talks, I'll do something in one talk where I have some code that does something useful. And then for my second talk, I wanna do the same thing. Do I move the code to a third place and change the first talk to use that code? Do I copy the code and just drop it into the talk because I'm probably gonna need to do something different anyway?
You know, it's hard to know what's the right thing. Maybe if the point is to get a talk done, then maybe I should stop refactoring support code and I should just get on with it. That's the issue when you start generating a lot of content with cog: when should this code be someplace else? And if you do put it someplace else, you know, you can import into your file because it's just Python code. It's not a problem to do that. You just have to recognize that it's time to do that.
[00:34:00] Unknown:
And as far as being able to manage those imports, that also brings in the question of managing the Python environment where Cog is executing, and maybe you need to incorporate some, you know, third-party packages off of PyPI to be able to handle some calculation or computation or, you know, text manipulation before you output. And I'm curious how you have approached that overall system of being able to ensure that the context in which the Cog file is being executed and interpreted matches the desired set of dependencies in order for it to be able to actually run to completion.
[00:34:37] Unknown:
Right. I think the challenge there is that the point of cog is that you're dealing with files that are conceptually just static files. Right? You've made a markdown file. You're typing your docs into it. It just sits there, right, until it gets to, you know, GitHub or something, and it gets rendered. Once you put some Python code into it, right, which is great because now you can generate part of that markdown or something. I should tell you about the GitHub profile in a sec. Once you put some Python code into that, well, it's not just a static inert thing anymore. Right? That markdown file is now part of some process.
You know, probably there's a build process for your docs or it's part of the packaging process. And so whatever you do to manage that process, you know, are you using a virtualenv? Is it happening in tox? Do you have a requirements.txt file that manages your dependencies? All of those things now come into play. And hopefully, that's something that you're familiar with from the rest of your project, but now you have to do that. It might be that you don't have a project that has that, like, again, my PyCon talks. Right? If I didn't have computed content in those talks, they would just be a .html file. There's no requirements files. There's no build process. There's no Makefile. There's nothing. It's not until I put the Python code in there with cog that it becomes an active process that needs all of that management.
Now I can bring, you know, the tools I use to manage those processes on Python projects into something like a Python talk. You know, that's something you're gonna have to acknowledge is happening. And again, that's one of those times where you need to recognize you've made that step. You might get to the point where you think, it's not a big deal, I don't need to think about these requirements. And then, you know, a month from now, you realize, oh, a month ago, I should have actually recorded what those requirements were and put them in a .txt file, etcetera, etcetera. So that's another of those cases where it's growing up and you need to deal with it. That does have to happen. Luckily, it's just Python code in the file. Right? There's no weird syntax that's gonna do something different. However you're running the cog application, it's running in a Python environment, and you can import anything that's in that Python environment. So you can manage your dependencies in all the usual ways.
[00:36:49] Unknown:
Yeah. I think the place that becomes really interesting is that, you know, for a regular Python project, you have your standard set of ways of managing those dependencies and setting up the environment. But because cog is intended to be used in arbitrary contexts that don't necessarily have anything to do with Python, where, for instance, the C++ files that you, you know, started the project with, why are you gonna set up a Python environment for a C++ project? How are you going to manage the dependency chains for those 2 different systems? You know? How do you decide where Cog actually gets executed in the overall flow from I have files on disk to I have files in a source repository to here's the actual intended output after cog has been executed. You know, do I just embed it as a step in the CI pipeline and put the Python environment there? But then what if I need to test it on, you know, some other machine that doesn't have that environment?
[00:37:44] Unknown:
Right. And there's definitely the potential here to overengineer something that could have been a lot simpler. Like, going all the way back to the beginning of cog, it could have been that what I should have done was just accept the fact that I was gonna edit 3 files a lot of times in a very mechanical way and not built a whole tool for it. I don't even remember, you know, looking back whether that was a good use of my time to build a tool to do that instead. I'm really glad I did it now because it's turned out to be really useful in a bunch of ways, but for the problem that it was actually designed to solve, it might have been over engineering. And for sure, it was introducing Python into a place that didn't have Python before, which, you know, could have been some friction.
So you gotta be careful. Speaking of over engineering, I mentioned the GitHub profile. So you know that if you go on GitHub and you make a repo that's the same name as your GitHub username, then its readme will be displayed as your profile readme for people who visit your GitHub page. And for most people, that's just a markdown file and it gets rendered as markdown. On my profile, that markdown file is cogged and overengineered a lot. But that's okay because it's just mine, and the GitHub profile is just a fun thing anyway. So what's the big deal? But for instance, I used Cog in a few ways for that profile. One, to abstract away how you make a nice badge in markdown with alt text, which was syntax I can never remember. So I wrote a function to output the markdown so I could just put Python function calls for what I wanted to say.
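A helper of the kind described could look like the sketch below. The function name `badge` and the example URLs are placeholders, not the code from the actual profile repository; only the Markdown syntax for an image with alt text wrapped in a link is standard.

```python
def badge(alt_text, image_url, link_url):
    """Return one line of Markdown: a badge image with alt text, wrapped in a link."""
    return f"[![{alt_text}]({image_url})]({link_url})"


# Placeholder URLs for illustration only.
line = badge(
    "Blog",
    "https://example.com/badges/blog.svg",
    "https://example.com/blog",
)
```

Inside an embedded chunk, each badge then becomes a single function call whose result is written to the output, so the bracket-and-parenthesis syntax only has to be remembered once.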
But the other cool thing is that it automatically shows, you know, the latest 4 or so blog posts from my blog. Because every time it's cogged, it uses the requests library to go and grab some data from my website and insert the blog post titles into the markdown. And the repo has a GitHub action, which runs periodically, I think, every 6 hours or something. So just in case I have written a blog post in the last 6 hours, it can recog the markdown file and update my profile so that my profile always has the latest blog post displayed there. I won't say that this is an important thing to have accomplished, but it does show cog maybe not at its best, but at its most.
And it works great to solve the few problems I had there making that profile page.
[00:40:08] Unknown:
I like the vocabulary word you put in there of recog because, you know, is that the opposite of a precog? And
[00:40:14] Unknown:
See, there you go. That's another terminology conflict. That's right. Precog. Actually, it's interesting. One of the other ways, the sort of cultish ways that Cog has grown is that there have been implementations of the same idea for other languages, and I haven't gone and looked in a long time at these other implementations. They're linked from the bottom of my page on my website about cog. There are four of them. The PHP implementation is literally called precog. That's what reminded me of this thing. But its link is broken, so maybe it doesn't exist anymore.
[00:40:49] Unknown:
You've outlived it. I've yes. I've outlived it. And so in terms of the different applications of Cog, you mentioned a few that you've had. I'm curious what are some of the other interesting or innovative or unexpected ways that you've seen it applied?
[00:41:04] Unknown:
That's one of the things that I wish I knew more about is how people are using Cog, and maybe that's something that I can research on my own because I do get feature requests. So they must be using it for something, and I don't always know what they're using it for. And, again, someone told me that just like my first exposure to Python was Zope, I think someone told me that their first exposure to Python was Cog, which is fascinating. I should go and find that person. There's all sorts of threads I have to chase down as a result of this conversation, I think.
[00:41:29] Unknown:
Yeah. And one of the other applications of cog that I'm thinking of for potential ways that I might use it is that, you know, I work in operations, so there's a lot of automation to be had. And one of the challenges that comes up in dealing with some of these complex systems is trying to figure out a way to have one canonical reference to a piece of information that can be applied in multiple different locations. So being able to define that in a Python data structure that you can then reference in cog to be able to output it to some of these other locations is one potential application of it that might be interesting to dig into. So you might have another user.
[00:42:13] Unknown:
Okay. Let me know. Yeah. That's where it got its start, really, that idea of avoiding copies by having one source of truth for things. And that's how I'm using it for the coverage.py documentation, for instance.
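The "one source of truth" pattern discussed here can be sketched as a single Python data structure rendered into more than one text format. Everything in the sketch is hypothetical (the `SERVICE_PORTS` data, the helper names, and the two output formats); it is meant only to show the shape of the idea.

```python
# Hypothetical single source of truth: service names mapped to ports.
SERVICE_PORTS = {"web": 8080, "metrics": 9100}


def env_lines(ports):
    """Render the data as KEY=value lines, e.g. for an environment file."""
    return [f"{name.upper()}_PORT={port}" for name, port in sorted(ports.items())]


def markdown_rows(ports):
    """Render the same data as Markdown table rows, e.g. for documentation."""
    return [f"| {name} | {port} |" for name, port in sorted(ports.items())]


# An embedded chunk in each host file would import these helpers and write
# each returned line to the output, for example with cog.outl(line).
```

When a port changes, only `SERVICE_PORTS` is edited, and rerunning the generation step refreshes every file that embeds a chunk using it.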
[00:42:26] Unknown:
And so in your work of building Cog and maintaining it and using it across your various projects over the years, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:42:38] Unknown:
I don't know about challenging. I think supporting Cog has been happy from beginning to end. People don't bug me if I don't get to their issues, and they seem to like it. I guess, ironically, I think one of the challenges now is resisting rewriting the whole thing from scratch, because like I said, it just looks really old when you look at the code, but it works. You know? So what if it uses the deprecated getopt library in the Python standard library to read its options? That still works. As far as I can tell, in the tech world, deprecated just means you should feel bad if you're using this. It doesn't seem to have any bad effects other than that. So
[00:43:19] Unknown:
Eventually, we might actually remember to delete this thing, but we're never gonna know for sure if we actually can. So That's right. We're just gonna say you shouldn't use it and move on. That's right. Exactly.
[00:43:30] Unknown:
So I don't know. Cog has not been a challenge. I've had plenty of challenges in maintaining coverage.py and other projects, but Cog has really been a nice little corner of the open source world for me. Cog is your happy place. Cog is my happy place.
[00:43:47] Unknown:
And so for people who are interested in exploring cog and wanna use it for managing arbitrary output in their various files, what are the cases where cog is the wrong choice? And maybe they'd be better suited using some language native implementation if they're not using Python or using something like Jinja or Mako, etcetera?
[00:44:08] Unknown:
Right. So, definitely, if you're making an HTML file that needs data interpolated into it, it's probably better to use Jinja or another HTML templating engine. Right? Cog is purely line by line. So for instance, you can't use Cog to put just one word into, you know, an attribute of an HTML tag. It just literally cannot do that. It's all line oriented. So there's lots of kinds of interpolation or generation that cog would be wrong for. If you start using cog and you find yourself generating lots of HTML with the Python code itself, that probably is an indication that you wanted a more HTML native templating engine.
Another case where you shouldn't be using cog is if you're already in a file that has support for what you need. So for instance, reStructuredText files, .rst files. There's an include directive that Sphinx will understand. So if you're generating documentation and you need to include content from another file, you can probably do it just with Sphinx. In fact, I could probably do that SQL thing that I need just with Sphinx. It's actually quite a sophisticated include directive. You can say include this file, and you can say include this file starting at this line number or starting when you see this text in the file, etcetera, etcetera. So Sphinx already has support for a use case that I just described I'm using cog for. I should go back and look to see if I can even actually get rid of cog from the coverage.py documentation and just use Sphinx. Those are 2 examples where cog is not the right tool. It could work, but you might have a better option in front of you already.
[00:45:43] Unknown:
As you continue to use and maintain the Cog project, what are some of the things you have planned for the near to medium term?
[00:45:50] Unknown:
So I don't have any plans for Cog. Cog has been kind of mostly demand driven. 50% from what I wanted it to do when I was using it and 50% from suggestions other people had that seemed very reasonable to me. Making a library out of it might be the closest thing I've got to a plan, but it's not urgent. You know, one of the challenges of being an open source maintainer is you get into it because you have an itch you wanna scratch or you just like building things that people wanna use. And the dynamic can shift to, I feel guilty because I am not attending to, you know, the bug reports, the feature requests, the pull requests, you know, whatever it is, the next bit of work that should happen on that project.
And that's a corrosive emotion to feel about your project. So I try not to. For coverage.py, it's a little harder because it's so broadly used. But for Cog, especially, my feeling is if it's not fun, why do it? So I'm not gonna jump to do something with cog just because someone asked me to. I mean, just saying that, it sounds kind of harsh and blunt, but there it is. To paraphrase Brett Cannon, every commit is a gift. So Yes. Sometimes a gift that eats. Yes. So yes. And like we said, Cog, for the most part, is done. No one's out there screaming for something that's really missing in Cog.
So it's fine the way it is.
[00:47:18] Unknown:
Are there any other aspects of the work that you've done on cog or some of your other work in open source maintenance or just your experience in the Python ecosystem that we didn't cover yet that you'd like to discuss before we close out the show? Oh, boy. You threw Python ecosystem in there, so you've opened a can of worms. Don't get me started. You'll have to come back for that one. Okay.
[00:47:40] Unknown:
Singletons are bad. Don't use them.
[00:47:44] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose the new microphone that I got that I'm recording this with currently. It's the Samson Q9U. I had been using a kind of convoluted setup that worked well for a number of years, but finally got to be enough of a nuisance that I went and splurged and got a new microphone with a boom arm to go with it. So definitely recommend that for folks who are looking for a decent setup that want to have something easy to use. And so with that, I'll pass it to you, Ned. What do you have for picks this week?
[00:48:29] Unknown:
So I have 2 picks. The first is a command line history tool called McFly. I guess, like Marty McFly, the character from Back to the Future. McFly's claim to fame is first that it gives you distinct command line history for each directory, so that when you type control r to get to your history, it's only gonna show you commands that you ran in that directory before. But also it uses, or claims to use at least (I don't know, I haven't looked at the code), a neural network to decide which commands are probably the ones you want, and I have no idea what the input is to that. It's not perfect, but it's under active development, and they've been responsive in GitHub issues. So I'm looking forward to it getting even better. I use it all the time, and it's great. And the second pick is walking. I highly recommend going out and taking a walk. During the lockdown, I've been walking for exercise, and I've actually gotten a few blog posts about it. And I have enjoyed not only the exercise and the podcast listening time, but getting to know much more about places that are very near to me that I have never been to before. So I encourage you to go do it.
[00:49:27] Unknown:
Well, thank you again for taking the time today, Ned. It's always a pleasure to get to talk to you. So definitely appreciate all the time and effort that you've put into Cog and your other open source endeavors and your support of the Python community and ecosystem. So thank you again for all of that, and I hope you enjoy the rest of your day. Thanks. Thanks for having me, Tobias. Thank you for listening. Don't forget to check out our other show, the Data Engineering podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Overview
Ned Batchelder's Python Journey
Introduction to Cog
Cog vs Other Templating Engines
Evolution and Maintenance of Cog
Cog's Internals and Design
Workflow and Challenges with Cog
Managing Dependencies and Overengineering
Interesting Applications of Cog
When Not to Use Cog
Future Plans and Final Thoughts