Automat State Machines with Glyph Lefkowitz

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.

When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show.

You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music. Tell your friends and coworkers and share it on social media. Your host as usual is Tobias Macey. And today, I'm interviewing Glyph about Automat, a library that provides self-service finite state machines for the programmer on the go. So, Glyph, could you please introduce yourself?

Sure. I'm Glyph.

I've been doing Python for quite a long time. I am usually interviewed in my capacity on the Twisted project, so it's refreshing to be here today to talk about another one of my projects, because I do have a few. I've been an open source maintainer and a professional programmer for closing in on two decades now, and you can see lots of my public speaking at PyCon and other conferences around the Internet. So I'm what passes for a public intellectual, I guess, in our industry.

Yeah. It's funny. This is actually the second time that you've been on the show, and in neither case have you been talking about Twisted.

Yeah. No, you've got good taste, I guess.

I figure it's just a subject that's probably been pretty well covered elsewhere, so we can discuss other things. So for anybody who missed it, you were on fairly early in this series. Don't remember the exact episode number, but I'll put it in the show notes, but you were talking about the idea of ethics and software development. So I'll refer people back to that episode if they wanna learn more about that topic. But in the meantime, how did you first get introduced to Python?

Well, hopefully this story changes every time, so it's interesting for folks who might have heard it before, as I'm hoping they're all listening rapt to each episode of your show. I was introduced to Python because I was working on a video game in Java. As games tend to be, it was very script heavy, with lots of things that needed to have dynamic behavior. And for a variety of reasons, I wanted to try and rewrite it in something new. Because most of the logic in Java was hash table lookups of dynamic classes, after learning about Python's dynamic method lookups and semantics and runtime metaprogramming, like, 90% of the code just dissolved. It did nothing, because it's trivial to write an app in Python that needs to have dynamic behavior and get things wired together into hash tables of arbitrary methods, because that's kinda what the runtime is already. So that was a great introduction to Python in about, I wanna say, 2000, with Python 1.5.2, still the best version of Python ever, I'm sure. And so I've been doing it ever since.

As I mentioned in the introduction, we were gonna talk about your Automat library. To begin with, I'm wondering if you can just do a quick high-level overview of what a state machine is and some of the cases where you might want to use one.

So that's an interesting question, because all computers are state machines, and so you kinda might wanna use a state machine anytime you wanna use a computer. But, of course, that's way too broad of an answer, especially when it comes to dealing with a state machine library. So the real question is, what's a finite state automaton, or what are certain formalisms around state machines that are interesting? The interesting thing about modeling state as a finite state automaton, or a deterministic finite state machine, is that they're good for modeling objects that change over time and that have different behavior depending on what has happened to them previously.

It's very easy to accidentally implement something yourself that is effectively a deterministic finite state automaton by setting a bunch of attributes and just checking them before certain methods do things. But, of course, that's very difficult to assert anything interesting about, or to visualize, or to understand. And so, really, the idea behind a state machine library is that you might want to use a formal state machine to get more insight into, and to more rigorously analyze, the code that you're already writing that already has this behavior that changes over time. These deterministic finite state automata are objects that have a state, transitions between different states, inputs that can cause them to potentially change states, and outputs that are things that happen as a result of them changing states.

The usual place that state machines are applied is in parsers, because there are an explosively huge number of states in something that could, for example, parse C or Python. They might have millions of internal states, and so there are parser generators like yacc and bison and flex that do a bunch of the machinery associated with manipulating state machines so that you don't have to know that there's a state machine under the covers. The kind of state machine that Automat is for is actually kind of a higher-level application. It's for something where you've got an object, you know that it's changing over time, you want to know that the state machine is there, and you kinda want to know what state it's in. So it's almost the opposite use case of what the usual kind of academic literature about state machines is about.
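To make the four ingredients named above concrete (states, transitions, inputs, outputs), here is a minimal hand-rolled sketch. It is not Automat's API; the door example, the `TRANSITIONS` table, and the `run` helper are all assumptions made up for illustration.

```python
# Hand-rolled illustration only; none of these names come from Automat.
# Transition table: (current state, input) -> (next state, outputs emitted).
TRANSITIONS = {
    ("closed", "open"): ("opened", ["creak"]),
    ("opened", "close"): ("closed", ["click"]),
    ("closed", "close"): ("closed", []),   # already closed: nothing happens
    ("opened", "open"): ("opened", []),    # already open: nothing happens
}


def run(inputs, state="closed"):
    """Feed a sequence of inputs through the table, collecting outputs."""
    outputs = []
    for symbol in inputs:
        state, emitted = TRANSITIONS[(state, symbol)]
        outputs.extend(emitted)
    return state, outputs


final_state, emitted = run(["open", "open", "close"])
print(final_state, emitted)
```

Because the whole behavior lives in one table, it can be inspected or enumerated without running the object, which is the kind of insight the attribute-checking approach does not give you.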

And there are a number of other libraries on the Python Package Index that show up if you do a search for state machines. So what is it about the offerings that were available, or the lack of offerings, that made you feel the need to build a new option, and what are some of the ways that it differs from what was already available?

Well, no disrespect to any of the authors of any of those libraries, but none of them were terribly mature. None of them struck me as particularly, like, the thing to compete with. If I was gonna write a web framework these days, I would have to have a big section on my website about how it compares to Django, how it compares to Pyramid, and how it compares to Rails, because there are a couple of big names that people are gonna know about that do this thing in a particular way. When I looked around on PyPI before I started this project, I did a more thorough analysis of the field of competitors than I usually do before starting an open source project, because it's kind of an academic, computer-sciency thing, and I'm not particularly an academic computer scientist. So I wanted to see if somebody else had already done this. What I found was that they were all sort of just enough to solve the problem that their author had had. So the initial problem was that none of them were really advanced enough to be worth contributing to or extending, as opposed to doing my own thing. And the big difference is that not only were they not super featureful, but they also all kind of fell into the same trap, which is that they provided convenience functionality.

One of the ways that Automat is really different is that, while I certainly hope it's convenient to use, it doesn't provide anything that goes outside the scope of the specific type of state machine that it is.

I believe the correct terminology for what it implements is that it is a pure Mealy machine, as opposed to other types of state machines like Moore machines. The reason it was important and interesting for it not to provide convenience functionality is that much of the convenience functionality in these other libraries is things like a hook every time you enter a state, or a hook every time you exit a state, or interaction with other pieces of logic associated with the state machine that would help you debug it or diagnose issues with it, which were about stepping through the code around the state machine and about allowing you access to its internal states. You could say, like, what's the current state, and what are the transitions away from this state?

The problem with that is that the goal of having a library to reflect this type of state machine, in my mind, is to provide consistency of behavior. The caller of that object should just say, like, I wanna send a message. If you have a message sender that can be in 17 different states, the caller shouldn't have to care that it's in the connecting state or the connected state or the timing-out state. It should always do the same thing as far as the caller is concerned when you say send a message. The problem with exposing the current state of the state machine, for example, is that it seems helpful. You might wanna know that while you're debugging. You might wanna instrument it a little bit. You might have logic that is a little different depending on what state it's in. But that pushes the behavior out to the caller. The benefit of a state machine is having this consistent interface, so that outside callers know they can just send it the same message and it'll do the appropriate thing for its internal state. Instead, now there's always this opportunity for the state machine to say, well, you really can only call this in these two states, so always check before you call. And all of the utility that you get from knowing that it's a state machine and having all of these pieces of information about it, like the transition table, goes right out the window, because now the actual behavior is being decided by this combination of the innards of the state machine itself and whatever the caller is doing to check before it provides the inputs.

So the idea behind Automat was to make everything very, very opinionated, I guess, is the popular term for this these days, and to force all of the logic that should be in an output of the state machine to be in an output, and to force all of the inputs not to advertise anything about the internal state of the state machine unless they explicitly want to.
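The "send a message" example above can be sketched as a small Mealy-style machine in which outputs hang off transitions and the caller never asks what state the object is in. This is a hand-rolled illustration, not Automat's API: `MessageSender`, its two states, and the `wire` list standing in for a network connection are all hypothetical.

```python
class MessageSender:
    """Hypothetical sender; callers only ever call send()/connection_made()."""

    def __init__(self):
        self._state = "disconnected"  # private: callers never inspect this
        self._buffer = []
        self.wire = []                # stands in for a real connection

    # Outputs: what happens as a result of a transition.
    def _buffer_message(self, message):
        self._buffer.append(message)

    def _transmit(self, message):
        self.wire.append(message)

    def _flush(self):
        self.wire.extend(self._buffer)
        self._buffer.clear()

    # (state, input) -> (next state, name of the output method to run)
    _TABLE = {
        ("disconnected", "send"): ("disconnected", "_buffer_message"),
        ("connected", "send"): ("connected", "_transmit"),
        ("disconnected", "connected"): ("connected", "_flush"),
    }

    def _input(self, name, *args):
        self._state, output = self._TABLE[(self._state, name)]
        getattr(self, output)(*args)

    # Inputs: ordinary-looking public methods.
    def send(self, message):
        self._input("send", message)

    def connection_made(self):
        self._input("connected")


sender = MessageSender()
sender.send("hello")        # buffered; the caller does not need to know
sender.connection_made()    # buffered messages are flushed
sender.send("world")        # transmitted directly
print(sender.wire)
```

The caller's code is identical before and after the connection comes up; the decision about buffering versus transmitting is made entirely by the transition table.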

Yeah. I can definitely see where being able to expose the internals of the state machine would very quickly lead to a leaky abstraction and cause people to take shortcuts and then, like you said, thereby sort of short-circuit the capabilities that you're asking for in the state machine in the first place. At that point, you might as well just go back to your nested conditionals and not even bother with trying to incorporate the state machine library.

Exactly.

One of the other questions I have, too: you mentioned that you were evaluating the other libraries that were available on PyPI. But what is it that first drove you to even consider either contributing to or creating your own library for state machines? What was the specific need that put you in that situation?

Well, as it turns out, we are here today to talk about Twisted, because in fact the thing that led to the need for a state machine library was really the hundreds of really unfortunate informal state machines that live inside Twisted. In particular, the thing that motivated me to really create this as a library was that Ashwini Oruganti was developing a new TLS library at one of the PyCon sprints, I believe two or three years ago, and she was looking for a way to represent the TLS state machine and looking around for examples in Twisted that were good examples.

Unfortunately, there are very few, and the few that there are draw that exact same criticism that I have of state machine libraries that others have written, which is that the existing abstractions in Twisted that attempt to model things as state machines expose lots of internal implementation details. They have a state variable. It's very easy to tell what state they're in. It's very easy to add little hooks to different elements of the state transition machinery. So I wanted to be able to say, here's the way that Twisted does this well. And there are lots of things in Twisted that really kind of need state machines and don't have them yet.

One in particular, the first that I believe is already in a Twisted release, because Twisted does depend upon Automat now, is ClientService. In Twisted, you've got this notion that an outbound connection needs to stay up, because it's a pipe that you're streaming data through to some other service, and it's a pipe that needs to be maintained consistently. If the connection goes down, it needs to come back up. Our original implementation of this, ReconnectingClientFactory, is kind of a legacy API for a whole bunch of different reasons. It doesn't work super well with SSL. It uses older interfaces in the internals of the reactor. There's a lot of tight coupling. So we knew that we needed a new thing that could do this job in a more flexible way.

One of the interesting things about this is that it turns out there's actually a bunch of states associated with reconnecting. There's the initial state, where you start off and nothing has started. Then the application can say, okay, start up and start maintaining the connection, and then it's in the connecting state. Then the connection actually gets established, and it's in the connected state. Later, the connection might be dropped, or you might shut it down, which are both different ways to reach other states like the disconnected state or the reconnecting state.

This seems very straightforward, because if you think about it from the perspective of how to implement this in terms of some logic inside the methods, you start thinking, oh, well, I'll have a flag, and the flag will say, am I connected or not? If I'm not connected, then maybe I should start connecting. And if somebody tries to send a message over this channel and it's not connected, maybe I should buffer it or something. So you start adding all these little if checks. But the problem with all those little if checks is that with one flag, you have two cases to worry about in every method. So you take all your methods and you multiply the work by two. Sometimes the work of that other branch, the false case, is actually the same, and you can look at the method and say, oh, well, this method, it doesn't matter, it behaves the same way either way. That's when you've got one Boolean. Then you get two Booleans, and now you have four cases in every method. Then you get three Booleans, and it's eight.
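The arithmetic here is simply 2 to the power of the number of flags. A short sketch that enumerates the cases makes the growth visible; the flag names are hypothetical and chosen only for illustration.

```python
from itertools import product


def flag_combinations(flag_names):
    """Enumerate every True/False assignment for the given Boolean flags."""
    return [
        dict(zip(flag_names, values))
        for values in product([False, True], repeat=len(flag_names))
    ]


# Hypothetical flags on a connection-like object.
for flags in (
    ["connected"],
    ["connected", "stopping"],
    ["connected", "stopping", "buffering"],
):
    cases = flag_combinations(flags)
    print(len(flags), "flag(s):", len(cases), "cases each method may face")
```

Every method that reads these attributes implicitly has to behave sensibly in each of those cases, whether or not an `if` for it was ever written.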

And pretty soon, you're into a combinatorial explosion that you can't manage anymore. The interesting thing about this is that not only do you have this combinatorial explosion, but because it's a state space explosion and not an actual code complexity explosion, it doesn't look any more complex: you've got these flags, and until you remember to go in and add all the if checks that you need, the code looks the same. These bugs can't all be caught by code coverage testing. There's no obvious unit test that you need to write, because all of your code's covered and you added every case you could think of. There's no tooling that can help you see the complexity that's arisen. So we rewrote ClientService to use Automat, and we actually found a bunch of bugs that we didn't even realize were there.

So there are things like that. There's also HostnameEndpoint, which is the thing you use when you say, I wanna connect to google.com on port 443. The number of state transitions involved in that is kind of staggering. There's a branch in progress right now to rewrite that particular Twisted class in terms of an explicit state machine using Automat. Although that effort hasn't landed yet as a patch in a release, it has found a couple of bugs in the way that we do hostname resolution and a couple of edge cases that we haven't handled 100% correctly. When you're making an outgoing connection, you think of it as just, I'm not connected, now I'm connected. There are only two states. What could be simpler? But in fact, you have to do DNS resolution, and DNS resolution could give you back multiple results. So you start off in the I-haven't-resolved-the-hostname state, then you're in the I-have-resolved-the-hostname state. Then you've got a bunch of host names, and you're connecting to the first one, and you're timing out on the second one, and that's its own discrete state. Then you get all the way to the end of that list, and you start transitioning into the, oh, now I'm draining the queue of outstanding connections, I don't have any more connections. You've got timed calls flying around all over the place here.

So this type of logic is very common in an event-driven context, not because it's particularly more necessary in an async program than in a synchronous one, but because it's more obvious: you tend to have more objects that represent connections and such. With threads, you might have a lot of implicit connections or subprocesses, but you don't have an object that has all the flags on it. You just have a call stack where you're in the middle of something somewhere. But if you're doing anything concurrent, you very rapidly start having all these flags and objects floating around that might be in a bunch of different states. So that's kind of the genesis of why Automat was necessary for Twisted and where the idea came from originally.

Yeah. And an additional place that I've seen state machines leveraged is for capturing business logic, particularly in something like ecommerce, where there are certain states that an order can be in and different transitions between them. So, again, it's something that at face value seems simple, but then you start getting into the special situations: okay, well, what happens if the payment gets declined after the order has been placed? Or what happens if there is a stock shortage after the order has been completed, and now it's ready for shipment except for this one thing? So there are various different states that can happen there, and there are certain business needs. But, again, as soon as you have multiple different representations of state, how do you ensure that you never get into an impossible state within your program? Because then somebody has to go in and manually try to figure out what happened and then resolve it within the data, which can get quite messy quite fast. And from what you were describing earlier, starting off with the very simple case and then adding a conditional and then just adding another conditional, after a certain point it becomes very tempting to just keep going down that chain because you've already gotten started. So what are some of the signs that people should be watching out for to signify that they should stop adding new conditional blocks and start looking into actually incorporating a state machine into their code?

Yeah. So I think that there's nothing wrong with conditionals. "If" is a venerable keyword in the structured programming pantheon. But the point where you know you probably need a state machine is when you have a class that has a bunch of attributes. You really know you've fallen onto the spikes already when you start adding your fifth or sixth Boolean flag, like is-order-declined, is-order-delayed. If you have, like, nine of those, you definitely wanna have a state machine, because what you've got there is a workflow.

And to your point, by the way, about order systems and dealing with the different states that inventory and fulfillment and things like that can be in, that is absolutely another use case related to asynchronous event-driven programming. What you've got in those types of systems, where you have orders coming in and fulfillment happening and billing, is events that are happening to your business. They might be happening as a blocking WSGI thread in your technical implementation of them, but nevertheless an event has occurred in the world that has now affected your business, and it is touching a piece of mutable state that has persisted over the life cycle of this process. So the number one sign that you need a state machine right now is that you have six flags on your object and you're adding the seventh.
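As a rough sketch of the order workflow discussed above, the pile of Boolean flags can be replaced by one state value plus a table of allowed transitions, so that a combination like "declined and shipped" cannot be represented at all. This is a hand-rolled illustration rather than Automat code; `OrderState`, `ALLOWED`, and the event names are hypothetical.

```python
from enum import Enum, auto


class OrderState(Enum):
    PLACED = auto()
    PAID = auto()
    DECLINED = auto()
    SHIPPED = auto()


# Hypothetical workflow: which business events are valid in which state.
ALLOWED = {
    (OrderState.PLACED, "payment_accepted"): OrderState.PAID,
    (OrderState.PLACED, "payment_declined"): OrderState.DECLINED,
    (OrderState.PAID, "shipped"): OrderState.SHIPPED,
}


class Order:
    def __init__(self):
        self._state = OrderState.PLACED  # one value instead of several flags

    def _transition(self, event):
        try:
            self._state = ALLOWED[(self._state, event)]
        except KeyError:
            raise ValueError(
                f"{event} is not valid while {self._state.name}"
            ) from None

    # Public interface: one method per business event.
    def payment_accepted(self):
        self._transition("payment_accepted")

    def payment_declined(self):
        self._transition("payment_declined")

    def shipped(self):
        self._transition("shipped")


order = Order()
order.payment_declined()
try:
    order.shipped()          # a declined order can never ship
except ValueError as error:
    print(error)
```

With flags, nothing stops `is_declined` and `is_shipped` from both being true; here the forbidden event is rejected at the moment it arrives instead of leaving bad data for someone to untangle later.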

But the more subtle signs are just that you have objects that are sometimes missing attributes that certain methods need. For example, if you ever see a traceback that's an AttributeError and you say, oh, yeah, I just need to check for that first, because it's fine that it doesn't have that attribute; sometimes that's not set yet at this point in its life cycle, so I'll just add another check. Even if you don't have a whole bunch of Boolean flags, a bunch of hasattr checks are another indication that there's something going on there that really needs a state machine.

And for somebody who is at the point where they decide that they're going to start incorporating a state machine into their code base, what does the process look like for adding Automat, and what are some of the steps that they should be considering in the process of adding it as a dependency to their code base?

Hopefully, the answer to that question is that it's very easy. Automat was designed specifically to be something that's easy to integrate into an existing code base and really easy to adopt incrementally. One of the things about the other state machine libraries on PyPI that I didn't mention is that they all require you to call some method like input or feed or data, some method to tell the state machine to turn a crank on itself. There's a state machine object with its own API.

With Automat, the best way you can adopt it is to find some object where you're falling into that seven-flags trap and decide what that object's public interface is. You've probably got some unit tests already. Rip out the implementation of that, replace it with an Automat-based implementation, and nothing about it should change. The callers should all just invoke the methods, which are, in Automat's terminology, the inputs, and it should sort of magically work, because the whole goal of Automat is to present an interface that looks just like a regular Python object. All of the state machine wizardry, all of the stuff that's happening with state transitions and state enforcement and whatever, is happening inside the object. The outer interface is just, here are the methods, and they have return values.

One of the things about that strategy of adopting it is that it kind of forces you to think about what your real public interface is. Quite often when you're designing with Automat, you can remove a whole ton of private flags and a whole ton of other internal details, but you really need to think about what the things are that are happening to this object over time. How does its state change?

The one gotcha that I would mention right now about adopting Automat is that there is one extremely subtle detail, which is a known bug, and we are working on fixing it. There have been some implementation prototypes of how to fix this. But you can't produce an input from an output. So if your state machine receives an input and then runs an output method on itself as a result of receiving that input, and that output wants to produce a new input, that recursion will sort of look like it works, but it actually doesn't. You need to find some way to pop out of that loop and provide the input from the outside. There's a new API coming at some point, called feedback, which will sort of automate the process of doing this.

One of the reasons for this is that, because an input can produce multiple outputs if the transition specifies it, you can't necessarily block on your own input, because Automat tries to provide certain guarantees. One of them is that if you've made your way through the state machine to a particular state, all of the outputs that get you on your way to that state will have run. The idea there is that if some of the outputs set attributes on self, you should be able to rely on those attributes being there. You shouldn't need to have your hasattr checks. You shouldn't need to have a do-I-have-my-connection-yet check. If it's an output that's being generated from a certain state where you should have your connection, or your order number, or whatever, already, you should just be able to rely on it, and you shouldn't have any conditionals at all in any of your outputs.

So in order to preserve that property, reentrant inputs are not supported, and probably won't be supported, actually. The thing that's gonna make it easier to deal with this problem is that there's gonna be better error reporting, and there's gonna be a facility for generating what's called feedback, which will push the new input into a queue. But you still won't be able to return the result from that input. You're still gonna need to treat that as an asynchronous result that happens later.
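The queueing idea described above can be sketched by hand as follows. This is not Automat's feedback API (which is described here only as planned); it is an assumption-laden illustration in which the `Downloader` class, its states, and its `_pending` queue are hypothetical. An input raised from inside an output is queued and processed only after every output of the current transition has run.

```python
from collections import deque


class Downloader:
    """Hypothetical machine whose output wants to trigger another input."""

    def __init__(self):
        self._state = "idle"
        self._pending = deque()  # inputs waiting to be processed
        self._running = False    # True while the outermost input is draining
        self.log = []

    def _dispatch(self, name):
        # Always queue; only the outermost call drains the queue, so an input
        # produced by an output runs after the current transition finishes.
        self._pending.append(name)
        if self._running:
            return
        self._running = True
        try:
            while self._pending:
                self._step(self._pending.popleft())
        finally:
            self._running = False

    def _step(self, name):
        self._state, outputs = self._TABLE[(self._state, name)]
        for output in outputs:
            getattr(self, output)()

    # Outputs
    def _begin(self):
        self.log.append("begin")
        self.finished()          # queued, not executed re-entrantly

    def _record_start(self):
        self.log.append("started")

    def _report(self):
        self.log.append("done")

    _TABLE = {
        ("idle", "start"): ("working", ["_begin", "_record_start"]),
        ("working", "finished"): ("idle", ["_report"]),
    }

    # Inputs
    def start(self):
        self._dispatch("start")

    def finished(self):
        self._dispatch("finished")


d = Downloader()
d.start()
print(d.log)
```

Note that `finished()` called from `_begin` cannot hand back a meaningful result, since its transition has not happened yet when the call returns; that matches the point above about treating the outcome as something that happens later.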

And when I was looking at the documentation, I was seeing that, as far as the actual implementation detail, a large portion of the way that you actually get started is just by setting the machine attribute on the class that you're defining. So I'm assuming that that's using some sort of metaclass hook to be able to then instantiate the internal machinery for the state machine. Is that correct?

Actually, no. There's really very little magical metaprogramming going on in the internals of Automat at all. It will initialize the machine object as a descriptor. But almost all of the magic that happens inside of Automat is in the decorators that are methods of that machine object, like input and state and output. Those set things up so that when you call an input for the first time, it will have all of the state that it needs. So for the amount of convenience in the way that you interact with the library and interface with it, there's actually very little magic going on. Once you understand how decorators work, and you understand creating a new callable from an existing callable, there's pretty much no other stuff. There are no metaclasses. There are no class hooks. There's no inheritance. It's just straightforward. The decorated method is an object in its own right. It has a documented interface, and that's it.

As far as the internals of the library itself, in terms of how those methods initialize their state and interact with the user, the magical internals are mostly designed around preventing you from accessing any of the internal state of the library. That machine object is there in the class scope, but once you've defined all of your inputs and outputs and states, it's not really there anymore. You can't really access any attributes on it. It's not an attribute of your instances as far as you can see. You just get attribute errors if you try to reach inside the machine to grab the state or whatever. So that's really the prevailing design idiom of Automat: everything's an implementation detail. Do not look inside the box.
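The "new callable from an existing callable" idea can be illustrated with an ordinary decorator and no metaclass or base class. The sketch below is a hand-rolled stand-in, not how Automat is implemented: `input_method`, the `Lamp` class, and the `_hidden_state` attribute are all hypothetical, and unlike Automat it hides the state only by convention (a leading underscore).

```python
import functools


def input_method(table):
    """Build a decorator that turns a plain method into a state-aware input.

    `table` maps a state name to (next state, name of the output method).
    """
    def decorate(method):
        @functools.wraps(method)       # keep the original name and docstring
        def wrapper(self, *args, **kwargs):
            state = getattr(self, "_hidden_state", "off")
            next_state, output = table[state]
            self._hidden_state = next_state
            return getattr(self, output)(*args, **kwargs)
        return wrapper
    return decorate


class Lamp:
    @input_method({"off": ("on", "_light"), "on": ("off", "_darken")})
    def toggle(self):
        """Flip the lamp."""

    # Outputs
    def _light(self):
        return "light"

    def _darken(self):
        return "dark"


lamp = Lamp()
print(lamp.toggle(), lamp.toggle())
```

`Lamp` inherits from nothing special and callers see a normal method with a normal docstring; all of the dispatch lives in the wrapper that the decorator returned.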

And for somebody who has different pieces or different workflows that they wanna be able to link together, is it possible to compose two different state machines that are using Automat to create sort of a more hybridized or higher-level workflow?

Absolutely. Composability of state machines was definitely a goal of the library. The way that the composability works is that state machine A is an object and state machine B is an object. State machine A exposes some methods which are inputs, and state machine B exposes some methods which are inputs. State machine A's inputs call output methods that are defined on state machine A, and those can just call methods on B. There's really nothing more to it than that. There is still the rule that A should not recursively generate a new input to A. However, that rule is the same whether B is another state machine or just a plain old Python object. This is actually one of the things that makes adopting Automat so effortless: even if every object in your entire code base will eventually turn into an Automat object with a state machine inside it, you can migrate them one at a time, and no two of them will ever know whether any of the objects they're talking to are using Automat. It's all just an internal technical detail. The only thing that sort of leaks out of that right now is the errors that you get when you activate an invalid transition. But invalid transition errors are sort of like assertion errors. If you wanna deal with them, you don't actually want to add a try/except to catch the invalid transition. You just wanna add a definition of that input from whatever state that input's not defined in right now.

And one of the things that I found myself asking while I was reading through the docs is that, because the implementation of a state machine with Automat is tied to a class, are there any cases where you would want to have some sort of state transition capabilities outside of a class, where you're just using plain Python functions, maybe in sort of a scripted environment?

I think the main thing I would have to say about that question is, don't be afraid of using little classes. Even if you're in a script, writing a little class to represent your state should be fine. That said, Automat is very heavily, I guess, object-oriented would probably be the right term, in the sense that the reason you need the class is that you're providing inputs to the state machine, and inputs are really method calls. Python already has a syntax for providing an input to a state machine, and it's dot, word, open parenthesis.

I know a lot of people in the scientific Python community, or folks who use things like Jupyter, often feel like they're not building libraries. They're not building big sophisticated systems, so they don't really use classes as much. But if you have a piece of state which changes over time and it needs to present a uniform interface, it's kinda hard to find a better tool than a class to model that. So Automat could be thought of as, like, classes plus plus. And to those folks who are doing scientific Python who don't think of themselves as writing technical assets, like big fancy libraries that need lots of classes, something to remember, I think, is that list is a class, tuple is a class. Even int is a class in Python, effectively. So you're using lots of objects already, and there's no reason that you shouldn't define your own too, as the task befits them. Hopefully, in a scripting context, you're not doing a whole lot that involves long-term persistent state. But if you are dealing with a workflow that can kinda go one of three or four different ways, an Automat state machine in a little class that's just dedicated to that might be a good way to represent it.

So in the process of building Automat, what are some of the technical hurdles that you've been faced with, and what are some of the ways that you have overcome them?

The interesting thing about Automat as a library is that there's very little in terms of technical meat. Like, the algorithms are all extremely simple. It's, like, loop over this thing, call this thing, set this variable to a new value. So most of the hurdles are sort of more social. 1 of the tricky parts of it, though, technically, is actually the visualization layer.

So 1 of the benefits that Automat can provide to you is, if you declare your whole state transition graph declaratively, as Automat has you do, there's a data structure there that you can then inspect kind of at import time instead of at run time to say what are all of the states this might ever be in, and then to visualize it. So there's actually a tool that comes with Automat, automat-visualize. It's like a shell command. You can run it over your Python code, and it'll spit out a diagram that shows you what your state machine looks like, and it'll use Graphviz to draw out all of the states and the transitions, the inputs, the outputs.

1 of the most technically challenging things about writing this library was developing a presentation that made any sense at all, that was readable at all, because there's a lot of data you have to cram into a tiny little space. And by necessity, it's changing around a lot. Like, you add a transition and the entire machine is like a different shape now, and so labels overlap, and it's hard to choose a shape for the way they organize the inputs and outputs, because if you just kind of let Graphviz lay things out however it wants, the inputs will be, like, 7 inches away from the outputs that they're paired with because there's a lot of arrows in between them. So we tweaked that a fair amount, and I think we have a reasonably good looking visualization, but that'll probably be where a lot of the kind of, like, long term maintenance of the project ends up going: just trying to make the visualizations look good and trying to add features like not just visualizing it statically, where you show what any instance of the state machine might do over its entire lifetime, but to, like, add a debugger so that you can visualize a particular instance, like what state is it in right now and what transitions did it take to get there, maybe doing a little animation.

So most of the sort of real technical difficulty was either in that part or just in the kind of prereading of all of the different Wikipedia pages about state machines, because there's, like, 90 different kinds of state machine. The differences are very subtle. Luckily, there's only a few that are really of interest in computer science, but there are tons of things that start to touch on, like, set theory and mathematics and a bunch of other things, which, like I said, I'm not much of a theoretical computer scientist myself.

So I wasn't necessarily comfortable with all that stuff, and I had to think really hard about it first because state machines are almost like cryptography, in that you kinda don't wanna have to roll your own theoretical primitives. You wanna be building on something that's a little more foundational. So I knew what general shape I wanted for this. And so that term that I used earlier, a Mealy machine, was the result of my efforts of scanning through all the different types of finite state machines and deterministic finite state automata and Mealy machines and Moore machines. And there were at least 2 or 3 others that I saw, and figuring out where in the hierarchy of different things, which all might be called a state machine, I actually wanted to implement.
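For listeners unfamiliar with the distinction: in a Mealy machine each output is determined by the current state together with the input, whereas in a Moore machine the output depends on the state alone. The sketch below is a generic, standard-library illustration of the Mealy shape, not Automat code; the turnstile table and the `run_mealy` helper are hypothetical names chosen for the example.

```python
# Mealy machine: (state, input) -> (next state, output).
MEALY_TABLE = {
    ("locked", "coin"): ("unlocked", "release latch"),
    ("locked", "push"): ("locked", "sound alarm"),
    ("unlocked", "push"): ("locked", "engage latch"),
    ("unlocked", "coin"): ("unlocked", "return coin"),
}


def run_mealy(table, state, inputs):
    """Feed each input to the machine, collecting one output per input."""
    outputs = []
    for symbol in inputs:
        state, output = table[(state, symbol)]
        outputs.append(output)
    return state, outputs


final_state, outputs = run_mealy(MEALY_TABLE, "locked", ["coin", "push", "push"])
print(final_state, outputs)
```

Note that the same input, "push", produces a different output depending on the state it arrives in, which is the property that distinguishes this from a Moore-style table keyed on state alone.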

Yeah. That was 1 of the things that kind of stalled me on putting together this particular episode: state machines as a concept is something that I had been wanting to cover for a little while. But every time I started to try and dip into that space, there were so many different implementations in the Python Package Index. And then also, as you said, trying to read up on the sort of ideas behind state machines, that's 1 big rabbit hole to dive into. So when I came across Automat and saw the fairly clean API and, you know, low overhead of implementing it, I thought it was, you know, definitely a good case study in the ideas of state machines and a practical implementation of it that people could actually walk away with and benefit from at the end of the day. So I was pretty happy when I saw that this library crossed my path.

Well, thank you very much. That is exactly the kind of feedback that I was really hoping for when I first started this library, especially because, you know, there's so many different things you can do with state machines and so many different problem domains that they apply to, and I was really interested in this 1 specific domain, which is what I think most of the other libraries on PyPI really are also focused on, which is this domain of objects that change state over time, that have kind of like an application-meaningful set of some small number, like a couple of dozen at most, of states, where the transitions are, like, manually specified. It's not an optimizing compiler. It's not some kind of computer graphics thing. There's no, you know, automatic generation of states or lattice reduction steps or whatever to try to, like, turn something else into a state machine via automation, but to just build your own by hand with some automation. So I hope that by trying to kind of restrict this ambition to this very specific, narrow vertical niche of what the very, very broad category of state machines are for, it'll be a little more successful than some of the more general entries.

And 1 of the things that you alluded to briefly is the idea that when you're actually going about testing your code base, when it has a state machine involved, that actually simplifies a lot of the cases because you don't have to worry about the impossible situations that can arise from a business domain perspective. And so I'm wondering if you can just dig a bit into what an actual test case would look like for a state machine versus just the combinatorial conditional explosion that somebody might find themselves in?

So this is actually a very interesting question because I have yet to follow this to its logical conclusion. The first thing I should say is there's so much interesting stuff you can do in the area of testing and state machines that it's very easy to get intimidated and to think that it's this crazy rocket sciency domain. So the first thing that I would tell anyone who's gonna try to adopt Automat, and what they're gonna do for testing, is just write regular unit tests. If you've already got a thing you're trying to port to Automat and it's got unit tests, you shouldn't need to change anything. Just use that as your regression test suite because it's fine. If you write a bunch of unit tests that check all of the cases that you think are interesting and make sure the behavior is correct, that works totally fine with Automat. You'll get reasonably good code coverage numbers. Like, everything will work more or less the same as you're used to, and you should definitely start with that. Because if you try to jump straight into the deep end of model checking and generative testing, you will end up with, like, this explosively huge amount of stuff to learn, and it'll seem very, very intimidating.

That said, the thing that you learn first about testing with Automat is that Automat will teach you that you're not testing quite as much as you thought. Like, all of these systems in Twisted that I'm talking about replacing with explicit state machines already had an implicit state machine. They already had 100% test coverage. Like, we were very rigorous about that kind of stuff. And so there was no, like, big legacy untested code in there. It was just, oh, now that we have a list of all of the state transitions that we can read, it's obvious that we're missing 2. Like, there really should be a transition from this state to this state, and there isn't 1, and we just never thought to write a test for it. So why don't we just go ahead and write a test for that state transition? And that's the way that a lot of testing for projects that are initially adopting Automat will go: you'll do all the same testing you would have done before, just plain vanilla unittest or pytest or whatever your favorite tool is. You call the methods on the thing, and they call the outputs. And your logic is exercised, and you feel good. And then you run automat-visualize, and you look at the state machine graph, and you're like, wait a second. Did we cover that transition? And then you go write some more tests, and you find out your state machine maybe had a couple of bugs in it that you didn't realize before. So the first thing that you get out of a state machine is definitely the additional visibility into cases you might not have tested. And then you can test those the same boring old way you always have. Once you're comfortable there, then there are a lot of other opportunities that having a state machine presents.

And 1, as I said, I haven't actually done this myself, but it's been something that folks at PyCon and in the Twisted community have been talking about with some excitement, is using Hypothesis, which, I don't know if all of your listeners would know about that, but it's a generative testing framework for Python sort of modeled on Haskell's QuickCheck. And it can generate inputs for a program somewhat randomly. And then it can discern, based on the types of failures that it gets when it runs your tests with these different inputs, which are the interesting values, which are the ones that need to be tested. And what Automat gives you is, because now you have an explicit list of all of your state transitions and all of your inputs and all of your states, you can use that as an input to Hypothesis and generate your testing data based on a kind of random selection of inputs to your state machine. And by combining the state machine and this generative testing strategy, you will rapidly find these, like, edge cases that may have completely eluded you because you wouldn't have realized, like, oh, these are my inputs and these are my outputs, and here's what I have to test. The few cases I've seen of people doing that, you wouldn't actually be able to tell that that's what they were doing because inputs in Automat do actually potentially require parameters, for example.

And the types of those parameters still need to be in some defined range, and those defined ranges have to be specified to Hypothesis using its usual test generation APIs. So it actually just looks the same as if you had applied Hypothesis to your existing application code with no use of a state machine at all. But the data that was created as part of turning it into a state machine is a super useful input and thinking tool when it comes to doing the input generation step. So there's also some potentially interesting stuff beyond that that gets into the realm of model checking, but I am the wrong person to interview for that particular area.
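The idea of driving a state machine with a random selection of its declared inputs can be sketched without any third-party dependency. The example below deliberately swaps in the standard library's `random` module as a stand-in for Hypothesis, and uses a hand-rolled machine rather than Automat; the `Order` class, the `TRANSITIONS` table, and the invariants checked are all hypothetical.

```python
import random

# (state, input) -> next state for a hypothetical order workflow.
TRANSITIONS = {
    ("new", "pay"): "paid",
    ("new", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("paid", "cancel"): "cancelled",
}
# The explicit transition table doubles as the source of test inputs.
INPUTS = sorted({name for (_state, name) in TRANSITIONS})


class NoTransition(Exception):
    """The input is not defined in the current state."""


class Order:
    def __init__(self):
        self._state = "new"
        self.events = []  # inputs that were accepted, in order

    def send(self, name):
        key = (self._state, name)
        if key not in TRANSITIONS:
            raise NoTransition(key)
        self._state = TRANSITIONS[key]
        self.events.append(name)


def check_random_sequences(runs=200, max_length=8, seed=0):
    """Drive fresh machines with random input sequences and check invariants."""
    rng = random.Random(seed)
    for _ in range(runs):
        order = Order()
        for _ in range(rng.randint(0, max_length)):
            try:
                order.send(rng.choice(INPUTS))
            except NoTransition:
                pass  # undefined input: rejected before any state change
        events = order.events
        if "ship" in events:
            # An order is never shipped before it has been paid for.
            assert events.index("pay") < events.index("ship")
        if "cancel" in events:
            # Nothing is accepted after a cancellation.
            assert events[-1] == "cancel"
    return runs


print(check_random_sequences())
```

A real generative testing framework adds what this sketch lacks, most notably shrinking a failing sequence down to a minimal reproducing case.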

So there are some things that are coming down the pipe in future releases of Automat, like a tracing API so that you can get coverage information on your transitions, so that you can know not only were the lines of code covered, but all of the different orders that you can exercise them in were also covered. So that can provide more information both on the sort of test design side, which is where your state transition table is useful, and then also on the analysis side to make sure your code coverage is complete. There is no additional burden that Automat imposes on your code to make it testable or to do testing with it. But there are a bunch of additional opportunities that it creates where you can do more testing and you can be more confident that your testing is capturing all of your interesting distinctions.
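To make "coverage information on your transitions" concrete, here is a small standard-library sketch of the general idea: record each transition as it is taken, then compare against the declared table. It does not show Automat's tracing API, which is described above as a future feature; `TracedMachine`, `untested_transitions`, and the player-style table are hypothetical.

```python
# (state, input) -> next state for a hypothetical media player.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}


class TracedMachine:
    """A table-driven machine that remembers which transitions it has taken."""

    def __init__(self, transitions, initial):
        self._transitions = transitions
        self._state = initial
        self.taken = set()

    def send(self, name):
        key = (self._state, name)
        self._state = self._transitions[key]
        self.taken.add(key)


def untested_transitions(transitions, taken):
    """Return declared transitions that were never exercised."""
    return sorted(set(transitions) - taken)


# A "test run" that looks thorough: every input name is used at least once.
machine = TracedMachine(TRANSITIONS, "idle")
for name in ["start", "pause", "resume", "stop"]:
    machine.send(name)

print(untested_transitions(TRANSITIONS, machine.taken))
```

In this sketch every input and every state is touched, yet stopping while paused is reported as never exercised, which is the kind of gap that line coverage alone would not reveal.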

And you mentioned the tracing API and your feedback API as well as planned features that you're gonna be adding in the future. So I'm wondering, are there any other capabilities that you have on the roadmap that people should be keeping an eye out for?

Nothing terribly big. I think there's probably gonna be some refactoring in the area of what we call collation, which is that because an Automat input can produce multiple outputs, there's a step where, when the outputs start getting generated, they get converted into a return value for your input method. And I think some of the decisions that we made in the first release of Automat were kind of formally correct, but, like, not super helpful. Like, it used to return an iterator, which meant that your outputs wouldn't always be run immediately, which was not really that helpful. And even now, I believe it still returns a list. And quite often, what you want, if you have an input that produces a single output via its 1 transition, is to just return that value. So there's probably gonna be some tweaks in the way that you define state transitions to make it easier to do certain idiomatic things.

But beyond that, I really think Automat is 1 of those libraries that can kinda be done, in that it performs a fairly simple function. 1 of the most important things about it is that it restricts the scope of that function, that any kind of scope creep into, like, oh, let's add an automatic, you know, hook that gets called every time you enter the state, or let's add a way to get the state out because that would be useful for debugging. Those kinds of things are explicit non-features. So I think it'll remain a fairly small and tidy library that just does this 1 thing. So in addition to the small cleanups, longer term, probably most of the energy will go into making the visualizations better, making it easier to extract information from the running state machine, do logging or tracing or things like that.

We already have certain APIs that do things like allow you to serialize the current state, that were fairly carefully designed to, well, I guess I should explain first. I've talked a lot about how you don't wanna expose the internal state of your state machine, except you totally need to, because sometimes you have a state machine, like, in your order processing example. The actual state transitions in that order processing example probably happen 1 at a time, where something happens, your state machine gets saved to a database, and then a couple hours later, another thing happens, it gets deserialized from that database, you run the input, and then you put it back in the database. And in order to put it into the database, you need to know its internal state. So we tried very hard to come up with an API where you can technically get the internal state out of the state machine via a public API. That's totally doable. But the shape of that public API is extremely focused on making it really easy if what you're doing is, like, the kinds of things 1 would need to do to serialize the state machine to a database, and incredibly inconvenient and painful if what you want is, like, a dot state attribute that you can check before doing something. So in that same way, we need to do some thinking about how to make sure the tracing API and the other internal things that will make it easier to get bits of information out don't accidentally make it too easy to subvert the intended design of Automat.
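The save, wait, reload, apply-one-input cycle described above can be sketched with the standard library alone. This is an illustration of the design goal (serialization is easy, peeking at the state is not), not Automat's own serialization API; the `Order` class and its `save` and `restore` methods are hypothetical, and a JSON string stands in for the database row.

```python
import json


class Order:
    # (state, input) -> next state for a hypothetical order workflow.
    _TRANSITIONS = {
        ("new", "pay"): "paid",
        ("paid", "ship"): "shipped",
    }
    _STATES = {"new", "paid", "shipped"}

    def __init__(self, order_id):
        self.order_id = order_id
        self._state = "new"  # deliberately no public `.state` attribute

    def _input(self, name):
        self._state = self._TRANSITIONS[(self._state, name)]

    def pay(self):
        self._input("pay")

    def ship(self):
        self._input("ship")

    def save(self):
        """Serialize everything needed to resume later, as one opaque blob."""
        return json.dumps({"order_id": self.order_id, "state": self._state})

    @classmethod
    def restore(cls, blob):
        """Rebuild an order from a blob produced by save()."""
        data = json.loads(blob)
        if data["state"] not in cls._STATES:
            raise ValueError(f"unknown state: {data['state']!r}")
        order = cls(data["order_id"])
        order._state = data["state"]
        return order


# Something happens, and the machine is written to storage.
order = Order(42)
order.pay()
row = order.save()

# Hours later: load it, run one input, and store it again.
later = Order.restore(row)
later.ship()
row = later.save()
print(row)
```

Application code that wants to branch on the state would have to parse the saved blob to do so, which is awkward on purpose: behavior that depends on the state belongs in the transitions themselves.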

So are there any interesting use cases that you've seen of people leveraging Automat outside of Twisted?

It's pretty early for the library, I think. It didn't see a lot of promotion until fairly recently. As I believe I mentioned, at the beginning of the project's life, Ashwini Oruganti was asking about how she might use a state machine library to write a TLS implementation. And at this last PyCon, she gave a talk about, among other things, Automat, generally using state machines to design secure software. So that has sparked a lot of interest, and we're starting to see a lot of really interesting kind of proposed uses. But I haven't heard a lot from users who have done a lot kind of in anger in production yet outside of Twisted and a couple of related applications.

There were some very cool things that I heard about at PyCon, which I'm never quite sure how kind of, like, public these things are. So I probably shouldn't repeat any of them here, but I hope that the next time we talk about this, there'll be some really cool stuff.

So are there any other topics or questions that you think we should talk about before we start to close out the show?

Well, I guess I should just kind of give an impassioned plea to anyone who's got 1 of these libraries or applications that are stuck in this quagmire of way too many little state flags and way too many little values and counters that are proliferating thousands of little conditionals. Give Automat a try and definitely, like, report bugs and help out on the project, because it's kind of an easy project to work on. There's a tiny, tiny public API. There's not a whole lot to mess up. There's 100% test coverage on everything. It's all in memory. It's just a data structure. So, you know, if you've been thinking, you know, oh, deterministic finite state automata, this sounds too fancy, I'm not really a computer sciency person. Like, don't worry. Neither am I, and it can give you a lot of advantages to reduce the complexity of your code and give you more confidence that you understand what it's doing.

Alright. Well, I will have you add your preferred contact information to the show notes. So anybody who wants to follow up and give feedback on their work with Automat can get in touch and see the progress being made on that and on its usage within Twisted. And so with that, I'll move us to the picks.

And for my pick today, I'm going to choose a set of lights that I found a little while ago at Home Depot. It's actually 3 little LED pucks with a remote, and it's a multicolored LED light. So you can actually change the hue from white all the way through to red. And you can do it either automatically or, you know, jump straight to the color, or you can actually manually increment through each of the different shades as you go, and you can also vary the light intensity. So it's pretty nice for being able to add some interesting light effects to whatever room you're in, and it's fairly affordable. So I thought it was pretty interesting and worth checking out. So I'll add a link to that in the show notes. And with that, I will pass it to you. Do you have any picks for us today, Glyph?

Sure.

I guess my pick for today would be OmniFocus, which is a task tracking application for the Mac and iOS and watchOS. And it's difficult to overstate my reliance on OmniFocus. It kinda runs my entire life, and it's germane to today's show because when a state transitioning event occurs in my life, OmniFocus is the place that it goes to. So it's a Getting Things Done based system that allows you to input all the things that you wanna do. And then at a particular time, you can view the tasks which are sort of currently available, like things that you could do right now, things that aren't, like, dependent on anything else, things that are not deferred into the future, things that have an active context. So, like, you can have different contexts for home and work and different people that you might need help from in order to do a particular task, and you can activate and deactivate them, and then see just the things that you kinda should actually be doing right now. Big fan. Always glad to promote the excellent work of the Omni Group, which is the software company that makes that app.

Alright. Well, I appreciate you taking the time out of your day to talk about Automat and the work you've been doing with that. Definitely seems like a very interesting and useful library. So I appreciate your time, and I hope you enjoy the rest of your evening.

Thank you very much, and thank you for the opportunity to spread the gospel of state machines and not having a million if checks.

The Python Podcast.__init__