Mercurial with Augie Fackler

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com.

Linode is sponsoring us this week. Check them out at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for your next project.

We are also sponsored by Sentry this week. Stop hoping your users will report bugs. Sentry's real-time tracking gives you insight into production deployments and information to reproduce and fix crashes. Check them out at getsentry.com.

Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch. And to help other people find the show, you can leave a review on iTunes, Google Play Music, or tell your friends and coworkers.

You can visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, propose show ideas, and discuss the show with other listeners.

Your hosts, as usual, are Tobias Macey and Chris Patti. Today, we're interviewing Augie Fackler about the Mercurial version control system. So, Augie, could you please introduce yourself?

Yeah. I mean, I've worked on Mercurial for something like 8 years. Used to live in Chicago, and now I live outside Pittsburgh.

So how did you get introduced to Python?

While I was an undergrad, I got an internship, and, you know, I knew Objective-C and Java and kind of knew Perl. And they basically said, well, show up knowing Python. If you don't know Python, your internship could be real short. Here, we're gonna mail you a couple of books. And so I rewrote some software that I was playing with in my spare time in Python, and that's how I got started in Python.

So can you describe what Mercurial is and how the project got started?

Yeah. So Mercurial is a version control system. So, hopefully, everybody who's listening is using one of those, whether it's Subversion or Git or Mercurial. The way Mercurial got started was, back in 2006 or so, the Linux kernel had been using BitKeeper. And for a variety of reasons that I won't go into (there's a long, colorful history there), the Linux kernel had to stop using BitKeeper. And so Git and Mercurial both came into being, I think it was in April of, maybe April of 2005 now that I think about it. But in any case, they needed a new version control system, and they wanted something that was close enough to the workflows they'd come to like out of BitKeeper that Git and Mercurial look very similar, if you squint just right. And that's where both of the projects came from. And Matt, the founder of Mercurial, actually used to be a Linux kernel developer full time.

Or I guess full time. I

That's really interesting. I knew that Git was started originally for the purpose of tracking the Linux kernel source code. I didn't realize that Mercurial was spawned from the same need. That's interesting, and it also sort of speaks a little bit to why some of the workflows, as you said, are somewhat similar, although they are, you know, fairly divergent, particularly these days.

Yeah. It's less divergent than I think most people expect, even today. The real big divergence is once you start getting under the covers. The data storage models are astonishingly different for tools that look so superficially similar.

I think in terms of the BitKeeper/Git thing, I think it's like Augie said: Linux, for a while at least, the kernel was in BitKeeper. And then, Augie, didn't BitKeeper go proprietary? Maybe that was in the colorful history that you were...

Yeah. So BitKeeper was always proprietary. BitKeeper, I think, was the outgrowth of some source control ideas that originated at Sun in the nineties.

Right. Right.

And BitKeeper had always been proprietary. Yeah. I've heard a couple of different versions of history, and I don't know which one is right.

I remember that the Linux kernel project used BitKeeper for a while. I was pretty heavily involved with the Linux community even back then, not as a committer or anything, but just as a rabid fanboy. And I remember the massive hue and cry over the need to switch because of BitKeeper's proprietary nature. There were other things. There was dissatisfaction with its model, and I seem to recall complaints about it not scaling or something like that. There was a whole raft of reasons around why BitKeeper wasn't, you know, according to some people anyway, doing the job. And I remember Git coming around not too long after that.

There were also some people who were displeased that it was a proprietary version control system, and that kind of precipitated some events that led to the license for the Linux kernel being revoked.

So how did you get involved with working on the Mercurial project?

Yeah. That's kind of a funny story. You can make a claim that my entire career working on Mercurial is a yak-shaving expedition from my previous company. So a teammate of mine at that company showed me git-svn and was doing all this neat stuff with git-svn and set me up with it. And I think I burned myself 4 or 5 times in the first 2 days, getting the Git repo wedged such that he couldn't help me get out of it. And I was in over my head, and so I had seen how useful the workflows were but was too frustrated by the tool. And I remembered, because I have some friends that were early Subversion committers and had worked on Subversion in the past, they had told me that Mercurial seemed like it was doing pretty well. So I kind of cobbled together in bash scripts a hacky version of what eventually became hgsubversion. And in evenings and weekends, I slowly turned Mercurial into a really powerful Subversion client that served my needs at my last company, where I was using Subversion all the time. And I just kinda kept getting in deeper from there until now I'm here, and I kinda work on version control stuff for a living.

One of my first introductions to Git was actually trying to convert a few Subversion repositories to Git so that we could push to GitHub, which was then in its early days, so that we could have a slightly more sane approach to how we were managing our source control, because the company I was at was small enough that it didn't make sense to have a centralized repository, à la the Subversion model.

We had a centralized repository, but I discovered that it was running an ancient, vulnerable-to-known-CVEs version of Subversion when I was writing hgsubversion, because I found that the server didn't actually understand some of the newer wire protocol commands in Subversion. It was interesting times. I ended up being the version control administrator because I was the only one that cared.

And what are some of the features that can be found in Mercurial which are lacking in similar tools such as Git or Bazaar?

Yeah. Bazaar is an interesting one. I haven't actually used Bazaar at all. And they have, to my knowledge, basically thrown in the towel and said that no further development work is being done. Just kinda sad. Features, though. So we have a structured query language for revision history called revsets and a related one for file trees called filesets. That was the kind of thing where Matt had the idea and crafted the initial implementation and showed it off, and everybody thought it was really neat when we first looked at it. And I think a day and a half later, it was like oxygen to everybody. We just couldn't live without it. And even now when I'm using Git on a project that's using Git, I get very frustrated that I don't have revsets, usually within a day or 2. I don't know how much I use them until I don't have them. We have a fairly nice templater. I asked a colleague of mine, who's working on some Mercurial stuff right now and used to contribute to Git at least somewhat, what he thought was good, and he brought up that the templater is really, really nice. And, also, we have a concept called phases. So we have this notion of whether a change is public, draft, or secret, and that kinda helps you avoid screwing up history edits. I think within 2 days of phases existing as a thing in Mercurial, it prevented me from screwing up a rebase.
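The phase idea described above can be sketched in a few lines of Python. This is a toy model, not Mercurial's implementation: the function names `can_rewrite` and `outgoing` are made up for illustration, and the rule that secret changes are left out of a push is stated here as an assumption about how the feature behaves.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The three phases mentioned in the interview."""
    PUBLIC = 0   # shared history that should never be rewritten
    DRAFT = 1    # work in progress that may still be edited
    SECRET = 2   # like draft, and (assumed here) never sent by a push


def can_rewrite(phase):
    """A history edit such as a rebase is refused for public changes."""
    return phase != Phase.PUBLIC


def outgoing(changes):
    """Return the names a push would send: everything except secret changes.

    `changes` is a list of (name, phase) pairs; both are hypothetical.
    """
    return [name for name, phase in changes if phase != Phase.SECRET]


work = [("a", Phase.PUBLIC), ("b", Phase.DRAFT), ("c", Phase.SECRET)]
rewritable = [name for name, phase in work if can_rewrite(phase)]
```

Here `rewritable` contains only the draft and secret changes, which is the guard that stops an accidental rebase of already-shared history.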

And so do phases allow you to have a workflow where you are committing locally and pushing to a remote repository, but you have them marked as secret so that if you're doing the equivalent of a git commit --amend, you wouldn't be negatively impacting anybody else who may have happened to have started working against those changesets that you had posted to the server?

Sort of. What you've described is actually the leading edge of a project we've had going on for a while, which is actually why we have phases, that we're calling changeset evolution, where we're actually tracking the metadata of "this change replaces that change over there." And so you can record that metadata for things that are drafts, and then you can exchange the metadata about what changes replaced what other changes along with these draft changes. Then if you have work that's based on mine, the tool can help you deal with your descendant changes from mine.
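A rough sketch of the "this change replaces that change" bookkeeping follows, assuming the simplest possible case where each change has at most one replacement. The data layout and the names `latest_successor` and `rebase_targets` are hypothetical and are not taken from Mercurial.

```python
def latest_successor(markers, node):
    """Follow (old, new) replacement markers until no newer version exists."""
    replaced_by = dict(markers)
    seen = set()
    while node in replaced_by and node not in seen:
        seen.add(node)          # guard against a marker cycle
        node = replaced_by[node]
    return node


def rebase_targets(parents, markers):
    """For every change whose parent was rewritten, name the parent's newest
    version as the place the change should be moved to."""
    targets = {}
    for child, parent in parents.items():
        newest = latest_successor(markers, parent)
        if newest != parent:
            targets[child] = newest
    return targets


# My draft "a1" was amended twice; your change "b" was built on "a1".
markers = [("a1", "a2"), ("a2", "a3")]
parents = {"b": "a1", "c": "base"}
suggestions = rebase_targets(parents, markers)
```

With these inputs `suggestions` maps `"b"` to `"a3"`, which is the information a tool needs to help the owner of `"b"` catch up with the rewritten parent. Cases such as one change being split into several are left out of this sketch.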

And then for the revsets, can you compare and contrast that a bit with the reflog from Git?

Yeah. So they're completely unrelated, for the most part. So the reflog in Git is a journal of where, like, master has pointed. So if you do reset --hard, you'll end up with, you know, a new entry in the reflog, and you can, like, go look and see where master used to point. And if you did a rebase, you can look at the intermediate states there. Revsets are a straightforward search query system. So if I go to selenic.com/hg, there's a search box in the upper right-hand corner. And that actually accepts a restricted subset of the revset language, all the things that we can actually run on a server that are safe. And you can look for all the changes that Matt made in version 3 point, you know, between 3.7 and 3.8.

See, is there a way I can send you guys a link here in Skype?

You can just put it at the bottom in the links section there.

Great. So that's a sample revset query. And so the 3.7::3.8 means revisions that are topologically between 3.7 and 3.8, and that's inclusive. And then the "and user(mpm)" means that if the username field has a substring mpm, it'll match. And so you can see, if you look at the link, it actually returns only the changes he made in that part of our history. And that looks like a gimmick when you demo it, and then, like, you start coming up with little places where having a structured query language is awesome.

No, I can totally see it, because it seems like Mercurial is in a sense embedding the functionality that used to live in tools like the one Atlassian made that was like code search, or grok, or things like that, where, you know, you wanted to be able to do rich queries on your SCM. It sounds like you're at least providing the basis for that kind of tool without a lot of external processing. That's fantastic.

Right. And the help doc actually describes all the, I'll put a link in for that too. That's the help doc for revsets, which you may as well link to as well because that describes all the other search predicates. And then filesets are the same kind of thing, but over your files. So you can easily search for things like files that are larger than 10 megabytes and don't end in PNG, because we've all done that with find, and it's kind of a pain.

Yes. Yes. Yes. Indeed.
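For comparison, the "larger than 10 megabytes and not a PNG" search mentioned above looks like this when written by hand in Python. This is the do-it-yourself equivalent, not the fileset syntax; the function names are hypothetical, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

TEN_MB = 10 * 1024 * 1024  # assumed definition of "10 megabytes"


def sizes_under(root):
    """Map each file path under root (relative to root) to its size in bytes."""
    base = Path(root)
    return {
        str(path.relative_to(base)): path.stat().st_size
        for path in base.rglob("*")
        if path.is_file()
    }


def large_non_png(files, limit_bytes=TEN_MB):
    """Return the paths that exceed limit_bytes and do not end in .png."""
    return sorted(
        path
        for path, size in files.items()
        if size > limit_bytes and not path.lower().endswith(".png")
    )


sample = {"art/hero.png": 20 * 1024 * 1024, "tools/vs.iso": 700 * 1024 * 1024, "README": 120}
hits = large_non_png(sample)
```

Calling `large_non_png(sizes_under("."))` would run the same filter over a real working directory.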

So one of the common complaints with Git is that its human interface could use some work. How is Mercurial's UX an improvement over Git?

This was actually sort of the last thing my coworker mentioned: argument order often doesn't matter with Mercurial, and with Git, it often does. You can put flags anywhere, that kind of thing. And we try to think really, really hard about command line UX. I think a lot of people forget that just because you're on the command line doesn't mean you don't have a user experience, and it's really easy for that user experience to be irredeemably horrible. You know, git checkout does, like, 4 or 5 things depending on who you ask, because it also is what you would expect revert to be if you had used any version control system that wasn't Git, and git revert is in fact something completely different, which is baffling to a lot of newbies.

That's definitely something I see with new developers, you know, struggling like crazy with Git's interface, and I think it is exactly what you say. It's because they viewed the command line as this kind of necessary evil sprinkled over the top of Git's amazing internals. I think it's really unfortunate that a little more thought wasn't put into it, because I think there have been hundreds of thousands of developer hours lost, frustratedly, by people who really just don't want to have to care that much about their SCM.

And that's actually where revsets come from. So we had this thing called the parentrevspec extension that stole a couple of Git's sort of line-noise operators for looking at history. Like, you can do git log master^, and that gives you the first parent of master instead of master as the revision you're referencing. And that's useful, but it's not ideal because that only lets you do a couple of things. And they've snuck a lot of really clever operators in through this sort of line-noise thing, and there's this thing that I've only seen referred to as the git pickaxe. I don't actually know a better name for it. People have blogged about it. And git log takes a zillion flags because there's a flag for every kind of search you might wanna do. And instead of accepting these one-off flags one at a time as we wanted the functionality, Matt thought really hard and played the long game and said, instead of giving you all of these flags, I'm going to give you a query language that you can then use any place you wanna talk about a revision, which is fantastic.

Yeah. It's definitely better to have a more uniform approach to it than having a particular corner of the tool where you can do that, you know, powerful querying, but everywhere else you're hamstrung and don't have that full flexibility. And having tried to parse through how to actually do some different analysis with the git log command line, I can definitely appreciate being able to have a much richer syntax for pulling out that information.

Revsets are one example, but mostly what we do is we look at new functionality, and new functionality has to justify itself as baggage you wanna carry around for the next decade. You know, Git and Mercurial are software projects that are over 10 years old at this point. And if a piece of functionality can't justify itself on a 10 to 20 year kind of lifespan, it's probably not worth keeping. We say no to a lot more features than we say yes to.

Yeah. Because once you include a feature, it's impossible to actually get rid of it without upsetting a lot of the users of the tool.

So our backwards compatibility policy is that we don't break backwards compatibility unless it was a bug, with, I think, one exception I can remember in history, which was about command line flag position parsing in an obscure corner of the tool.

That seems like a legitimate reason for breaking backwards compatibility if it was severely unintuitive.

Right. So it used to be that you couldn't put flags after positional arguments, and, you know, that changed something like 8 years ago.

And it seems to me that perhaps this sort of strict adherence to the long game, the long view, would be made much easier by your explicit inclusion of an extension mechanism. So, you know, you ask for feature x, Mercurial says no, but then you can go off and write an extension to do what you want anyway. So everybody wins, and the integrity of the overall project is maintained.

Right. And so hgsubversion is not something that is ever gonna be part of core Mercurial, because it's almost as big as Mercurial in terms of line count, or was when I looked at those stats, like, 5 years ago. It's incredibly valuable. A lot of people like it, but it's also bananas, and we don't wanna support it in the same way we wanna support the core Mercurial product. It does a lot of really strange things. And there's other stuff that comes in through extensions too. Like, Facebook has this remotefilelog extension that makes their clone times, instead of being insane, be fine, and also speeds up rebase and some other stuff. And, you know, that was something we probably wouldn't have tinkered with at all upstream if they had asked, but because they can just do an extension and swap out the storage primitives, there it is. Here's remotefilelog. It seems to work. And, you know, it's got a pretty good installed base, and some other places are also using it.

I'm guessing the largefiles extension is another example of that, talking about swapping out low-level storage primitives.

Yep. That, I don't even know where that originated. And then I think the Kiln Harmony people picked that up at one point and sort of polished it and got it to be a lot less of a mess. And now it's something we ship in mainline Mercurial, which is great.

Speaking about its extensibility and flexibility, can you describe some of the internal architecture and some of the design choices that allow for it to be so flexible and extensible?

Yeah. A lot of it's just about having conscious layering in the tool, and there are places where the layering is not as good as we would like. Our class that is a repository is kind of a god object and really needs to be smaller and do fewer things. And we've been slowly moving that. But in general, there's this layered architecture. So the lowest-level piece is a file structure called a revlog, and that's where all the data for revision history gets stored, and the changelog and the manifest get stored in a revlog too. And then there's a layer above that that knows how to interpret the relationships between a manifest and a filelog and how to chase this path, that this revision is in a revlog that I can find here. And then at a layer above that, you start gluing on more interesting business logic around creating new commits or running verification over the repo. And then on top of that is a layer of commands that can execute using all of the lower-level data types. So a lot of extensions are just, here's some new commands. That's most of what hgsubversion does. It has a couple of features where it's, say, wrapping the diff command so that you can have diff output diffs that look like they came from the svn tool, because it turns out there's a subset of tools that parse svn's diff output and are extremely picky about, say, the whitespace characters in the diff.

Some things are very monkey-patch heavy, so largefiles is a good example of that. remotefilelog is another good example, where they sort of inject themselves into part of the tool and then provide the same interface as the original implementation, but with some different storage back end or that sort of thing.
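The "inject yourself and keep the same interface" pattern described above can be illustrated with plain Python. Everything in this sketch is a stand-in: `LocalStore`, `wrap_method`, and the pretend `REMOTE` server are hypothetical names, not Mercurial classes or functions.

```python
class LocalStore:
    """Stand-in for a core storage class that only knows local data."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


def wrap_method(cls, name, wrapper):
    """Replace cls.<name> with a function that calls wrapper(original, ...)."""
    original = getattr(cls, name)

    def wrapped(*args, **kwargs):
        return wrapper(original, *args, **kwargs)

    setattr(cls, name, wrapped)
    return original


REMOTE = {"big.bin": b"fetched on demand"}  # pretend server-side contents


def lazy_read(original, self, path):
    """Same signature as LocalStore.read, but falls back to the remote."""
    try:
        return original(self, path)
    except KeyError:
        data = REMOTE[path]
        self._files[path] = data  # keep a local copy after the first fetch
        return data


# The "extension" patches the core class; callers keep using read() as before.
wrap_method(LocalStore, "read", lazy_read)
store = LocalStore({"small.txt": b"local"})
```

Code that calls `store.read(...)` does not change at all, which is why this style of extension can swap a storage back end without the commands layered on top noticing.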

Speaking again about some of the translation layers between Mercurial and other source control, I know one of the more popular ones is hg-git, which allows you to actually use Mercurial as a Git client. So I'm wondering what functionality of Mercurial gets lost when you have to translate to and from the Git sort of semantics and data structures. So in using Mercurial as a client, what are some of the things that you should watch out for so that you don't accidentally put yourself in a situation where it will no longer translate cleanly to Git?

Yeah. Named branches are something that don't have an analog at all in Git, because they're this indelible label that gets baked into the change when you commit it, whereas a Git branch is just a movable pointer that's not actually part of history. Rewriting history kind of gets a little weird, but to be honest, I haven't used hg-git in several years. It interacts sufficiently poorly with Gerrit code review, which is where I've had to send enough patches, that I just kinda got frustrated with it. I keep it on life support, but I don't use it for anything but sort of casual interaction. I do have a couple of friends that use it every day, and they seem to be pretty fine with it.
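The difference between a label baked into the change and a movable pointer can be shown with two small data structures. This is only an illustration of the distinction drawn above; the `Commit` class, the `relabel` helper, and the sample data are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Commit:
    node: str
    parent: Optional[str]
    branch: str  # named branch: recorded inside the commit itself


def relabel(commit, new_branch):
    """Try to change a commit's branch label in place; report whether it worked."""
    try:
        commit.branch = new_branch
    except FrozenInstanceError:
        return False
    return True


commits = {
    "c1": Commit("c1", None, "default"),
    "c2": Commit("c2", "c1", "stable"),
}

# Pointer-style branches live in a separate, mutable name -> commit table.
pointers = {"master": "c1"}
pointers["master"] = "c2"   # moving the pointer changes no commit
del pointers["master"]      # deleting it leaves the history untouched

relabeled = relabel(commits["c2"], "default")
```

`relabeled` ends up `False` while the pointer table was freely edited, which is the mismatch a translation layer between the two models has to paper over.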

And are there any analogs with Gerrit in the Mercurial space that you're aware of, or does Mercurial have any comparable workflow to what Gerrit allows?

Gerrit specifically, I don't know. There have been a couple of interesting code review tools kinda come and go. There's the Kallithea project, which I think has its own pull request implementation now. I'll send you a link to that so that you don't have to figure out how to spell it. And Mozilla has done some work with the Review Board code review tool. That works pretty well. Phabricator's actually remarkably nice. I say remarkably mostly because my first reaction when I looked at it was, oh, this is written in PHP, I'm gonna run away now. But we've been using it at work, and I've been really pretty happy with it. Nobody's quite doing code review the way I think they should. You know, not Gerrit, not pull requests, but that's another topic entirely.

And in reading through the documentation and website, it mentions one of the core goals of Mercurial is for it to be safe. Can you explain what safety means in that context and how it is architected to achieve that goal?

Yeah. A big part of that is the principle of least surprise. So users shouldn't do something and end up getting burned just by having this foot gun that kind of destroys all of their work because they asked it to. Another big part of that is taking functionality that requires a significant amount of user thought and hiding it somewhere early on. So rebase is off by default, because if you rebase things that you've exchanged with other people, you kind of enter this world of hurt and probably are gonna need somebody more experienced to bail you out. We're hoping that changeset evolution will get us to a point where we don't have to hide some of this stuff behind the curtain anymore. And we have some things that aren't safe that we would like to take away, but our backwards compatibility guarantees mean we can't. We have this command called rollback, which you should never ever use, as it undoes the last transaction that we committed to the data store, which could be a pull, in which case it's not a big deal, or it could be a commit you did 2 days ago because you haven't committed since then, and then terrible things happen.

Yeah. That definitely sounds like a frightening tool.

Safety is intertwined with usability. The more usable a tool is, the safer it'll be by default, because people won't be guessing. They'll just make the right decisions on an informed basis, and things will guide them in helpful directions.

It's interesting hearing you say that rebase is considered kind of prerequisite for doing a merge. So it's risky just in the sense that if you've exchanged the code that you're about to rebase and somebody else is dependent on that, you could be bringing a world of hurt onto them.

Yes. Yes.

That makes sense.

And our wire protocol, and this is also part of safety, our wire protocol, out of the box, doesn't have a way to exchange the deletion of history. So in Git, there's this notion of a forced push where you can tell the server, no, I didn't mean it, throw away what you have under the name master, and replace it with this that I'm about to push. And we don't have a way to do that.

You shouldn't be doing that in Git either.

Yeah. Except, like, when you have a pull request and somebody asks for some changes, and then you need to rebase it so that it'll merge. Like, it's something where Git definitely took the pragmatic approach. And if I was building a tool and had to make it work today, that's probably where I would go. We have the luxury of being able to be a little bit more deliberate and try and make changeset evolution work. Because the metadata I was talking about with changeset evolution a little earlier is really tricky to exchange efficiently. But if we figure that out, and one of our contributors thinks he's got it pretty well locked down, he just needs to write it down and run it through some test cases, we might be in a really good spot there. And then we'll actually be able to prevent sort of the accidental force push problem, where you unintentionally force push to master and blow away a whole bunch of history.

Oh, yes. That famously happened to Jenkins a few years ago.

And what we'll be able to do, notionally, is you can make all of the stuff that's a permanent part of your project, so it's been reviewed and accepted, it will be in public phase, and then that's indelible through the wire protocol. Like, you could still SSH into the server and screw things up and burn yourself that way, but you'd have to go out of your way. And then anything that's in draft phase is just that. It's a draft. It can change. It can get deleted. That's not a big deal. So a lot of this is what I mentioned before about playing the long game to try and end up with something better.

That makes perfect sense. So one of the noteworthy aspects of Mercurial is, as we've mentioned, there's a strong focus on making extensions a first-class concern in the project, so much so that a number of the actual core functions are written as extensions. Can you describe why that is and how the extensions plug into the core execution engine?

Yeah. You know, some of the extensions are just extensions because they're kind of half baked. We don't like them. So largefiles is sort of the unloved stepchild of core Mercurial, in that I think pretty much everybody who works on core hates it, and at the same time, we think it's really great when you need it. We have a wiki page called Features of Last Resort. Largefiles is one of the things that's listed prominently there, because most people that think they want largefiles actually want a better build system, because they wanna do something like build a whole bunch of junk and then check in a binary built from the source code that it's going to live next to.

It's easy to see how you end up with that as the solution. Like, you have this thing that stores data pretty well, and all you have is a hammer, so I guess you're gonna pound the nail.

Right. But, like, you run into places that are doing really bonkers things with source control. I think at one point in the video game industry, it was a done thing to check in not only your source code and, like, your 3D meshes, which themselves are pretty big, but also things like the ISO of the particular version of Visual Studio that this was going to compile with.

Right. Right. Right.

The video game meshes and needing to check in ISOs and that kind of thing is more what largefiles is aimed at, not the "I built a thing and it's really big, and I don't wanna do the build again because it's slow" angle.

Well, I think that problem that you just mentioned, I somehow need to associate these multiple large blobs of binary data with, I need to put it in source control somehow. You know, I need to be able to track it. And even if I can't diff it, I need to be able to say, sort of, these are all artifacts that are considered to be part of this build or this tag or whatever the case may be. And while I totally agree that doing those kinds of things in your source code control system is perhaps not best practice, I've seen a number of situations throughout my career where the company just doesn't provide any other resources for tracking this kind of stuff. I mean...

Right.

You know, sure. Like, you should probably be considering, like, an artifact management system like Artifactory or something like that, but maybe you don't have the resources for that, or you don't control the resources that you would need to create something like that. So what I've seen over and over again is people glomming these things into Git, where they just, you know, drag down the performance of the entire repository horribly.

Well, and you get to tote along that 700 meg ISO for the rest of the life of that repo.

Yes. Exactly.

So some things are in extensions just as a speed bump. So you have to go actually read enough about something to see the warning label that says, you don't want this. No, really, you don't want this. Okay, if you still think you want this, this is how you turn it on. Other things, you know, rebase and histedit. Histedit is our knockoff of rebase interactive. Those are off simply because if you use them on exchanged history, you're kind of in for a world of hurt and/or confusion, but most people turn them on. Progress was off by default for a long time because it had some unresolved issues, but now it's just part of the core product. Color is off by default because it's kind of invasive, and I think it might still cause problems on Windows, but I don't have a Windows machine, so I don't know. Other things are in extensions just for historical reasons or as examples. I think we have one thing still in hgext just as "here's how you can write an extension." Oh, EOL is another feature of last resort. So automatic end-of-line transformation between platforms, so Windows users see Windows line endings and all your other users see UNIX line endings. Shelve is in the extensions area because it's still a little bit goofy in terms of implementation, and we'd like to sort it out before we kinda promote it to be a first-class thing that we think everyone should use all the time.

What are some of your favorite extensions to use, whether they're shipped in the default distribution or ones that you install as extra extensions?

So the color extension is pretty much a must-have in my opinion. It just makes the output a little bit more useful. Histedit and rebase, I think, are pretty great. There's a new one in 3.8 called FS Notify that uses a C++ binary called Watchman, which abstracts away the different file system notification systems, whether you're on Windows, OS X, or Linux. And then it makes status all but instantaneous, pretty close to independent of the size of your repository. That was something Facebook decided to contribute upstream in the last cycle, and that's an extension right now because it's still kind of experimental. The last time we had something like this, it was a perpetual source of bugs, so we're a little paranoid. Then some neat third-party extensions: remotefilelog, if you've got a really huge repo, lets you kind of have a sort of clone, lazy loading file contents, and then hgsubversion and hg-git. I use those a lot less than I used to because most of the work I do is actually in Mercurial now, but I still occasionally will go and need to do some poking at something that's in Subversion. I'll just grab the whole history using hgsubversion.

It definitely seems like Mercurial, in a lot of ways, is a more robust and flexible and in some ways advanced source control client than Git is. But it seems that a large part of the reason that Git has seen such large adoption is due to the prevalence of things like GitHub and GitLab. I know that Bitbucket exists for hosting Mercurial repositories. I'm curious if you know of any other noteworthy Mercurial hosting options, and if you think that the dearth of available options is partially due to the fact that Mercurial actually ships with a functional server built in as part of the distribution.

Yeah. I don't know. I actually attribute most of Mercurial's sort of marketing problems at this point to GitHub. I think that if the tools were compared in a vacuum, where you could only use Git or you could only use Mercurial, no external anything, Mercurial would come out ahead pretty much every time. But then there's the existence of GitHub, which sort of has social-network-ish lock-in type effects because everybody's there, and you see the occasional comment on Hacker News where people are grumpy because a project isn't hosted on GitHub: "Oh, I guess I won't bother contributing then." I really think that it's kind of a Mercurial versus GitHub competition, and, you know, that's kinda hard to win as an open source project because they're a pretty decent sized company these days. Bitbucket is a pretty good Mercurial host. I don't know of any others at the moment that are notable. Google Code was actually pretty good when they were still around, but they're gone now.

What about open source offerings? You know, something analogous to GitLab maybe?

Yeah. Kallithea is pretty good. I know a couple of companies that are using that, and the built-in server is actually pretty good. For my own little server I have here at home, I use a thing called mercurial-server, which is really just a thing that helps you craft the .ssh stuff correctly. So everybody connects as the hg user, and what they can access is determined by which SSH key they present, which is pretty minimalistic. And then for the things that should be public, I just run hgweb in a standard WSGI container, and there it is.

There's another one that I had used, you know, a few years ago. I don't know if it's still around. It's called RhodeCode, and it was actually a combination of both Git and Mercurial hosting.

So Kallithea is a fork of RhodeCode. RhodeCode went proprietary at one point.

Okay.

Or sort of proprietary. There's some inside baseball I won't get into there.

Certainly. Oh, yeah. And so for somebody who is completely sold on Mercurial and wants to convince everybody at their work that that's what they should switch to, do you have any advice or potential arguments that they could leverage to try and convince their coworkers and bosses that that would be the right approach?

I mean, the biggest argument I'm aware of right now is just go find the Facebook blog posts about how successful it's been for them. They have a bigger repository than you do, almost certainly, because, like, the entire Facebook web app is one big old repo, and all of its dependencies. And that sounds kind of crazy and backwards to a lot of people, but monorepos are really nice, having these big monolithic trees. Google gave a really good talk at a conference called @Scale last year, I think, about their monorepo. Mercurial is capable of handling repositories with more files in them than Git without getting painfully slow. So I think in Facebook's blog post, they talked about rebase operations being 50 (that's a five followed by a zero) times faster for their Mercurial users than for their Git users.

Yeah. And that's interesting given the fact that one of the common arguments leveraged by users of Git against Mercurial is the comparative speed on the smaller side of repositories because of Git being primarily written in C.

Yeah. And part of that's sort of unfair historical baggage in my estimation. There was a dark time when a lot of people installed Mercurial using easy_install, before pip was the done thing. And easy_install, for those who didn't live through that dark time, did a bunch of weird things with setuptools and, like, kind of mangled your sitecustomize file in really, really weird ways. And so you'd have these people who were complaining that Mercurial was super slow: "Like, it takes 2 seconds to do anything." And you'd go look at their system, and it was not that Mercurial was slow. It was that their Python interpreter was bogged down by all the gunk they'd installed. And so Mercurial was taking 2 seconds to do anything because 1.9 seconds of that was Python interpreter startup and processing sitecustomize.

Oh, wow.
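The startup overhead described here can be measured directly. The following is a minimal sketch, assuming only the standard library; the helper name `startup_seconds` is made up for illustration. It times how long the current interpreter takes to start and exit, once normally and once with the `-S` flag, which skips importing the `site` module (the step that processes site-packages and `sitecustomize`).

```python
import subprocess
import sys
import time


def startup_seconds(extra_args=(), runs=5):
    """Return the best wall-clock time for starting Python and exiting."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"with site processing:   {startup_seconds():.3f}s")
    print(f"without site (-S flag): {startup_seconds(['-S']):.3f}s")
```

On a clean installation the two numbers should be close; a large gap suggests that whatever was installed into the interpreter's site configuration is slowing down every Python program, not just one tool.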

Yeah. So, like, on my machine right now, if I just do, you know, python -c print, that takes, like, 0.02 seconds, and time hg version is 0.1 seconds. So it can start up fast. It's just that if you do bad things to your Python, we suffer too. Python has been both good for us and bad for us in places, and that's one of the places where it sometimes hurts.

Yeah. And also in modern history, Python itself has gained a lot of speed boosts in various areas, including startup time. So I'm sure that has improved the story these days as well.

Yeah. That was actually why we didn't look at Python 3 at all until about 2 years ago. We would kick the tires periodically, and it would be a whole lot of infrastructure churn and things we didn't like, and then functionality that actually broke us, combined with much slower interpreter startup time. We were like, yeah, we're just gonna not worry about this for now. So some of the changes that landed in 3.5, as I understand it, are because we and the Twisted project complained and said it is all but impossible to port our projects to Python 3. Please, please listen to us. We're not joking. And they finally listened.

What were some of those changes out of curiosity?

The big one for us was being able to use the format operator on byte strings.

Oh.
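The feature referred to here is percent-formatting on `bytes`, which Python 3.5 added through PEP 461. A minimal sketch of what it allows follows; the `diff_header` helper and the file name are hypothetical and are not taken from Mercurial's code.

```python
def diff_header(old_path, new_path):
    """Build a unified-diff style header purely from bytes, with no decoding."""
    return b"--- a/%s\n+++ b/%s\n" % (old_path, new_path)


# Hypothetical file name stored as Latin-1 bytes. It is not valid UTF-8,
# so decoding it as UTF-8 just to format a header would raise an error.
path = b"caf\xe9.txt"
header = diff_header(path, path)
```

Because every operand stays a byte string, the header can be written to the terminal exactly as stored, whatever encoding the name happened to use.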

Because Mercurial is intentionally agnostic about the bits you're storing in it. We don't have a different delta algorithm for binary files versus text files or anything like that. Data is data, and we compute deltas and store them. So we don't actually know, when you run hg diff, what text encoding the file is in. And worse, the old version of the file and the new version of the file could actually be in different string encodings. So, like, you might have had a file that was Latin-1, and then it got turned into UTF-16. And UTF-16 actually looks like binary data to us, because it usually contains a lot of nulls. But if you ask us for a diff and you say, no, really, treat this as text, we still just need to print you a diff, and it's gonna look like garbage on your terminal, but you get what you pay for or something. And so it was actually really hard to convince the upstream Python developers: no, really, we need to print byte strings to the terminal. We need to do string interpolation with byte strings. We know what we're doing. Help.

And we ran into the same thing with some of the OS path functions, like listing a directory.

For a while, you could only give it a Unicode path to list, and it would only return Unicode strings back. And anything that didn't parse from a byte string from the file system into Unicode using whatever decoding mechanism it was choosing, whether that was UTF-8 or something else, it would just silently ignore, which is catastrophic for us because, in the Unix tradition, file names are opaque bags of bytes.

Right. Right.
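In current Python 3 releases, the standard library can treat file names as bytes: `os.listdir` returns `bytes` names when given a `bytes` path, and `os.fsdecode` and `os.fsencode` round-trip undecodable bytes on POSIX systems. The sketch below assumes a POSIX file system that accepts arbitrary byte names (for example ext4 on Linux); the helper names and the file name are hypothetical.

```python
import os
import tempfile


def raw_names(directory):
    """List a directory as bytes; a bytes argument yields bytes names."""
    return sorted(os.listdir(os.fsencode(directory)))


def round_trips(raw_name):
    """Check that a raw name survives decoding to str and encoding back."""
    return os.fsencode(os.fsdecode(raw_name)) == raw_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
        name = b"caf\xe9.txt"
        with open(os.path.join(os.fsencode(tmp), name), "wb"):
            pass
        print(raw_names(tmp))
        print(round_trips(name))
```

Nothing is dropped from the listing, because no decoding step stands between the file system and the caller.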

And it is possible, but unlikely, to have a repository that has Latin-1 file names and UTF-8 file names and, my very favorite string encoding for file names, Shift JIS file names. Shift JIS is notable because it's a widely used ASCII-incompatible string encoding mechanism, and it's common in file names in Japan.
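One concrete way Shift JIS trips up byte-oriented code is that the second byte of a two-byte character can fall in the ASCII range. A small sketch using Python's built-in `shift_jis` codec; the variable names are illustrative only.

```python
# The katakana character "so" encodes to two bytes in Shift JIS,
# and the second byte is 0x5C, the same value as an ASCII backslash.
raw = "ソ".encode("shift_jis")

has_backslash_byte = b"\\" in raw

# The same bytes are not valid UTF-8, so a tool that assumes UTF-8
# file names cannot decode them.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

Code that scans a file name for path separators or escape characters byte by byte can therefore misread such a name unless it keeps the bytes opaque.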

So can you share some of the most recent features that have been added to Mercurial and some of the future plans?

Yeah. So some of the recent stuff, just looking at our release page for 3.7 and 3.8: we have a better delta mechanism that we finally turned on in 3.7. I think that, yeah, that had been in the code base for several years, but we kept finding little things where it made things worse, and finally, that's done. Windows users will be glad to know that as of 3.7, we have wheel packages, so you don't necessarily need to be able to compile Mercurial yourself. A handful of other little things there. And then 3.8 was actually released, like, 2 days ago. You actually want 3.8.1 because 3.8 accidentally didn't include a security vulnerability fix. chg is now shipping with Mercurial, which is a server, and then you run a tiny C binary that just speaks to the long-lived Mercurial. So it sidesteps the Python interpreter's startup time problem entirely.
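The general pattern described here, a long-lived server plus a thin client, can be sketched in a few lines. This is only an illustration of the architecture: it does not reproduce chg's actual protocol or transport, it uses a localhost TCP socket for portability, and every name in it is hypothetical.

```python
import socket
import socketserver
import threading


class CommandHandler(socketserver.StreamRequestHandler):
    """Handle one request: read a command line, write back a reply."""

    def handle(self):
        command = self.rfile.readline().strip()
        # A real server would dispatch to already-imported command code here,
        # so the interpreter startup cost is paid only once.
        self.wfile.write(b"ran: " + command + b"\n")


def start_server():
    """Start the long-lived server on an ephemeral localhost port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_command(address, command):
    """Thin client: connect, send the command, return the reply line."""
    with socket.create_connection(address) as conn:
        conn.sendall(command + b"\n")
        return conn.makefile("rb").readline().rstrip(b"\n")


if __name__ == "__main__":
    server = start_server()
    print(run_command(server.server_address, b"status"))
    server.shutdown()
    server.server_close()
```

The client does almost nothing besides forwarding the request, which is why it can be written as a small compiled binary that starts quickly.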

Rebase got a little smarter about picking its destinations. There are some nice new toys in the templater about defining aliases for the templater. Copy detection is a bunch faster. Our JSON output should no longer be invalid in some random cases. We used the Hypothesis library to fuzz our JSON encoding.

Oh, neat.

Yeah. Yeah. Somebody managed to convince its author to come to a sprint we had in London because I guess he's local to London. And he just started letting Hypothesis run on our code base and, you know, started shaking loose a bunch of weird bugs.
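The idea behind this kind of fuzzing is to state a property, for example that every encoded string parses back to the original, and then throw many generated inputs at it. The sketch below is a hand-rolled, standard-library stand-in for what Hypothesis automates (Hypothesis also generates inputs more cleverly and shrinks failing cases). The encoder is a hypothetical piece of code under test, not Mercurial's.

```python
import json
import random


def encode_json_string(text):
    """Hand-written JSON string encoder (hypothetical code under test)."""
    out = ['"']
    for ch in text:
        if ch == '"':
            out.append('\\"')
        elif ch == "\\":
            out.append("\\\\")
        elif ord(ch) < 0x20:
            # Control characters must be escaped to produce valid JSON.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


def fuzz_round_trip(iterations=2000, seed=0):
    """Property: decoding the encoder's output gives back the input."""
    rng = random.Random(seed)
    alphabet = ['"', "\\", "\n", "\t", "\x00", "\x1f", "a", "é", "日", " "]
    for _ in range(iterations):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        encoded = encode_json_string(text)
        assert json.loads(encoded) == text, (text, encoded)
    return iterations
```

Removing the control-character branch from the encoder makes the property fail quickly, which is the sort of "invalid in some random cases" bug this style of testing tends to find.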

We actually used a similar thing to Hypothesis called American Fuzzy Lop recently on our C code and found a probably exploitable bug in our C code. So we had to do a release for that about a month ago. I mentioned fsmonitor. That's actually pretty great. That's worth using pretty much all the time as far as I can tell. There's an automv extension that attempts to detect file copies and renames automatically when you commit, so you don't have to tell the tool. It just figures it out. It's a time-based release schedule, so not every release has a big headline feature.

Right. That makes perfect sense.

A lot of things are going on behind the scenes right now for this sort of ridiculously large scale repository problem that is becoming more and more prevalent and that more and more places wanna solve. So Facebook was kind of the first driver of that, and now there are some other people poking in the same direction. Where version control is going? I think more hybrid models are probably coming. A lot of people don't really need the full distributed workflow. I mean, the popularity of GitHub is a testament to that in and of itself. Having the local history is nice. Being able to make local commits is basically like oxygen to me. I wouldn't really wanna work without it. But at the same time, companies really just need one history and that sort of thing. I think we're really onto something with changeset evolution too. I've been using the sort of pre-alpha version of it for three-ish years. It was another one of those things that sounded kinda great when I first heard about it. And then when I started using it, it rapidly went from, this is kind of great, to, I basically never wanna use a tool that doesn't do this. And I'm hoping that we will do a good enough job with this that every other version control tool will rip it off. You know, either we'll achieve world domination because we have it, or everyone else is going to build it again. One of those two, I hope, happens.

So with regards to the hybrid model that you're talking about, you know, one of the complaints I hear with regards to Git, and I'm sure Mercurial, is that developers really just don't wanna deal with the concept of having to maintain this local, kind of detached repository and then have to maintain the notion of origin and manage the push process or whatever the case may be. They really just wanna think of it in terms of: I wanna make my changes, I wanna make my commits, and then at the end of the day, I wanna check it in. Like, do you think the hybrid model will return some of that simplicity to developers that they feel pushed away from with the DVCS?

I think we'll probably end up with some of that. Bazaar actually had a feature. I haven't used Bazaar pretty much at all, but one of the features they had, which I think was popular, was this notion of a bound checkout where as soon as you did a commit in that checkout, it just immediately pushed it as well. I think there's some value in that for sort of the more casual user of source control. There was a really interesting usability study done by some UX researchers, which I can try and track down after we're done talking and send you a link to, that basically watched users interact with their source control tool and figured out how they use them. And most users are fairly unsophisticated and only have a handful of commands they use. And it's sort of this spell book of things they know how to type. And if they get outside the lines that they're used to, they go to the person, and it's usually one person at an entire company, or an entire team if you're at a big company, that can bail them out and actually understands the tool. And so I think there's room for the centralized/decentralized split going away to an extent and having a lot of the complexity be more opt-in.

I think that makes a lot of sense because I think you're right. I think you really hit on something with regards to the very short spell book, the, you know, cheat sheet of commands that developers wanna use. And I think that's part of why I see them struggle so hard with adopting Git, because that's very hard with Git. It's very easy to wind up in a situation where your spell book doesn't have the necessary incantations. It's like, oh, now you have to do a complicated rebase operation and then massage your repository thusly, and then you can merge without conflicts. And developers just look at you like, really? They really just do not wanna deal with all this. The SCM is not something that they want to have to think about or wrangle with, really. They just wanna be able to do their little thing, which I kinda understand. Right? I mean, they're subject area experts, and they want to be experts on developing code for their particular niche or application or whatever. They don't wanna have to become DVCS experts, and I think that would actually be a step forward, to make that possible.

Right. And for as popular as it is these days to hate on Subversion and malign Subversion for the terrible tool that it was, Subversion was a revelation when it came out,

both because it was blindingly fast compared to CVS, which a lot of people forget, and also because the Subversion developers (and, you know, full disclosure, some of them are friends of mine) thought very carefully about every flag that got added to that tool. They understood, as I hope we mostly do in the Mercurial community, that every feature you add is an albatross. And I remember seeing discussions about, oh, we wanna add a new flag to this command, and I think it's gonna get used a lot, so I wanna give it a short flag. So instead of --foo-bar, you know, -F, or something like that. And they started having this long thread on the mailing list about whether or not it was worth spending one of the 52 bullets. Right? Because with each command, you have 52 bullets in the gun for a one-character flag. And when the gun's empty, it's empty. You don't get any of them back. And there was a real concern about whether or not they could spend another one. I don't remember any more of the context than that, but that really stuck with me from when I worked on Subversion a little as a hobbyist.
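The figure of 52 is simply the 26 lowercase plus 26 uppercase ASCII letters available as one-character flags. A tiny sketch of that budget; the helper name and the set of already-used flags are made up for illustration.

```python
import string


def remaining_short_flags(used):
    """Return the single-letter flags still unclaimed for a command."""
    return sorted(set(string.ascii_letters) - set(used))


total_budget = len(string.ascii_letters)
# Hypothetical command that has already claimed -r, -v, -q and -F.
left = remaining_short_flags("rvqF")
```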

I think Subversion did a lot of things really well, a lot of things very right, and I think UX is one of those things. And I think that's why, when companies move from Subversion to Git, I've actually seen a lot of pushback from developers saying, you know, Subversion works great. I love Subversion. And they, in a sense, don't care about the advantages that Git brings to the table because very often, the pain that Git solves is the giant merge that you have to do when, okay, now you release: branches, time to merge back to, you know, the main line or whatever the case may be. That was typically Bob who does the merges, or, you know, one particular victim developer who had to get stuck with that work.

So funny story about that. At my last company, I had hgsubversion basically working. And we had this rotation in our team where, every sprint (we were doing scrum, or whatever scrum means; we were doing something we called scrum, right?), somebody got to be the release manager. And by got to be, I mean had to be. And that meant you were doing all of the integration merges of the feature branches at the end of the sprint. And at some point, it was my turn, and we always budgeted, I think, 2 whole days to do all the merges for all the feature branches. And I think it took me about 45 minutes.

Wow.

And the PM looks at me, and he's like, what did you do? And I'm like, oh, I did all the merges in Mercurial and then, like, did a little bit of poking and then synced them all back into Subversion, and I'm done. And I accidentally became the guy that does all the merges because, well, it was quick, and at the time, Subversion's merge logic was not as good as it is now. And so I was getting a better end result with no human interaction, and things were just working, as opposed to if you did it in Subversion, it was gonna be one branch at a time and kinda tedious and a bag of hurt. I think most of my team ended up using hgsubversion by the time I left the company. And as I was leaving, they were starting to switch to Mercurial, which was kinda funny timing-wise. I never got to use Mercurial there, but I won the war, sort of.

So do you have any other questions that you would like to cover or any other topics that you'd like to bring up?

I don't think so. The other thing that we've got coming in the future, I'm hoping, is a lot more upstreaming of stuff we've learned from big companies. Big companies are so nice as a developer tools open source project because they can do things like take every single interaction with Mercurial and store it in a big old database. And then you can say, how often does this get used? How often does this get used and then the user runs help immediately afterwards? And you actually start figuring out what your rough edges are in a more structured way. So Facebook has an extension that tweaks a bunch of the default behaviors that I'm starting to look at as a guideline for ways we could have an opt-in flag to make things a little bit less surprising.
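The kind of question raised here, how often a command is immediately followed by a call to help, is easy to sketch once interactions are logged. The log format below (a flat list of command names for one user session) and the function name are assumptions made for illustration, not a description of any real telemetry system.

```python
from collections import Counter


def help_follow_rates(commands):
    """For each command, the fraction of its runs immediately followed by 'help'."""
    runs = Counter()
    followed = Counter()
    for current, following in zip(commands, commands[1:] + [None]):
        if current == "help":
            continue
        runs[current] += 1
        if following == "help":
            followed[current] += 1
    return {cmd: followed[cmd] / runs[cmd] for cmd in runs}


# Hypothetical session log.
session = ["status", "rebase", "help", "commit", "rebase", "help", "rebase"]
rates = help_follow_rates(session)
```

Commands with a high rate are candidates for clearer output or better defaults, which is the "rough edges" signal described above.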

Yeah. That's definitely useful, having that rich history from a number of different users all interacting with different portions of such a large code base.

Yeah. And it's a broad spectrum of users. You had users who refused to move off of Git until they had their one pet feature, a handful of which we picked up in Mercurial. And you had users who, you know, as you say, wanna do their job but don't really care what the tool is. The tool's mostly in their way, and we got really good usability feedback from those users too.

So for anybody who wants to follow you and keep up to date with what you're doing, what would be the best way for them to do that?

I have a Twitter account that I don't use a whole lot. It's durin42. And, yeah, most of what I work on in open source is on Mercurial. I have a bunch of things that I want to have time for, but I never seem to.

So with that, we will move on into the picks.

For my picks today, I'm going to choose a book called Sapiens: A Brief History of Humankind. I was listening to it as an audiobook and just recently finished it, and it's actually a really well done treatment of the history of Homo sapiens, back from prehistory and when we branched off of Neanderthal and Cro-Magnon man, and the impacts that we had as we spread out into the world, both on the different native species and also the other humanoid species, and all the way up through to the modern era, taken through the lenses of sociology, psychology, economics, ecology. It's just a really well done book, definitely worth a listen, and it leaves you with some interesting questions to ponder at the end of it.

And my next pick is going to be a keynote presentation by Vanessa Hurst for, I believe it was, the Velocity Conference at O'Reilly. It's called Cultures of Continuous Learning. It's a pretty short video just talking about the benefits and some of the approaches taken to foster continuous learning in your organization and what it can yield for the people who work there. And I will pass it on to you, Chris.

Thanks.

My first pick is a series of videos published by O'Reilly. As a Safari Books Online subscriber, I get this as part of my subscription. It's called Intro to Django; not very creative, but it's a really, really great series of videos. The woman who does the narration is really very good. In fact, she's very clear, easy to understand. The videos are broken up into very sort of bite-sized chunks. It's perfect. It's, you know, very easy to work alongside her with the videos and build the examples being built. It's been really eye-opening for me. Great stuff.

My next pick is a pub... pubcast, jeepers. A podcast from NPR/PRX called Transistor. It's a science podcast, but each episode is something really kind of unusual and/or mysterious in the science world. Really, really good stuff. My last pick is also a podcast from NPR/PRX, called Embedded. It's basically an investigative reporting podcast. Each episode delves into a really different journalistic experience, like, you know, gang violence in Guatemala or other things. It's just really, really good investigative journalism, like, the kind of thing that you wish, you know, was going on on television but pretty much isn't. Really interesting stuff. Augie, what kind of picks do you have for us?

So I've got a couple of books. One is the tip of the iceberg on a series: Leviathan Wakes. They're both science fiction books. This one is being turned into the TV series The Expanse, which is how I found out about it, and I've read all of them that I've been able to get my hands on from the library. And then the other book is The Three-Body Problem, which won one of the big science fiction awards recently. I don't remember which one. It's a translation of a Chinese science fiction book, and it's fantastic. And the translation is really well done. It brings you up to speed on all the cultural bits of Chinese history you need to know to understand the story's framing. Both of those are really good. And then a piece of software that I've been enjoying a lot lately is Prometheus, which is a monitoring package. I'll dig up a link for you for that because it's kinda hard to search for.

Yeah. I've come across that one. It came from SoundCloud, right?

Yeah. It's the least infuriating monitoring software I've worked with, I think. Monitoring software seems to be necessarily terrible on some axis, and this actually has not annoyed me too much.

Well, we really appreciate you taking the time out of your day to join us and tell everyone about your work with Mercurial and all the different benefits that it has to anybody who is fortunate enough to be able to use it. So thank you very much for that, and I hope you enjoy the rest of your night.

Of course. You too.
