Summary
Digital books are convenient and useful ways to have easy access to large volumes of information. Unfortunately, keeping track of them all can be difficult as you gain more books from different sources. Keeping your reading device synchronized with the material that you want to read is also challenging. In this episode Kovid Goyal explains how he created the Calibre digital library manager to solve these problems for himself, how it grew to be the most popular application for organizing ebooks, and how it works under the covers. Calibre is an incredibly useful piece of software with a lot of hidden complexity and a great story behind it.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at podcastinit.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Kovid Goyal about Calibre, the powerful and free ebook management tool
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what Calibre is and how the project got started?
- How are you able to keep up to date with device support in Calibre, given the continual release of new devices and platforms that a user can read ebooks on?
- What are the main features of Calibre?
- What are some of the most interesting and most popular plugins that have been created for Calibre?
- Can you describe the software architecture for the project and how it has evolved since you first started working on it?
- You have been maintaining and improving Calibre for a long time now. What is your motivation to keep working on it?
- How has the focus of the project and the primary use cases changed over the years that you have been working on it?
- In addition to its longevity, Calibre has also become a de-facto standard for ebook management. What is your opinion as to why it has gained and kept its popularity?
- What are some of the competing options and how does Calibre differentiate from them?
- In addition to the myriad devices and platforms, there is a significant amount of complexity involved in supporting the different ebook formats. What have been the most challenging or complex aspects of managing and converting between the formats?
- One of the challenges around maintaining a private library of electronic resources is the prevalence of DRM restricted content available through major publishers and retailers. What are your thoughts on the current state of digital book marketplaces?
- What was your motivation for implementing Calibre in Python?
- If you were to start the project over today would you make the same choice?
- Are there any aspects of the project that you would implement differently if you were starting over?
- What are your plans for the future of Calibre?
Keep In Touch
- kovidgoyal on GitHub
- Website
- Patreon
Picks
- Tobias
- American Gods by Neil Gaiman
- Kovid
- Into Thin Air by Jon Krakauer
About how an expedition to climb Everest went wrong. Wonderful account of the difficulties of high altitude mountaineering and the determination it needs.
- The Steerswoman’s Road by Rosemary Kirstein
About the spirit of scientific enquiry in a fallen civilization on an alien planet with partial terraforming that is slowly failing.
Links
- Calibre
- KDE
- Caltech
- Sony PRS500
- Linux
- Kindle
- Kobo
- ePUB
- Calibre Recipes
- RapydScript NG
- Goodreads
- Qt
- PyQt
- build-calibre
- Kitty
- DRM (Digital Rights Management)
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so check out Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale. Go to podcastinit.com/linode today to get a $20 credit and launch a new server in under a minute. And visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. And go to podcastinit.com/chat to join the community and keep the conversation going. Your host as usual is Tobias Macey, and today I'm interviewing Kovid Goyal about Calibre, the powerful and free ebook management tool. So, Kovid, could you start by introducing yourself? Hi, Tobias. Thanks for having me. My name is Kovid Goyal, and I'm the creator of Calibre. I've been working on Calibre for almost 12 years. It started out as a simple way to use
[00:01:12] Unknown:
ebook devices on Linux, which was my operating system of choice. But since then, it has morphed into, I think, the most popular and widely used ebook management solution out there. And do you remember how you first got introduced to Python? Let's see. I think there were two things that introduced me to Python. One was a fun one. I used to use the KDE desktop, and this is way back, 15 years ago. And there was something called SuperKaramba that used to draw fun little widgets on the desktop. And these were programmed using Python. I think that was my first introduction to Python.
Also, at the time, I was a grad student at Caltech, and in some of my numerical work on quantum computing simulations I used to use Python. Those were the two things that introduced me to Python, so when it came time to work on Calibre, it was the natural choice.
[00:02:04] Unknown:
And so can you give a bit of a description about what Calibre is and how the project first got started?
[00:02:10] Unknown:
Alright. So, ebook readers are dedicated devices that are used for reading books. In 2006, Sony released the first ebook reader in the United States. I think it was called the PRS-500. I was a grad student at the time, and I'm a bookworm, so I used to read a lot. I actually used to have a handheld GPS called the Mitac Mio, and I had sort of hacked it to be able to read simple text files on it. So naturally, when a dedicated device to do this was released, I was all over it and purchased it the first day. Sadly, when I got it home, I realized that it did not work with Linux. It used a proprietary USB protocol, and it required Sony software, which was available only for Windows and Mac.
So, you know, being aggressive and having a bit of time on my hands, I decided to reverse engineer it. There was this community of enthusiasts on mobileread.com. We got together, and some of them helped me reverse engineer the protocol. And I wrote something called libprs500, which basically had the ability to, one, talk to the device when running on any operating system, and two, convert ebook files to the format that Sony used for its devices, something called LRF, which was a crazy binary format that should die in a fire. So that was how Calibre started. libprs500 initially was just a hobby project that I released to the world in the hope that it would be of use to other people. It turned out to be quite popular. So, eventually, in a couple of years, I renamed it to Calibre and made it work with a whole bunch of different reading devices from different companies and support a whole bunch of ebook formats. So it's been growing ever since.
[00:03:56] Unknown:
So when you first started working on it with the Sony ebook reader, the landscape of devices for dedicated e-reading was still fairly small. But over the intervening years, there have been a large number of new e-readers and tablets and phones and other types of devices that people might want to load their ebooks onto. So I'm wondering how you're able to keep up to date with all of the different devices and support interfacing with them in terms of the wire protocols and being able to load and retrieve the file formats from those devices, particularly given the rate of change within that product category?
[00:04:39] Unknown:
So, the key to doing this is that Calibre itself has excellent device introspection capabilities. You can plug in an unknown device and run a command which will output a whole bunch of information about the device and how the device works. That allows me to support a lot of devices without actually having access to those devices, just from the introspection information. The other thing is that while it is true that, to start with, there were a whole bunch of devices that were all very different, nowadays things have more or less settled down. There are basically two large families of devices: the Kindles from Amazon and the Kobo devices from Kobo. The rest are usually phones, tablets, and those kinds of generic devices. And those now speak standardized protocols like mass storage protocols.
So most of the device protocol development work in Calibre is now more or less done. It's pretty stable. Calibre can connect to your device over a wire or wirelessly. Another big thing that helps me keep Calibre supporting new devices is that we have a very active and helpful user community. People who get access to a new device will typically run the introspection capabilities and send me the information very promptly, so I can then release support for the device in a day or two. In fact, just last week, Amazon released a new Kindle, the Paperwhite 2018.
And so within a day of its being released, I received a bug report with the debug information. A couple of days later, I made a new Calibre release supporting the device, although I don't have the device myself. And when it comes to ebook formats, there are now basically two major ebook formats left. There's EPUB, which is something of a standard. And there is the Kindle family of formats, which actually now has four or five different incompatible formats, and Amazon keeps producing new ones every few years. So those are the two major formats that people want to convert ebooks to. There is, of course, a large number of formats that people convert from. For example, you can convert from office documents, that is DOCX files, LibreOffice ODT files, RTF files, text files, PDF files, and various older ebook formats such as FB2 or MOBI, and so on. The list is endless. I think if you look at the Calibre menu, you'll see there are some twenty or so formats.
Yeah. So the design of Calibre that enables it to do all this is that it treats conversion as a pipeline. There are input plugins at one end, there's common code in the middle, and there are output plugins at the other end. The input plugins all convert books, whatever the initial format, into an HTML-based internal format. There's a conversion pipeline that works on that internal format to do all the various things that Calibre conversion supports. And then the output plugins take that internal format and produce whatever the output format is. This design was made very early in Calibre's history, back when I didn't really know what I was doing. But nonetheless, it's been pretty good for us and it has stayed largely unchanged in its overall design for over a decade, which speaks well of its
[00:08:00] Unknown:
initial construction. And as you mentioned, Calibre has a large community. It's been around for a long time. So I'm curious if you can discuss some of the main features of Calibre and the ones that you have gotten the most feedback about that people find useful for their own particular use cases? So the main features of Calibre are ebook conversion, obviously, getting
[00:08:23] Unknown:
books from one format to another, interfacing with ebook devices, and then managing the collection of books on your devices. So Calibre will automatically convert books to the optimal format for a device when you plug it in and send the books to that device. Another feature that's very widely used, and actually has the most active set of third-party contributors, is the recipe system. Calibre has a system whereby you can write small Python scripts, called recipes, that allow Calibre to scrape websites and convert them into ebooks. They're typically used for publicly accessible news websites, for example the New York Times, The Economist, the Wall Street Journal, and so on. There are literally thousands of recipes in Calibre. Most of these recipes have been contributed by users. And since websites' formats keep changing, this whole area of Calibre is probably the most actively developed, because it requires endless updates to the recipes as website formats change. The nice thing about the system is that you can schedule Calibre to download your news every morning, or every night, and send it to your device. And that's pretty widely used. It has millions of users, just that feature alone. And what else? Calibre has a content server, so it can expose your Calibre library over the Internet. You can access it using a web browser. You can add books. You can merge. You can edit metadata.
All the major features of Calibre can be accessed via the content server. In fact, the content server was relatively recently redesigned for Calibre 3 and rewritten using RapydScript, which is a sort of Python-like language I worked on for Calibre that allows me to write Python in the browser, basically. So the content server is still being worked on. It has an integrated ebook viewer in it, so you can read books in a browser without needing any special software on your phone. And let's see, what other major features? Oh, it has an editor, which was a big addition. Originally, there was this program called Sigil, which people used to use to edit EPUB format books. At one point, its maintainer stepped down, and for a while it looked like no one was willing to take over. So I decided to implement an editor in Calibre itself. And the editor grew to be able to edit EPUB as well as AZW3, which is the Kindle format closest to EPUB.
You can use the editor to edit books. It has a live, as-you-type preview. It also allows you to do things like embed fonts, spell check, grammar check, and so on and so forth. So, yeah, these are, I think, some of the highlights of Calibre's features.
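The recipe idea described in this answer, a small Python script that turns a news source into something an ebook converter can consume, can be illustrated with a standard-library sketch. This is not Calibre's recipe API: the `Article` type, the functions `parse_feed` and `articles_to_html`, and the sample feed are all made up for illustration, and a real recipe would fetch pages over the network rather than read an embedded string.

```python
import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    url: str
    summary: str


# Hypothetical sample feed; a real recipe would download this from a news site.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example News</title>
<item><title>First &amp; foremost</title><link>https://example.com/1</link>
<description>Opening story.</description></item>
<item><title>Second story</title><link>https://example.com/2</link>
<description>Follow-up.</description></item>
</channel></rss>"""


def parse_feed(rss_text: str, max_articles: int = 10) -> list[Article]:
    """Extract at most max_articles items from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    articles: list[Article] = []
    for item in root.iter("item"):
        if len(articles) >= max_articles:
            break
        articles.append(Article(
            title=item.findtext("title", default="Untitled"),
            url=item.findtext("link", default=""),
            summary=item.findtext("description", default=""),
        ))
    return articles


def articles_to_html(feed_title: str, articles: list[Article]) -> str:
    """Render the articles as a single HTML document, escaping all feed text."""
    parts = [f"<html><head><title>{html.escape(feed_title)}</title></head><body>"]
    for article in articles:
        parts.append(f"<h2>{html.escape(article.title)}</h2>")
        parts.append(f"<p>{html.escape(article.summary)}</p>")
        parts.append(f'<p><a href="{html.escape(article.url)}">Source</a></p>')
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(articles_to_html("Example News", parse_feed(SAMPLE_RSS)))
```

Because the only site-specific part is how articles are located in the feed, a sketch like this also shows why recipes need frequent updates: when a site changes its markup, only that small extraction step has to change.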
[00:11:07] Unknown:
And in addition to all of the built-in capabilities of the core project, there is also a plugin ecosystem that has continued to grow. So I'm wondering, one, how it's implemented as far as being able to extend the project and some of the different integration points, and then also some of the most interesting and most popular plugins that have been created for Calibre.
[00:11:32] Unknown:
Right. So the internal design of Calibre is highly motivated by the principle of modularity. Calibre itself is implemented using plugins. There are a whole bunch of built-in plugins. If you start Calibre and go to Preferences and then Plugins, you'll see that there are literally hundreds of built-in plugins. Everything is a plugin. Conversion input and output formats are plugins. For metadata download, each individual source of metadata is a plugin. The viewer has plugins. Each individual ebook device is a plugin, and so on and so forth. So the design of Calibre is highly plugin based internally anyway, which makes adding support for third-party plugins really easy. It's really easy to create a Calibre plugin. You create a zip file with your Python files inside it, follow the Calibre plugin API, and you can basically do anything you want in a Calibre plugin. There are no restrictions: the plugin runs in an environment that has full access to the Calibre program, all its capabilities, all the state, everything. This is, of course, dangerous in something like a browser, where people can't always trust plugins. But Calibre is, in that way, much smaller and has a more friendly community. So far, there has been no plugin malware. There are different types of plugins for different parts of Calibre: parts that work with the database, parts that work with conversion, parts that work with devices, and so on. So if you wish to write a Calibre plugin, you just need to read up on the API of that particular part of Calibre and write your plugin. Coming to popular and interesting plugins, there are plugins to, for example, download metadata. For example, there's a Goodreads plugin that's very popular.
It downloads metadata from the Goodreads website. There are plugins for things like converting Wikipedia pages to ebooks. You can just put in the Wikipedia URL, and it will go and download that particular page and convert it into an ebook. This sort of leverages the recipe system as well. There are plugins to do things like count pages in ebooks and put that into your library so you can have an indication of what size your books are. There are plugins that deal with things that Calibre doesn't do out of the box. For example, comics.
Calibre does not write metadata to comic files, the CBZ and CBR files, because there isn't really a good standard for it. So there's a plugin that does that, and it's popular with some portion of the user base. One very popular plugin is the fan fiction plugin, which allows people to download fan fiction from various websites, convert it into books in Calibre, and get them updated automatically when the stories are updated. There are plugins that improve Calibre's integration with particular devices. For instance, there are plugins that do lots of things with Kobo devices.
They auto-convert books to the Kobo-specific KEPUB format, which has certain nice properties compared to the EPUB format on the device. There are plugins that deal with managing collections on the Kindle. This requires rooting the Kindle, so it's not part of Calibre itself. There are plugins that allow you to manage your Calibre interface. There's something called the View Manager plugin that allows you to specify views of your Calibre library containing different subsets of books, and so on. So that's quite popular as well. So, yeah, there are literally hundreds of plugins. There are plugins to automate ebook cleanup tasks. Lots of ebook publishers don't do very well with formatting and coding in many ebooks, so there are plugins to automate cleaning up some of that. Or there are plugins to check the quality of books in your library, flag commonly found mistakes, and show how to correct them. That kind of thing.
Yeah. It's
[00:15:34] Unknown:
it's a good summary, I think. And you mentioned that the entire architecture of Calibre is now highly plugin based, where each of the different core features are themselves plugins to the core engine. I'm wondering if you can discuss a bit further how the overall project is architected and how that has changed over the years that you've been building and maintaining it? So the overall structure of Calibre is highly modular.
[00:16:01] Unknown:
It's sort of inspired by the principle that each part of Calibre should focus on one thing and should export a well-defined interface that the rest of Calibre can use to do that one thing. For example, I talked about the conversion pipeline a little while earlier. There are input plugins, there's this common area or conversion engine, and then there are output plugins. Similarly, in the device subsystem, there's a core engine that has all the common logic for talking to devices, sending books to them, updating metadata, and so on. Then there are plugins with which individual devices override or supply little bits of functionality that the core cannot have, because they are device specific. This basic architecture was baked into Calibre pretty early on, so it hasn't changed much.
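The pipeline shape described here (input plugins, a common engine working on an HTML-based internal format, output plugins) can be sketched in a few lines. This is a toy under stated assumptions, not Calibre's conversion code: the `Book` class, the plugin functions, the `INPUT_PLUGINS` and `OUTPUT_PLUGINS` registries, and `convert` are hypothetical names chosen for illustration.

```python
import html
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Book:
    """Hypothetical intermediate representation: an HTML body plus metadata."""
    html_body: str
    metadata: dict = field(default_factory=dict)


# Input plugin: source format -> intermediate HTML-based Book.
def txt_input(raw: str) -> Book:
    paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
    return Book(html_body="".join(f"<p>{html.escape(p)}</p>" for p in paragraphs))


# Common transform: operates only on the intermediate format.
def set_title(title: str) -> Callable[[Book], Book]:
    def transform(book: Book) -> Book:
        book.metadata["title"] = title
        return book
    return transform


# Output plugins: intermediate Book -> target format.
def html_output(book: Book) -> str:
    title = html.escape(book.metadata.get("title", "Untitled"))
    return (f"<html><head><title>{title}</title></head>"
            f"<body>{book.html_body}</body></html>")


def txt_output(book: Book) -> str:
    text = book.html_body.replace("</p>", "\n\n")  # paragraph ends become blank lines
    text = re.sub(r"<[^>]+>", "", text)            # drop the remaining tags
    return html.unescape(text).strip()


INPUT_PLUGINS = {"txt": txt_input}
OUTPUT_PLUGINS = {"html": html_output, "txt": txt_output}


def convert(raw: str, source: str, target: str,
            transforms: Iterable[Callable[[Book], Book]] = ()) -> str:
    """Run input plugin -> common transforms -> output plugin."""
    book = INPUT_PLUGINS[source](raw)
    for transform in transforms:
        book = transform(book)
    return OUTPUT_PLUGINS[target](book)
```

With this layout, supporting a new source format means adding one entry to `INPUT_PLUGINS`, and every existing transform and output plugin works with it unchanged, which is the property the answer credits for the design lasting over a decade.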
It has served us well, and it shows every sign of continuing to serve us well. So for example, in the recipe system for downloading news into ebooks, each recipe is basically a plugin. There are a whole bunch of built-in ones, and users, not just programmers, can easily create recipes of their own liking, provided there are RSS feeds. So every part of Calibre is basically architected in a modular fashion. There is a core engine that tries to keep all the basic, common, and difficult logic within it. And then there are plugins that provide the bits of functionality that the core engine needs and that are specific to whatever the task at hand is. It's almost as if the core engine is a central command, and each plugin is a sort of worker or processor that does specific tasks at the direction of the central command. So the architecture of Calibre is, in some sense, a central command and control with a whole bunch of worker processes. They're not actually processes, but you can think of them as processes that deal with device-specific or otherwise specific tasks. So that's the basic architecture. The biggest evolution in Calibre has been basically fixing mistakes I made because I was a much worse programmer 12 years ago. Various subsystems of Calibre have been rewritten over time. For example, Calibre 1.0 was a rewrite of the library database back end, and that provided almost an order of magnitude speedup. The thing that made it possible was that the database part was, again, behind a well-defined API. So I could keep that API in place, more or less, and just rewrite the internals so it wouldn't affect the rest of Calibre. So yeah.
I mean, most of Calibre's modern development has been either rewriting old parts of Calibre or adding new features within this overarching framework of the central command with plugins that provide specific implementations.
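The "central command with plugins" structure from this answer can be sketched for the device subsystem. Everything below is an assumption made for illustration rather than Calibre's device API: the class names `DevicePlugin`, `KindleLike`, `KoboLike`, and `Core`, the directory names, and the format preferences are hypothetical, and where this sketch raises an error a real manager would more likely convert the book first.

```python
class DevicePlugin:
    """Common device logic; subclasses override only device-specific bits."""
    name = "Generic device"
    preferred_formats = ("epub",)   # in order of preference
    book_dir = "books"              # hypothetical on-device folder

    def choose_format(self, available):
        """Return the first preferred format that is available, else None."""
        for fmt in self.preferred_formats:
            if fmt in available:
                return fmt
        return None

    def destination_path(self, title, fmt):
        """Build an on-device path with a filesystem-safe file name."""
        safe = "".join(c if c.isalnum() else "_" for c in title)
        return f"{self.book_dir}/{safe}.{fmt}"


class KindleLike(DevicePlugin):
    name = "Kindle-like reader"
    preferred_formats = ("azw3", "mobi")
    book_dir = "documents"


class KoboLike(DevicePlugin):
    name = "Kobo-like reader"
    preferred_formats = ("kepub", "epub")


class Core:
    """Central command: owns the shared workflow and delegates to plugins."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def plan_transfer(self, device_name, title, available_formats):
        plugin = self._plugins[device_name]
        fmt = plugin.choose_format(available_formats)
        if fmt is None:
            # A real manager might convert the book here instead of failing.
            raise ValueError(f"no usable format for {device_name}")
        return plugin.destination_path(title, fmt)
```

For example, after registering `KindleLike()` with a `Core`, calling `plan_transfer("Kindle-like reader", "Into Thin Air", {"epub", "azw3"})` yields `documents/Into_Thin_Air.azw3`. The core never needs to know which device it is talking to, so the internals of either side can be rewritten as long as the small interface between them stays in place.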
[00:19:06] Unknown:
And one of the main ways that people interact with Calibre is via the GUI, which isn't always the strongest piece for Python and its ecosystem. So I'm wondering what you're using for building the graphical interface and any sorts of challenges or issues that you've had to overcome in the process of maintaining Calibre, particularly as GUI libraries have come and gone in the ecosystem?
[00:19:32] Unknown:
So I was fortunate in that I chose Qt and PyQt as the libraries to use right at the beginning. This was because, if you recall, I told you about KDE, and KDE was and is based on Qt. So I chose that right in the beginning, and that's been pretty stable. It's been around for the entire history of Calibre. So Calibre's main GUI is written using Qt. The biggest challenge, really, with the GUI has been this: initially, in the early years, I used to try to give Calibre a platform-specific look and feel. It looked and behaved slightly differently on Macs, on Windows, and on Linux.
But over time, I realized that this is a losing battle, because you end up in an uncanny valley where things are somewhat similar but not quite similar to native programs, the programs that use whatever the default toolkit for that platform is. So you have people who complain that things don't behave the way they should, or the way they do in other programs. And at the same time, you are restricted because things work differently on different platforms. So it's harder to do support, and it's harder to add new features, since each feature has to be tested on each platform. So at some point I changed over to having a unified look and feel by default on all platforms. Calibre looks the same regardless of whether you're running it on Windows, Mac, or Linux. More or less the same: there are things like font rendering and so on that are still platform specific. That was a big simplification and, I think, a big positive thing in Calibre's history. I still get people complaining about Calibre not being a native-looking program. But on the other hand, there are lots of people who like the fact that it has a consistent look and feel across platforms. So, yeah, the GUI part of Calibre has not been a pain point. It's been pretty smooth sailing.
[00:21:28] Unknown:
And I imagine that the choice of Qt for implementing the UI has also helped in terms of the packaging of the project, because that's another area that has been fairly difficult to deal with in the Python ecosystem for a while, particularly given the number of platforms that you're supporting. Right. So packaging Python programs is a huge pain.
[00:21:52] Unknown:
So I ended up writing an entire project of several tens of thousands of lines to package Calibre. It's there on my GitHub; it's called build-calibre. It basically allows you to issue a single command and build Calibre for all its platforms. It builds in chroot jails on Linux, and it builds in VMs for Windows and Macs. It takes care of the collection of dependencies, the packaging of various libraries that are not widely available, the slimming down of Python for embedding into an application, all that sort of stuff. That was a huge thing that has evolved over 10, 12 years and that I have had to maintain personally, because none of the available out-of-the-box solutions for Python packaging are anywhere near where they need to be to work for something at the scale of Calibre. I actually reuse this infrastructure for kitty as well. kitty is only available on macOS and Linux, but the same infrastructure is used to build it for those two platforms. So, essentially, this is my personal Python build infrastructure that I'm going to use for all my Python projects going forward, and it's all written in Python. And as you mentioned,
[00:23:16] Unknown:
the time span that you've been working on Calibre, I'm wondering what has continued to motivate you to keep working on it and how sustainable the project is overall, given the amount of time that must be required to keep up to date with the regular release cycle
[00:23:35] Unknown:
and the improvements and evolution of the project over the years? So, one of the things where I lucked out was my choice of Python. Python has an extremely high development velocity, so I can deliver features, bug fixes, etcetera, with much less effort than using most other languages. Of course, I could be biased by my familiarity with Python, but I strongly feel that this is one of the things that led to Calibre's success. As for my own personal motivation, I've been working full time on Calibre for, I think, eight or nine years now. I actually make enough money from donations and other sources of income that I can afford to work full time on Calibre. It's been pretty sustainable so far; I've been doing it for almost a decade. And technically, it stays interesting enough, because Calibre is large enough and covers a wide enough set of things that there are always new things that I have to learn to implement new features. I've done things as diverse as rendering graphics primitives in the PDF format, writing a new browser-based language, reverse engineering USB protocols, dealing with various networking issues and SSL issues, and writing code to deal with all sorts of binary file formats. So the thing is diverse enough and challenging enough that I don't feel bored working on it. And from time to time, I do different things. Like, I work on another project called kitty, which is a terminal emulator.
So, yeah, I've been pretty consistently motivated to work on Calibre for a very long time now. As long as the income stream remains the way it has been, I think Calibre is pretty stable. Of course, there are no guarantees, but it's pretty sustainable. It has a big community. There are also a few other people who are familiar with the code base. Not as familiar as me, but familiar enough that, if I were hit by a bus, somebody would probably be able to take over. It's always great to hear when a project is successful enough
[00:25:47] Unknown:
to allow the primary developer to dedicate their full attention to it. So it's great that you have continued to receive enough support to make that a viable option. And another thing that I'm curious about is, as we mentioned, the evolution of devices that Calibre is interfacing with. I'm curious how that has also evolved in terms of the use cases and feature sets that people are looking for within Calibre as their needs change, or as their particular usages change in terms of the way that they're consuming and creating and obtaining material for reading on their devices? Mhmm. So,
[00:26:29] Unknown:
Calibre is mostly for bibliophiles, people who like to read long-form literature. Not necessarily literature, but long-form text. That basic use case is pretty much constant. You're only interested in Calibre if you are the kind of person who likes to read long-form text and likes to maintain his or her own collection of such text. Fortunately, there are millions of people in the world who fall in that category, so Calibre has a reasonably large pool of users. The biggest change in Calibre's history has been the rise of the smartphone and the tablet.
To start with, Calibre worked only with dedicated devices: the Sony readers, Kindles, and the like. But nowadays a lot of people use it with a smartphone or tablet. So a big change that had to be made in Calibre was to add the content server, which allows you to access a collection using a browser, read it, manage it, etcetera. That was a big demand from people who use smartphones and tablets, because you don't typically plug those into your computer using USB cables. So having a networked way of accessing Calibre is a must, and I think that we have addressed that pretty comprehensively by now. That was, I think, one of the biggest evolutions that Calibre itself had to go through over the years as people's usage patterns have changed.
[00:28:01] Unknown:
And in addition to the fact that the project has been around for a long time, over that period, it has also, as you mentioned, become sort of the de facto standard for ebook management and library management. So I'm curious what you see as being the reason that it has gained that level of popularity and maintained it, and what some of the other competing options are for somebody who's looking for a way to manage their collection of ebooks
[00:28:29] Unknown:
and digital texts? So I think the biggest single factor in Calibre's success was simply timing. It existed before anything else did, in a much better form. So I told you about how I started working on Calibre when Sony first released its ebook reader. Sony had companion software for its reader that only worked on Windows and Macs, and it was pretty bad. And then Adobe released Adobe Digital Editions, which was also pretty bad. So, essentially, Calibre sort of cornered the market early, gained that momentum, and never lost it. And the secret to not losing it is being responsive to your users: fixing bugs promptly, responding to queries promptly, continually evolving the product. So one of the biggest forms of positive feedback that I get from Calibre users is how happy they are that Calibre has a regular release cycle. It's always innovating, releasing new features, or just fixing bugs, becoming a smoother, faster, more streamlined, more efficient, nicer experience over time. People really appreciate that sense of evolution. The other thing that I've been really careful about is to not make changes just for the sake of change.
So the Calibre UI, if you compare it from 8 years ago and now, is not significantly different. There are no startling new UI paradigms. It still looks more or less the same. I mean, it's smoother, it's more polished, etcetera. But I very rarely break user-facing things, if I can help it at all. So that kind of thing, the stability married to the constant improvements and progress, engenders a lot of loyalty in Calibre's user base. We actually have some statistics about people who donate to Calibre, and a very large fraction of them actually donate more than once, which, at least in my experience, is very unusual.
So the most important thing was the timing. The fact that Calibre existed when nothing else did and was much better than whatever existed at the time. The other big thing is that the main competitors of Calibre in this space have been companies that want to sell ebook devices or create an ecosystem of their own. For example, Apple with iBooks. Calibre's only focus is to be the best ebook manager it can be for its users, whereas its competitors' focuses have often been to lock people into an ecosystem, or to sell more devices, or to promote their ebook format, or, you know, some things that are not directly in the users' interests.
And so the fact that Calibre aligns well with its users' goals, I think, has helped it
[00:31:22] Unknown:
stay ahead of the competition. In addition to the devices and platforms that you're supporting and the multiple formats, there is a significant amount of complexity involved in being able to convert between them and support the different book formats on the different platforms. So I'm curious what you have found to be some of the most challenging or complex aspects of being able to interoperate between the various book formats and being able to convert between them and any sorts of issues that you've encountered in terms of loss of fidelity or loss of information as part of the conversion process?
[00:31:59] Unknown:
So the biggest challenge with dealing with ebook formats is proprietary formats. Amazon, for example, has a bunch of proprietary formats. They all have to be reverse engineered. And, you know, when you're reverse engineering a binary format, you're never going to be 100% sure of how it's intended to work. There is no spec that you can code against. You just have to see what Amazon's software does in certain different situations and sort of try to replicate that behavior in your own software. And it's a moving target, because Amazon happily changes things whenever it likes. So this is the single biggest challenge. I mean, reverse engineering proprietary formats is the single biggest challenge with ebook formats. The other sort of challenge is that different formats have different semantic information. So a trivial but very often encountered problem is that, say, the PDF format has no concept of book covers. You typically just take the first page of the PDF as the cover. But say when you're doing something like updating metadata in a PDF and you want to change the cover, you can't sort of replace the first page, because there's no way to be sure that it's a cover. So you just have to add in a page. And if you do that repeatedly, then you end up with multiple covers. So, yeah, the format has no concept of a cover, and there's no way to reliably detect whether a PDF's first page is a cover or not. Similarly, when you convert between office formats like DOCX and ebook formats, there are a whole bunch of semantics of HTML and CSS, which is what ebooks are based on, that don't match the formatting and display semantics of office formats. You can actually convert from DOCX and to DOCX. So if you do a round trip, the resulting DOCX that you get will look very similar to the original DOCX, usually.
But internally, there'll be a whole bunch of semantic information that will be lost in that process. So it's this kind of mismatch between the capabilities and semantics of formats that is the second biggest challenge in converting between formats. And I guess the other thing is just simply finding the time to do this because, you know, you could cover the 90% use case with 10% of the effort, but getting all the corner cases right, and getting all the weird interactions between things in obscure formats right, is painful and takes a lot of time. So a big help for doing that is having an active and helpful user community who make meaningful bug reports and attach test cases and try to reduce things and simplify things. That makes the work of the developers actually doing the work much easier. Calibre has been fortunate in that its user community is pretty good. You know, people are helpful. People are polite. People are trying to do as much work as they can themselves, not just expecting us to do everything for them. So it works out pretty well in the end. And
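The PDF cover problem described in this answer can be illustrated with a toy model. This is only a sketch, not Calibre's actual code: `ToyPdf` and `set_cover` are hypothetical names invented here, and a real PDF is far more than a list of pages. The point it shows is that, with no "this page is a cover" marker in the format, the only safe update is to prepend a page, so repeated updates accumulate covers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPdf:
    """Hypothetical stand-in for a PDF: just an ordered list of page labels."""
    pages: list = field(default_factory=list)


def set_cover(pdf: ToyPdf, cover: str) -> None:
    """Add a cover to the toy PDF.

    The model has no flag marking a page as a cover, so we cannot tell
    whether pages[0] is an old cover or real content. Replacing it could
    destroy content, so the only safe choice is to prepend.
    """
    pdf.pages.insert(0, cover)


book = ToyPdf(pages=["chapter 1", "chapter 2"])
set_cover(book, "cover v1")
set_cover(book, "cover v2")  # the earlier cover is still there
print(book.pages)
```

After two updates the toy book carries both covers in front of its content, which is the "multiple covers" outcome mentioned in the interview.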
[00:34:44] Unknown:
another aspect of being able to obtain and manage a private library of electronic books is the prevalence of, as you mentioned, proprietary formats and DRM restricted content that is available through major publishers and retailers. So I'm wondering what your thoughts and experiences are on the current state of digital book marketplaces and the availability of unencumbered formats and unencumbered books that people can obtain for being able to read on their own?
[00:35:18] Unknown:
So, as far as formats are concerned, you are best off looking for EPUB. That's the most open, the most documented, and the most widely supported format. And when it comes to DRM, different publishers, retailers, and vendors have different DRM policies. Some of them DRM all their books; some of them DRM books at the author's or publisher's request. So you should look for sources that are DRM free if at all possible, and prefer those to build your collection. But if you can't, personally, I feel that if you bought a book, it's perfectly ethical.
I don't know if it's legal, but it's perfectly ethical to remove the DRM from the book. There are plenty of tools available to do that. Fortunately, none of the DRM systems of major ebook retailers are unbroken. Yeah. So, I mean, if you want to build a collection, you pretty much, in the end, have to deal with DRM, either by boycotting publishers who publish DRM'd books or by removing the DRM. You can't really build a collection that contains only DRM'd works; it's highly fragile. So the ideal workflow for somebody who is building a collection using Calibre would be to buy the books, remove the DRM, and add the books to the Calibre library so that they're preserved for them for all time. And I think that it's a real shame that DRM exists, because it doesn't work, as is evidenced by the ease of removing it. It only inconveniences people who want to do things the legal way. And it makes the entire ecosystem much more fragile.
So tomorrow, if, say, some major vendor of books was to disappear, the entire collection of DRM'd works would become inaccessible. This has happened several times in the past. It's not a hypothetical scenario. So, you know, if instead people were encouraged and allowed to maintain their own collections, things would be much more robust. Any individual book would exist in thousands of individual libraries. It could always be recovered, regardless of what happens to the people who own the master copy. Yeah. By forcing DRM upon users of different publishing platforms and different marketplaces,
[00:37:29] Unknown:
it tends to lead to lock-in for that particular platform. So it makes it much more difficult for somebody to be able to obtain books across different publishers or across different retailers and just leads to a lot of cognitive overhead of being able to remember what you have and where you have it and manage your library across multiple different platforms. So I definitely agree that allowing the books to be able to be purchased legally and easily and being able to manage your library in the format and the location that you see fit would greatly increase the utility and popularity of ebooks in general. Yeah. No. Absolutely.
[00:38:06] Unknown:
And my wife actually runs a website called DRM Free, which lists books, a curated list of books that are published without DRM. And I try, whenever I can, to advocate for DRM-free books. The only purpose that DRM serves is lock-in to walled gardens and vendors' software. It doesn't actually protect against copyright infringement. So it's, yeah, it's like, why?
[00:38:40] Unknown:
And so in terms of the Calibre project, you have been building it in Python for a number of years. And I'm wondering, if you were to start the project over today, do you think that you would make the same choice of language, or are there any architectural or design decisions that you would do differently if you didn't have the weight of legacy behind the project? So I'd definitely choose the same language. As for architecture
[00:39:05] Unknown:
and design, the overall architecture and design would also be pretty much similar. There are various warts, obviously, in a project of Calibre's size and age, that I would do differently if I could. But they're all relatively implementation-based things, not design-based things. I'm really pretty happy with the overall design of Calibre. And Python has served us really well, and I would totally choose it again if I were to start Calibre again today. So the biggest challenge as far as Python and Calibre are concerned is migrating to Python 3. Calibre currently only works in Python 2. So that's a pretty large task that is proceeding slowly. I don't know if it will happen in time for the 2020 deprecation of Python 2. But if it doesn't, I actually maintain my own fork of Python 2, so I will get to use that for as much time as the migration takes. And what are your plans for the future of Calibre? Just keep doing what I've been doing.
Calibre is completely user driven, so new features are added by user demand. Add new features, polish things, add support for new devices as they are released and new ebook formats as they are invented, that sort of thing. Calibre is both fun to work on, and I think it serves an important and useful purpose for society at large. So I'm pretty,
[00:40:24] Unknown:
optimistic or pretty determined to keep doing what I've been doing as long as I can. And are there any other aspects of the Calibre project or your experience of building and maintaining it and growing the community or ebooks and ebook management in general that we didn't discuss yet that you think we should cover before we close out the show? So something, I guess, a bit of advice that I'd like to give fellow open source,
[00:40:48] Unknown:
software developers is: try to make it as easy as possible for nontechnical users to use your software. I mean, assuming you're writing something that's aimed at a nontechnical market. This means, you know, having things like installers and things like tutorials. Like, one of the things that I did early on in Calibre was making a video tutorial and tour of Calibre. And I've got lots and lots of positive feedback about that. So do, you know, go the extra mile to make things a little bit easier. The other thing is, chances are likely that you'll be working on Linux or Macs if you are working on open source software. But don't neglect Windows, because most of your users will be on Windows. Yeah. And the other thing is, don't be afraid of having bugs in your software. Users are fairly forgiving as long as you address their concerns and fix the bugs promptly. So, yeah, I think that's,
[00:41:36] Unknown:
that's what I'd like to say. And so for anybody who wants to get in touch with you and follow the work that you're up to, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the book American Gods by Neil Gaiman. It is a very interesting and well written and engaging book, and it's a great piece of modern mythology. So I've read it a couple of times, and I thoroughly enjoy it. So I definitely recommend that to anybody who is looking for a good book to read. And with that, I'll pass it to you, Kovid. Do you have any picks for us this week?
[00:42:13] Unknown:
Actually, can I get back to you on that? I actually do have a few picks, but I want to review the list before I finalize it. I didn't have time to finish this before. I'll send you an email with the list of my picks.
[00:42:27] Unknown:
Absolutely. So I will add your picks to the show notes once you get those sent over. So for anybody who is interested in that, they can refer back to the show notes once the episode goes live. So, with that, I'd like to thank you for taking the time to join me today and for your work on Calibre. It's a project that I have used on my own for quite a while now and have gained a lot of value from it. So I appreciate all your efforts, and I hope you enjoy the rest of your day. Thank you. Thank you for taking the time to interview me, and it's been fun talking.
Introduction and Sponsor Message
Interview with Kovid Goyal: Introduction
Kovid's Introduction to Python
The Origin of Calibre
Keeping Up with Device Changes
Calibre's Key Features
Calibre's Plugin Ecosystem
Calibre's Architecture
Challenges with GUI and Packaging
Sustainability and Motivation
Evolution of User Needs
Calibre's Popularity and Competitors
Challenges with Ebook Formats
DRM and Digital Book Marketplaces
Reflections on Calibre's Development
Future Plans for Calibre
Advice for Open Source Developers
Closing Remarks and Picks