Summary
What is internationalization, when should you add it to your program, and how do you get started? This week Dwayne Bailey and Ryan Northey tell us about their work with Translate House and the different projects that they have built to make translating your software easier.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
- Join our community! Visit discourse.pythonpodcast.com for your opportunity to find out about upcoming guests, suggest questions, and propose show ideas.
- Your host as usual is Tobias Macey and today I’m interviewing Dwayne Bailey and Ryan Northey about Translate House and the process of internationalization and localization for software projects.
Interview with Dwayne Bailey and Ryan Northey
- Introductions
- How did you get introduced to Python?
- Why did you get involved in localisation, what got you started?
- How would you describe the difference between internationalization and localization? Are there cases where it makes sense to only do one of those things?
- Why should people localise software into other languages?
- Translate House is an organization focused on localizing and internationalizing software projects. To that end there are a collection of projects that you develop and maintain. Can you briefly introduce each of them and describe their purpose?
- What was the first project that was created in that list and how did it lead to the creation of the other tools?
- At what point did you decide that creating an organization to own and support the tools that you were building was the right choice to make?
- You run a distributed organisation, how do you manage that?
- I was recently speaking with Michal Čihař about the Weblate project and he mentioned that he uses the Translate Toolkit for handling the low level aspects of managing the translation files. What are some of the architectural and design challenges that arise from needing to support so many different systems for managing source text and translations?
- How do Pootle and Virtaal compare to other tools for web or desktop based translation? Are they primarily used for translating software or do they get used for other sources of text as well?
- Given that Virtaal is intended for use on desktop systems by people who aren’t necessarily technically adept how have you approached the packaging and deployment aspects of it? What are some of the challenges that you have had to overcome?
- Given the fact that multi-lingual translation requires interacting with a large quantity of text in numerous alphabets, what kind of impact has the unicode handling in Python 3 had on your projects?
- What do you have planned for the future of your projects?
Keep In Touch
Picks
- Tobias
- Dwayne
- Ryan
Links
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app or experimenting with a project that you hear about on the show. To help other people find the show, you can leave a review on iTunes or Google Play Music and tell your friends and coworkers.
Your host as usual is Tobias Macey. And today, I'm interviewing Dwayne Bailey and Ryan Northey about Translate House and the process of internationalization and localization for software projects.
[00:00:54] Unknown:
So, Dwayne, could you please introduce yourself? Hi. I'm Dwayne. I'm a South African who develops on Pootle, which is part of the Translate House tools. And, yeah,
[00:01:08] Unknown:
that's me. Yeah. Hi. I'm Ryan. I'm currently the lead developer at Translate House. I work closely with Dwayne and work with lots of different software projects
[00:01:18] Unknown:
to localize their products. And how did you each get introduced to Python? Dwayne, how about you go first? So, I mean, the date's 1996.
[00:01:26] Unknown:
I was doing some work on a newspaper, an online newspaper, to hack up some of their code. It's the first time I hit Python, and I was completely intrigued by space indenting. And at the top of all the files was a copyright header that said Mark Shuttleworth. So it was probably his first code before he went on to found Thawte and then Ubuntu. And how about you, Ryan? I've been working for an organization called Community Technology,
[00:01:50] Unknown:
and we'd been mostly using PHP and various other kinds of early web technologies. I think this is the late nineties. And we'd really started to use Linux, and I started looking for a language that allowed me to do more sysadmin tasks. As things progressed, PHP became not quite so secure and so on. So gradually, I moved most of my web projects over to Python as well. And how did you each get involved in localization?
[00:02:16] Unknown:
So my thing was, in South Africa, we have 11 official languages. So I was just fascinated. I'm a passionate open source person, and I was quite intrigued. Like, could we use open source to address a language need in the country on a technology platform? And I just started investigating how to localize, and that's kind of how I got involved. So I was first really trying to localize into local languages. That eventually grew. We ran things across Africa. And then the tools kind of came out of that, which was developing tools that fixed serious problems or missing things in the ecosystem space. But, ultimately, it was about helping people have computers in their language so that they could actually be part of the digital age. And how about you, Ryan? Well, my background,
[00:02:58] Unknown:
was in anthropology. Via a strange route, I got into coding and became very inspired by open source software. In truth, I haven't got anything like the kind of experience that Dwayne has in the localization field. But my goal really has been about increasing participation in the web, and, you know, thankfully, I met Dwayne and found out what Translate House had been doing over that time, and I've got more and more involved in localization. And it seems like a great vehicle for increasing that diversity and participation in the web. So
[00:03:30] Unknown:
for people who aren't familiar with it, how would you describe the difference between internationalization and localization?
[00:03:36] Unknown:
And I'm also wondering if there are any cases where it would make sense to only do one of those things or if they're just different words for the same thing. Well, definitely not different words for the same thing. In the industry they are kept apart, and so it's seen as two different things. The easiest way I've found to think about it is that internationalization is the things that you need to do to your software to make it possible to localize it. So the simple things, well, they're not necessarily simple, but things like if you're gonna display a date: if it's internationalized, you would be allowing that date to be displayed differently.
You could think of it mostly as a one-off thing. Localization is then actually making that piece of software work in another language. So if I wanted to run the software in Afrikaans, you know, I would then need to define how dates work in Afrikaans and do the actual translation. That's the process of localization, and you could think of it as the ongoing process. As you add a new language, you repeat the same steps. And as the software gets updated, you update the translations. So internationalization is the kind of "make it possible"; localization is actually doing it. The two kind of interweave, so you definitely can't do localization without internationalization.
If you're gonna do anything, you should internationalize. And I think my big thing for people looking at it is you better move quickly from looking at it to actually doing it, because if you leave it late in your design process or your implementation, it becomes really difficult to implement. So if you haven't thought about internationalization, you've probably hard coded dates and layouts and all that, assuming certain things. So you kind of wanna do it early even if you aren't actually delivering in other languages. So you do need to do internationalization first. But, ultimately, the best internationalization
[00:05:22] Unknown:
comes from the kind of feedback that localizers give you to say, oh, this actually doesn't work in our language. Following up on what Dwayne said, I think there's also a difference on a human level, in the sense that internationalization is something that's much more done in terms of the software, whereas localization, at least in our work, has been much more about working with localization teams to get knowledge about their languages and how to best represent things within their language, then getting that back into the internationalization
[00:05:51] Unknown:
process in a sense. So for people who are starting a new project or also people who already have an existing project, are there cases where it doesn't make sense to do the internationalization step followed by localization? Or is it something that, in your opinion, at least, everybody should be doing from the get go?
[00:06:11] Unknown:
You're asking localizers if localization should be optional. I mean, my thing is, in every case where I've met people who say, we don't really need this, either they don't have a market or they don't see a market. When you change the paradigm slightly, like, let's do community localization, where, you know, the cost of actually doing the translation is not necessarily something borne in monetary terms, if they haven't done any of that process before, they can't really leverage that. I mean, I literally can't think of one. Maybe software that's on the Mars Rover is something you don't wanna localize.
[00:06:44] Unknown:
Yeah. Yeah. I'm trying to think if, you know, for example, there is some software that's specifically designed for a particular language and so on, but I would say that's probably not the best approach overall. And I'd say that the Internet has meant the need to localize has now cut across many different types of software. And we really see our community growing from both ends, you know, big corporates at one end and, at the other end, a guy sitting in his bedroom developing games and so on. So I think the need to localize, or the opportunities presented by localizing, have really opened up to a much bigger group now. And so
[00:07:22] Unknown:
I can't think off the top of my head either really where I would suggest or advise to someone that they should not bother doing that. So for anybody where the software that they're building is actually going to be used by end consumers, for whatever definition of consumer, then internationalization
[00:07:39] Unknown:
and localization are definitely worthwhile endeavors. Yeah. Definitely. And I think compared to when I started this, the platforms and the the processes from the internationalization side are just much, much better. And so it's not a lot of work that they that they need to do. You know, if they're targeting certain platforms, like, or if they're targeting mobiles and all that, Even the move in the last few years in terms of what's possible. You know, you used to have struggle with things like on an iPhone. They were you're limited to languages that Apple kind of had approved, and now you kind of have much more freedom to do any language. So the freedoms that are enjoyed now are much bigger than kind of when we started. So there's really very little reason at the moment not to. I think the the big,
[00:08:23] Unknown:
you know, blocker to someone adopting those processes really is how much overhead they need to learn and how much they need to adapt their process to do that. And I think that's more applicable in the sense of internationalization, because you don't necessarily want to be localizing into loads of target languages that you're not necessarily gonna use for your immediate goal. But building the internationalization into your software processes early on will offer a huge amount of benefit down the road if and when
[00:08:49] Unknown:
that becomes important. And I think that from our side as well, we're very passionate about languages, and there aren't very good words for them, but often people will talk about minority languages. But we are talking about minority languages of 100 million speakers. So the opportunity of those markets is amazing. And I think we've looked at stuff where we say, well, how do we make it relatively easy for people to onboard a language like that? Because I think the very traditional model of 10 or 20 years ago was that it's very costly to add each new language, and I think it's not as costly now, but there are process costs and engagement costs. And I think some of our tooling is really built around how do you make that almost a zero cost to an organization. So it would be easy, say, if someone arrived and said, oh, I wanna translate your product into Hindi.
You know, you could just say, go for it. We've got the processes and platform, and we're not worried about the cost to us of actually delivering that. And I think that's where we feel like we're getting a measure of success, where we're achieving our social objectives of getting more languages translated, but we're also doing that by making it easier for software developers.
[00:09:57] Unknown:
So I think there's also a dimension to this in terms of open source software development, and I think it really benefits communities. You know, as we work more and more with different localization teams, quite often, well, let's say there's a big overlap between the local open source guys, the people who are contributing to other projects anyway. And so there's a huge amount of benefit to open source projects in trying to engage localization communities to build their contributions and their user base and so on, along with their developer base. So
[00:10:28] Unknown:
you both work for an organization called Translate House that's focused on localizing and internationalizing software projects, and you have a number of projects that are contained under that organization. So I'm wondering if you can briefly introduce each of them and describe their purpose and how they relate to each other. So the tools that we've developed,
[00:10:47] Unknown:
really, each one of them has emerged out of a need that we had within our own communities and that we saw in the localization space. The primary one is Pootle, which is a community based translation platform. It's a web based platform that allows people with pretty low technical skills to just log in and begin translating. We have the Translate Toolkit, which is really a toolkit that's used by other pieces of software, ours and others, to provide format support for the various localization formats, but also to provide some localization engineering tools that people need. So being able to count words and strings, and being able to grep through translation files.
The third one is amaGama, which is a translation memory server. So for people not familiar with the terminology in localization, that's just a massive database of previous translations in which we're able to try and find close matches. And the last one is Virtaal, which is a desktop translation tool. So we were faced with situations where people have really poor connectivity. So how do we deliver a high quality translation tool for low skilled people that don't have connectivity? And that covers the primary four tools that we've developed over a number of years. And what was the order of creation of those tools? So it kind of goes the toolkit first. And the primary reason for that is we were localizing various products. We realized we needed to translate a number of things to have an impact for our language. So some of the key targets were things like Mozilla and OpenOffice at the time, but we were also translating KDE and GNOME and some of Red Hat.
And so we had some pretty good tools for translating PO files, one of the formats in open source. But Mozilla and OpenOffice brought a number of formats: Mozilla about four or five formats, and OpenOffice a really strange thing that just looked like a massive spreadsheet. And there were no tools to translate those. So our first task was, well, could we convert these strange formats into PO so we could use great tools and we could consolidate them? And that's really why I built the toolkit first. Pootle we then built just for the problem that we struggled to install software on people's computers.
Some of the translators would be translating from work where they weren't allowed to install software, and the web just provided a really easy paradigm. We run translate-a-thons with some communities sometimes. And in one instance, we had 1,000 people translating in Uganda. So being able to do a web based tool that you could just deploy very quickly made it possible to just get people translating as fast as you could.
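To make the "convert strange formats into PO" idea concrete, here is a minimal sketch that turns simple `key=value` lines, in the style of Java or Mozilla `.properties` files, into PO entries. This is not the Translate Toolkit's own converter code; `properties_to_po` is a hypothetical helper that ignores multi-line values, Unicode escapes, and the PO header, all of which a real converter has to handle.

```python
def properties_to_po(text):
    """Convert simple `key=value` lines into PO entries (illustrative only)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Escape backslashes and double quotes for the PO string syntax.
        source = value.strip().replace("\\", "\\\\").replace('"', '\\"')
        # The original key is kept as a location comment so the translated
        # PO file can later be mapped back to the source format.
        entries.append(f'#: {key.strip()}\nmsgid "{source}"\nmsgstr ""\n')
    return "\n".join(entries)


sample = "# menu strings\nfile.open=Open\nfile.quit = Quit \"now\"\n"
print(properties_to_po(sample))
```

Once everything is in one format, the same editors, counters, and search tools can be used for every project being translated.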
[00:13:26] Unknown:
And then amaGama and Virtaal came after that. Yeah. I don't have a huge amount to add to what Dwayne said. I think our focus now is probably worth mentioning. Our focus now is mostly around Pootle because it allows us to build communities around localization, beyond the challenges of the software itself. The mission for us really is localization. So the best platform for reaching people is, I think, the way to do that. And one thing that I was thinking of as you were describing your work on amaGama
[00:13:55] Unknown:
is that in a lot of languages, the context matters to determine what the appropriate translation would be. So I'm wondering how you manage to
[00:14:05] Unknown:
be able to capture that or if it's largely just a sort of word for word translation where you have seen this particular word or phrase before, and then you offer up the translated version in the language that the translator is working in. So maybe just clearing it up so people don't get confused about what it does: it's not doing machine translation at all. We're just looking for, like, you've got a string that you've seen before. Could be a phrase. Could be a single word. You know, if it's a single word in a menu or it's a sentence in some help text, we're looking for that in the database. So we're just doing edit distance matching or other techniques to try to find a close match, and that's kind of the technique. So we don't really solve the context problem. I think there are things we'd love to do in amaGama that will help solve that. But the context problem, for people that don't necessarily know the languages, is that in certain languages, English for one, the noun and the verb are the same word.
Take the word "open". That's not a very good example, but the same word in English may be a verb or a noun in another language. So we're finding a match for "open" in English, because what we're doing is finding an English match and then finding the translation that relates to that. So it doesn't necessarily solve that problem. But where it's really helpful is that sometimes it's just difficult, for new translators certainly, to figure out what the word should be. Very new languages are often having to coin a lot of words. So it's a technique to really make it easy for people to translate a little bit faster and also to be pretty consistent in terms of their translations.
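The translation memory lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the actual server works: the `MEMORY` contents and the `suggest` helper are made up for the example, and the similarity ratio from the standard library's `difflib.SequenceMatcher` stands in for a true edit distance.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English source -> earlier translation.
MEMORY = {
    "Open file": "Maak lêer oop",
    "Save file": "Stoor lêer",
    "Close window": "Maak venster toe",
}


def suggest(source, memory, threshold=0.7):
    """Return (score, stored_source, translation) tuples, best match first."""
    matches = []
    for stored, translation in memory.items():
        # Similarity of the new string to a previously translated source.
        score = SequenceMatcher(None, source.lower(), stored.lower()).ratio()
        if score >= threshold:
            matches.append((round(score, 2), stored, translation))
    return sorted(matches, reverse=True)


for score, stored, translation in suggest("Open files", MEMORY):
    print(score, stored, "->", translation)
```

Because the match is made on the English source text alone, a lookup like this cannot tell a noun from a verb, which is the context limitation mentioned above.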
[00:15:34] Unknown:
Yeah. I think the consistency would definitely be an important piece as well because if you've translated it before and then somebody else goes to translate the same word, but they mistakenly add a typographical error, then being able to catch that with a suggestion saying, oh, did you mean this because
[00:15:55] Unknown:
I mean, it's the same type of problem if other people have translated: finding the consistent voice between multiple translators. So in open source, where you've got volunteers, that could be a large problem, where you've got teams of even two or three people who would have a different voice or use different words. But even when people are using commercial translators, over time, multiple people will work on the translation. So it's a technique to keep consistency throughout all the products that are being translated.
[00:16:29] Unknown:
That kind of relates to terminology, where in some cases people are actually setting the terminology as much as they are finding the terminology, if you like. And so recently, we've been working with an organization in India, where there are many different languages the government has to support. And even within each of those languages, the same word was being translated in lots of different ways. So what they've done there is create a cross-India project for effectively setting the terminology in different Indian languages so that you can get that consistency across translations.
[00:17:05] Unknown:
So what were the motivating factors that caused you to decide that creating an organization to own and support the tools that you were building was the right choice to make? So the interesting thing about the organization is that some of our objectives were the things that set it up, like putting organization structures in place before we even built tools. So when we were starting work, we weren't developing tools. We were engaging and building localization communities and addressing the localization problem. And part of the organization there was just to be able to engage with governments and with nonprofits as an organization in addressing a problem.
And when the tools were born out of that, it was the perfect vehicle to carry them. So having an organization has been really helpful to do that. The tools themselves, the licensing of them, is actually within a nonprofit organization. So we as an organization are really passionate about the open source nature of the tools that we develop. And we've seen in the space, you know, some open source tools go proprietary. It's quite critical for us to remain steadfast on keeping these tools open source, mostly because we need great tools for language communities that need the services. So that's been built into the organization we've developed.
And the other realization, which came a little bit later, is maybe a little bit of frustration. We built a great tool. People started using it. People would reject it because it didn't do something, but we never really heard about that. And so it was really a vehicle to be able to say to people, look, if you need this feature, or you need to integrate it with other tools, the people who are developing these tools are available to engage commercially to actually address this. Some large organizations are using platforms like Pootle. And for them, it's about how do we ensure the sustainability of the software and how do we engage in terms of business continuity. So some of it is not just support; they just need to know that the software they've invested in will continue. And some of them have built some pretty sophisticated localization processes. They need to make sure that the software continues. And so it's been an amazing vehicle for us then to be able to communicate that we do support this professionally.
And we've got vehicles to do that, but it still remains a proudly open source piece of software. And the organization that you run
[00:19:25] Unknown:
is, if I understand correctly, a distributed one. So I'm wondering, what are some of the challenges that you face in managing an organization
[00:19:32] Unknown:
of that structure? Oh, Ryan suffers from most of it. Well,
[00:19:37] Unknown:
I think we've learned quite a lot of lessons over the last couple of years and gradually built more collaborative processes. We are widely distributed. Our core developers are distributed across Russia, Spain, the UK, and South Africa. We also travel quite a lot for one thing or another. So we really have to be able to work in a very distributed way. We tend to use either open source or open source focused platforms for doing that. We obviously use GitHub a lot, and I think that's been pretty critical to our development process, and also to allowing us to communicate with the wider community, because beyond the core devs, you know, there's quite a lot of contributors, both long term and people who dive in just because they want to solve a particular problem.
As a core team, we have daily scrums. I think that's really critical to our process. You know, we are all working pretty much on our own in our own, environments and so on. And just in terms of, focus, it can get kinda lonely if you're not if you're not having those opportunities to talk through the things that you've done and and what you hope to do, you know, from there. So I think that's a pretty critical part of our process, you know, beyond the software.
[00:20:51] Unknown:
We're also trying, because I think we don't necessarily have face to face communication, so that can be a little problematic. So one of the things we try to do is get together half yearly or every quarter, just to have face to face time with us as a team, and either do that as a sprint or just try to work through more complex problems. So we haven't been able to avoid the fact that we do need to get together. Time zones can be a little bit of a problem when we're working together, but we kind of sync around UK time.
So, you know, I think one of the things I've struggled with is we have had times when we've had developers in Australia, and that's proved quite difficult. So the time zone shifting thing is not something that I think you can completely solve in the kind of process we followed. We were pretty successful at making it work, but it is very different. If you're trying to look at some stuff like agile development, most of the stuff written requires face to face in the same room type stuff and trying to replicate that or find
[00:21:52] Unknown:
models that allow some of that to work for us has been trial and error and experimental. But I think that in part, the tools have improved. And in part, we've kind of learned better processes. So, you know, we tend to jump in and pair program together whenever there's a problem where two heads would be better than one, and, you know, combining that with voice communication, we all know each other pretty well, I think it's fair to say, even if we're not having as regular contact. But at the same time, I think, as Dwayne said, nothing really does substitute for co-locating.
And so, again, this is something we're sort of learning the best ways to approach it, but particularly around either conferences or other kind of meetings that we have to we have to do. We're trying to sort of build in time to spend with each other, you know, as a core team. Yeah.
[00:22:45] Unknown:
The whole concept of distributed work and distributed organizations is definitely something that has been around for a little while, but it's still evolving in terms of the appropriate norms and methodologies for making it work. And, of course, every team is different, so it's kind of hard to build up a sort of community knowledge of how best to approach these kinds of problems because of the fact that there are so many human factors involved. But I think it'll be interesting to see how this general trend of work environment continues to evolve in the in the coming years. Yeah. Definitely.
[00:23:20] Unknown:
I think this is mostly what we've answered in terms of our core team, but I think part of what we'd really like to do, and the challenge that we really face right now, is to build that community of localizers, not necessarily around a particular open source or free software product, but around the open source and free software community. So as localization needs arise, you know, there is already a body of people there that are able to address those. And I think part of that's about, you know, learning what's in it for localizers beyond their desire to see software in their own language.
It's about working with young people and working out, you know, how this gives them career opportunities or how this gives them a community or how this allows them to interact in ways beyond their immediate environment. So I think there's an answer to this about how do we, as an open source and free software community, build localization into our collaborative processes more generally? I think the stuff that I picked up on in the message just now from Ryan was that we wanna be distributed
[00:24:24] Unknown:
because we wanna include voices from other people. And I think my own story is a really critical example of why other voices are important. We understand languages in our context of South Africa. Mostly European languages are, call them, monolingual in a country. You speak French in France, German in Germany. And so people were tackling problems as if languages were kind of geographically located. And they're also pretty mature, and they hadn't really dealt with a language team that was dealing with 10 languages. So we would do strange things, like we had to do some of the basic internationalization stuff, but it was really painful to do it 10 times. But as you realize, most of those other teams had experienced some pain at some time, but it hadn't been worth them documenting it. Because once the pain was gone, it was gone. They hadn't needed to build any tools. So some of our tooling was to address our own problem, and I think that voice needed to be heard. And I have a friend who calls it blowback localization. So Pootle was developed in Africa to address an African need, but then it's globally useful. So all the people who are benefiting from it now are actually benefiting because an African group needed to address the problem. So I think that the importance of those other voices is critical for what we wanna do. And on a somewhat technical level for projects that are in the process of
[00:25:41] Unknown:
translating the source text and doing the localization process, when you only have a partial translation completed, is that something, would you start exposing those localized strings as they are produced? Or is it something where you would want to wait until
[00:25:58] Unknown:
either a critical amount or the totality of the text has been translated before you actually expose it for the end user? So, I mean, my input there would be that it really does depend. It's like, what is the piece of software? How do you structure stuff? So you can structure things to say, well, let's make sure we translate the most critical things that everyone's gonna see first. Because I think if you look at it, it's that kind of 80/20 rule. 20% of the UI is all you need to do, because most people don't need to know all the variants of Blowfish encryption that are, you know, somewhere in your app.
So if you can prioritize important things first, you can get away with very little translation. But that means that you have to be able to do that in your localization platform or in the tooling. My thing as well is that I weigh it up in terms of the language and the community. It's much more important that the community is rewarded for the work that they do. So if it's a friend and his mates, and there are 10 of them, like, just get it out there early so that they can brag and build a community. Because what hampered some of our work is that people wouldn't really let us release software because we needed to build a community first. So we're like, but we wanna use your software to build our community.
And there are only 5 closet users, and there probably will be 10 in a week or two's time. In a year, there'll be 100. So the damage is pretty minimal. So my general feeling is that, at some point, it will depend on your software. The 80% rule is probably a good one if you wanna suck it out of your thumb. You know, if people are at 80%, they're mostly done. And the other thing I would say, if you want some sense of confidence, is to ask whether the translator is actually using the software on a daily basis, which might seem like a strange thing, but there are a lot of people who will translate stuff they don't use. But if you have someone translating VLC and they actually watch all their videos on VLC, everything that needs to be translated will be translated.
So you can kinda be assured of
[00:27:48] Unknown:
that. So those are kind of 2 things. I would say that question would probably get answered differently in the free and open source communities than it would in the corporate world. I think in the free and open source communities, really, localizers want to see their localization go into production and into the software as quickly as possible. And just in terms of that reward cycle, I think it is really important to get it, you know, even partially translated, up and out there. In the corporate world, I think they're much more concerned about how their voice comes through in any particular localization.
They probably want a tighter review schedule to make sure, you know, that any localization that goes into their product is reflecting their voice in that language, and I think that's quite a big challenge. You have to trust your localizers, really, to tell you whether or not the localization is reflecting your voice if you don't speak that language. And so it's a slightly different process, I think, in the sort of corporate world than it is in the community world. Also in the corporate world, though, you'll often have people commissioned to do certain translations, so you will get 100% translated.
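The two ideas in this answer, translating the most visible strings first and treating roughly 80% as "mostly done", can be sketched as a simple release gate. This is only an illustration: the `Unit` class, its `critical` flag, and `ready_to_ship` are hypothetical names invented here, not part of Pootle or the Translate Toolkit, and the sample strings are placeholder Afrikaans.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    source: str
    target: str = ""         # empty string means "not translated yet"
    critical: bool = False   # hypothetical flag: a string every user sees


def completion(units):
    """Return the fraction of units that have a translation (0.0 to 1.0)."""
    if not units:
        return 0.0
    return sum(1 for u in units if u.target) / len(units)


def ready_to_ship(units, threshold=0.8):
    """Ship once every critical string is translated and overall
    completion reaches the threshold (0.8 mirrors the 80% rule of thumb)."""
    if any(u.critical and not u.target for u in units):
        return False
    return completion(units) >= threshold


units = [
    Unit("Open", "Maak oop", critical=True),
    Unit("Save", "Stoor", critical=True),
    Unit("Quit", "Sluit af", critical=True),
    Unit("Preferences", "Voorkeure"),
    Unit("Blowfish key size", ""),   # obscure string, still untranslated
]
print(completion(units))
print(ready_to_ship(units))
```

A community project might set the threshold lower to reward translators early, while a company with a stricter review process might require full coverage; the gate itself stays the same.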
[00:28:52] Unknown:
But I think corporates will all discover that even paid translation is not necessarily good translation. It does get done. And if you choose well, you can kinda work out how to guarantee that. But I think your metrics will be slightly different if you're paying for translation. But even Microsoft doesn't translate everything. So there's some point at which, yeah, you make decisions.
[00:29:12] Unknown:
To reduce the amount of content. In many respects, I think that's really where the free and open source community does have, you know, some advantages: we can build localization communities in a way that means we don't have to worry about corporate voice. You know, we can reflect global voice instead. Yeah.
[00:29:33] Unknown:
Well, I think one part of the narrative, in terms of where we sit at the moment with localization generally for minority and marginal languages, is that some of the changes in terms of Microsoft translating much more stuff were actually precipitated by people translating free and open source software, so it set a precedent of needing to translate. And so it's led to much, much more being translated. So Google and Microsoft are pretty well translated into quite a lot of languages nowadays. Whereas, 15 years ago, a company that translated into a lot of languages would do 35, and mostly you could get away with doing 15; that would be quite a good one. And I think people are doing many, many more languages in big products now. Yeah. And I'm sure that part of that too is driven by the increasing globalization
[00:30:23] Unknown:
of industry and just communication in general, because the Internet has been able to reach more places as we build out other methods for actually creating that interconnect. Before, everything was hardwired landline, and now we're developing more wireless technologies that make it easier and cheaper to bring the Internet into more rural and undeveloped areas. It brings more people online and increases the need for them to be able to actually interact with the content that people are producing. Sure.
[00:30:56] Unknown:
When we kinda started this, mobile didn't exist in the form that it does now. Yeah. We'd have long debates about the value of translating and whether people in rural areas even needed it or could even, you know, access it. And that kind of debate just became moot when everyone has a mobile phone. And, I mean, not everyone is on the Internet. Like, you know, we mustn't pretend about some of the realities of that. But suddenly, like, should I translate this phone into this language is a question of, do you wanna sell more minutes? I met some Indian cell phone providers who tackled the problem like that, who said we could make a lot of money if we could sell really cheap calls to lots of people. And the only way they could get lots of people was that having a Hindi phone was not optional. It was the only thing that they could do, and it had to be cheap. So the mobile space, I think, has radically changed the way we think about what should be translated and who will consume it.
[00:31:51] Unknown:
Yeah. And in itself, I think, as you alluded to, it has a hugely transformative nature, you know, including so many more people in that global conversation. And I think, well, for us as an organization, how do we make the Internet reflect those voices better? So at the
[00:32:09] Unknown:
technical level of the actual translation apparatus itself, I know that there are a number of different standards and formats for being able to actually encode the various translations into the source code that you're working with. So I'm wondering, what are some of the architectural and design challenges that have arisen in your work with the Translate Toolkit to be able to provide an abstraction over the top of all of those various
[00:32:33] Unknown:
systems? TTK is certainly developed a little bit slower because we've got a wider community of users, and it's used at a lower level. So we have to be very careful when we make changes to make sure that it remains compatible and we're making the right changes and so on. I mean, historically, I think, as you said, there have been a lot of challenges in terms of supporting multiple encodings and working with, you know, not only different formats and not necessarily well-specified formats, but formats that people have kind of adapted themselves to meet their own needs. So TTK has kind of grown quite organically over the years to meet those various needs.
Internally, it tends to represent different formats in a PO-like way, as per the kind of gettext standard. And it's been pretty robust. It doesn't necessarily cover all use cases, but, you know, like other standards, it's kind of been adapted and overloaded, if you like, to meet different people's needs. So internally, TTK represents pretty much everything in a kind of PO-like way, and that allows it to do, you know, things such as diffing and comparing different files. Python 3 is obviously a factor here. It's kind of making
[00:33:47] Unknown:
the encoding more standard and making Unicode more standard. Yeah. But the one funny thing to think about with TTK is that if you leave a bunch of programmers in a room, usually with no localization experience, they will invent a new localization format.
[00:34:04] Unknown:
It's like the xkcd cartoon. You know? Like, there are already 15 formats. We need one that'll encompass them all. And there are now 16 formats. So I think one of the things we've also been looking at around that is taking a step back and asking, what are really the priorities there? And I think compatibility is a priority. Portability is a priority. I think brevity is also a priority. You know, we looked at XLIFF quite a bit, at whether or not that's a good way to internally represent the different formats. I think it possibly is because it's very extensible, so we can kinda make it, you know, without having to do anything outside of the spec, if you like, meet or cover all of those different formats.
I think it also provides, you know, a way of having data portability and, obviously, diffing and so on. I think where it falls down a little bit is the kind of classic problem with XML, that it tends to be a little bit data heavy. So even for a small amount of information, you end up with quite a large file, which is not quite as portable as one might hope. We haven't entirely decided our road map in that sense, but I think we would like to move towards a format that is more extensible by design than PO, although PO has kind of served us well up until now. For now, that's the kind of internal standard that we tend to use. I think it's almost strange as an organization; I don't think we anticipated that we would
[00:35:36] Unknown:
have to maintain a library of formats. And so ensuring stability and testing that has been really interesting in terms of learning those kinds of skills.
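To make the "PO-like internal representation" idea from this answer concrete, here is a minimal sketch of how a key/value format could be mapped onto gettext-style units and written back out as PO text. It does not use the Translate Toolkit's actual classes; `POLikeUnit`, `from_properties`, and `to_po` are hypothetical names for illustration, and the serializer ignores escaping and line wrapping.

```python
from dataclasses import dataclass, field


@dataclass
class POLikeUnit:
    """Hypothetical stand-in for a gettext-PO-style translation unit."""
    source: str                                     # msgid
    target: str = ""                                # msgstr
    context: str = ""                               # msgctxt
    locations: list = field(default_factory=list)   # "#:" comments
    fuzzy: bool = False                             # "#, fuzzy" flag


def from_properties(text):
    """Map 'key=value' lines (a Java-properties-like format) to units.
    The key is kept as the context so it survives a round trip."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        units.append(POLikeUnit(source=value.strip(), context=key.strip()))
    return units


def to_po(units):
    """Serialize units as simplified PO text (no escaping or wrapping)."""
    blocks = []
    for u in units:
        lines = []
        if u.fuzzy:
            lines.append("#, fuzzy")
        if u.context:
            lines.append(f'msgctxt "{u.context}"')
        lines.append(f'msgid "{u.source}"')
        lines.append(f'msgstr "{u.target}"')
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


units = from_properties("# menu\nmenu.open=Open file\nmenu.quit=Quit\n")
print(to_po(units))
```

Once every format is reduced to the same kind of unit, operations such as diffing or comparing two files can be written once against the unit list rather than once per format.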
[00:35:49] Unknown:
So bringing it up a level to the Pootle and Virtaal tools, I'm wondering how they compare feature-wise to some of the other offerings for web or desktop based translation. And I'm also curious if they're primarily used for translating text in the context of software, or do you see them get used for other sources of text as well, such as maybe
[00:36:09] Unknown:
print media? So picking up on the first question, when Dwayne first set out developing Pootle, there really wasn't much in terms of other open source options for localization, and localization therefore tended to be done by hand. So Pootle was kind of a really critical play at that point. And one thing that we've actually found really healthy is that there are now more and more tools available for localization. Thankfully, quite a few of those are also open source. In terms of features and so on, you know, there are quite a few different platforms, so it's kinda hard to go into specific features and platforms. I think for us, our focus is very much on localizers.
So our platform is very much set up so that if you need to translate a large number of strings, you can go through them very quickly. And, you know, certainly, our aspiration is that if you're a complete novice localizer and don't really know much about localization and formats and what goes on in the back end and so on, it's also really easy for you to do that. So I think for us, it's about working with those communities,
[00:37:15] Unknown:
working with communities that are building, for example, their own machine translation and finding out how we can incorporate that. In that respect, I'd say our strength is our community, really. I mean, from a feature perspective, I think we, you know, tick all the boxes in terms of what's needed in a localization tool. I think the one critical thing for us is that we've worked, and thought, quite a lot around how we put great localization technology, so things like TM and MT and terminology.
TM is translation memory. Yeah. MT is machine translation. Well, I kind of wanted not to define it, because, specifically, we wanna put that in people's hands without them needing to know what they are and what they mean and how they work. So maybe a simple example would be when someone gives you a TM match, they'll give you a percentage match. Like, who knows and who cares what that means? I don't know what it means. I don't think it means anything to anyone else. So we kind of don't show those kinds of things. We weigh it up in terms of translators: how do we get really great technology that the industry would use into their hands without them necessarily needing to be experts in the tools?
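For readers wondering where a translation memory "percentage match" comes from, the sketch below scores a new source string against previously translated ones and returns the closest suggestions. It is an assumption-laden illustration: it uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure (real TM engines commonly use other metrics such as edit distance), and `tm_lookup` and the sample memory are invented for this example.

```python
from difflib import SequenceMatcher


def tm_lookup(source, memory, min_score=0.75):
    """Return (score, stored_source, stored_target) tuples, best match first.

    memory maps previously translated source strings to their translations.
    The score is difflib's similarity ratio, between 0.0 and 1.0.
    """
    hits = []
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(
            None, source.lower(), stored_source.lower()
        ).ratio()
        if score >= min_score:
            hits.append((score, stored_source, stored_target))
    return sorted(hits, reverse=True)


# Illustrative memory with Afrikaans targets.
memory = {
    "Save the document": "Stoor die dokument",
    "Close the window": "Maak die venster toe",
}

for score, stored_source, suggestion in tm_lookup("Save this document", memory):
    # A translator-facing UI could show only the suggestion and hide the score.
    print(f"{score:.0%}  {stored_source!r} -> {suggestion!r}")
```

The design point made in the interview is about the last step: the score is useful for ranking, but the translator may only need to see the ranked suggestions.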
And so when we stack up the features like that, we have the features that people need in terms of localization.
[00:38:27] Unknown:
We've also got a pretty healthy checks framework, I think. Beyond human reviews, it allows a certain amount of checking and flagging up when there are issues. And I think that really feeds into one of the future goals that we have: continuous integration has been quite successful for us as a community and as a project, so how do we apply that in the localization context itself? I think that's an area we're really looking at: how do we improve those check systems, how do we make it easier for people to write checks specific to their language or their project, and how do we find ways to integrate those checks frameworks into other continuous integration processes?
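As a rough picture of what an automated, extensible check could look like, the sketch below runs a list of small functions over a source/target pair and collects any failure messages. None of this is Pootle's real checks API; the function names, the placeholder pattern, and the messages are hypothetical, and the regular expression only covers a few common printf-style and brace-style placeholders.

```python
import re

# Matches %s, %d, %i, %f, %(name)s and {name}-style placeholders (simplified).
PLACEHOLDER = re.compile(r"%(?:\([^)]+\))?[sdif]|\{[^{}]*\}")


def check_placeholders(source, target):
    """Flag targets whose placeholders differ from the source's."""
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        return "placeholders differ"
    return None


def check_end_punctuation(source, target):
    """Flag targets that drop or change the source's final punctuation."""
    if source and target and source[-1] in ".!?:" and target[-1] != source[-1]:
        return "end punctuation differs"
    return None


# A project or language team could append its own checks to this list.
CHECKS = [check_placeholders, check_end_punctuation]


def run_checks(source, target, checks=CHECKS):
    """Run every check and return the list of failure messages."""
    failures = []
    for check in checks:
        message = check(source, target)
        if message:
            failures.append(message)
    return failures


print(run_checks("%d files copied.", "%d dateien kopiert."))
print(run_checks("Hello %(name)s!", "Hallo!"))
```

Because `run_checks` returns plain data, the same function could be called from a continuous integration job that fails the build when any unit reports a problem.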
[00:39:10] Unknown:
So do you see the Pootle and Virtaal tools used outside of the context of software at all? Yeah. We do. As kind of localization purists, we kinda think, well, we did design these tools for software translation, and, you know, we know the limitations in other spaces. But everyone we meet who uses it is almost invariably translating other stuff. So they're translating web content, or they're translating documents that they've processed and loaded up. So, you know, anything that you could break down into sentences, you could get into Pootle, and people do that.
So part of that is, you know, the step before, using the Translate Toolkit or other processes. But we see, you know, many people who are starting with translating their software and very quickly realizing they wanna translate other things. So they're adding other content into their
[00:40:03] Unknown:
Pootle server and doing that. One of our key areas of development has been VCS integration. And in doing that, we've tried to make a very generic adapter so that Pootle can consume and produce data for different, you know, back-end file systems. I think in doing that, we've kinda seen the opportunity that we might be able to hook right into other, potentially database-backed, content stores. So I think that's an area we'd really like to push further, because it's about how the tool is used, not necessarily how the tool is designed. And is there room for integration with some of the
[00:40:40] Unknown:
natural language processing or machine learning systems for being able to actually do some of the things like word stemming so that you can take a particular translation and then be able to extrapolate some of the other potential word tenses for maybe bringing it into a different context, particularly in the
[00:40:59] Unknown:
case of the translation memory system? I think there's a huge potential for that. That's massive. It reminds me a little bit of watching DJs going from vinyl to digital, if you like. Initially, when I got involved in localization, there was a huge amount of pushback to machine translation, because it was seen as being of low quality and not as good as what a human could produce. I think that technology in itself has transformed just in the last couple of years, and I think there's now a lot more buy-in. Not necessarily to automate translation per se, because you're not necessarily gonna get the voice. If you wanna reflect a particular voice in the culture and so on, you kinda need to understand the message on a slightly deeper level than the pure translation.
But certainly, at this time at least, as an aid to localizers. And I think that's a space that's likely to really push forward. An interesting project we did a year or so ago was for a professor in the US who translates into a couple of different Gaelic languages, and the languages are very, very closely related. And so really what he wanted to do was to be able to translate one language and then machine translate from that one to the other, because it's almost systematic. And so we integrated that into Pootle, and that's really sped up his process of localization.
But it would be really good to see that applied not just to the kinda headline big European or Hindi or Asian languages, but also to some of those smaller languages because, you know, they're at risk of dying out or don't necessarily have such good support already. So I think getting a body of translation into machine learning could benefit those languages as well. So that's something we'd really like to look at. And if there's anyone who wants to kind of fund us to do some proper research into that, that would be amazing. But for now, you know, it's where we can find collaboration, really, to bring those skills
[00:42:55] Unknown:
to our project. I think if we look at the last year with Pootle, we spent a lot of our time just cleaning out code and abstracting things and making things pluggable. Well, the machine translation integration is pluggable. We'd love to make it more pluggable. The translation memory is pluggable. Checks will become pluggable. So I think we feel there's a ton of scope to allow people to adapt things. I mean, even, you know, we could do much more with translation memory by doing what they call in-context exact matching, where a string matches better if the string before and the string after also match, and that might address the context issue we talked about. I think your question talked a little bit about stemming, and it's even doing that: finding a word, stemming it, and figuring out that actually the English is using the verb, so we're actually gonna suggest the verb terminology for you. There's a ton that we'd love to see happening there. So that's really exciting for us, to empower language communities, really, to be able to drive some of that and integrate tools and stuff like that.
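The in-context exact matching idea mentioned here can be sketched as a lookup keyed on a string together with its neighbours. This is a simplified illustration rather than how Pootle implements it: `build_context_index` and `lookup` are hypothetical helpers, and the Spanish targets are sample data chosen to show "Open" translating differently as a menu verb and as a status label.

```python
def build_context_index(units):
    """Index translated units by (previous source, source, next source).

    units is an ordered list of (source, target) pairs from one file.
    """
    index = {}
    sources = [source for source, _ in units]
    for i, (source, target) in enumerate(units):
        prev_src = sources[i - 1] if i > 0 else None
        next_src = sources[i + 1] if i + 1 < len(sources) else None
        index[(prev_src, source, next_src)] = target
    return index


def lookup(index, plain, prev_src, source, next_src):
    """Prefer an in-context exact match; fall back to a plain exact match."""
    hit = index.get((prev_src, source, next_src))
    if hit is not None:
        return hit, "in-context exact"
    if source in plain:
        return plain[source], "exact"
    return None, "none"


translated = [
    ("File", "Archivo"), ("Open", "Abrir"), ("Close", "Cerrar"),      # menu
    ("Status", "Estado"), ("Open", "Abierto"), ("Closed", "Cerrado"),  # labels
]
index = build_context_index(translated)
plain = dict(translated)  # context-free memory: the last "Open" wins

print(lookup(index, plain, "File", "Open", "Close"))
print(lookup(index, plain, "Menu", "Open", "Exit"))
```

The second lookup shows why context matters: without matching neighbours, the plain memory can only offer whichever translation of "Open" it saw last.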
So it feels like, I think from our side, because we do focus so much on the teams, that a lot of the focus in localization tools is on the programmer. It's like, how do we integrate with your version control system and your software, which is important. But I think we feel like there's a lot of power that we're missing by just not empowering the teams that are going to come and translate your software well. And so we'd love to see more happening there. And on the language front, what are some of the complexities or difficulties that are introduced by
[00:44:33] Unknown:
needing to be able to support an additional language that isn't already covered by the tools that you're developing?
[00:44:41] Unknown:
Pretty much no work, really. You know, I mean, there are some technical things that a new language that comes in needs to do. But bless the Unicode Consortium, because they solved one of the biggest problems that really used to stymie languages: they couldn't represent their characters and they didn't have fonts. So some of the big problems that we traditionally hit have been eliminated. Unicode has eliminated one of the biggest, so that is not too much of a problem for new languages. New languages face things like font issues, but that's been pretty well addressed now by Google and Microsoft making universally available fonts that cover a lot of characters.
But that is an issue that minority languages will hit: they can write their text, but they can't actually see it on the screen, and they can't see it on the Internet. And then there are other, deeper things that languages need to start tackling on the software side. The good software localization frameworks will have concepts of plural forms. So there's 1 file downloading and there are 2 files downloading; those are the plural forms. In Arabic, there are 6 forms. In Russian, there are 3. And so you translate it differently depending on what that number is. So languages that don't have that defined need to define the algorithm, or the equation, that determines which form you're gonna use. There are standards around that. That's right. And there are standards. Unicode has even produced the CLDR, which is the Common Locale Data Repository, and that's storing a lot of stuff. How do you write dates? So those are the things that a new language would have to do. How do you write dates and times in your language?
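The "equation that determines which form you're gonna use" can be written out directly. The functions below are Python renderings of the plural expressions commonly found in gettext PO file headers for English (2 forms), Russian (3 forms), and Arabic (6 forms); they are given here as an illustration and should be checked against a project's own locale data before being relied on. The function names and the `pick_form` helper are invented for this sketch.

```python
def plural_en(n):
    """English: form 0 for exactly one, form 1 otherwise."""
    return 0 if n == 1 else 1


def plural_ru(n):
    """Russian: three forms, chosen by the last one or two digits."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def plural_ar(n):
    """Arabic: six forms (zero, one, two, few, many, other)."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    if n == 2:
        return 2
    if 3 <= n % 100 <= 10:
        return 3
    if n % 100 >= 11:
        return 4
    return 5


def pick_form(forms, rule, n):
    """Select the translated template for n and fill in the number."""
    return forms[rule(n)] % n


english = ["%d file downloading", "%d files downloading"]
print(pick_form(english, plural_en, 1))
print(pick_form(english, plural_en, 2))
print([plural_ru(n) for n in (1, 2, 5, 11, 21, 22)])
print([plural_ar(n) for n in (0, 1, 2, 3, 11, 100)])
```

A translator supplies one string per form, and the framework calls the language's rule at run time to choose among them, which is why a language with no rule defined has to agree on one before plural-aware strings can be translated.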
What number systems do you use? But, you know, we're in a very different space these days, where you basically have to be quite a small minority language that hasn't been localized into anything to still have to tackle those. So the things that we face on Pootle are minimal. Like, we don't have to do very much to get a team translating anymore.
[00:46:47] Unknown:
Databases cover their language and all that. Well, answering that question in the context of somebody wanting to see more free and open source software localized to their language, I think the answer is probably more of a community thing. It's about knowing where to go and translate things. And I think one of the things we'd like to see is a slightly easier process for somebody who wants to set about translating into a language. At the moment, you would have to go to quite a lot of different places and engage with the localization groups within those projects in order to get that done. And what would be quite nice is if there was a place where people could engage with lots of open source projects at the same time. So I think, I mean, that's the communication in some way, which was,
[00:47:30] Unknown:
looking at it from our perspective and from the software perspective. But if you look at it from a language team's perspective, if they've never translated anything, one of their serious hurdles is that they will have no terms for a lot of computer terminology. So the terms might exist and they need to collect them, or they might need to repurpose them, following very similar ways to how a language develops new terms anyway: it adapts words, reuses words, or creates new words. So that's usually one of the biggest hurdles for a new team: learning how to localize and then building the missing terminology and using it.
So building a terminology base is one of their biggest hurdles. And in terms of the distribution of your software,
[00:48:14] Unknown:
when you have a web based presence, it's fairly simple to tell people how to access it. You just go to a particular website and maybe go through the authentication routine. But you also have your desktop project, Virtaal, which introduces a whole plethora of distribution issues in terms of being able to get it on somebody's computer and make sure that it's running effectively. So I'm wondering, what are some of the challenges that you've had to overcome in solving the packaging and deployment aspects of getting Virtaal out to people who are trying to use it? I would just quickly pick up that in terms of Pootle, we do release it for different platforms. So we still have some of those challenges in terms of supporting a different
[00:48:54] Unknown:
ecosystem for that software to be deployed. But I think you're right about the challenges in terms of Virtaal, and Dwayne could probably talk about that. Well, the challenges are pretty big, which is why we haven't released it
[00:49:07] Unknown:
for a long time. So, I mean, Virtaal is one of my favorite pieces of software, but we just haven't had the time or resources to really develop it. It's been fun watching other people take some of its ideas and implement them in theirs. So some of the biggest problems we faced do relate to Python and some of our choices in terms of GUI toolkits. So we use GTK, and that has been one of our hardest parts in terms of distributing: just how do you package it on Windows and Mac? So there's documentation and processes.
It's never as clear as it is on Linux about how you do it. And so packaging on those has been really difficult. GTK went through the 2 to 3 migration, and we've never made it to 3 because that transition was difficult. So the biggest hurdle we found in terms of distributing was that it's really difficult. Well, it's not difficult to do it, but it's quite a process. There was a lot of engineering and time to get the first Mac version out. But then there was the feel of it, because it's quite a lot of work to make it feel like it really belonged on a Mac platform. So GTK, at the time we were doing it, just always felt a little bit like it didn't belong.
So packaging up GTK, compiling it, making sure that you've got a deployable app is quite difficult from the Python perspective. They make pretty nice packages where you've got everything there and it all works, but that was quite a process. So we have a luxury in terms of Linux in that other people end up packaging it for you. But having to package it ourselves was difficult. And that was the hardest part, just, you know, being able to build packages. Getting it up on the website and getting people to download it was less of an issue. And so we really have struggled just to recreate
[00:50:46] Unknown:
those engineering skills to rebuild and automate that. And in that time, web technologies have really moved on. And so I think the question we ask ourselves is, do we want to put that time into a desktop piece of software when, really, people are much more reluctant now to install a desktop piece of software and much more inclined towards using the web? And so the question really is, can we use the web to address the issues that Virtaal was addressing in terms of offline usage? I mean, it's not something we've really successfully
[00:51:17] Unknown:
conquered so far, but it's certainly something we've actually really discussed. Yeah. And I think some of what we wanna do with Pootle in terms of, like, using WebSockets would put us in a very different paradigm for the editor, you know, the Pootle translation editor, and we could say, well, you know, could we take this offline? And if we can take it offline, are we addressing the core need of why we wrote Virtaal, which was people with no connectivity? So we were dealing with people who would have to drive to an Internet cafe, download files, and go translate them at home, and we might be able to tackle that by just doing Pootle a bit differently. So it's not the most positive story for releasing desktop stuff
[00:51:56] Unknown:
written in Python. But, you know, we've got to make the best use of our limited resources in terms of our development time, and I think, really, there's so much more reach with web technologies than there is with desktop
[00:52:08] Unknown:
technology. So it's kind of been more of our focus, if we're honest. We do love to tell people that we accept pull requests, and we love community contributions. So in terms of Virtaal, I know we've got people who are still using it. They love using it. If we can entice people out of the community who're willing to just spend a little bit of time figuring out how to package on Windows now... And we know how to package on Mac; it's just the time that's involved in putting it all together and really making it. Then we're probably in a position to do that again. So it's kind of an appeal to the wider community, if you're passionate about GTK or Python on the desktop or Python on Windows and Mac.
[00:52:49] Unknown:
Yeah. We'd love to we'd love to hear from you. So what are some of the things that you have planned for the future of your projects?
[00:52:55] Unknown:
So the last year or so has been really about tightening up, mostly around Pootle, and looking at how we can use, you know, continuous integration processes in Pootle. And we've kinda got our test coverage right up, and we've, you know, really addressed a lot of the performance and security issues. So that's been really successful, but it's meant we haven't released. And so we're just about to release an RC version of Pootle. So, I mean, that's quite exciting for us, really. We kind of made the decision some time ago that we didn't wanna call something a stable release until we'd addressed some of the fundamental problems that we were aware of. That's quite a positive story, really, because we've got that out of the way, and we're now moving to that release. So I think moving forward, we'd really like to see a more regular release cycle. In terms of actual features, I think we'd like to make better use of newer web technologies.
We've been working for quite some time on VCS integration, and that's fairly stable now. We're gonna package it as part of the release, although there's still quite a lot of work to be done around that. Improving our format handling: we're not 100% sure which direction we'll go with that, but, really, we've been looking at how we can encompass some of the features that are specific to particular formats in a generic way and, at the same time, keep the data that's inside Pootle as portable as possible. Improving our quality check system: making the checks more extensible, making it easier for people to write more specific checks.
I think they're our kind of,
[00:54:30] Unknown:
top line goals really at the moment. Yeah. So for people who want to follow what you guys are up to and keep in touch, what would be the best way for them to do that? Well, the 2 places we are hanging out: if you want to engage with the project, it's our GitHub,
[00:54:43] Unknown:
the GitHub project at translate/pootle. And we're on a Gitter channel, gitter.im/translate/pootle,
[00:54:53] Unknown:
and they can engage us there. That's the best place for kind of real time chat. You know, there's always pretty much, you know, one of us around. And if we don't answer straight away, we'll generally answer fairly quickly. And,
[00:55:06] Unknown:
you know, so there's a pretty active community there. And for anybody who wants to follow either or both of you individually, what would be the best way for them to do that? How about you, Dwayne? I don't know.
[00:55:19] Unknown:
I really did. Okay. Oh, GitHub, I would say for myself: github.com/phlax. And, if you wanna follow what we're up to, if you want to actually speak with us, then the best channel is the Gitter channel. Okay. So with that, I will move us to the picks. My pick today is going to be the
[00:55:41] Unknown:
Google Chromecast. I picked one of those up recently because it was on sale for about $10 off the regular price. And my experience so far in using it has been good. There's a little bit of disparity in terms of the support for different apps on my Android tablet, but it's a useful product for being able to easily get something from one screen to a bigger screen for more people to be able to share in viewing it. And I'll be interested to see what other sorts of support and uses it gets put to as they continue to develop that project. So with that, I will pass it to you, Dwayne. What do you have for picks today?
[00:56:19] Unknown:
So I struggle, because I think I wanted to use this as an opportunity maybe to call out some of the people that we've relied on and loved in terms of the development that we've done. You're free to do that as well, in addition to any picks that you might have. Okay. Cool. This is just an excuse, so I can actually think about what I'm gonna pick.
[00:56:36] Unknown:
Yep. No. Me too. I'm not sure.
[00:56:39] Unknown:
One I'd call out is Jitsi, which we use every day for real time voice and video, which is an open source project developing an equivalent of some of the closed things that we're often engaged with. And so we rely on them and love them. And mostly we love the fact that every morning we get to see their four-word random room. So if you ever go to meet.jit.si, they create a four-word mnemonic for
[00:57:07] Unknown:
the room, which creates great solidarity every morning. Yeah. I'd probably go with Gitter, if we're gonna pick on the basis of what gives us a huge amount of utility. As somebody who had used IRC forever, you know, there's a lot of resistance to moving off IRC, and certainly in open source projects, because people are so used to it. But I think in reality, HTTP based messaging offers so many advantages over IRC that it's hard to look back, really. And that's become a really cool part of our development process, really, being able to communicate. And I think the killer feature really is the fact that Gitter allows unlimited rooms, pretty much both public and private, for open source projects.
[00:57:51] Unknown:
So it allows you to sort of build your community really effectively. Alright. Well, I really appreciate the both of you taking time out of your day to join me and tell me about the work you're doing with Translate House and the projects that you've produced. Being able to bring localization and internationalization capabilities to larger communities definitely seems like an important thing for everybody to be aware of. And I'll definitely be trying to push more for including those capabilities in the projects that I'm a part of. So I appreciate it, and I hope you enjoy the rest of your day. Thank you so much for having us. Yep. Cheers.
Introduction to Dwayne Bailey and Ryan Northey
First Encounter with Python
Journey into Localization
Internationalization vs Localization
Importance of Localization for Software Projects
Overview of Translate House Projects
Context in Translation Memory
Challenges of a Distributed Organization
Partial Translations and Community Involvement
Technical Challenges in Translation Formats
Usage of Pootle and Virtaal
Integration with NLP and Machine Learning
Supporting New Languages
Challenges in Packaging and Distribution
Future Plans for Translate House Projects