Translate House with Dwayne Bailey and Ryan Northey

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app or experimenting with a project that you hear about on the show. To help other people find the show, you can leave a review on iTunes or Google Play Music and tell your friends and coworkers.

Your host as usual is Tobias Macey, and today I'm interviewing Dwayne Bailey and Ryan Northey about Translate House and the process of internationalization and localization for software projects.

So, Dwayne, could you please introduce yourself?

Hi. I'm Dwayne. I'm a South African who develops on Pootle, which is part of the Translate House tools. And yeah, that's me.

Hi. I'm Ryan. I'm currently the lead developer at Translate House. I work closely with Dwayne and with lots of different software projects to localize their products.

And how did you each get introduced to Python? Dwayne, how about you go first?

So I was, I mean, the date's 1996.

I was doing some work on an online newspaper and had to hack up some of their code. That was the first time I hit Python, and I was completely intrigued by the whitespace indentation. At the top of all the files was a copyright header that said Mark Shuttleworth, so it was probably his first code before he went on to found Thawte and then Ubuntu.

And how about you, Ryan?

I'd been working for an organization called Community Technology, and we'd mostly been using PHP and various other early web technologies. I think this was the late nineties. We'd really started to use Linux, and I started looking for a language that allowed me to do more sysadmin tasks. As things progressed, PHP became not quite so secure and so on, so gradually I moved most of my web projects over to Python as well.

And how did you each get involved in localization?

So my thing was, in South Africa, we have 11 official languages, so I was just fascinated. I'm a passionate open source person, and I was quite intrigued: could we use open source to address a language need in the country on a technology platform? I just started investigating how to localize, and that's how I got involved. At first I was really trying to localize into local languages. That eventually grew, and we ran things across Africa. Then the tools came out of that: developing tools that solved serious problems or filled missing pieces in the ecosystem. But ultimately, it was about helping people have computers in their language so that they could actually be part of the digital age.

And how about you, Ryan?

Well, my background was in anthropology.

Via a strange route, I got into coding and became very inspired by open source software. In truth, I haven't got anything like the kind of experience that Dwayne has in the localization field, but my goal really has been about increasing participation in the web. Thankfully, I met Dwayne and found out what Translate House had been doing over that time, and I've got more and more involved in localization. It seems like a great vehicle for increasing that diversity and participation in the web.

So for people who aren't familiar with it, how would you describe the difference between internationalization and localization? I'm also wondering if there are any cases where it would make sense to only do one of those things, or if they're just different words for the same thing.

Well, they're definitely not different words for the same thing. In the industry, they're seen as two different things. The easiest way I've found to think about it is that internationalization is the things that you need to do to your software to make it possible to localize it. Take the simple things (well, they're not necessarily simple): if you're going to display a date and your software is internationalized, you would be allowing that date to be displayed differently. You could think of it mostly as a one-off thing.

Localization is then actually making that piece of software work in another language. So if I wanted to run the software in Afrikaans, I would need to define how dates work in Afrikaans and do the actual translation. That's the process of localization, and you could think of it as the ongoing process: as you add a new language, you repeat the same steps, and as the software gets updated, you update the translations.

So internationalization is making it possible; localization is actually doing it. The two interweave, so you definitely can't do localization without internationalization.
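To make the distinction concrete, here is a minimal Python sketch. `LOCALE_DATA`, `format_date`, and `translate` are hypothetical names invented for illustration. Real projects would normally use gettext catalogs and CLDR locale data (for example through the Babel library) rather than a hand-written dict. Writing `format_date` and `translate` so that nothing is hard-coded is the internationalization step. Adding the Afrikaans entry is the localization step, and it requires no code changes.

```python
from datetime import date

# Hypothetical, hand-written locale data for illustration only. Real projects
# would normally load CLDR data (for example through the Babel library) and
# compiled gettext catalogs instead of a dict like this.
LOCALE_DATA = {
    "en": {
        "months": ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November", "December"],
        "date_pattern": "{month} {day}, {year}",
        "messages": {},  # English is the source language, so nothing to look up
    },
}


def format_date(d: date, locale: str) -> str:
    """Internationalized: word order and month names come from locale data, not code."""
    data = LOCALE_DATA.get(locale, LOCALE_DATA["en"])
    month = data["months"][d.month - 1]
    return data["date_pattern"].format(day=d.day, month=month, year=d.year)


def translate(message: str, locale: str) -> str:
    """Look up a UI string, falling back to the English source if it is untranslated."""
    data = LOCALE_DATA.get(locale, LOCALE_DATA["en"])
    return data["messages"].get(message, message)


# Localization: supporting Afrikaans means adding data, with no change to the functions above.
LOCALE_DATA["af"] = {
    "months": ["Januarie", "Februarie", "Maart", "April", "Mei", "Junie",
               "Julie", "Augustus", "September", "Oktober", "November", "Desember"],
    "date_pattern": "{day} {month} {year}",
    "messages": {"Save": "Stoor"},
}

print(format_date(date(2016, 3, 5), "en"))  # March 5, 2016
print(format_date(date(2016, 3, 5), "af"))  # 5 Maart 2016
print(translate("Save", "af"), translate("Cancel", "af"))  # Stoor Cancel
```

The untranslated "Cancel" falls back to English. This is the same behaviour a partially localized application shows.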

If you're going to do anything, you should internationalize. My big thing for people looking at it is that you'd better move quickly from looking at it to actually doing it. If you leave it late in your design or implementation process, it becomes really difficult to implement. If you haven't thought about internationalization, you've probably hard-coded dates and layouts and all that, assuming certain things. So you want to do it early, even if you aren't actually delivering in other languages yet.

So you do need to do internationalization first. But ultimately, the best internationalization comes from the kind of feedback that localizers give you, saying, oh, this actually doesn't work in our language.

Following up on what Dwayne said, I think there's also a difference on a human level. Internationalization is something that's done much more in the software. Localization, at least in our work, has been much more about working with localization teams to get knowledge about their languages and how best to represent things within their language, and then getting that back into the internationalization process.

So for people who are starting a new project, or people who already have an existing project, are there cases where it doesn't make sense to do the internationalization step followed by localization? Or is it something that, in your opinion at least, everybody should be doing from the get-go?

You're asking localizers if localization should be optional?

In every case where I've met people who say, we don't really need this, either they don't have a market or they don't see a market. Then you change the paradigm slightly, say to community localization, where the cost of actually doing the translation is not necessarily borne in monetary terms. If they haven't done any of those processes before, they can't really leverage that. I literally can't think of a case. Maybe software that's on the Mars Rover is something you don't want to localize.

Yeah, I'm trying to think. For example, there is some software that's specifically designed for a particular language and so on, but I would say that's probably not the best approach overall. The Internet has meant that the need to localize now cuts across many different types of software. We really see our community growing from both ends: big corporates at one end, and at the other end, the guy sitting in his bedroom developing games. So the need to localize, and the opportunities presented by localizing, have really opened up to a much bigger group now. I can't think off the top of my head either of a case where I would advise someone not to bother.

So for anybody whose software is going to be used by end consumers, for whatever definition of consumer, internationalization and localization are definitely worthwhile endeavors.

Yeah, definitely. And compared to when I started this, the platforms and the processes on the internationalization side are just much, much better. So it's not a lot of work that people need to do, especially if they're targeting certain platforms like mobile. Even the last few years have seen a big shift in what's possible.

You used to struggle with platforms like the iPhone, where you were limited to languages that Apple had approved, and now you have much more freedom to do any language. So the freedoms enjoyed now are much bigger than when we started, and there's really very little reason at the moment not to.

I think the big blocker to someone adopting those processes is really how much overhead they need to learn and how much they need to adapt their process. That applies more to internationalization, because you don't necessarily want to commit to localizing into loads of target languages that you're not going to use for your immediate goal. But building internationalization into your software processes early on will offer a huge amount of benefit down the road if and when that becomes important.

And from our side as well, we're very passionate about languages. There aren't very good words for them, but people will often talk about minority languages, and we are talking about "minority" languages of 100 million speakers. So the opportunity in those markets is amazing. We've looked at how to make it relatively easy for people to onboard a language like that. In the very traditional model of 10 or 20 years ago, it was very costly to add each new language. I think it's not as costly now, but there are still process costs and engagement costs. Some of our tooling is really built around how you make that almost zero cost to an organization. So if someone arrived and said, oh, I want to translate your product into Hindi, you could just say, go for it. We've got the processes and platform, and we're not worried about the cost to us of actually delivering that. That's where we feel we're getting a measure of success. We're achieving our social objectives of getting more languages translated, and we're doing that by making it easier for software developers.

I think there's also a dimension to this in terms of open source software development, and it really benefits communities. As we work more and more with different localization teams, we see a big overlap with the local open source people, the people who are contributing to other projects anyway. So there are huge benefits for open source projects in engaging localization communities to build their contributor base, their user base, and their developer base.

You both work for an organization called Translate House that's focused on localizing and internationalizing software projects, and you have a number of projects under that organization. I'm wondering if you can briefly introduce each of them and describe their purpose and how they relate to each other.

So the tools that we've developed really each emerged out of a need that we had within our own communities and that we saw in the localization space.

The primary one is Pootle, which is a community-based translation platform. It's a web-based platform that allows people with pretty low technical skills to just log in and begin translating.

We have the Translate Toolkit, which is a toolkit used by other pieces of software, ours and others, to provide format support for the various localization formats. It also provides some of the localization engineering tools that people need, such as counting words and strings and grepping through translation files.

The third one is Amagama, which is a translation memory server. For people not familiar with localization terminology, that's just a massive database of previous translations that we search for close matches.

And the last one is Virtaal, which is a desktop translation tool. We were faced with situations where people had really poor connectivity, so how do we deliver a high-quality translation tool for low-skilled people who don't have connectivity? That covers the four primary tools that we've developed over a number of years.

And what was the order of creation of those tools?

So the toolkit came first. The primary reason for that is that we were localizing various products, and we realized we needed to translate a number of things to have an impact for our language. Some of the key targets at the time were things like Mozilla and OpenOffice, but we were also translating KDE and GNOME and some of Red Hat. We had some pretty good tools for translating PO files, one of the formats in open source. But Mozilla and OpenOffice brought a number of formats. Mozilla had about four or five formats, and OpenOffice had a really strange thing that just looked like a massive spreadsheet. There were no tools to translate those. So our first task was, well, could we convert these strange formats into PO so we could use great tools and consolidate them? That's really why I built the toolkit first.
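The convert-to-PO idea can be sketched in Python. This is not the Translate Toolkit's code or API (the toolkit ships its own converters, such as `prop2po` and `oo2po`, plus `pocount` for counting). It is a hypothetical, simplified converter that assumes plain `key=value` lines with no escapes, `:` separators, or line continuations. Each line becomes a PO-style entry, and the sketch also does the word and string counting Dwayne mentions.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One translatable string, modelled loosely on a PO entry."""
    location: str      # where the string came from; written as a "#:" comment
    source: str        # English text (msgid)
    target: str = ""   # translation (msgstr); empty means untranslated


def parse_properties(text: str, filename: str) -> list[Unit]:
    """Parse simple key=value lines. Escapes, ':' separators and continuations are ignored."""
    units = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        units.append(Unit(location=f"{filename}:{key.strip()}", source=value.strip()))
    return units


def po_quote(s: str) -> str:
    """Quote a string for PO output, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_po(units: list[Unit]) -> str:
    """Render units as PO entries (no header entry) so PO-based tools can work on them."""
    entries = [
        f"#: {u.location}\nmsgid {po_quote(u.source)}\nmsgstr {po_quote(u.target)}\n"
        for u in units
    ]
    return "\n".join(entries)


def count(units: list[Unit]) -> tuple[int, int]:
    """Return (number of strings, number of source words)."""
    return len(units), sum(len(u.source.split()) for u in units)


sample = """# Mozilla-style strings
saveButton.label=Save
quitDialog.title=Quit the application?
"""
units = parse_properties(sample, "browser.properties")
print(count(units))  # (2, 4)
print(to_po(units))
```

A real converter also has to handle escapes, developer comments, plurals, and the PO header entry, which this sketch leaves out on purpose.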

We then built Pootle for the problem that we just struggled to install software on people's computers. Some of the translators would be translating from work, where they weren't allowed to install software, and the web just provided a really easy paradigm. We sometimes run translate-a-thons with communities, and in one instance we had 1,000 people translating in Uganda. So being able to deploy a web-based tool very quickly made it possible to get people translating as fast as you could.

And then Amagama and Virtaal came after that.

Yeah. I don't have a huge amount to add to what Dwayne said, but our current focus is probably worth mentioning. It's mostly around Pootle, because it allows us to build communities around localization, beyond the challenges of the software itself.

The mission for us really is localization, and I think building the best platform for reaching people is the way to do that.

One thing I was thinking of as you were describing your work on Amagama is that in a lot of languages, context determines the appropriate translation. So I'm wondering how you manage to capture that. Or is it largely word-for-word: you've seen this particular word or phrase before, so you offer up the translated version in the language the translator is working in?

Maybe I should clear that up so people don't get confused about what it does: it's not doing machine translation at all. We're just looking for a string that you've seen before. It could be a phrase or a single word, such as a single word in a menu or a sentence in some help text. We look for that in the database, using edit distance matching or other techniques to try to find a close match. So we don't really solve the context problem, and there are things we'd love to do in Amagama that would help solve it. For people who don't necessarily know other languages, the context problem works like this. In certain languages, English being one, the noun and the verb are the same word. Take "open". That's not a very good example, but a word that works as either part of speech in English can require two different words in another language. We're finding an English match and then the translation that relates to it, so it doesn't necessarily solve that problem.
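Here is a minimal sketch of the kind of matching Dwayne describes. It assumes a plain in-memory dict of previous translations and scores matches with classic Levenshtein edit distance, normalized to 0 to 100. This is not Amagama's implementation or API. `lookup`, `similarity`, the 70-point threshold, and the German sample memory are all illustrative choices.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0-100 score, where 100 means identical."""
    longest = max(len(a), len(b)) or 1
    return 100 * (1 - levenshtein(a, b) / longest)


@dataclass
class Suggestion:
    source: str
    target: str
    score: float


def lookup(memory: dict[str, str], query: str,
           min_score: float = 70.0, limit: int = 3) -> list[Suggestion]:
    """Return previous translations whose English source is close to `query`."""
    scored = [Suggestion(src, tgt, similarity(query, src)) for src, tgt in memory.items()]
    good = [s for s in scored if s.score >= min_score]
    return sorted(good, key=lambda s: s.score, reverse=True)[:limit]


# Hypothetical memory of earlier English -> German translations.
memory = {"Open file": "Datei öffnen", "Save file": "Datei speichern", "Open": "Öffnen"}
for s in lookup(memory, "Open files"):
    print(f"{s.score:.0f}%  {s.source} -> {s.target}")  # 90%  Open file -> Datei öffnen
```

The memory is keyed only on the English source, so an "Open" button (German "Öffnen") and an "Open" status label ("Offen") would collapse into a single entry. That is exactly the context problem described above. In gettext-based projects, developers can supply that context with `msgctxt`, which Python exposes as `gettext.pgettext(context, message)`.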

But where it's really helpful is that sometimes it's just difficult, certainly for new translators, to figure out what the word should be. Very new languages often have to coin a lot of words. So it's a technique that helps people translate a little bit faster and also stay pretty consistent in their translations.

Yeah. I think consistency would definitely be an important piece as well. Say you've translated something before, and then somebody else goes to translate the same word but mistakenly introduces a typo. You can catch that with a suggestion saying, oh, did you mean this?

It's the same type of problem when other people have translated it: finding a consistent voice between multiple translators. In open source, where you've got volunteers, that could be a large problem. Even in teams of two or three people, each would have a different voice or use different words, and this helps with that. But even when people are using commercial translators, multiple people will work on the translation over time. So it's a technique to keep consistency across all the products that are being translated.
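One simple way for tooling to surface the consistency problem is to group every translation by its English source. It then flags sources that ended up with more than one translation. Below is a minimal sketch with a hypothetical `find_inconsistencies` helper. It assumes entries have already been extracted from the translation files as `(source, target)` pairs, and the German strings are only illustrative.

```python
from collections import defaultdict


def find_inconsistencies(entries):
    """Map each English source to its distinct translations, keeping only conflicts.

    `entries` is any iterable of (source, target) pairs, for example gathered
    from several translators' files. Empty targets (untranslated) are ignored.
    """
    seen = defaultdict(set)
    for source, target in entries:
        if target:
            seen[source].add(target)
    return {src: sorted(targets) for src, targets in seen.items() if len(targets) > 1}


entries = [
    ("Save", "Speichern"),   # translator A
    ("Save", "Sichern"),     # translator B chose a different word
    ("Cancel", "Abbrechen"),
    ("Save", "Speichern"),
    ("Open", ""),            # not translated yet
]
print(find_inconsistencies(entries))  # {'Save': ['Sichern', 'Speichern']}
```

A check like this only reports the disagreement. Deciding which word the team should standardize on is still a human decision.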

That relates to terminology, where in some cases people are actually setting the terminology as much as they are finding it, if you like. Recently, we've been working with an organization in India, where the government has to support many different languages. Even within each of those languages, the same word was being translated in lots of different ways. So they've created a cross-India project for effectively setting the terminology in the different Indian languages, so that you get consistency across translations.
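Once a team has agreed on terminology, a tool can check translations against it. Below is a minimal sketch. It assumes a hypothetical glossary dict mapping English terms to approved translations, with German used purely for illustration. It flags a translation when a glossary term appears in the source but the approved rendering is missing from the target.

```python
import re


def check_terminology(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary terms found in `source` whose approved translation is absent from `target`.

    Matching is a case-insensitive whole-word search on the English side and a
    plain substring search on the target side; it does not handle inflection.
    """
    missing = []
    for term, approved in glossary.items():
        used = re.search(rf"\b{re.escape(term)}\b", source, re.IGNORECASE)
        if used and approved.lower() not in target.lower():
            missing.append(term)
    return missing


# Hypothetical agreed terminology for German.
glossary = {"file": "Datei", "folder": "Ordner"}
print(check_terminology("Move the file to another folder",
                        "Verschieben Sie die Datei in ein anderes Verzeichnis",
                        glossary))  # ['folder']
```

Here the translator used "Verzeichnis" instead of the agreed "Ordner", so the check flags "folder". Languages with rich inflection would need smarter matching than a substring test.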

So what were the motivating factors that led you to decide that creating an organization to own and support the tools you were building was the right choice?

The interesting thing is that some of our objectives, like putting organizational structures in place, were set up before we even built tools. When we were starting out, we weren't developing tools. We were engaging and building localization communities and addressing the localization problem. Part of the reason for the organization was simply to be able to engage with governments and nonprofits as an organization addressing a problem. And when the tools were born out of that, it was the perfect vehicle to carry them. So having an organization has been really helpful there.

The tools themselves, the licensing of them, actually sit within a nonprofit organization. As an organization, we are really passionate about the open source nature of the tools that we develop. We've seen some open source tools in this space go proprietary. So it's quite critical for us to remain steadfast on keeping these tools open source, mostly because the language communities that need these services need great tools. That's been built into the organization we've developed.

The other realization, which came a little later, was maybe born of a bit of frustration. We built a great tool, and people started using it. People would reject it because it didn't do something, but we never really heard about that. So the organization was really a vehicle to be able to say to people: look, if you need this feature, or you need to integrate it with other tools, the people who are developing these tools are available to engage commercially to address it. Some large organizations are using platforms like Pootle, and for them it's about ensuring the sustainability of the software and business continuity. So it's not just about support; they need to know that the software they've invested in will continue. Some of them have built pretty sophisticated localization processes around it. It's been an amazing vehicle for us to communicate that we do support this professionally, and we've got vehicles to do that, while it remains a proudly open source piece of software.

And the organization that you run is, if I understand correctly, a distributed one. So I'm wondering, what are some of the challenges that you face in managing an organization of that structure?

Ryan suffers from most of it.

Well, I think we've learned quite a lot of lessons over the last couple of years and gradually built more collaborative processes. We are widely distributed: our core developers are spread across Russia, Spain, the UK, and South Africa. We also travel quite a lot for one thing or another, so we really have to be able to work in a very distributed way.

We tend to use either open source or open-source-focused platforms for doing that. We obviously use GitHub a lot, and I think that's been pretty critical to our development process. It also lets us communicate with the wider community, because beyond the core devs there are quite a lot of contributors, both long-term ones and people who dive in just because they want to solve a particular problem.

As a core team, we have daily scrums, and I think that's really critical to our process. We are all working pretty much on our own, in our own environments and so on. Just in terms of focus, it can get kind of lonely if you don't have those opportunities to talk through what you've done and what you hope to do next. So I think that's a pretty critical part of our process, beyond the software.

Also, we don't necessarily have face-to-face communication, and that can be a little problematic. So one of the things we try to do is get together every six months or every quarter, just to have face-to-face time as a team and either do a sprint or work through more complex problems. We haven't been able to avoid the fact that we do need to get together.

Time zones can be a bit of a problem when we're working together, but we sync around UK time. One of the things I've struggled with is that at times we've had developers in Australia, and that's proved quite difficult. So I don't think time zone shifting is something you can completely solve with the kind of process we've followed.

We've been pretty successful at making it work, but it is very different. If you look at something like agile development, most of what's written assumes face-to-face, same-room interaction. Trying to replicate that, or finding models that let some of it work for us, has been a matter of trial and error and experimentation.

But I think that in part the tools have improved, and in part we've learned better processes. We tend to jump in and pair program together whenever two heads would be better than one on a problem. Combine that with voice communication, and I think it's fair to say we all know each other pretty well, even without regular in-person contact. But at the same time, as Dwayne said, nothing really substitutes for colocating. So again, we're still learning the best ways to approach it. Particularly around conferences or other meetings we have to attend, we're trying to build in time to spend with each other as a core team.

The whole concept of distributed work and distributed organizations has been around for a while, but it's still evolving in terms of the appropriate norms and methodologies for making it work. And of course every team is different. So it's hard to build up shared community knowledge of how best to approach these kinds of problems, because there are so many human factors involved. It'll be interesting to see how this general trend in work environments continues to evolve in the coming years.

Yeah, definitely.

So far we've mostly answered in terms of our core team. But part of what we'd really like to do, and the challenge we really face right now, is to build a community of localizers. That community wouldn't necessarily be built around a particular open source or free software product, but around the open source and free software community as a whole. Then, as localization needs arise, there is already a body of people able to address them. Part of that is learning what's in it for localizers beyond their desire to see software in their own language. It's about working with young people and working out how this gives them career opportunities or a community, or lets them interact beyond their immediate environment. So part of the answer is working out how we, as an open source and free software community, build localization into our collaborative processes more generally.

I think what I picked up on in what Ryan was saying is that we want to be distributed because we want to include voices from other people, and my own story shows why other voices are important. We understand languages from our South African context. Most European languages are, let's call them, monolingual within a country: you speak French in France, German in Germany. So people were tackling problems as if languages were geographically located. Those languages are also pretty mature, and their teams hadn't really dealt with a language team handling ten languages. We had to do some of the basic internationalization work, and it was really painful to do it ten times. Then you realize that most of those other teams had experienced some of that pain at some point, but it hadn't been worth documenting, because once the pain was gone, it was gone, and they hadn't needed to build any tools. So some of our tooling was built to address our own problem, and I think that voice needed to be heard. I have a friend who calls it blowback localization. Pootle was developed in Africa to address an African need, but it's globally useful. All the people benefiting from it now are benefiting because an African group needed to solve the problem. So the importance of those other voices is critical for what we want to do.

And on a somewhat more technical level, consider projects that are partway through translating the source text. When you only have a partial translation, would you start exposing those localized strings as they are produced? Or would you wait until a critical amount, or the whole text, has been translated before exposing it to the end user?

So my input there would be that it really does depend. What is the piece of software?

How do you structure things? You can say, well, let's make sure we translate the most critical things that everyone's going to see first. If you look at any software, that 80/20 rule applies: 20% of the UI is all you need to do, because most people don't need to know all the variants of Blowfish encryption that are somewhere in your app. So if you can prioritize the important things first, you can get away with very little translation. But that means you have to be able to do that prioritization in your localization platform or tooling.
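The prioritization idea can be sketched in a few lines of Python. The sketch assumes each string carries a hypothetical `priority` weight. How a platform would assign it, from usage data or manual tagging, is out of scope here. Untranslated strings are served highest priority first, and completion is measured by weight rather than by raw string count. That way, translating the main menu counts for more than translating an obscure encryption option.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    source: str
    priority: int     # hypothetical weight: higher means more users see it
    target: str = ""  # empty means untranslated


def next_to_translate(units: list[Unit], n: int = 5) -> list[Unit]:
    """Untranslated units, most visible first, so translators start where users look."""
    pending = [u for u in units if not u.target]
    return sorted(pending, key=lambda u: u.priority, reverse=True)[:n]


def weighted_completion(units: list[Unit]) -> float:
    """Share of the total priority weight that is already translated (0.0 to 1.0)."""
    total = sum(u.priority for u in units)
    done = sum(u.priority for u in units if u.target)
    return done / total if total else 0.0


units = [
    Unit("File", 10, "Datei"),
    Unit("Open", 10, "Öffnen"),
    Unit("Quit", 9),
    Unit("Blowfish cipher mode", 1),
]
print([u.source for u in next_to_translate(units)])  # ['Quit', 'Blowfish cipher mode']
print(f"{weighted_completion(units):.0%}")            # 67%
```

Only half the strings are translated, but two thirds of the weighted UI is covered. That gap is the point of prioritizing.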

I also weigh it up in terms of the language and the community. It's much, much more important that the community is rewarded for the work they do. So if it's a friend and his mates, and there are ten of them, just get it out there early so that they can brag and build a community. Something that hampered some of our work was that projects wouldn't let us release until we'd built a community. We'd say, but we want to use your software to build our community. There are only five closet users now, there'll probably be ten in a week or two, and in a year there'll be 100. So the damage is pretty minimal. My general feeling is that the threshold will depend on your software. An 80% rule is probably a good one if you want to suck a number out of your thumb: if people are at 80%, they're mostly done.
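The rough 80% rule of thumb can be expressed as a release gate. Here is a minimal sketch, assuming a hypothetical catalog dict that maps English source text to its translation, with an empty string meaning untranslated. Completion is counted in source words, and any untranslated string falls back to English at display time. Python's `gettext` module behaves the same way, returning the original message when a catalog has no translation for it.

```python
def completion(catalog: dict[str, str]) -> float:
    """Fraction of source words whose string is translated.

    `catalog` maps English source text to its translation ("" if untranslated).
    Counting words rather than strings gives long help texts more weight.
    """
    total = sum(len(src.split()) for src in catalog)
    done = sum(len(src.split()) for src, tgt in catalog.items() if tgt)
    return done / total if total else 0.0


def ready_to_release(catalog: dict[str, str], threshold: float = 0.8) -> bool:
    """The rough 80% rule of thumb, as a configurable release gate."""
    return completion(catalog) >= threshold


def display(catalog: dict[str, str], source: str) -> str:
    """Show the translation when there is one, otherwise fall back to the English source."""
    return catalog.get(source) or source


catalog = {
    "Open": "Öffnen",
    "Save": "Speichern",
    "Print the current page": "Aktuelle Seite drucken",
    "Blowfish": "",  # untranslated: users will see the English text
}
print(f"{completion(catalog):.0%}", ready_to_release(catalog))  # 86% True
print(display(catalog, "Save"), display(catalog, "Blowfish"))    # Speichern Blowfish
```

The threshold is a parameter on purpose. As the discussion suggests, a small community project might release well below 80%, while a corporate product might insist on review before anything ships.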

The other thing I'd say, if you want some sense of confidence, is to check whether the translator actually uses the software daily. That might seem like a strange thing, but a lot of people translate software they don't use. If someone translating VLC actually watches all their videos in VLC, everything that needs to be translated will be translated, so you can be fairly assured of that. So those are two things.

I would say that question would probably get answered differently in the free and open source communities than in the corporate world. In the free and open source communities, localizers really want to see their localization in production, in the software, as quickly as possible. In terms of that reward cycle, I think it's really important to get even a partial translation up and out there. In the corporate world, they're much more concerned about how their voice comes through in any particular localization. They probably want a tighter review schedule to make sure that any localization that goes into their product reflects their voice in that language, and that's quite a big challenge. If you don't speak the language, you have to trust your localizers to tell you whether the localization reflects your voice.

So it's a slightly different process in the corporate world than it is in the community world.

Also, in the corporate world, people are often commissioned to do specific translations, so you will get 100% translated. But I think corporates will all discover that paid translation is not necessarily good translation. It does get done, and if you choose well, you can work out how to guarantee quality. But your metrics will be slightly different if you're paying for translation. Even Microsoft doesn't translate everything, so at some point you make decisions to reduce the amount of content.

In many respects, I think that's where the free and open source community really does have some advantages. We can build localization communities without having to worry about corporate voice. We can reflect a global voice instead.

Well, I think one part of the narrative about where we sit at the moment with localization generally for minority and marginal languages is that some of the changes, in terms of Microsoft translating much more stuff, were actually precipitated by people translating free and open source software. So it set a precedent of needing to translate, and that has led to much, much more being translated.

So Google and Microsoft are pretty well

translated to quite a lot of languages nowadays.

Whereas 15 years ago, a company that translated a lot of languages would do 35, and mostly you could get away with doing 15, which would be quite a good one. And I think people are doing many, many more languages in big products now. Yeah. And I'm sure that part of that too is driven by the increasing globalization

of industry

and just communication in general, because the Internet has been able to reach more places as we build out other methods for actually creating that interconnect. Before, everything was hardwired landlines. Now that we're developing more wireless technologies that make it easier and cheaper to bring the Internet into more rural and undeveloped areas, it brings more people online and increases the need for them to be able to actually interact with the content that people are producing. Sure.

When we kind of started this, before mobile was in the form that it is now, we'd have long debates about the value of translating and whether people in rural areas even needed it or could even access it. And that kind of debate just became moot when everyone has a mobile phone. I mean, not everyone is on the Internet; we mustn't pretend otherwise about some of the realities of that. But suddenly, "should I translate this phone into this language?" is a question of "do you want to sell more minutes?" I met some Indian cell phone providers who tackled the problem like that. They said they could make a lot of money if they could sell really cheap calls to lots of people, and the only way to get lots of people was that having a Hindi phone was not optional. It was the only thing they could do, and it had to be cheap. So the mobile space, I think, has radically changed the way we think about what should be translated and who will consume it.

Yeah. And in itself, I think, the point you alluded to is that it has a hugely transformative nature, including so many more people in that global conversation. And for us as organizations, the question is how do we make the Internet reflect those voices better? So at the

technical level of the actual

translation apparatus itself, I know that there are a number of different

standards and formats for being able to actually

encode the various translations into the source code that you're working with. So I'm wondering, what are some of the architectural and design challenges that have arisen in your work with the Translate Toolkit to be able to provide an abstraction over the top of all of those various systems? TTK has certainly developed a little bit slower, because we've got a wider community of users and it's used at a lower level. So we have to be very careful when we make changes, to make sure that it remains compatible and that we're making the right changes. Historically, I think, as you said, there have been a lot of challenges in terms of supporting multiple encodings and working with not only different formats, and not necessarily well-specified formats, but formats that people have adapted themselves to meet their own needs.

So,

TTK has kind of grown quite organically over the years to meet those various needs.

Internally, it tends to represent different formats in a PO-like way, along the lines of the gettext standard. And it's been pretty robust. It doesn't necessarily cover all use cases, but, like other standards, it's been adapted and overloaded, if you like, to meet different people's needs. So internally, TTK represents pretty much everything in a kind of PO-like way, and that allows it to do things such as diffing and comparing different files.
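To make the PO-like internal model concrete, here is a minimal Python sketch, assuming a unit is just a context, a source string and a translation, keyed the way gettext keys PO entries (context plus msgid). The `Unit` class and `diff_stores` function are hypothetical illustrations, not the Translate Toolkit's own API; they only show why one shared PO-like representation makes diffing files from different formats straightforward.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (msgctxt, msgid)


@dataclass(frozen=True)
class Unit:
    """One translatable string, modelled loosely on a gettext PO entry (hypothetical)."""
    msgctxt: str  # disambiguating context; "" when there is none
    msgid: str    # source text
    msgstr: str   # translation; "" when untranslated


def index(units: List[Unit]) -> Dict[Key, Unit]:
    """Key units the way PO files do: by context plus source text."""
    return {(u.msgctxt, u.msgid): u for u in units}


def diff_stores(old: List[Unit], new: List[Unit]) -> Dict[str, List[Key]]:
    """Report which units were added, removed, or re-translated between two stores."""
    a, b = index(old), index(new)
    return {
        "added": sorted(k for k in b if k not in a),
        "removed": sorted(k for k in a if k not in b),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k].msgstr != b[k].msgstr),
    }


old = [Unit("", "Open", "Abrir"), Unit("", "Save", "")]
new = [Unit("", "Open", "Abrir"), Unit("", "Save", "Guardar"), Unit("menu", "Quit", "")]
print(diff_stores(old, new))
# {'added': [('menu', 'Quit')], 'removed': [], 'changed': [('', 'Save')]}
```

Any format that can be parsed into such units (PO, properties files, JSON, and so on) can then be compared with the same function.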

Python 3 is obviously a factor here. It's making the encoding more standard and making Unicode more standard. Yeah. But the one funny thing to think about with TTK is that if you leave a bunch of programmers in a room, usually ones with no localization experience, they will invent a new localization format. It's like the xkcd cartoon. You know? There are already 15 formats; we need one that'll encompass them all. And now there are 16 formats. So I think one of the things we've also been looking at around that is taking a step back and asking what the priorities really are. And I think compatibility is a priority. Portability is a priority.

I think brevity is also a priority. You know? We looked at XLIFF quite a bit, at whether or not that's a good way to internally represent the different formats. I think it possibly is, because it's very extensible, so we can make it cover all of those different formats without having to do anything outside of the spec. I think it also provides a way of having data portability and, obviously, diffing and so on. Where it falls down a little bit is the classic problem with XML: it tends to be a little bit data heavy. So even for a small amount of information, you end up with quite a large file, which is not quite as portable as one might hope. But we haven't entirely decided our road map in that sense. I think we would like to move towards a format that is more extensible by design than PO. PO has served us really well up until now, and for now that's the internal standard that we tend to use. It's almost strange being an organization like this; I don't think we anticipated that we would have to maintain a library of formats. So ensuring stability and testing that has been really interesting, as has learning those kinds of skills.
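The point about XML being data heavy can be checked with a rough comparison. This sketch builds the same single string as a bare PO entry and as a minimal XLIFF 1.2 document using only the standard library; the helper names and the `original="messages"` value are made up for illustration, and the PO writer deliberately skips escaping.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def po_entry(msgid: str, msgstr: str) -> str:
    # Minimal PO entry; a real PO writer also escapes quotes, backslashes and newlines.
    return f'msgid "{msgid}"\nmsgstr "{msgstr}"\n'


def _q(tag: str) -> str:
    # Qualify a tag name with the XLIFF 1.2 namespace.
    return f"{{{XLIFF_NS}}}{tag}"


def xliff_document(unit_id: str, source: str, target: str,
                   src_lang: str = "en", tgt_lang: str = "fr") -> str:
    # The smallest XLIFF 1.2 document that can carry a single trans-unit.
    ET.register_namespace("", XLIFF_NS)  # serialize as the default namespace
    root = ET.Element(_q("xliff"), version="1.2")
    file_el = ET.SubElement(root, _q("file"), {
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "plaintext",
        "original": "messages",  # hypothetical file name
    })
    body = ET.SubElement(file_el, _q("body"))
    unit = ET.SubElement(body, _q("trans-unit"), id=unit_id)
    ET.SubElement(unit, _q("source")).text = source
    ET.SubElement(unit, _q("target")).text = target
    return ET.tostring(root, encoding="unicode")


po = po_entry("Save", "Enregistrer")
xliff = xliff_document("1", "Save", "Enregistrer")
print(len(po), len(xliff))  # the XLIFF version is several times larger
```

The overhead is mostly fixed per file, so the ratio shrinks for large files, but for small payloads the difference is striking.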

So bringing it up a level to the Pootle and Virtaal tools, I'm wondering how they compare feature-wise to some of the other offerings for web or desktop based translation. And I'm also curious if they're primarily used for translating text in the context of software, or do you see them get used for other sources of text as well, such as maybe print media? So picking up on the first question: when Dwayne first started developing Pootle, there really wasn't much in terms of other open source options for localization, and localization therefore tended to be done by hand. So Pootle was a really critical player at that point. And one thing we've actually found really healthy is that there are now more and more tools available for localization. Thankfully, quite a few of those are also open source.

In terms of features, there are quite a few different platforms, so it's hard to go into specific features and platforms. I think for us, our focus is very much on localizers. So our platform is very much set up so that if you need to translate a large number of strings, you can go through them very quickly. And certainly our aspiration is that if you're a complete novice localizer and don't really know much about localization and formats and what goes on in the back end, it's also really easy for you to do that. So I think for us, it's about working with those communities, working with communities that are building, for example, their own machine translation, and finding out how we can incorporate that. In that respect, I'd say our strength is really our community. From a feature perspective, I think we tick all the boxes in terms of what's needed in a localization tool. I think the one critical thing for us, and we think quite a lot about this, is how do we put great localization technology, things like TM and MT and terminology, into people's hands.

TM is translation memory. Yeah. MT is machine translation. Well, I kind of didn't want to define them, because, specifically, we want to put that in people's hands without them needing to know what they are, what they mean, and how they work. So

maybe a simple example would be that when someone gives you a TM match, they'll give you a percentage match. Like, who knows and who cares what that means? I don't know what it means, and I don't think it means anything to anyone else. So we don't show those kinds of things to translators. The question we weigh up is: how do we get really great technology that the industry would use into their hands without them necessarily needing to be experts in the tools?
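As a rough illustration of where a "percentage match" comes from, the sketch below scores translation-memory entries with Python's `difflib.SequenceMatcher.ratio()` and returns only the ranked translations, keeping the number internal. This is a stand-in similarity measure chosen for brevity, not the scoring that Pootle or any particular TM engine uses; `tm_suggestions` and its 0.75 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def tm_suggestions(source: str, memory: Dict[str, str],
                   threshold: float = 0.75, limit: int = 3) -> List[str]:
    """Return stored translations whose source text resembles `source`, best first."""
    scored = []
    for tm_source, tm_target in memory.items():
        # ratio() is 2*M/T: matched characters over total characters, in [0, 1].
        # A "78% match" in a CAT tool is usually some score of this kind.
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, tm_target))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only the ranked translations are returned; the raw score is never shown.
    return [target for _, target in scored[:limit]]


memory = {
    "Download file": "Descargar archivo",
    "Delete file": "Eliminar archivo",
    "Print": "Imprimir",
}
print(tm_suggestions("Download files", memory))  # ['Descargar archivo']
```

The translator sees a useful suggestion in order of relevance; the threshold decides whether a suggestion appears at all.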

And so when we stack up the features like that, we have the features that people need in terms of localization.

We've also got a pretty healthy checks framework, I think. Beyond human reviews, it allows a certain amount of checking and flagging up when there are issues. And I think that really feeds into one of the future goals that we have: continuous integration has been quite successful for us as a community and as a project, so how do we apply it in the localization context itself? I think that's an area we're really looking at. How do we improve those check systems, how do we make it easier for people to write checks specific to their language or their project, and how do we find ways to integrate those checks frameworks into other continuous integration processes?
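A pluggable, per-language checks framework could look something like the minimal sketch below. The registry, the decorator and both checks are hypothetical, not Pootle's checks API: one check applies to every language (placeholders must survive translation) and one applies only to French (no ordinary space before `; : ! ?`).

```python
import re
from typing import Callable, Dict, List, Optional

# A check takes (source, target) and returns an error message, or None if it passes.
Check = Callable[[str, str], Optional[str]]
CHECKS: Dict[str, List[Check]] = {"*": []}  # "*" holds checks for every language


def register(lang: str = "*") -> Callable[[Check], Check]:
    """Decorator that adds a check for one language code, or for all languages."""
    def decorator(fn: Check) -> Check:
        CHECKS.setdefault(lang, []).append(fn)
        return fn
    return decorator


# printf-style (%s, %d, %(name)s) and brace-style ({0}, {name}) placeholders
PLACEHOLDER = re.compile(r"%(?:\(\w+\))?[sd]|\{\w*\}")


@register()
def placeholders(source: str, target: str) -> Optional[str]:
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        return "placeholders differ from the source"
    return None


@register("fr")
def french_spacing(source: str, target: str) -> Optional[str]:
    # French typography wants a no-break space, not an ordinary one, before ; : ! ?
    if re.search(r" [;:!?]", target):
        return "use a no-break space before ; : ! ?"
    return None


def run_checks(lang: str, source: str, target: str) -> List[str]:
    """Run the generic checks plus any registered for `lang`; return failure messages."""
    failures = []
    for check in CHECKS["*"] + CHECKS.get(lang, []):
        message = check(source, target)
        if message:
            failures.append(f"{check.__name__}: {message}")
    return failures


print(run_checks("fr", "Delete %s?", "Supprimer %s ?"))
```

In a CI job, one could run `run_checks` over every unit and fail the build whenever a list comes back non-empty, which is one way to read the idea of bringing continuous integration into localization.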

So do you see the Pootle and Virtaal tools used outside of the context of software at all? Yeah. We do. As localization purists, we kind of think, well, we did design these tools for software translation, and we know their limitations in other spaces. But everyone we meet who uses them is almost invariably translating other stuff. They're translating web content, or they're translating documents that they've processed and loaded up. Anything that you could break down into sentences, you could get into Pootle, and people do that.

So part of that is the step before, using the Translate Toolkit or other processes. But we see many people who start by translating their software and very quickly realize they want to translate other things. So they're adding other content into their Pootle server and doing that. One of our key areas of development has been VCS integration. And in doing that, we've tried to make a very generic adapter so that Pootle can consume and produce data for different back end file systems.
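The "generic adapter" idea can be sketched as an abstract interface that the sync logic depends on, so that a file checked out from version control and a database-backed store become interchangeable. Everything here (`Backend`, `JsonFileBackend`, `MemoryBackend`, `sync`, and the merge rule) is a hypothetical illustration, not Pootle's VCS integration; JSON is used only to keep the example short.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Optional

Units = Dict[str, str]  # source text -> translation ("" if untranslated)


class Backend(ABC):
    """Hypothetical adapter interface: anywhere translations can be loaded from and saved to."""

    @abstractmethod
    def load(self) -> Units: ...

    @abstractmethod
    def save(self, units: Units) -> None: ...


class JsonFileBackend(Backend):
    """Stores units in a JSON file, e.g. one checked out from version control."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def load(self) -> Units:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, units: Units) -> None:
        self.path.write_text(json.dumps(units, ensure_ascii=False, indent=2), encoding="utf-8")


class MemoryBackend(Backend):
    """In-memory stand-in for a database-backed content store."""

    def __init__(self, rows: Optional[Units] = None) -> None:
        self.rows: Units = dict(rows or {})

    def load(self) -> Units:
        return dict(self.rows)

    def save(self, units: Units) -> None:
        self.rows = dict(units)


def sync(server: Units, backend: Backend) -> Units:
    """Take the source strings from the backend, keep any non-empty translation made
    on the server, and write the merged result back. Works for any Backend."""
    merged = {src: server.get(src) or tgt for src, tgt in backend.load().items()}
    backend.save(merged)
    return merged


store = MemoryBackend({"Open": "", "Save": "Guardar"})
print(sync({"Open": "Abrir", "Quit": "Salir"}, store))
# {'Open': 'Abrir', 'Save': 'Guardar'}
```

Because `sync` only sees the `Backend` interface, adding a new kind of content store means writing one small class rather than touching the sync logic.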

I think in doing that, we've seen the opportunities: we might be able to hook right into other content stores, potentially databases. So I think that's an area where we'd really like to push things further, because it's about how the tool is used, not necessarily how the tool was designed. And is there room for integration with some of the

natural language processing or machine learning systems for being able to actually do some of the things like word stemming so that you can

take a particular translation

and then be able to extrapolate some of the other potential

word tenses

for maybe bringing it into a different context,

particularly in the

case of the translation memory system? I think there's a huge potential for that. It's massive. It reminds me a little bit of watching DJs going from vinyl to digital, if you like. Initially, when I got involved in localization, there was a huge amount of pushback against machine translation, because it was seen as being of low quality and not as good as what a human could produce. I think that technology in itself has transformed just in the last couple of years, and there's now a lot more buy-in. Not necessarily to automate translation per se, because you're not necessarily going to get the voice: if you want to reflect a particular voice in the culture and so on, you need to understand the message on a slightly deeper level than the pure translation. But certainly, at this time at least, as an aid to localizers.

But I think that's a space that's likely to really push forward. An interesting project we did a year or so ago was for a professor in the US who translates into a couple of different Gaelic languages, and the languages are very, very closely related. So what he really wanted was to translate into one language and then machine translate from that one into the other, because between them it's almost systematic. We integrated that into Pootle, and that has really sped up his process of localization. But it would be really good to see that applied not just to the headline big European, Hindi, or Asian languages, but also to some of those smaller languages, because they're at risk of dying out or don't necessarily have such good support already. So I think getting a body of translation into machine learning could benefit those languages as well. That's something we'd really like to look at, and if there's anyone who wants to fund us to do some proper research into that, that would be amazing. But for now, it's really where we can find collaboration, to bring those skills to our project. I think if we look at the last year with Pootle, we spent a lot of our time just cleaning out code, abstracting things, and making things pluggable. The machine translation integration is pluggable, and we'd love to make it more pluggable. The translation memory is pluggable.

Checks will become pluggable. So I think we feel there's a ton of scope to allow people to adapt things. I mean, we could do much more with translation memory by doing what they call in-context exact matching, where a string matches better if the string before and the string after also match, and that might address the context issue we talked about. Your question also touched on stemming, and even doing that: finding a word, stemming it, or figuring out that the English is actually using the verb, so we suggest the verb terminology for you. There's a ton that we'd love to see happening there. So that's really exciting for us: empowering language communities to drive some of that and to integrate tools and stuff like that.
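In-context exact matching, as described above, can be sketched by indexing a translation memory twice: once by source text alone and once by the triple (previous source, source, next source). The function names and the "first translation wins" rule for plain exact matches are assumptions for illustration, not how Pootle's translation memory is implemented.

```python
from typing import Dict, List, Optional, Tuple

Context = Tuple[str, str, str]  # (previous source, source, next source)


def neighbours(sources: List[str], i: int) -> Context:
    """The string at position i together with its immediate neighbours."""
    prev_src = sources[i - 1] if i > 0 else ""
    next_src = sources[i + 1] if i + 1 < len(sources) else ""
    return prev_src, sources[i], next_src


def build_tm(units: List[Tuple[str, str]]) -> Tuple[Dict[str, str], Dict[Context, str]]:
    """Index an already-translated file both by source and by source-in-context."""
    sources = [src for src, _ in units]
    exact: Dict[str, str] = {}
    in_context: Dict[Context, str] = {}
    for i, (src, tgt) in enumerate(units):
        exact.setdefault(src, tgt)  # first translation seen wins for plain exact matches
        in_context[neighbours(sources, i)] = tgt
    return exact, in_context


def lookup(tm, sources: List[str], i: int) -> Tuple[Optional[str], str]:
    """Prefer a match whose neighbours also agree; fall back to a plain exact match."""
    exact, in_context = tm
    key = neighbours(sources, i)
    if key in in_context:
        return in_context[key], "in-context"
    if sources[i] in exact:
        return exact[sources[i]], "exact"
    return None, "none"


# "Open" is a menu verb in one place and a status adjective in another.
tm = build_tm([("File", "Datei"), ("Open", "Öffnen"), ("Save", "Speichern"),
               ("Status", "Status"), ("Open", "Geöffnet"), ("Closed", "Geschlossen")])
print(lookup(tm, ["Status", "Open", "Closed"], 1))  # ('Geöffnet', 'in-context')
print(lookup(tm, ["Edit", "Open"], 1))              # ('Öffnen', 'exact')
```

The surrounding strings act as a cheap proxy for context: when they agree, the stored translation is much more likely to be right.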

So from our side, because we focus so much on the teams, it feels like a lot of the focus in localization tools is about the program. It's like, how do we integrate with your version control system and your software, which is important. But I think we feel there's a lot of power that we're missing by not empowering the teams that are going to come and translate your software well. So we'd love to see more happening there. And on the language front, what are some of the complexities or difficulties that are introduced by needing to support an additional language that isn't already covered by the tools that you're developing?

Pretty much no work, really. I mean, there are some technical things that a new language coming in needs to do. But bless the Unicode Consortium, because they solved one of the biggest problems that really used to stymie languages: they couldn't represent their characters, and they didn't have fonts. So a lot of the big problems that we have traditionally hit have been eliminated. Unicode eliminated one of the biggest problems, so that is not too much of a problem for new languages.

New languages face things like font issues, but that's been pretty well addressed now by Google and Microsoft making universally available fonts that cover a lot of characters. But that is an issue that minority languages will hit: they can write their text, but they can't actually see it on the screen, and they can't see it on the Internet.

And then there are other, deeper software things that languages need to start tackling. The good software localization frameworks will have concepts of plural forms.

So there's "1 file downloading" and there's "2 files downloading"; those are plural forms. In Arabic, there are 6 forms; in Russian, there are 3. So you translate it differently depending on what that number is. Some of those languages that don't have that defined need to define the algorithm, or the equation, that determines which form you're going to use. There are standards around that. That's right. And there are standards. So even Unicode has the CLDR, which is the Common Locale Data Repository, and that stores a lot of stuff. How do you write dates? Those are the things that a new language would have to do. How do you write dates and times in your language? What number systems do you use?
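Plural rules like these are usually shipped as a small formula. The sketch below transcribes the `Plural-Forms` expressions commonly used in gettext PO headers for Russian (3 forms) and Arabic (6 forms) into Python; the canonical rule definitions live in Unicode CLDR. The `pluralize` helper and the sample Russian strings are illustrative only.

```python
from typing import Callable, List


def plural_ru(n: int) -> int:
    # gettext: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
    #   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def plural_ar(n: int) -> int:
    # gettext: nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 :
    #   n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5)
    if n in (0, 1, 2):
        return n
    if 3 <= n % 100 <= 10:
        return 3
    if n % 100 >= 11:
        return 4
    return 5


def pluralize(forms: List[str], rule: Callable[[int], int], n: int) -> str:
    """Pick the translated form that the language's rule selects for n."""
    return forms[rule(n)].format(n=n)


RU_FILES = ["{n} файл", "{n} файла", "{n} файлов"]  # "file" in its three Russian forms
for n in (1, 2, 5, 11, 21):
    print(pluralize(RU_FILES, plural_ru, n))
# 1 файл / 2 файла / 5 файлов / 11 файлов / 21 файл
```

Note that 11 and 21 land in different forms even though both end in 1, which is exactly why a language needs an explicit rule rather than a simple "one versus many" switch.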

But we're in a very different space these days, where you basically have to be quite a small minority language that hasn't been localized into anything to still have to tackle those. So the things that we face on Pootle are minimal. We don't have to do very much to get a team translating anymore. The databases cover their language and all that. Well, answering that question in the context of somebody wanting to see more free and open source software localized into their language, I think the answer is probably more of a community thing. It's about knowing where to go to translate things. I think one of the things we'd like to see is a slightly easier process for somebody who wants to set about translating into a language. At the moment, you would have to go to quite a lot of different places and engage with the localization groups within those projects in order to get that done. What would be quite nice is a place where people could engage with lots of open source projects at the same time. So that's the communication side, in some way, which is looking at it from our perspective and from the software perspective. But if you look at it from a language team's perspective, if they've never translated anything, one of their serious hurdles is that they will have no terms for a lot of computer terminology. The terms might exist and they need to collect them, or they might need to repurpose them, following very similar ways to how a language develops new terms anyway: it adapts words, reuses words, or creates new words. So usually, for a new team, one of their biggest hurdles is learning how to localize and then building the missing terminology and using it. Building a terminology base is one of their biggest hurdles. And in terms of the distribution of your software,

when you have a web based presence, it's fairly simple to tell people how to access it. They just go to a particular website and maybe go through the authentication routine. But you also have your desktop project, Virtaal, which introduces a whole plethora of distribution issues in terms of being able to get it onto somebody's computer and make sure that it's running effectively. So I'm wondering, what are some of the challenges that you've had to overcome in solving the packaging and deployment aspects of getting Virtaal out to people who are trying to use it? I'd just quickly pick up that in terms of Pootle, we do release it for different platforms, so we still have some of those challenges in terms of supporting a different ecosystem for that software to be deployed in. But I think you're right about the challenges in terms of Virtaal, and Dwayne could probably talk about that. Well, the challenges are pretty big, which is why we haven't released it for a long time. Virtaal is one of my favorite pieces of software, but we just haven't had the time or resources to really develop it. It's been fun watching other people take some of its ideas and implement them in theirs.

So some of the biggest problems we faced relate to Python and to some of our choices in terms of GUI toolkits. We use GTK, and that has been one of the hardest parts in terms of distributing: how do you package it on Windows and Mac? There's documentation and there are processes, but it's never as clear as it is on Linux how you do it, and so packaging on those platforms has been really difficult. GTK went through the 2 to 3 migration, and we've never made it to 3 because that transition was difficult. So the biggest hurdle we found in distributing is that, well, it's not difficult to do, but it's quite a process. There was a lot of engineering and time to get the first Mac version out. Then there was the feel of it, because it took quite a lot of work to make it feel like it really belonged on a Mac. GTK, at the time we were doing it, always felt a little bit like it didn't belong. So packaging up GTK, compiling it, and making sure that you've got a deployable app is quite difficult from the Python perspective. They make pretty nice packages where you've got everything there and it all works, but that was quite a process.

So we have a luxury on Linux in that other people end up packaging it for you. But having to package it ourselves was difficult, and that was the hardest part: being able to build packages. Getting it up on the website and getting people to download it was less of an issue. So we've really struggled to recreate those engineering skills to rebuild and automate that. And in that time, web technologies really moved on. So I think the question we ask ourselves is, do we want to put that time into a desktop piece of software when people are much more reluctant now to install desktop software and much more inclined towards using the web? The question really is, can we use the web to address the issues that Virtaal was addressing in terms of offline usage? It's not something we've successfully conquered so far, but it's certainly something we've really discussed. Yeah. And I think some of what we want to do with Pootle, like using WebSockets, would put us in a very different paradigm for the editor, the Pootle translation editor, and we could ask, well, could we take this offline? And if we can take it offline, are we addressing the core need of why we wrote Virtaal, which was people with no connectivity? We were dealing with people who would have to drive to an Internet cafe, download files, and go translate them at home, and we might be able to tackle that by just doing Pootle a bit differently. So it's not the most positive story for releasing

desktop stuff written in Python. But we've got to make the best use of our limited resources in terms of our development time, and I think there's so much more reach with web technologies than there is with desktop technology. So that's been more of our focus, if we're honest. We do love to tell people that we accept pull requests, and we love community contributions. In terms of Virtaal, I know we've got people who are still using it, and they love using it. If we can entice people out of the community who are willing to spend a little bit of time figuring out how to package on Windows now, then we're probably in a position to do that again. We know how to package on Mac; it's just the time involved in putting it all together and really making it happen. So it's kind of an appeal to the wider community: if you're passionate about GTK, or Python on the desktop, or Python on Windows and Mac, we'd love to hear from you. So what are some of the things that you have planned for the future of your projects?

So the last year or so has really been about tightening things up, mostly around Pootle, and looking at how we can use continuous integration processes in Pootle. We've got our test coverage right up, and we've really addressed a lot of the performance and security issues. So that's been really successful, but it's meant we haven't released. And so we're just about to release an RC version of Pootle, which is quite exciting for us, really.

We made the decision some time ago that we didn't want to call something a stable release until we'd addressed some of the fundamental problems that we were aware of. That's quite a positive story, really, because we've got that out of the way and we're now moving to that release. So moving forward, we'd really like to see a more regular release cycle.

In terms of actual features, I think we'd like to make better use of newer web technologies. We've been working for quite some time on VCS integration, and that's fairly stable now. We're going to package it as part of the release, although there's still quite a lot of work to be done around it. Improving our format handling: we're not 100% sure which direction we'll go with that, but we've been looking at how we can encompass some of the features that are specific to particular formats in a generic way, and at the same time keep the data that's inside Pootle as portable as possible. Improving our quality check system: making it more extensible and making it easier for people to write more specific checks. I think those are our top-line goals at the moment. Yeah. So for people who want to follow what you guys are up to and keep in touch, what would be the best way for them to do that? Well, the two places we hang out: if you want to engage with the project, it's our GitHub project at translate/pootle. And we're on a Gitter channel, gitter.im/translate/pootle, where people can engage with us. That's the best place for real-time chat. There's pretty much always one of us around, and if we don't answer straight away, we'll generally answer fairly quickly.

So there's a pretty active community there. And for anybody who wants to follow either or both of you individually, what would be the best way for them to do that? How about you, Dwayne? I don't know. I really don't.

Okay. Oh, GitHub, I would say for myself: github.com/phlax.

And if you want to follow what we're up to, or if you actually want to speak with us, then the best channel is the Gitter channel. Okay. So with that, I will move us to the picks. My pick today is going to be the

Google Chromecast.

I picked one of those up recently because it was on sale for about $10 off the regular price, and my experience using it so far has been good. There's a little bit of disparity in the support for different apps on my Android tablet, but it's a useful product for easily getting something from one screen to a bigger screen so that more people can share and view it. I'll be interested to see what other sorts of support and uses it gets put to as they continue to develop that product.

So with that, I will pass it to you, Dwayne. What do you have for picks today?

So I struggle, because I think I wanted to use this as an opportunity to call out some of the people that we've relied on and loved in terms of the development that we've done. You're free to do that as well, in addition to any picks that you might have. Okay. Cool. This is just an excuse so I can actually think about what I'm going to pick.

Yep. No. Me too. I'm not sure.

One of them is Jitsi, which we use every day for real-time voice and video. It's an open source project developing an equivalent of some of the closed tools that we're often engaged with, and we rely on them and love them. Mostly we love the fact that every morning we get to see their four-word random room name. If you ever go to meet.jit.si, they create a four-word mnemonic for your room, which creates great solidarity every morning. Yeah. I'd probably go with Gitter, if we're going to pick on the basis of what gives us a huge amount of utility.

As somebody who had used IRC forever: there's a lot of resistance to moving off IRC, certainly in open source projects, because people are so used to it. But I think in reality, HTTP-based messaging offers so many advantages over IRC that it's hard to look back, really. It's become a really cool part of our development process, being able to communicate like that. And I think the killer feature is the fact that Gitter allows unlimited rooms, both public and private, for open source projects. So it allows you to build your community really effectively. Alright. Well, I really appreciate the both of you taking time out of your day to join me and tell me about the work you're doing with Translate House and the projects that you've produced. Being able to bring localization and internationalization capabilities to larger communities definitely seems like an important thing for everybody to be aware of, and I'll definitely be trying to push more for including those capabilities in the projects that I'm a part of. So I appreciate it, and I hope you enjoy the rest of your day. Thank you so much for having us. Yep. Cheers.

The Python Podcast.init

Summary

Brief Introduction

Interview with Dwayne Bailey and Ryan Northey

Keep In Touch

Picks

Links
