Summary
Programming languages are a powerful tool and can be used to create all manner of applications; however, sometimes their syntax is more cumbersome than necessary. For some industries or subject areas there is already an agreed upon set of concepts that can be used to express your logic. For those cases you can create a Domain Specific Language, or DSL, to make it easier to write programs that can express the necessary logic with a custom syntax. In this episode Igor Dejanović shares his work on textX and how you can use it to build your own DSLs with Python. He explains his motivations for creating it, how it compares to other tools in the Python ecosystem for building parsers, and how you can use it to build your own custom languages.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Igor Dejanović about textX, a meta-language for building domain specific languages in Python
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what a domain specific language is and some examples of when you might need one?
- What is textX and what was your motivation for creating it?
- There are a number of other libraries in the Python ecosystem for building parsers, and for creating DSLs. What are the features of textX that might lead someone to choose it over the other options?
- What are some of the challenges that face language designers when constructing the syntax of their DSL?
- Beyond being able to parse and process an arbitrary syntax, there are other concerns for consumers of the definition in terms of tooling. How does textX provide support to those end users?
- How is textX implemented?
- How has the design or goals of textX changed since you first began working on it?
- What is the workflow for someone using textX to build their own DSL?
- Once they have defined the grammar, how do they distribute the generated interpreter for others to use?
- What are some of the common challenges that users of textX face when trying to define their DSL?
- What are some of the cases where a PEG parser is unable to unambiguously process a defined grammar?
- What are some of the most interesting/innovative/unexpected ways that you have seen textX used?
- What have you found to be the most interesting, unexpected, or challenging lessons that you have learned while building and maintaining textX and its associated projects?
- While preparing for this interview I noticed that you have another parser library in the form of Parglare. How has your experience working with textX informed your designs of that project?
- What lessons have you taken back from Parglare into textX?
- When is textX the wrong choice, and someone might be better served by another DSL library, different style of parser, or just hand-crafting a simple parser with a regex?
- What do you have planned for the future of textX?
Keep In Touch
- Website
- igordejanovic on GitHub
- @dejanovicigor on Twitter
Picks
- Tobias
- Igor
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- textX
- U of Novi Sad
- Serbia
- DSL course
- Secondary Notation
- Django
- Xtext
- Eclipse
- PLY
- SLY
- PyParsing
- Lark
- PEG Grammar
- Language Workbench
- Language Server Protocol
- Visual Studio Code
- textX-LS
- Arpeggio Parser
- Context-Free Grammar
- pyTabs
- Guitar Tablatures
- Parglare
- GLR parsing
- TEP 1
- Evennia
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today. That's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today. Your host as usual is Tobias Macey. And today, I'm interviewing Igor Dejanović about textX, a metalanguage for building domain specific languages in Python. So Igor, can you start by introducing yourself?
[00:01:29] Unknown:
Hi, Tobias. Thanks for having me. Sure. I'm Igor Dejanović. I work as a professor at the University of Novi Sad, teaching several courses in software engineering. The most relevant for this podcast is probably the course on DSLs, and that's one of the reasons why textX exists, actually.
[00:01:51] Unknown:
Alright. And do you remember how you first got introduced to Python? Well, I remember it was relatively late,
[00:01:57] Unknown:
since Python has been around since the nineties. I used different languages back then, and I even missed the opportunity to use Python in the days when I did a lot of sysadmin stuff, where Python is used a lot. I think I first tried to pick up Python in 2008, but I remember that I was put off by semantic whitespace at that time, because I didn't actually have experience with any language that did something like that. So I remember I backed off a little bit, and then tried Python again, I think in 2009. And I decided that time to give it a few days.
Let's try it for a couple of days and see how things will go. And I remember that, day after day, I actually started to enjoy even the whitespace stuff. The nesting with whitespace started to feel very, very logical to me. And later on, I learned there is actually a name for that in the DSL literature. It's called secondary notation: the part of the language syntax that does not have specific semantics, but that you can use freely. That's whitespace, for example, in a textual language, or in a graphical language it's colors and shapes and positions. And actually, if you have a lot of secondary notation, you get all sorts of readability issues, because people tend to develop their own styles. Python did quite well here, because reducing secondary notation actually improves readability.
So, yeah, I actually love it a lot now.
[00:03:34] Unknown:
And then in terms of the context of domain specific languages, can you give a bit of a description about what they are and some of the cases where you might need to build one versus just using a general purpose programming language?
[00:03:48] Unknown:
A DSL is a language tailored and constrained to a particular domain. It is at the right level of abstraction, which enables its users, the domain experts, a higher level of expressiveness by removing unnecessary information that is part of common understanding. What do I mean by that? For example, imagine two lawyers. If they're talking about some legal issue, because they are operating in the same domain, they can remove all unnecessary information that is common understanding. So their expressiveness is higher; they can use shorter forms to convey information between each other. But if they are about to explain something to a person that is outside of the legal domain, they will have to be much more verbose, because they do not share that common understanding.
Besides using the concepts of a domain, DSLs also use a concrete notation that is used in the given domain, thus ideally making the domain expert capable of specifying the solutions on their own. Of course, in practice that is not always achieved, but even if the domain expert is not using the DSL directly, it is much easier to communicate with the developer when they're looking at a notation familiar to them. DSLs come in different forms and shapes. We have, for example, internal and external DSLs. Internal DSLs are those built inside the host language.
For example, if you use some clever features of a language, you can make something that looks like a different language but is actually interpreted or compiled by the same compiler. Some languages are more capable in that direction. For example, Lisp is well known for being very capable when it comes to DSLs, and Lispers usually create DSLs all the time. Then we have some of the more contemporary languages; Ruby, for example, is also very popular for building internal DSLs. Even in Python, which is not very capable of building internal DSLs, we see internal DSLs all the time. For example, take Django as a web application framework. In Django you have the definition of a data model: you create a class that extends the model class, and then you specify some class attributes as instances of fields, and then Django is capable, from that description, of dynamically generating all kinds of stuff, like, for example, an object relational mapper, an SQL schema, or an admin interface for CRUD operations.
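The declarative-class pattern Igor describes can be sketched in plain Python with a metaclass. This is not Django's actual implementation, just a minimal illustration of the internal-DSL idea: class attributes describe the data model, and the framework derives other artifacts (here, a toy SQL schema) from that description. All names here (Field, ModelMeta, sql_schema) are hypothetical.

```python
class Field:
    """Placeholder standing in for something like Django's models.CharField."""
    def __init__(self, kind):
        self.kind = kind

class ModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        # Collect class attributes that are Field instances, Django-style.
        fields = {k: v for k, v in namespace.items() if isinstance(v, Field)}
        cls = super().__new__(mcs, name, bases, namespace)
        cls._fields = fields
        return cls

class Model(metaclass=ModelMeta):
    @classmethod
    def sql_schema(cls):
        # From the declarative description, derive e.g. an SQL schema.
        cols = ", ".join(f"{n} {f.kind}" for n, f in cls._fields.items())
        return f"CREATE TABLE {cls.__name__.lower()} ({cols});"

class Article(Model):
    title = Field("VARCHAR(100)")
    body = Field("TEXT")

print(Article.sql_schema())
# CREATE TABLE article (title VARCHAR(100), body TEXT);
```

The class definition reads like a small declarative language, yet it is ordinary Python compiled by the ordinary interpreter, which is exactly what makes it an internal DSL.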
On the other hand, external DSLs are full blown languages on their own. They have their own syntax. We have to build a compiler or interpreter for them, so they are much harder to build and maintain. And then, of course, we have other categorizations, for example by concrete syntax: we have textual or graphical or some other notation like tabular. Those languages are all DSLs, just with a different interface to the user. And why should we use DSLs? Well, first of all, when you are constrained in what you can say and you use only the domain concepts, you can express the solution in a much more condensed form, so you're more expressive, because the commonly understood stuff is hidden from you. It's built into the tooling platform, into the compiler, and that enables us to be more productive.
There are some case studies. For example, there is a case study by MetaCase done at Nokia, I think maybe 10 years ago, on the development of mobile applications. They measured a productivity boost by a factor of 10. But in general, what I can observe in practice is that you should achieve at least a factor of 5 if you implement the DSL in the right way. Another appealing reason for using DSLs, besides the productivity boost, is that your solution, your knowledge from some domain, is stored and specified in a DSL which is independent from the underlying technology.
So it will evolve not with the technology itself but with the domain. What that means is that your knowledge is preserved and can be easily transferred to another target platform. And, what is also important, because it is at the right level of abstraction familiar to the domain expert, the specification of a solution also serves as up to date documentation of the system.
[00:08:41] Unknown:
And so textX is a toolchain for defining your own DSLs that you can incorporate into Python and other language projects. I'm wondering if you can give a bit more description of the textX project, some of the motivation behind creating it, and some of its origin story. Well,
[00:09:00] Unknown:
textX is actually a DSL and a tool for building DSLs. So it's a metalanguage. As for the motivation, early in my career I got introduced to model driven engineering, and DSLs are, let's say, a different flavor of it; they overlap a lot. So I quickly got into DSL stuff through a project called Xtext. It is a Java based project, and I think it was somewhere around 2005 or 2006 when I played with and used Xtext. And I always wanted something similar to Xtext, but in Python. And I wanted something for the DSL course, so it should be lightweight, something easy to use.
And Xtext was a little bit heavier because it is Java, it is Eclipse based, and it is a lot harder to learn. So I wanted something easier for a student to get started with. That's the motivation. And I think I started developing it somewhere around maybe 2005, I think. 2015, sorry. So that's the time I decided to sit down and actually implement it, because I realized there was nothing similar to it in the Python world at that time. And as you mentioned,
[00:10:23] Unknown:
there are a number of other libraries that do exist in Python for being able to build DSLs or write parsers. But what are some of the capabilities of textX that make it stand out, that might cause somebody to choose it over some of the other available options, and maybe some of the characteristics of the overall space of DSLs in Python that make something like textX useful and necessary?
[00:10:46] Unknown:
Yeah. There are great options in the Python world. We're actually lucky to have many parsing options. There is, for example, PLY and SLY and pyparsing and Parsimonious and Lark. Even ANTLR, for example, which is a Java tool but can produce a parser for Python. But all those tools are, I would say, more like classical parsing tools, where you get much more involved when you're building a DSL: besides the grammar, you have to describe the actions, which are used to transform the parse tree to something else, or they can do that on the fly without building a parse tree. So you have to do a lot more to maintain your language.
But textX is built on an idea that stems right from Xtext, where through the grammar you actually describe both the parser, or the syntax of the language, and what we call the metamodel of the language, or the structure of the language. So in a way it is constrained. That's why I like to say it's a DSL for building DSLs. You're constrained, but through that constraint you actually are much more productive, and you can say more with less, because all you need is, more or less, the grammar itself, and you can start using your language.
textX will, out of the box, create all the necessary elements of your language dynamically. It will create classes that correlate with your grammar rules. It will create a parser that will parse the textual representation of your model and instantiate objects of your dynamically created classes. And that all happens at runtime, just by reading the grammar. Your language is much easier to maintain.
[00:12:34] Unknown:
And for people who are building these DSLs, how does the actual definition of the grammar end up hooking into the behavior, given that, as you've mentioned, with the direct parsing tools you have to be much more manual and explicit about it? Well,
[00:12:49] Unknown:
when you're using classical parsing, you make a grammar, and then you will either get the parse tree or you'll get some nested list, for example. That's one way to transform the parsed content into some data structure. But then you will need to transform that into some other form to be usable in further processing. And you have all sorts of things: if it is some real language, you have, for example, reference resolving, or you have to define, for example, parent child relationships between elements. That's all built into textX.
So, by just giving textX a grammar, you are not getting the parse tree, you are actually getting a nice graph of plain Python objects, and they are connected by reference resolving. So it's not a tree, it's a graph, and you can use it straight away. It's a graph of Python objects. And you can plug into the creation: you can define something called, in textX, object processors. An object processor is a callable in Python that is called to either check or transform the object that is being created.
So in that way, you can implement additional semantic checks, or you can, on the fly, change the object being created. That is similar to, for example, the semantic actions you write in classical parsing. But this is an optional thing that you can add to introduce additional semantics or additional transformation of the objects.
[00:14:28] Unknown:
And so for those semantics, is that something like being able to say that this particular token is a keyword, whereas this other type of token is
[00:14:46] Unknown:
If you're asking about tokenization of the input: textX is based on PEG grammars and a recursive descent parsing technique called packrat parsing. So it will distinguish between, for example, the name of a function and some keyword, because it has unlimited lookahead and can resolve that ambiguity.
[00:15:09] Unknown:
For people who are defining the DSLs, what are some of the challenges that they face in just constructing the syntax of their target language? And what are some of the types of inspiration that they might look to for determining how it's going to look, and the user experience of the people who are going to be using that language and the DSL parser and logic that are going to be generated and built with textX?
[00:15:36] Unknown:
Well, there are all sorts of challenges during language design. First of all, when building domain specific languages, we must ensure that the domain is covered correctly. What that means is that you have built the correct concepts and relationships into your language, that there are no concepts left out, and that you don't have any additional concepts that are not relevant for the domain. What usually happens in practice, and is a danger for the DSL developer, is that you start with a DSL for some domain, but then it's very tempting to keep adding stuff to that language.
And many times, a DSL ends up being a GPL, a general purpose language. That's one consideration. The other is defining the proper syntax for the language. When we are talking about DSLs in general, we are not talking only about textual languages. The concrete syntax can be anything: it can be a graphical representation, it can be tabular. So the concrete syntax is actually the interface to the user. It is what the user will see and feel and interact with during the usage of the language. So it must be very nice, it must be easy to use, it must be very intuitive. To be intuitive, it must correlate to the existing language in the domain; we're just formalizing what users are probably already using. Those are considerations regarding the syntax. And of course, if we are building textual syntaxes, we have to consider all the technical stuff, like parsing issues, left recursion, and ambiguities. And at the end comes the semantics. We usually describe semantics by interpreting or compiling our language to something else. So choosing the right execution style, whether we should interpret our language or compile it, makes a difference.
And of course, it's probably less important, but we have to take care about the runtime performance of our language. All these decisions will influence how efficient our language will be in practice.
[00:17:50] Unknown:
And on the point of runtime performance, what are some of the capabilities of Python that will lead someone to use it as the host language for a DSL? And what are some of the cases where somebody might want to use a DSL that's using a different host language that's more optimized for particular latencies or particular target environments?
[00:18:13] Unknown:
Well, it depends how critical the runtime performance is for your use case. I usually look at two kinds of performance: one is runtime, and one is development time, or maintenance time. So if it's more important to you to quickly develop your language and to easily maintain it, then Python is a really good choice. Its dynamic nature gives you a quick turnaround and you can experiment easily, so no wonder it's used for prototyping and for things that are, you know, write once and throw away. If your system is critical and really needs better runtime performance, Python is probably not a good option. There are other languages that could serve as a better host. But still, you can use textX or similar tools built on Python to produce or generate code for some other runtime platforms.
So you can see there are different aspects: what you are using for developing the language, and what you are using at runtime. The runtime can be different. You can easily generate, for example, Rust code from your models using textX. There is no constraint in that regard.
[00:19:29] Unknown:
And as far as the overall end user experience of working with the DSLs that are being built using textX, what are some of the associated needs as far as tooling or the overall ecosystem of building and working in that environment? And what are some of the additional associated projects of textX, or capabilities built into it, that help in that overall process? Yes. Well, tooling is very important when talking about DSLs. And,
[00:19:58] Unknown:
given all the benefits you get from DSLs, the main reason why people didn't use DSLs so much in the past is probably the tooling support, because it's not easy to build a DSL from scratch and to maintain it. There is actually a class of software tools made specifically for building DSLs, and they are called language workbenches. textX is not a language workbench; it's a simpler tool. A language workbench is an integrated environment for building and evolving languages, so it's much more complex. But for the DSL part, there is a textx command that you get when you install the library. So when you install the library in a Python virtual environment, you get the textx command, which can be used to check your model, or to visualize your model or metamodel, for example. So you can, for example, generate a nice diagram of your grammar.
It is a class-diagram-like view that describes the structure of the language. Or you can use the textx command, for example, to start a project, to make an initial outline of the project. And there are other useful tools. For example, there is support for the Language Server Protocol and Visual Studio Code integration. It is a project called textX-LS that Daniel Elero is working on. He was working on a master's thesis regarding the Language Server Protocol for textX based languages, and after his master's thesis finished, he continued to work on it. And now we have a second version of that project that is developing very nicely.
So anyone who wants to try textX should check out that project. It is under the same GitHub organization as
[00:22:03] Unknown:
textX itself. Yeah. Being able to have that syntax highlighting and the language server support in the development environments will certainly reduce the burden for people who want to be able to take advantage of the DSL, without just looking at a blank wall of text and not really having any indicators of what the different tokens are and what their meaning might be in relation to each other. Definitely.
[00:22:26] Unknown:
That's the first thing that should be done when you're producing tooling for your language: syntax highlighting and code completion and code navigation. You should help your users to easily navigate around and to get help from the environment. And textX-LS is a project exactly for that. For any language you develop using textX, it can automatically generate the Visual Studio Code integration for that language. So out of the box, you get syntax highlighting for your language, which you can further configure if you're not satisfied with the initial results. And it is planned to support all styles of IDE support, like navigation and completion. So it's still not fully finished, but it's very usable at the moment and can be tried out. Digging deeper into textX
[00:23:18] Unknown:
itself, can you talk through how it's implemented and some of the ways that the structure of the project and its overall goals have evolved since you first began working on it? Well, it's built on top of a parser called Arpeggio. It's a parser,
[00:23:33] Unknown:
it's a PEG parser I started developing in 2009. It's probably the first real project that I did in Python. So when I decided to write textX, I decided, okay, I will use PEG parsing, because I knew Arpeggio very well, and I realized that I would probably need to tweak it along the way to support all the features I want to have in textX. And that was a good decision, I think, because along the way I did have to tune a few things in Arpeggio itself to help develop some textX features more easily. So basically, Arpeggio is doing the parsing; textX is just a layer above Arpeggio. How it works is that, if you open the textX source, you will see that there is a textX grammar language defined in Arpeggio syntax. And when the grammar is parsed, there is a visitor that will build the metamodel and another parser out of the grammar. That other parser is an Arpeggio parser for your new language.
And the metamodel is the object holding all the information about your language. All the concepts, all the relationships, everything is contained in that object. And that metamodel object is actually used as the API entry point for further parsing. You create the metamodel and you call metamodel.model_from_file or model_from_str, depending on whether you want to parse a file or a string. And the Arpeggio parser built dynamically for your language is accompanied by a visitor that will transform the parse tree to the object graph corresponding to your grammar. And that design didn't change much from the beginning; the core design remained the same. But the grammar language itself evolved over time. It started as the Xtext language; my idea at the time was to just make an Xtext implementation in Python. But actually, it grew over time and added some additional shortcuts in the grammar language itself, and some ways to more easily specify some stuff. For example, there is something called a repetition modifier in textX.
When you want to match zero or more things, or one or more things, you can attach a syntactic addition to the plus sign or to the asterisk sign and say, okay, match zero or more elements or objects, but they should be separated by something, and you just add the separator. In classical parsing, what would you do? You would have to do that manually; you would have to say match this, and then a comma and this, zero or more times. But in textX, it's much shorter to write. There are also, for example, rule modifiers that don't exist in Xtext but do exist in textX. And there is a relatively recent addition of unordered choice: when you have a sequence of things that you want to match, you can say, okay, match this sequence in any order.
textX, or Arpeggio beneath it, will match all those elements in whatever order they appear, which is very handy for some languages that define, for example, keywords that can be written in any order. So, more or less, that's about the design itself. So the core remained
[00:27:10] Unknown:
pretty much the same through all this time. And for people who are using textX for building their own languages, you mentioned a little bit about the need for having the grammar definition and then being able to parse the written language of the end user and to generate the concrete model from that. But what is the overall end to end workflow for somebody who is defining a new language with textX and then distributing it to end users for them to actually make use of it and develop within it?
[00:27:47] Unknown:
Well, the workflow can be different depending on how complex your DSL is. You can start very simple. You can define your language embedded in a Python module: you just write a string with a little grammar, then call one function that will transform that string into the metamodel, and then you can use the metamodel. So it's just a few lines of code if your language is very simple. But if you are developing something more complex, then you can build a whole language project. And there is now support for that in an additional project called textX-dev, which can be installed together with textX, either by using pip install textX with dev as an optional dependency, or by directly installing textX-dev. When you install that project, it adds an additional startproject command to textx. It's similar to, for example, how Django would create a new project. So you type textx startproject, answer several questions, and the initial project is generated.
In that project, you have a grammar file where you should go to define your grammar. The project also has registration already built in, so the project will be registered with textX. It is done through the setuptools extension point mechanism. So languages can be extendable; they are, in a way, like plugins for textX. You can use textx to list languages. And the generators for the languages are also registered with textX in setup.py, so you can list generators as well. So in that case the workflow is: start a project with textx startproject, and then play with creating the grammar. Usually, I tend to first open a blank file and try to write some model in it, to see how I would like to express some solution.
Then I write that solution, that model, and in parallel I develop a grammar for it. I usually have a small unit test that I run constantly to see if everything works, or I just use, for example, the textx command line to check whether the grammar is okay, and then I iterate: I extend the model, add some new things to it, then extend the grammar and see if everything parses. When I'm done with the syntax part — when I'm satisfied with how the model and the grammar look — then I design the semantics. I build a compiler for it, either using some template engine, or I make a little interpreter for the language. At the end of the process, you can pack that up, make a package of it, and release it on PyPI, for example.
So the user can just install that language. And if the user would like to have IDE support, you can use textX-LS to build a Visual Studio Code plugin with syntax highlighting and so on, and then you can distribute your language through that plugin. So those are the options, and it's very flexible: you can use it very simply, or as a full-blown language project.
[00:31:14] Unknown:
For the languages that you're defining, that brings up an interesting thought as far as how you would provide things like unit testing capabilities for the people who are writing in the language, to ensure that what they're building is going to parse properly or function as intended. I'm curious what your experience has been as far as how frequently people will actually go that extra mile to build additional ecosystem tooling for their languages, the overall need for it, and the points at which it hits the tipping point of complexity where that's even necessary?
[00:31:49] Unknown:
Well, it all depends on who the end users are. If, for example, the end users are people who are not that technically savvy, it's probably a good investment to make good tooling support. And for testing, I think it's generally always good to write tests when you're developing your language. I usually cover all the open source projects I work on with pytest tests with good coverage. I think that's very important, besides the documentation. I generally feel more confident when doing larger refactorings or changing the language.
I want to be sure that the assumptions I had before are not broken. So I think it's worthwhile to put some additional work into making proper testing.
[00:32:44] Unknown:
As far as the specifics of the parsing implementation, I know you mentioned that you're using a PEG parser with some customization. What are some of the cases where somebody might run up against the limitations of a defined grammar and its concrete implementations, and would be better served with a different parsing approach?
[00:33:08] Unknown:
Well, PEG parsers are really nice for their simplicity. It's kind of what you would probably end up with if you tried to build a parser manually — you would probably arrive at recursive descent. It's easy to understand, so PEG parsers are really easy to debug. But they have this difference compared to context-free grammars: their choice is ordered. The alternatives are ordered. By that I mean, when you have several alternatives to match at some point, you're telling the parser: try this; if it does not succeed, try the other one; and do this until you find something that succeeds.
So in a way PEGs are more imperative, in comparison to context-free grammars, which are more declarative. You would just say: this non-terminal is this, or this, or this — I don't care in what order; it's just what I declare. And the problem with PEG is that it will always be unambiguous. That might sound good, but in practice it is not always, because it hides the ambiguity in the language. It will just go from left to right and pick the first match from the ordered choice, and that is the way it resolves ambiguity. But it's not always what you want.
And you will not get any warning — the grammar is very hard to analyze for those things. For example, the typical problem you have with PEGs: imagine you first try to match a, and if that doesn't succeed, you match a and then b. You can see that this second alternative will never succeed, or rather never be reached. If you find a in the input, it will be matched by the first choice, so the a with b afterwards will never be reached. That can introduce various problems in practice. And the most difficult problems come when you reorder the ordered choice: you are actually changing the language, but it's hard to see how. So in big grammars that can be problematic.
You don't get any analysis from the tool. With the other parsing approaches, which are based on CFGs and do some preprocessing and grammar analysis, you do get some help — for example, shift/reduce conflicts that tell you that at some point you have either an ambiguity or you need more lookahead to resolve something. So PEGs are easy to debug and easy to understand, but they do have their own problems.
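The "a versus a-then-b" pitfall described above can be demonstrated with a toy matcher. This is plain Python, not the textX/Arpeggio API; all function names are made up:

```python
# Minimal sketch of PEG ordered choice and its unreachable-alternative trap.
def match_a(text, pos):
    """Match the single token 'a'; return new position or None."""
    return pos + 1 if text[pos:pos + 1] == "a" else None

def match_ab(text, pos):
    """Match 'a' followed by 'b'; return new position or None."""
    return pos + 2 if text[pos:pos + 2] == "ab" else None

def ordered_choice(text, pos, alternatives):
    # PEG semantics: try alternatives in order; the first success wins.
    for alt in alternatives:
        end = alt(text, pos)
        if end is not None:
            return end
    return None

# "a / ab": the second alternative can never be reached, because "a"
# always matches first -- the hidden-ambiguity problem described above.
print(ordered_choice("ab", 0, [match_a, match_ab]))   # -> 1 (only "a")
# Reordering the choice silently changes the language:
print(ordered_choice("ab", 0, [match_ab, match_a]))   # -> 2 (whole "ab")
```

No tool warns about either outcome here, which is exactly the analysis gap the answer describes.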
[00:36:08] Unknown:
For people who are using textX and building their own languages, what are some of the common challenges that they run into, either in terms of overcoming those limitations of the PEG grammar, or just the overall process of building the DSL and making it available to their end users for doing the work that the DSL is intended for?
[00:36:34] Unknown:
So, if I understood the question — from my experience, and the problem with open source projects is that you don't always get full feedback from the users, but I do have a lot of feedback from my students — it usually goes relatively smoothly. When there is good documentation and good examples, they will just read through that and generally understand it very well, so they don't have many problems with it. Sometimes they initially have a sort of fear of parsing in general, probably because they were previously exposed to old classic tools like Flex and Yacc and similar, so they consider parsing very hard and hard to understand. But I think that fear is very quickly overcome when they start to work with tools that are easier to understand and use.
[00:37:38] Unknown:
And as far as projects that you have seen built with textX, or that you've built yourself, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:37:48] Unknown:
Well, again, most of the projects I see developed are from my students. There are several projects listed on the textX front page under "who is using", but users usually don't reach out that much. So I encourage users of textX who are listening to this podcast to drop me a line about what they are using textX for — I always like to hear about that. But of the other projects, probably the most interesting was one done by several students: a language for describing guitar tablatures. They call the project PyTabs; it's on GitHub. Guitar tablature is a way to write pieces of music for guitar, but for folks who, for example, don't have formal education and don't know how to read notes. It's a very easy format to understand — it actually depicts the neck of the guitar in ASCII art, where you see six strings running horizontally.
And on each string there is a number that says which fret you have to press when you play that note. They managed to parse that with textX, and the grammar is actually very elegant. If you think about it, it's like a two-dimensional language: you're not only parsing horizontally but vertically as well, because you have to correlate the different strings at the same position. And the interpreter for that language plays the music — they designed a language whose semantics is playing the music described by the user. For me, that was a very innovative and interesting way of using textX.
[00:39:37] Unknown:
Yeah. That's really cool. And as far as your experience of building TextX and maintaining it and continuing to use it as a teaching tool, what have you found to be some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:39:53] Unknown:
Well, maintenance of open source projects in general, I've learned, is time consuming and not very easy to do, especially when the project starts to get some traction. When you're maintaining a project, you have a lot of work just organizing things: making sure every issue is commented on, every pull request is reviewed, and the release process is done properly with the right versioning and so on. Since many open source projects are done on a voluntary basis in free time, it's somewhat hard to do. So, I think it was a year and a half ago, I got a really great contribution from Pierre Bayerl.
It was an implementation of custom scoping support. textX always had this reference-resolving feature — let me quickly describe what it is. When you are parsing something, at some place you say: here I want to match the name of some object I defined somewhere else, and textX at that place will resolve it to a proper Python reference, so you don't have to do that yourself. That's why you end up with a graph of Python objects, not a tree. Scoping was done using a global scope: textX in the older versions, or by default, will search for that type of object globally. And that's not always what you want. So Pierre added support for custom scopes: you can define a custom scope provider where you specify, in Python, the actual algorithm for how that object is to be found. Another piece of that pull request was support for multi-model and multi-metamodel, so you can, for example, have several different grammars and build a model that references things from another model in another language.
You can even reference things that are outside of textX. For example, you can reference a specific node in a JSON file, or a specific node in an XML file. That support is really cool. He did that a year and a half ago, sent a pull request, and we had a really great collaboration on that pull request. When we merged it to master, I asked Pierre to join the project to help maintain it, and I'm really happy he accepted. So he is now co-maintaining the project with me. It's much easier when you have a co-maintainer, because we can discuss design decisions, and sometimes I don't have time to look at some pull request or some issue, and sometimes Pierre doesn't. So it's much easier when there are more people.
[00:42:46] Unknown:
One of the other interesting things that I found out while I was doing the research for this conversation is that in addition to textX and the Arpeggio parser that it's using, you've also built another parser using a different type of grammar support, called parglare. I'm wondering what your motivation was for creating that, some of the ways that your experience with Arpeggio and textX fed into the work you did there, and some of the ways that the work you're doing on parglare has informed decisions about how you approach things with textX and Arpeggio?
[00:43:24] Unknown:
Well, it actually started from problems in PEG parsing that I realized at that time. For example, one problem I already talked about: the parsing is always unambiguous, which is not what you always want — there are hidden ambiguities. The other thing, generally related to all top-down parsers, is that they don't accept left recursion, left-recursive rules. And sometimes a grammar is most naturally described using left-recursive rules. For example, if you're building something that is heavily expression oriented, it's much easier to encode it naturally.
For example, if you're building expressions for arithmetic operations, you can easily say: expression is expression plus expression, or expression minus expression, and so on. With top-down parsing you must avoid the left recursion, so you encode those rules differently, which is not very natural. So I wanted to experiment with another parsing approach. My idea was to offer an additional backend for textX instead of Arpeggio — as an option, you could plug in some other parser. So I built parglare to experiment with LR parsers, bottom-up parsers. And I was especially interested in general parsing, so parglare also implements GLR parsing. Later on I realized that trying to put two different parsing styles into the textX project would be very complicated.
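The left-recursion restriction mentioned here can be illustrated in plain Python (a toy sketch, not any of these libraries' APIs). A top-down procedure for a rule like "E: E '+' num" would call itself at the same input position and recurse forever, so PEG-style grammars rewrite it into the iterative form "E: num ('+' num)*":

```python
# Left-recursion-free form of an additive expression rule, as a
# hand-rolled top-down parser/evaluator.
import re

def parse_expr(text):
    """Parse "num ('+' num)*" and evaluate it left to right."""
    tokens = re.findall(r"\d+|\+", text)
    value = int(tokens[0])                 # first operand
    for i in range(1, len(tokens), 2):     # then repeated "+ num" pairs
        assert tokens[i] == "+"
        value += int(tokens[i + 1])
    return value

print(parse_expr("1 + 2 + 3"))  # -> 6
```

Bottom-up LR/GLR parsers, as in parglare, accept the left-recursive rule directly, which is why the grammar can stay in its natural form there.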
So I decided not to do that, but the parglare project itself developed quite nicely, and I really liked some of the results I got, especially with GLR parsing. And I like the way you can use context-free grammars to declaratively express your language. One lesson I learned from working on parglare is that sometimes it's really nice to have something easy to start with, like a PEG parser, especially for students who are learning parsing. But sometimes you need more power, like a bottom-up parser with a declarative specification of the language and full general parsing like GLR, which can accept any context-free grammar, even an ambiguous one, and in case of ambiguity can produce a parse forest — all the possible solutions for your input. That's especially important, for example, for natural language processing, where you by default have ambiguity in your language.
[00:46:09] Unknown:
And then for people who are making the decision of what to use, what are the cases where textX is the wrong choice and they might be better served by either using a different DSL library, or using a simple parsing library and doing the manual resolution of how that logic is supposed to work, or just using a simple regex for the smaller cases?
[00:46:35] Unknown:
Well, if you're trying to parse something more complex, it's generally not a very wise choice to just use regexes. So I generally recommend always using some parsing library, because even if you think you can easily handcraft your parser, there are all sorts of edge cases that are already handled in a parsing library, and there is good error reporting and things like that. But handcrafting a parser can give you additional control. So if you want real, total control over the parsing process, or for example if you want to learn parsing in depth, then you can go with a handcrafted parser. As for different libraries: textX is not a great choice if you really want to influence the outcome — what you are transforming your input into — or if you want the best possible runtime performance, or if you want to parse a stream of tokens as they arrive. textX is not a suitable parser for that.
Or, for example, if your input is naturally ambiguous — like natural language, or some language that is ambiguous — you cannot use a PEG parser in that case. So generally, if you need full control, or you want to produce something that does not correspond directly to your grammar — let me give you an example: if you are building, again, an expression-based language, maybe you want to evaluate the expression on the fly. If you want that, then textX is not a good choice, because you will always end up with an object graph and you will have to transform that graph into the result of the expression.
[00:48:24] Unknown:
And as you continue working with textX and using it for your own purposes and for your teaching, what are some of the new capabilities or features or just overall improvements that you have planned for it, or associated projects that you have in mind to build?
[00:48:34] Unknown:
Well, first of all, one thing we discussed recently was to drop Python 2 support from textX and Arpeggio. They're still compatible with Python 2, and because of that we cannot move on with some Python 3-only features. For example, one thing I would really like to see in textX is type hinting, so we can provide stricter checking of types in the library itself. And there is also one bigger feature we have been planning for maybe a year now: a small DSL for custom scoping providers.
That is the part that Pierre was working on. Right now you describe scope providers with Python functions, and the idea was to create a very small and simple DSL for describing scoping rules that you can embed in the grammar itself. So at the place where a reference is used, you can write an expression that tells textX how to resolve the reference. Because we were discussing that across several issues, we made a document in the wiki — TEP-1, a textX enhancement proposal — where we collected all the ideas about that DSL. That's probably something we should work on at some point in the future when we find some time.
[00:50:09] Unknown:
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing or contribute to your work on textX and your other libraries, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the wemake-python-styleguide project. It's a set of plugins for providing fairly strict linting of your projects, and I've started using it on some of the new projects I'm doing lately at my job. There are a few things that I've had to turn off, but all in all it has been doing a good job of catching some of the silly errors that I add in as I'm working, without me having to go through the whole, you know, read-evaluate-print loop — I can just see in my editor that a mistake was made and fix it in that context. So I've been enjoying that, and I recommend it to anybody who's starting a new project or wants to add some new linting rules for their work. And with that, I'll pass it to you, Igor.
[00:51:14] Unknown:
Well, my pick would be something I rediscovered lately. It is called interactive fiction. It's a genre of crossover from literature to gaming. I remember I played those games back in the eighties, but at the time they were called text adventures. So recently I tried to see what happened with that genre — I thought it was completely gone. It was very popular in the eighties, from Infocom, I think the company was called, and nowadays it's not visible, at least on the surface of the internet. So I dug deeper and found out that the community around interactive fiction is actually very alive. There is a site, for example, called the Interactive Fiction Database, where you can find titles that are published even in recent years. There are authors actively working on new titles, and, what is interesting, there are authoring tools that are actively developed. One is called TADS, and the other is Inform 7, developed by Graham Nelson, a British mathematician.
And Inform 7 is very interesting. It's a kind of DSL, but based on natural language, so when you read the description of a game, it's like reading what you'd imagine if somebody described it to you in plain English. Here is a little quote from the Inform 7 website: it's "a tool for writers intrigued by computing and computer programmers intrigued by writing. Perhaps these are not so very different pursuits, in their rewards and pleasures."
[00:52:46] Unknown:
So it's very interesting, and I encourage anyone with an interest in reading novels and solving puzzles to try out interactive fiction. Yeah. There's another category of that as well, with multi-user interactive fiction, and I did an interview with the maintainer of a library called Evennia a while ago, so I'll add a link to that in the show notes as well. Oh, that sounds great. So with that, I would like to thank you for taking the time today to join me and discuss the work that you've been doing with textX and Arpeggio and parglare.
Definitely a very interesting problem domain, and something that can provide a lot of utility to people who are struggling with trying to build their own DSLs or make Python work the way that they want it to syntactically. So I appreciate all the time and effort you've put into that, and I hope you enjoy the rest of your day. Hey, it was my pleasure. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Host Welcome
Interview with Igor Dejanović
Understanding Domain Specific Languages (DSLs)
Introduction to TextX
Runtime Performance and Use Cases
Workflow for Defining a New Language with TextX
Parsing Implementation and Challenges
Interesting Projects and Use Cases
Exploring Parglare and Other Parsing Approaches
Future Plans and Improvements for TextX
Closing Remarks and Picks