Summary
Python and Java are two of the most popular programming languages in the world, and have both been around for over 20 years. In that time there have been numerous attempts to provide interoperability between them, with varying methods and levels of success. One such project is JPype, which allows you to use Java classes in your Python code. In this episode the current lead developer, Karl Nelson, explains why he chose it as his preferred tool for combining these ecosystems, how he and his team are using it, and when and how you might want to use it for your own projects. He also discusses the work he has done to enable use of JPype on Android, and what is in store for the future of the project. If you have ever wanted to use a library or module from Java, but the rest of your project is already in Python, then this episode is definitely worth a listen.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Karl Nelson about JPype, a language bridge that lets you use Java classes in your Python programs
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of what JPype is?
- What was your motivation for becoming such a regular contributor to the project?
- Why might someone want to be able to call into the Java ecosystem from a Python program?
- There have been a number of other projects aiming to combine the capabilities of Java and Python, such as Jython and PyJNIus. What are the relative tradeoffs between the different options?
- Many of those other projects have stalled or stopped altogether. What about JPype has allowed it to survive for so long?
- Can you explain how JPype is implemented?
- How has the design and implementation of the project evolved since it was first implemented?
- How do the relative language versions influence the compatibility of programs on either side of the bridge?
- What is involved in creating a project that uses JPype?
- How are dependencies, packaging, distribution, etc. handled across the Java and Python portions of the code?
- What are some of the ways that JPype can be used for Android applications?
- What are some of the sharp edges or pitfalls that users of JPype should be aware of?
- What are some of the most interesting, innovative, or unexpected ways that you have seen JPype used?
- What have you found to be the most interesting or challenging aspects of building JPype?
- When is JPype the wrong choice?
- What is in store for the future of the project?
Keep In Touch
Picks
- Tobias
- Karl
Links
- JPype
- Java
- Overview of Python to Java bridges
- Lawrence Livermore National Lab
- GTK--
- Gnome
- Perl
- C++
- Matlab
- Java Native Interface (JNI)
- SciPy
- NumPy
- Matplotlib
- Jython
- PyJNIus
- Py4J
- Jep
- Ruby
- Reflection
- Ivy
- Maven
- JDBC
- Kivy
- Android
- Python Slots
- PyPy
- Java ASM
- Arrow Columnar Memory Format
- Protocol Buffers
- GraalVM
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try out a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode today. That's L-I-N-O-D-E, and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.
For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your own home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today.
[00:01:12] Unknown:
Your host as usual is Tobias Macey. And today, I'm interviewing Karl Nelson about JPype, a language bridge that lets you use Java classes in your Python programs. So Karl, can you start by introducing yourself?
[00:01:27] Unknown:
My name is Karl Nelson. I have a doctorate in electrical engineering from the University of California. I'm currently working as a senior scientist at Lawrence Livermore National Lab. I was one of the contributors to the GNOME project and the author of GTK--. I am known by my handle, Thrameos, which happens to also be my League of Legends handle.
[00:01:52] Unknown:
Yeah. It's always funny how the nom de plume in Internet communities ends up sticking with you for a long time.
[00:01:58] Unknown:
Yes. It is.
[00:02:00] Unknown:
And so do you remember how you first got introduced to Python?
[00:02:03] Unknown:
Well, my introduction to Python is really part of a love-hate relationship. I find that the Python language doesn't really fit my programming style. So to be honest, I've been avoiding Python for many, many years. I was mainly a C++ programmer who used Perl for scripting and MATLAB for prototyping. But eventually, I found that the overhead in terms of memory management for C++ programming was so much that as I drifted away from access to the hardware, I decided that I should start working more on the Java side. So I converted all of the algorithm code that I develop at work into Java, and I used MATLAB primarily as a scripting language that would do both the prototyping and calling the Java code as part of my test frameworks.
However, one day, my sponsor called up and objected to the licensing cost for MATLAB, since they had to, of course, have many seats of MATLAB to service the software that I was delivering, and so I was asked to find an alternative. Most of the local users said that they wanted to use Python. So I basically pulled up and evaluated a whole bunch of bridge codes to try and make it so that we could use Python with our Java production libraries. One of them looked reasonable, so I just started diving in and trying to alter anything where there were rough edges or problems until I got rid of all of the major segfaults. So I kind of view myself sort of like the mechanic that doesn't know how to drive because my actual Python skills are very poor.
[00:03:56] Unknown:
And so now you're the primary contributor to the JPype project, which has been around for quite some time, actually. And you mentioned that you had evaluated a number of the other bridge options for working between Python and Java. But I'm wondering if you can just start by giving a bit more of an overview about what the JPype project is and what motivated you to become such a regular contributor to it.
[00:04:19] Unknown:
JPype is a module in Python, which has the intent of exposing all Java packages as Python modules. It does this by using reflection through the Java Native Interface to basically find out whatever the capabilities are of a Java class and re-export them into the dictionary-based Python world. The whole philosophy is to be able to directly cut and paste Java code into the Python language and just make a few syntax changes in order to basically execute the same code that would have been done within Java. Its primary use is for scientific and engineering codes, where the code is already written in Java and you wish to exercise it within Python, but it's found many other applications in terms of use.
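As a rough illustration of the "paste Java into Python" idea described here, the following is a minimal sketch rather than anything shown in the interview. It assumes JPype is installed (the `JPype1` package on PyPI) and that a JVM can be located on the machine; the function name `java_list_demo` is made up for this example.

```python
def java_list_demo():
    """Build a java.util.ArrayList from Python and return its contents.

    Assumes JPype and a JVM are installed. The imports are kept inside the
    function so that merely defining it does not require either of them.
    """
    import jpype
    import jpype.imports  # enables `from java... import ...` style imports

    # The JVM can only be started once per process, so check first.
    if not jpype.isJVMStarted():
        jpype.startJVM()

    from java.util import ArrayList  # a Java package imported like a module

    # Roughly the Java code: names = new ArrayList<>(); names.add("python");
    names = ArrayList()
    names.add("python")
    names.add("java")

    # Java collections can be iterated like Python sequences.
    return [str(item) for item in names]
```

Calling `java_list_demo()` on a machine with a working JVM should hand back the two strings as an ordinary Python list.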
[00:05:11] Unknown:
As far as the use case, you mentioned it's used heavily in the scientific community. I'm wondering if you can dig a bit more into what the benefits are of being able to write Java code within a Python context and be able to call between those two different ecosystems.
[00:05:31] Unknown:
For me, this really comes as a development tool for Java. The way that I start all of my programming tests is that I want to develop something that is gonna be used as production Java code that is gonna be shipped out to one of our users. And so what I do is I take that idea, and I use SciPy and NumPy basically to flesh out and develop a prototype. Once I have that prototype, I then go and create a whole test bench within Python in order to exercise all of the aspects of that Java code that I wish to develop. I then take that code and then pull it back and create all of the classes and the framework within the Java system, filling out the stubs using the ability to then build proxies, which call the portions of the Python code that I'm eventually going to make use of.
Then once I have all of the pieces in, I can then use the exact same Python test bench that I originally started, which really gets me to my production Java code. I find this is a really great advantage in my development cycle because although Python is a good prototyping language, the half life of Python code, I find to be very, very short, especially if it's being worked in active development. And so getting the benefits of having both a prototyping language and a strongly typed language working side by side is a very valuable tool.
[00:07:08] Unknown:
In the use case for JPype, the calling direction is from the Python code into the Java. But does it also allow for embedding Python within the Java runtime so that you can go the other direction?
[00:07:22] Unknown:
Yes. It does. So, basically, any interface within Java can be implemented by Python code. It currently doesn't support the capability of actually extending a class if it's already basically implemented in Java, but you can take an interface and implement it in Python. And so I often use this for auditing. So I wanna do a graph in the middle of my Java code, so I'll set up a Python framework that calls the Java code. I'll put an audit interface within Java that says, call this hook, and that hook then gets implemented within the Python code.
It then stops right there, plots a graph using matplotlib, and then I can basically audit and work with Python as a development tool for Java.
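The audit-hook pattern described in this answer can be sketched with JPype's proxy decorators. This is a hedged illustration only: the custom audit interface from the interview is not shown, so `java.util.function.Consumer` stands in for it, and `make_audit_hook` is a hypothetical helper name. It assumes JPype and a JVM are available.

```python
def make_audit_hook(collected):
    """Return a Python object that Java can call through an interface.

    Sketch only: java.util.function.Consumer is a stand-in for the audit
    interface mentioned in the interview. Requires JPype and a JVM.
    """
    import jpype
    from jpype import JImplements, JOverride

    if not jpype.isJVMStarted():
        jpype.startJVM()

    @JImplements("java.util.function.Consumer")
    class AuditHook:
        @JOverride
        def accept(self, value):
            # Invoked from the Java side. A real hook might pause here and
            # draw a matplotlib plot; this one just records the value.
            collected.append(value)

    return AuditHook()
```

Any Java method that accepts a `Consumer` (for example `forEach` on a Java collection) could then be handed the returned object, and each call would land in the Python `accept` method.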
[00:08:18] Unknown:
There are a number of other projects that are aiming to be able to combine the capabilities of both Java and Python with the most notable being Jython, where it's actually an implementation of Python in the Java runtime. Then there are also projects such as PyJNIus, there's JPype, and there are a number of others. And I'm curious what you have found to be the relative trade offs between the different options and what it was about JPype that made you choose that to be able to actually dedicate your time and energy into for being able to use it for your work.
[00:08:57] Unknown:
I downloaded basically all of them and sort of worked out a table of which ones I liked and which ones I didn't. I had a number of requirements, which is, first of all, my group is made up entirely of physicists, for whom programming is not exactly the strongest suit. They are really great at the concepts, but not really good at the software architecture. So I needed to find something that gave that native Python look and feel, while at the same time allowed them to exercise everything that they had within that Java world. So given those constraints, I also had to find something that was forward looking because we're gonna be using software and developing it not for just one or two years, but, you know, 10 or 15 years at a time.
And so I decided at the time that I really had to find something that would work with Python 3. So of the options that I evaluated, obviously, Jython is one of the top contenders. It takes the approach of rewriting all of Python into Java, which has the really clear advantage that if you want to embed Python in a Java program and ship it as an applet or something like that, then it would be very strong and very capable. Unfortunately, this also introduces a large number of pitfalls. First of all, the Jython approach of trying to pack Python, which is famous for chewing through objects with almost no regard at all, doesn't really work well in a virtual machine that only recycles objects with the global garbage collector.
This really causes a huge speed and performance hit that forced it so that the average CPython developer never would consider the Java Virtual Machine implementation as being a good alternative. Also, because they tried to implement purely in Java, they also sacrificed the CPython API, which is the greatest value for the scientific computing like I'm using. So sacrificing the thing that plays the enormous role in the success of Python is a huge negative, at least for my users. So with a huge task and not enough programmers to really pull it off, Jython has already been really stuck behind the other alternatives. So that left them with basically Python 2 support only.
I do understand that they're working to reboot the whole project from scratch, and I'm really looking forward to seeing that. As for PyJNIus, it's primarily developed for the Android platform, and it appears to be a side project from a much larger effort of porting Python onto the Android system. I've worked with the guys in the past, and they are very dedicated. But just answering all of the questions on their support forums for how do you do stuff on Android means that they don't have a lot of free energy for getting down and delving into how do you deal with all of the interactions between the Python and Java virtual machine.
So when I got started working with JPype and evaluated, I went and did a functional test of PyJNIus, and I hit a number of segmentation faults. And those sort of segmentation faults would, of course, cause my physicist user group to quickly give up and declare that this has problems. So I decided that that really wasn't the best choice. As far as PyJNIus, it doesn't expose anything beyond classes, fields, and methods, and it doesn't really have integration into arrays and buffers, which are critical to being able to get high performance scientific code working. The one alternative that you didn't mention is, of course, Py4J, which is another good alternative to JPype.
Just like JPype, they provide the whole Java environment in Python, but they do so using sockets rather than going through the Java Native Interface. This has the advantage, of course, that Py4J can attach itself to Java and then detach as needed, or it can be attached to multiple copies of the Java virtual machine if this is necessary. Of course, the big disadvantage of Py4J is that if you look at programming it, it's gonna look a lot like an RPC language where you have to set up this bridge code known as a gateway, pass instructions across it, and then get things back, which again isn't going to provide that high level of integration that I was looking for in my user base.
[00:13:55] Unknown:
JPype in particular has been around for quite a while. When I was looking into the project and preparing for the show, I noticed that the initial work dates all the way back from 2005 when it was on SourceForge, and it has since migrated onto GitHub. And with you as the lead contributor, it actually has a fair amount of activity on it. I'm curious what it is about JPype that has allowed it to outlast so many other projects that are trying to achieve a similar goal.
[00:14:22] Unknown:
As I view it, Java libraries being accessed from Python is sort of a niche area. It's certainly very prevalent in the scientific and engineering world. However, being a niche area, this kind of limits the number of open source developers that can do this sort of thing, and only those people that really had to do serious work in both environments would even consider ending up as developers. The other thing that is a huge barrier to getting into developing a bridge code is that it requires you to have a very high level of fluency in many different APIs and programming schemes. So although Python is one of my weakest languages, there's a lot of things that have to go into the C, the C++, Java, and understanding the JNI, all the way down into that virtual machine that really provides a limiting factor for getting into that sort of development.
So as near as I can tell, JPype is probably one of the first out of the gate, but there have been numerous efforts made by scientific organizations to try and continue to address this need because all of the previous ones have been pretty deficient. For example, there is one called Jep, which was developed basically to do the reverse where you're trying to be able to call Python from within the Java environment. But like all of these projects, they start out with some initial amount of funding and initial amount of effort, and, eventually, they just don't have enough programmers that can maintain interest.
Technologically, trying to merge two virtual machines together, without building a dedicated virtual machine like GraalVM, is an enormous task. And, therefore, all of these other projects basically get to alpha quality, where they're capable of doing a reasonable amount of things, but there's so many rough edges where you're going to fall off into memory leaks or into segmentation faults. And as both Python and Java are very active communities, eventually, any project that is not being maintained at a pretty high level is just going to fall behind and become rather useless. The key advantage of JPype is that it is pretty darn small in scope.
The whole API is less than 20 primitive classes and about the same number of derived classes and support functions, but it takes close to 1,500 different unit tests just to exercise all of the behaviors because you have so many different behaviors that exist behind the scenes with the interactions with Java. So JPype being limited in scope, not really having the ability to connect and disconnect the JVM, and already having an extensive testing framework, I had a lot of good material to build on. I would really like to thank the constant pressure from my local users group at Lawrence Livermore National Lab that have really pushed me to try and bring this code up to production quality so that it can attract not just people in the scientific and engineering world, but also people who would like to use JPype for other applications.
[00:17:56] Unknown:
Digging more into how JPype itself is implemented, I'm wondering if you can talk through some of the ways that it has evolved and in particular, some of the updates that you've had to make to be able to bring it into full support of Python 3 and more recent versions of Java?
[00:18:14] Unknown:
The original author basically laid out all of his thoughts in a blog post. So he wanted to construct this really large framework, which would be able to implement bridges from Java not only into Python, but also into Ruby. So he abstracted the entire API, unifying Python, Ruby, and JNI. But this resulted in code that was very, very large and exceptionally hard to debug. So it implemented all of the CPython layer using capsules, and then it limited the total communications through those capsules to only a very small number of calls that are passing back and forth between these interfaces below.
And since he said that he wasn't really interested in working in the C programming language while implementing a Python module, he did everything he could on the Python side, which meant that all of these interactions were going through JNI, which is already not the fastest of interfaces, while implementing them from within the Python native code. And so the result was really, really slow and nearly impossible for anyone to get under the hood. So my main contribution was not really working on the front end, but working solely on the back end. So I took all of the code, read through it, figured out what all of these different layers were. And for the most part, I was actually a negative contributor, meaning I ripped out more code than I added for about the first two years that I was working on the project.
Once I had ripped out all of that code and got down to the fundamentals of what was underneath, I could figure out all of the pieces and then hold back all of that interface and identified, here's all of the speed critical pieces that I could then move and create these primitives in CPython, which then could be used to implement. The most critical thing in doing all of that work was, of course, dropping the Python 2 interface. So the way that this operates is every time that you encounter a class, you're going to need to generate a dynamic Python class that's going to represent all of the things that are in that Java class that you're trying to expose.
And so what that's going to do is force you to get really deep into the Python object model. The way that they were doing it with the old capsule system is they implemented everything in a way that both Python 2 and Python 3 could create these objects, but it required three layers of Python object modeling just to be able to glue it all together. And so when we dropped that Python 2 support, I was then able to go and create all of the primitives that represent each of the Java objects and improve the speed of most operations by about a factor of 10 to 400, depending on which operation was being performed.
That also allowed me to go and add things like the direct buffering support, which allows you to directly interact between Java and scientific libraries like NumPy.
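The idea of generating one dynamic Python class per Java class can be illustrated without a JVM. The sketch below is a pure-Python analogy and not JPype's actual implementation: a plain dictionary stands in for what reflection would report about a class, and the built-in `type()` builds the wrapper at run time. The names `build_wrapper` and `ArrayListProxy` are invented for this example.

```python
def build_wrapper(class_name, methods):
    """Create a Python class at run time from a reflected description.

    `methods` maps method names to callables. In a real bridge these would
    dispatch through JNI; here they are ordinary Python functions.
    """
    namespace = {"__slots__": ()}  # instances carry no per-object __dict__
    for name, func in methods.items():
        namespace[name] = func
    return type(class_name, (object,), namespace)


# Pretend reflection reported a class with two methods.
reflected = {
    "size": lambda self: 3,
    "isEmpty": lambda self: False,
}
ArrayListProxy = build_wrapper("ArrayListProxy", reflected)
proxy = ArrayListProxy()
print(proxy.size(), proxy.isEmpty())
```

The point of the sketch is only the shape of the technique: the wrapper class does not exist in source form anywhere and is assembled from whatever the description contains.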
[00:21:49] Unknown:
And in terms of being able to keep the project up to date, how do the relative versions of the Python runtime and the Java runtime influence the amount of effort involved in keeping it up to date and what the compatibility matrix looks like?
[00:22:05] Unknown:
From the perspective of the user, there is no real difference for any version of Python after 3.6 nor any Java version after Java 8. Internally, though, it's a pretty massive challenge. So as I mentioned previously, Python 2 was a huge stumbling block for JPype, and thus, I celebrated January 1, 2020 in a way that few other people would be able to understand. I've got to get all the way down deep, deep into the guts of the CPython system. And often I discover down at that level, there is no real integration for these internal private structures.
So I've got to basically do something that is only done in the native CPython API in order to get my work done. Let's take a typical example. Python lets you create an int type, and that int type has an API for creating a new int. When you're working in native Python, you can, of course, take and derive another class from the type int and add something to it. But when you get into the CPython API, you'll find that there's nothing like that at all. So when you want to create an int, what it does is it creates an int of the original base type, then it copies that base type int into the derived type memory space, which is then going to take a whole bunch of additional time as well as creating additional objects, which is gonna really hit that object limitation.
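For contrast with the C-level difficulty described here, deriving from `int` in pure Python takes only a few lines. This is a minimal sketch of that easy half; `JIntLike` is a made-up name and is not JPype's own `JInt` type.

```python
class JIntLike(int):
    """An int subclass that range-checks like a 32-bit Java int."""

    MIN, MAX = -2**31, 2**31 - 1

    def __new__(cls, value):
        value = int(value)
        if not cls.MIN <= value <= cls.MAX:
            raise OverflowError(f"{value} does not fit in a Java int")
        # int is immutable, so the value is fixed in __new__, not __init__.
        return super().__new__(cls, value)


x = JIntLike(41)
print(x + 1, isinstance(x, int), type(x + 1).__name__)
```

Note that arithmetic on the subclass falls back to a plain `int`, which hints at why a bridge that wants its own primitive types everywhere has to do extra work below the Python level.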
And since I've gotta go down into the guts to create basically new types for all of the Python primitives to represent each of the corresponding primitives in Java, I've gotta go down there to the level being able to say, add a native stack frame into Python that really doesn't have any API at all. The same thing can happen in terms of Java. So I started with Java 7 when I took over the project, but since that time, Java 7, fortunately, has gone away to where Java 8 became the mainstream, and it, of course, had really the best at the time in terms of capabilities because they were introducing a lot of really important and vital functions.
But then they took a massive shift when they went over to the module system in Java 9. And so you have to span both the current last long term support, which is Java 8, and run out to Java 15. So the way this works is because a good portion of JPype is actually written in Java is that, internally, we have a Java 8 library, which then calls using reflection to ask, is this Java class that's available in the later version of Java available? And if so, grab it, load it, and then use its methods only using their names. And this is, of course, not going to create the bastion of clarity that I really like in the software that I'm developing.
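The "ask whether a newer class exists, then call it by name" pattern is done with Java reflection inside JPype. The same shape can be sketched in Python with `importlib` and `getattr`; this is an analogy only, and `call_if_available` is a hypothetical helper rather than anything from the project.

```python
import importlib


def call_if_available(module_name, func_name, *args, default=None):
    """Call module_name.func_name(*args) if both exist, else return default."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return default  # running on a platform without this module
    func = getattr(module, func_name, None)
    if func is None:
        return default  # module exists, but this entry point was added later
    return func(*args)


# math.isqrt exists on Python 3.8+; the default covers older interpreters.
print(call_if_available("math", "isqrt", 17, default=4))
# A module that does not exist falls back cleanly.
print(call_if_available("no_such_module_xyz", "f", default="fallback"))
```

As in the Java case, nothing is referenced at compile time, so the code loads on old and new platforms alike, at the cost of the readability the speaker mentions.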
[00:25:27] Unknown:
In terms of the impedance mismatch between the Java object model and the Python object model, what are some of the difficult edge cases that you've had to work around or particular limitations that you just weren't able to overcome and you just have to call out to users of JPype that this particular operation isn't possible?
[00:25:48] Unknown:
So there are two places where there's a lot of difficulty. So one of them, of course, is the problem of being able to extend objects as well as starting and stopping the JVM. When you start the JVM, it's basically going to completely marry the Python virtual machine to the Java virtual machine, and it's going to create a whole bunch of basically pointers that are gonna be accessed between the two. And so if you ever tell it to say stop the Java Virtual Machine and you wanna continue using Python, all of those pointers are gonna become stale. This often leads to, you know, pitfalls as far as the Java module failing to or the Python hitting things where they are no longer existing and having to go into error and fault handling routines.
Of course, the biggest difficulty with operating something where you've got these two virtual machines is gonna be memory management. Both Python and Java independently manage their memory spaces and are garbage collected languages, and neither knows about the other. So whenever you're working with a foreign language within an interface, you have to have some way of holding it alive for the purposes of garbage collection. So both Python and Java use basically a reference counting system. And so when you ask for an object to be held alive, you put a reference for it. This kind of becomes a problem though when you have a Python object, which is pointing to a Java object, which is in fact holding another Python object.
Because if you ever create something where it's a circular reference, those two languages both have references going forward, and you now have this irresolvable reference loop. You can, of course, do things like adding weak references, which both languages support, but this doesn't actually hold things in memory, which can create other problems. And so the ultimate limitation of JPype will always be that having two virtual machines means that a user, through a series of fairly simple calls, can create basically very bad memory loops.
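Both halves of this trade-off can be seen in a small pure-Python sketch. It is an analogy under stated assumptions: it relies on CPython's reference counting, it disables Python's cycle collector to mimic the situation where no single collector can see both sides of a bridge, and the `Node` class is a stand-in for objects owned by either runtime.

```python
import gc
import weakref


class Node:
    """Stand-in for an object owned by one side of the bridge."""

    def __init__(self, name):
        self.name = name
        self.other = None


gc.disable()  # rely on reference counting only, as across a bridge

# Strong references in both directions: a loop that counting cannot free.
py_obj, java_obj = Node("python"), Node("java")
py_obj.other, java_obj.other = java_obj, py_obj
probe = weakref.ref(py_obj)
del py_obj, java_obj
leaked = probe() is not None  # still alive: the loop holds itself up

# Weak back-reference: no loop, but it does not keep its target alive.
owner, child = Node("python"), Node("java")
owner.other = child
child.other = weakref.ref(owner)
back = child.other
del owner
freed = back() is None  # target vanished with its last strong reference

gc.enable()
gc.collect()  # within one runtime, the cycle collector cleans up the loop
print(leaked, freed, probe() is None)
```

Inside a single interpreter the final `gc.collect()` rescues the first pair; the difficulty described in the interview is that a loop spanning two runtimes has no such shared collector.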
[00:28:10] Unknown:
For somebody who is actually using JPype to create a project, I'm wondering if you can just talk through the overall workflow and the software life cycle for that project, particularly things like managing dependencies and packaging and distribution across those two different language communities?
[00:28:30] Unknown:
JPype itself is a Python installable module that's available through PyPI or can be downloaded using distributions such as Anaconda. The only requirement that it has is that there be a working copy of the JRE, which is typically pointed to using the environment variable JAVA_HOME. When distributing JPype as a module or a module that's using JPype, I typically recommend that people use something like Ivy or Maven to pull in the Java library and include it in the Python package, which is going to then be shipped and installed into the site packages as part of the startup.
There are some restrictions that are gonna, of course, happen, which is often people write what I refer to as a heavy wrapper of a Java class. That's where somebody goes and implements a Python module, which just uses JPype as the back end, and they export the interface in their own native Python format. But this, of course, has the problem that you can only start one JVM, and you can only start the JVM one time. And there's some restrictions about how, when you start the JVM, all of the jars that have been loaded at that time have special preferences.
So to try and get around this restriction, I've been working on a custom class loader that should be able to make it so that you can actually load JARs after a JVM has already been started.
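The packaging advice in this answer (ship the jars inside the Python package, start the JVM once) might look roughly like the sketch below. It is an assumption-laden illustration: the helper names `bundled_jars` and `start_jvm_once` are invented, the jars are assumed to sit directly in the package directory, and the second function needs JPype and a JVM to actually run.

```python
import os
from pathlib import Path


def bundled_jars(package_dir):
    """Return sorted paths of .jar files shipped inside a package folder."""
    return sorted(str(p) for p in Path(package_dir).glob("*.jar"))


def start_jvm_once(package_dir):
    """Start the JVM with the bundled jars unless one is already running.

    Returns True if this call started the JVM. Requires JPype; the import
    is done lazily so the module can be loaded without it.
    """
    import jpype

    if jpype.isJVMStarted():
        # A JVM cannot be started a second time in the same process, so
        # jars that were not on its classpath at startup may be unavailable.
        return False
    if "JAVA_HOME" not in os.environ:
        print("JAVA_HOME is not set; JPype will try to locate a JVM itself")
    jpype.startJVM(classpath=bundled_jars(package_dir))
    return True
```

A package could call `start_jvm_once(Path(__file__).parent)` from its own import path; the `False` return is a reminder of the one-JVM restriction discussed above, since another library may already have started the JVM with a different classpath.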
[00:30:08] Unknown:
Aside from things like the memory loops and some of the particular limitations of how and when to start the JVM, are there any other potential pitfalls or sharp edges that users of JPype should be aware of?
[00:30:22] Unknown:
No. I think that we've covered most of them on that front. People trying to start and stop the JVM and expecting objects to continue working past the lifespan is usually the most common one that we see.
[00:30:38] Unknown:
And as far as the actual uses of JPype, I'm curious what you have found to be some of the most interesting or innovative or unexpected ways that it's been employed.
[00:30:49] Unknown:
On that, I'm not really a great expert. I do see that there's a lot of projects that use basically JDBC, as well as, of course, the number of scientific codes and engineering codes that make use of it. Oddly, I do see a lot of work on East Asian chatbot clients. Unfortunately, not being able to read East Asian languages, I've never really looked in to see what they're doing. The one project that I'm most pleased to see is that a user took the JPype API, and they built a stub generator that allows you to use Python IDEs so that they can pull a Java package and expose the entire module to the IDE.
It was really that sort of contribution that inspired me to basically write a parser for Javadoc and turn it into Python docstrings so that we could make the concept of the lightweight wrapper in which the Java package is actually considered to be a Python module.
[00:31:55] Unknown:
Another major area where Java is popular is in the Android ecosystem, and I was noticing that JPype does also have capabilities for being able to interact with the Android ecosystem. So I'm wondering if you can talk through some of the ways that that manifests and what's involved in using JPype with Android to be able to take advantage of some of the capabilities of Python on that operating system.
[00:32:21] Unknown:
JPype just started working on the Android system. So I was working with the folks over at Kivy and PyJNIus. I will confess that after my initial evaluation of them about three years ago, I completely forgot that PyJNIus existed. And so I was really shocked when I was sitting there looking for similar bridge codes to find out how they were working three years on. I discovered that it was actually an active project and that it was actually a big competitor of mine because people were trying to use scientific codes using the PyJNIus system.
So realizing that we're basically 2 sets of developers attacking the same sort of platform, I decided that it would be a good idea to meet up with them and see if we can come up with a way to get J PIPE to cover their needs. So I worked out all the patterns, basically, to use their building system to create and demo that jpipy can work within the KIV remote shell. And then I turned all of that work over to the KIV developers to try and get it integrated back into their distribution. It's currently waiting for actions on their side, which as I mentioned before, they have a very large project they're trying to maintain. And so they haven't really gotten into that distribution so much.
But when it does get in place, it will be a drop in replacement for the current PyGenius. There are, of course, some minor differences in how the object conversion works because in jpipy, whenever you hand something to a method, it will always return a Java type, which basically duck types to the nearest thing in Python. In PyGenius, they're going to convert everything on the incoming, meaning you have to pass in a Python array, and you get a Python array out the other side, which leads to a lot of conversion overhead. But with jpipey on the Android platform, I think we're gonna be able to do basically all the same things that we do in terms of the scientific and direct memory and buffer transfers that we did in the jpipy on the regular PC and provide that same support on the Android.
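To make the conversion trade-off described above concrete, here is a toy sketch in plain Python. It does not use JPype or PyJNIus, and every name in it (`ForeignArrayView`, `call_returning_view`, `call_with_eager_conversion`) is hypothetical. An `array.array` stands in for storage owned by the other runtime: one function returns a thin handle that duck types as a Python sequence, the other copies every element into a new list on the way out.

```python
from array import array
from collections.abc import Sequence


class ForeignArrayView(Sequence):
    """Toy stand-in for a bridge-side handle: wraps the buffer, copies nothing."""

    def __init__(self, buffer):
        self._buffer = buffer  # keep a reference to the "foreign" storage

    def __len__(self):
        return len(self._buffer)

    def __getitem__(self, index):
        return self._buffer[index]


def call_returning_view(buffer):
    """Proxy style: hand back a handle that behaves like a Python sequence."""
    return ForeignArrayView(buffer)


def call_with_eager_conversion(buffer):
    """Eager style: copy every element into a fresh Python list."""
    return list(buffer)


foreign = array("d", [1.0, 2.0, 3.0])
view = call_returning_view(foreign)
copy = call_with_eager_conversion(foreign)

# Mutate the "foreign" side after both calls have returned.
foreign[0] = 99.0
print("view sees:", view[0])   # still reads the shared storage
print("copy sees:", copy[0])   # holds the value from the time of the copy
```

The view costs nothing per element up front but stays tied to the foreign storage, while the eager copy pays for every element on each call. That per-call cost is the overhead the speaker is referring to.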
[00:34:53] Unknown:
Yeah. I know that PyJNIus has always been one of the sharp edges in the Kivy ecosystem for being able to build projects for Android. So it's great that you've been able to help in that regard and take all the work that you've put into JPype and help to translate it for them to be able to take advantage of it and reduce some of the overall maintenance burden.
[00:35:14] Unknown:
That is exactly why I thought that I could make a big contribution to that project.
[00:35:20] Unknown:
In terms of your experience of working with JPype and doing maintenance on it and helping your end users to leverage it for being able to call between the Python and Java ecosystems, I'm wondering what you have found to be some of the most interesting or challenging or unexpected aspects of that work.
[00:35:39] Unknown:
So, obviously, the most challenging thing that I have in terms of working with JPype is trying to work around the fact that neither Java nor Python has adequate hooks. So simple things, like providing a closure slot when you pass an object through a native interface and getting that extra information back, are a real challenge for the sort of development that I've been doing. So you end up being forced into suboptimal solutions like static variables and maps. But each of those different solutions creates its own problems in terms of things like thread handling and map cleanup.
So I often have to take a shot at it over and over and over in order to come up with an elegant workaround for these sorts of internal processes. So I would really prefer, of course, that language developers would future-proof their APIs so that developers like me don't have to be creative, as I put it. But perhaps the biggest challenge that I had was just trying to add a Java slot to Python objects. So as you are probably aware, whenever you work with Python, it has this concept of dedicated slots, so that you can attach something to an object and get O(1) access to it whenever it's needed, especially if it's being used in something like a tight loop.
So I needed to add a Java slot to basically every single one of the Python primitive types, but they all have different memory layouts, and they don't allow you to arbitrarily just add a user slot on top of something that has a different memory layout. So Python does provide for the concept of multiple inheritance within the native Python interface, but that's really all just a trick because they've already dedicated a slot to be able to add a dictionary, and that dictionary is then the only thing that you can add to a class without causing an additional conflict.
So the way that I had to deal with that is I had to create a custom memory allocator that adds extra space after the end of every Python object that also needs to be shared with Java. And this, of course, was a 2-month-long nightmare of resolving all of the edge cases of adding a new allocator into the Python system. It's unfortunately this limitation that has made it so that I can't get JPype to work in the PyPy environment, because I've never managed to debug all of those edge cases for this extra memory that I need to add on the end.
Obviously, that's a very challenging problem and something that I'm very glad to have solved.
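The layout restriction described above can be observed from pure Python, at least on CPython. The sketch below is not how JPype solves the problem (that is the custom allocator the speaker mentions); it only shows the symptom. The class names and the `java_handle` slot name are hypothetical. It tries to attach a fixed slot to `int`, tries to merge two classes that each own a fixed slot, and then shows that a plain subclass, which falls back to a per-instance dictionary, is accepted.

```python
def try_define(name, bases, namespace):
    """Return None if the class can be created, else the TypeError message."""
    try:
        type(name, bases, dict(namespace))
        return None
    except TypeError as exc:
        return str(exc)


class HasA:
    __slots__ = ("a",)


class HasB:
    __slots__ = ("b",)


# A fixed slot on top of a variable-size builtin such as int is rejected.
int_with_slot = try_define("JInt", (int,), {"__slots__": ("java_handle",)})

# Two bases that each own a fixed slot cannot be combined either.
two_layouts = try_define("Both", (HasA, HasB), {})

# A plain subclass works: extra state goes into the instance __dict__.
dict_backed = try_define("JIntDict", (int,), {})

print("int + slot:  ", int_with_slot)
print("two layouts: ", two_layouts)
print("dict-backed: ", dict_backed)
```

The dictionary route always works, but it costs a dictionary lookup per access, which is why a dedicated slot matters in a tight loop.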
[00:38:37] Unknown:
Because of the fact that you're working at such a deep level with the Python language and runtime, I'm curious if there have been any opportunities for you to contribute changes upstream to the CPython runtime to help improve the overall capabilities of the language, particularly for the use cases that you are working on?
[00:38:58] Unknown:
Unfortunately, I don't really have a lot of visibility into the CPython development cycle. I do see people's names and so forth, and I've tried to reach out on IRC and other communication channels, but it seems really opaque to me as a developer. So I end up writing to user groups and so forth, and I never get anything back as far as how to solve or fix these problems. I also have the big difficulty that even if they did solve the problem for me today, I have to be able to work with much older versions of Python than what is currently in the development cycle. And so that means that I'm going to be forced to deal with the problem even if I did manage to get developer support to add new hooks tomorrow.
[00:39:50] Unknown:
For anybody who is considering using JPype in their own projects, what are the cases where it's the wrong choice and they might be better served with one of the other language bridges or just going a completely different architectural route?
[00:40:03] Unknown:
JPype, of course, is going to have 2 virtual machines. So this is twice the overhead, which is, of course, a big impediment in terms of speed and in terms of the resources that you're going to end up using. So I would say for any project that already has access to a high quality Python library that provides the same capabilities as the Java one that you're considering using, you'd be much better off not adding Java simply because it can be done. The other thing, of course, is that since you're going to be marrying 2 machines at the process level, you can't stop one without harming the other.
And so for anything in which you want to use Java as a slave system rather than directly integrating them, JPype is probably not the best choice, and you should probably go with Py4J. This, of course, leads to the most frequent problem that I get from having a project with this sort of long-standing history, which is that we have a huge pile of bad reviews. The most common way that this comes about is that someone tries to use JPype in a way that is not appropriate, or that is directly called out in our limitations, which they have ignored.
And when they do so, they end up not getting a very satisfactory result, and they feel the immediate need to go out and say, oh, you know, go use the other alternative because JPype is just crap.
[00:41:38] Unknown:
And as you continue to work on JPype and continue to support your users, what are some of the plans that you have for the future of the project, either in terms of quality improvements or new features or just paying down technical debt?
[00:41:52] Unknown:
I really have 2 different directions that I am pushing for in future versions of JPype. The first is to try and press forward on this heavy versus light wrapping. So the majority of the projects that put out Python modules are really heavy, in that they go and they hide the use of JPype, and they have to wrap it in a completely different API, which is, of course, costly in terms of effort. And these heavy wrappers tend to be both incompatible with each other and not nearly as complete as the underlying Java API that is already being provided. So what I'm really looking to do here is to add more customization to the process, where you can actually store a Python customization class in the Java JAR, so that the Java JAR becomes the actual Python distribution that allows you to use all of the different things within that Java package.
So, basically, each Python customizer can, say, do things like rename methods, change the method resolution order, or add new Python-specific idioms to an existing Java class. My hope is that this will allow people to just distribute the light wrapper, in which you distribute Java JARs that are usable by both Java and Python developers without the need for rewriting the whole API. Second, I'm working on a complete reverse bridge which allows Java to use Python as easily as the current bridge allows Python to use Java.
So to achieve this, I'm going to basically implement a code extractor, which pulls all of the Python protocols from the objects that it encounters and creates customized stub classes that exist within Java. There will then be a Java library that contains all these stubs. And whenever it encounters a Python class, it's going to create a mixin in the form of different Java interfaces that represent the capabilities of any individual object. It will then use Java ASM in order to dynamically create this new Java wrapper, which will be treated just like a native Java object and thus allow things like SciPy and NumPy to be used completely within the Java environment.
Obviously, this is a lot of work, because we have concepts like keywords and the dynamic nature of Python that are hard to represent within the Java framework, but I'm confident that this can be achieved. I see this as being a way to get great dividends in the scientific community by being able to reuse and share code between the two.
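The first step of the reverse bridge described above, working out which protocols a Python object supports so that matching Java interfaces can be mixed in, can be sketched with the standard library alone. This is only an illustration of the idea, not the planned JPype implementation: the interface names (`PyCallable`, `PySized`, and so on) are invented for the example, and the real design would generate Java stubs rather than return a list of strings.

```python
from collections.abc import Callable, Iterable, Mapping, Sequence, Sized

# Hypothetical mapping from a Python protocol to the name of the Java
# interface that a generated wrapper would implement.
PROTOCOL_TO_INTERFACE = [
    (Callable, "PyCallable"),
    (Sized, "PySized"),
    (Iterable, "PyIterable"),
    (Sequence, "PySequence"),
    (Mapping, "PyMapping"),
]


def interfaces_for(obj):
    """List the (hypothetical) Java interfaces matching the object's protocols."""
    return [name for protocol, name in PROTOCOL_TO_INTERFACE
            if isinstance(obj, protocol)]


print(interfaces_for([1, 2, 3]))   # ['PySized', 'PyIterable', 'PySequence']
print(interfaces_for({"a": 1}))    # ['PySized', 'PyIterable', 'PyMapping']
print(interfaces_for(len))         # ['PyCallable']
```

The hard parts the speaker points to, such as keyword arguments and objects whose capabilities change at runtime, are exactly what a static check like this one does not capture.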
[00:44:49] Unknown:
In particular, with the use case of being able to bridge in both directions for Java and things like SciPy and NumPy, I'm wondering if you have also looked into being able to integrate the use of the Arrow format for easy exchange and interoperability of in-memory data structures?
[00:45:10] Unknown:
I haven't looked so much at the Arrow format, though I have done some work with Google Protocol Buffers, which I use very often for my inter-process communication. But the key thing with JPype is that it's going to be entirely written using the JNI interface. And the JNI interface, as far as I know, doesn't have a lot of ability to interact with Arrow.
[00:45:36] Unknown:
Are there any other aspects of the work that you've done on JPype or the ways that it's
[00:45:50] Unknown:
Well, I would certainly like some help as far as getting some of these hooks within the CPython interface improved. It would make a great deal of difference if there were hooks either within Python or within the JVM itself. I have not yet looked at GraalVM, and I understand that it may provide a lot more capabilities for mixing the 2 languages together, and I'm looking forward to seeing the progress that it makes.
[00:46:21] Unknown:
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose going out for a hike. It's just a great way to spend some time and relax and get some of the benefits of being in the outdoors. And for that, there are actually a couple of apps that I've been using for finding good hiking trails nearby. So one of them is AllTrails, and another one I just found is called the Hiking Project, which is nice because it allows for downloading the maps to be usable offline so that if you don't have cell service, you can still find your way around.
And so, yeah, definitely recommend checking those out and finding some time to get outdoors. And so with that, I'll pass it to you, Karl. Do you have any picks this week?
[00:47:06] Unknown:
I don't really have much on that front. My only thing is I am, of course, an avid gamer, and I really enjoy fighting it out on Summoner's Rift, as I mentioned in the past. Unfortunately, I am once again stuck in silver, and that means that I will, of course, need some help. Now being like all good library maintainers, I only play support, and this means that, well, I'm stuck with the usual.
[00:47:34] Unknown:
Well, I appreciate you taking the time today to join me and discuss the work that you've been doing with JPype and the ways that it's being used. It's definitely a very interesting project, and it's great to see that it has managed to continue velocity and stay up to date. So I appreciate all of the time and effort you've put into that, and I hope you enjoy the rest of your day. Thank you very much.
[00:47:58] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Karl Nelson's Journey with Python
Overview of JPype
Comparing Language Bridges
Challenges and Evolution of JPype
Using JPype in Projects
Challenges in Development and Maintenance
Future Plans for JPype
Integration with Android and Scientific Libraries
Closing Remarks and Picks