Ever needed to use multiple programming languages in the same project?
When working on a project with limited scope, like a web server's API layer, a single-page web application or even the firmware of an embedded controller, you can stick to your selected language. In the worst case you will have to expose some kind of interface or API so your code can be integrated with other components.
However, when the scope of the project increases, this might no longer be valid. Ever heard the dreaded words “Great job implementing the driver for this camera. Now I want to control it over the web” or “Man, your web dashboard software is great. I want to monitor my automated greenhouse with it”?
Remember the horrible feeling after realizing that these new requirements increase the scope of your project in a way that your current technology stack cannot handle? And the client doesn’t seem to care. After all, he doesn’t care about frameworks or programming languages. He mostly cares about the results.
Depending on the team size and your position, the decision about what to do may rest on your shoulders. So … what are the options?
In this article I will give you an overview of the most common ways to combine the strengths of multiple programming languages.
There are a few main ways to integrate multiple languages to work together:
Microservices are a really popular architecture, used and preferred by many developers. The project is divided into a few smaller projects (services). Each of these services is deployed independently and runs in an isolated process in the operating system. It communicates with the other services through a contract: a strict definition of the supported message structures and data types.
A microservice can expose multiple functions, each with specific arguments, a return data type specification, and a messaging format (JSON, SOAP, or a custom messaging protocol). As long as the service implements the contract, it doesn’t matter which language it is written in or which other technologies it uses.
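To make the idea of a contract more concrete, here is a minimal sketch in Python using only the standard library. The `add` service, its `/add` path and the `{"a", "b"} -> {"sum"}` message shape are hypothetical names invented for this illustration. The client knows only the JSON contract, so the server side could be rewritten in any other language without the client noticing.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AddHandler(BaseHTTPRequestHandler):
    """Hypothetical 'add' service.

    Contract (an assumption for this sketch):
    POST {"a": number, "b": number}  ->  {"sum": number}
    """

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"sum": request["a"] + request["b"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def call_add(port, a, b):
    """Client side: relies only on the contract, not on the server's language."""
    payload = json.dumps({"a": a, "b": b}).encode("utf-8")
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/add",
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy settings, since we talk to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["sum"]


def demo():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), AddHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call_add(server.server_address[1], 2, 3)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the service in a background thread, sends one request and shuts the service down again. In a real deployment the server would of course run as its own process, possibly on another machine.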
When the services communicate over the network, they don’t care whether they are deployed on a single computer or on multiple machines. This architecture really shines in cloud environments.
Any functionality can be exposed as a service: “payment processing module”, “speech recognition module”, “key/value storage” and so on.
- The services are decoupled: you can add any implementation and use any language as long as you stay true to the contract.
- They are easily scalable.
- You can run all your services on one computer, or split them across multiple machines.
- Without adding too much complexity, you can run multiple instances of one service as a cluster. This way you can handle huge workloads.
Imagine a scenario where we have multiple nested service calls. Service 1 calls Service 2, and Service 2 in turn calls Service 3. We will have to wait a bit to get the end result of the call to Service 1. This might be fine for applications that can afford to wait, but it is a deal breaker when we need ultra-fast responses.
Since the network is limited in speed, you can increase the throughput by using shared memory instead of the network to communicate. This greatly increases the amount of data that can be transferred between the services. The limitation is that it only works if the processes are on the same machine.
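As a rough illustration of the shared memory approach, the sketch below uses Python's standard `multiprocessing.shared_memory` module. The message layout (a 32-bit id followed by a 64-bit float) and the function names are assumptions made up for this example. To stay short it writes and reads from the same process; in practice the writer and the reader would be separate processes that attach to the block by name, and they would need some form of synchronization.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical message layout: uint32 id + float64 value, little-endian, no padding.
MESSAGE = struct.Struct("<Id")


def write_message(name, msg_id, value):
    """Attach to an existing shared memory block by name and write one message."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        MESSAGE.pack_into(shm.buf, 0, msg_id, value)
    finally:
        shm.close()


def read_message(name):
    """Attach to the same block by name and read the message back."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return MESSAGE.unpack_from(shm.buf, 0)
    finally:
        shm.close()


def demo():
    # The creator owns the block and is responsible for unlinking it.
    shm = shared_memory.SharedMemory(create=True, size=MESSAGE.size)
    try:
        write_message(shm.name, 7, 21.5)
        return read_message(shm.name)
    finally:
        shm.close()
        shm.unlink()
```

No data is copied through a socket here: both sides work on the same bytes in memory, which is where the speed comes from.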
And just to add some numbers to this: as part of a hobby project, I had to create a fast data transfer between processes running on the same machine. With just a slightly optimized memory-based communication library in Java, I was able to achieve a transfer speed of up to 6 million messages per second, each one carrying 20 bytes of data that was serialized, transmitted and deserialized at the other end. This makes for around 120 million transferred bytes per second. Just for a simple comparison, my LAN card transfers data at up to 100 Mbit/s, which is around 12.5 million bytes per second.
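The arithmetic behind this comparison can be checked in a few lines. The sketch takes 100 Mbit/s as 100,000,000 bits per second and ignores protocol overhead, so the network figure is an upper bound.

```python
MESSAGES_PER_SECOND = 6_000_000
BYTES_PER_MESSAGE = 20
LAN_BITS_PER_SECOND = 100_000_000  # 100 Mbit/s, protocol overhead ignored

# Throughput of the shared memory transfer in bytes per second.
shared_memory_bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE

# Theoretical maximum of the LAN card in bytes per second (8 bits per byte).
lan_bytes_per_second = LAN_BITS_PER_SECOND // 8

# How many times faster the shared memory transfer is.
speedup = shared_memory_bytes_per_second / lan_bytes_per_second

print(shared_memory_bytes_per_second, lan_bytes_per_second, speedup)
```

So the shared memory transfer moves roughly ten times more data per second than this particular network link could at its theoretical best.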
To get the best of both worlds, we can sometimes use communication libraries. Some of them can be configured to use shared memory when communicating with processes on the same machine, and the network when communicating with a remote computer. This, however, will add more dependencies to your project.
In case ultra-low latency is not one of the requirements of the project, I would go for the microservices option. However, latency is a sensitive topic when it comes to robotics and systems that have to work in “real-time”.
If we want to use external functions written in other languages, we can call them from our code using libraries. There are a few alternatives for how this can happen:
C/C++ – The compilation of the system languages results in native code. This makes their integration easy as long as you have the correct header files and link your code with the external libraries. Most compilers have a way to declare the external libraries your code depends on.
Some interpreted languages nowadays are compiled to an intermediate language and then executed on a virtual machine for performance reasons. If two languages can be compiled to the same intermediate language, this helps a lot when integrating them. An example of such an intermediate format is Java bytecode.
Besides Java, there are a few more programming languages that can be compiled to JVM bytecode. Examples are Scala and Python (through the Jython project). You can easily use them together, as they are compiled to the same format (a .class file).
Shared libraries are libraries that can be linked to any program at run-time. They contain code that can be loaded anywhere in memory. Once loaded, the shared library code can be used by any number of programs.
Connecting system languages like C/C++ to a native shared library can be done easily when you have the library functions and structure definitions in header files. The compiler and linker just need to know which functions are being called and which data types and structures are used.
Connecting an interpreted language to a native one in the form of a shared library is a bit more difficult, but it can be achieved using bindings.
A binding from a programming language to a library or operating system service can be described as glue code that acts as a bridge between them. It allows us to call a function from the desired library directly from our code.
There are a lot of examples in practice: Java bindings for OpenCV (a computer vision library) and Java bindings for serial port communication (in libraries such as JSSC). Some languages and platforms even rely on native code themselves: Node.js and Python rely heavily on C/C++ libraries.
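A very small binding can be sketched with Python's standard `ctypes` module, which loads a native shared library and calls into it from the same process. The example below binds the C standard library function `strlen`. It assumes a POSIX system (Linux or macOS), and the wrapper name `c_strlen` is made up for this illustration.

```python
import ctypes
import ctypes.util

# Assumes a POSIX system. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the Python process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Glue code: describe the C signature  size_t strlen(const char *s)
# so that ctypes converts arguments and the return value correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the native strlen on the UTF-8 encoding of a Python string."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("binding"))
```

The two lines that set `argtypes` and `restype` are the actual glue: they play the role that a header file plays for a C compiler.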
Since all of the code runs in a single process, function execution doesn’t depend on switching to another process or on network communication. The response time is thus reduced to only the time needed to run the actual code.
- It can only be used on a single machine.
- It adds dependencies (the correct version of the bindings for the corresponding version of the library).
- It is less portable (libraries are platform dependent).
- Not all languages can be used to write libraries (usually low-level languages such as C/C++ are used to create the libraries, and high-level languages such as Java/Python consume them).
- Too many bindings can plunge the project into dependency hell. A project depends on all of the libraries, all of the bindings to those libraries and all of their dependencies. You can guess what happens when these dependencies become too much to handle.
- The glue code and the libraries are executed in the same process. So in order to run our project on multiple machines, we have to rely on a different method for the inter-machine communication.
I would only use a solution like this when ultra-low latency is a must.
We can combine the previous technologies in different ways in order to meet our requirements. Just keep in mind that we get the best of both worlds, but we also have to beware of the worst of both worlds.
A tip for the data types
One tricky problem is the mapping of data types between different languages and platforms. When a text-based message format such as JSON or XML is used, mapping data will hardly be a problem. That is not the case for binary messages or shared memory structures. A single byte that is not aligned properly will compromise the entire message. When working with binary data, it is essential to take into account the binary data formats used by the platforms.
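The alignment problem can be seen with Python's standard `struct` module. The sketch below uses a hypothetical message made of a 1-byte flag followed by a 4-byte integer. With native alignment the platform may insert padding bytes between the two fields (typically three on common platforms), while an explicit byte order with standard sizes gives the same 5-byte layout everywhere.

```python
import struct

# Hypothetical message: a 1-byte flag followed by a 4-byte integer.
NATIVE = struct.Struct("@bi")  # native byte order and alignment; padding may be inserted
PACKED = struct.Struct("<bi")  # little-endian, standard sizes, no padding


def encode(flag, value):
    """Serialize with the explicit, platform-independent layout."""
    return PACKED.pack(flag, value)


def decode(data):
    """Deserialize a message produced by encode()."""
    return PACKED.unpack(data)


# The native size depends on the platform; the packed size is always 5.
print(NATIVE.size, PACKED.size)
print(encode(1, 1))
```

If one side writes the message with the native layout and the other reads it with the packed one, the integer is read from the wrong offset. Agreeing on an explicit layout as part of the contract avoids this.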
This was a brief overview of the most common ways to use multiple languages together. To put it into practice, you will have to read the specifications of your languages of choice and select the best approach.
With all that said, I am pretty eager to test a specific hybrid architecture in the next version of my robot. What is your next multi-language project?