Programming Language transformation?

Instead of emphasizing the what, I want to emphasize the how part: how we feel while programming. That's Ruby's main difference from other language designs. I emphasize the feeling, in particular, how I feel using Ruby. I didn't work hard to make Ruby perfect for everyone, because you feel differently from me. No language can be perfect for everyone. I tried to make Ruby perfect for me, but maybe it's not perfect for you. The perfect language for Guido van Rossum is probably Python. -Matz.

Has anybody, then, made systems which might some day convert any language into any other language in a clean fashion, so that I can write in Alice ML and you can modify it in Java? Personally, I think it is obvious that even if such transmogrification were available, it wouldn't always help a heck of a lot, because the density of any given region of code can change by something like 100x. Not to mention that, I guess, any Turing-esque equivalency doesn't take into consideration the differences in runtime.

Another take on this: Why aren't there programming language generators / wizards which ask me a series of 20 questions ("do you prefer static or dynamic typing?" - I'd like to be able to answer 'both', of course) and then spit out a framework language (including debugger!) for me? (And under the covers everything gets converted to/from XML so we can individually put the curly braces - if any - wherever we prefer.)

readability counts

Programs that work as you describe can be written. I've seen a few, but they all universally produce code that is not human-readable. The computer can read it just fine, and in many cases the new program runs faster than the original (likely if the original is interpreted and the translation targets something compiled). But the results are not readable enough for humans to modify easily.

If you are a good programmer, learning a new programming language is a trivial task that will only take a few days. Dealing with the results of translated code (even in a language of which you are a 10-year master) will be a long exercise in frustration. Given how easy it is to learn a new language, our efforts are currently better spent learning the new language.

I think most people reading Lambda the Ultimate will agree that learning a language for the sake of learning a language is useful. (But of course Lambda the Ultimate is about new/different programming languages, so we are biased.)

Re: learning a new language

I'd agree that learning a new language is great. The thoughts I had in mind were less about coddling people so they didn't have to learn new languages, and more about how we need to acknowledge subjectivity and personal preference, as Matz is saying. So I wouldn't want the translator to keep people limited, I'd personally want it so I can program in my favourite language and still be able to interact with others.

Also, I think that each programming language has its own gestalt, and while you can perhaps write a library function to do whatever you need it to do, that is possibly like putting pearls on swine. The point is that it would be nice to be able to use any language based on what seems the most appropriate for the given task. I guess there are mix-up language environments where you can switch from one syntax and language to another (e.g., JSP) to get the benefits of each at the appropriate time?

Re: JSP

Commenting on my own comment, for the record, I freaking hate JSP if only because there is no really good Emacs mode for it. (That's not the only reason, it suffers in many many other ways, but it is the first one that I hit on a daily basis.)

Intentional Programming

There was Microsoft's Intentional Programming, which did something along the lines of what you want, though less ambitious. SCID is a similar thing that was thought up earlier.

Re: SCID

SCID looks neat, thanks for the link! (Although, at least at first blush it reads like a hyper IDE vs. something which can transmogrify between horribly different languages.)

Since this thread was bumped

Since this thread was bumped recently, it might be worth noting that a tool like Phoenix could be used to transform between VB.NET and C#.

Translating between those two languages is actually very easy. They're so similar that in the usual case your code can be converted to the other seamlessly via .NET Reflector. Phoenix is just a generalization of this that in theory should allow you to customize more than just the language mapping, for example programmer syntax preferences such as bracing styles. I would have to dig up the Phoenix design docs, but I think such things were among the original use cases planned for Phoenix.

Not automated but...

It can't really be automated at this point, but there are efforts under way toward more language interoperability. .Net is one example, and a lot of people are now reusing the JVM in order to be "Java compatible".

However, for both of these you need to target either a high-level language (by generating Java or C# sources) or the VM bytecode directly (JVM or CLR). That's why I wrote NekoVM, which defines an easy-to-target intermediate language and provides the VM and the libraries to run it.

It should greatly ease the language compiler implementation by providing a common runtime. And if you write a Neko -> Your language converter, you will be able to reverse the process, and translate other languages targeting Neko back into yours.

Re: NekoVM

Neko has been on my list of interesting languages for a while. Many thanks for your work there and for the other code examples Motion-Twin has.

If you can translate, then it's probably the same language.

I think you can only do such translations when the languages are roughly isomorphic (i.e. fundamentally the same). There are surface-level features (like indentation syntax) that can be mixed and matched, but type systems seem like a much more complex beast.

For example "x.Run()" in Java is subtly different from "x.Run()" in Python. A strict semantics-preserving translation would produce weird code.
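A minimal Python sketch of one such difference (the `Job` class and every name in it are made up for illustration, and the method is lower-cased per Python convention). Python resolves `x.run` by attribute lookup at call time, so a single instance can be handed its own `run`. Java dispatches on the object's class, and an individual object cannot rebind a method this way. A strictly semantics-preserving translation into Java would presumably have to emit some kind of per-object lookup for every call.

```python
class Job:
    """Hypothetical class, used only to illustrate attribute lookup."""

    def run(self):
        return "class method"


x = Job()
before = x.run()  # found on the class

# Store a new callable in this one instance's attribute dictionary.
# The next lookup of `x.run` finds it before it reaches the class.
x.run = lambda: "instance override"
after = x.run()

# Other instances of the same class are unaffected.
y = Job()
unchanged = y.run()

print(before, "|", after, "|", unchanged)
```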

It might be possible to create a language X that can losslessly express all the semantics of a set of target languages Y. Then you can write code in X and translate it to any language in Y. But X is going to be a very tedious language to program in :)

Re: Semantics

It is an interesting point; as with any form of communication (natural language, or even logical proofs) an awful lot is left unspecified at the explicit textual level - people have to learn and know what the background semantics are.

It shows up with .Net in that, while the languages can interact and can be quite different (F#, C#, VB, managed C++), at some level they are forced into the exact same straitjacket; hence the fact that VB .Net is very unlike pre-.Net VB.

I have once asked the same question...

Why aren't there programming language generators

I once asked the same question in the GoldParser forums. GoldParser is an excellent program for creating parse tables, but parsing is, in my opinion, 20% of a compiler. What I proposed to its author was to make GoldParser a 'programming language generator' which allowed the user to create, mix and match various compiler front ends and back ends. But that undertaking is difficult and time-consuming and requires tremendous effort, which is not feasible for a one-man team.

Far less than 20% in most cases

Even for a language with trivial semantics, such things as semantic analysis, code generation, optimization/code improvement, and/or interpretation/execution are far more difficult and interesting problems than the construction of a parser. While a parser may, in some degenerate case, comprise 20% of the code (here I mean the output of yacc or some such), parsing shouldn't require anywhere near 20% of the effort. Perhaps in a language with trivial semantics and complex syntax--but I'm not sure that we need languages of that sort. :)

(Ed: I keep typing Wiki markup for ''italics'' and '''bold''' into LtU... grrr. :)

Language representation and transformations

Something close to the above was the Phobos project I worked on at Caltech. In a multi-frontend compiler with various intermediate representations, Phobos allowed one to create new "dynamic" frontends by letting the language writer define a new representation into which the new custom syntax could be converted. It then allowed this representation to be transformed (via rewrite rules) into any other representation found in the compiler (various ASTs for the front-end languages, intermediate representations, the common functional representation, etc.), at which point code generation could proceed using the infrastructure in the compiler.
One example was a UNITY compiler, which was mapped into C; the corresponding definition was about 1000 lines (including the syntax definition). At that point one could create another language based on UNITY in much less time, and it could be compiled to executable code (via the UNITY->C->ASM->CODE path).
There have been a couple of improvements since then to the term language used for generic representation (terms can be typed, for instance), but the Mojave Compiler project eventually shifted to implementing compilers in the underlying formal environment (MetaPRL) instead of using it purely as a transformation mechanism.
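For readers who have not seen transformation by rewrite rules, here is a small generic sketch in Python. It is not Phobos's rule language or term format, which are not shown in this thread. The tuple encoding of terms, the `unless` and `incr` constructs, and all function names are assumptions made up for the illustration. Each rule maps one source construct onto constructs of a target representation, and `rewrite` applies the rules bottom-up until none matches.

```python
def unless_to_if(term):
    # ("unless", cond, body)  ->  ("if", ("not", cond), body)
    if isinstance(term, tuple) and term[0] == "unless":
        _, cond, body = term
        return ("if", ("not", cond), body)
    return None


def incr_to_assign(term):
    # ("incr", name)  ->  ("assign", name, ("add", ("var", name), ("lit", 1)))
    if isinstance(term, tuple) and term[0] == "incr":
        _, name = term
        return ("assign", name, ("add", ("var", name), ("lit", 1)))
    return None


def rewrite(term, rules):
    """Rewrite children first, then try each rule at this node."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(rewrite(t, rules) for t in term[1:])
    for rule in rules:
        result = rule(term)
        if result is not None:
            # A rule fired; the result may contain more rewritable terms.
            return rewrite(result, rules)
    return term


# A term in the hypothetical source representation.
source = ("unless", ("var", "done"), ("incr", "count"))
target = rewrite(source, [unless_to_if, incr_to_assign])
print(target)
```

This sketch assumes the rule set terminates; a real system would need some way to guarantee or check that.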

Language interoperability is Hard

Has anybody, then, made systems which might some day convert any language into any other language in a clean fashion, so that I can write in Alice ML and you can modify it in Java?

No, and my personal belief is that what you suggest is not possible. I submit as evidence the fact that nobody has come up with a silver bullet for language interoperability, an easier problem that I also believe is fundamentally hard.

Some would point to .NET, and some truly good work has been done there. However, .NET's accomplishments in language interoperability are overstated. They did not take 10 languages and make them all work together nicely; rather, they defined 10 new languages that share a common data model.

The interaction between some of the more interesting language features available today is fundamentally complicated. If a language supports lazy evaluation or dataflow concurrency, then calling this language from a strict language that supports mutation and has no concept of a partial data structure is quite difficult in the general case. Certainly, the calling code will be less-than-natural in the host language.
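A small Python sketch of one way laziness and mutation interact badly (the variable names are made up, and a Python generator is only a loose stand-in for a lazily evaluated value in another language). The strict version is computed before the mutation and the lazy version is forced after it, so two expressions that look identical yield different results.

```python
data = [1, 2, 3]

# Strict: the whole result is computed right now.
strict = [x * 10 for x in data]

# Lazy: nothing is computed until the generator is consumed.
lazy = (x * 10 for x in data)

# A strict, mutating caller changes the underlying data in between.
data.append(4)

# Forcing the lazy value now observes the mutation.
forced = list(lazy)

print(strict)
print(forced)
```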

Programming language design is a hard, highly non-linear problem. A language is much more (and sometimes less) than the sum of its parts. The first book I ever read on programming language implementation talked about call-by-value, call-by-reference, and call-by-name. The book's take on call-by-name was that it was tried in the 1970s, but was dismissed as something that was hard to understand, nearly impossible to control, and not particularly useful.

Indeed, this take is correct... if you examine programs that depend on a particular evaluation order. However, in a purely functional program, the difference between call-by-value and call-by-reference disappears, and call-by-name becomes a well-behaved and rather useful construct.
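A rough Python sketch of the point, simulating call-by-name with zero-argument functions (thunks). The helper names `by_value`, `by_name`, `twice`, and `make_counter` are made up for the illustration. With a pure, terminating argument both conventions agree; once the argument has a side effect, the result depends on how many times it is evaluated.

```python
def by_value(f, arg_thunk):
    # Evaluate the argument exactly once, before the call.
    value = arg_thunk()
    return f(lambda: value)


def by_name(f, arg_thunk):
    # Pass the unevaluated expression; it is re-evaluated at every use.
    return f(arg_thunk)


def twice(x):
    # Uses its parameter two times.
    return x() + x()


def make_counter():
    # Returns an impure "expression": each evaluation bumps a counter.
    state = {"n": 0}

    def bump():
        state["n"] += 1
        return state["n"]

    return bump


# Pure argument: evaluation order and count do not matter.
pure_value = by_value(twice, lambda: 2 + 3)
pure_name = by_name(twice, lambda: 2 + 3)

# Impure argument: the two conventions give different answers.
impure_value = by_value(twice, make_counter())  # evaluated once: 1 + 1
impure_name = by_name(twice, make_counter())    # evaluated twice: 1 + 2

print(pure_value, pure_name, impure_value, impure_name)
```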

So, for the sake of discussion, let's talk about a specific tool that allows you to edit a program in either Alice ML or Java. A simple edit in one language can turn into a complicated edit in the other, and Rice's theorem puts some draconian limitations on how well this can be ameliorated by a simplifier. Moreover, the Compiler Writer's Full Employment Theorem says that although there may be a limit to how good you can make the simplifier, you can always do better. My guess is that no matter how good you make the tool, by alternating edits in Java, then Alice, then Java..., you will be able to make your program arbitrarily obscure.

Speaking of Alice ML

I broached the subject of Oz / Alice interoperability not long ago. This was Andreas' response:

I'm afraid this won't ever work, at least not in the Oz->Alice direction. There is no way the Oz side could establish the typing guarantees assumed by the Alice side, unless Oz becomes typed itself (which we consider impractical - that's why we did Alice in the first place). Note in particular that take expects a package in Alice, which has to contain ML signatures and types.

This idea of having multi-language VM's with seamless interoperability is a frequent desire, and has been an almost as frequent promise in the past. However, our experience with (prototypically) implementing Alice on Mozart, the JVM, and the Dot-Net CLR, and Alice and Java on SEAM, as well as observing similar projects (e.g. MLj and SML.NET), has raised more than serious doubts that it will ever be achievable, at least as far as "seamlessness" is concerned. There just is too much impedance mismatch between languages. You have to fall back to the greatest common denominator for interoperation, which usually is quite small. Even interoperation between Alice and Oz on Mozart was far from seamless, as soon as you wanted to exchange non-trivial things.

I'm working on a related project

I'm working on a development methodology that uses grammars and allows you to define transformation objects to perform grammar conversion.
There are a lot of things left to do, but you may find it interesting.

ColonyDSL