How to respect language styles while translating?

There's a fair number of projects that take one highish-level language in and spit out another (often JavaScript seems to be the target, gosh I wonder why). But these translations often produce output that is hard for humans to read and that doesn't respect the destination language's culture and commonly used style. That makes debugging harder, I think. So I wonder if there are any good approaches to bridging this gap? Ways of taking in e.g. Scheme and spitting out e.g. JavaScript that doesn't look horrible or confusing. (Presumably it depends on how far apart the two languages are in the first place.) Thanks for any thoughts.


That sounds to me kind of like "are there any good approaches to translating from idiomatic English to idiomatic German?"

Besides the obvious machine-translation obstacles, one potential issue that occurs to me is translating to or from a "multiparadigm" language, even if that "multiparadigm-ness" is relatively mild.

In C#, for example, it's possible to write a nice OO program that drops into functional idioms in places. When translating this to a stricter OO language, the functional idioms can be expressed as OO constructs (which is what the C# compiler does when generating IL code); this can look "unnatural" in the target language, but what can you do?

Arbitrary translation vs. poetic restrictions

What you say rings true, apologies for asking a kind of silly question. :)

One can start with 2 arbitrary languages, or one could look at the destination language and think about what could map well to it and try to not stray too far outside that envelope. Take tail call optimization for example, and how various languages-on-the-JVM approach it. Of course that can be very sad: it means some of the reasons for using a different source language in the first place are lost; sort of like now isn't really VB any more?
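To make the tail call example concrete: one workaround seen on targets without tail call elimination is a trampoline, where a function returns a description of its tail call instead of making it, and a small driver loop performs the calls. The sketch below is in Python purely for illustration; `TailCall`, `trampoline`, and `count_down` are names made up here, not taken from any particular JVM language.

```python
class TailCall:
    """Describes a call the callee would like made in tail position."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Run fn, bouncing on TailCall results so the stack never grows."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def count_down(n, acc=0):
    """A 'tail-recursive' loop written against the trampoline."""
    if n == 0:
        return acc
    # Instead of calling count_down directly, hand the call to the driver.
    return TailCall(count_down, n - 1, acc + 1)
```

Calling `trampoline(count_down, 100000)` runs in constant stack depth. The cost is exactly the style problem discussed here: the generated code no longer looks like ordinary recursion in the target language.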

Basically I'm always wondering what makes for a good high level language that can be ported to lots of other language systems w/out generating really nasty back-end output in those other languages? How can it be done w/out just being blub? Or how tainted (e.g. Scheme but w/out full TCO and w/out call-cc etc.) is one willing to get?

P.S. alternatively: forget about worrying about the output; treat it like machine code and create ways to have the debugging happen in the context of the source language: how to have an interactive source-line debugger?

interactive source-line debugger = decompiler + debugger?

.NET Reflector + Deblector is something like that. It's entertaining when you choose the "wrong" decompiler module for the assembly you're looking at, though.

Translating vs. Transforming

I am working on something of that sort, although I'm restricting the "fullness" of the languages. Basically I'm trying to create a backend (bytecode?) language that frontend languages get translated to. The difference is that the backend should translate back into the frontends almost transparently (i.e. prettifying them). So I won't be going with full-blown versions of either frontend language, but something I call flavors of each, hopefully easing entry for newcomers based on what language they come from.
The tool for translation was announced here earlier on LtU.


Asides often do invite off-topic responses, so...

kind of like "are there any good approaches to translating from idiomatic English to idiomatic German?"

Well, yes. Hire two people, a translator and a copy-editor, both good at their job.

There's probably a snappy analogy to the question at hand, but it doesn't pop into my head.


dependent types would be nice

I've come across this problem from a few different angles, and the solution that I keep coming back to (though it is not well supported by languages) is dependent types. I am not sure if this is helpful in your context.

The notation below is an arbitrary one for dependent types that is hopefully intuitive; in practice I have always implemented these mechanisms manually. The back-end of my projects tends to look something like this:

String export(IfStatement node)
  return 'if(' + export(node.condition) + 
         '){' + export(node.ifBlock) +
         '}else{' + export(node.elseBlock) +
         '}' ;

String export(IfStatement node where node.elseBlock==null)
  return 'if(' + export(node.condition) +
         '){' + export(node.ifBlock) +
         '}' ;

Assuming multi-method-like dispatch over these dependent types, the second function definition will match the special-case if-statement (the one with no else block), and everything else falls back to the more general form. This can be helpful for describing language styles, which generally take the form of special-case exceptions to the more general syntactic form.

This may not be the best example, but I think it highlights how dependent types could be useful in these situations. However this approach can have major overhead depending on your implementation language, particularly if you are coming from a statically typed language; here I assumed that the argument type is known to be an IfStatement, however in reality the argument would often only be known to be of the more generic AST primitive type which would have many possible subtypes.
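For what it's worth, here is a minimal sketch of how the manual version of this dispatch can look, written in Python. It is not dependent typing, just an ordered list of predicate/emitter pairs where special cases are registered before the general form. The names `Raw`, `rule`, and the snake_case field names are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Raw:
    """Leaf node that already holds target-language text."""
    text: str


@dataclass
class IfStatement:
    condition: object
    if_block: object
    else_block: Optional[object] = None


# (predicate, emitter) pairs, tried in registration order.
_rules = []


def rule(predicate):
    """Register an emitter that applies when predicate(node) is true."""
    def register(fn):
        _rules.append((predicate, fn))
        return fn
    return register


def export(node):
    for predicate, emit in _rules:
        if predicate(node):
            return emit(node)
    raise TypeError("no export rule for " + repr(node))


@rule(lambda n: isinstance(n, Raw))
def _export_raw(n):
    return n.text


# Special case first: an if-statement without an else block.
@rule(lambda n: isinstance(n, IfStatement) and n.else_block is None)
def _export_if_without_else(n):
    return "if(" + export(n.condition) + "){" + export(n.if_block) + "}"


# General form, reached only when the special case did not match.
@rule(lambda n: isinstance(n, IfStatement))
def _export_if(n):
    return ("if(" + export(n.condition) + "){" + export(n.if_block)
            + "}else{" + export(n.else_block) + "}")
```

Because the predicates are ordinary functions, the argument only needs to be known as "some AST node" at the call site, at the price of a linear scan over the rules and no static check that the rules are exhaustive.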

Schlep, JMacro

So I wonder if there are any good approaches to bridging this gap? Ways of taking in e.g. Scheme and spitting out e.g. JavaScript that doesn't look horrible or confusing.

One example of this kind of thing is Schlep, a Scheme-subset to C compiler by Aubrey Jaffer. Scheme for Software Engineering describes the rationale for it.

Presumably it depends on how far apart the two languages are in the first place.

Yes, which is why Schlep compiles a Scheme subset, instead of full Scheme.

Using Javascript as a target for a Schlep-like tool should provide some extra flexibility, since Javascript supports closures. But one problem with translating a language like Scheme is you'd have to compile away many of the macros, and that tends to leave something that looks much more low-level.

For another approach that also illustrates this point, there's a cool Haskell library JMacro, by Gershom Bazerman, which allows you to embed Javascript code directly in a Haskell program (here are some docs). It then generates actual Javascript from that. However, even though the original source code actually looks like Javascript, since JMacro has macro system features that support composing Javascript code (and inserting results of Haskell computations), the variables in the output code are renamed to avoid conflicts, which can complicate direct debugging of the Javascript code.

In short, translating between languages with different levels and dimensions of expressive power must involve tradeoffs somewhere.


those are very interesting, thanks for the pointers!

There are...

So I wonder if there are any good approaches to bridging this gap?

There are, but they usually aren't worth actually doing. Basically, when translating code mechanically, you figure out how to translate each phrase in the source language to a phrase in the target language, and put the results together compositionally. This is easy, but results in the kind of unnatural code you're talking about.

However, the kind of code people actually write takes advantage of source context to avoid redundant computations. So what you need to do is to write a translator that tracks not just the phrase you are translating, but also the program context into which it will be inserted, and analyze both the phrase and the context to determine what code to generate.

A simple and pretty example of this idea is in Damian and Danvy's paper Static Transition Compression, which uses it to translate a while-loop language into unstructured programs with gotos, without generating jumps to jumps, redundant labels, or unused labels.
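As a much smaller illustration of passing the context along (this is not the algorithm from the paper), here is a Python sketch that translates a tiny Scheme-like expression tree to JavaScript text. The tuple encoding of the tree and the two context names, `"expr"` and `"return"`, are assumptions invented for this example.

```python
def to_js(expr, context="expr"):
    """Translate a tiny Scheme-like tree to JavaScript source text.

    context "expr":   the value is needed inline, inside a larger expression.
    context "return": the value is the result of the enclosing function.
    """
    if isinstance(expr, str):
        # Variables and literals are passed through unchanged.
        code = expr
    elif expr[0] == "if":
        _, cond, then, other = expr
        if context == "return":
            # In return position, emit the statement form a person would
            # write and push the "return" context down into both branches.
            return ("if (" + to_js(cond) + ") { " + to_js(then, "return")
                    + " } else { " + to_js(other, "return") + " }")
        # Inline, fall back to the conditional operator.
        code = "(" + to_js(cond) + " ? " + to_js(then) + " : " + to_js(other) + ")"
    elif expr[0] == "call":
        _, fn, *args = expr
        code = fn + "(" + ", ".join(to_js(a) for a in args) + ")"
    else:
        raise ValueError("unknown expression: " + repr(expr))
    return "return " + code + ";" if context == "return" else code
```

The same `("if", ...)` node comes out as a conditional expression when it sits inside a call and as an ordinary if/else with returns when it is a function's result. A purely compositional translator would have to pick one form for both places.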


(personally, i only experimented with it a while back, not recently.)

GWT apparently lets you debug the JavaScript by viewing the source Java that compiled down to it. Also, the compiler can spit out different versions of the code, some aimed at being easier to manually inspect.

It's hard to do well

"Translating" one high level language to another is very hard. One approach is to compile the source language into the target language, expanding all the idioms of the source language into lower level constructs when necessary. This results in programs that (usually) do the same computation as the original, but the code generated at each language mismatch is bulky.

Another alternative is to transliterate the constructs of one language into the roughly comparable constructs of another language. This preserves style at the expense of correctness.

Then there are annotation-based approaches, like GWT, where the human planning the translator has to provide hints of "what's really going on here".

Doing the job "right" requires enough static analysis to ensure that quirks of the source language that aren't reflected in the target language are absent in the code being translated. This is usually not worth the massive amount of development required to build such a tool.