Transpiling a dynamically typed language to a statically typed language

Hi, I have been pondering what I want to do for my master
thesis (2+ years away) and I have an idea. I am writing to ask if
the question has a known answer or is even theoretically possible
to conduct.

My idea is to write a source-to-source compiler from a Lisp_2 to
Rust, but I do not know if it is possible to transpile from a
dynamic language to a statically typed language.

Below is a rough draft for the introductory chapter. Beware that
my writing style implies that I already know that producing such
a transpiler is possible, but I do not.

Kind regards,
Filip Allberg

A Tale of Two Languages

Imbuing Lisp with the thread-safety guarantees of Rust by using a source-to-source compiler.

Introduction

This short introductory chapter starts by presenting the question we are trying to answer, followed by a summary of our suggested solution that provides a definitive answer to the our question. We end this chapter ends with an overview of the subsequent chapters.

The question

By compiling a language X into a target language Y through the use of a source-to-source compiler (also known as a transpiler) the executable program (created by an additional compilation step) inherits certain qualities from the language Y, particularily its execution model.

Transpiling X to a target language Y may put restraints on the definition of X and possibly have a negative impact on the runtime environment of the final executable certain desireable traits may be inherited from Y, for an example cross-platform portability.

A commonly selected target language for a transpiler is the programming language "C". Since C is available on practically any machine, the C-code that is generated from a program written in X is likely to be portable as well.

Moreover, any optimizations that the C-compiler can achieve are automatically and implicitly available to the source language X. C-compilers excel at carrying out a great many optimizations with respect to allocating registers, laying out code, and choosing modes of address. As designers of the source language we are liberated of such issues.

In the case of C there are numerous pain-points such as pointers running amok and its lack of built-in garbage collection. If the source-language is a garbage-collected language, that is type-safe, such problems have to be addressed in the source-to-source compiler.

Furthermore there are certain limits imposed on our source language by the limitations of C has, such as a maximum of 32 arguments in function calls or less than 16 levels of lexical blocks.

Inherently, each source language presents its own set of advantages and negative consequences with respect to the source language.

The Rust programming language, which is a systems programming language that is a contender in the same domain as C and C++, has the desireable attribute of guaranteeing thread-safety and threads without dataraces.

The question then is, can a specific source language compiled into Rust be unequivocally determined to be thread-safe through the merits of the source-to-source compiler alone?

The solution and answer

In this text we present a source-to-source compiler from a Lisp2 dialect to Rust whereby we can, through the use of the Rust compiler rustc, compile the generated code. Using the generated programs, we can conclude, with the addition of formal theoretical deductions, that valid programs written in the aforementioned Lisp dialect are thread-safe and provides threads without data-races.

Overview

Since there is no additional body of work this subsection cannot be fully written, but a rough layout of the remainder of the text would probably look like:

Chapter 1. An overview of Lisp and a formal description of our dialect as well as an overview of our concurrency model (not sure if the latter is necessary)

Chapter 2. A precursory introduction to Rust with references to papers demonstrating the validity of the claims that Rust provides thread safety guarantees and data race guarantees as this is a corner stone to our work.

Chapter 3. Examples from concrete programs written in the Lisp dialect that will serve as benchmarks when evaluating the code generated by our compiler.

Chapter 4. An overview of the implementation of the compiler that also discuss alternative implementation choices and why they were not chosen

Chapter 5-(maybe several chapters): Explanations describing the parts of the different compilation phases. It is probable that we will encounter some automatas here.

Chapter 6: Code generation optimizations and translation strategies to portable Rust code.

Chapter 7: Evaluation of benchmarks

Chapter 8: Discussions of possible applications, such as tiering source-to-source compilers in a layered fashion to successively imbue desireable traits from each language in the compilation sequence to the top-most source language.

Appendices: Complete source code for the compiler

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Queinnec's Lisp In Small Pieces

I strongly recommend reading Queinnec's Lisp In Small Pieces book. See https://pages.lip6.fr/Christian.Queinnec/WWW/LiSP.html

It is quite similar to what you want to achieve, handling both interpreters and compilers of Lisp-like languages. It explains compilers to bytecode, and compilers to C.

I guess you would like to define the multi-threaded aspects of your Lisp. Look also at the garbage collection aspects (read the GC handbook) and study if and how you can handle tail calls.

Don't forget to study the semantics side. Read notably the formal semantics in R5RS

Perhaps my GCC MELT Lisp-y dialect might be inspirational. Read my DSL2011 paper on MELT, a Translated Domain Specific Language Embedded in the GCC Compiler to find some tricks to embed chunks of low-level languages (Rust in your case) in high-level code (Lisp).

Look also into Chicken Scheme, Bigloo, Scheme48, Stalin as examples of Scheme to C compilers.

Depends

it Depends on Rust capabilities. If Rust supports pointers and particular kind of type casting, then it's possible. Also, depending on Rust capabilites, There might be other options too.

My advice is firstly, without worrying about transpiler, to try to define a variable in Rust that in different execution times can hold different types of data. Opposite to pointers and casting, there is an idea to have a variable as a struct of different possible values, each with its own type. In the runtime you can use only a variable with a type you want. But you have to check this method with a Rust manual. Another option is to have N different arrays, each for its own variable type. You dynamically pick an array at runtime to change a type of a variable.

As a last option, if evrything else fails, you can find another transpiling target. I don't know if this helps, but my estimation is that C (statically typed language) could hold transpiled javascript (dynamically typed language) if you are not worried about performance.