LtU Forum

Extended Axiomatic Language

Axiomatic language is a formal system for specifying recursively enumerable sets of hierarchical symbolic expressions. But axiomatic language does not have negation. Extended axiomatic language is based on the idea that when one specifies a recursively enumerable set, one is simultaneously specifying the complement of that set (which may not be recursively enumerable). This complement set can be useful for specification. Extended axiomatic language makes use of this complement set and can be considered a form of logic programming negation. The web page defines the language and gives examples.

Seeking artricle on syntax sugar and comparing programming languages

I recall there was an article on LtU that defined term syntax sugar. IIRC the generic idea was that syntax sugar requires only the local transformation. And language features that require non-local program rewrites cannot be called syntax sugar and it can be core of argument that one language is higher-level then another.

Optimisation by repeated beta- and eta-reduction

The following post recently showed up on Planet Haskell: Morte: an intermediate language for super-optimising functional programs. From the post:

Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected.

Here I will introduce such an intermediate language named Morte that obeys this stronger guarantee.

The typed lambda calculus possesses a useful property: every term in the lambda calculus has a unique normal form if you beta-reduce everything. If you're new to lambda calculus, normalizing an expression equates to indiscriminately inlining every function call.

I am worried about this because the author explicitly wishes to support both folds and unfolds, and, according to Catamorphisms and anamorphisms = general or primitive recursion, folds and unfolds together have the expressive power of general recursion—so that not every term has a normal form (right?). More generally, it seems to me that being able to offer the strong guarantee that the author offers implies in particular a solution to the halting problem, hence a non-Turing-complete language.

Later, the author says:

You can take any recursive data type and mechanically transform the type into a fold and transform functions on the type into functions on folds.
I have a similar concern here; it seems to me to be saying that folds can express general recursion, but I thought (though I don't have a reference) that they could express only primitive recursion.

Have I got something badly conceptually wrong?

Re-thinking Prolog

A recent paper by Oleg Kiselyov and Yukiyoshi Kameyama at the university of Tsukuba discusses weaknesses and areas for improvement to Prolog.

Quite many computations and models are mostly deterministic. Implementing them in Prolog with any acceptable performance requires the extensive use of problematic features such as cut. Purity is also compromised when interfacing with mainstream language libraries, which are deterministic and cannot run backwards. Divergence is the constant threat, forcing the Prolog programmers to forsake the declarative specification and program directly against the search strategy. All in all, Classical Prolog is the exquisite square peg in the world with mostly round holes

The strong points of Prolog can be brought into an ordinary functional programming language. Using OCaml as a representative, we implement lazy guessing as a library, with which we reproduce classical Prolog examples. Furthermore, we demonstrate parser combinators that use committed choice (maximal munch) and can still be run forwards and backwards. They cannot be written in Classical Prolog. Logic variables, unification, and its WAM compilation strategy naturally emerge as a "mere optimization" of the Herbrand universe enumeration.

The paper mentions the strength of the approach used by miniKanren (which embeds logic programming with fairer search strategy than normal Prolog into Scheme) and Hansei (which embeds probability based nondeterminism into Ocaml using delimited continuations to allow direct-style expression of monadic code).

After motivating some choices by studying the prototypical example of running append backwards they cover running parsers with "maximal munch" rule backwards - something that cannot be (declaratively) expressed in prolog.

A very interesting paper on logic programming! It also thanks Tom Schrijvers of CHR fame at the end.

Request For Advice and Guidance On Writing a Scheme To C Compiler?

Dear Lambda The Ultimate,

I've been working very hard recently on writing something I thought would be entirely straightforward: A scheme to C compiler that bootstraps - nothing I do seems to work. On my first attempt (which was able to compile and run its own parser) there was no GC or tail calls so it demanded >8GB of memory and failed, on my next the emitted code blows up so huge that it grinds to a halt. I hope someone could give me some advice on what I should do or what I need to learn to complete this? It would also be interesting to discuss and read about other members experiences.

Starting

I've been interested in programming languages since I started programming a long time ago, and have explored and written parsers, type checkers and interpreters for a fair selection of different types of programming languages like lisp (of course!), prolog (and minikanren), a cut down version of haskell with hindly milnor type checking as well as some unusual things like linear and reversible programming languages.

A long time ago I read Marc Feeley - 90 Minute Scheme to C compiler when it came up on this site and it was fascinating to understand (or so I thought) how a scheme compiler could be done, in particular using the CPS transform to deal with tail calls and call-with-current-continuation. This got me interested in continuations and I studied Appel - Compiling with Continuations but I didn't write a compiler back then.

Continuations

Since then I've read about Abstracting and Representing Control from Olivier Danvy and written metacircular interpreters that emit continuation semantics (pure lambda terms without any control operators). I also came to prefer delimited continuations rather than call-with-current-continuation and decided that I would like to implement those directly in my own compiler. I noticed that in Marc Feeley's compiler he was able to just use the continuation semantics of call-with-current-continuation as is because it's already in CPS form, on the other hand the continuation semantics for shift and reset are not in CPS form so this technique cannot be used: one would have to CPS transform twice to get the necessary metacontinuations - but this also causes a huge blowup in the number of lambda terms. So I studied other approaches to direct implementation of shift/reset:

In the end I felt like building a runtime for my compiler to target (in C) that handled delimited continuations was something I could add later on but for now I should forget about that (and if it was require in bootstrapping I had an easy way to interpret them), so I just used Matt Might's very clear and simple explanation of hybrid higher order CPS transform How to Compile with Continuations.

Getting Started

When the got the idea to write a self hosting scheme compiler I quickly wrote out a parser that's able to parse as much of scheme as I needed as well as the file itself. Then I initially came up with something very simple that just performed closure conversion: It was a surprise to me that this was basically enough to execute lambda using C procedures for code and vectors for their environment - it just lacked garbage collection and tail call elimination. I pushed on with this hoping lack of optimisations wouldn't be an issue ending up with a system that had one compiler pass per file as well, some utility functions and a script that chained them all together. This quite successfully compiles all the small simple scheme programs I gave to it (getting a program with the Y combinator to compile and run for the first time was very exciting!) but it just isn't good enough to bootstrap.

So I looked back a bit and read over Guy Steele's RABBIT paper and decided to start again from scratch. This time I started with a C runtime that had a simple two space copy and collect GC which uses the stack as the root - the stack itself is executed by a trampoline (a while loop that pops a continuation off the stack and calls it) and the continuations pop arguments off the stack, compute something with them then push a continuation onto the stack and return. I wrote some tests for the runtime: for example compiling recursive and iterative factorial functions by hand. The value of this approach is that you get tail call optimisation "for free" it only requires doing a CPS transform in the compiler.

So the overall architecture of the compiler is now:

  • Parse a scheme file to get the s-expression tree
  • "desugaring" - I have looked into syntactic closures and I think that I would like to build a good macro system out of this in future but to start with I decided to just hard code expansions for all the special forms.
  • mutability analysis - since I use what I found out later is called "flat-closures" I had to box variables that SET! is used on
  • continuation passing style transform - this makes every call a tail call
  • lambda lifting - I added this optimisation at the end but it didn't help at all [it looks for in place lambda applications like ((lambda (var ..) body ...) param ...) and adds its free variables as extra parameters - this means that it can stack allocate them instead of heap allocate then]
  • closure conversion - this removes all free-variables from the code by creating closures (code & environment vector pairs that holds the closed over variables)
  • hoisting - I give every lambda a name and move it to the top level since the code part of each closure will be implemented as a C function
  • c-gen - this translates continuation passing procedure calls into stack machine operations that my runtime can execute, it also handles taking things off the stack to allocate them on the heap

after that actual C source code is emitted and some simple programs like an infinite lazy stream of fibonacci numbers do execute (with the correct space usage i.e. TCO and GC are working) - but when I started to head towards bootstrapping just adding a recursive definition of the EQUAL? function blow up horrible creating 40k lines of C.

Looking for more to read

I've looked for more modern things to read that might help me build a self hosting scheme compiler like:

and tried to find short readable implementations (studying real world compilers is just too hard since they are so complex and enormous)

  • http://canonical.org/~kragen/sw/urscheme/
  • https://github.com/darius/ichbins/
  • https://bitbucket.org/bunny351/bones/src

but all in all I just don't know what I'm missing - I don't want to give up though because I think there's an important difference between understanding something and actually doing it. In particular I feel like I can't progress in my study of programming languages until I pass this block (I want to experiment with building compilers using Futamura projections like PyPy does after I complete this for one thing).

Higher abstraction through NLP and automatic code derivation?

While waiting for my plane and thinking about a what a VHDL/Verilog killer would look like, I had a very (un!)original idea: describe what is to be done in English, and let the killer do the code derivation.

Here is an example of a description of how I want my blob detection (a computer vision application) be done on incoming images:

blob detection, image processing
- connect neighboring high-intensity pixels
- high-intensity - values exceeding a threshold
- neighboring pixels of an image
- 8 pixels surrounding the pixel in question
- image is a 2d array of pixels, variable width and height
- pixels come one by one
- pixels are 8 bits, but can be larger
- threshold is variable at run time
- use png files as test images

This description is enough for a human to write the code, but not so for a computer, that has no understanding of any of these words.

Now my question is, is there anyone working on this?
If not---why not?

To start working on this, here is what I want to do (in my spare time):
- P. Norvig's PAIP has a chapter about solving simple math questions written in English. That's my starting point to derive some sort of meaning from English language description.
- Get from meaning to code. Probably, I will have to search through a code space to satisfy the meaning derived.

I like to hear your reasons on why it is or is not possible!

New Wiki about Structured Backus-Naur Form?

Hi all :)

I'm thinking about extending Wikipedia with a new page: Structured BNF. Is anyone interested in helping in changing content or lecturing it before I publish it? I would appreciate any comment, even if you think that this is not a thing to place in Wiki.

here is the content

Function arity with currying and call-by-push-value

In λ-calculus, all functions are unary, and n-ary functions are encoded through currying. But knowing the arity of a function can be useful for optimization — a curried function starts computing only after applying it to at least N arguments, and this N can be useful to know.
It seems that call-by-push-value(CBPV) allows dealing with arities in a convenient way, as discussed below, though that was not the main design goal. My questions are:

  1. Is this true?
  2. Is this already known?
  3. Would you recommend call-by-push-value, or are there downsides which make it unsuitable?

Below I explain the details.

Applications of arity:
(1) In GHC, that's useful to deal differently with partial applications — because e.g. a function body only computes after specifying enough arguments (in simple cases, that is barring some optimizations, as many arguments as specified in the left-hand side of the equations defining the function).
(2) In my case, I'm working on a program transformation: transforming a function f gives a function f* that returns f's final result and all its intermediate results. This can be useful for incremental computation (along ideas by Liu and Teitelbaum (1995)). Here, too, it seems silly to return intermediate results of a partial application.

But function arity is a tricky concept in λ-calculus. For instance, id should be unary, Haskell's ($) is binary, but ($) is defined as equal to id:

($) :: (a -> b) -> a -> b
($) = id

So, a notion of arity seems tricky, and one of my colleague keeps repeating it's a bad idea.

Today, while reading a paper using CBPV (Hammer et al. 2014), I got a hunch that the call-by-push-value (CBPV) lambda calculus seems to allow for a natural notion of arity. But Googling does not find this spelled out anywhere, maybe because CBPV seems typically used in theoretical contexts. I'm

So, why should CBPV help?

(1) It distinguishes values (A) and computations (C): Computations are either functions (A -> C) which take values and return computations, or base computations F A which wrap values in a "return" expression:

C ::= A -> C | F A

So, the return type of a function is wrapped by the F constructor. In particular, this distinguishes A1 -> A2 -> F A3 (a binary function) from A1 -> F (U (A2 -> F A3)), a unary function returning another unary function.
U is a type constructor for values representing thunks.

(2) Moreover, we also distinguish partial and full applications: since a full application produces a result of type F A, we need to use the elimination form for F.

Thus, after conversion to CBPV, we can distinguish syntactically between partial and full applications.

Questions:
- is this a good way to define a notion of arity?
- would it be better to just introduce multi-argument functions? I suspect not, because those prevent abstracting over arity, while CBPV you can still write forall A B. (A -> B).
- does the translation have downsides? I imagine that ($) = id transforms to something more complicated in CBPV, which would involve (the equivalent of) a full eta-expansion first.
- The GHC core language does have a notion of arity, but it does not seem as elegant as far as I can tell.
- is there an extension of CBPV to calculi richer than STLC, say to System F? I've heard people saying that focusing and polarity runs into some form of trouble with System F, but I haven't heard why. Interestingly, Levy's own paper dealing with System F (http://www.cs.bham.ac.uk/~pbl/papers/polynorm.pdf) does not use CBPV, but Jump-With-Argument.

Bibliography:

Liu, Y. A., and Teitelbaum, T. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation (New York, NY, USA, 1995), PEPM ’95, ACM, pp. 190–201.
Hammer, M. A., Phang, K. Y., Hicks, M., and Foster, J. S. Adapton: Composable, Demand-driven Incremental Computation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (New York, NY, USA, 2014), PLDI ’14, ACM, pp. 156–166.

Programming by page faulting

I would normally post this to G+, but I'm in a 3 day vacation weekend behind the great firewall, so here goes. Pursuing a CLOS submission on hackernews this morning, I came across an awesome quote in the comments section:

I'm a fan of reading the ****ing manual and thus you're preaching to the choir on that. But I'm not an evangelical baptist about it. Truth is that most people program by page faulting - writing some code, hitting a bug, and googling for a solution, lather rinse repeat.

Many would find this very controversial, but it is increasingly true for younger programmers (and I've gotten into this habit myself). It has serious implications on programming language design: efficient feedback loops are much more important as design occurs more by doing than being done up front, functionality should be discoverable at least by searching. What else?

Database programming language review

I am currently searching for anything that calls itself a "Database Programming Language", and in particular any reviews or helpful comparisons.

The purpose is something of a long term hobby project, non-commercial but also non-academic. I'm most interested in languages that have at least some prospect of finding users with real problems to solve and real data to manage, but I'm also interested in more esoteric ideas for their future prospects.

My list currently includes (no particular order) : Datalog, DBPL/Tycoon, Dataphor/D4, Tutorial D, Business System 12, Rel Project, Kleisli, Xduce, Links, Cduce. I would be interested to add to the list.

It would be helpful to know of any confirmed deaths. In particular, it's impossible to Google for Links. Does it live?

I'm working my way (slowly) through the C2 Wiki -- bit of a mixed blessing -- and I know about the DBPL conference and the DBLP bibliography site (confusing). Other resources would be appreciated too.

XML feed