LtU Forum

V-Parser

V-Parser is a novel chart parsing algorithm I've been developing recently. The first open source implementation is at GitHub. You can test it in an online grammar development environment. This is the pseudocode:
01  DECLARE chart: [][], text: STRING;
02 
03  FUNCTION Parse (grammar, input)
04      text ← input;
05      chart.CLEAR ();
06      MergeItem (0, [grammar.TOP_RULE], 0, null);
07      FOR each new column in chart
08          FOR each new item in column
09              FOR each alternation of item.Sequence[item.Index]
10                  MergeItem (column.Index, alternation.sequence, 0, item);
11  
12      RETURN chart;
13  
14  PROCEDURE MergeItem (offset, sequence, index, parent)
15      item ← chart[offset].FIND (sequence, index);
16      IF not found item THEN
17          item ← {Sequence: sequence, Index: index, Inheritable: [], Inheritors: [], BringOver: []};
18          chart[offset].ADD (item);
19  
20      inheritors ← [item] UNION item.Inheritors;
21      IF item.Index + 1 == item.Sequence.LENGTH THEN
22          inheritable ← iff (parent is null, [], [parent] UNION parent.Inheritable);
23      ELSE
24          inheritable ← [item];
25          IF parent is not null THEN item.BringOver.ADD_IF_NOT_EXIST (parent);
26  
27      FOR each x in inheritable
28          FOR each y in inheritors
29              x.Inheritors.ADD (y);
30              IF (x.Sequence, x.Index) not in y.Inheritable THEN
31                  y.Inheritable.ADD (x);
32                  IF x.Index + 1 < x.Sequence.LENGTH AND y is terminal succeeded in text at offset THEN
33                      FOR each z in x.BringOver
34                          MergeItem (offset + y.LENGTH, x.Sequence, x.Index + 1, z);

For the sake of simplicity, we present the algorithm as operating on a classical context-free grammar (CFG) in which each rule is written in the following form:

X -> A B C ...

where X is a rule name, and A B C ... is a sequence of the rules named A, B, C, and so on. Any of A, B, or C may also be a terminal constant. Alternation is expressed by repeating the same rule name on the left-hand side across multiple rule definitions in the grammar.
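
To make the notation concrete, here is a small made-up grammar (not taken from the implementation) with an alternation on E; terminals are quoted here only for readability:

    S -> E
    E -> T "+" E
    E -> T
    T -> "1"
    T -> "2"

E has two alternations, so during parsing each item whose ahead element is E spawns both of E's sequences.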

V-Parser is a chart parser that groups parsing items into columns corresponding to offsets from the beginning of the input string. Columns are processed incrementally, never looking back into previous columns of the chart. V-Parser stores its items in the chart as pairs of a sequence and an index into that sequence. This way it is always possible to know the ahead element of the current item (we just increment the index property). The main function Parse serves as a loop over columns, items, and their alternations. It repeatedly calls the MergeItem procedure to populate the chart onwards.

The MergeItem procedure creates a new item in the column determined by offset only if the item doesn't already exist there. The Inheritable and Inheritors properties hold pointers to parents and to the items that inherit those pointers, respectively. Parents in the Inheritable property are accumulated over children, meaning that each child ends up with pointers to all of its direct and indirect parents.
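
As a rough data sketch (Python, with field names mirroring the item record on line 17 of the pseudocode; the parsing logic itself is omitted):

    # Minimal sketch of a chart item; Sequence/Index identify the item,
    # the remaining fields are the bookkeeping described above.
    class Item:
        def __init__(self, sequence, index):
            self.sequence = sequence    # right-hand side symbols of a rule
            self.index = index          # position of the ahead element in sequence
            self.inheritable = []       # accumulated direct and indirect parents
            self.inheritors = []        # items that inherit those parent pointers
            self.bring_over = []        # parents remembered for later ahead symbols

    # The chart is one column per input offset; each column is a list of items.
    text = "1+2"                        # example input
    chart = [[] for _ in range(len(text) + 1)]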

Lines 20-25 make sure that Inheritable is set up properly in the case where the item points to a non-last index of its symbol sequence. The BringOver property remembers parent ahead symbols; it comes into play once parsing reaches the last symbol of a sequence.

Lines 27-34 loop over each inheritable and, within it, over each inheritor. If the inheritor is a successfully parsed terminal, and the inheritable is an item with a non-last index, the algorithm populates the ahead item in the corresponding chart column. The loop as a whole makes sure that newly realized ahead symbols are distributed properly over the chart, at positions determined by the relevant previously parsed terminals, including the current one.

The algorithm stops when it runs out of new items in further columns.

Building a compact parse forest (CPF) with V-Parser is trivial, because we just have to assign parsed extents to terminals. Reporting errors should also be relatively easy by analysing the last column in the chart.

Are "jets" a good idea?

I've noticed the beginning of a trend in hobbyist/amateur programming language theory. There have been a number of recent programming languages with extremely minimalistic semantics, to the point of being anemic. These languages typically define function types, product types, the unit type, perhaps natural numbers, and not much else. They are typically capable of universal computation, but executing programs in them directly would be comically slow, so what they do is recognize special strings of code and replace them with implementations written in a faster meta-language. This pairing of a recognized string of code with its efficient replacement is known as a "jet", or an "accelerator".
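
As a rough illustration of the mechanism (a toy sketch in Python; the expression encoding and the jet table are invented for the example and aren't taken from any of the languages below):

    # Reference interpreter for a toy object language:
    #   ('lit', n)     -- a literal number
    #   ('add', a, b)  -- addition defined by deliberately slow counting
    def interpret(expr):
        tag = expr[0]
        if tag == 'lit':
            return expr[1]
        if tag == 'add':
            total = interpret(expr[1])
            for _ in range(interpret(expr[2])):   # "reference" semantics: count up one by one
                total += 1
            return total
        raise ValueError(tag)

    # A jet maps a recognized expression form to a fast host-language implementation.
    JETS = {'add': lambda a, b: a + b}

    def interpret_with_jets(expr):
        tag = expr[0]
        if tag in JETS:
            args = [interpret_with_jets(e) for e in expr[1:]]
            return JETS[tag](*args)               # accelerated replacement
        return interpret(expr)                    # fall back to the reference semantics

    print(interpret_with_jets(('add', ('lit', 2), ('lit', 3))))   # 5

The whole debate below is about how you convince yourself that JETS['add'] and the reference 'add' really agree.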

Some languages that do this:

  • Urbit's Nock, which, other than the jet concept, I honestly think is either a strange art project or snake oil, so I won't discuss this further.
  • Simplicity, a smart contract language for blockchain programming by Blockstream.
  • David Barbour's Awelon, which is the most minimalistic of the three, defining only function types!

My question is this: is this idea of "jets" or "accelerators" a serious approach to programming language implementation? There's an obvious question that comes to mind: how do you know the object language program and the meta-language program you've replaced it with are equivalent? Simplicity claims to have done a number of Coq proofs of equivalence, but I can't imagine that being sustainable given the difficulty of theorem proving and the small number of developers able to write such a proof. Urbit takes a rather cavalier attitude towards jet equivalence by advocating testing and simply treating the jet as the ground truth in the case of a conflict. And I know David posts here so I hope he will respond with Awelon's take on this matter.
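
To make the testing-based stance concrete (a sketch only, reusing the toy interpreter from the sketch above; real systems would property-test over the actual object language):

    import random

    def check_add_jet(trials=1000):
        # Randomized agreement check between the jetted and reference evaluators.
        # Passing is evidence of equivalence, not a proof -- which is exactly
        # the worry about relying on testing alone.
        for _ in range(trials):
            a, b = random.randrange(1000), random.randrange(1000)
            expr = ('add', ('lit', a), ('lit', b))
            assert interpret(expr) == interpret_with_jets(expr), expr

    check_add_jet()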

Here is an excerpt from the Simplicity document arguing in favor of jets:

• Jets provide a formal specification of their behavior. The implementation of a jet must produce output identical to the output that would be produced by the Simplicity expression being replaced. There is no possibility for an ambiguous interpretation of what a jet computes.
• Jets cannot accidentally introduce new behavior or new side effects because they can only replicate the behavior of Simplicity expressions. To add new behavior to Simplicity we must explicitly extend Simplicity (see Section 4).
• Jets are transparent when it comes to reasoning about Simplicity expressions. Jets are logically equal to the code they replace. Therefore, when proving properties of one’s Simplicity code, jets can safely be ignored.

Naturally, we expect jetted expressions to have properties already proven and available; this will aid reasoning about Simplicity programs that make use of jets.

Because jets are transparent, the static analyses of resource costs are not affected by their existence. To encourage the use of jets, we anticipate discounts to be applied to the resource costs of programs that use jets based on the estimated savings of using jets.

When a suitably rich set of jets is available, we expect the bulk of the computation specified by a Simplicity program to be made up of these jets, with only a few combinators used to combine the various jets. This should bring the computational requirements needed for Simplicity programs in line with existing blockchain languages. In light of this, one could consider Simplicity to be a family of languages, where each language is defined by a set of jets that provide computational elements tailored for its particular application.

In the interest of charity, I'll also try to make an argument in favor of this approach, although I remain skeptical: an extremely minimal programming language semantics means a language designer can truly consider their work to be done at some point, with no further room for improvement. A minimalistic Python would have largely avoided the fiasco involved in the upgrade from Python 2 to Python 3, by pushing all of the breaking changes to the library level and allowing for more gradual adoption. Language features added to JavaScript in recent years (like classes, async/await, modules, and so on) would also have been libraries and presentation-layer additions instead of breaking changes at the implementation level.

Help with Herbelin

DISCLAIMER: I'm uneducated with PLT and don't know what I'm talking about, so please forgive any whacked terminology.

Ok, that outta the way...

I'm trying to make my way through The Duality of Computation (https://pdfs.semanticscholar.org/048f/c94be2ec752bb210c5f688cba0200c1a1f92.pdf), and this stuff, like most everything posted on this fascinating website, is way over my head. I can't follow most of the back half of the paper, but I was hoping someone might have the spare time to answer two newbie, school-level, questions...

1) (The silly one.) How on earth do you pronounce the "mu, mu-tilde" calculus? It'd literally be easier to read this if I knew how to read it. Is there some definitive guide for people who just figured out that the funny squiggle in so many of these papers is pronounced "eta"? :)

2) (The basic one.) The authors talk about the call-by-value side of the duality as dealing with "environments with holes." This reminds me of PTS, with its capital-PI lambda expressions that can close over terms or types. What's the difference between an "environment with a hole" and a lambda over environments? And what does that mean for compiled languages, where environments aren't parameters?

Any help on either question would be appreciated.

Project Loom: adding fibers and continuations to Java

Just saw this on Hacker News -- Project Loom: Fibers and Continuations for the Java Virtual Machine with the following overview:

Project Loom's mission is to make it easier to write, debug, profile and maintain concurrent applications meeting today's requirements. Threads, provided by Java from its first day, are a natural and convenient concurrency construct (putting aside the separate question of communication among threads) which is being supplanted by less convenient abstractions because their current implementation as OS kernel threads is insufficient for meeting modern demands, and wasteful in computing resources that are particularly valuable in the cloud. Project Loom will introduce fibers as lightweight, efficient threads managed by the Java Virtual Machine, that let developers use the same simple abstraction but with better performance and lower footprint. We want to make concurrency simple(r) again! A fiber is made of two components — a continuation and a scheduler. As Java already has an excellent scheduler in the form of ForkJoinPool, fibers will be implemented by adding continuations to the JVM.

I'm a fan of fibers, and this has quite a bit of interesting material in it for like-minded folks.
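
To illustrate the "continuation plus scheduler" decomposition quoted above (a conceptual sketch in Python, with generators standing in for continuations; this is not the Loom API):

    import collections

    def worker(name, steps):
        # Each yield suspends the fiber; the generator object is its captured continuation.
        for i in range(steps):
            print(name, i)
            yield

    def run(fibers):
        # A trivial round-robin scheduler that resumes each suspended continuation in turn.
        queue = collections.deque(fibers)
        while queue:
            fiber = queue.popleft()
            try:
                next(fiber)            # resume until the next suspension point
                queue.append(fiber)    # reschedule
            except StopIteration:
                pass                   # the fiber has finished

    run([worker("A", 3), worker("B", 2)])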

Non-determinism: a sublanguage rather than a monad

Non-determinism: a sublanguage rather than a monad

A puzzlingly named, exceedingly technical device introduced to structure the denotational semantics has by now achieved cult status. It has been married to effects -- more than once. It is compulsively looked for in all manner of things, including burritos. At least two ICFP papers brought it up without a rhyme or reason (or understanding), as the authors later admitted. I am talking about monads.

In truth, effects are not married to monads and are approachable directly. The profound insight behind monads is the structuring, the separation of `pure' (context-independent) and effectful computations. The structuring can be done without explicating mathematical monads, and especially without resorting to vernacular monads such as State, etc. This article gives an example: a simple, effectful, domain-specific sublanguage embedded into an expressive `macro' metalanguage. Abstraction facilities of the metalanguage such as higher-order functions and modules help keep the DSL to the bare minimum, often to the first order, easier to reason about and implement.

The key insight predates monads and goes all the way back to the origins of ML, as a scripting language for the Edinburgh LCF theorem prover. What has not been clear is how simple an effectful DSL may be while remaining useful. How convenient it is, especially compared to the monadic encodings. How viable it is to forsake the generality of first-class functions and monads and what benefits it may bring. We report on an experiment set out to explore these questions.

We pick a rather complex effect -- non-determinism -- and use it in OCaml, which at first blush seems unsuitable since it is call-by-value and has no monadic sugar. And yet, we can write non-deterministic programs just as naturally and elegantly as in Haskell or Curry.

The running tutorial example is computing all permutations of a given list of integers. The reader may want to try doing that in their favorite language or plain OCaml. Albeit a simple exercise, the code is often rather messy and not obviously correct. In the functional-logic language Curry, it is strikingly elegant: a mere foldr insert []. It is a re-statement of the specification: a permutation is obtained by moving the elements of the source list one by one into some position in the initially empty list. The code immediately tells us that the number of possible permutations of n elements is n!. From its very conception in Rabin and Scott's 1959 paper, non-determinism was called for to write clear specifications -- and then to make them executable. That is what we shall do.
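
For comparison, here is the same idea in Python rather than the article's OCaml (a sketch only, with generators as a crude stand-in for the non-deterministic sublanguage):

    def insert(x, ys):
        # Non-deterministically insert x at every position of ys.
        for i in range(len(ys) + 1):
            yield ys[:i] + [x] + ys[i:]

    def permutations(xs):
        # foldr insert []: move the elements one by one into some position
        # of the initially empty list.
        results = [[]]
        for x in reversed(xs):                      # foldr consumes from the right
            results = [p for ys in results for p in insert(x, ys)]
        return results

    print(permutations([1, 2, 3]))                  # 3! = 6 permutations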

A Framework for Gradual Memory Management

I do not know how much interest this community has in the intersection of programming languages, memory management and type systems, but for those intrigued by such topics, you might find this paper on Gradual Memory Management to be worth reading.

It proposes that a language's compiler offer more sophisticated type systems that enable a program to flexibly support multiple memory management mechanisms for improved performance, and do so with safety guaranteed at compile time. The described type systems build up from Rust's lifetime-driven owner/borrower model as well as Pony's reference capabilities (mutability/aliasing permissions). The paper also references Microsoft's experimental work on Midori.

I welcome any feedback or questions.

Programming language Theme-D

I have implemented programming language Theme-D, which is a Scheme-like programming language with static typing. Some properties of Theme-D include:
- Static type system
- A simple object system
- Multimethods dispatched at run time (and also at compile time)
- Parametrized classes, types, and procedures (with type parameters)
- Signature types resembling Java interfaces but multiply dispatched
- A module system
- Two kinds of variables: constants and mutable variables

Theme-D homepage is located at

http://www.tohoyn.fi/theme-d/index.html

Theme-D can also be found at

https://sourceforge.net/projects/theme-d/

I have also ported (a subset of) the guile-gnome GUI library to Theme-D. Its homepage is located at

http://www.tohoyn.fi/theme-d/theme-d-gnome.html

and it can also be found at

https://sourceforge.net/projects/theme-d-gnome/

- Tommi Höynälänmaa

The Platonic Solids of Software Construction and Their Realization in C

The Platonic Solids of Software Construction and Their Realization in C

Synopsis -

Here I try to contrive 5 (it actually ends up being 6) 'Platonic Solids' of software construction - i.e., the fundamental elements of programming that all programmers in all business domains end up leveraging regardless of their general-purpose programming language.

As a practical matter, I then demonstrate how different aspects of each element are either supportable - or not supportable in C. A language like C is chosen partially because when we use it to encode these elements, its weak language semantics actually enable us to understand each element in a more isolated way. For discussion at this level of analysis, this turns out to be useful.

However, I warn readers that this gist-article is more conjecture than science, an emerging idea that, if accurate in its notions, is a precursor to a rigorous investigation. That is why I offer it up for evaluation and critique here.

BCS FACS - Annual Peter Landin Semantics Seminar: Compiling Without Continuations, Prof Simon Peyton Jones, 12th Dec, 6pm, London

BCS FACS - Annual Peter Landin Semantics Seminar: Compiling Without Continuations

Date/Time: Tuesday 12 December 2017, 6.00pm - 9.00pm

Venue: BCS, 1st Floor, The Davidson Building, 5 Southampton Street, London, WC2E 7HA

Speaker: Professor Simon Peyton Jones, FRS (Microsoft Research)

Cost: Free

Booking: https://events.bcs.org/book/2701/

Synopsis:

Peter Landin (1930 - 2009) was a pioneer whose ideas underpin modern computing. In the 1950s and 1960s, Landin showed that programs could be defined in terms of mathematical functions, translated into functional expressions in the lambda calculus, and their meaning calculated with an abstract mathematical machine. Compiler writers and designers of modern-day programming languages alike owe much to Landin's pioneering work.

Each year, a leading figure in computer science will pay tribute to Landin's contribution to computing through a public seminar. This year's seminar is entitled “Compiling Without Continuations” and will be given by Professor Simon Peyton Jones, FRS (Microsoft Research).

Programme

5.15pm Coffee

6.00pm Welcome & Introduction

6.05pm Peter Landin Semantics Seminar

Compiling Without Continuations

Professor Simon Peyton Jones, FRS (Microsoft Research)

7.20pm Drinks Reception

Seminar details

GHC compiles Haskell via Core, a tiny intermediate language based closely on the lambda calculus. Almost all GHC’s optimisations happen in Core, but until recently there was an important kind of optimisation that Core really did not handle well. In this talk I’ll show you what the problem was, and how Core’s new “join points” solve it simply and beautifully, by effectively allowing Core to express control flow as well as data flow; there are strong links to so-called “continuation passing style” (CPS) here.

Understanding join points can help you as a programmer too, because you can write code confident that it will optimise well. I'll show you a rather compelling example of this: "skip-less streams" now fuse well, for the first time, which allows us to drop the previous (ingenious but awkward) workarounds.
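
For readers unfamiliar with the idea, the essence of a join point can be sketched in any language (here Python, purely as an illustration; GHC of course does this in Core, not in source code): instead of duplicating a continuation in every branch, bind it once as a local function and jump to it.

    def before(x):
        # the continuation "add one and return" is duplicated in both branches
        if x > 0:
            return (x * 2) + 1
        else:
            return (-x) + 1

    def after(x):
        def join(r):          # the shared continuation, bound once (a "join point")
            return r + 1
        if x > 0:
            return join(x * 2)
        else:
            return join(-x)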

SK in Prolog

A thought experiment I am too lazy to do myself, so I'll ask you folks.

Define SK, define a reduction relation, ask whether two terms are equal/reduce to similar terms.

Can you do this in Prolog? (Asked because of interest in current unification-based languages like miniKanren.)
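
For what it's worth, here is the shape of the thing in Python (a sketch of the reduction relation only; a Prolog encoding would instead express these rules as clauses and let unification do the term matching):

    # SK terms: ('S',), ('K',), or ('app', f, a).
    def step(t):
        # One-step, leftmost reduction: K x y -> x and S x y z -> (x z) (y z);
        # returns None when t is in normal form.
        if t[0] != 'app':
            return None
        f, a = t[1], t[2]
        if f[0] == 'app' and f[1] == ('K',):                           # K x y -> x
            return f[2]
        if f[0] == 'app' and f[1][0] == 'app' and f[1][1] == ('S',):   # S x y z -> (x z) (y z)
            x, y, z = f[1][2], f[2], a
            return ('app', ('app', x, z), ('app', y, z))
        r = step(f)
        if r is not None:
            return ('app', r, a)
        r = step(a)
        if r is not None:
            return ('app', f, r)
        return None

    def normalize(t, limit=10000):
        # SK reduction need not terminate, hence the step limit.
        for _ in range(limit):
            r = step(t)
            if r is None:
                return t
            t = r
        raise RuntimeError("no normal form within the limit")

    # Two terms "reduce to similar terms" if their normal forms match, e.g. S K K
    # behaves as the identity:
    S, K = ('S',), ('K',)
    app = lambda f, a: ('app', f, a)
    print(normalize(app(app(app(S, K), K), ('x',))))                   # ('x',)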
