LtU Forum

inline vs scatter/gather separate annotation

I want to explore a "what if?" question that involves separating aspects of code that are normally adjacent in programming languages, in the sense of being nearby in syntax and grammar. What if some parts are specified elsewhere? Remote specs might augment or replace local specs in code, when compiled against a particular build target, or when transpiled to a new inline representation where everything is a local spec again.

(What do I mean by inline? Roughly speaking, some contiguous sequence of local tokens in a legal grammar order. In a programming language, keywords are usually inline, modifying the meaning of source code nearby, per the semantics of grammar used.)

One metaphor that seems applicable is the idea of styles used in document text markup. The same way you can separate text styles in cascading style sheets from the documents using it, we might separate abstractions of code application and specialization from source code using it. For example, if you wanted to support a meta-object protocol (mop), a spec for what the mop means locally might be defined somewhere else, and applied only at compile or transpile time. All kinds of instrumentation for logging, profiling, and debugging, etc, might be put somewhere else where it interferes less with the basic stuff.

One objective would be to isolate accidental noisy stuff from essential basics, while another would be enabling replacement and update in new versions without updating the old original. When you do this to types of variables, you get something like generics.

You would want different parts of a code base to see different views, applying different styles and annotations to the same original. One part of a code tree might have permission to alter certain data structures, and see data types involved as mutable, while another part of the code tree sees only immutable types that cause compile time errors on any attempt to call API that would modify state.

Note when important information becomes separated, this amounts to a form of obfuscation. (Witness C switch statements turned into C++ virtual method calls that distribute each case into whatever file implements each relevant method.) If you use scatter/gather on code annotations, it's easy for a compiler to gather, but hard on a human being. So you would want to support generation of views with everything gathered in more inline form, just for human consumption.

I've been thinking about this idea a lot, but it's seldom a topic of conversation, so that's mainly why I bring the idea up, just to see if folks have interesting comments.

[Edit: I'm looking for comments on splitting or scattering code declarations, rules, and constraints, where gather in this context is code rewriting. While the scatter/gather metaphor comes from doing this to data, I'm talking about doing it to code. Applications to dataflow aren't relevant unless input and output is code. There's isn't a good word that means the opposite of inline, unless you want to go with plain English apart.]

Recursive Descent Parser Generators

I'm writing a parser generator, to avoid being TLDR (which is most of my messages) I wanted to ask if anyone knows of other LL(*) parser generators, what I should aim for performance wise (what's acceptable) and if you've had experience writing something similar what's one of the toughest nuts to crack on this style of parser?

For me (more than one thing) :

  1. Generating Parsers with unbound look-ahead uses a lot of memory to do statically so none (or very little) of the processing is done during run-time.
  2. Follow predictions are a PITA: you have to analyze all possible parse paths coming into the given rule, walk the context stack to disambiguate the offending lock(s), and keep stepping ahead and only accept if the parent contexts can parse and that state can also accept (this was a nightmare, but I think I have it...?) I'm writing the prediction discriminator logic now (my most recent task.) Dangling else isn't an issue because iirc it is aware that the parent context is on an edge state once the current rule exits, and I apply to the maximal munch principle.
  3. Lexical Ambiguities: I'm a glutton for punishment, I used includes to allow complex grammars to span multiple files. To avoid issues with parse order and precedence on the user's side (you can still specify which trumps the other, but this trumping order is 'always on' if used) I allow for lexical ambiguities to create new parse states and following those ambiguity identities you can figure out which path was actually intended.
  4. To solve the memory issue, I decided that I would use: a symbol stack, an active rule context, recursive descent, and reductions.
    • Reductions some of solve the issue of memory by identifying where multiple paths of a prediction all enter the same rule on the same state, parsing that rule, and evaluate what comes after that rule. This abates some of the issues with the look-ahead requirements, but adds more complexity when the follow-state of that rule is ambiguous with the prediction! I haven't even gotten to this issue :(
  5. AST Generation: making this look easy is the hard part. It should just work (I will focus on refining it once everything else is done.)
  6. It's so easy to create grammars that are simple but yield an exponential state explosion. I need to do more research to detect this ahead of time so I don't waste time on processing something useless, so the user has a better experience than 'Out of Memory Exception'.

If this ends up working as intended, I'll be astonished, because when I started this eight years ago, I didn't have an effing clue. I still probably don't, but I at least know what I want to do now. The grammars define no semantic predicates, nor actions on lexical matches, so the generator has to do all the work. So far (without error context) it's 50-80 times faster than my own hand-written parser for its own language, until I found that I wasn't handling lexical ambiguities on follow states and now have to fully implement them to do a proper comparison.

Edit:
The goal with this LL(*) approach is there is very little back-tracking employed. The only time it should need to is when it's handling left-recursive rules and it has no choice to do so because it has to fall to a weaker prediction mechanism to yield a preliminary identity onto the stack that it can use to move forward, then rinse and repeat.

Edit 2: Redacting the following, turns out the issue is more complex :) The sample I gave it had zero issues with :(
'No Alt Indirect Left Recursion' is when you have something like:
A ::= B | C;
B ::= A '[' ']';
C ::= "Other";

Without code to properly identify these 'No Alt Indirect Left Recursion' states, it will recurse infinitely because there's no fall-back to choose first, it would always predict 'B' if it has: Other[].

New paper: Theory of Programs

A simple mathematical framework, based on elementary set theory, to cover programming concepts.

See http://bit.ly/1CkZBPj.

-- Bertrand Meyer

Lamport: Interprocess Communication

Something that may be pertinent to the recent (lengthy) discussion of the deep implementation details of interprocess communication is Leslie Lamport's interesting discussion of how to implement various kinds of interprocess communication. The bulk of the paper deals with various kinds of shared registers, which are taken as the primitives through which messages are passed. Starting from non-atomic operations, Lamport shows how to build atomic operations that allow message-passing to occur.

"At a low level, message passing is often considered to be a form of transient communication between asynchronous processes. However, a closer examination of asynchronous message passing reveals that it involves a persistent communication. Messages are placed in a buffer that is periodically tested by the receiver. Viewed at a low level, message passing is typically accomplished by putting a message in a buffer and setting an interrupt bit that is tested on every machine instruction.
...
Hardware implementations of asynchronous communication often make assumptions about the relative speeds of the communicating processes. Such assumptions can lead to simplifications. For example, the problem of constructing an atomic register, discussed below, is shown to be easily solved by assuming that two successive reads of a register cannot be concurrent with a single write. If one knows how long a write can take, a delay can be added between successive reads to ensure that this assumption holds. No such assumptions are made here about process speeds. The results therefore apply even to communication between processes of vastly differing speeds."

Some New Directions for ACP Research

This paper at ArXiv.org lists some new directions for research related to the Algebra of Communicating Processes (ACP). Most of these directions have been inspired by work on SubScript, an ACP based extension to the programming language Scala. SubScript applies several new ideas that build on ACP, but currently these lack formal treatment. Some of these new ideas are rather fundamental. E.g. it appears that the theory of ACP may well apply to structures of any kind of items, rather than to just processes. The aim of this list is to raise awareness of the research community about these new ideas; this could help both the research area and the programming language SubScript.

Practical Principled FRP: Forget the past, change the future, FRPNow!

A new ICFP 15 paper by Ploeg and Claessen; abstract:

We present a new interface for practical Functional Reactive Programming (FRP) that (1) is close in spirit to the original FRP ideas,
(2) does not have the original space-leak problems, without using arrows or advanced types, and (3) provides a simple and expressive way for performing I/O actions from FRP code. We also provide a denotational semantics for this new interface, and a technique (using Kripke logical relations) for reasoning about which FRP functions may “forget their past”, i.e. which functions do not have an inherent space-leak. Finally, we show how we have implemented this interface as a Haskell library called FRPNow.

Fixed link.

Who can make LtU2?

If we had voting on things (a la reddit, slashdot, hacker news, etc.) I think we'd all be better off. There are technological things which won't outright fix everything, but should greatly help I should think.

But we'd need somebody who can actually set that stuff up.

Bi-simulation in security auditing

Bi-simulation has been an important topic in Computer Science (e.g. intensely studied for process calculi) which may be applicable to an important emerging issue for computer engineering, namely security auditing.

See bottom of page 3 of Distributed Public Recording: Providing Security Without the Risks of Mandatory Backdoors

The Page-Fault Weird Machine: Lessons in Instruction-less Computation

In case one instruction is one too many for you. A compiler which executes instructionless code through triggering pagefaults.

This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction. Optionally, the assembler can also generate X86 instructions that will display variables in the VGA frame buffer and will cause control to be transferred between the native (display) instructions and 'weird machine' trap instructions.

Dedekind, Cantor, Conway, & Hewitt (w/ some Chomsky)

Here, I will attempt to cool down controversy on LtU over Hewitt's construction of Real by explaining it in more familiar terms. I think this will help to shed some light on Direct Logic and begin to hint why it is interesting in a PLT context.

I will borrow conceptually from John H. Conway who wrote, in "On numbers and games":

Let us see how those were good at constructing numbers have approached this problem in the past.

Dedekind (and before him the author -- thought to be Eudoxus -- of the fifth book of Euclid) constructed the real numbers from the rationals. His method was to divide the rationals into two sets L and R in such a way that no number of L was greater than any number of R, and use this "section" to define a new number {L|R} in the case that neither L nor R had an extremal point.

His method produces a logically sound collection of real numbers (if we ignore some objections on the grounds of ineffectivity, etc.), but has been criticised on several counts. Perhaps the most important is that the rationals are supposed to have been already constructed in some other way, and yet are "reconstructed" as certain real numbers. The distinction between the "old" and "new" rationals seems artificial but essential.

Cantor constructed the infinite ordinal numbers. Supposing the integers 1,2,3... given he observed that their order-type ω was a nwe (and infinite) number greater than all of them. Then the order-type of {1,2,3...,ω} is a still greater number ω + 1, and so on, and on, and on. The similar objections to Cantor's procedure have already been met by von Neumann, who observes that it unnecessary to suppose 1,2,3,... given and that it is natural to start at 0 rather than 1. He also takes each ordinal as the set (rather than the order-type) of all previous ones. Thus for von Neumann, 0 is the empty set, 1 the set {0}, 2, the set {0, 1}, ..., ω the set {0, 1, 2, ...} and so on.

In this chapter [Conway teases] we intend to show that these two methods are in reality part of a simpler and more general method by which we can construct a very large Class No of numbers, including at the same time the real numbers and the ordinal numbers, and others such as those mentioned above [e.g. infinitessimals] [....]

Hewitt can be read as constructing Real by specifying a set of 3-way partitions of the dyadic fractions between 0 and 1.

Every Real in [0..1] corresponds to three sets: {L | M | R}

The sets are ordered so that no member of L is greater than or equal to any member of M or R, no member of M is greater than or equal to any member of R.

  Zero:

  0 ≡ {{} | {} | Dyad} 
  where Dyad is the set of all dyadic fractions between 0 and 1

  One:
  1 ≡ {Dyad | {} | {}} 

Hopefully the dyadic fractions are themselves members of Real and we can memorialize that fact in a manner similar to Dedekind:

  ∀d ∈ Dyad, {dl | { d } | dr } ∈ ℝ

  where dl ≡ { x ∈ Dyad | x < d }
    and dr ≡ { x ∈ Dyad | x > d }

  Note: the Real {dl | { d } | dr } is called d.

The rationals in general correspond to regular languages as follows:

  Let Rat be the set of of sets of finite strings of 0 and 1 such that:

  ∀r in Rat, r is a regular language
    r is well ordered by a prefix relation
    ∀s in r, s is finite length and ends with 1

    rl ≡ { d ∈ Dyad | ∃x ∈ r, x > d }
    rm ≡ { d ∈ r | ∀x ∈ r, x <= d }
    rr ≡ { d ∈ Dyad | ∃x ∈ r, x < d }

   Note: the Real { rl, rm, rr } is called r }

As you can see, the subset of Real given by Rat is the set of all rational fractions between 0 and 1.

The rationals are given by certain regular languages over the alphabet {0,1}. Similarly, the constructible reals are given by certain recursivvely enumerable languages.

  Let Tructable be the set of all recursively enumerable sets of finite 
  strings of 0s and 1s, each string ending with a 1, such that each member
  of Tructable is well ordered by a prefix relation.

  By analogous construction to the rationals:

  ∀ t ∈ Tructable, { tl, tm, tr } is in Real

All rationals (dyadic and otherwise) are Tructable.

Finally, we consider a set of languages of finite strings for which the generator functions are non-deterministic.

  Let Fracs be the set of all sets of finite strings of 0s and 1s, 
  each string ending with a 1, such that each member of Fracs is
  well ordered by a prefix relation.

  By analogous construction, 

  ∀ f ∈ Fracs, { fl, fm, fr } is in Real

Note that if a member f of Fracs has a maximal element, then f is also a dyadic rational.

Note that if a member f is a regular language over {0,1}, then it is a rational.

Note that if a member of f is a recursively enumerable set, then it is a tructable real.

There are other possibilities. By a diagonalization argument, we know that Fracs contains members f which are not recursively enumerable. Those f are the unconstructible reals.

Let's suppose that the real world contains fair randomness. For example, there is a coin flip or some other kind of measurement we can repeat that will produce random outcomes. Neither heads or tails will be permanently starved. A sequence of coinflips on a given world-line will eventually produce both a 0 and a 1. Furthermore, on some world-lines, the sequence of coin flips is not described by any recursively enumerable function (at least assuming that there is no upper bound on the number of times we get to flip the coin).

The number of possible world-lines on which coin flips might be recorded is uncountable and, indeed, corresponds nicely to the members of Fracs.

In other words, a machine can enumerate the members of any f in Fracs, from shortest to longest. Such a machine, on a particular world-line, is pretty much what Hewitt means by an "Actor address".

XML feed