LtU Forum

symbols and loosely coupled concurrent apps part II

This discussion is a continuation of managing closed worlds of symbols via alpha-renaming in loosely coupled concurrent apps, which touches a number of different topics: most recently a Lisp sub-thread at the end, in addition to a bunch of stuff I was posting about lightweight user-space process architecture involving threads and fibers. When that discussion gets less useful due to sheer size, I'll continue here when I have anything novel to say, which probably won't be that much. You're welcome to continue here anything started there, too.

(A number of things I said in the other discussion might be expected to bother folks who want to see fewer development systems around, with less competition, fewer languages, and narrower choices due to existing "winners" who will guide us to a managed promised-land where we can be tenants for a modest fee. Is that enough hyperbole? I can't be bothered to get worked up about it, but I see many things as subject to change, especially along lines of choice in portable code for app organization not dictated by trending vendors. I expect to mostly ignore politics and advocacy.)

The first thing I want to continue is the discussion of performance among options for implementing fibers (single-threaded user-space control flows that multiplex one OS thread). I like heap-based spaghetti stacks, while others recommend options I think are poor choices; to each his own.
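
To make the heap-based option concrete, here is a minimal sketch of mine (in JavaScript since that's handy; every name is illustrative, not from any real runtime): activation frames are heap objects linked by parent pointers, so a fiber switch is a pointer assignment, and fibers can share the tail of a common call chain.

function frame(parent, locals) {
    return { parent: parent, locals: locals };
}

// Two fibers branch off one shared base frame: the "spaghetti".
var base = frame(null, { app: "demo" });
var fiberA = { top: frame(base, { step: 0 }) };
var fiberB = { top: frame(base, { step: 0 }) };

function call(fiber, locals) {     // entering a callee pushes a frame
    fiber.top = frame(fiber.top, locals);
}

function ret(fiber) {              // returning pops back to the caller
    fiber.top = fiber.top.parent;
}

var current = fiberA;
call(current, { step: 1 });        // A's chain deepens
ret(current);                      // and pops back
current = fiberB;                  // a context switch: one assignment

Abandoned branches are reclaimed by the garbage collector once no fiber (or captured continuation) can reach them; the usual objection is that every call allocates, which is exactly the performance question at issue.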

How can languages help us in terms of achieving correct program design?

We talk a lot about how languages can help us with correctness of program execution, but why don't we talk about how languages help us with correctness of program design?

With all the rework that goes into making a non-trivial system well-designed, I'm bummed this subject does not receive much treatment - especially in terms that non-PhDs like me can understand.

Any thoughts?

Function Readability & Understandability

Here are two versions of the same function in JavaScript, since it's a language that supports both the functional and imperative styles. The function accepts an array of arrays and outputs the combinations of their values (strictly, the Cartesian product), so (['a', 'b'], ['1', '2']) becomes [['a', '1'], ['a', '2'], ['b', '1'], ['b', '2']]:

function permutations1(x) {
    return x.reduce(function(a0, v0) {
        return a0.reduce(function(a1, v1) {
            return a1.concat(v0.map(function(v2) {
                return v1.concat([v2]);
            }));
        }, []);
    }, [[]]);
}

function permutations2(x) {
    var a = [[]];
    for (var i = 0; i < x.length; ++i) {
        var b = [];
        for (var j = 0; j < a.length; ++j) {
            for (var k = 0; k < x[i].length; ++k) {
                b.push(a[j].concat([x[i][k]]));
            }
        }
        a = b;
    }
    return a;
}

Which do you find easier to read and understand, and if you find one easier, what do you think the reasons are?

Snakes all the way down

Virtual machines (VMs) emulating hardware devices are generally implemented in low-level languages for performance reasons. This results in unmaintainable systems that are difficult to understand. In this paper we report on our experience using the PyPy toolchain to improve the portability and reduce the complexity of whole-system VM implementations. As a case study we implement a VM prototype for a Nintendo Game Boy, called PyGirl, in which the high-level model is separated from low-level VM implementation issues. We shed light on the process of refactoring from a low-level VM implementation in Java to a high-level model in RPython. We show that our whole-system VM written with PyPy is significantly less complex than standard implementations, without substantial loss in performance.
* We show how the execution and implementation details of WSVMs are separated in the same way as those of HLLVMs.
* We show how the use of preprocessing-time meta-programming minimizes the code and decreases the complexity.
* We provide a sample implementation of a WSVM prototype for PyPy which exhibits a simplified implementation without substantial loss of performance (about 40% compared to a similar WSVM in Java).
(groan, since when did Java become the gold standard for "fast"? I know, I know, "with a sufficiently advanced JIT...") (I (sincerely, seriously) apologize if this is not nearly theoretical enough for a forum like LtU.)

You got your Monads in my FOP/AOP.

I'm reading a book on Feature Oriented Programming, and it seems to me that FOP/AOP suffer because it is hard to get at the join points easily in standard crappy languages. Side effects and nested calls seem to me to be two of the big troublemakers. So I figured that if we did things in a purely functional, maybe monadic or command/continuation-passing, style, then we'd be able to intercept things without restriction. In fact, couldn't we transform sad bad imperative code into such a form?
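
As a hedged sketch of that idea (mine, not from the book; all names are invented): if a program's effects are reified as data and run by an interpreter, every operation becomes an explicit join point, and "advice" is just a wrapper around the handlers.

// The program is data: a list of effect requests, not direct calls.
var program = [
    { op: "log",  arg: "starting" },
    { op: "save", arg: { id: 1 } },
    { op: "log",  arg: "done" }
];

var handlers = {
    log:  function (msg) { console.log("LOG: " + msg); },
    save: function (rec) { console.log("saved record " + rec.id); }
};

// The base interpreter: one dispatch per effect.
function run(prog, h) {
    for (var i = 0; i < prog.length; ++i) {
        h[prog[i].op](prog[i].arg);
    }
}

// "Around advice" without touching the program: wrap every handler.
function withTiming(h) {
    var wrapped = {};
    Object.keys(h).forEach(function (op) {
        wrapped[op] = function (arg) {
            var t0 = Date.now();
            h[op](arg);
            console.log(op + " took " + (Date.now() - t0) + " ms");
        };
    });
    return wrapped;
}

run(program, withTiming(handlers));

A full monadic or CPS encoding would also make control flow (the nested calls mentioned above) interceptable, but the shape is the same: turn the join points into data.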

What is sad to me is that, judging by a very brief look at the citations, there isn't a community of like-minded thinkers all getting together to advance this cause, which seems like it could be a very worthy one. I mean, it seems like one could commercialize this if one could take crappy C/++/Java/whatever code and perform maintenance magic on it.

(Some of the citations mention people who are (or have been) active on LtU, of course since LtU is the bee's (bees'?) knees.)

A very small, very random, sampling of the papers out there:

* A monadic interpretation of execution levels and exceptions for AOP
* Effective Aspects: A Typed Monadic Embedding of Pointcuts and Advice
* Monads as a theoretical foundation for AOP
* blog post AOP is... (monads)

et al.

Let's kick continuations around for a while...

Recently Rys McCusker and I started talking about continuations, in another node where that was probably off topic. The link if you want to read it is http://lambda-the-ultimate.org/node/5082.

The ability to directly refer to continuations is supposed to be something like an "ultimate" control structure, or "goto with its context intact," with which all other control structures can be trivially implemented.

But because it does a lot more than "goto" in order to preserve that context, there can be some serious problems if that extra stuff it does is irrelevant, wasteful, or harmful. And in my experience, some kinds of continuations are just plain worse than others. But let me explain what I mean by kinds of continuations. Most languages that have them at all have only one kind, so the fact that there are different kinds is something people who program in only one language tend never to notice.

There are a bunch of different things that can be called continuations. I think of them as being divided along four axes. First, there is delimited versus reified. Second, there is lexical-capturing versus dynamic-capturing (or both, or neither). Third there is winding/unwinding versus raw. And fourth there is mutable vs. immutable capture.

Delimited or "one-shot" continuations are delimited in dynamic extent; you can store them in values or pass them to subroutines or whatever, but having called a function, you can use its continuation to return from that function exactly once, and only if you haven't already returned from the function in some other way first. With delimited continuations you can reorder the sequence in which functions return, making it not equal to the usual standard of being the reverse of the sequence in which they were called. You can even "delay" some of the function returns until after the program halts (i.e., never) by just never using them, or by having your function skip the returns of its callees by telling them to call its own continuation directly. These are mostly no problem in my experience.
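
For the escape-only (upward) case, here is a rough JavaScript encoding of mine built on exceptions; it captures the one-shot flavor, though not the return reordering described above, which needs real continuations or CPS.

function callWithEscape(body) {
    var token = {};                // unique tag for this capture
    var live = true;
    function k(value) {
        if (!live) { throw new Error("escape continuation expired"); }
        throw { token: token, value: value };  // unwind to the capture point
    }
    try {
        return body(k);
    } catch (e) {
        if (e && e.token === token) { return e.value; }
        throw e;                   // not ours: keep unwinding
    } finally {
        live = false;              // the extent is over: one shot only
    }
}

// Early exit from a nested loop, roughly like calling a continuation:
var found = callWithEscape(function (exit) {
    [[1, 2], [3, 4]].forEach(function (row) {
        row.forEach(function (x) {
            if (x === 3) { exit(x); }
        });
    });
    return null;
});
console.log(found);                // 3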

Reified continuations can do the same kinds of things you can do with delimited continuations in terms of returning from a function, passing one to a called function, or reordering function returns, but reified continuations have an additional property: having captured a reified continuation and stored it as a value in some variable that persists beyond the dynamic extent of the function (i.e., after the function has already returned), you can call it and simulate that function returning, with the same environment (which may include the environments of other functions that have already returned), AGAIN, even though the function whose return you are simulating has not been called a second time. This sounds like a good idea, but experience has taught me that it is deeply problematic.
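
A small illustration of mine of why this is possible at all: in continuation-passing style the continuation is an ordinary function value, so nothing stops you from storing it and calling it again after the function has "returned".

var savedK = null;

function addOneCPS(n, k) {
    savedK = k;                    // capture the continuation as a plain value
    k(n + 1);                      // "return" by calling it
}

addOneCPS(41, function (result) { console.log("got " + result); });  // got 42
savedK(100);                       // the function "returns" a second time: got 101... no, got 100
savedK(7);                         // and a third time: got 7

The problems described above show up when the captured context includes live stack frames and resources, which this toy sidesteps.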

Lexical-capture vs. dynamic-capture is whether your continuations capture the lexical environment (i.e., the bindings within the lexical scope of the function whose continuation is taken) or the dynamic environment (i.e., the bindings within the dynamic extent of the function whose continuation is taken). Most programming languages resolve variables using scope, which is called lexical scoping, or extent, which is called dynamic scoping, but rarely both. In those two cases this is a trivial choice, because your continuations will capture the same environment whose bindings are extended by your function calls (it could be done the other way, but that would be insane). However, if your language has both lexically scoped and dynamically scoped variables, this distinction becomes crucially important. Usually continuations that capture the lexical environment will allow the dynamic environment to be inherited from wherever the continuation is called, rather than from where the continuation is taken. You can make powerful arguments on both sides of the question of whether continuations that capture the lexical environment should or should not also capture the dynamic environment. Semantically it's nicer if they do, but in terms of implementation that doubles the branchiness of your environment tree and can become very expensive very fast if people use it a lot. In early dynamically scoped Lisps, before lexical scope became the default, there were (delimited) continuations that captured the dynamic environment but not the lexical environment. I guess there is actually a fourth subcategory here: "goto" is a sort of continuation that captures neither the lexical nor the dynamic environment.
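
To see the distinction without full continuations, here is a closure-based illustration of mine (closures stand in for continuations; the dynamic variable is simulated with save/restore): the stored function captures its lexical variables at creation, but the dynamic binding is looked up at call time, so it is inherited from wherever the call happens.

var dyn = { indent: 0 };           // one simulated dynamic variable

function withIndent(n, thunk) {    // dynamic binding via save/restore
    var saved = dyn.indent;
    dyn.indent = n;
    try { return thunk(); } finally { dyn.indent = saved; }
}

function makePrinter(msg) {        // msg is lexically captured
    return function () {
        console.log(new Array(dyn.indent + 1).join(" ") + msg);
    };
}

var p = withIndent(4, function () { return makePrinter("hi"); });
p();                               // printed with indent 0, not 4
withIndent(2, p);                  // printed with indent 2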

Winding and unwinding is another important distinction, and another controversy. If your continuations wind and unwind, it means you can arrange for code to be called whenever a given scope (or extent) is entered or left, including entering or leaving that scope by means of a continuation. A classic example would be releasing a resource when you escape from a function that needs that resource, and then locking the resource again when re-entering that function. With raw continuations, it's a lot more like a plain jump; you don't get to run set-up and shut-down code. In some ways winding and unwinding continuations are more orderly, but once again the devil is in the implementation: the wind/unwind overhead renders continuations nearly useless for mechanisms like lightweight context switches between green threads.
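
JavaScript's try/finally gives the unwinding half of this for ordinary control flow; a sketch of mine of the resource example above:

function withResource(acquire, release, body) {
    var r = acquire();             // set-up code runs on entry
    try {
        return body(r);
    } finally {
        release(r);                // teardown runs on any exit, even escapes
    }
}

try {
    withResource(
        function () { console.log("lock"); return {}; },
        function () { console.log("unlock"); },
        function () { throw new Error("escaping!"); }
    );
} catch (e) {
    console.log("caught: " + e.message);   // "unlock" already printed
}

What try/finally cannot express is the re-entry (winding) half: if a continuation could jump back into body, something like Scheme's dynamic-wind would have to re-run the set-up, and that bookkeeping is exactly the cost that hurts green-thread switches.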

And finally we get to the last axis: is the captured environment mutable or immutable after capture? An implementation of continuations could copy the stack and then treat the copied value as immutable. If the bindings visible in the stack were mutated before the continuation was called, invoking the continuation would copy its stack (including the former values) back into the working stack, undoing the mutations. This type of continuation is called a "restart" and is very rare, because it is highly expensive in memory. In the usual case, the captured environment is mutable - i.e., the implementation records only pointers to the stack frames where the variables are found, not the variables themselves. So if code that runs after the continuation is captured mutates those variables, the continuation, when invoked, will not overwrite the mutated values.
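
A toy contrast of mine, with plain values standing in for stack frames: capture-by-reference sees later mutations, while a restart-style capture copies state and restores it when invoked.

var state = { x: 1 };

var byRef = function () {          // reads whatever is there now
    console.log("by-ref sees " + state.x);
};

var snapshot = { x: state.x };     // copy taken at capture time
var restart = function () {        // invoking it undoes later mutations
    state = { x: snapshot.x };
};

state.x = 99;                      // mutate after capture
byRef();                           // by-ref sees 99
restart();                         // copy the snapshot back
byRef();                           // by-ref sees 1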

And I think that is an exhaustive taxonomy of all the things that have been called "continuations." Doubtless I've missed something.

Continuations are used to implement:
* control structures - for example, if you want to roll your own do-while loops; no environment capture is needed.
* nonlocal flow of control - for example, if you want to roll your own try/catch and exception handlers.
* variable scoping - using the winding/unwinding variety, you can simulate dynamic variables in lexically scoped languages, or vice versa.
* green threads - if you want to roll your own cooperative multitasking (see the sketch after this list).
* session management - you can represent the session of visitor #1392 to your website (or whatever) as a continuation.
* type system extensions - winding/unwinding continuations can let you add typechecking code on entry to and exit from functions, even for types the native language could not define. The drawback is that these are runtime rather than compile-time typechecks.
* and many other things.
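
Here is the green-threads item as a concrete sketch (mine; it uses JavaScript generators rather than first-class continuations, but the cooperative round-robin structure is the same):

function runFibers(gens) {
    var ready = gens.slice();      // queue of runnable fibers
    while (ready.length > 0) {
        var g = ready.shift();
        var r = g.next();          // run until the next yield
        if (!r.done) {
            ready.push(g);         // yielded: go to the back of the queue
        }
    }
}

function* worker(name, n) {
    for (var i = 0; i < n; ++i) {
        console.log(name + " step " + i);
        yield;                     // cooperative scheduling point
    }
}

runFibers([worker("a", 3), worker("b", 2)]);
// prints: a step 0, b step 0, a step 1, b step 1, a step 2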

Many of these things can't be done with the (other) tools a well-defined language gives you to work with. So, yay.

But many of these are BAD uses for continuations, in that continuations are a BAD way to implement these things. So, boo.

Many of these things cannot coexist in the same program, because different uses of continuations would force stack captures that could never be recovered by garbage collection, or would otherwise interfere with each other. This is especially true of reified continuations. So, boo some more.

Many of these different "continuations" have very fundamental semantic differences, in terms of composability and tractability for combinations.

And now that I've written half a book here, you guys can tell me what I missed or what it means. ;-)

Wrangling C via Static Checks

If you can't beat them, join them? Static Program Analyzer for Safety-Critical Real-Time Embedded Software is the kind of thing I search around for; my 'real-time' app is video games. :-)

We report on a successful preliminary experience in the design and implementation of a special-purpose Abstract Interpretation based static program analyzer for the verification of safety critical embedded real-time software. The analyzer is both precise (zero false alarm in the considered experiment) and efficient (less than one minute of analysis for 10,000 lines of code). Even if it is based on a simple interval analysis, many features have been added to obtain the desired precision: expansion of small arrays, widening with several thresholds, loop unrolling, trace partitioning, relations between loop counters and other variables. The efficiency of the tool mainly comes from a clever representation of abstract environments based on balanced binary search trees.

In this paper, we report on a first experience on the design of a special-purpose static program analyzer. The considered class of software is critical real-time synchronous embedded software written in a subset of C. The considered class of specifications is that of absence of run-time error. This experience report explains the crucial design decisions and dead-ends that lead from a too imprecise and too slow initial implementation of a general-purpose static program analyzer to a completely rewritten, very fast, and extremely precise special-purpose static program analyzer. By providing details on the design and implementation of this static analyzer as well as on its precision (absence of false alarm), execution time, and memory usage, we prove that the approach is technically and economically viable.
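
To make two of those ingredients concrete, here is a toy version of mine (not the paper's code) of the interval domain with widening thresholds, analyzing i = 0; while (i < 50) i = i + 1;:

var THRESHOLDS = [0, 1, 10, 100, Infinity];   // an illustrative choice

function join(a, b) {              // least upper bound of two intervals
    return [Math.min(a[0], b[0]), Math.max(a[1], b[1])];
}

function widen(a, b) {             // jump to the next threshold instead
    var lo = b[0] < a[0] ? -Infinity : a[0];  // of creeping up one by one
    var hi = b[1] > a[1]
        ? THRESHOLDS.filter(function (t) { return t >= b[1]; })[0]
        : a[1];
    return [lo, hi];
}

// Abstract fixpoint for the loop head:
var head = [0, 0];
for (;;) {
    var inBody = [head[0], Math.min(head[1], 49)];  // filter by i < 50
    var after = [inBody[0] + 1, inBody[1] + 1];     // effect of i = i + 1
    var w = widen(head, join(head, after));
    if (w[0] === head[0] && w[1] === head[1]) { break; }  // fixpoint
    head = w;
}
console.log(head);   // [0, 100] in a handful of iterations, not 50

The result is sound but overshoots the true bound of 50; a standard narrowing pass would tighten it, and the paper's extras (array expansion, loop unrolling, trace partitioning) attack the remaining imprecision.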

Ivory EDSL in Haskell for Embedded Control

I'm all starry-eyed enamoured with Ivory and Tower.

We report on our experiences in synthesizing a fully-featured autopilot from embedded domain-specific languages (EDSLs) hosted in Haskell. The autopilot is approximately 50k lines of C code generated from 10k lines of EDSL code and includes control laws, mode logic, encrypted communications system, and device drivers. The autopilot was built in less than two engineer years. This is the story of how EDSLs provided the productivity and safety gains to do large-scale low-level embedded programming and lessons we learned in doing so.

OK, here's what I'm excited about: Ivory is like Atom or Copilot, only more real-world; Tower is a compositional language to boot; and apparently there is an ability to do FFI from Ivory to C. (Now if only I could understand more about any kind of interactive source-line debuggery.) I really want to write video games for my Game Boy using this stuff.

Language combining relational algebra and domain algebra

I'm working on a language (and implementation) that combines elements of relational algebra and domain algebra. For clarity, the relational algebra is about operations with relations as arguments, such as join, union and 'is subset of'. The domain algebra is about iterated tuple-level operations used to select, generate or modify relations, such as '>=', '+' and SUM().
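
A small sketch of mine with made-up data, to pin down the two layers: relational operators take whole relations as arguments, while the domain layer is the per-tuple expression language they are parameterized by.

var emp = [ { name: "ann", dept: 1, sal: 50 },
            { name: "bob", dept: 2, sal: 40 } ];
var dpt = [ { dept: 1, dname: "eng" },
            { dept: 2, dname: "ops" } ];

// Relational layer: relations in, relations out.
function join(r, s, on) {
    var out = [];
    r.forEach(function (t) {
        s.forEach(function (u) {
            if (t[on] === u[on]) { out.push(Object.assign({}, t, u)); }
        });
    });
    return out;
}
function where(r, pred) { return r.filter(pred); }
function extend(r, name, f) {
    return r.map(function (t) {
        var t2 = Object.assign({}, t);
        t2[name] = f(t);
        return t2;
    });
}

// Domain layer: ordinary per-tuple expressions like >= and +.
var wellPaid = where(join(emp, dpt, "dept"),
                     function (t) { return t.sal >= 45; });
console.log(extend(wellPaid, "bonus", function (t) { return t.sal + 5; }));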

The purpose is similar to the Functional Relational Programming described but not implemented in Out of the Tar Pit.

I have a bibliography of papers and products for some 25 or so relational languages spanning over 30 years, including the Third Manifesto and Tutorial D, Rel, Alf, Muldis, SIRA-PRISE, BS12, Dataphor, etc. They all emphasise the relational side, and while most have a domain algebra component, they do not treat it in any detail. I'm looking for more material on the domain side.

I know of one set of works directly on the subject: Aldat. I would appreciate any links or pointers for additional works in this area, preferably active now or in the recent past.

Intuitionistic Programming Language

The Intuitionistic Programming Language purports to do nifty things:

The semantics of IPL is based on intuitionistic type theory extended with a component model. It compiles to LLVM bytecode using a novel algorithm based on term rewriting. The IPL compiler is very small: less than 10,000 lines of OCaml code.

IPL provides

* zero abstraction penalty,
* a very high level of abstraction,
* excellent run-time performance,
* strong static safety guarantees,
* predictable performance without GC pauses,
* support for verification and testing, and
* a composable effect system based on free monads.

The combination of these features is made possible by an algorithm that eliminates high level constructs at compile time. The limitation implied by this algorithm is that mutable complex state has to be region based (global or local).

IPL is currently in the early stages of development and there are many opportunities to make a large impact on its future direction and for further innovation in the space.
