LtU Forum

assistance with some data flow terms

I'm working out some rules for a data flow network and certain conditions apply in certain situations, but I am not sure what the accepted terminology is for these. So, if you will, what are the names for these:

numOut == numIn
A node (in a data flow graph) always provides exactly one output token for one input token. I've been calling this "systolic", but "systolic" really means "clocked" which this isn't necessarily.

numOut != numIn
A node may provide more or less than one output for one input. I've been calling this "asystolic", but that is even worse since I can't find much use of that term in the literature and in medicine it means "absence or cessation of heartbeat" or "any deviation from a healthy or normal condition" neither of which are desirable descriptions of the situation.

numOut <= numIn
A node provides an equal or fewer number of output tokens as input tokens, e.g. the "filter" operation from FP.

numOut >= numIn
A node provides at least as many output tokens as input tokens, possibly more.

Thank you!

Why Are ALL Programming Languages So Low Level?

Memory allocations and deallocations, threads, locking, execution control (such as loops, function calls, conditionals), etc... these are are machine specific features. Why do we not have a high level language that goes beyond these things? ALL current languages are low level even if the programming community tries to say otherwise. Handling hardware is low level.

Why do languages even touch execution which is a technique used to get around current hardware limitations? What will happen when processors can execute seemingly infinite instructions at the exact same time? Execution control will no longer exist. More to the point, mixing execution control and data manipulation all in the same statement is a hard coded custom solution. You cannot reuse components that have been welded together in this fashion. Why do languages still use this archaic form of programming that can never lead to reuse?

There is a solution, but I'm just curious why the lack of will to move forward. Haven't you ever thought it strange that you can only have unidirectional control statements within functions? Why the skew? Have you not ever noticed that sending messages between computers doesn't work well in an RPC fashion because the other machine is not required to respond? So don't you think that perhaps there might be something wrong with the entire concept of functions and procedures in programming languages? Is not the difficulty of having multiple returns values a sign that something is seriously wrong?

So my main question is: Why are ALL programming languages so low level and more generally obsolete for general purpose use when all the signs are impossible to avoid? Why the refusal from language designers to move forward? Can they really not see the solution?

SuperGlue

Hi, I'd like to announce our paper "SuperGlue: Component Programming with Object-Oriented Signals," which will be presented at ECOOP next month. Its related to functional-reactive programming with a more declarative and object-oriented way of manipulating signals.

Abstract:

Abstract. The assembly of components that can handle continuously changing data results in programs that are more interactive. Unfortunately, the code that glues together such components is often difficult to write because it is exposed to many complicated event-handling details. This paper introduces the SuperGlue language where components are assembled by connecting their signals, which declaratively represent state as time-varying values. To support the construction of interactive programs that require an unbounded number of signal connections, signals in SuperGlue are scaled with object-oriented abstractions. With Super-Glue’s combination of signals and objects, programmers can build large interactive programs with substantially less glue code when compared to conventional approaches. For example, the SuperGlue implementation of an email client is around half the size of an equivalent Java implementation.

Paper available here here.

Common Lisp Exception Handling

I recently watched this Google TechTalk video by Peter Seibel on Common Lisp.

Not being very familiar with Lisp, I found it quite interesting, especially the part about how Lisp handles error conditions. If I understand correctly, rather than unwinding the stack immediately, it sends a signal back through which is picked up at a higher level. Then that higher level gets to tell the lower level how to proceed. This seems neat, since as the speaker points out, you don't lose your state while asking the higher levels how to handle the condition.

Since I have not come across this before, it raised a few questions:

How well does this work in practice and are there any caveats?
Could this have been used in more mainstream languages (Java, c#, etc)?
Do other languages handle exceptions in this way?

emulation for Erlang style processes

In The Problem With Threads Peter Van Roy wrote: The real problem is not threads as such; it is threads plus shared mutable state. To solve this problem, it's not necessary to throw away threads. It is sufficient to disallow mutable state shared between threads

The last few weeks I've been thinking about avoiding shared mutable state in new language dialects, while retaining convenient thread styles. Recently I had a fun idea which -- as an unplanned side effect -- seems to solve shared mutable state as a problem well enough for my purposes. Suddenly I told myself, hey, I'm going to get Erlang style processes out of this. Then when I googled a bit for references, I hit the LtU thread I cited above, which I'd read recently.

Note I'm unsure how to present the whole idea yet, which I'm still groveling over in a zillion pieces, like a car that's been totally disassembled. (It's hard for me to point at this pile and say, "This would be a nice car, if assembled." To you it might seem merely a pile of junk.) Note I'm not trying to build this up; I'm apologizing in advance for how lateral this likely sounds.

The disturbing part is I seem to have veered off the language path into OS territory without meaning to do so. I'm not sure. Maybe that's always what happens when you plan to embed VM systems in host operating systems. The VM world starts to sound an awful lot like an operating system by the time you start getting processes (with no shared mutable state between threads). Try to ignore this transgression into OS territory as a separate issue.

The reason I'm describing this here is because I think a lot of folks would like to get Erlang's safe and efficient use of many thousands of threads into other languages. So this is a tactic you might consider when brainstorming on the topic.

I was thinking about instruction sets for virtual machines, because recently I was going to post a story to the history department about the Smalltalk blue book, with a minor note of interest in the Smalltalk-80 VM instruction set. Then I got derailed thinking about instruction sets, and now I'm posting about Erlang processes instead.

The missing part in the middle, which is still spinning off distracting derivative ideas for me, was a notion of using Motorala 68K family 16-bit instruction sets as a VM instruction set, instead of something like the 8-bit Smalltalk-80 byte codes. One of several reasons for this idea is a rule of thumb: "don't invent things." If you consider a virtual assembler for a VM, don't invent an assembler (or, not until you succumb to exotic extensions). Instead, get a free ride on lots of existing books and tools for some existing, well-known, easy-to-understand instruction set. I happen to like m68k.

Of course, if you don't run on an actual m68k chip, this means emulation of some kind. It's the emulation that leads to a wonderful game making the Erlang process style easy. The emulation of a chip's machine code adds an indirection causing some extra layer cost, but suddenly making many other complex things easier and perhaps cheaper to do. My original motive was something like this: debug all the compilation issues under a simulation, and later worry about real machine code.

Then I noticed that sandboxing was going to be trivial. An emulator can manually memory map all space seen by the code run on the VM, so it's impossible for VM code to either read or write space unless the emulator wants to let it happen. So VM code simply can't touch other pages in the emulator's host process except those the emulator chooses to let appear in the virtual address space.

Then, it's easy to think about sharing readonly pages among separate VM processes, simply by letting the VM's emulator map pages into multiple address spaces, with suitable refcounting. And while you're at it, you can make inter process messaging cheap too, because it's not necessary to actually copy any bytes around in memory to deliver a message in the same host process; you can simply map parts of the host address space into a virtual address space of a VM process.

Also, when sychronization primitives for concurrency are done in the VM engine, then multitasking starts to look rather cooperative (read: cheap) in each VM process. And high level operations performed by a VM engine can be atomic from the perspective of each VM process.

As an example, suppose you mimic m68k machine code and utilize A-line traps so instructions beginning with hex $A are not defined by the emulated chip, and instead cause (emulated) trap exceptions to system defined high level routines. This is how the original Macintosh architecture used 68000 traps to implement the Mac toolbox in ROM very cheaply, since each A-line instruction was only 16-bits in size. Effectively you get to dispatch to an arbitrary late-binding method using only one 16-bit instruction.

On a real 68K chip, A-line traps are not cheap because because they cause exceptions with a lot of register maintenance costs as part of the dispatch to other 68K code somewhere else.

But when a 68K engine is emulated, traps are even faster than a normal dispatch in VM code, because it can be done directly in native code in the VM engine, without emulation. Complex system operations can run native in a VM engine, and atomically from the view of a VM process. If you wanted to implement a Scheme system with, say, around 250 primitive methods, each primitive could be a separate 16-bt A-line trap instruction running native in the VM engine. Since the leading $A consumes only 4 bits of an instruction, the other 12 bits are good for 4096 different traps, without needing to start a new VM engine. So you could implement a very large high level library as atomic VM engine operations this way, without needing any emulation for code in the library.

The last paragraph has little to do with Erlang processes per se. But it helps to illustrate the idea that a VM engine for a hosted language is basically the operating system from the point of the hosted language. The VM engine can use and manage space invisible to a VM process, and perform magic like high level atomic operations without a VM process knowing how or why.

None of this requires any particular language to work. It works with any high level language you want to implement on top of emulated machine code. So it seems to be a runtime idea only, and not a programming language specific idea, unless you count the emulated assembler as the language. Weirdly, you can even use different high level languages in different VM processes, without them being able to interfere with one another.

Here I'll stop suddenly because I mentioned all my main points, and I haven't finished thinking about all the details, and so lack any conclusion as yet.

Trying to get embedded python co-routines to work

Sorry if this has been talked about before, but I can't seem to find any resources on this site or with Google anywhere...

I'm trying to get embedded python co-routines to work, and thinking about what the best strategy would be, so I figured I'd ask if anyone had dealt with this before.

Basically what I'm looking to accomplish is a bunch of separate interpreter states (or threads I guess, if I can get that to work elegantly) that I schedule myself (i.e. when a certain process gets a tick, I tell the python thread to do a tick of its execution), with many sequential blocking functions for each process. So basically I want to have concurrently:

Process A trying to do actions P, Q, R
Process B trying to do actions X, Y, Z

where I schedule when each one gets a tick, and the responsibility for "am I done" checking is inside P, Q, etc, not in A or B

This would be a C application, with A and B being Python scripts and P, Q, etc being Python extensions written in C, possibly with Python wrappers...

Some ways I'm considering doing it:
- building off generators (no support of going to/from C or yielding the whole stack that I can find)
- throwing exceptions to escape to the top level and hacking continuations in
- using the debugging trace functionality accessing some shared data to allow me to step until I hit a blocking instruction (still requires nasty frame saving/restoring)
- somehow scheduling the python threads by myself (it seems like this should be trivial, but somehow I can't seem to find any interface for doing this... still searching through docs, I'm assuming it should be doable somehow...)

Does anyone have any experience with this or anything similar? I hope my descriptions are clear enough...

Thanks in advance,

Raoul

Relevance of Curry-Howard

Hi there,

I have read a lot about the Curry-Howard isomorphism, but its relevance is not yet clear to me. I know there are experts on this topic here in this board, so please enlighten me :-)

Technically, I understand the relation between simply-typed lambda calculus and constructive logic and how the isomorphism between proofs and programs works.

However, what does Curry-Howard mean in general? Why is it significant? Does it say anything about a setting different from simply-typed lambda calculus/constructive logic?

I understand that one interpretation of CH is this: If I have a function of type t -> t', then the function body is a constructive proof that it is possible to construct a t' value from a t value.

However, this "type inhabitation" question seems to have a trivial answer in almost any real programming language: You can construct a value/function for whatever type you want (e.g., in Haskell you can just use "undefined")

I have never seen any type in a program that expresses an interesting, non-trivial proposition, but maybe I have not looked closely enough?

Would it be possible to take an arbitrary mathematical theorem (let's say: "string concatenation is associative", or Fermat's little theorem) and express it as a type?

I am also still a bit confused about conceptually viewing a type as a proposition. I think of types (of functions) as partial specifications, whereby the type checker checks conformance of the implementation to this partial specification. How does this fit together?

Type checking and logical errors

A recent correspondent asks jokingly "I suppose static typing allows for solving the halting problem as well?". This reminded me of an article where Mark-Jason Dominus (the Perl guy) describes how the type checker in ML found an infinite loop bug in a simple sort routine - - well, "found" in the sense of "gave enough information for it to be apparent to anyone paying attention that something was wrong".

I wonder then whether this is a one-off, or whether people who use ML and similar languages frequently find that the type checker points them towards bugs that even a dynamic-typing-language programmer would regard as genuine - - that is to say, errors, like infinite loops, that are not on the face of it just type violations? Does anyone have other examples?

Language-spotting

I've been reading LtU for about a year, and have always found things here that I'd never find on my own -- but one thing I haven't yet found anywhere on the web is a comprehensive database of PLs and PL implementations. Not just a simple list, but with detailed entries that can be searched on multiple attributes.

There is The Language List which I've known about for awhile, but just has textual descriptions which are just one-liners for many languages. I've recently discovered HOPL, which explicitly represents relations between nodes (e.g. influenced-by, implementation-of) and has a search interface. There's even a taxonomy which is interesting, if a bit inscrutable. Even the venerable Wikipedia has good info if you're willing to follow the links.

These lists focus on history -- I'd like deeper info on the size and shape of the languages and the implementations. For instance, I'd like to ask how a question like: how many BASICs ran in 16K or less of memory? Or: which languages that can evaluate their own code at runtime existed before 1970?

I wonder if any attempts have been made to develop such a database, and if it's reasonable to do so. You could end up with quite a lot of attributes for each PL, and some of the choices would be pretty arbitrary (and perhaps controversial). I'd like to ask those who have seen attempts at this before -- would such a database be a useful resource, or just an obsessive-compulsive enterprise?

XML feed