New Dataflow Programming Language

I am now convinced that a new programming language is needed in order to get the most benefit from the data-flow programming model.

Until a reasonably good, uncluttered dataflow language implementation exists, no one is going to be convinced to jump on board. Implementations based on control-flow/imperative languages and written as libraries require that programmers write dataflow programs in an entirely non-intuitive way, which negates many of the benefits of the new way of thinking & working.

I believe that a new language could have enormous benefits that are currently difficult to imagine in control-flow ways of thinking - e.g. parallel/distributed processing for free, fault tolerance for free, IO almost free, exception handling almost free, authentication/security for free, and the massive reduction in application code that would result from freeing the programmer from having to code this functionality.

I am thinking about starting work on the design of a new language - is there anyone here interested in this idea?

Some of my ideas about the language features include:

1. The language would be modular, e.g. a dataflow "network" could be "encapsulated" into a "module", which could then be used as a module/process in other dataflow networks. Such modules could also alternatively contain traditional control-flow code, for those cases where that is the best way to implement the function/algorithm. Modularisation would mean that the language could implement both coarse-grained & fine-grained dataflow networks.

2. The language would be object-oriented, but not in the usual way as a syntactic source-code device. Objects would be the primary (only?) data flowing along the pipes between modules/processes, and the dedicated hidden executable code associated with those objects would be delivered transparently along the pipe with the data. This would allow (e.g.) a truly generic sort module to be written that could work with any data/object type without knowing anything about it.

3. A type system that defines higher-level types (e.g. time, hours, seconds, kilometers, miles, etc), which would also be extensible. It would also implement higher-level basic data types that are no longer dependent upon the underlying architecture of the CPU (the most important example being a "number" type that is unlimited in size and can hold unlimited decimal precision). This would enable us to get away from the endless problems caused by tying number types too closely to underlying hardware implementations of 32-bit vs 64-bit versions of integer vs floating-point types. I'm still thinking this through, but have decided to go for strong typing.

4. An extensible pipe type system which defines characteristics of a pipe, e.g. priority, push vs pull, serial vs parallel, ordered vs unordered. Pipe type definitions would allow the dataflow runtime engine to make sensible decisions automatically about the behaviour of the pipes & processes/modules. One example of such a decision would be whether all the objects in a data stream can be processed in parallel or whether they must be processed one-at-a-time in sequence. Another example would be to define pipes to be "time-critical" which would allow the dataflow language to be used in real-time applications, audio/video applications, device drivers, etc.

5. A text-based syntax, that translates unambiguously into a diagrammatic form, so that programming can be done either "visually" in diagrammatic form or traditionally using text editors. The new syntax would get rid of the "clutter" that comes with dataflow implementations based on control-flow/imperative languages. The language would need to have syntaxes for: a) dataflow network definition, b) data object definition, c) pipe definition, and d) a minimal control flow language that would probably look similar to Pascal/Modula2.
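As a rough illustration of idea 2 in the list above (a generic sort module that knows nothing about the objects it sorts), here is a minimal Python sketch. The `Distance` type and `sort_module` function are hypothetical names invented for this example; the only point is that the ordering logic travels with the object, not with the module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distance:
    """Hypothetical unit-aware value; its comparison code travels with it."""
    metres: float

    def __lt__(self, other: "Distance") -> bool:
        return self.metres < other.metres


def sort_module(stream):
    """Generic sort stage: relies only on the ordering the objects define."""
    return iter(sorted(stream))


km = Distance(1000.0)
mile = Distance(1609.344)
result = list(sort_module([mile, km]))
```

The sort stage never inspects `metres`; any other object type that defines its own ordering could flow through the same stage unchanged.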

A new dataflow programming language, if it was as successful as I hope, might also have the benefit of killing off C and its horde of mutant offspring, which I sincerely believe have held back the progress of software engineering for the last 20 years.

If you find any of this interesting, please feel free to contact me to discuss it in more detail.

Although I have been reading as much as I can find, I am still relatively new to the idea of programming language design, and would welcome references to existing work. I have no interest in re-inventing any wheels just for the fun of it.

Regards,
Mark Taylor.

[Minor edits for clarification and to correct some mistakes]

[meta] Formatting

Could you clean up the formatting of your post? (If the issues aren't clear, resize the text a bit.)

RE: formatting

Sorry about the bad formatting...

I originally wrote this in a text editor in word wrap mode, and it was only after posting that I began to suspect that the editor was inserting hard end-of-line characters while doing the wrapping.

Now fixed... I hope.

Which Brand of Dataflow?

I know many distinct models of programming called 'dataflow', such that the term loses its value. The term is used for multiple stream processing systems, spreadsheets and reactive models ('reactive' is another overloaded word), and even some forms of message-passing systems (flow based programming, Kahn process networks).

Could you please clarify what you mean by 'dataflow'? Doing so will give me a better context in which to grasp your original post.

RE: Which brand of dataflow

I'm not sure... I still have a lot to figure out. Although primarily Dataflow, it will probably combine many different ideas, some of which will not be Dataflow. I do have a list of things I consider important, most of which I touched on briefly above.

I'm fairly new to Dataflow. The idea came to me over the last year or two, from unix pipes, and also Pure Data by Miller Puckette (a visual dataflow application/programming environment that allows the user to build audio synthesisers programmatically). I suspected that the idea must have been around for many years, and I spent many weeks googling before I was able to establish that it was called "Dataflow", and I'm still reading/learning now.

Number unlimited in size?

I have always thought that this is a bad idea: your memory is finite, so if you don't limit your numbers then you could get a memory error (which depends on the computer you use) instead of a reproducible (portable) overflow.

That said, this is probably not the most important part of your language.

Arbitrary Precision

I take it you also dislike arbitrary-length strings, vectors, binaries, and query result sets...

In the embedded and real-time domains, it is important to have fixed sized numbers and strings and such. But I don't believe they should be the default for most domains.

RE: Number unlimited in size

well, maybe not *actually* unlimited, but with capacity to hold extremely large numbers. all I am suggesting really, is a number type whose size/limits are dynamic like most string type implementations.

this is not a new idea - there are already libraries that implement this (e.g. bigint).
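For readers who have not used such a type, Python's built-in `int` and the standard-library `fractions.Fraction` behave roughly this way; the sketch below is only an illustration of the idea, not a proposal for how the new language would implement it.

```python
from fractions import Fraction

# Python ints grow as needed; there is no fixed 32-bit or 64-bit limit.
big = 2 ** 200

# Fractions are exact rationals, so there is no binary rounding error.
third = Fraction(1, 3)
total = third + third + third

# Hardware floats, by contrast, round: 0.1 + 0.2 is not exactly 0.3.
float_is_exact = (0.1 + 0.2 == 0.3)
```

The size of `big` is limited only by available memory, which is the trade-off raised in the comment above.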

sorry - I forgot I already replied...

sorry... I forgot I already replied...

The ptolemy II project and

The ptolemy II project and LINQ, up to the heavy type system, seem fairly close to this. In particular, both support OO and various modes of data flow (though I'm not sure how LINQ integrates, e.g., calls to Rx).

I don't understand why types for numerics are fundamental to this discussion, nor do I really want to discuss syntax.

The ptolemy II project

RE: The ptolemy II project... thanks for the reference, I shall follow that up.

Dataflow Language Design

My understanding is that LtU admins would prefer us take language design discussions to other sites, though we can come back to LtU to discuss articles about the resulting design. The PiLuD google group would be a decent place to take a detailed design discussion. That said, LtU is a decent place to grab interest.

Anyhow, I do have some interest in a new dataflow language, though (as you'll find no matter where you go) I have my own ideas about how to best approach the problem.

I'm actively working on a new reactive-dataflow computation model: Reactive Demand Programming (RDP). The short summary: an RDP program is a set of agents that maintain continuous, parameterized demand-response relationships with one another. They will react to changes in demands or responses. Sensors, actuators, UI and such are represented as agents that influence or are influenced by the real world. Systems are 'demand-driven'. All communications are idempotent and REST-ful (in both directions). RDP is, essentially, functional reactive programming with identity and bi-directional communication. The RDP model promises many very nice properties for open distributed computing and command-and-control systems, both of which are in my target domain. Providing a list of these properties makes me feel like I'm tooting a horn and need to justify myself, so I'll avoid doing so here.

I'm currently spending my evenings developing and implementing a model of RDP in Haskell. (I chose Haskell due to its very effective concurrency support, and its STM that helps me control glitches when working with agents that interact with the real world.) I also have plans for a new programming language to effectively leverage this model, which I'll also implement in Haskell (with advice from Staged Tagless Interpreters.)

Getting back onto the subject of your ideas...

how can the average programmer jump on board until this happens?

I believe that good libraries for popular languages would be a better place to start. Average programmers simply won't bother with a new language unless they can easily integrate it or leverage it for whatever project they're currently developing.

Just as critical are good adaptors to popular libraries and foreign function interfaces (such as multi-media, UI, DDS, HTTP, SMS, filesystem, database connectivity). It is unfortunate, but most FFIs are designed for imperative programming and need considerable adaptation (with careful handling of threads and synchronous IO) to work effectively with a reactive or dataflow programming model. Adaptors should automatically leverage SQL query notifications where possible. A filesystem adaptor should use the file update notifications where available (such as on Windows). The UI should redraw with good latency but do so efficiently - not redraw when unnecessary.

The language implementation really needs such libraries to adapt the OS and the real world in any case. The language implementation is mostly to 'glue together' the adaptor libraries. Even a scripting language would work reasonably well for this purpose.

fault tolerance for free, IO almost free, exception handling almost free, authentication/security for free

Developers don't get fault-tolerance for free, though I'll grant that reactive programming models make it easy to express 'fallbacks' and support resilient recovery when the faulty resource recovers.

For IO, there are many different models of IO... some will be easy to adapt to reactive/dataflow, some more difficult. There are many gotchas to be careful of: time-space leaks, glitches, closed feedback loops, ensuring progress. Indeterminism can be okay, but leads to testing and verification issues.

Authentication and security certainly aren't free, though you could create a capability-secure dataflow language. An interesting challenge with dataflow languages is expressing the Waterken IOU protocol (negotiating various e-trades).

a dataflow "network" could be "encapsulated" into a "module", which could then be used as a module/process in other dataflow

Sounds good. I've something similar planned for my RDP language, though I call these modules 'agent configurations' and describe and parameterize them using pure functions.

Sealer/unsealer pairs are an interesting and powerful basis for first-class ADTs (and unique hidden types) in a distributed system. Thus, I can use agent configurations as typed first-class modules if necessary.

Objects would be the primary (only?) data flowing along the pipes between modules/processes, and the [method] code would be delivered transparently along the pipe with the data.

You'll need to be careful here: it is very difficult to reason about progress, safety, timing properties, resource consumption, and security in the presence of arbitrary code. Further, synchronization issues arise when the same object must be passed to two different elements.

The ability to pass object references and perform code-distribution is very useful and powerful, but I suggest caution about making it the primary basis of distribution. I think Erlang and Clojure have a better model here: the data transmitted between elements is immutable structure by default, but it is possible to pass object references or first-class functions if necessary.
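The "immutable by default" suggestion can be sketched in Python with a frozen dataclass travelling through a standard `queue.Queue`. The `Reading` type and the pipe are hypothetical, chosen only for illustration; this is not how Erlang or Clojure implement it.

```python
from dataclasses import dataclass, FrozenInstanceError
from queue import Queue


@dataclass(frozen=True)
class Reading:
    """Hypothetical message type: plain immutable data, no behaviour attached."""
    sensor: str
    value: float


pipe: Queue = Queue()
pipe.put(Reading("thermo", 21.5))

msg = pipe.get()
try:
    msg.value = 99.0          # a downstream element tries to mutate the message
    mutated = True
except FrozenInstanceError:   # frozen dataclasses reject attribute assignment
    mutated = False
```

Because the receiver cannot change the message, the same object can safely be handed to two different elements without synchronization.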

extensible pipe type system which defines characteristics of a pipe, e.g. priority, push vs pull, serial vs parallel, ordered vs unordered

Don't forget reliable vs unreliable transmission, bounded buffer, encryption, one-to-many (multi-cast), many-to-one, many-to-many (bus), transfer of ownership, fifo vs. filo, priority queue pipes, quality-of-service, support for expiring messages, and so on.

It is easy to fall into a trap here. I did similarly a few years back.

I suggest you model pipes within your dataflow language, as a library. The ability to handle these various concerns without pushing them into under-the-hood runtime magic would be a decent test of your language. If your language is insufficient, your fix will be more widely applicable than to pipes.
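A minimal sketch of what "pipes as a library" might look like, written in Python on top of the standard `queue` module. The class names and the `send`/`receive` interface are assumptions made for this example; the point is that bounded-buffer and priority behaviour are ordinary library code, not runtime magic.

```python
import queue


class FifoPipe:
    """Bounded, ordered pipe: send() reports failure instead of blocking."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, item, priority: int = 0) -> bool:
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False

    def receive(self):
        return self._q.get_nowait()


class PriorityPipe:
    """Unbounded pipe that delivers the lowest priority number first."""

    def __init__(self):
        self._q = queue.PriorityQueue()
        self._seq = 0  # tie-breaker so items themselves are never compared

    def send(self, item, priority: int = 0) -> bool:
        self._q.put_nowait((priority, self._seq, item))
        self._seq += 1
        return True

    def receive(self):
        return self._q.get_nowait()[2]


fifo = FifoPipe(capacity=2)
accepted = [fifo.send("a"), fifo.send("b"), fifo.send("c")]

urgent = PriorityPipe()
urgent.send("low", priority=5)
urgent.send("high", priority=1)
first_out = urgent.receive()
```

Both classes expose the same interface, so a process written against `send`/`receive` does not need to know which kind of pipe it is attached to.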

A new text-based syntax, that translates unambiguously into a diagrammatic form

Neat idea.

Though, a diagrammatic form for describing configurations does make it difficult to parameterize the module in a manner that affects the absolute number of nodes constructed.

I've also considered a diagrammatic approach. My decision was to use an EDSL to describe the diagram, which could be passed to a function that would translate it into a proper agent configuration.

if it was as successful as I hope, might also have the benefit of killing off C and its horde of mutant offspring

I think many a new language designer hopes for this level of success. I don't think you'll need to kill off C; C's offspring will eventually eat it alive.

Dataflow

The success of any given language has little to do with its qualities. Make the language something YOU like to use, and use it - then be happy if others take it up and find it useful.

As everyone else here, I am also designing a language - however I feel that language designers these days are infected by the multiple paradigm bug, so language design becomes about features:

Should language X have closures, generics, tail recursion, this or that type system?

I hope to make a language with very few features, but powerful enough to simulate most of them (read: in library form). One of my basic ideas is that the library is also a tutorial, so the user can check the library source to see how advanced things are done - as an answer to your ideas concerning

the average programmer

I like to think about dataflow as electronic circuitry. Everything is components that can be opened and see how it is constructed: A PC can be opened to reveal the CPU, RAM, GPU etc. The CPU can be opened to reveal the Clock generator, Cache, Program Counter, ALU etc. The ALU can be opened to reveal the Full adder etc. All the way down to the nand gates.

I would love a language building on ideas like this, since it's difficult in OO to use objects as components to build new objects.

Hope you have something like that in mind.

I like to think about

I like to think about dataflow as electronic circuitry. Everything is components that can be opened and see how it is constructed: A PC can be opened to reveal the CPU, RAM, GPU etc. The CPU can be opened to reveal the Clock generator, Cache, Program Counter, ALU etc. The ALU can be opened to reveal the Full adder etc. All the way down to the nand gates.

LittleBigPlanet 2 will support circuit boards and microchips. Yes, it is a game, but they are turning their tools more into full blown programming languages.

RE:dataflow as electronic circuitry...

RE "dataflow as electronic circuitry. Everything is components that can be opened and see how it is constructed"

This sounds exactly like what I had in mind. One of my ideas that I didn't touch on above is that one would be able to view the entire network as a graphical dataflow diagram which would be dynamically generated from the network itself. Such a diagram could function as a debugging tool, or a monitoring tool, and would also make the network self-documenting.

Another underlying idea of mine (but I'm sure will not be original to me) is to make software construction like using a Lego toy set, with components that provide both low-level & higher-level functions, and which would have a uniform standard API so they could be plugged anywhere into the network.

I am also hoping/confident that a suitably well-designed Dataflow language could also be used for low-level programming like device drivers, realtime systems, graphical UIs, and even operating systems; and that it would also be self-hosting/self-compiling.

RE: dataflow as electronics circuitry

FYI, you might be interested in some of my ideas/work at http://www.visualframeworksinc.com (esp. the papers section). I've been successfully using the reactive, electronics-like event paradigm for a while.

Dataflow as electronic

Dataflow as electronic circuitry isn't as easy to scale into complex programs as you might be imagining. Issues such as concurrency and timing remain full of gotchas. Races and glitches are a problem even down at the hardware level - which is a big reason that 'clocked' CPUs exist: to provide stability and arbitration. (We could develop asynchronous CPUs. Karl Fant certainly advocates doing so.)

The description of dataflow on Wikipedia describes spreadsheets as a widespread example. Spreadsheet cells describe either values or functions of other cells. This doesn't admit to an internal model of 'time' - though you could create a cell and put the time in it. (With a decent functional language, you could feasibly generate cells whose values are first-class functions of continuous time.)
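To make the spreadsheet model concrete, here is a toy Python sketch in which a cell holds either a plain value or a function of other cells. The `Sheet` class is hypothetical, and it recomputes on every read instead of tracking dependencies as a real spreadsheet would.

```python
class Sheet:
    """Toy spreadsheet: a cell is either a value or a formula over the sheet."""

    def __init__(self):
        self._cells = {}

    def set(self, name, value_or_formula):
        self._cells[name] = value_or_formula

    def get(self, name):
        cell = self._cells[name]
        # Formulas are callables that read other cells through the sheet.
        return cell(self) if callable(cell) else cell


sheet = Sheet()
sheet.set("A1", 2)
sheet.set("A2", 3)
sheet.set("A3", lambda s: s.get("A1") + s.get("A2"))

before = sheet.get("A3")
sheet.set("A1", 10)        # changing an input changes every dependent cell
after = sheet.get("A3")
```

Note that nothing in this model mentions time or ordering: `A3` is simply defined in terms of `A1` and `A2`.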

Dataflow definition

As is often the case, the definition of a term is an issue. I can see that Wikipedia defines dataflow rather differently than I do. My view comes mainly from dataflow diagrams, but there is nothing necessarily reactive about that. JPM's book on dataflow is a lot more like that, although his dataflow model is built from small polling and pushing modules written in a "normal" programming language.
As a note on scaling and error handling, I have identified only one general mechanism for handling errors, which is timeout.
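A minimal sketch of timeout as the error-handling mechanism, assuming pipes are modelled as Python `queue.Queue` objects. The helper name `receive_or_fallback` is invented for this example.

```python
import queue


def receive_or_fallback(pipe: queue.Queue, timeout_s: float, fallback):
    """Wait for upstream data; if none arrives in time, treat it as a fault."""
    try:
        return pipe.get(timeout=timeout_s)
    except queue.Empty:
        return fallback


silent = queue.Queue()                     # upstream never sends anything
result = receive_or_fallback(silent, 0.05, "upstream timed out")

healthy = queue.Queue()
healthy.put(42)
value = receive_or_fallback(healthy, 0.05, "upstream timed out")
```

The receiver cannot tell a crashed upstream component from a slow one; the timeout turns both into the same observable event.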

FBP

JPM's work is commonly called 'flow-based programming', since 1994. It was called 'data flow' in older literature.

My comment re. electric circuitry still applies, though. FBP is successful at putting together a small number of very large black boxes. Understanding concurrency, delay, and fault tolerance all become huge issues as you scale to large numbers of components. FBP is not something you could feasibly use down to the nand-gate level.

FBP

You are correct about the terminology, but as I also stated, I'm working on something similar. Specifically, it tries to deal with the feasibility issue. It shares a lot of concepts with functional programming to deal with the concurrency issues. I suspect that I'll end up with something like Arrows, although I can't really get a grip on them.
I really like functional programming, but its "single value output/return value" is the primary "obstacle" for building components (flamebait?) :-)

Regarding hardware dataflow model

I was part of a team that tried to develop a working dataflow computer (I did mostly engineering tasks - Fortran parsing, modeling, etc. - but I figured out an interesting approach to reducing the amount of content-addressable memory needed for correct operation of a dataflow machine).

Our group leaders came up with the notion of a computation space. This is the address space of all of a program's computation points, so that one program point can send its result to the input of some other program point.

For example:
1: a = 1+3;
2: c = f(x,y,z);
3: g();
4: b = a + c;

The result (the number 4) from program point 1 will be sent to the left operand of the + at point 4. The right operand comes from the call to f(x,y,z), and when both have arrived in the global CAM we can produce the result for b and send it on according to the dependencies on b. This can occur even before g() returns.
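The operand-matching behaviour described above can be sketched in Python. `MatchingStore` is a hypothetical, software-only stand-in for the content-addressable memory; the value 10 used for c is made up, since the example does not define f.

```python
class MatchingStore:
    """Toy stand-in for the CAM: holds operands until a point can fire."""

    def __init__(self):
        self._waiting = {}   # point -> {port index: value}
        self.fired = []      # (point, result) pairs, in firing order

    def send(self, point, port, value, arity, op):
        operands = self._waiting.setdefault(point, {})
        operands[port] = value
        if len(operands) < arity:
            return None                      # still waiting for other inputs
        del self._waiting[point]
        result = op(*(operands[i] for i in range(arity)))
        self.fired.append((point, result))
        return result


def add(left, right):
    return left + right


store = MatchingStore()
# The result of "a = 1+3" arrives at the left port of the point computing b.
first = store.send("b", 0, 1 + 3, 2, add)
# The result of f(x,y,z) arrives later (10 is a hypothetical value for c).
second = store.send("b", 1, 10, 2, add)
```

The point computing b fires as soon as its second operand arrives, regardless of whether unrelated statements such as g() have finished.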

Call graphs provide paths to current code points:
f(x)
  if (x < 1)
    1: return 1;
  if (odd x)
    2: return 1+f(x/2);
  else
    3: return 2+f(x/2);

f(4) will reach path [3,3,2,1] (for x=4, x=2, x=1 and x=0), since 4 and 2 are even and 1 is odd.
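The path for any argument can be traced mechanically. The Python sketch below assumes that the base case fires when x is less than 1 and that x/2 is integer division, neither of which the post states explicitly; `call_path` is a name invented here.

```python
def call_path(x: int) -> list:
    """Record which labelled return point each nested call of f reaches."""
    path = []
    while True:
        if x < 1:
            path.append(1)          # point 1: base case, recursion ends
            return path
        # point 2 is taken for odd x, point 3 for even x
        path.append(2 if x % 2 == 1 else 3)
        x //= 2                     # assumed integer division


path_for_4 = call_path(4)
```

Each entry in the list identifies one level of the call chain, so the whole list acts as an address for the innermost computation point.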

Loops inside functions always carry their iteration counts, so that we can refer to computation elements in the future.

This integrates quite well with memory: memory cells are computation elements with static addresses. You can send requests to the "write" and "read" ports of each memory address. The write input for a specific memory address takes a value and a computation space address at which to activate the following sequence of computations; the read input takes the computation address to which a copy of the value will be sent.

(actually, this is how self-synchronous register memory works)

If you are good enough to compute the complete dynamic dataflow network between computation space elements, you can erase memory accesses - they are there only for compatibility and sequencing. Once you have erased all memory accesses and changed them into send statements, you reach the theoretical limit of parallel operations for the given problem.

It was so easy (in our models) to create all 10^9 operations for a 1000x1000 matrix multiplication at once that we had to create means to throttle parallelism. And that, actually, was the main goal of our research.
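The 10^9 figure is the number of scalar multiplications in a naive 1000x1000 product (1000 x 1000 x 1000). The post does not say how the team throttled parallelism; the Python sketch below shows one generic approach, a bounded worker pool, purely as an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_throttled(a, b, max_parallel):
    """Compute a*b with one task per output cell, at most max_parallel at once."""
    n, m, p = len(a), len(b), len(b[0])

    def cell(ij):
        i, j = ij
        return sum(a[i][k] * b[k][j] for k in range(m))

    cells = [(i, j) for i in range(n) for j in range(p)]
    # The pool size caps how many cell tasks are in flight simultaneously.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        flat = list(pool.map(cell, cells))   # map() preserves input order
    return [flat[i * p:(i + 1) * p] for i in range(n)]


ops = 1000 ** 3   # multiplications in a naive 1000x1000 matrix product
product = matmul_throttled([[1, 2], [3, 4]], [[5, 6], [7, 8]], max_parallel=2)
```

All the tasks are independent, so the available parallelism is limited only by the throttle, which is the situation the post describes.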

So I advocate shifting the focus of attention from tubes to computation space, from static (or semi-static) dataflow to dynamic dataflow. It is only slightly harder to program, but the gains in computation speed could be enormous.

i wish

i grokked that.

dataflow for VLIW?

I agree that the parent post is rather unclear, but it is common knowledge that pure dataflow is uniquely able to extract very fine-grained parallelism from the code.

I wonder if anyone has sought to combine pure dataflow with a VLIW target, since the whole concept of VLIW is predicated on the compiler extracting as much ILP (instruction-level parallelism) as possible from the code and scheduling execution accordingly.

There are other problems with VLIW, such as the memory hierarchy introducing unpredictable delays, but these could be addressed by deterministic caching, i.e. prefetching.

Long live the king..

>> if it was as successful as I hope, might also have the benefit of killing off C and its horde of mutant offspring
> I think many a new language designer hopes for this level of success. I don't think you'll need to kill off C; C's offspring will eventually eat it alive.

:-)
Note that if any of C's offspring kills C, it's likely that the offspring will be even harder to kill..

LtU admins would prefer language design elsewhere

"My understanding is that LtU admins would prefer us take language design discussions to other sites, though we can come back to LtU to discuss articles about the resulting design."

It was not my intention to conduct the design of a new language in discussions here on LtU, I was just trying to find out if:

a) anyone shared any of my ideas & was interested in some kind of collaboration.

b) anyone had any pointers to existing stuff that I ought to read.

"The PiLuD google group would be a decent place to take a detailed design discussion."

Thanks for that reference, I will follow it up.

RE: Reactive Demand Programming

thanks for the reference to your work, I will follow that up.

BTW - I don't mind you *tooting your horn*, it means that:

a) you are possibly thinking along similar lines & don't mind discussing ideas, and

b) your work might have some good ideas that I hadn't thought of yet, and can steal.

:)

Sounds like Flow-Based Programming

Circa 1994 J. Paul Morrison published a book titled Flow-Based Programming. A second edition was released this year. See Mr. Morrison's Flow-Based Programming where much of the first edition of the book is available on-line.

RE: Flow-Based Programming by J Paul Morrison

I have already read the online version of Paul's book; but thanks for the reference.

How Does Your Proposed Language Differ From His?

How Does Your Proposed Language Differ From His?

RE How Does Your Proposed Language Differ From His?

The main thrust of my point is that a new programming language is needed, with a syntax designed specially for dataflow. Paul Morrison's FBP is a library implemented in a variety of traditional imperative/control-flow languages, and any program using it will contain imperative/control-flow "clutter" that I think could disappear in a specially designed language.

I am discussing with Paul at this very moment, on the google group flow-based-programming, exactly what the differences are between our ideas.

Updated 2009

JP Morrison published a 2nd edition in 2009 with updated examples and more implementation advice based on real world usage of FBP, as well as attempts to make connections to other paradigms. It is a lot clearer, even though JP's explanations tend to be very good in the first edition.

Modularisation requires interfaces

1. The language would be modular, e.g. a dataflow "network" could be "encapsulated" into a "module", which could then be used as a module/process in other dataflow networks.

This is a good idea in general, but encapsulation is useful only insofar as the result is something with a well-defined and simple interface. Unix pipes work so well because all the shell needs to know about the processes it combines is that they input something from their stdin handle and output something on their stdout.
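The same "one input stream, one output stream" interface can be sketched with Python generators. All names below are hypothetical; the point is that the combinator needs to know nothing about a stage except that it maps a stream to a stream.

```python
def numbers(limit):
    """Source: emits 0 .. limit-1."""
    for n in range(limit):
        yield n


def only_even(stream):
    """Filter stage: passes even numbers through."""
    for n in stream:
        if n % 2 == 0:
            yield n


def squared(stream):
    """Transform stage: squares each item."""
    for n in stream:
        yield n * n


def pipeline(source, *stages):
    """Shell-like combinator: feed each stage's output to the next stage."""
    for stage in stages:
        source = stage(source)
    return source


def even_squares(stream):
    """A whole network encapsulated as a single stage with the same interface."""
    return pipeline(stream, only_even, squared)


result = list(pipeline(numbers(6), only_even, squared))
nested = list(pipeline(numbers(6), even_squares))
```

Because `even_squares` has the same shape as any primitive stage, an encapsulated network can be dropped into another pipeline unchanged.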

I believe that a critical piece of design for any data-flow language/framework/library is a predefined, well-supported set of types of black-box components. That's not to say that other components can't be supported: a Unix process can write to as many handles as it likes, it's just that the shell won't help you combine their data flows. But the language must give you tools not only to encapsulate a network into a modular component, but to combine these components.

I've been playing with a project on these lines myself; its home page provides some thoughts and links you may find useful. I don't think it's going to suit you exactly, as I have no intention of extending it to a general-purpose language, but you may be able to use its ideas on the set of component types and their combinators.

Typed Dataflows

I agree, Mario. Supporting structured types between black-box components is useful. Static or link-time validation of the types is valuable. Structural typing is a useful option. Various solutions for representing these types do exist (XML, JSON, YAML, XPL/TRP, OMG IDL).

That said, I think the 'interfaces' relevant to the OP's modularization are more along the lines of (foo:in A, bar:in B, baz:out A) and hooking these 'handles' together.
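A minimal Python sketch of such an interface, with typed, directed ports and a check performed when two handles are hooked together. The `Port` type and `connect` function are hypothetical, and Python classes stand in for the types A and B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """One named handle on a module, e.g. foo:in A or baz:out A."""
    name: str
    direction: str   # "in" or "out"
    type_: type


def connect(src: Port, dst: Port):
    """Validate a link at 'link time', before any data flows."""
    if src.direction != "out" or dst.direction != "in":
        raise ValueError(f"cannot connect {src.name} -> {dst.name}: wrong direction")
    if src.type_ is not dst.type_:
        raise TypeError(f"cannot connect {src.name} -> {dst.name}: type mismatch")
    return (src.name, dst.name)


foo = Port("foo", "in", int)    # foo:in A  (A modelled as int)
bar = Port("bar", "in", str)    # bar:in B  (B modelled as str)
baz = Port("baz", "out", int)   # baz:out A

link = connect(baz, foo)
try:
    connect(baz, bar)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

A real system would likely use structural rather than identity-based type comparison; the identity check here just keeps the sketch short.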

The interesting challenge (IMO) is supporting transport of handles... such that dataflows become first-class and you are not bound at runtime to a fixed dataflow structure. This is important for flexibility, modularity, and scalability of the dataflow system - for example, allowing new elements to register themselves. And if you plan to scale, then security - such that you can operate on an open network (rather than a VPN subject to insider attacks and buggy software) - is also a useful feature.

clueless question

might this look like pub/sub buses or tuple spaces? sounds sorta like they'd give one the dynamism you mentioned.

Pub/sub buses, at a very

Pub/sub buses, at a very fine granularity, converge to reactive programming. As larger 'architectures' they have nice modularity and dynamism properties, but do not enable secure composition.

Without secure composition, you lose a great deal of dynamism to administrative barriers - firewalls, VPNs. It is onerous and error-prone to pass 'handles' (callbacks, references, etc.) through these barriers - you end up writing a lot of adaptor code and little languages, busting holes in firewalls, mucking with obscure configuration parameters, facing GC issues, etc.

Even if we had support for secure references and communication, pub/sub buses and tuple spaces have nothing to reference. They would need to exist at a much finer granularity to enable secure composition. Databases have the same issue.

Re: new dataflow PL for the "average programmer"

I can see many problems with your post, though this is to be expected since you state that you have only discovered the idea "over the last year or two" and are still learning about it. You may be interested in some recent theoretical work which attempts to elucidate and extend the formal foundations of "visual flow diagrams" (which obviously encompasses graphical/diagrammatic dataflow), based on string diagrams in monoidal categories. This MathOverflow.net link is a fairly comprehensive survey. (Start with John Baez's "Rosetta Stone", and follow up with Selinger's survey if you have some category theoretical background.)

One problem with using graphical notation to describe general-purpose programs is that you're limited to choosing either data flow or control flow (i.e. flowchart) as the "real" graphical notation, and the other must generally be supported by using kludges, if at all. (For instance, the LabVIEW system chooses to express choice and branching by providing 'alternate diagrams' for a single "case" structure.) In a sense, data flow and control flow are categorically dual to each other (see E. S. Bainbridge 1976, "Feedback and generalized logic"). Proof nets (a.k.a. interaction nets) might be able to express both on the same diagram, but these are much more complicated than simple dataflow.

Further, it is not clear to what extent graphical dataflow might be able to express high-level or impure features (such as exception handling or IO) "for free". However string diagrams have been extended to express functors and monads, and Dan Piponi has published a similar notation for the case of commutative monads.

Added: Just wanted to point out the following paper: Wesley M. Johnston et al., Advances in Dataflow Programming Languages which has been mentioned by ad1mt on the flow-based programming group. The paper discusses current open problems in dataflow, including representation of control flow, complex data structures and nondeterminism.

EFFBDs and Behavior Diagrams

One problem with using graphical notation to describe general-purpose programs is that you're limited to choosing either data flow or control flow (i.e. flowchart) as the "real" graphical notation, and the other must generally be supported by using kludges, if at all.

EFFBDs and Mack Alford's "Behavior Diagrams" both intermix control flow and dataflow in a single diagram (see here for an explanation of both). Of course, neither is actually a programming language (more a modeling language), and both result in some fairly complex diagrams.

Kludgy notation: a case in point?

EFFBDs and Mack Alford's "Behavior Diagrams" both intermix control flow and dataflow in a single diagram

Thanks for the interesting link. The problem with these kinds of notations is that they don't really have a formal description. The first notation presented in the paper (FFBD) is fairly complex, due to having separate AND and OR nodes which must be matched in a well-formed diagram--the latter for branching and merging, the former (apparently) for shared-state concurrency. This involves a slight ambiguity already, since one cannot easily tell whether two juxtaposed edges or functions might be active concurrently. Also, in theory, the "multiple-exit function" symbol could represent a function which creates concurrent threads, rather than one with multiple exit branches.

But there are further problems with the EFFBD language, which attempts to overlay dataflow onto the original control flow diagram. Specifically, the multiple edges flowing out of the Data 2 node and into the Data 3 node are only needed to pass over a branch and merge point (respectively) in the control flow graph. Fundamentally, this seems to conflict with the dataflow convention--multiple edges "should" represent multiple data flows. Also: (these may be problems with the toy example, though) what happens to the Data 5 input when cc#1 is taken? What is the semantics of dataflow inputs and outputs for the "function in iterate" which will be called 3 times?

mathoverflow.net

RE: mathoverflow.net

thanks for the reference, I will follow this up.

Killing dataflows

I am now convinced that a new programming language is needed in order to get the most benefit from the data-flow programming model.

Don't believe this. Oracle just killed JavaFX Script, which had gone very deep into dataflow-oriented programming by means of its "bind" operator. In reaction, thousands of Java programmers are relieved and dancing in the streets because they can continue using their beloved ActionListener. Now Oracle has fans.

Whereas Microsoft also seems

Whereas Microsoft also seems to be de-emphasizing general databind in WPF with the drastically neutered version provided in Silverlight. They never had nice syntax for bind like JavaFX did, but it was easy enough to bring out (e.g., in Bling).

Whereas Microsoft also seems

Whereas Microsoft also seems to be de-emphasizing general databind in WPF with the drastically neutered version provided in Silverlight.

Please explain how Silverlight's data binding is weaker.

Quoting Shawn Wildermuth, Data Binding Changes in Silverlight 4:

Prior to Silverlight 4, in order to support data binding, the object had to derive from the FrameworkElement class which left out some key objects including Transformations. Now data binding works on any object that derives from DependencyObject (which is most of the Silverlight 4 framework).

Silverlight 4 is arguably more expressive than the original WPF 3 binding framework, since it allows the programmer to map null references to behaviors via TargetNullValue and FallbackValue.

The real issue is that both Silverlight 4 and WPF version X.y.z are ultimately ugly (especially examples written in XAML) when compared to how one might express such concepts in F#. I also pointed out to members of the Silverlight/WPF core team the big design flaw in having DependencyObject.UnsetValue map to new Object(); -- this should map so that DependencyObject.UnsetValue returns something like a DependencyObjectUnsetValue.GetSingleton(). The backfix to the propagation model, of internalizing the model logic into the metamodel and exposing attributes to manipulate those primitives, seems backwards to me. But it is the only backward/forward-compatible way to make the change.

Silverlight 4 does not

Silverlight 4 does not support MultiBinding. Life sucks without MultiBinding: you can't write something simple like a.bind = b + c.

The WPF team's design was OK. There are definitely some flaws, but give them credit for avoiding a lot more; the framework is definitely very usable and better than anything else out there. I agree that the databinding engine isn't as powerful or robust as it could be, but it works for me 95% of the time. Silverlight is a poorly thought out hack of WPF; I'm surprised at what they decided to remove and how much the removed components hurt (multibinding).

You don't need MultiBinding for that

You just have to curry your singular Binding forms. Are there special performance advantages to MultiBinding that I am unaware of? (I've never tried to measure the differences in these approaches.) I would think that since you are lifting everything to an expression tree that the differences don't really matter. My intuition is based on the fact that the Blendable Behaviors extensions that can be provided to any object as an attached property generalized the MultiBinding mechanism and other mechanisms like Storyboards. The only downside to Blendable Behaviors is that they are not closed under composition, but what frameworks support that feature statically, anyway?

I've largely solved these problems by avoiding them by coming up with simple idioms that map back and forth from/to WPF to/from Silverlight -- but I also don't do physics-like rendering tricks like Bling.

You cannot curry singular

You cannot curry singular binding forms, you can only convert expressions that depend on exactly one dependency property. The only way to do multi-binding is a huge hack, like this, which has perf issues.

Behaviors are very interesting, sort of like declarative boosters for existing object trees. I guess that they aren't using databinding, but rather manual event handling to keep things consistent.

Bling Physics would work perfectly fine in Silverlight, as the physics engine comes with its own binding engine and doesn't track changes in dependencies (everything is refreshed on every timestep, not efficient but kind of makes sense for physics).
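
To make the "refresh everything on every timestep" idea concrete, here is a minimal sketch of a pull-style update loop that does no dependency tracking at all. It is not Bling's actual engine; the names `step`, `rules`, and `DT` and the falling-body rules are assumptions made up for illustration.

```python
DT = 0.1  # hypothetical fixed timestep, in seconds


def step(state, rules):
    """Recompute every derived value from the previous state.

    Nothing is tracked: every rule runs on every timestep, whether or
    not its inputs changed.
    """
    new_state = dict(state)
    for name, rule in rules.items():
        new_state[name] = rule(state)
    return new_state


# Hypothetical rules for a falling body (explicit Euler integration).
rules = {
    "velocity": lambda s: s["velocity"] - 9.8 * DT,
    "position": lambda s: s["position"] + s["velocity"] * DT,
}

state = {"position": 10.0, "velocity": 0.0}
for _ in range(3):
    state = step(state, rules)
```

The cost is proportional to the number of rules on every step, which is wasteful for a mostly static UI but reasonable when nearly everything changes each frame anyway.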

The only way to do

The only way to do multi-binding is a huge hack, like this, which has perf issues.

Do you have some sample code that demonstrates the performance bottleneck? I'd like to see if I could come up with something that eliminates the bottleneck as compared with MultiBinding. I'd have to model the call graph and look at the instruction trees each trace generates. Profiling is the best way to discover why MultiBinding is more efficient; MultiBinding is also a "huge hack" so we can't eyeball this stuff...

Behaviors are very interesting, sort of like declarative boosters for existing object trees. I guess that they aren't using databinding, but rather manual event handling to keep things consistent.

I understand the design rationale behind Behaviors: the Expression Blend team needed a way for artistic people to mash-up Silverlight controls with various UI effects. I pitched one approach based purely on LINQ Expression Tree API to one Microsoft employee, but it never took hold. In the end, the Behaviors API makes sense because Behaviors have a standard model that artistic users can view in a Property Editor Dialog - attributes are directly mappable to property editor dialogs and this approach has been used in every Microsoft IDE since like Visual Studio 5? I think the internal model uses attribute grammars with forward chaining, but I might be wrong.

My argument would be a LINQ Expression Tree API with unparser definitions specifying how to visualize the expressions in the IDE would be the most flexible approach, but flexible isn't necessarily what Microsoft wants. They just desperately want to figure out how to get artistic people to use their stuff. -- Chris Oliver hit the nail on the head when he mentioned that this is a problem the game industry has been dealing with for a long time, and it is no mistake that they've developed non-developer centric tools. (On the other hand, Chris does not touch on the problems associated with artistic designers adding a hojillion lighting effects to a scene and wondering why it renders at 0.000001 fps).

Chris's observations are probably why the original architect for Blend had a CAD IDE background.

Also, I mentioned in a previous LtU thread how the ActionScript-based movie editing app Flektor was created by Andy Gavin, of GOAL Lisp and Crash Bandicoot fame, and Andy cited his experience building similar designer tools for his designers at Naughty Dog Software. Andy mentioned in an interview that they actually had to dumb Flektor down because it was too sophisticated for what people like MySpace users wanted.

Do you have some sample code

Do you have some sample code that demonstrates the performance bottleneck? I'd like to see if I could come up with something that eliminates the bottleneck as compared with MultiBinding...

The Silverlight hack is OK for light uses of multi-binding, as they use in their examples. However, it wouldn't work very well if you were using databinding pervasively for things like say layout (rather than use panels). Perf just goes down by about an order of magnitude... I will benchmark it sometime if I get into Silverlight programming again (for WP7). Note: you aren't supposed to use databinding as pervasively as I do, there are alternatives that you are supposed to use (panel nesting for layout), but databinding is so easy and powerful.

Multibinding is hardly a hack.

Chris's observations are probably why the original architect for Blend had a CAD IDE background.

I believe the current Blend architect has a background in compositing tools, perhaps of the dataflow variety :).

I really need test code I

I really need test code I can profile to say anything meaningful.

I've learned that most of the specialized Binding forms in WPF are only necessary if you are Binding to a UI value directly. If you compose things differently at code-write-time, you can solve the problem by avoiding such stuff as TemplateBinding via composition. One of the chief reasons for specialized Binding forms is that they have a different object tree walker that is more efficient than Binding, so if you want to replace all of the specialized Binding forms you need to come up with something equivalently performant by side-stepping walking the object tree, but doing the equivalent of walking it. In this way, trashing the internal BindingWorker and ClrBindingWorker as much as possible produces much faster binding times. This is what I meant when I said I side-stepped the use of MultiBinding, and for that matter all other specialized Binding forms. A generalization of what I did would be to provide an overlay network on the UI model to map where streams of values go to.

It would perhaps be nice if the CLR had intrinsic support for value propagation, but it turns out something like Alexey Radul's Propagator Pattern is hard to implement efficiently. Still, an imperative dataflow intrinsic for the CLR plus language support would have greatly simplified the design of .NET 3.0. It is not just WPF and Silverlight that were affected, IMHO. Windows Communication Foundation (WCF) and Workflow Foundation (WF) were both complicated as well, and because none of these "foundation-al" frameworks shared a common metamodel, integrating between them was very hacky -- WF was burnt to the ground for .NET 3.5.

I guess the reason I keep bringing this up is the likely billions of dollars I think this cost Microsoft... I am really convinced these are issues at that large of a monetary scale. (I also think the CLS-compliant languages (C# and VB.NET) cannot effectively model what WPF, WCF and WF should have looked like, and that features in Scala would've produced more robust designs, but that is something for another time.)

Note: you aren't supposed to use databinding as pervasively as I do, there are alternatives that you are supposed to use (panel nesting for layout), but databinding is so easy and powerful.

Scenarios customers want supported dictate what features should be used pervasively, not some Mandate from Microsoft. The hooks into the layout system are arguably a kludge in and of itself. The layout system is the chief reason why WCF and WPF did not share a common meta object protocol; DependencyProperty.GetMetadata returning PropertyMetadata in the way that it does just shows the warts.

correction

it reads to me like you are saying they got rid of dataflow entirely, whereas what i thought i heard was that javafx the language syntax was to be stopped, but the features would be available through (more clunky) regular java apis. so i think it isn't that dataflow is going away, it is just getting cheaper threads (as in clothing)?

Like WPF in C#/XAML? I

Like WPF in C#/XAML? I thought that F3/JavaFX Script was a great language; it's too bad it didn't get any traction. I wonder if it was because Sun was a mess when they went with this, or if there just isn't enough intellectual bandwidth for new programming languages on the JVM.

if it was because Sun was a mess...

Interactive software is a very interesting nexus of math, computer science, and art, and where programming languages best fit into that, is still a good question imo.

On the left we have the hardware, and on the right digital content creation tools. In the middle is your software system. Content creators want to work interactively (wysiwyp), not "tippee-typing" code in a text editor.

Media industry tools reflect this, from Unreal Kismet, to Maya, to Massive AI, in providing non-textual authoring mechanisms (in addition to writing code) which encode both programs (to varying levels of complexity) and data. In so far as that is the case, programming languages designed for easy authoring by programmers in a text editor probably just get in the way of efficient compilation.

I know there are a number of people on this forum from the games industry who could tell you in detail the realities of creating state-of-the-art interactive programs better than I can.

In so far as use cases that require ease-of-authoring (on the right) are handled by interactive tools, it makes sense for the majority remaining programming use-cases to be optimized for the hardware (on the left) in an uncompromising way, i.e memory-cache-and-processor friendly rather than programmer-friendly data representation and code generation.

Just a quick $0.02.

This makes a lot of sense,

This makes a lot of sense, thanks for the comments! It is definitely matching my experience: I'm trying to get people excited about PL research in my current job, and it mostly bounces right off them because they aren't interested in any new textual languages. However, when they see something like Kodu, with its visual tile-based behavior language, they get very excited. As a result, the project I'm working on right now is a semi-visual (block or tile-based) language. Maybe not completely visual, there is still some text involved, but the editor is completely structured (like say Kodu or Scratch/App Inventor). It might be the future, but we have to break the expressiveness/abstraction deficiency that non-textual languages seem to be cursed with (or maybe, they are usable for these reasons...tough call).

Structured vs. visual dataflow?

However, when they see something like Kodu, with its visual tile-based behavior language, they get very excited. As a result, the project I'm working on right now is a semi-visual (block or tile-based) language. Maybe not completely visual, there is still some text involved, but the editor is completely structured (like say Kodu or Scratch/App Inventor).

I'd argue that a "structured" language like say, Scratch or Kodu is little more than an alternate representation of an existing abstract syntax tree (AST). If the AST representation is good enough, it is comparatively easy to expose it to the programmer through a snap-the-blocks or other graphical interface. To the extent that such a representation removes the well-known difficulties with parsing/generating a textual form, it can certainly be "more intuitive", even for non-programmers. In practice, however, structured editing has not proven successful in industrial use, especially with larger programs. The best approach here seems to be "intentional" or "language oriented" programming (which aims to expose different surface representations--text, visual, etc.--on the same deep syntax) combined with "smart" IDE-like editing. See e.g. the Xtext system for Eclipse.
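
As a rough illustration of "one deep syntax, several surface representations", here is a minimal sketch in which the same tree is rendered once as text and once as indented blocks. The node types (`Num`, `Add`, `Repeat`) and both renderers are invented for this example and are not taken from Scratch, Kodu, or Xtext.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Repeat:
    times: object
    body: object


def to_text(node):
    """Render the tree as a conventional textual expression."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Add):
        return f"({to_text(node.left)} + {to_text(node.right)})"
    if isinstance(node, Repeat):
        return f"repeat {to_text(node.times)} {{ {to_text(node.body)} }}"
    raise TypeError(f"unknown node: {node!r}")


def to_blocks(node, depth=0):
    """Render the same tree as nested 'blocks', one line per node."""
    pad = "  " * depth
    if isinstance(node, Num):
        return [f"{pad}[number {node.value}]"]
    if isinstance(node, Add):
        children = to_blocks(node.left, depth + 1) + to_blocks(node.right, depth + 1)
        return [f"{pad}[add]"] + children
    if isinstance(node, Repeat):
        children = to_blocks(node.times, depth + 1) + to_blocks(node.body, depth + 1)
        return [f"{pad}[repeat]"] + children
    raise TypeError(f"unknown node: {node!r}")


program = Repeat(Num(3), Add(Num(1), Num(2)))
text_view = to_text(program)
block_view = to_blocks(program)
```

Both views are pure functions of the tree, so an editor can manipulate the tree directly and never has to parse either surface form.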

Visual dataflow languages are somewhat different, because they have graph-based (rather than tree-based) visual representations, and AFAICT nobody has managed to describe these with any kind of formal syntax. The basic notation is fairly trivial, though, and the underlying dataflow semantics is rather straightforward as well.
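
To show how little machinery the basic semantics needs, here is a minimal sketch of evaluating an acyclic dataflow graph by firing each node once its inputs are available. The `run_dataflow` function and the node names are assumptions for illustration; real dataflow systems also have to deal with streams, cycles, and nondeterminism, which this ignores.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(nodes, inputs):
    """Evaluate an acyclic dataflow graph.

    nodes  maps a node name to (function, [names of upstream nodes/inputs]).
    inputs maps external input names to their values.
    """
    dependencies = {name: set(sources) for name, (_, sources) in nodes.items()}
    values = dict(inputs)
    # static_order() yields every name after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        if name in nodes:  # external inputs already have values
            function, sources = nodes[name]
            values[name] = function(*(values[s] for s in sources))
    return values


# "sum" fans out to two consumers, so this is a graph rather than a tree.
nodes = {
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "double": (lambda s: s * 2, ["sum"]),
    "out": (lambda s, d: s + d, ["sum", "double"]),
}
result = run_dataflow(nodes, {"a": 2, "b": 3})
```

The fan-out from "sum" is exactly the kind of sharing that a tree-shaped syntax cannot express without introducing names.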

HOW?

HOW do you see Xtext fitting into the real-world, industrial problems mentioned and beautifully articulated by Chris Oliver (the language designer of F3 aka JavaFX)?

Xtext is just another programmer tool. When compared to its competitors, it is ad-hoc and uses clever heuristics to implement common features -- but those common features are common to textual languages.

Just my perspective, anyway: Xtext is not the answer!

2nd'd

yeah, xtext is not intentional programming by a large stretch, from what i read on the xtext pages. it is "just" a library to help get your particular programming language working well in eclipse editors.

subset of designers and programmers

I suppose that the subset of programmers with strong visual design requirements is a relatively small one. But it's hard to be satisfied with environments like Processing after being exposed to languages designed for succinct but robust expression of graphics, or to be satisfied with current program specification conventions after experiencing the ease spreadsheets offer for reading and manipulating data (to a point).

Anyway. I take it from your comment that you're not planning to resurrect F3 post-JavaFX!

Have you discovered anything

Have you discovered anything in the JavaFx API which remotely looks like a data binding model/engine which was set aside? I suspect this has been totally embedded within the JavaFx Script interpreter, which works much like the JRuby or Jython interpreter on top of the JVM. So no, I expect some functionality which will be accessed through Java Swing, and Oracle isn't interested in replacing the graceful JavaFx Script with a clunky XAML + dependency properties solution. However, one can never know that the worst case of such an involution won't happen.

I think this really means

I think this really means that we should be thankful that the LabVIEW, Max/MSP, Flex, etc. communities aren't also under the thumb of Larry Ellison. Now I better understand why Gosling left.

Steering back on the topic of LtU, perhaps this is just another drop in the bucket of language success having a not-so-obvious relationship with language quality.

future of JavaFx

I was all set to use JavaFx as the basis for my next project until I got wind of the uncertainty surrounding it. But it does seem that while JavaFX/F3 (the language) are going away, fairly complete support for it in other JVM languages is on the cards. See slide 69 onwards in this JavaOne presentation showing a preview of a Scala javafx-type DSL. Noticeably absent so far is a treatment of bind expressions which are what attracted me to JavaFX in the first place.

You list Flex in your languages immune to Ellison - does Flex have comparable bind expressions to JavaFX? If so I'm decamping to there immediately! But my initial scan of Flex doesn't seem to support that (I'd love to be corrected on this).

I don't really remember the

I don't really remember the expressiveness of binding in JavaFX. Flex, unfortunately IMO, limits binding to being only syntactically supported in the MXML (UI) layer. You can use APIs in ActionScript to do it (MXML compiles into AS), but it's clunky. I tried adding syntactic support to it at the ActionScript level as well but I don't think you'll see something like that for a while (but there seems to be some internal pressure for it). Finally, data binding in AS, while hitting many common cases, isn't as expressive as something like FRP (which would, in most forms, allow flows that carry flows).

I don't really remember the

I don't really remember the expressiveness of binding in JavaFX.

Writing

var x = bind expr

updates x whenever expr is updated by means of at least one of its constituents. If one establishes a binding context ( the following is not JavaFx Script syntax! )

var x = bind \(y1,...yk) -> expr

where the lambda represents a closed function and expr depends only on the parameters yi, x is updated whenever at least one of the yi is updated. For all other parameters the current value is used.
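
The difference between the two forms can be sketched as follows. This is a toy model, not JavaFX Script's implementation: the `Cell` class, the `bind` helper, and its `triggers` parameter are all hypothetical names chosen for illustration.

```python
class Cell:
    """A mutable value that notifies subscribers when it is set."""

    def __init__(self, value=None):
        self._value = value
        self._subscribers = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for callback in list(self._subscribers):
            callback()

    def subscribe(self, callback):
        self._subscribers.append(callback)


def bind(compute, triggers):
    """Return a Cell holding compute(), recomputed when any trigger is set."""
    target = Cell(compute())
    for source in triggers:
        source.subscribe(lambda: target.set(compute()))
    return target


a, b = Cell(1), Cell(10)

# Like `var x = bind a + b`: every constituent is a trigger.
x = bind(lambda: a.get() + b.get(), triggers=[a, b])

# Like the binding-context form: only `a` triggers; `b` is merely sampled.
y = bind(lambda: a.get() + b.get(), triggers=[a])

b.set(20)  # x recomputes, y keeps its old value
x_after_b, y_after_b = x.get(), y.get()

a.set(2)   # both recompute; y now picks up the current value of b
x_after_a, y_after_a = x.get(), y.get()
```

After `b.set(20)` the two bindings disagree (21 versus 11) until `a` changes, at which point both read 22. A real engine would also need glitch avoidance, unsubscription, and cycle detection, none of which this sketch attempts.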

I'd hope so -- syntax aside,

I'd hope so -- syntax aside, that's essentially the entry point of any binding system worth its salt. You might want to read up on some literature to see why this is a very incomplete description of the expressive abilities and guarantees.

Nice, looks like Bling

Nice, looks like Bling (Bling is still more powerful of course :))! There was a project called ScalaFX on the cards once, it would have brought even more expressive data binding to Swing programs. But for some reason, it got side tracked, or they focused on old-fashioned discrete event programming, I'm not sure. Scala can do more of this, the binding in JavaFX is actually kind of limited since it was surfaced by JavaFX script. If Oracle would adopt Scala as a next generation RIA platform, things might look up.

re: scala fx

maybe their new reactive work is paving the way? or something?

Not sure. There are many

Not sure. There are many ways this can go, I'm glad they are experimenting with different approaches. HOWEVER, a simple DSL for data binding in Scala would go a long way by itself.

Adaptation vs. Creation

Is it worthwhile to create a new programming language to suit an upcoming need, or rather to adapt an old programming language to suit the new need?

Always adapt first. A good

Always adapt first. A good library is worth more, and more immediately useful, than a new language. Importantly, you also will maintain use of any existing IDEs, type analysis, FFI integration, et cetera. A library will more likely see wide adoption, able to fit into existing toolchains.

If, even after developing a good library, you're suffering a lot of boiler-plate, roundabout frameworks, shotgun editing, and careful self-discipline, then it may be worth developing a language. But, importantly, you should already be suffering, so that way you have a clear idea of what problem needs a solution.

The difficult step is recognizing your own suffering. It is common to blame oneself in an abuse situation, and abuse of developers inflicted by poor programming tools (such as poor concurrency and synchronization models) is no different.

Additionally, it is important to identify as many problems as possible, so that you can find just a few solutions to solve a lot of problems at once. The whole 'solve one problem at a time' or 'walk before you run' ideas are rubbish philosophies that will increase complexity and entrench you in the present.

Researchers create new

Researchers create new theoretical languages (i.e., calculi) to understand a problem in isolation; they often stop there or otherwise implement it in the same way (which I consider to be a mistake and deceptively incomplete research). I agree with David about using library abstractions first: the front-end will look atrocious, but at least you can start the feedback process to better define and address the relevant (i.e., human/society/culture-motivated) problems.

On a slight tangent, I have mixed feelings about Haskell-based and most pure-FP language-based research due to this: you have an implementation you essentially already know would have worked, so how informative is it? In a sense, it's almost like medical researchers testing on (crazy smart) mice and stopping there. We learn more about the power of FP (go zealots!) but at a cost to the research being presented. OTOH simplifying assumptions often accelerate language development, where dirtier domains can be left for exploration by heathens -- though there's a nice inbetween where we can have a library, make strong assumptions about its use, and gradually weaken them as needed.

Ah, the life of an empiricist...

Imperative Haskell

Haskell is a very nice imperative language, especially in a concurrency setting - what with STM and atomicModifyIORef, excellent integration with FFI and OpenGL, and so on.

A lot of people conceive of Haskell as a 'pure' language. In practice, that purity feels quite optional. All the flexibility of a richer-than-usual imperative language is available, and you rarely can reason about behavior of large programs in a pure manner (you must concern yourself with order of effects, for example).

Haskell's monads + typeclasses allow a sort of 'overloading the semicolon' that is very good for new language experiments, and allow multiple alternative implementations of a single language to live side by side. Type families are a wonderful extension that allow our 'languages' to have dedicated, hidden types for primitives, without the funkiness of GADTs.

Monads are a great barrier

Monads are a great barrier for preventing people from writing impure code while still being able to claim support for it and do our fun tricks requiring purity.

The trends in using theorem provers and other forms of heavy computation for program analysis may soon hopefully reflect on language design, allowing us to move away from this great fraud.

(This is knowingly slightly inflammatory and a reaction to the disservice of the trend of essentially mass advertising of Haskell and other FP PLs as the great imperatives, which, again, I believe has some detrimental effects.)

Do you mean that in the same

Do you mean that in the same sense as: "Ring theory is a great barrier for preventing people from writing code that uses integer arithmetic while still being able to claim support for it"?

I would agree that people teaching Haskell should probably avoid all mention of 'monads' and other scary words until after people have a lot of experience with using them. Similarly, we don't teach ring theory to children before they start using '+' and '*' operators. I think the failed attempts at education in the subject have done a great disservice to the FP programming camp.

Monads have their flaws, certainly - I do not like their in-the-large modularity and composition properties, plus they inherit all the flaws of imperative models. But I don't believe they represent any sort of deceit, intellectual sophistry, or 'great fraud'. Certainly, a significant fraction of most Haskell code is usually monadic. People really do use monads to write impure code, while still achieving controlled 'chunks' of purity.

I doubt program analysis would have any real impact on the use of monads. Monads, fundamentally, are an abstraction - a way to package an imperative language and imperative glue. To get rid of monads, I'd suggest we find alternatives to 'imperative' programming that will be effective for IO, FFI, hardware integration, and pluggable modularity. Most of my language efforts lie in that direction.

Do you mean that in the same

Do you mean that in the same sense as: "Ring theory is a great barrier for preventing people from writing code that uses integer arithmetic while still being able to claim support for it"?

Perhaps you should spend time mulling over the following before again adding to the chorus of that popular defense:

The definition of crazy is doing the same thing over and over again and expecting different results

The definition of crazy is

The definition of crazy is doing the same thing over and over again and expecting different results

So you're advocating that we should change labels from "impure languages" to "insane languages"?

No. I'm not arguing against

No.

I'm not arguing against focusing on monadic etc. definitions of a program if it helps the machine etc. side of things. As with David's retort about ring theory, it may be theoretically unavoidable even when not exploited or brought to the forefront. Such an argument would be crazy -- with the same reasoning, I'd be suggesting we throw away any program analysis and optimization ideas that involve AI or SSA!

As a simple example of what I'm saying, imagine taking a codebase with two views. First, the Scheme representation of a program -- perhaps what a developer can choose to code in. Next, an O'Caml representation (which may be fairly transformed relative to the Scheme one, e.g., state passing). Now, when the developer writes a new function -- say a scheme one that mutates something in lexical scope -- the underlying system can take that variable and rewrite it as a mutable location (or monad, whatever). If this is a global change, impacting function signatures, the developer might want to be notified.
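
The two views in that example can be sketched side by side. This uses Python rather than Scheme and O'Caml, and the functions `make_counter` and `bump_pure` are invented for illustration; the rewrite is done by hand here, whereas the proposal above is for the system to derive it.

```python
def make_counter():
    """Developer's view: a function that mutates a variable in lexical scope."""
    count = 0

    def bump():
        nonlocal count
        count += 1
        return count

    return bump


def bump_pure(count):
    """State-passing view: the state is an explicit argument and result."""
    new_count = count + 1
    return new_count, new_count  # (result, new state)


# Mutating version: the state is hidden inside the closure.
bump = make_counter()
bump()
mutable_result = bump()

# State-passing version: the caller threads the state through each call.
state = 0
_, state = bump_pure(state)
pure_result, state = bump_pure(state)
```

Note that the signature changed from no arguments to one argument and a pair of results. That is the kind of global change, rippling out to every caller, that the developer might want to be notified about.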

That example is, essentially, a rehash of (a tiny part of) the leap made with type inference: the types can be there, but our job as the language designers more directly facing programmers is to hide them until we need them. I'm observing that program analysis techniques and the amount of computation available have both significantly grown since that early era: I bet we can exploit that for a saner front end. The world may be crazy, but that doesn't mean we have to be crazy in response -- dealing with it seems like a good idea and monads (or something about them, like not-monad-by-default) is apparently insufficient.

Why not?

I think you're envisioning something close to Coq and Ynot. Or perhaps you're envisioning something a little less extreme, such as pluggable types.

I believe those approaches are promising as well. But I still don't see structural control of effects (as via monads) to be 'great fraud'. In practice today, I don't get to choose between monads and your vision. In practice today, when I'm lucky, I get to choose between opaque first-class procedures with arbitrary immediate effects vs. opaque first-class monads with controllable, delayed effects.

The ability to delay and control effects expressed in a monad is very useful. E.g. (ignoring unsafePerformIO) I can statically prevent the writing of OpenGL code that waits for user input.

I'm more thinking in line

I'm more thinking in line with trends in synthesis and dynamic analysis. Usable surface, powerful backend.

Writing programs with proof assistants sounds right at first, but somehow the baseline seems to be in logic land and hard to use. In the above, we start with the irregular and have help in moving away. Perhaps contract systems are closer.

Insane Languages

I like that idea, Matt. An 'insane' language is one where you can say the same thing over and over again and, each time, get different results.

So, pure languages would certainly be sane. But so would be impure but idempotent languages.

OT

The definition of crazy is doing the same thing over and over again and expecting different results

We don't discuss sex on LtU.

LOL

We don't discuss sex on LtU.

:D

But then, would the definition of crazy imply

the definition of wisdom is doing various different things over and over again and looking for the same result

?

;-)