Small PL Thought Experiment

Understand that, in addition to the thought experiment itself, this forum topic is also very much a question about the most profitable direction of my own future educational development regarding the theory, applicability, development pragmatics, and implementation techniques of programming language "paradigms."

1) Imagine a simple language with features roughly at the intersection of existing languages such as ML or Haskell (and a host of similar, more obscure languages):

   * "mostly functional"
   * manifestly typed, 
   * a type inference system (maybe only "local" type inference)
   * algebraic data types
   * parametric polymorphism
   * a simple module system oriented toward name space management, 
     with the possible addition of optional module-level type 
     parameters, largely as a convenience feature - for example, 
     imagine something simple such as: import ListLib[Int] as intls;
   * a "rich" pattern matching facility, perhaps with a pattern 
     variable unification feature included (.e.g., equality testing)
   * a "principle of least surprise" syntax friendly to constructing 
     manipulating lists, vectors, hash dictionaries, trees and other 
     sequences and data structures. 
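
For concreteness, here is a rough sketch of that base feature set in Haskell syntax, since Haskell is the nearest widely known relative (purely illustrative, not a proposal for the actual surface syntax):

  -- an algebraic data type with a type parameter (parametric polymorphism)
  data Tree a = Leaf | Node (Tree a) a (Tree a)

  -- the signature is inferable, but written out in the "manifestly
  -- typed" spirit; the body is plain pattern matching
  insert :: Ord a => a -> Tree a -> Tree a
  insert x Leaf = Node Leaf x Leaf
  insert x t@(Node l y r)
    | x < y     = Node (insert x l) y r
    | x > y     = Node l y (insert x r)
    | otherwise = t   -- duplicate key: tree unchanged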

2) Presume the accouterments of nicely supported modern language implementations:

 * a suitable FFI
 * common libraries such as regex, tcp/ip networking, a low level 
   interface to the operating system's preemptive threading system, 
 * library interface to all other common operating system features, such as
   memory mapped files, shared memory, process control, and so on and so forth.
 * boring libraries for parsing the oodles of data formats floating around
   the world: XML and friends, JSON, text and numeric data files (both
   text and binary formatted), and all the common RFC-described file formats
 * some suitable full bore language parsing system
 * an Emacs major mode (or perhaps Visual Studio or Eclipse integration 
   if that floats your boat)
 * a source level debugger
 * at least some basic "from-source" documentation generation system
   such as Javadoc or its nicer brethren, whatever.

The goal here is to describe and envision a nice, but still mostly generic, simple and rather "innovation free" BASE LANGUAGE judged "practical and suitable for production development" by the criteria commonly applied to other (functional?) languages circa 2010.

THE THOUGHT EXPERIMENT: If one could add a significant new language feature or new additional "programming paradigm" (and thus, of course, attendant syntax extensions, compiler technology, runtime system, etc.) to this intentionally commonplace BASE LANGUAGE:

  * What would that new feature or new added language "paradigm" be?
  * What new problems would this language addition solve? 
  * Or perhaps an alternate motivation, what additional 
    "expressiveness" might it add to the language?
  * Why is this particular language technology "more important" than
    other options?
  * In what problem domains would this new feature or added paradigm
    be most useful (or is it mostly domain neutral/universal)?
  * Does a "feature combination" or "multi paradigm" language such
    as, or similar to, this already exist?   

I truly appreciate any thoughtful answers, which may greatly influence where I "go next" in my ongoing studies of programming languages and their implementation.

Thanks a million in advance!

concurrency?

didn't notice it mentioned in your lists

Actually I'd deliberately omit thread level concurrency...

Processes are simply better.

really so black+white?

i would hazard that as with many design issues, the choice is all about trade-offs. i fully believe that sometimes the language should take a stand on which particular choice to make + yet also that sometimes the language should not. is there really consensus on the Right Abstraction for concurrency? (don't get me wrong, at the very least i think shared mutable state with locking a la mutexen is the worst of all the current choices, so we can at least sorta say which ones should /not/ be considered as the Right Abstraction :-)

Possibilities

Extensible records?
Dimension types?

More possibilities...

Higher-level concurrency constructs
Software transactional memory
Effect typing (including aliasing, ownership, and transfer)
Versioning

versioning

Do you know of a language that implements versioning well?

Versioning

By versioning, do you mean module/library versioning or do you mean something else?

S.

delimited continuations

Delimited continuations? They seem useful, given your web focus.

Relatedly, exceptions.

Exceptions et al.

If we can bundle up exceptions with some other "dynamic" features - such as "unwind-protect" (syntactically combined with exception handling in some languages), some kind of more specific and elegant resource allocation/guaranteed cleanup facility, lexically scoped dynamic control flow, and maybe fluid variables (in a limited, type safe manner) - I may well agree we'd get a big bang for the buck from designing and implementing a set of features in the same "dynamic" semantic family. Overlapping implementation techniques don't hurt either.

In case it's not clear, as for "lexically scoped dynamic control flow" I'm thinking of something akin to CL's block special form (but manifestly typed, of course). As in:

foo = block back-here
        let dflt-thunk = fn k:key-type = from back-here return 0 in
          dict-get-dflt(my-int-dict, key, dflt-thunk);

Contrived, but the dictionary implementation applies a function to the key if that key is not found in the dictionary. In our case, our thunk ignores the key argument and dynamically returns 0 "directly" to the "block back-here..." clause. Some kind of typed call/ec (dynamic extent continuation) feature would be equivalent, as far as I can tell.

As for resource cleanup, I'm thinking of a language construct that cooperates with user defined data types that fulfill some set of structural and/or procedural interface requirements (like "for x in ..." and user definable "Iterator" types in some languages). Here's just a tiny whack at the idea with a simple fluid bind feature illustrated as well:

  fluid *sin* = os/get-stdin _; -- defined as fluid
  ...
  -- guarantees to close ifil on stack unwind via some standard
  --  procedural (maybe method if OO) interface on cleanup-able
  --  resources.
  cleanup ifil = open-file(path, OIn, ODfltFlags) after
    let *sin* = ifil in -- fluidly/dynamically bind *sin* (stdin)
      do-stuff _;

delimited continuations can do all this

If I understand you right, delimited continuations plus dynamic binding can implement the behaviors you outline directly, correctly, and efficiently.

Of course, with reentrant continuations, unwind-protect is a bit thornier. Flatt et al.'s ICFP '07 paper goes into all the details.
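
For instance, here is a minimal sketch of the earlier "block back-here" example on top of delimited control, in Haskell using shiftT/resetT from the transformers package (the dictionary and all names are illustrative, not anyone's actual API):

  import Control.Monad.Trans.Cont (ContT, evalContT, resetT, shiftT)
  import qualified Data.Map as M

  -- the dictionary applies the thunk to the key when lookup fails
  dictGetDflt :: (Ord k, Monad m)
              => M.Map k v -> k -> (k -> ContT r m v) -> ContT r m v
  dictGetDflt d k dflt = maybe (dflt k) return (M.lookup k d)

  -- resetT delimits the "block back-here" region; shiftT escapes to it
  -- by discarding the captured continuation _k, so the 0 is returned
  -- "directly" to the delimiter, just like "from back-here return 0"
  example :: IO Int
  example = evalContT $ resetT $ do
    let dfltThunk _key = shiftT $ \_k -> return 0
    dictGetDflt (M.fromList [("a", 1)]) "b" dfltThunk   -- evaluates to 0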

I did a quick citeseer search

And yes, delimited continuations are now on my "to learn" list. Thanks much!

Not a LtU search?!

Might actually be worth writing a greasemonkey script

...to mashup

cool

FWIW I wrote a bit about my learning process here: http://wingolog.org/archives/2010/02/14/sidelong-glimpses.

I listed some papers that helped me there.

How practical?

I wouldn't consider a language that didn't have some level of support for runtime module loading, polymorphism, or at least some non-clumsy late-bound dynamic dispatch mechanism - up to and possibly including runtime metaprogramming / code generation - suitable for the kinds of web-capable development you seem to be hinting at, what with TCP/IP, JSON, etc.

Dispatching by pattern matching over statically defined sum types is all very well when the problem is relatively well defined, but when requirements are constantly changing and a website must iterate multiple times a day, ideally dynamically with no down time or process restarting, dynamic aspects become very useful.

Not really concerned with Web apps per se

I wouldn't have mentioned JSON if I'd thought people would assume that Web applications were of any particular interest. I was just enumerating boring but commonplace text and data formats. I could just as well have said that there was already an x-platform GUI library and basic image file reading/writing for most common formats.

Sorry for any misunderstanding.

I'm presuming a fully compile-to-binary implementation, and the ability to both generate and dynamically link .so or Windows .dll libraries is reasonably "par for the course" for a compiled language (and from within the FFI as well, supporting C and Pascal calling conventions, along with the all-too-many variations thereof).

Didn't you get the memo?

Hehe, I've been thinking along very similar lines. I was messing around with LaTeX, and produced a very short introduction to a small language I am building.

I guess this is what I ended up with given a lot of your questions, and the one major feature I added was type constructor classes. I am wondering whether I'll miss existential types or subtyping, though.

I Read It

And I liked it! Some interesting ideas. I might have missed a few features/details, such as how different exceptions might be discriminated and the right handler function abstraction applied, but a very elegant and minimalist design! - Scott

Great, thanks!

Ah great, motivation to get the compiler finished now, finally!

As for your question, you match against the type constructor of what is thrown, so a typical handler would look like

try some_computation_throwing_an_int_or_a_text
catch
  [ n:int   -> print "integer " + (txt n) + " was thrown!"
  | t:text  -> print "text \"" + t + "\" was thrown!"
  | v       -> throw v ]

You may want to consider an

You may want to consider an alternate exception-handling form which is better behaved. Your document seems like a pretty straightforward intro, which is good. There's no mention of whether your compiler supports type inference (I suspect not), or even of whether the language is intended to be GC'd (I suspect it is).

Your type constructors also seem a little odd, as it's not clear how you specify a type instance, i.e., sum : int list -> int. Given some of the examples, it looks like it would be "list int", but I personally think the symmetry between the type and its declaration is nice, i.e., "type 'a list" -> "int list" or "type list 'a" -> "list int", but not as I think you have it, "type list = \a => [ nil | ... ]" -> "list int". I suppose you're trying to expose some symmetry between lambda abstraction "\x -> x" and type abstraction "\a -> a", but I'm not sure that's helpful here.

Finally, the # symbol in the "interface" construct is not explained. The section itself is clear enough, but the # just doesn't jibe with any other syntax explained in the doc, so I was curious.

Great Feedback

I do the normal HM type inference with type constructor classes. At the moment I make no distinction in the checker between type schemes and simple types, except for a syntactic one during parsing.

You are right that my types may seem a bit less readable to people who are accustomed to type arguments occurring on the lhs of the binding symbol. But, yes, I aimed at orthogonality, so that, if a name can be an abstraction of a value/value-expression, then a name may also be an abstraction of a type/type-expression. Hence all names are kept simple, and the whole type body occurs on the rhs of the binding.

Was there a typo? A list definition should be recursive and be written as

type list = \t => [ nil | cons t (list t) ]

A list of integers is written as (list int), and given its definition, it's therefore also structurally equivalent to (\t => [ nil | cons t (list t) ]) int = [ nil | cons int (list int) ]. I use a nominal typing scheme, but of course that structural fact is still needed during type inference.

The # is the actual interface specification; it is an explicit keyword. Types and instances are kind checked, and it makes explicit that the interface Num has kind * -> #, i.e., it is something which takes a type and returns an interface. It also states the formal type to unify against for instances. There is some provisional support for more complex type instances, i.e., you can unify against (f x y), but I don't use that at the moment since I wanted the compiler bootstrapped first.

Some of the decisions may seem a bit odd, but they have to do with keeping the language simple and the type checker trivial, i.e., some decisions are also partly implementation driven.

I'll add it to my list of changes [to the document], though. Thanks.

[I looked at the paper you referenced. Thanks again, but sorry, no dice. Anything simple regarding compilers, like handling exceptions, should fit onto one A5 page, or be discarded, IMHO. I use shift/reset; that's good enough. If I can't grok it in under 12 seconds, I don't read it anymore.]

[I looked at the paper you

[I looked at the paper you referenced. Thanks again, but sorry, no dice. Anything simple regarding compilers, like handling exceptions, should fit onto one A5 page, or be discarded, IMHO. I use shift/reset; that's good enough. If I can't grok it in under 12 seconds, I don't read it anymore.]

Here it is in two lines from page 3:

try x ← f(x) in g(x) unless { e ← y_e } = g(x)
try x ← raise e in g(x) unless { e ← y_e } = y_e
  where e ∈ Exceptions

So try is a binding form like let scoped to the expression g(x). This is a generalization of the exception form discussed in Benton and Kennedy's "Exceptional syntax" paper. The rest of the paper is about an algebraic model for effects and effect handling, so it's probably not relevant given you're mandating call-by-value.

Re:shift/reset, that's interesting! I didn't see that mentioned in the paper. Or perhaps it's not accessible to user code?

Cool, I got that in under 12 seconds!

:)

No, it isn't accessible. I noticed that the catch/try and throw combination is equivalent to a syntactic construct, and deduced some combinators to compile to which make use of runtime features (i.e., they have access to the control pointer). Rc 'saves/memoizes' the control pointer, Sc restores it.

It turned out to be equivalent to shift/reset, which in a LC with those features are of course also syntactic notions.

So, under the hood you find shift/reset in the runtime, but there's no support in the language itself.
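
To make the equivalence concrete, here is a minimal sketch of catch/throw on top of shift/reset, in Haskell with shiftT/resetT from the transformers package (catch'/throw' are illustrative names, not the actual Rc/Sc combinators):

  import Control.Monad.Trans.Cont (ContT, evalContT, resetT, shiftT)

  -- throw discards the captured continuation up to the nearest
  -- enclosing catch (the reset), delivering the thrown value instead
  throw' :: Monad m => e -> ContT (Either e a) m b
  throw' e = shiftT $ \_k -> return (Left e)

  -- catch delimits the computation and dispatches on the outcome
  catch' :: Monad m => ContT (Either e a) m a -> (e -> m a) -> m a
  catch' body handler = do
    r <- evalContT (resetT (fmap Right body))
    either handler return r

  test :: IO Int   -- prints "caught: too big", returns 0
  test = catch' (do let x = 42
                    if x > 10 then throw' "too big" else return x)
                (\msg -> do putStrLn ("caught: " ++ msg); return 0)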

[Btw. Thanks, I noticed I remembered a rule wrong and have been confusing its with it's.]

Discriminating on Types

Minus a generally applicable "introspection" feature, this kind of pattern matching over types frankly bothers me.

I love SML, but its "special case subtyping" feature for exceptions is, to me, simply an outright flaw in the language. Why only exceptions?

OTOH, I'm not so comfortable with the idea of general pattern matching over types either; colloquially, I view this as a sort of "Stupid Scala Trick" to use a Letterman phrase :-)

Has anyone truly shone a very bright light on this issue - exception typing and subtyping, exception handling and general pattern matching over types vs. exception type inspection in dynamic handler discrimination?

S.

Mystified

I'm somewhat mystified over what you are talking about. Pattern matching is not introspection or discrimination on types; it's case analysis over variant labels/tags - which may look similar but is something quite different. Likewise, ML's exception type does not have anything to do with subtyping; it is merely an extensible sum type (although there have been proposals to generalise it with subtyping). It is admittedly a flaw in MLs that there is only one such type - which happens to be called exn for historic reasons. But that does not really affect either theory or practice.

Uh

SML, OML, OCaml. Now I am confused.

Pattern matching over types

It is admittedly a flaw in MLs that there is only one such type - which happens to be called exn for historic reasons.

This is my problem with SML's mechanism - it's a special case. I don't think an aversion to special cases is so completely out of line :-)

Also, in response to a comment below, discriminating over constructors for one type is not the same as discriminating over types. In other words, these are NOT the same thing at all:

-- Normal discrimination over constructors
len :: List a -> Int
len Nil = 0
len (_::xs) = 1 + len xs

-- Totally different animal - unless we have untagged "true union"
-- types we need some kind of "Any" type.
odd-fun :: Any -> Unit
odd-fun v = 
  match v
   | i:Int -> puti i
   | c:Char -> putc c
   | s:Str -> puts s
   | d:Dbl -> putd d
   | a -> puts (tostr a);

Any kind of type "introspection" or "discrimination" feature 1) may well encourage bad software design in the wrong hands, but also 2) burdens the runtime system and compiler design with constraints on the representation of values (more so on scalar representation, I'll hazard a guess).

I'll freely admit that of the dynamic control flow mechanisms, I'm not sure at all how to design an exception handling language feature that is both usable in practice and completely "clean" with regard to the language design -- Scott

Cleanness

This is my problem with SML's mechanism - it's a special case. I don't think an aversion to special cases is so completely out of line :-)

I agree, which is why we generalized them in Alice ML. ;-) But I don't think that makes too much of a difference in terms of "cleanness".

Any kind of type "introspection" or "discrimination" feature 1) may well encourage bad software design in the wrong hands, but also 2) burdens the runtime system and compiler design with constraints on the representation of values (more so on scalar representation, I'll hazard a guess).

I agree with (1), but you may be somewhat mistaken about (2). What you need to do is to pass runtime type information, but that does not need to be (and should not be) tied to values. Instead, it's tied to type variables. That still incurs a cost, but it is much smaller and easier to optimize.
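
As a concrete illustration of type information tied to type variables, here is a sketch in Haskell with the GADTs extension (Rep/render are made-up names):

  {-# LANGUAGE GADTs #-}

  -- the runtime type information travels with the type variable 'a'
  -- as an ordinary argument; the values themselves stay tag-free
  data Rep a where
    RInt  :: Rep Int
    RChar :: Rep Char
    RList :: Rep a -> Rep [a]

  render :: Rep a -> a -> String
  render RInt      n  = show n
  render RChar     c  = [c]
  render (RList r) xs = concatMap (render r) xs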

Stupid Scala Trick?

Dispatching on an unbounded set of automatically generated type-like labels is an idea approximately as old as dirt. Why blame it on Scala?

Well....

I have a language with values, types, and interfaces. I am going towards a language (this is the only feature I haven't implemented yet) where you can build lists of values with different types but the same interface - like in Java. Being able to coerce a value back to its type seems natural to me; maybe I programmed too much in Java??

For the moment, matching against the type of a value is just an extension of pattern matching in general - it's handy in the case of exceptions, but you can do it anywhere you can write a lambda abstraction. Thus, you can build functions on values of type 'list t', i.e., the concrete type is not known, but you can, in the body of the function, actually see whether you're processing a list of integers or a list of strings.
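
For comparison, a mainstream typed language can approximate this kind of runtime-type dispatch with dynamics. A minimal sketch in Haskell using Data.Dynamic (hetero/describe are made-up names):

  import Data.Dynamic (Dynamic, toDyn, fromDynamic)

  -- a heterogeneous list; each element carries its own type tag
  hetero :: [Dynamic]
  hetero = [toDyn (1 :: Int), toDyn 'c', toDyn "hi"]

  -- dispatch by attempting to recover each type in turn
  describe :: Dynamic -> String
  describe d
    | Just n <- (fromDynamic d :: Maybe Int)    = "int "    ++ show n
    | Just c <- (fromDynamic d :: Maybe Char)   = "char "   ++ [c]
    | Just s <- (fromDynamic d :: Maybe String) = "string " ++ s
    | otherwise                                 = "unknown"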

The list-of-anything, instead of list-of-anything-of-the-same-type, is the last extension I need, but I have gotten by without it so far. I just haven't decided how to handle it yet. I might go with existential types, but that doesn't handle the case where I want to reuse all the normal list definitions.

If anyone has great ideas, please let me know. (Uh, better than union types, existential types, or record subtyping, and so on, because I know those.)

[Hmm, I might have misunderstood your remark. If I read it correctly, you can _only_ do that in SML with exceptions, so, yes, I agree, that might be generalized. In general, as you also remarked, it has its uses, as I showed with an example of processing lists of different types differently.]

[I have some thoughts on it. I.e. I need a quantifier ??, such that a type '??t => list t' is somewhat akin to existential quantification, but states that the list consists of values of different types. I might avoid union typing altogether with a function which just 'erases' the type before it's inserted, not sure.]

[Forget it, problem solved.]

"principle of least surprise" syntax friendly - omit that one.

I'd omit that and replace it with a very easy to parse and understand syntax that permits rewrite systems / algebras / refactorings.

I'd also pay very strict attention to the containment of cause and effect, i.e., modules would be utterly watertight, permitting no effect to leak except via the documented interface (thus permitting large scale refactorings).

Syntax

This sounds like a good idea. Syntax might not be too fashionable an issue, but I've always thought it has import (I'm an aging Lisper).

Do you have any pointers to languages whose syntax has the desirable features that you describe? I've always admired Smalltalk's syntax and Forth's simplicity and syntax extension capabilities.

But I'd love to learn about other syntaxes and the nuances that give them the particular benefits you describe. Thanks!

How to

Study REBOL and OMeta for parsing expression grammars. Unfortunately, there are no books, only manuals, blog posts, mailing lists, and research papers. REBOL does contain tutorials; OMeta does not. REBOL is also more general than OMeta in that its PEG-equivalent functionality (which is referred to as the "parse dialect") is only one of its "dialects" for domain-specific languages.

Study Maude for term rewriting. The book All About Maude is a great place to start, and is a very friendly introduction to the language.

For environments, see MetaCase's MetaEdit+, Xtext, Stratego/XT, JetBrains' MetaProgrammingSystem and Oslo (now called SQL Server Modeling).

Another way to get a feel of

Another way to get a feel for PEGs is the LPeg binding:

http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

Although Lua sounds a bit ... conventional in this context, it certainly is a case study in how to fit a modern dynamic language in less than 200K.

Get rid of the typing. Then

Get rid of the typing. Then look at Babel-17 ;-)

This has been an interesting

This has been an interesting thread. The major additions I'd recommend are named state, reified continuations, concurrency, dataflow-oriented organization, high-level macros, output futures, and something like the lispy eval/apply/lambda facilities.

This isn't just a random laundry list; these are features that would IMO be especially powerful together - although some combinations require lots of care.

To get both concurrency and reified continuations working together and not fighting, I'd say that the continuations should capture the control state of a single thread, not the whole program. Also, I don't know if you can really have fully "static" typing with lambda/eval/apply, but I think they're worth adding some level of runtime type determination for.

Ok, My Own Votes for Potential Additional Features

My own ideas are pretty prosaic, but I think they would help the language programmer greatly in their day to day work.

1) My first candidate is simple coroutines (not generators, but the "real thing": coroutines that can receive values via "send"). Coroutines specifically provide for structuring parts of programs as highly composable, Unix-pipe-like chains of sources, filters, and sinks. The potential topologies of coroutines processing data and sending results to other coroutines are even more flexible, but I won't go into that here.

In the most general case, coroutines can safely allow for an extremely convenient inversion of control and dataflow that will simplify many everyday software design problems for the programmer.
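
A tiny sketch of the pipe-like composition in Haskell (hand-rolled, nothing beyond the Prelude; a real design would add effects and a bidirectional "send"):

  -- a stream processor: awaits input, yields output, or finishes
  data Pipe i o = Await (i -> Pipe i o)
                | Yield o (Pipe i o)
                | Done

  -- compose two stages, Unix-pipe style: run the downstream stage,
  -- pulling from the upstream stage whenever it awaits input
  (|>) :: Pipe a b -> Pipe b c -> Pipe a c
  _         |> Done      = Done
  p         |> Yield o q = Yield o (p |> q)
  Yield o p |> Await g   = p |> g o
  Done      |> Await _   = Done
  Await f   |> Await g   = Await (\a -> f a |> Await g)

  -- a filter stage: pass along only inputs satisfying the predicate
  filterP :: (a -> Bool) -> Pipe a a
  filterP keep =
    Await (\a -> if keep a then Yield a (filterP keep) else filterP keep)

  -- drive a pipe from a list source
  runPipe :: [i] -> Pipe i o -> [o]
  runPipe _      Done        = []
  runPipe xs     (Yield o p) = o : runPipe xs p
  runPipe (x:xs) (Await f)   = runPipe xs (f x)
  runPipe []     (Await _)   = []

  -- runPipe [1..10] (filterP even |> filterP (> 4))  ==>  [6,8,10]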

2) My second candidate is a *very* richly featured, programmer extensible (via iterator interfaces on data types, programmable new filter operator semantics, etc.), "comprehension feature," akin to exemplars such as CL's LOOP facility or Scheme's "Eager Comprehensions" SRFI.

I explicitly do not say "List comprehension," because I envision a much more general "sequence comprehension" and even an extensible "foldr/foldl" like "data destination" feature for computing sums, averages and so on and so forth.

Properly designed and elegantly integrated with a modern language's typical data streams - streams of numbers, character and line streams, streams of rows from an SQL query, etc. - a highly capable "for" iteration, filtering, and comprehension feature could *greatly* simplify a large percentage of the boring "boilerplate" code that burdens a developer in their day to day work.
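
As one small illustration of the "data destination" half of this idea, here is a hand-rolled composable fold in Haskell (essentially the shape of the well-known foldl library; all names here are made up):

  {-# LANGUAGE ExistentialQuantification #-}

  -- a "data destination": a step function, an initial state, and a finisher
  data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

  runFold :: Fold a b -> [a] -> b
  runFold (Fold step z out) = out . foldl step z

  sumF :: Num a => Fold a a
  sumF = Fold (+) 0 id

  countF :: Fold a Int
  countF = Fold (\n _ -> n + 1) 0 id

  -- run two destinations over a single pass of the stream
  both :: Fold a b -> Fold a c -> Fold a (b, c)
  both (Fold f zf outf) (Fold g zg outg) =
    Fold (\(x, y) a -> (f x a, g y a)) (zf, zg) (\(x, y) -> (outf x, outg y))

  postMap :: (b -> c) -> Fold a b -> Fold a c
  postMap h (Fold step z out) = Fold step z (h . out)

  -- an average in one pass:  runFold meanF [1, 2, 3, 4]  ==>  2.5
  meanF :: Fold Double Double
  meanF = postMap (\(t, n) -> t / fromIntegral n) (both sumF countF)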

3) Other candidate features might be lazy evaluation and an open methods, multiple dispatch oriented OO system.

-- Scott

coroutines and lightweight processes

Yes, please do co-routines. But what I really want is lightweight processes built into each language. (I've written about this some, but not lately.) Constructing Unix-like pipes in a single native OS process is one of the big draws here.

If a cooperatively scheduled lightweight process is a fiber -- i.e., a green thread -- you can represent one as a continuation plus an identity. (If you have a slot in which a fiber's continuation can be stored, that slot can be the identity of the fiber.) If you can swap continuations in a fiber, you get coroutines directly in a lightweight process.

Your continuation representation needs to include the value or values returned if you're calling upward when the fiber is preempted (say, upon return to the trampoline).
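
A tiny model of the fiber idea in Haskell, using ContT from the transformers package as the continuation representation and an IORef queue as the scheduler (all names are mine; a real system would add identity slots and preemption):

  import Control.Monad.Trans.Cont (ContT (..))
  import Control.Monad.IO.Class (liftIO)
  import Data.IORef

  type Fiber = ContT () IO
  newtype Sched = Sched (IORef [IO ()])   -- queue of suspended continuations

  yield :: Sched -> Fiber ()
  yield s@(Sched q) = ContT $ \k -> do
    modifyIORef q (++ [k ()])   -- park the rest of this fiber
    runNext s                   -- and switch to the next one

  runNext :: Sched -> IO ()
  runNext (Sched q) = readIORef q >>= \fs -> case fs of
    []      -> return ()
    (f:fs') -> writeIORef q fs' >> f

  spawn :: Sched -> Fiber () -> IO ()
  spawn s@(Sched q) fib =
    modifyIORef q (++ [runContT fib (\_ -> runNext s)])

  main :: IO ()   -- interleaves the two fibers: 1 'a' 2 'b' 3 'c'
  main = do
    s <- Sched <$> newIORef []
    spawn s (mapM_ (\i -> liftIO (print i) >> yield s) [1, 2, 3 :: Int])
    spawn s (mapM_ (\c -> liftIO (print c) >> yield s) "abc")
    runNext s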

Unfortunately, an FFI for this is hairy in C when you want a stackless model.

re: ffi

i'm not totally understanding, but i wonder if others have thoughts on that already.

expanding ffi remark

In pursuit of precision and clarity I get verbose, which very few folks like. So I'll only try to expand on the FFI (foreign function interface) part. I can add more if you like.

If your C runtime isn't already stackless and/or lacks good coroutine support, but your language depends on constrained use of the stack and low CPU use before yielding, then the style of C code you can write in a foreign function reached via the FFI is so constrained it might be hard to do anything useful. (It would be hard to write more than a leaf node that didn't call anything else.)

If you write your language in C, you might follow a very tight and unnatural coding style to achieve your stackless regime and timing requirements. You might need to impose that same style on code reached by the FFI, to maintain your invariants. Following the rules would be hairy for a developer adding code via the FFI. You'd have trouble using any library already written in C -- it would break the rules.

The Erlang page you cited looked interesting. I expect they have similar issues.

Bottom Line

What you want is a virtual processor and a virtual processor context.

cost of virtual

Z-Bo: What you want is a virtual processor and a virtual processor context.

I was encouraging scottmcl to do coroutines and lightweight processes. I've already implemented them in C along the lines I mentioned. I like them, but other folks don't like what this does to code. If I do something fancy, my code strikes people as complex.

The virtual route would work, but is unlikely to perform as well. (I usually work in contexts where folks say, "Given this hardware, give me maximum results on every dimension so CPU, memory, and network bandwidth all get used so micro and macro benchmarks are best.")

A virtual processor context also fails to satisfy requirements if you need sharing of memory in one native OS process without cost (and I do). Personally, I want some behavior like a process, but without any process isolation. Presumably the purpose of virtualization is to isolate.

Let's not talk about what I want. It won't match a market, so it would pay no one to do it.

re: isolation

out of curiosity, are there (i seem to think there are but can't find references) ways to implement static assurances of isolation so that you don't have to pay the runtime overhead?

isolation (not sure why I keep responding)

(Sorry I didn't nest my last post correctly.) You could follow statically checkable rules to assure isolation, but those rules might rule out a lot of expressiveness. Without rules restricting what you can express, I'd guess the problem is isomorphic to the halting problem.

Even if they worked, static checks would not help me since I want to write code clearly violating isolation (just like threads using each other's state in one native process) because parts of the same continuation call tree are shared and updated. I want lightweight processes to act like cheap threads with lockless synchronization to get parallel and async execution.

effect typing?

seems like (i don't actually know what i'm talking about) some form of effect typing could be used to dial in things the way one wants at compile time? could one see what is/not isolated and "fix" only what needs to be fixed? of course that gets into all the coverity style usability issues if it is even remotely possible.

I was actually referring to

I was actually referring to Ian Piumarta's paper describing his Virtual Processing Unit (VPU):

Ian Piumarta. The virtual processor: fast, architecture-neutral dynamic code generation. In Proceedings of the 3rd USENIX Virtual Machine Research and Technology Symposium (VM'04), pp. 97-110, San Jose, CA, USA, 2004.

Ian was one of the pioneering researchers in recursive virtual machines and the Virtual Virtual Machine project at INRIA. Currently, he works for Alan Kay as a co-PI.

Also, I remember first hearing the argument for this on LtU user Sandro Magi's blog: What's Wrong with .NET. Of course, Ian gives a way better technical argument than Sandro, but Sandro covers .NET specifically.