What Are The Resolved Debates in General Purpose Language Design?

In the history of PL design there has been quite a bit of vitriol spilled over design choices. These days bringing up "static vs dynamic typing" is a good way to generate some heat on a cold winter web forum.

But a few debates seem to have been pretty firmly resolved, at least for general purpose languages. GP languages have considered goto harmful for decades now, dynamic scoping is seen as a real oddity, and manual memory management is a choice only made by systems language designers.

Questions for LtU: What other debates have been "resolved" in the sense that general purpose languages are hardly ever designed with an alternative to the consensus? What debates will be resolved in the near future? And what apparently resolved debates deserve to be reopened?

One rule: "static vs dynamic typing" will be considered off topic because clearly and emphatically that debate hasn't been resolved and has no prospect of being resolved any time soon. Also, it's boring.

First-class

First-class functions.
Memory safety.

Memory safety

Good one. I should have included that. First-class functions seems more like one that might be in the "soon to be resolved" column.

All new languages I've come

All new languages I've come across have first-class functions, and C++ is getting lambdas retrofitted. Seems like a definitive win to me.

Debate

Right, but if you visit the Java forums you'll see that the debate rages on.

Seems a bit silly, given

It is silly - mostly

As you point out, object-oriented programming is already programming with higher-order functions, so any objection based on "newbies won't understand it" is automatically damning the entire paradigm.

But there are some arguments that since the language really already has them, kinda, sorta, there's no need to add any more support for them.

Yes, they are still a bit

Yes, while syntactically heavy in Java, they are present. So my original point stands: first-class functions have definitively won. :-)

symptomatic fexprs

Well, trying to look at my own work from the outside, perhaps the re-raising of the first-class operatives (and first-class environments) debate is a symptom that the larger debate on first-class functions is relatively settled (on the theory that we wouldn't be moving on to look at these other things if first-class functions were really still up in the air).

Really?

Is there consensus on first-class functions? Java still doesn't have them, for example—and not just because of inertia; they do introduce performance problems.

Anonymous Functor Objects...

With its functor objects and anonymous classes, Java has first-class functions in all but name, limited mostly in that they don't inherit lexical scope. No reason you can't pass other functors into them.

Lexical

Actually they do. Mostly. :-) They capture immutable variable bindings in the enclosing scope, and they capture field and method bindings from the lexically containing object(s). What they don't capture is non-field mutable reference bindings from enclosing scopes.
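
A minimal sketch of those capture rules in pre-lambda Java (the class and names here are invented for illustration; Runnable stands in for any functor interface):

    // What a Java anonymous class can and cannot capture.
    public class CaptureDemo {
        private int field = 1;                 // fields of the enclosing object are visible

        Runnable make() {
            final int constant = 2;            // captured: immutable local binding
            int counter = 3;
            counter++;                         // mutated, so not (effectively) final
            return new Runnable() {
                public void run() {
                    System.out.println(field + constant);   // fine
                    // System.out.println(counter);         // rejected: mutable local binding
                }
            };
        }
    }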

Rocket Science

Those two sentences (after the emoticon) are enough to confuse 95% of the non-programming world. ;)

(edit: oops, forgot to negate the programming part.)

* I think you had it right the first time, too.


Functions in Java

There are proposals for adding such a system in Java 7; "BGGA" is arguably the most popular.

Cancelled

Apparently lambdas have been taken off the table for Java 7 due to a lack of consensus.

DataFlow doesn't need functions

Like the subject says, functions aren't needed for computations. In fact, they're quite restrictive compared to what you can do without them. If multi-core systems are going to be the norm, expect dataflow to become more prevalent.

Note that by function, I mean an entity that uses the substitution model. If the "function" lives on after it's used, then that's something else, but people often call this a function anyhow when it is not.

The distinction between a

The distinction between a function and a closure is well known, as is the benefit of both in data flow. In the limit, data flow systems tend to try to include both. This is less present in the hardware roots of data flow, but obvious in software/productivity uses.

Not sure it is a closure either.

In dataflow (say, the unix shell or Lucid), the units are closer to processes than to either functions or closures. They generate or transform a stream of values, unlike a function or a closure.

Good catch, I was blurring

Good catch, I was blurring my implementation experience with end-user abstractions :) At a formal level, co-data is probably close to what you want. There have been a couple of nice papers about this. Functions are convenient for writing data transformers, first class ones for dynamic data flow, and closures to avoid writing convoluted or otherwise clunky state-passing transformations. The data flow system can often be distilled to basic chaining primitives: these can be thought of as combinators that are parametrized by functions, and change everything from data to co-data. This perspective is the FRP one, but natural (e.g., if you view streams as deriving from laziness, there's a history of strictness analysis and otherwise isolating laziness).

What I wanted to say is that (globally) pure systems tend to promote the use of first-class functions and closures, or otherwise punt and people use other systems for heavier tasks. Hence my usage of 'tend' and 'in the limit'. In first-attempt (or lightweight, however you want to spin it) systems, you chain together either stateless transformers or occasionally the stream dual of a fold or reduce accumulation function. However, unless the language expands (e.g., into synchrone/lustre/frp), this approach is too weak (e.g., Python replacing pipes for shell scripting).

Two caveats.

1) I'm discounting Peter van Roy's notion of data flow variables when I discuss data flow systems: the interesting part here is the manipulation of streams. I'm not aware of enough popular use of this style of data flow, and don't think it's what people typically mean.

2) I'm also discounting imperative data flow languages (e.g., Esterel). This is largely for not being as familiar with them and their common use as I'd like to be.

I design languages without

I design languages without closures. Instead, there are often higher-level ways to express what you might have used a closure for; e.g., rules in SuperGlue. Closures and functions are useful for low-level plumbing, but they shouldn't be in a very high-level language.

I.e., this debate is not resolved.

Just saying data flow

Just saying data flow without closures isn't revealing. SuperGlue, if I remember right, is similar to data binding as seen in GUI languages (Flex, maybe some VB stuff), though with a clean embedding approach. However, these languages typically require abstractions elsewhere (even Excel code eventually leaves Excel land*). Code that uses data binding still often punts to traditional handlers, either with explicit callbacks, or with impure bound expressions. For general programming, variants of the proposed solution haven't been enough in practice, so I stand by my position that something stronger like first-class functions + closures must exist somewhere for GPLs.

It doesn't mean you're 'wrong': my experience with more expressive data flow extensions to JavaScript and ActionScript suggest your usability argument is a good one :) However, these still had fall-back mechanisms. Rephrasing in terms of resolution... reactivity support, esp. a flavor of data flow is the way to go for complex (visual?) end-user systems, and data binding seems to be the leading horse. How to get this to play well with the rest of the system is an open question: should it be first-class and mix with other features, be a separate layer, or somewhere in-between with perhaps more of a focus on efficiency or correctness?

*If you've ever seen Fight Club, you might understand one of the scariest places I've ever been to: a soap factory that was largely driven by Excel spreadsheets.

Rule driven programming is

Rule driven programming is orthogonal to data flow. Basically, with rules you can build a large data-flow graph with a few statements. Probably you could do that with control flow or recursion (basically the same thing), but rules can be more elegant. You can build a data-flow graph using imperative statements also (e.g., in WPF), or via objects, or via a combination of rules, objects, functions, and imperative updates.

Addressing other domains, you can think of using rules rather than functions on something else...like a discrete event/action paradigm (which I consider separately from continuous data flow). An extreme example is Prolog, but some of the more advanced constructs in Prolog either touch control flow (cut) or begin to resemble functions.

Of course, under the covers, everything may translate to functions or sub-routines or whatever, but we are only concerned with the language the user sees. This topic is about general purpose languages, but you can have a library in a GPL that avoids closures (e.g., Bling for C#).

Yes -- perhaps I wasn't

Yes -- perhaps I wasn't clear. First I stated that data flow systems tended to include first-class functions with closures (or the roughly equiv. OO approach), and then I clarified that of course there are other ways to fill the void. How

On an orthogonal point, I'm not a big fan of P2-style support of streams in Datalog; perhaps you are thinking of something else. The idea is natural, but the actual approach never somehow clicked for me.

Memory safety too general

Memory safety seems too general to me. If you include OutOfMemory errors, I don't know of any languages that are actually safe.

I'd rather say "pointer arithmetic" is dead. (Except for low-level operating system stuff)

Off the top of my head

1. Structured programming (Goto considered harmful)
2. Lexical Scoping

Dynamic scoping

While I certainly agree lexical scoping is the sane default, I'd hope language folks would keep in mind the benefits of allowing an option for "dynamic scoping" (more precisely indefinite scope and dynamic extent) a la Common Lisp and Perl.

It's a good point

Besides CL and Perl, Clojure includes a (thread local) dynamically scoped binding construct as part of the language and Scala has one in its library. So the resolved issue is, as you say, what's the default rather than whether it should exist or not.
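
For concreteness, here is a rough Java sketch of the thread-local dynamic binding those constructs provide (the class and method names below are invented; Clojure's binding form and Scala's library equivalent are the real things):

    import java.util.concurrent.Callable;

    // Indefinite scope, dynamic extent: anything called from 'body' sees the
    // rebinding, but only while 'body' is running (and only on this thread).
    public final class DynamicVar<T> {
        private final ThreadLocal<T> current;

        public DynamicVar(final T root) {
            current = new ThreadLocal<T>() {
                @Override protected T initialValue() { return root; }
            };
        }

        public T get() { return current.get(); }

        public <R> R withValue(T value, Callable<R> body) throws Exception {
            T saved = current.get();
            current.set(value);
            try { return body.call(); } finally { current.set(saved); }
        }
    }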

languages that admit they are on an OS

Edit: not really a syntax/semantics issue, but a platform/environment issue.

I find languages/platforms that have high OS-interaction friction painful. If too many layers of abstraction exist between the programmer and common operating system tasks (file, process, and network I/O), then those layers cease to be useful and become boilerplate. When the boilerplate becomes too heavy, interaction with the OS (and other systems via the OS) gets reinvented (like security models, user accounts, job scheduling, email, ...). It's also nice to write a program and not have a lot of heavy lifting to run it from the OS.

Related to this is string manipulation & data encoding/decoding. Platforms that let you deal with strings elegantly and efficiently make OS interaction better. Good regexp support goes a long way (regex literals are nice). Some concept of pack/unpack makes decoding and encoding data of different widths, types, and endianness less error prone. Treating strings/data as lists is nice (given good list support).

If it needs that much explanation...

... I don't think you can call it resolved.

I think we'd all agree that 'efficient access to external services' be they on a virtual machine provided by an OS or bare metal is a good thing.

But the degree to which a language admits to being on a particular OS is also the degree to which code and libraries written in that language become unportable to other operating systems.

Images

As David says, the appropriate level of abstraction is still very much an ongoing debate. That and your points about strings sound like a personal wish list rather than a resolved debate.

But your point about platform/environment did bring to mind something that apparently is resolved and is more an environment/platform issue: image based development. It's been a long time since I've seen a non-Smalltalk environment that was based on persistent images.

Somewhat close is the notion

Somewhat close is the notion of persistent interpreter environments (not sure what the appropriate term is). Saving an interpreter session -- not just the store, but the previous commands -- is wonderful (e.g., R, Matlab). I wouldn't say the door is shut on this one. Perhaps the reverse: this is a common (though somewhat trivial) distinction between languages meant to be used interactively (R) and not (Python).

Perhaps not Resolved

I think that you don't necessarily "resolve" these types of things so much as come to a full understanding of the implications of different features, and of what other features they are or aren't compatible with.

The cynic in me

I'm more cynical. It's often been observed that new scientific theories are adopted not when evidence becomes convincing but when the previously tenured generation dies and leaves nobody with a vested interest to fight for the old pet theory in the face of mounting counter-evidence.

Does a similar process hold for PL theory and design?

A new scientific truth does

Max Planck said something similar: A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Unfortunately, this does not

Unfortunately, this does not inspire so much confidence in the "truth" of this new truth.

Here's a few:

We might not think about these because we don't argue about them.

  1. Numbers (at least small integral types and finite-precision reals)
  2. Arithmetic (sum, product, division, modulo)
  3. Structured Aggregate Data (at least one mechanism to glob small bits of data into larger structures)... but note that immutability of these structures is NOT resolved.
  4. Structured Programs (at least one mechanism, such as procedures, to break down big programs into smaller chunks)
  5. Conditional Execution (ability to specify that a subprogram execute only in certain conditions)
  6. Recursive Definitions (ability for a procedure, function, or method to ultimately call back into itself, including co-recursion)
  7. Dynamic Memory Allocation & Management (excepting in certain embedded domains... ability to fetch more memory on demand and later release it; doing so may be explicit or implicit)
  8. Pattern Matching - from simple to advanced forms of switch/case statements.
  9. Modularity - breaking down large projects into bite-sized chunks accessed through an interface (usually a set of shared names)
  10. Named Variables - ability to assign something to a name at runtime then use that name to access that something; issue of single-assignment vs. mutable variables remains.

Here's a few stubborn old entries that I believe aren't yet resolved to everyone's satisfaction:

  1. Exception Handling (and especially Java-style 'checked' exceptions, which some claim are even more 'evil' than regular exceptions)
  2. Reflection
  3. Concurrency (many languages still don't acknowledge it, and nobody agrees on how to do it)
  4. Representing IO
  5. Macros/Extensible Language/Operator Overloading (much holy war fodder between language designers here, concerns regarding maintainability of code by newcomers to projects)
  6. Syntax, especially Syntax of comments :-)

There are tons more issues that simply don't come up that often: security in languages is a major focus of E and my own language, language-integrated queries are on the new side but are regarded as pretty cool, resumable exceptions are wicked nice but to my knowledge only Lisp really has them, and multi-methods, open functions, open data types, aspect-oriented programming, dataflow constructs, versioning/transactions/undo capabilities, etc. simply don't come up in enough languages to be subject to widespread debate.

Numbers

This one actually seems a bit debatable, at least along the dimension of whether numeric types with finite or infinite range should be the default.

Good catch on recursion. It is easy to forget that languages were once designed without any support for it.

Fixnum vs. bignum

This one actually seems a bit debatable, at least along the dimension of whether numeric types with finite or infinite range should be the default.

And it'll probably continue to be unresolved for a long time, since most programmers don't even notice which is the default in the language they use.

But there are other bits about numerics that are resolved. When was the last time you saw a machine using BCD, or ones-complement arithmetic?

Decimal

Though the representation may not be BCD, General Decimal Arithmetic is making a comeback: Java, Python, etc.

BCD

The last time I saw a machine using BCD was about four hours ago, when I received an SMS on my mobile phone.

Speaking of recursion

One more stubborn old entry is tail call optimization (people like their stack traces).

Bring back goto!

Indeed. I wish this one would be settled in favor of proper tail call handling. If I ever wanted a stack trace for debugging purposes, I can always wrap the tail call with some identity function that would not get optimized away. No more tail call, and suddenly your mostly worthless stack frames don't automatically disappear. Yay. Why again is TCO an evil "optimization", and forgoing it a reasonable choice?

And I think "considered harmful" should be considered harmful. Really, what's the problem with goto? I obviously think it should be in almost every programming language. ;-)

Goto will once again become acceptable when...

...they finally get wrapped in a Goto-Monad.

Goto-Monad

What, like this one? ;)

Zero based indexing

Anyone still indexing from one?

Lua

IIRC Lua uses one-based indexing by default.

Matlab.

Matlab.

R

R uses 1-based indexing. I suspect this trait is inherited in languages that claim Fortran as a forebear. R also uses column-major ordering on data.frames -- I believe similarly inherited from Fortran (though I know little about Fortran, so take this with a grain of salt).

Oz

Oz indexes from one too.

Smalltalk

Smalltalk uses 1 based indexes.

XPath/XQuery

XPath uses 1-based numbering/indexing for nodesets in 1.0, and for generalized sequences in 2.0 (the latter also ends up in XQuery).

unicode strings

Treating strings as sequences of single-byte characters has been abandoned in favor of Unicode-capable strings, hasn't it?

Arc is a new language that

Arc is a new language that doesn't support Unicode (yet), so it doesn't quite seem "settled" yet (though Arc caught much flak for it).

Arc is irrelevant

Its only visible relevance, at this point, appears to be a self-inflicted black eye on the face of its creator.

Well, a new language which

Well, a new language which many people defend seems like a counterpoint to Unicode being a fully resolved debate. I agree it's not very relevant, but people do use Arc.

Unicode is probably a mostly-resolved debate

though I don't consider Arc to be a significant participant in the debate, given Paul Graham's ambivalence to the issue.

Unicode is a complicated, messy standard, which lots of people don't like for that reason. If your application doesn't require internationalization, it's easy to throw up your hands and not bother. However, internationalization, and the encoding and processing of the gazillions of languages and notations used around the world, is a complicated, messy problem--one I think doesn't admit a "clean" solution (or at least a solution cleaner than Unicode).

Certainly, code pages, regional encodings like Big5, or other alternatives don't suffice.

I think Unicode is pretty close to a done deal. The Web is Unicode-based; and nobody has proposed a serious alternative.

Creationism

This is sort of like the "debate" between teaching evolution and intelligent design. A handful of people continue to insist that Unicode is irrelevant, and design languages around this assumption. However all serious languages have support for Unicode.

Not true

Arc has supported Unicode for nearly a year:

http://news.ycombinator.com/item?id=111100.

It's actually just piggybacking off the MzScheme support. Apparently it wasn't much of a change to expose that support.

but which Unicode? Windows

But which Unicode? ;)

Windows NT and Java both started with UCS-2, but later added support for UTF-16. Python uses UCS-2 and recently supports UCS-4. And what about the popular UTF-8?

UCS-2 and UCS-4 waste lots of memory (for European languages). UTF-8 and UTF-16 are compact, but do not allow O(1) string indexing.

There is that...

though I'm puzzled by your description of UCS2 as "memory wasting" whereas you describe UTF-16 as "compact", as both are (nominally) 16 bits per character. The difference being UCS2 doesn't support any plane other than plane 0 (and is exactly 16 bits/char) whereas UTF-16 does; such characters require two 16-bit words to encode. And as you point out, the variable length encoding schemes make indexing more problematic--moreso than a simple multiplication.

But Unicode is still saner in this regard than the various "code page" schemes that have long plagued Windows--and any other scheme for encoding the languages of the world will have the same issue.

Within a language runtime, this is mostly an implementation detail (an important one to be sure); the difficulty is dealing with external data, which may be formatted in one of several different ways...
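
A tiny worked example (Java; the values are chosen for illustration) of why a character outside plane 0 needs two 16-bit code units under UTF-16:

    public class SurrogateDemo {
        public static void main(String[] args) {
            int cp = 0x1F600;                          // a code point outside plane 0
            int v  = cp - 0x10000;                     // 20 bits, split across two units
            char high = (char) (0xD800 + (v >> 10));   // lead (high) surrogate
            char low  = (char) (0xDC00 + (v & 0x3FF)); // trail (low) surrogate
            String s = new String(new char[] { high, low });
            System.out.println(s.codePointAt(0) == cp); // true
            System.out.println(s.length());             // 2 code units, but 1 code point
        }
    }

(Character.toChars(cp) performs the same computation in the JDK; under UCS-2 this code point simply couldn't be represented.)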

This is a solved problem

It is possible to support full Unicode strings with O(1) indexing, with only 8 bits per code point in the (rather common) case where the string is entirely ASCII. Take a look at https://trac.ccs.neu.edu/trac/larceny/wiki/StringRepresentations -- the one I'm thinking of is the "record1" representation.

If it is solved...

Then what is the proper way of dealing with Unicode? (I am asking since I am trying to figure this out myself.)

I personally would settle for internal UTF-32 at the moment, although that factor of 4 is nasty. I've looked at the Python internals, which seem to use UTF-8, but it seems too noisy for my taste, and I gathered it took a looooooong time to implement. (Which I don't have.)

Anyone tried Fribidi? Or libuni? Or something? I would think that there would be a standard library for encoding/decoding, or is iconv enough?

record1

Read the link I posted above. You can have strings with 8 bits or 24 bits per character, depending on whether the string has any non-ASCII characters in it or not.

R5RS and R6RS Scheme

And I need C?

Hmpf

I have the feeling it's just enough if I stick with wide characters in C and just not care about the encoding.

Though UTF-16 (as adopted by

Though UTF-16 (as adopted by Java et al.) is in line with contemporary thinking***, in the context of some future "Hundred Year Language", I would like to see UTF-32 as the default character encoding for (in-memory) strings. That would get rid of all the monkey business with accessing the nth character (rather than code point) in a string and ==> O(1) string indexing as already pointed out.

***See official Unicode FAQ page for their take on this: UTF-32 FAQ

With 64-bit processors (and memory and MIPS to burn) on every desktop just round the corner, I can't see why we don't bite the bullet and go UTF-32 for string types in new programming languages. Perhaps it wouldn't be such a bad idea to deprecate UTF-16 in favor of UTF-32 in languages such as Java that have hitherto been conservative on the issue.

UTF-16 in favor of UTF-32

deprecate UTF-16 in favor of UTF-32

Not to confuse with facts, but...

As we built one of the first 64-bit processors at HaL, we looked at the cost of this. Approximately 20% of the data in modern program binaries is character data, so the cost of going to UTF-32 is to double all binary file sizes. For runtime data the cost is considerably higher, and the impact on de facto cache utilization is a complete disaster. For character-intensive applications running in Europe and the U.S., UTF-32 means that 75% of your cache is unutilized. In the Eastern bloc outside of Asia only 50% is unused, and in Asia 25%. Basically a bad approach all around. The only credible argument for UTF-32 is that indexing is linear, but see ICU for a proof by example of how to do that properly.

Our conclusion was that you basically have two sane positions. UTF-32, where character offsets are uniform but space is used badly, or anything else, where character offsets are a nuisance and the rest is just a debate about how badly you want to use space. In the latter case, UTF-8 is clearly better for most of the world, and the right thing to do is to build a library similar to the string libraries used by ICU that internally encode strings as substrings characterized by uniform code point size per substring.

UTF-16 is the worst of all worlds. Non-uniform indexing and space-wasteful in most current use-cases. It only made sense in Java because at the time of the decision (UNICODE 2.0), it had not yet been realized that 16 bits for universal code points was insufficient.

Should depend on the host environment

I claim that the decision of what the default string encoding should be is not up to the language designer, but the OS designer.

Consider that the language must deal with the fact that data in the 21st century is stored and transmitted in a very wide variety of formats including the several representations of Unicode. Therefore, you must provide support for format conversion at the I/O boundary no matter what.

However, at the boundary with the host OS, there is generally only one encoding to choose from - the one the host OS is using. Therefore, a hosted language that wants to play nicely with its host OS should choose a default in-memory encoding that matches the host environment regardless of the technical (de)merits of that encoding on its own. That means UTF-8 on Linux and OS-X, UTF-16 on MS Windows, EROS's choice on EROS, etc, etc.

String Representation not resolved...

I would like to see broader support for 'ropes' as the default 'string' type. One can still achieve O(1) amortized iteration and parsing, and get O(lg N) manipulations (append, split, substr, replace, insert, remove), while simultaneously supporting persistence (useful for versioning, undo, and transactions). It is cheap & easy to have the rope count contained graphemes and glyphs, line breaks, and even words, allowing very fast O(lg N) indexing to access and manipulate strings in terms of a given grapheme, glyph, word, or line number, etc. (at least if the maximum 'primitive array' size is fixed). Finally, UTF-16 and UTF-8 could even be used together within a single rope, no problem at all.
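
For readers who haven't met ropes, here is a deliberately minimal Java sketch of the idea: a binary tree of string chunks with cached lengths, so concatenation is cheap and indexing walks the tree. (The names are invented; a real implementation would rebalance, and could track grapheme/word/line counts as described above rather than raw UTF-16 code units.)

    abstract class Rope {
        abstract int length();
        abstract char charAt(int i);

        static Rope leaf(String s) { return new Leaf(s); }
        Rope concat(Rope right)    { return new Node(this, right); }  // O(1)

        private static final class Leaf extends Rope {
            final String s;
            Leaf(String s) { this.s = s; }
            int length() { return s.length(); }
            char charAt(int i) { return s.charAt(i); }
        }

        private static final class Node extends Rope {
            final Rope left, right;
            final int leftLen, totalLen;
            Node(Rope l, Rope r) {
                left = l; right = r;
                leftLen = l.length(); totalLen = leftLen + r.length();
            }
            int length() { return totalLen; }
            char charAt(int i) {   // O(depth): descend toward the chunk holding index i
                return i < leftLen ? left.charAt(i) : right.charAt(i - leftLen);
            }
        }
    }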

It isn't as though we often have valid reason to 'index' into strings other than to iterate through them. Strings are almost as opaque as data types can get (only binaries being more opaque). If you find yourself indexing into them, as opposed to running a regex match or transform, it is reasonable to suggest a change of data type.

UTF-32 as the default character encoding for (in-memory) strings [...] would get rid of all the monkey business with accessing the nth character (rather than code point) in a string and ==> O(1) string indexing as already pointed out.

Even UTF-32 does not "get rid of all the monkey business". While Unicode does aim to give every character a unique codepoint, the inverse is not true: not every codepoint is a unique character. There are many cases where multiple codepoints may be combined to form what is visible as a character (a glyph) or used in a language as a whole character (a grapheme), and there are many 'control' codepoints. The issues involved have a rather profound impact on how one goes about integrating Unicode support into a language. Related from the FAQ you mentioned:

Q: Doesn’t it cause a problem to have only UTF-16 string APIs, instead of UTF-32 char APIs?

Almost all international functions (upper-, lower-, titlecasing, case folding, drawing, measuring, collation, transliteration, grapheme-, word-, linebreaks, etc.) should take string parameters in the API, not single code-points (UTF-32). [...]

Given that any industrial-strength text and internationalization support API has to be able to handle sequences of characters, it makes little difference whether the string is internally represented by a sequence of UTF-16 code units, or by a sequence of code-points ( = UTF-32 code units). Both UTF-16 and UTF-8 are designed to make working with substrings easy, by the fact that the sequence of code units for a given code point is unique. [AF] & [MD]

String Representation not resolved...

Your points taken; certainly I'm not religious about this nor am I a Unicode expert ... my main point was that there is no need to be conservative with memory/MIPS in choosing the most convenient string encoding, and my gut feeling was that UTF-32 is easier to deal with than UTF-16 (and in turn UTF-8).
Your suggestion re ropes has merit; found this IBM article on the subject for those interested ...
Ropes: Theory and practice
I concur that string representation is not a resolved language issue. If people have strong views on this, perhaps this subtopic deserves a separate thread.
Cheers.

There is only one Unicode

It has different encoding forms, including some clever ones for use in memory that are not standardized as transports. For example: if your (immutable) string has no characters above U+00FF, store it with 8-bit code units (which also happens to be Latin-1); if it has no characters above U+FFFF, store it with 16-bit code units; otherwise, store it with 32-bit code units.
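
A small Java sketch of that per-string width selection (the class is invented; the Larceny representations linked upthread, and the "compact strings" that later appeared in Java 9, are real-world variants of the same idea):

    // Immutable string with O(1) code-point indexing; the narrowest
    // sufficient code-unit width is chosen once, at construction time.
    public final class FlexString {
        private final byte[] narrow;   // all code points <= U+00FF (Latin-1)
        private final char[] medium;   // all code points <= U+FFFF
        private final int[]  wide;     // otherwise: one int per code point

        public FlexString(int[] codePoints) {
            int max = 0;
            for (int cp : codePoints) max = Math.max(max, cp);
            if (max <= 0xFF) {
                narrow = new byte[codePoints.length]; medium = null; wide = null;
                for (int i = 0; i < codePoints.length; i++) narrow[i] = (byte) codePoints[i];
            } else if (max <= 0xFFFF) {
                medium = new char[codePoints.length]; narrow = null; wide = null;
                for (int i = 0; i < codePoints.length; i++) medium[i] = (char) codePoints[i];
            } else {
                wide = codePoints.clone(); narrow = null; medium = null;
            }
        }

        public int length() {
            return narrow != null ? narrow.length
                 : medium != null ? medium.length : wide.length;
        }

        public int codePointAt(int i) {   // O(1) regardless of representation
            if (narrow != null) return narrow[i] & 0xFF;
            if (medium != null) return medium[i];
            return wide[i];
        }
    }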

OCaml doesn't support

OCaml doesn't support Unicode, and I'd like to think that it is, as a language, far from "irrelevance".

In my view, whether Unicode should be part of the language or relegated to support libraries is a debate that not only exists, it's far from closed. Witness Python 3000.

Legacy languages

Relegating Unicode to libraries is a recipe for disaster. Every piece of code that does text I/O (and text manipulation in general) needs to be aware of Unicode. Making it optional will only ensure that most code in the wild will choke on non-ASCII text.

Libraries don't have to be optional

Unicode support doesn't belong in the core of a general purpose language (but neither does ASCII).

Literals?

Except for literals, no?

No

Put a layer of abstraction around the way your program is encoded.

Unicode not just a Representation

It's rather useful for the language to:

  1. provide a range of at least 2^20+2^16 unique entries in some sort of 'codepoint' type even if the language doesn't support unicode characters (characters can be multiple codepoints, further complicating Unicode)
  2. Support the embedding of unicode expansions in strings ("abc\u2e64;def\u026811;ghi")
  3. Support the source-embedding of unicode in strings, possibly in other program text (e.g. if dealing with extensible programming language capabilities, camlp4, etc.)
  4. Acknowledge that characters and codepoints are not necessarily the same and support this in libraries responsible for drawing glyphs, reducing strings to canonical forms, and doing a number of similar behaviors

These are type and syntax issues and cross-cutting issues that penetrate design of other libraries, not just encoding issues. If a language is to 'support unicode' in any real sense, it needs to acknowledge unicode at more than just the level of string encodings and 'unicode transformation formats'.

And bootstrapping problems tend to hinder the idea of "put a layer of abstraction around how the program is encoded" if by program encoding you mean its concrete syntax. One must either bootstrap this capability within the language, or one must fall back on some sort of preprocessor (in which case one runs into the bootstrapping problem in the preprocessor). Bootstrapping within the language can be done (it's what I favor) but to do so one must choose how the initial-syntax-to-later-extend-the-syntax is encoded. (E.g. one might allow both UTF-8 and UTF-16 and initially support a syntax-rules, get-syntax, and non-monotonic use-syntax notations atop that.)

Libraries can depend on other libraries

These are type and syntax issues and cross-cutting issues that penetrate design of other libraries, not just encoding issues. If a language is to 'support unicode' in any real sense, it needs to acknowledge unicode at more than just the level of string encodings and 'unicode transformation formats'.

See title. Unicode support should no doubt be in a standard library and used by other things that do string processing. And IMO your bullets 1-4 scream "this doesn't belong in the core language."

One must either bootstrap this capability within the language, or one must fall back on some sort of preprocessor (in which case one runs into the bootstrapping problem in the preprocessor). [...] (E.g. one might allow both UTF-8 and UTF-16 and initially support a syntax-rules, get-syntax, and non-monotonic use-syntax notations atop that.)

I'm also in favor of some sensible syntax customization support, but as you note, it doesn't really solve the problem here. The problem here is supporting literals in a way that depends on how the source is encoded: ASCII encoded source can support ASCII literals directly; Unicode encoded source can support Unicode or ASCII directly. All we need to support this is a very lightweight "preprocessor" that depends on the source encoding type. More heavyweight syntax translation can be handled with a more complicated mechanism that applies to all source encodings.

Unicode support should no

Unicode support should no doubt be in a standard library and used by other things that do string processing. And IMO your bullets 1-4 scream "this doesn't belong in the core language."

Sorry, but I consider every "standard" library to be part of "the language definition". Perhaps that results in some confusion. And I suppose if you have 'Integer' as a primitive, you don't need to treat codepoints as special... my language goes the other way around: it has 'codepoint' as a primitive and sticks Integers in a standard library.

As far as your assertion that the bullet-points don't belong in the core language, perhaps we should also exclude ASCII and every other encoding of characters. I mean, why should we allow "abc" to have, you know, quotes and characters in it?

The problem here is supporting literals [...]

I don't really agree with either of your solutions here. I think the syntax for literals should also be manipulated by the same heavyweight syntax extension mechanisms as everything else, in which case it shouldn't be subjected to preprocessing.

Besides, just read ASCII as UTF-8 and you're set. :-)

Hierarchies of Languages

Sorry, but I consider every "standard" library to be part of "the language definition".

In my view there's a hierarchy of languages. I've tried to be consistent in using the term "core language" when saying Unicode doesn't belong.

[...] perhaps we should also exclude ASCII and every other encoding of characters

I think you're being sarcastic here, but if you'll look a couple of posts ago, this is in fact my position with regard to the core language.

I don't really agree with either of your solutions here.

I don't really have a problem with your proposed solution of simply declaring your language to be UTF encoded - that's a pragmatic choice. To me this is mainly an aesthetic issue. There is already a need to support two types of encodings (ASCII and Unicode). Generalizing to support an arbitrary encoding is pretty trivial and allows you to completely ignore encoding issues when specifying the core language.

And as for the same heavy-weight syntax mechanism applying to string literals - it does. The issue is the individual characters.

I think you're being

I think you're being sarcastic here, but if you'll look a couple of posts ago, this is in fact my position with regard to the core language.

Well, I think I'll take a different position: a core language should define its own logical representation to a degree allowing external software to produce source for it that requires no special processing by the language interpreters.

Generalizing to support an arbitrary encoding is pretty trivial and allows you to completely ignore encoding issues when specifying the core language.

I'm not talking about 'unicode' "encoding issues". I'm saying that the decision to use 'unicode' rather than some other character set pretty much needs to be a core language decision. If you think it's trivial to get around this, try it. I did. Now look what I'm preaching.

[...] a core language should

[...] a core language should define its own logical representation to a degree allowing external software to produce source for it that requires no special processing by the language interpreters.

I'm not sure why you think this is a contrary position. The core language should be aware of its own abstract parse tree, but through abstractions like "Symbol" -- not "UnicodeString".

I'm saying that the decision to use 'unicode' rather than some other character set pretty much needs to be a core language decision. If you think it's trivial to get around this, try it. I did. Now look what I'm preaching.

As you might imagine from my being a frequent poster to LtU with strong opinions about language design, I do have a language project on back burners. Specific issues that you think will be problems would be much more helpful than "I tried it and didn't get it to work."

Abstract Syntax Only?

If you have a concrete syntax of any sort, and you wish to discuss your code in terms of 'text examples', then you'll need a character set to describe the set of characters that are used in the text to produce the examples. It's pretty fundamental.

You're free to move parsing out of the language and require the interpreter/compiler be passed an 'abstract parse tree' straight from the start. By doing so, you could avoid issues of concrete syntax.

OTOH, lacking a standard (initial) concrete syntax is pretty damaging to a language in other ways, such as development of tools that can highlight and index and process source code. Internal extension of syntax, where language statements affect the parsing of other language statements, is much more promising, and allows IDEs to follow the changes in syntax just as readily as the parser.

Not abstract syntax only

Abstract syntax only in the core of the language. Tools for parsing / pretty printing ASCII or unicode should be available in the libraries, and should be used by IDEs or other tools. Maybe we're just arguing over the concept of "core language"? My approach on this seems pretty reasonable to me.

What means 'unicode support'?

Abstract syntax only in the core of the language.

I think we might be arguing over what "unicode support" means. To me, it means that source code must, minimally, be allowed to contain arbitrary unicode strings and characters, which means that support for expressing unicode strings must be part of the language's concrete syntax. In its greater extent, I believe it means that, should the language support syntax extension, the full range of unicode codepoints is available for specifying language transforms into the abstract source tree.

The fact that support for parsing this syntax may be implemented in a standard library somewhere doesn't change much. It's a nice feature if a language includes support for parsing itself, but many languages never parse themselves. And the syntax of those languages that never parse themselves is no less central to the definition of those languages due to that fault. Thus, if you have a concrete syntax defined for your language, and your concrete syntax supports unicode (even minimally), then I would assert that you've integrated the decision to support unicode into a significant and central part of the definition for your language.

That said, I'm starting to suspect that, for you, "unicode support" simply means "none of the standard library functions will barf if fed external text from a unicode representation." I suppose one could call that 'support' in some sense. But, whether or not you are supporting unicode in the sense I described above, your standard concrete syntax (if you have one) shall necessarily support one character set or other.

If, on the other hand, you leave concrete syntax out of your language, make concrete syntax the province of the user community, perhaps define several syntaxes just to prove a point, then I would readily agree that the syntax is peripheral to your language. But that approach has major costs regarding automatic integration, configuration management, source distribution, etc.

Tools for parsing / pretty printing ASCII or unicode should be available in the libraries, and should be used by IDEs or other tools.

And what about tools for parsing the ASCII or unicode that just happens to be represented within the source code for your language?

Question: does your approach require users to provide plugins that will translate concrete source code into your abstract syntax?

What means 'core language'

I'm talking about full unicode support, but I think I've been confusing things by talking about doing unicode in libraries, when I really mean in tools. (For me, the tools are in libraries, but as you point out, that doesn't really matter). I'm saying there is a front end that provides for presenting unicode to the more core layers of the language.

Users are welcome to write their own front ends that translate from some other encoding, wouldn't need to write very much code to do so, and would enjoy similar access to the core language. But they would face interoperability issues with people using other encodings, and so would also need to make converter tools to do so (which might be pretty challenging for a different character set). So there'd probably only ever be one or two encodings in practice. The reason for separating the encoding is just to make the core language simpler.

OCaml does UTF-8 just fine though

We wrote an entire wiki engine in OCaml and it handles Unicode (via UTF-8 in strings & in the database) just fine.

UTF-8 reading, writing, and

UTF-8 reading, writing, and comparisons on the basis of octet equality hardly counts as language support for unicode... not unless you get the right count of 'characters' back when you ask for the size of a string, or get the right 'character' when you ask for the forty-third entry.

Admittedly, OCaml should do better than C when dealing with U+0000 (when using UTF-8 in C I usually end up encoding that as 0xC0 0x80 to avoid premature string termination).
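
To make the octets-vs-characters point concrete, counting code points (rather than bytes) in UTF-8 looks roughly like this (Java, illustrative only; and note that code points are still not user-perceived characters):

    public class Utf8 {
        // Continuation bytes look like 10xxxxxx and never start a code point,
        // so counting the bytes that are NOT continuations counts code points.
        public static int codePointCount(byte[] utf8) {
            int count = 0;
            for (byte b : utf8) {
                if ((b & 0xC0) != 0x80) count++;
            }
            return count;
        }
    }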

But it does do that

It does that fine - just use Camomile which is a huge, full-featured i18n package that handles far more than merely unicode.

In the case of the wiki, we didn't actually use Camomile, but have a couple of custom functions for UTF-8 validation and string length, and do everything else out of the PostgreSQL database. This is the logical and proper place for it because we want to push tasks such as searching, indexing, etc to the database.

As for the question about having UTF-8 strings in your program source: IMHO this is nearly always a mistake. You should use a tool such as ocaml-gettext to localize your messages.

If you point at the

If you point at the libraries then you should also insist that C++ "handles software transactional memory and variant types and regular expressions just fine".

The real question is whether the support is integrated throughout the standard libraries and other people's libraries.

Anyhow, I acknowledge the point on localization of messages and reply that not all strings in source code are messages for users.

camlp4

Well OCaml lets you modify the syntax of the language so, for example, you can include regular expressions directly in the language like Perl (actually, much better than Perl, because the well-formedness and type-checking is all validated at compile time).

To be honest I'm not sure what a syntax for "doing Unicode" would even look like. OCaml-gettext uses camlp4 to parse the abstract syntax tree looking for translatable strings, so you could do something like that to mark strings in various encodings perhaps? You could add custom operators to replace library calls ... I'm a bit out of ideas here - I really think that libraries are the best place to handle string encoding, but OCaml certainly doesn't limit you to just using libraries.

In the wiki code we use PG'OCaml which lets us embed SQL statements directly into OCaml - type-checked against the database at compile time (of course!). That involves camlp4.

BTW, I should say that string constants in OCaml can contain UTF-8 (or anything you like). The limitation is that certain functions in the base stdlib assume ISO-8859-1, eg. String.lowercase. However there are replacements for these functions in Camomile, and you can even disable those specific stdlib calls if you want to prevent people from using them accidentally.

Yes and no.

There seems to be consensus that 8-bit characters are dead, and the UNICODE code point space seems to have one, but UNICODE encoding does not. One of the intended (hoped-for?) beauties of UNICODE was that there should exist some encoding in which code points and string offsets had a 1:1 relationship. Unfortunately, early UNICODE made the mistake of betting on 16-bit code points, with the result that early-adopting languages like Java and C# have string models that are horrible no matter which side of the UNICODE fence you straddle.

Last I checked, there were still conflicting definitions of content model in various XML standards because of confusion between character offsets and code point offsets.

Not necessarily

Well, "Unicode capable" may be a correct term, actually, but it's not just Unicode. Ruby 2.0 strings aren't necessarily Unicode, for example (and they may even use encodings which are not Unicode-compatible at all).

XPath/XQuery

XPath/XQuery

XPath?

How is this a resolved debate in 'general purpose language design'?

Even XML itself isn't an entirely resolved debate

with other textual notations, such as JSON, gaining traction in the space that XML represents.

While I have little desire to flame XML in its intended application (markup--that thing the M stands for--an application where the XML is interspersed with Real Data), it's kinda overkill for some applications. There is nothing uglier than an XML document where all the CDATA sections are nothing but whitespace. :)

Wow, you must be blessed...

... to have never seen anything uglier than an XML document with an abundance of whitespace.

Of course, I don't program much in Perl

so by virtue of that, I'm insulated from a whole lot of hideousity. :)

Perl is the logical successor to APL

It's a write-only language...

xpath/xquery agreement

I agree with you but, as you can see from the other comments, you're making an early call.

XML data model, XQuery Update side effects model. For most everything.

-t

p.s.: heh... (in the context of "extension languages")

Algol-style type declarations

I don't know if this is 100% "settled", but static type systems where the full type of every variable must be declared seem to be on the way out. Even C++ is introducing type inference in places where it makes sense.

Type inference not settled

That's another debate where parts of the Java community (at least) are still fighting. Note the confusion in the article between type inference and dynamic typing - but read the comments to see people who do understand the difference still arguing against it.

There are many reasons to not want type inference

* Lots of interesting type systems out there for which typechecking is decidable but inference is not.

* Even in a typeless context, type annotations (which, if not checked by the compiler, essentially become assertions) are useful as statements of programmer intent, and as runtime consistency checks.

* Inference is often undesirable at the interface to a module/class/subsystem/function/what-have-you; as the types being operated on or returned are part of the problem domain.

This isn't of course an argument against type inference; merely an argument that it isn't universally desired or a settled question.

Just to be clear...

...I'm not saying that type inference is always desirable. I'm saying that having to declare the full type of every internal variable is increasingly seen as being undesirable.

Mostly agree, but

Inference does not preclude annotation.

Your point about module interfaces is critical. Just because an implementation can handle a wide range of inputs does not mean that I wish to export all of that capability.
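
As a concrete illustration of that position, here is a small sketch using modern Java's local-variable type inference (the class and method are invented): explicit types remain at the exported interface, while inference is used inside the body.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class Inventory {
        // The exported signature stays fully annotated: it is part of the problem domain.
        public Map<String, Integer> countByName(List<String> names) {
            var counts = new HashMap<String, Integer>();   // internal type inferred
            for (var name : names) {
                counts.merge(name, 1, Integer::sum);
            }
            return counts;
        }
    }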

Mutable global variables

are one thing that seems to be rapidly disappearing.

Constants

So we can agree, I presume, that "constants" are another "Resolved Debate"/"must have"? Again hauling out JavaScript as an example, which only had "const" retrofitted in recent versions.

Not resolved

Many languages have no notion of immutable variables. In some languages, dynamic metaprogramming means that even the structure of a program is mutable. Definitely not resolved.

What is the threshold for "resolved" anyway?

Early FORTRAN dialects would let you mutate the constant table...

Given any language feature or constraint, I'm sure someone knowledgeable can be found that despises it. OTOH, most production programming languages don't permit this sort of hyper-reflection, and those that do seem to do so as a developer aid, rather than as a useful production code technique.

"Dynamic metaprogramming" has long struck me as a workaround for "bad development tools".

I'm not sure

Not sure what the threshold is for consideration, but it seems reasonable that Ruby and Python meet it. Ruby is especially famous for libraries and frameworks that derive their power from dynamic meta-programming. And Python has some immutable built in types but instances of all user defined classes are mutable. So whatever our personal feelings about these kinds of things might be, it doesn't seem reasonable to call them resolved.

Fortran implementation bugs

It's true that passing a constant by reference (the only type of parameter in Fortran) in early systems mutated the constant, but that was a bug, not a feature of the language by any means.

Global variables

I don't see any evidence that they're disappearing from languages. At most they're getting namespaced. The difference between myMagicGlobal = 42 and Globals.myMagicGlobal = 42 just isn't much.

Nearly globals

Except that in the latter it's much easier to rename Globals when it turns out the world's a bit bigger.

Block scope

Contrast it with JavaScript's function scope, which is really awkward at first. Why? Mostly because coming from other (even older) languages with block scope makes one take it for granted.

I know this is very basic, but I think this is one debate that HAS been resolved, in contrast with some other concepts I see entering the stage here.

Nah

Python doesn't have block scope and isn't going to get it.

Resolved to be deserved to be resolved.

1. High level module system. Functor, Functor-Functors.
2. Module / Component dependency and composition wiring.
3. Symmetrical, turtles all the way down, full control of Opaque or Transparent Types.
4. Macros.
5. Immutability. With concise syntax for copy constructing new instances with mutated field values (Records/Objects/Tuples)
6. Opting for Nullessness.
7. Contracts with blame.
8. Pattern Matching
9. Tail Recursion
10. Continuations (even better serializable continuations)
11. Seamless and simple, dynamic compilation and loading.
12. Named and Default Parameters.
13. Currying
14. Laziness / Call-By-Name capable but not enforced.
15. Well behaved multiple inheritance / mixins.
16. Introspection and Reflection.
17. Runtime typing annotations.
18. Closures.
19. First Class Functions, Classes, Modules.
20. Component Versioning.
21. Optional Effects System.

Introspection, immutability, dynamic compilation are problematic

Dynamic compilation exceeds the risk-tolerable complexity threshold for critical systems. The jury is still out on immutability for low-level "systems" stuff, and full-featured introspection is problematic because it seems to rely on dynamic compilation and code generation.

One might respond that the "GP" aspect of this thread takes my concerns off the table, but anybody who writes a "GP" program that uses a database is kidding themselves by taking a strong position on these three issues. They have merely swept these issues into selected modules.

Which I think is perfectly fine, but if that is the goal, why not admit it and design languages that can handle the full range of what GP programs require?

Dynamic compilation exceeds

Dynamic compilation exceeds the risk-tolerable complexity threshold for critical systems.

I don't see why dynamic compilation necessarily introduces more "risk". You must either rely on the code generator or verify its output, whether compiling statically or dynamically.

I can see how having a code generator would be problematic size and performance-wise for embedded systems, but I don't see any additional risk.

One additional source of

One additional source of complexity in these systems is the transition from interpreted to compiled code, which to be efficient typically employs machine adapters. Stack layouts and calling conventions become nightmarish, and who generated the machine code of the code generator anyway? If it was another code generator, then now you have two to worry about. If it was itself, that is a whole other can of worms....All of which amounts to additional risk. Throw in a precise GC and watch your complexity skyrocket.

On paper that sounds good,

On paper that sounds good, but on the kinds of devices used in critical systems, you don't have the compute capacity for runtime verification.

Verifying the dynamic compiler when the JIT system is built would be fine, but until an example of this exists and has been qualified, anybody who claims this stuff is viable for critical systems just doesn't understand what "critical" means.

Regardless, truly critical systems generally run code that is completely fixed at system build time. Why would you want run-time code generation in such cases?

Continuations, too

Continuations, at least with unrestricted call/cc, are very problematic too. They break stateful abstractions, because call/cc can be used to duplicate control flow. And it is usually impossible to protect affected abstractions against such abuse.

Continuations to be deserved to be resolved...

I sort of interpreted the title as the start of a list for things that aren't yet resolved, that are problematic in some way. It made for a couple entries I disagree with (e.g. immutability with concise syntax), but continuations and reflection and introspection and macros and almost everything on the list fit that view nicely: very few entries on it are resolved.

Separate compilation units

As late as the 60s there were languages where you could only compile/interpret your program as a whole. I think this is no longer the case.

While at it, I think actual interpretation is essentially dead -- replaced by byte-codes.

Relatedly, the issue of pure-text macros (in the style of the C preprocessor) seems to be almost-decided -- C and C++ won't drop theirs, but I'm not familiar with any other takers.

And as a last point -- C has been elected as the common target for FFI.

Interpretation vs VM vs compilation

is a) an implementation detail, and b) has long been blurred by modern practice, including on-demand interpreters that turn text into bytecodes as needed, and JITs that do the same for machine code.

FYI, C# is a newish language

FYI, C# is a newish language that doesn't support separate compilation of source files (at the language level at least). Instead, they rely on a very fast compiler.

Another debate that is far from resolved.

The traditional role of "compiling" and "linking"

is reversed in many modern environments, with linkers (including dynamic linkers) doing more and more things traditionally done by compilers. The .Net environment, which stitches together MSIL files at runtime, generating machine code when you start, is an extreme example.

Compiling and Linking

... I suspect they will eventually be turned entirely on their head. Linking and/or running the project will produce values, and compilers will operate on these values (e.g. a particular function might compile to 'main', a record of functions might compile to a DLL, etc.). The compiler will operate on values rather than on the source code, thus increasing the dynamism by which application products may be formed.

Well, at least I hope this is what happens. I think it's the only way to get the best of both worlds between dynamic interactive systems like Smalltalk and the boilerplate-heavy but tiny images and runtimes of batch-compiled products.

This argument ignores startup cost

When the first attempt was made to integrate a JVM into DB2 to support server-side Java in the database, they discovered something rather nasty: 1.25M instructions for the JVM to reach the first bytecode of main, but 150K instructions average to complete a conventional transaction in the traditional way. OOPS!

JIT has its place, and it's a large place, but it's not every place.

Not talking about JIT

No, I was talking about the "source code" for compilers changing from hand-written program text of the language to values or objects within the language. This is a different concept entirely than what you're imagining.

Of course, the compiler could also be used for other back-ends, such as for JIT. But it's more like using a modified JIT compiler (likely with a lot more optimizations than would be appropriate in a JIT) to transform a function into a binary string, save it to 'a.out', and run it later.

The effort is very much related to marshalling of data.

Don't blame this on JIT

Don't blame this on JIT (dynamic) compilation, blame this on the hideous complexity of the Java platform startup. We did some work on this for the ExoVM project--it was nuts even with the CLDC JDK. Now I am working on the real JDK, and it brings a whole new meaning to the word "nonlocal".

Separate Compilation will Die Another Day

Separate Compilation will be facing off against whole-program Partial Evaluation, Staged Compilation, and Dependency Injection (as is achieved with Hyperspaces and Aspect-Oriented Programming).

I'm betting that Separate Compilation will put up a good fight then lose in the fifth round with two minutes remaining due to a surprise assault by lesser forms of source preprocessing.

It always amazes me when

It always amazes me when people make these claims without bothering to examine the measured impact of shared libraries on real system behavior.

Matching effects can be had in JIT systems, but not within current OS interfaces. Any single program will certainly run better with lazily generated code if the program runs for long enough. Systemically, though, the inability to re-use lazily generated code is a significant issue from several perspectives. This doesn't necessarily argue against JIT. It may, in fact, argue against conventional operating systems.

Again... not talking about JIT

Again... I was not talking about JIT.

It always amazes me when people make these claims without bothering to examine the measured impact of shared libraries on real system behavior.

Shared libraries make for difficult configuration management, but they also enable useful modular extensions and performance gains that improve the more a library is shared. I won't deny those benefits. I say that Separate Compilation, at least of language modules in the traditional approach, will fail in spite of them.

I was primarily objecting to

I was primarily objecting to the whole-program assumption. The JIT comment was more a note that whole-system JIT can regain many of the advantages of shared libraries, but most current JIT engines aren't whole-system.

I don't claim that shared libs are easy. In fact, I claim that they are generally a severe pain in the ass. This doesn't negate their performance advantages, and CPUs and memory just aren't growing that fast on small devices.

Separate compilation of

Separate compilation of language-level modules will never go away. Separate compilation of source files will. And the sooner the better!

Partially Agreed.

Separate compilation at some level of composable program components (e.g. services, processes, remote objects) will probably always be around.

When I say 'language level modules', though, I'm talking about source-code. There might even be many language 'modules' per source file.

First Class Locality

Separate compilation of units defined within the language is what I'm hoping to see, as that would provide the bridge between dynamic and static environments. Something like abstract kells would work out very well.

Goto Not Considered Harmful

Goto isn't dead; it's just resting. There are valid reasons for using it--especially in code that's religious about checking return values. There's a good discussion about the issues and the trade-offs in Steve McConnell's excellent book, Code Complete: http://www.stevemcconnell.com/ccgoto.htm. The point isn't that it's possible to avoid using gotos; the point is that there are valid reasons for using them. The alternatives involve trade-offs.

I'm not really sure there are many (any?) resolved debates. There are popular choices to be sure, but popular doesn't quite rise to the level of "resolved". (To me "resolved" things are things that are highly unlikely to ever change.)

Irrefutable photographic evidence

Here's irrefutable photographic evidence that goto is a goner. :-)

More seriously, what evidence do you have that goto is "just resting" as far as language design goes? Are there some stealth unstructured language projects hovering just under the radar that are just waiting to burst into the public eye and redeem goto's sullied name?

Computed Goto

The dead feature has been extended:

http://gcc.gnu.org/onlinedocs/gcc-3.1.1/gcc/Labels-as-Values.html

The extension is actively used:

http://blogs.sun.com/nike/entry/fast_interpreter_using_gcc_s

(Google for "computed goto" to find more such references.)

Apparently goto is still appreciated in lower-level languages, even though it's frowned on in higher-level ones.

So what exactly has been resolved?

Well, goto is frequently used in GCC

to simulate tailcall elimination. (A statement which may be backwards; tail-calls in HLLs compile down to simple branch instructions--the assembly language equivalent of the goto).

It ought to be dead

McConnell writes:

The goto is useful in a routine that allocates resources, performs operations on those resources, then deallocates the resources. With gotos, you can clean up in one section of code, and they reduce the danger of forgetting to deallocate the resources in each place you detect an error.

Not only is this already taken care of by today's popular OO languages, but this definition also highlights a much more basic missing element from the type of language he is talking about.

He mentions the need to perform the _same_ clean up procedure from multiple places in a function. If only his language let him define a clean-up function local to this function, with resource names in its scope, he'd be almost there. Then come exceptions, continuations, etc.

Code cleanup...

C++ provides RAII, auto_ptr, and destructors. A great many languages provide garbage collection. With first-class functions or blocks or even some decent macros, one can use the 'withResource(<resource>,do <stuff>)' pattern that can ensure cleanup even in event of exceptions (albeit unlikely to handle resumable exceptions). But, without those features, and before introducing exceptions into the language, goto does have a place in supporting cleanup. Keep in mind that it is NOT "the same" procedure for cleanup each time: the amount of cleanup necessary depends on how many resources one has acquired.
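For concreteness, here is a minimal sketch of the goto-based cleanup ladder being discussed (my own illustration, written in C-style C++, not taken from McConnell): each failure point jumps to the rung that releases exactly what has been acquired so far, and the success path falls through the whole ladder.

  #include <stdio.h>
  #include <stdlib.h>

  int process_file(const char *path) {
      int rc = -1;                   /* pessimistic default */
      FILE *f = NULL;
      char *buf = NULL;

      f = fopen(path, "rb");
      if (!f) goto done;             /* nothing acquired yet */

      buf = (char *)malloc(4096);
      if (!buf) goto close_file;     /* only the file needs releasing */

      if (fread(buf, 1, 4096, f) == 0) goto free_buf;
      rc = 0;                        /* success also falls through the ladder */

  free_buf:
      free(buf);
  close_file:
      fclose(f);
  done:
      return rc;
  }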

I've been thinking about a pattern especially for resource cleanup...

  fun(x) {
     resource X = acquire_resource();
     fini { release(X); }
     resource Y = acquire_another_resource();
     fini { release(Y); }
     doSomething(X,Y);
     return something;
  }

The 'fini' would be a keyword for finalization steps, and would be injected (even in the event of an exception) before leaving the scope of the enclosing block, in reverse order (so Y is released before X). I don't have RAII, but I figured the locality of the 'fini' statement would simplify syntax or macro expansions that could properly clean up resources.

It's been a few years since I thought about it. I wonder if I'll actually have opportunity to use this in the language I'm devising... not certain whether transactional actor model, even with some synchronous messaging, will really require 'acquiring' resources in these manners.

fini is a good idea

I think this is a crucially overlooked feature... I first posted about putting it in a language I was designing in 2004...

One insight I've had since then is that it's a little semantically confusing in an imperative language in which variables can change; what happens if you do 'fini ( something(X) )' and then reassign X? Presumably the original value of X should be used, but I wonder if that isn't going to break in more complicated situations -- and even if not, at a minimum it seems like it might introduce some debugging confusion.
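For what it's worth, the same ambiguity shows up with a hand-rolled scope guard in C++ (a sketch of mine, not a claim about how 'fini' must behave): whether the cleanup sees the original or the reassigned value depends entirely on how the variable is captured.

  #include <cstdio>
  #include <functional>

  // Minimal scope guard: runs the stored action when destroyed,
  // i.e. at scope exit -- roughly when a 'fini' block would run.
  struct ScopeGuard {
      std::function<void()> action;
      ~ScopeGuard() { action(); }
  };

  int main() {
      int X = 1;
      ScopeGuard by_ref{[&X] { std::printf("by-reference fini sees %d\n", X); }};
      ScopeGuard by_val{[X]  { std::printf("by-value fini sees %d\n", X); }};
      X = 2;  // the reassignment in question
      // At scope exit (reverse order): by_val prints 1, by_ref prints 2.
      return 0;
  }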

It seems to me that whatever

It seems to me that whatever code reassigns X should handle cleanup for the original value of X. But that makes some assumptions about the behavior of the variable assigned to X (or, put another way, imposes some requirements on types that could be used with such a construct).

Problem with "fini" is

Problem with "fini" is that's impossible. If a catch block issues a raise, what context can the fini sensibly run in?

Not a problem.

'Fini' would essentially run at the same time (and same order) destructors would run in C++. Exceptions aren't a problem.

In the common C++ implementation of exception handling, the stack is actually still hanging around after throwing an exception (which is required for you to legally 'catch(type_reference&)'). This can be implemented by running the handler at the top of the stack and providing a pointer to the right frame to access the 'local' variables (the stack pointer is left alone so you can call functions from the exception handler without overwriting anything).

The actual destructors for each stack-frame back to the exception aren't run until immediately before exiting the exception context (e.g. by falling out the bottom of it, issuing 'return', or issuing 'continue'). If you raise another exception, you'll simply be in the same exception context on a lower stack frame. If you drop out the bottom of the stack, then it will exit exception context (possibly after signaling the debugger).

It's this approach to exceptions that also makes resumable exceptions viable in C++... you wouldn't want to resume after running destructors, but destructors don't run until after you're finished with the exception.

And if the fini code running

And if the fini code running in destructor position issues a raise, what happens to the original raise?

Buried

My sketched definition of 'fini' buries exceptions that went uncaught within them after firing a message off to the local logger. If you want to do something useful with exceptions, you'd need to catch the exception within the 'fini' then fire off a message somewhere or spawn a task to deal with the problem or 'resume' the exception based on a policy decision (e.g. abort/retry/ignore/continue with a parameter/etc.). If you want to bury them silently, you need to do so explicitly with a catch-all. The decision to bury rather than propagate was aimed mostly at simplifying reasoning about exception handling.

C++ idioms offer programmers simple advice on exceptions in destructors: see that? don't do that.

One point to consider is that you will run into this same problem even if you don't have features to help with cleanup. The possibility for encountering new errors while dealing with cleanup after other errors, resulting in two errors, is essential complexity. The problem doesn't go away when you close your eyes, or when you lack the tools (e.g. resumable exceptions) to deal with it.

Almost all languages'

Almost all languages' exception mechanisms react poorly when destructors (or finalizers) do bad things.

For example, if a Java or .NET type has a finalizer that blocks indefinitely, it will probably prevent any more finalizers being run.

I think the correct answer to the question, "what happens if I raise an exception in a destructor when an exception is being raised", is "don't do that".

D's scope

'fini' sounds like D's scope statement.

InD'd it does...

I developed 'fini' based on my use of C++ and prior to seeing it named elsewhere, but most incremental ideas like this one tend to be reinvented a million times. It's a rather obvious extension to the language.

Weird idea

when 'doSomething' can make a 'something' which can consist of any parts of 'X' or 'Y'. In C++ you can't assume anything about the 'doSomething' call, even using const in the definition, due to object referencing/aliasing. Things which are simple in a pure functional language are impossibly complicated in languages like C++. Impure functional languages and some others (Perl, Python, Java, ...) which have GC solve it with GC. C++ is crippled in this way by design.

Not always decomposable.

Weird idea when 'doSomething' can make a 'something' which can consist of any parts of 'X' or 'Y'.

In the general case, I agree; memory management can be a major pain when dealing with objects that need explicit deallocation, and one usually resorts to making copies.

On the other hand, resources are not necessarily 'decomposable' in the manner you are imagining. X and Y don't need to expose any components to 'doSomething' excepting the references to the resources as a whole. I.e. one could implement them as actors or objects that reply to messages with some useful information or by executing commands, but do not expose inner structure to their callers.

That only leaves the question of whether 'release(X)' will fully break the references to X (so the next call might result in undefined behavior) or will simply revoke access to the underlying resource. There is a lot of reasonable opposition to using garbage-collection to collect non-memory garbage (such as file-descriptors) except as a very last resort.

goto doesn't have that place

It's not necessarily "exactly" the same clean-up procedure. If they differ, then you either have multiple labels to goto, or at the only goto target inspect resources for their allocation status before attempting to release them. You can do both with local functions.

This is a common theme occurring in his (McConnell's) other pro-goto arguments: e.g. "gotos and Sharing Code in an else Clause", so it is not about resource management per se. Goto just doesn't look like the right tool for any of it.
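To make the alternative concrete, here is a rough sketch (mine, under the assumption of a C++11 compiler) of the local clean-up function idea: a lambda that captures the resource handles by reference and inspects each handle's status, so it is safe to call from any exit point, however much has been acquired so far.

  #include <cstdio>
  #include <cstdlib>

  int do_work(const char *path) {
      FILE *f = nullptr;
      char *buf = nullptr;

      // One local clean-up routine, with the resource names in its scope.
      auto cleanup = [&]() {
          if (buf) std::free(buf);
          if (f) std::fclose(f);
      };

      f = std::fopen(path, "rb");
      if (!f) { cleanup(); return -1; }

      buf = static_cast<char *>(std::malloc(4096));
      if (!buf) { cleanup(); return -1; }

      std::size_t n = std::fread(buf, 1, 4096, f);
      std::printf("read %zu bytes\n", n);

      cleanup();
      return 0;
  }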

Goto considered poor man's tail call

If you need to change the exact behaviour then doing it explicitly via a call is likely rather neater than twiddling variables first.

When we emitted that code in

When we emitted that code in BitC, we used for(;;) and continue. The case where we needed/wanted goto turned out to be LETREC and *mutual* recursion.
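A small illustration (my own, not BitC output) of why mutual recursion is the awkward case: for(;;)/continue handles self tail calls, but flattening two mutually tail-recursive "functions" into one body effectively requires labels and goto.

  #include <cstdio>

  bool is_even(unsigned n) {
      unsigned k = n;
  even_state:                  // plays the role of even(k)
      if (k == 0) return true;
      k = k - 1;
      goto odd_state;          // tail call even -> odd becomes a jump
  odd_state:                   // plays the role of odd(k)
      if (k == 0) return false;
      k = k - 1;
      goto even_state;         // tail call odd -> even becomes a jump
  }

  int main() {
      std::printf("%d %d\n", is_even(10), is_even(7));  // prints "1 0"
  }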

at the only goto target

at the only goto target inspect resources for their allocation status before attempting to release them.

That might work IF you assume all 'resource' handles are neatly initialized to zero (or some other predetermined value) if one hasn't yet reached that location in the code.

Anyhow, there are plenty of ways around using 'goto'. You favor local functions. Of course, it seems that the most common of the languages with 'goto' (C, C++) don't have local functions...

Goto not dropped

The consensus, such as it was, converged on restricting goto to outward-only control transfer, generally performed in a syntactically structured fashion, and often in a type-checked way. Try/catch, RETURN-WITH, and loop "break" constructs are all examples of this, and generally considered to be sensible and harmless.

Use of first-class labels in GCC is quite rare, appearing primarily in dynamic code generators.

Look at history

`typeless' variables as in B.
self-modifying code.
Weird lexical schemes like APL.
`natural' English like FLOW-MATIC and COBOL.

Self-modifying code and eval

eval and monkey-patching are prevalent in dynamic languages. Could that be seen as self-modifying code?

What's with Java?

I note with some amusement that several of the suggestions made so far have met with "well, Java doesn't do that, so it isn't settled". Is Java now the odd duck of GP programming languages?

Yes, or...

Maybe just the most conservative of the lot. Or maybe just so widely known that it's easy to find vocal communities that will argue against anything.

I think its a notable development

that Java is increasingly being cast as a "legacy" language.

Whitespace as a token separator

is kind of a resolved issue--nobody strips it out like Fortran did.

And another syntactic prediction--the rise of internationalization (whether Unicode or otherwise), and the difficulties that come with case-folding in natural languages other than English, will ultimately doom case-insensitive languages. Though I do expect some languages to continue to reject identifiers that differ only by case.

Whitespace

I think "whitespace as a token separator" might be resolved, but "syntactically significant whitespace" is still open (Python keeps the flame alive).

There once was a language in which type styles were significant.

Haskell is another notable

Haskell is another notable language in which whitespace is syntactically significant - though there's a whitespace-irrelevant form that code can be reduced to.

That one's far from dead IMO and may end up being won in favour of whitespace-significance in the long run.

Now that you mention it...

...F# offers a whitespace-significant option (#light) which "relaxes" some of OCaml's block syntax in favour of indent levels -- and pretty much everyone seems to prefer it. Looks like this one is actually spreading.

Much easier reading

There's no doubt that F#'s white-space block syntax is easier reading than OCaml style.

Boo has it, and Nemerle has an option to go from curly brace to block syntax. I wish C# would offer the option.

Cobra

Cobra, too.

and we can add YAML to that list

Though it's not a general purpose programming language but a data serialization language, YAML is relatively new and embraces syntactic whitespace.

Styles, colors, ...

Not only style, but also color (the inventor suggested using style instead for colorblind programmers).

Not resolved but will be soon

  • The evil of the use of locks and semaphores to protect shared data will be widely accepted.
    In five years these constructs will be used only in system level contexts. We will look at them the same way we look at manual memory management today.
  • All (relevant) languages will ship with a testing framework as part of their standard library. Use and maintenance will still not happen often enough.
  • Most languages will target virtual machine code like the JVM and BEAM rather than machine code.

Baader-Meinhof

Baader-Meinhof strikes again: I just ran across this posting today, saying something similar about burying locks and semaphores under higher-level constructs.

You are over optimistic, I'm

You are over optimistic, I'm afraid. It has been known since the days of Per Brinch Hansen and his work on Concurrent Pascal that locks and semaphores are evil. That was about 25 years ago. And still, many important languages designed after that period use simple locking primitives.
If I remember correctly, Mr. Hansen wrote an article about how disgusted he was with computer science because of this fact.

Clojure

Is a nice start in a promising direction. For my own work I'm pursuing a transactional actor model that should achieve transactions across distributed communicating actors (motivating examples on page).

But, on the other side, many languages are still being designed, even those for systems programming, that don't even acknowledge concurrency and don't provide a process model. BitC, for example. It is almost impossible to effectively integrate transactions or higher-level protection primitives into languages that never acknowledged concurrency in the first place. One is stuck with semaphores and mutexes and slightly-higher-level concepts.

I think the prediction is aimed in the right direction, but I agree that the timeline offered is optimistic. Five years on, depending on language prevalence, we might start seeing widespread adoption and education in language transactional techniques among programmers. After that, it will be many more years before the debate is considered 'resolved'.

I, for one, am waiting for

I, for one, am waiting for the dust to settle on the concurrency model debate before I settle on one for my language. I have enough work to do just to get my single-threaded implementation off the ground :-)

You'll be waiting...

... for at least twenty more years, I suspect.

There are quite a few concurrency models out there. And I suspect that almost every single one of them is better than the concurrency model you get by default when you don't choose a concurrency model.

Choosing to not make a choice is making a choice.

Me too, though I do think

Me too, though I do think that transactional memory looks promising in this regard, and I'm fairly convinced that linear types are useful.

Exceptions

Taken across the various mainstream programming languages, exceptions and exception handling mechanisms are converging, are they not?

The current way of handling

The current way of handling exceptions is probably not the best solution and something will come along to replace it. Anyways, the premise of this topic, that there are resolved debates, is probably wrong. Anything and everything is debatable; there is no right/wrong answer, just a bunch of tradeoffs to make and ideas to reuse so you can ship your language and make it understandable to others (e.g., they understand exception handling in Java, so just do something like that).

Resolved negatively

Structural code editors (i.e. editors that would not let you make code that doesn't parse) failed. It turns out that today's style of IDE, which lets you write whatever you want and simply highlights errors with good error recovery, has won.

Similarly, graphical programming (stitching together boxes and arrows) has been shown to be far inferior to good ol' textual programming, relegated to UML and design tools that generate boilerplate code.

I disagree on graphical

I disagree on graphical programming. Excel dominates in the business community, and Max/MSP dominates in the electronic music community.

Text *is* great for the wide world of compilers, so keep on trucking :)

Excel is textual

I don't know about Max/MSP, but Excel formulas are fundamentally textual. While there are nice shortcuts in the GUI for entering cell references and updating relative cell references as contents move, users need to be able to edit the formulas directly for anything more than the simplest functions.

Excel is mixed, but most

Excel is mixed, but most would stop using it if the graphical interaction modes were dropped.

Max/MSP (http://www.cycling74.com/) has a bigger emphasis on visual composition. The only textual part you write is a leaf, or a native code extension. The community is huge and in love with the 'patcher' motif (e.g., applying it elsewhere, like to the web: http://www.lilyapp.org/).

Others in control and safety-critical systems

Not to mention LabVIEW, Simulink, and SCADE. Berkeley's Ptolemy II also supports code generation.

All of which I consider DSLs.

All of which I consider DSLs.

Fair point

That's a fair point. I'd somehow managed to forget that the thread is nominally about GP language design :-)

The term DSL gets stretched

The term DSL gets stretched pretty far. What about systems languages (C), enterprise languages (J2EE), compiler languages (LISP), etc. -- are those DSLs?

If the deciding characteristic is social (number of programmers finding it effective for their tasks), I'd be tempted to call Excel a major one. If the distinction is diversity of commonly performed tasks, perhaps -- though individuals do the wackiest things with Excel.

None of those are DSLs

Well, of those, I would consider C and Lisp to be general purpose languages, and J2EE is just a massive library, not a language. The first two are both de jure and de facto GPLs because they were both intended for general purpose use and have been used so.

Historically, neither was

Historically, neither was intended for general purposes.

C was for Unix, and Lisps for AI.

Edit: In more modern use, C is for other low-level systems requiring reasoning about hardware abstractions, and as AI has largely moved on from the Lisp domain to ones more suited to statistics or performance (Matlab/R, C, or Python), current Lisp seems more common for language manipulation, with some notable exceptions. These domains are larger than, say, fancy calculators or webpage templates, but they are still very much concrete domains driving how the languages are being evolved and what typical users worry about.

But text vs binary is resolved

As in, since the invention of the assembler, everyone writes their programs as text instead of writing the machine level directly by toggling switches or something similar.

That said, I do hope something more structured than text will make its way back to programming again.

Implicit "self"

To self or not to self...

Implicit Receivers

"I" am confused...

Well, I doubt you can call "implicit self" a resolved issue. In the Actor Model there is good reason that even actors might not know their own 'self'... unless told.

Of course, you can't even call Object Oriented a resolved issue.

oh, and another one

Everybody seems to have agreed that Perl == ugly. I just about never see anyone try to counter that anymore.

Avoid Language-specific Resolutions

I've never even seen Perl fans reject that one. I suspect beauty is not among the top three measuring sticks that fans of Perl apply to languages.

Regardless, it's a bit off-topic to mention specific languages as 'resolved debates in general purpose language design'.

Of course they don't argue.

Of course they don't argue. Job security is a wonderful thing... :-)

Static syntax

I agree with David Barbour's comment that syntax isn't a settled matter. Beyond that, I think this is a place that there should be space for different approaches.

But one very valuable property of syntax does seem to be accepted wisdom: you should be able to lex/parse the syntax without running the program. This was not always the case: there are some legacy languages out there where the dynamics of the reader could be manipulated during program execution. TeX, a cornucopia of both good rendering algorithms and bad PL design decisions, is probably the most important such language.

Postscript: Following the mention of languages such as Katahdin & Converge in the Macro Systems thread, I see that the above isn't and shouldn't be accepted wisdom, since there is interest in embedding DSLs with user defined grammars & the whole enterprise isn't incoherent. I'll refine my claim to say there should be a well-behaved separation between syntax and evaluation of code, such as is violated on a grand scale by TeX, and also, in a less grand sense, by some macro expander languages such as m4.