Aha! Programming Language

I recently started my own programming language project:

Please check it out and let me know what you think.

Comment viewing options

Constructors and Destructors?

Destructors/finalizers were the feature that finally convinced me that language-supported OO systems were at least somewhat worthwhile. Before a project where I had a really solid NEED for destructors came up, I'd been doing OO by hand when I decided OO was right for a particular project, rolling my own vtables and dispatch.

Destructors, anyway, are certainly worth the trouble in languages without garbage collection and in garbage-collected languages with weak pointers. Constructors, on the other hand, have never yet been more than a convenience for me.

Overloaded operators / predefined generic functions are also a pretty nice win that get me things I couldn't get by rolling my own, short of a source-to-source transformation.

Ray

What is the question?

Excuse me, what was the question?

I don't believe GC is actually part of a language, it's just a memory management strategy, right?

Aha! has neither constructors nor destructors. It has no classes either. To create an object, one writes an 'obj' construct; if necessary, 'object factories' can be created - functions that return objects.

I don't really understand what is the NEED for destructors in a language with automatic memory management. Could you explain?

There's no real need, if you do automatic memory management.

if your memory management is automatic, you don't *really* need destructors. They're darn handy if you have aspects of memory management that aren't automatic, like a user-managed heap or 'weak' pointers that can point into deallocated memory if not handled. They can be useful otherwise but aren't really critical.

I wasn't really responding to a question; I was just commenting on a particular feature of an OO system (in a language without GC, as it happened) that was finally compelling enough to get me to use the provided system instead of just treating OO as a technique to use regardless of and without resort to language support.

Ray

No pointers

Fortunately, Aha! frees the developer from using pointers and manual memory allocation. I thought this part was obvious. I don't like the fact that such languages are automatically referred to as 'garbage-collected'; it's like giving medals to Olympians depending on how they trained - e.g. in the morning or in the afternoon, and not for their actual achievements.

I was just commenting on a particular feature of an OO system (in a language without GC, as it happened) that was finally compelling enough to get me to use the provided system instead of just treating OO as a technique to use regardless of and without resort to language support.

OK, so are you prepared to learn Aha! without waiting till it introduces destructors? :)

I don't really understand

I don't really understand what is the NEED for destructors in a language with automatic memory management. Could you explain?

Most languages with automatic memory management end up adding destructors, or finalizers, when they mature. The reason for that is that structures may hold references to resources (like a file handle, or a database connection) which needs to be cleared when the structure is garbage collected. Since there are lots of scenarios where you simply statically don't really know when these structures need to let go off resources, one cumbersome way out is to add finalizers.

Not the best approach

Most languages with automatic memory management end up adding destructors, or finalizers, when they mature.

Can you give some examples of such languages?

I don't believe using destructors for this purpose is the best possible solution. This approach makes it impossible to fully control the timing of the resource deallocation; since the finalizer is called when the object is destroyed/garbage-collected and not when the resource is not needed anymore, I would imagine that such systems tend to consume more resources than needed.

I think Haskell is one

I think Haskell is one example. But it's more something I know from reading a number of papers, or emails, after which I forgot the specific languages. I have a lousy memory for specifics. Does it matter?

A number of languages will have finalizers by default, given the experience of other language designers. I think Java is an example of that.

Oh, btw, I agree that finalizers aren't the ideal solution. But there are no other real solutions. Either you keep track of resource usage explicitly, or you finalize on garbage collection. I don't think there is a real alternative except for those.

keep track of resource usage

keep track of resource usage explicitly, or you finalize on garbage collection. I don't think there is a real alternative except for those

There's an additional option of developing programming abstractions that make resource management or life-cycle management relatively easy and implicit.

Timing the deallocations.

Java, like several other languages that use finalizers to control such things, does most of its memory management by reference-counting in most implementations.

Reference-counting isn't the most efficient of all garbage-collection algorithms, and is inapplicable to cyclic structures, but it has the advantage of being prompt. When you drop your last reference to something, it gets finalized and released immediately.

If you want to fully control the timing of resource deallocation, you only have to keep that resource out of cyclic structures. If you don't care so much, you can have it in a cyclic structure, and every so often there'll be an unpredictable delay before the cyclic collector finds and finalizes it.

Ray

Extensions

In Aha!, we promote a different approach. Allocation/deallocation of resources can be implemented using so-called 'application extensions'; such an extension is an object that controls a specific resource depending on the state of the application. For example, the extension that is associated with a specific file on the file system has a 'property' that accepts a 'callback' function that tells it when the file needs to be open (e.g. when specific data are available). There is an example of it on my Web site (SortFile); unfortunately, the API has not been published there yet.

Right...

You can do this, but one could argue your solution is worse than adding finalizers to your language.

Let's take a typical scenario, a web server which can have any number of open network sockets all of which use a common resource, a database connection. This is in some sense a typical reference counting problem: the database connection needs to be open while any socket object references it. (Usually database connections are pooled; but one shared resource is mostly equivalent from a theoretical perspective.)

In your scenario, the database connection would need to poll _all_ network connections to see whether any of them are alive, which is a bit bogus, as long as a network connection exists, the database connection is needed. Moreover, by backreferencing the network connections from a 'life' object you're keeping the network objects alive whether a connection is closed or not; this is completely opposite to what you'ld want. (You can solve this a bit by tracking all life connections in a pool, which is explicitly managing memory/resource usage. Still a considerable overhead.)

You could also just keep a reference count in the callback. But then still, polling occasionally is a lot more work than finalizing on garbage collection, since, when finalizing, the garbage collector does the reference counting for you. I.e., where you're polling continuously in probably every scenario you can come up with, there is little to no overhead with a finalizer and garbage collector scenario.

Moreover, the problem with finalizers is that you would like the database connection to die immediately when it is no longer needed/referenced. Your proposal would have the same 'bad' behavior as finalizers, i.e., the database connection would die when the callback is called, which would do the reference counting, pretty much similar to garbage collection, where garbage collection could, possibly, detect it earlier, pending on how often you poll.

(This whole exercise could have been written as: It is probably both cumbersome and expensive to continuously poll objects with an "Am I still needed?" message instead of just relying on them to die at one point or another.)

I've seen proposals, and I

I've seen proposals, and I think implementations, for hybrid collectors where reference counting is used for short-lived objects only, since many programs usually generate lots of short-lived objects (in the nursery). I doubt most implementations use reference counting for long-lived objects.

Actually, it's the opposite.

Actually, it's the opposite. Reference-counting for long-lived objects and copying collection for short-lived objects. This is called age-oriented collection.

Great, I can find your the

Great, I can find you the reverse scheme for Java. I'll Google a bit later.

(The way I look at it by now, they've tried every combination possible. And every paper will usually tell you that they outperform others on specific benchmark...)

That paper actually

That paper actually implements the age oriented GC for the JVM. You want to ref count the old generation because long-lived objects rarely die, which means a tracing GC would be touching these old objects over and over again for little reason.

Also keep in mind that this is not a naive ref counting strategy where you update the count on every pointer update.

Ah man. Like I said, the

Ah man. Like I said, the world is a big place. Every combination has been tried. I don't remember the specific paper, but I do remember the rationale behind it. Basically, since all reference counting of short lived object were mostly between one and zero, i.e., one reference from the stack, they decided that reference counting is simpler. Or rather, since most short-lived objects don't form cycles reference counting the nursery made sense to them. And since you need to reclaim cyclic structure, they implemented general graph based collection on the older generations.

Ah well. Can't find it. I

Ah well. Can't find it. I remember being baffled by a US study with reference counting on the heap, and a German master thesis project (I think) doing the exact opposite.

There is a study, yet another one, which compares both schemes from roughly the same author as your age-directed collection. Read it here.

The thing I got from reading garbage collection papers is that most optimizations will pay off some way or the other, and that schemes which take into account the memory scheme of the processor are probably among the fastest. No idea whether that conclusion is warranted though.

Nursery

Copy collection is superior for short-lived objects because it only touches the living ones. Further, for a "bump pointer" nursery, we need to ensure the memory is not fragmented. So there is a lot of pressure for copy collection in earlier generations. Haskell uses this approach.

I think the counterargument

I think the counterargument against that is that most shortlived objects are only referenced once. Since they are only referenced once, and hardly refer others, you don't get the typical 'stuttering' behavior of reference counting where a lot of objects die in one turn. Or put differently, memory in the nursery can be reclaimed immediately for shortlived objects when the reference is zero, alleviating the need to sweep the nursery. (Though the latter may be cheap.)

I don't think you can say things about this with any certainty.

(The problem is also that you need to study the intricacies of the actual implementation. The adagium by a few compiler writers I know was "Nothing beats free lists," though this is twenty years ago. Which begs the question how you keep track of memory. I am thinking concretely of the nursery: as one contiguous block, one list of free and allocated objects, or a series of free lists memorizing different sizes (to be) allocated. Just the remark "It is reference counting" isn't enough, the actual implementation matters a lot. One would need to implement a number of different schemes to get at some conclusion, and that would probably only hold pending specific benchmarks on specific processors.)

(Which may be a problem with the paper I just referred and, truthfully, only glanced at. The way I read the paper it seems to suggest to _reclaim the younger generation in a sweeping manner_ even when reference counting. If true, this totally bypasses a major advantage of reference counting which is the advantage of reclaiming an object _immediately_ when the reference count is zero. Btw, I don't like reference counting myself; just saying it isn't that easy.)

Update

I have updated the documentation on the web site: the Language Reference (second draft), the Getting Started guide, the Tutorial and examples. There are only minor changes to the language syntax. The Getting Started was completely reworked, thanks to the constructive criticism by Sean McDirmid.

Feedback

1) Does bound counting really help write correct programs? Consider programs that manipulate trees. It's easy to set up a lazy sequence that enumerates all of the nodes of the tree in some order. But if you want to traverse that sequence, you must worry about maintaining proper node counts (or at least bounds), so that you can assure the language that the traversal will be terminating. But the language does nothing to help you verify that your counts are correct. And if your bound estimations are wrong in some edge case, the traversal will fail in an unexpected way, probably leading to program failure.

So what has this discipline gotten us? It's forced us to attend to details we could have safely ignored and in the process exposed us to a new class of bugs. Will this forced attention to counting lead to programmers gaining a better understanding of their programs? Perhaps. Or will it will lead to bound estimates of (numNodes + 1000000) while the programmer ports his code to another language?

2) It looks like you intend some kind of analogy between objects and sequences when you use the syntax seq^ for head value and obj^ for object state. But that analogy doesn't work. A sequence is not completely defined by its head value, whereas an object's future is completely defined by it state.

3) The syntax of this language is annoyingly verbose in places. Why isn't 'all' the default when you open a new 'let' or 'where' block. For example:

let
x = 1
y = 2
end

Is there some reason for that? If the answer is just 'symmetry' or something, that's lame.

4) Your definition of the language semantics is incomplete - you don't discuss the operational semantics of this language at all. Presumably the following program:

let
((x-1)*(x-1) + (y-2)*(y-2) = 0)?
end

cannot reasonably be expected to bind x = 1 and y = 2. But it's not clear to me from your documentation whether you intend, e.g.:

let
xs = [1 2 3]
end
let
xs = [x y z] -- Defines x,y,z
end

or more general prolog style unification. It's my impression that you don't intend this kind of thing. If you do, there should be an explanation of how this works. If not, you should more clearly lay out the rules for what constitutes the definition of a variable.

5) I would replace 'unless' with a 'case' that takes a general number of statements:

case
x = case1
x = case2
x = case3
end

This works like 'any', but requires sequential evaluation.

6) I still think booleans should be a proper type. Comparisons should be of type boolean and always succeed. (exp)? should convert a boolean into success/failure.

Thanks

Thank you very much for looking at the language.

Does bound counting really help write correct programs?

It's a good question. I believe it does. Making the developer estimate the potential number of iterations, rather than believing that 'it will terminate eventually', should give him/her better understanding of the task in hand. I don't understand what you called a new class of bugs. Program failure is not a bug. A bug is rather 'failure to fail' when wrong results are about to be given (or when the program is in an infinite loop). See my updated Getting Started guide for my point of view on this.

A sequence is not completely defined by its head value, whereas an object's future is completely defined by it state.

In Aha!, a sequence is a particular case of an object. It's not defined by its head value but it's defined by its head value and the iteration rule (i.e. the skip action).

Why isn't 'all' the default when you open a new 'let' or 'where' block

That's what I thought initially. Then 1) I decided that having 'defaults' provokes bad decisions and 2) it turned out that in most examples I wrote 'any' is used more often than 'all'. Remember that 'all' works 'simultaneously', i.e. the statements in it can't use each other's output; you need 'let' or 'get' to 'execute in a sequence'.

Your definition of the language semantics is incomplete - you don't discuss the operational semantics of this language at all.

This is intentional: I'm against using operational semantics to define a programming language. The semantics should be described in terms of logic, not how it's supposed to be implemented.
Sorry, I didn't understand what you tried to show me with those examples.

I would replace 'unless' with a 'case' that takes a general number of statements

I disagree. This would make 'case' the preferred choice for some users, and this is not what I want. 'unless' is logically a more complex statement than 'any' and I prefer to use it only when it's really needed.

Comparisons should be of type boolean and always succeed.

There are many reasons I'm against Booleans in my language. I'm still not convinced that I need them - unlike the other primitive types.

Even though I didn't agree with you on many points, your feedback is valuable to me. It just shows me some of the things that potential users may struggle with.

I don't understand what you

I don't understand what you called a new class of bugs.

You write a correct program that produces a finite sequence. In another language, you could ask for the final value, but in Aha! you must estimate the size of the sequence. If you mis-estimate, that's a new bug.

The semantics should be described in terms of logic, not how it's supposed to be implemented.

If you don't have operational semantics, then you have a specification language and not a programming language. To actually get work done, programmers will need to learn the variant of Aha! that actually has an operational semantics - the one defined by your compiler.

Operational semantics? Too early.

If you mis-estimate, that's a new bug.

Of course, if you misuse any feature, it's a potential bug. There's no bullet-proof features.

If you don't have operational semantics, then you have a specification language and not a programming language.

Right, and I call it a declarative language. Any implementation that satisfies the specification is good. The specification of the language itself must contain just enough information for the implementer to decide how to do it, leaving enough flexibility for optimizations. In other words, it should say 'what' and not 'how'.

To actually get work done, programmers will need to learn the variant of Aha! that actually has an operational semantics - the one defined by your compiler.

When the language is implemented, the implementer should present the documentation describing how to use best this particular implementation, obviously. Should such documentation be part of the language specification? I don't think so. At this stage, I can only give some ideas or recommendations to the implementers.

BTW

[...]operational semantics [...] forces people to think about programs in terms of computational behaviours, based on an underlying computational model. This is bad, because operational reasoning is a tremendous waste of mental effort.

E.W.Dijkstra, On the cruelty of really teaching computing science

Definition

4) Your definition of the language semantics is incomplete - you don't discuss the operational semantics of this language at all. Presumably the following program:
let
((x-1)*(x-1) + (y-2)*(y-2) = 0)?
end
cannot reasonably be expected to bind x = 1 and y = 2. But it's not clear to me from your documentation whether you intend, e.g.:
let
xs = [1 2 3]
end
let
xs = [x y z] -- Defines x,y,z
end
or more general prolog style unification. It's my impression that you don't intend this kind of thing. If you do, there should be an explanation of how this works. If not, you should more clearly lay out the rules for what constitutes the definition of a variable.

You don't need operational semantics to understand that. Language Reference clearly states that the syntax for a definition is
 variable = expression 
so (expression = expression)? is not an allowed form of definition (it's rather an assertion). Definition simply binds a variable to the value of an expression, which must be computable in the context.
PS. The let statement must have at least one where clause.

No interest?

I'm trying again to revive this topic. It's a little strange that after I've formulated the design goals and main principles of Aha! in a new introductory document, the discussion completely stopped. Nothing to discuss? I don't believe so.

Its like this: you announce

Its like this: you announce your project and there is a lot of discussion, but then they point out some problems and you make some changes that need to be made. When you come back, they've already moved on to something else, and because your project is no longer new, they don't provide as much (if any) feedback anymore. You've used up your "first impression" credit and the "second impression" credit is not as big as the first.

When I write a paper, I either (a) have a lot of potential reviewers or (b) spend time making the first draft fairly good so I don't burn my single (or select few) reviewers out. I can give reviewer A an early draft and reviewer B a later draft, but I don't go back to reviewer A since he probably won't enjoy reading the draft twice.

That being said, I think your intro document is very much improved! It looks something like a "functional constraint programming" or something like that. But your story still seems kind of preachy and technical, I'm wondering what your layman story would look like? If someone is not a PL enthusiast, how would you sell your language to them? Also, as an academic, I really suffer from a lack of a survey on related work that would describe your work in the context of what's already out there.

Second impression

Thank you Sean for your comment. I'm a little surprised to hear that my story still seems kind of preachy and technical; I strived to make it easy to understand even by non-technical people.

It looks something like a "functional constraint programming" or something like that.

I would avoid classifying the language in any way - I'm afraid, trying to compare it to something you already know may be what prevents you from really understanding it.

You are espousing the

You are espousing the benefits of declarative programming as being a benefit in itself; imagine you had to describe your work without being able to use the word "declarative programming," without being able to rely on existing knowledge that declarative programming was good, how would that change your narrative?

For myself, I'm not a layman, but to reach PL experts, if this is what you want to do, you really need to focus on putting your work into the context of what already exists, since we are bombarded by many new languages in any given week its difficult to be impressed without being told exactly what we should be impressed about. Now, you don't have to really target us, we are not a big group of people whose opinions really count (and we probably won't use your language), it might be better just to go after the potential users of your language and gain acceptance that way.

'Declarative is good'

imagine you had to describe your work without being able to use the word "declarative programming," without being able to rely on existing knowledge that declarative programming was good, how would that change your narrative?

I don't think it would change much. Maybe I would remove a sentence or two, that's all. I do not rely on any prior knowledge that 'declarative is good'. Where do I?

it might be better just to go after the potential users of your language and gain acceptance that way

Maybe, but only after the language is implemented. So far I'm only trying to collect some feedback from people who understand programming.

I had trouble seeing this topic had continued on a second page, and only figured it out through persistence; other folks might try less hard. I recommend starting a new topic to discuss just the new introductory document.

A bold phrase near the top of the new GettingStarted document reads: "stop thinking of programs in terms of their behaviour." So I had the following reaction, which I hope is constructive.

That might be hard on a newcomer who wonders: What will I see happen when I use this language? You might want a section about what an Aha! coder perceives happen when using the language, but maybe that's in a tutorial. Presumably there's a code editing step, then maybe other steps before execution. A getting-started document might include a concrete example of someone doing those steps, so there's context a person imagines when thinking about using Aha!, wherein they get benefits of using this language rather than another PL.

So far, I'm imagining an interactive REPL: taking what I type, calculating the result of my code, then printing them. In absence of side effects, I have trouble picturing what else might be allowed to happen. So basically I suggest you tell folks what to imagine. It would not be bad to be extremely literal here, like advertising to "picture yourself doing XYZ", with a classic example of some travel ad saying, "Come to Jamaica and fall in love." While it might seem cheesy, it's probably necessary for some folks.

Or if you don't like an advertising metaphor, I have another that's both worse and better: a magician telling an audience what to see. This may be confusing, but is likely closer to the heart the issue, which is that people seldom know exactly how to take something without guidance or gentle hints. It's not evil unless your message includes, "You are getting very sleepy, and feel an urgent need to write a check now, made out to yours truly." (Hyperbole edges over into humor so easily; sorry about that.)

Good points

You make some valid points here.