Dodo

Hi all,

I am starting to work on my own pet language and I joined in to make it public here, as I very much like following the informative discussions hosted at Lambda the Ultimate.

I hope to contribute more in the future, though I have to confess that proofs and lambdas are not my forte; I have not really looked into them since my student days. So what I can do at the moment is answer the questions you may have about dodo as it gets better defined and evolves over time.

This is the URL of the dodo home page on Sourceforge.

Parser

Can I ask my own questions?

I would like to know if there is an easy way to make a parser for my language so that I don't get nasty surprises. I spotted a couple of ambiguities in the syntax already; hopefully those are fixed now. But automating the process would be nice.

Parser generators everywhere

Try googling for "parser generator" - there are zillions of them. Which one you choose can depend a lot on what language you want to use to host the parser. One recent LtU thread which contains relevant links is Parsers that allow syntax extensions. The classic parser generator is YACC, and the GNU version of that is Bison. Two well known Java ones are Antlr and JavaCC. There's also a Toy Parser Generator for Python. This list is ludicrously incomplete.

Thank you for the

Thank you for the links!
Looking into it.

Here is another good one:

the Gold Parser.

I've been using it a lot and it has some nice graphical tools, plus engines for lots of programming languages.

Unfortunately I won't be

Unfortunately I won't be able to use Gold as it is available only on Windows. Never mind, I can look at the other options here.

The Spirit parser framework

One of the parser frameworks that may be worth a look is the Spirit parser framework -

The Spirit home page

Spirit is a C++-based framework which is (imo) very easy to use.
I say that even though I'm really not a C++ coder at all - I'm just a copy-and-paster in C++. But that's all you need to be in order to use Spirit. It's *really* fun to use.

You can write your grammar (in what is *very close* to BNF syntax) directly in C++. Have a look at a few examples in the apps repository, and that should give you a good feel for things.

Ambiguous grammars

When you say "I spotted a couple of ambiguities in the syntax already", I wonder if you are looking for a tool that will take a context-free grammar and tell you if it is ambiguous.

Unfortunately, it is undecidable whether a context-free grammar is ambiguous, but it is often possible to prove that some particular grammar is unambiguous; some parser generators will do this for you as part of generating the parser, but others will not. For YACC and Bison, if you don't use any precedence declarations in your input grammar and the tool runs successfully without printing out any messages about "shift/reduce conflicts" or "reduce/reduce conflicts", then the input grammar was unambiguous; if they do print such messages, then the grammar may or may not be unambiguous.

It is a matter of opinion whether support for ambiguous grammars in a parser generator (with no compile-time ambiguity test) is a bug or a feature; presumably, the authors of the Spirit framework think it's a feature, or they wouldn't put "capable of parsing rather ambiguous grammars" in the first paragraph of their website. My own view is that a programming language should have a formal grammar definition, and that an unambiguous LALR(1) context-free grammar is the form of formal grammar definition which will give implementors the most flexibility in correctly implementing the grammar. (LALR(1) is the name for the set of grammars that YACC and Bison can prove to be unambiguous.)

Other possibilities for formal grammar definitions would include ambiguous grammars augmented with precedence information, intended to be fed into YACC or Bison (but be careful here; YACC precedence declarations don't always mean what you might expect), or Parsing Expression Grammars (which have disambiguation information built into the grammar, in a formally specified way; but again, be careful, the disambiguation isn't always what you might expect).

Well I take it for granted

Well I take it for granted that the syntax of the language should not be ambiguous. My reasoning is that if it is difficult for the computer to parse it, how will it be for the person who reads the code?

I think I will simply try writing a BNF grammar for the language. That should help weed difficulties out. Should I use left recursion for LALR(1)? For example:

  arguments ::= arguments "," argument
              | argument

It seems several of the tools (Spirit, Gold) aim at being BNF-compatible, so it is a good starting point.

Left/Right recursion in parser

I'd use left or right recursion as appropriate for the subproblem at hand. For example, in parsing
1:2:3:4:5:[]
I'd use right recursion.

YACC can handle left as well as right recursion, although there may be a fixed limit with right recursion due to parse stack size.

dangling else

A common source of ambiguity is having both IF predicate THEN consequent and IF predicate THEN consequent ELSE alternative. There are two distinct concepts.

IF..THEN.. makes some code optional. When the predicate evaluates to false the code is skipped. An IF..THEN.. will always be a statement, even if the consequent is an expression; what would its value be when the predicate is false?

IF..THEN..ELSE.. chooses between alternatives. If the consequent and alternative are both expressions, then the IF..THEN..ELSE.. could also be an expression.

If you use distinct initial keywords, perhaps OPT and ALT or WHEN and IF, you will not need a precedence rule to disambiguate elses because they will no longer dangle. For example

IF p1 THEN
  IF p2 THEN c
    ELSE a

is either

OPT p1 THEN
  ALT p2 THEN c ELSE a

or it is

ALT p1 THEN OPT p2 THEN c
  ELSE a

There is then neither a conflict in the grammar, nor a paragraph in the language manual describing dangling else, nor a defensive coding convention to reduce errors.

Re: dangling else

Thank you for your suggestion. I will go through the syntax to check again, but I don't think dodo suffers from that problem.

In dodo "if" is used as a ternary operator

if (x > 0) sqrt(x) else exp(x/2)

The conditional branching uses "test", which is similar to "switch" in C

test
{
   x > 0   {return sqrt(x)}
   default {return exp(x/2)}
}
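
For comparison, here is roughly the same thing in Haskell (just a sketch, not dodo syntax): if..then..else is already an expression there, and guards play the role of the "test" construct.

  f :: Double -> Double
  f x = if x > 0 then sqrt x else exp (x / 2)

  g :: Double -> Double
  g x | x > 0     = sqrt x       -- first matching guard wins
      | otherwise = exp (x / 2)  -- plays the role of "default"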

There may be other constructs that are not safe, but I have none off the top of my head.

Found a parser

I am using Aurochs as parser at the moment, you can find a preview of the grammar on the dodo web site.

The reason that parser works for me is that you can see immediately the structure of your AST (it outputs an XML tree). The reason I do not recommend it to anybody is that, since the output is an XML tree, the only error you will see when you do something wrong is "Parse Error (0)". Nice heh?

Anyway, check out the current grammar; I was able to parse one of the examples with it.

Mixins

What I described as "qualifiers" corresponds to my understanding of the word "mixin". I realise I have not defined them in great detail so far, but can you tell whether they are the same yet?

Dodo qualifiers

This is my description of dodo qualifiers, is that different from mixins?
------

A qualifier can be specified to add properties to the class that are the same for all classes that respond to that qualifier. For example the qualifier Motorised could add the attributes engine, wheels and speed to the class.

A qualifier is defined like a class with an attribute parent that is an instance of the class it is applied to. For example, if I want to define an Inhabited qualifier for the house, I could write:

qualifier Inhabited
{
    link People[] inhabitants
    Drawing draw(Canvas support):
        Drawing result parent.draw(support)
        loop foreach(inhabitant in inhabitants)
        {
            .result = inhabitant.draw(result) #Drawing can act as Canvas
        }
        return result
    .
}

This qualifier replaces the draw function in House with its own that draws the inhabitants as well as the house. That supposes Inhabited is applied to a class that has a draw function, of course.

I hate loops

I just realised that in the example I pasted earlier I am using a loop for something that should not really necessitate one. It goes like this:

        loop foreach(inhabitant in inhabitants)
        {
            .result = inhabitant.draw(result)
        }

Which means if there are three inhabitants:

   .result = inhabitants[3].draw(inhabitants[2].draw(inhabitants[1].draw(result)))

Which is conceptually the same as:

   .result = draw(inhabitants[3], draw(inhabitants[2], draw(inhabitants[1], result)))

How do other languages denote that? I can imagine a function that takes 3 arguments: the list, the function and the initial value. But I don't know what to call it so that it makes sense.

From an FP perspective...

...the big three are Map, Filter, Fold. I gather the one you are looking for is fold (which can either be left-to-right or right-to-left). In Standard ML, fold left could be defined recursively as:

   fun foldl f u nil     = u
     | foldl f u (x::xs) = foldl f (f(u,x)) xs

A simple use is to sum the contents of a list:

   val mysum = foldl (fn (x,y) => x+y) 0 [1,2,3,4]

Or to put it in your terms:

f - function to apply
u - initial value
nil or x::xs - the list

I guess I should also add that you can use currying to define functions in terms of fold. So for example, a sumlist function could be defined as:

   val sumlist = foldl (fn (x,y) => x+y) 0
   val mysum = sumlist [1,2,3,4]

Map, filter, fold

OK so map and filter seem to be like SELECT and WHERE in SQL:

 SELECT f(p.*)
 FROM Person p, Toothpaste t
 WHERE p.job = "Salesman"
   AND t.taste = "strawberry"
   AND p.toothpaste_id = t.id
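
For what it's worth, a tiny Haskell sketch of that correspondence (the names are made up): WHERE plays the role of filter and SELECT f(...) the role of map.

  -- people are (name, job) pairs in this toy example
  salesmanNames :: [(String, String)] -> [String]
  salesmanNames people = map fst (filter (\(_, job) -> job == "Salesman") people)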

We already talked about fold, except it is sometimes called reduce and the initial value can be part of the list. I will keep fold and a separate initial value.

I also read about unfold, which looks like a C for loop; I think I can ignore it.

What may be missing is some kind of join to combine two lists.

Concatenation's actually

Concatenation's actually just another fold if you've got a typical list structure: you use cons and then the second list as the initial value. You might want to look at the zipWith family though - map is effectively zipWith1.
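
In Haskell, for instance:

  -- concatenation is a fold with cons, seeded with the second list
  append :: [a] -> [a] -> [a]
  append xs ys = foldr (:) ys xs

  -- zipWith combines two lists element by element:
  --   zipWith (*) [1,2,3] [10,20,30]  ==  [10,40,90]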

Not concatenation

I am not looking at concatenation, I am looking at combining the two lists (like vector product):

  [ a, b, c ]  *  [ 1, 2 ]
=
 /               \
 | a*1, b*1, c*1 |
 |               |
 | a*2, b*2, c*2 |
 \               /

If I flatten the matrix it is effectively a concatenation of [ a, b, c ] * 1 and [ a, b, c ] * 2. So I guess your remark was relevant to my problem after all.

The combination operator would probably be pair(a, b). For example I could generate all the possible pairs (salesperson, strawberry toothpaste). Then I could filter again, like the join filter in the SQL statement above: p.toothpaste_id = t.id
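
A quick Haskell sketch of such a pairing operator, just to make the idea concrete (the name is made up):

  pairAll :: [a] -> [b] -> [(a, b)]
  pairAll xs ys = concatMap (\x -> map (\y -> (x, y)) ys) xs

  -- pairAll "abc" [1,2] == [('a',1),('a',2),('b',1),('b',2),('c',1),('c',2)]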

List comprehensions

In Haskell, you can easily write all this using list comprehensions. E.g. your example (assuming properly typed persons and toothpaste lists):

[(p,t) | p<-persons, t<-toothpastes, job p==Salesman, taste t==Strawberry]

However, this is just defined as syntactic sugar for concatenation (i.e. fold), filter, and map; see Section 3.11 of the Haskell 98 Report.
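
Roughly (not the Report's exact translation), the comprehension desugars into concatMap plus a guard; a self-contained sketch with made-up record types:

  data Job = Salesman | Plumber deriving Eq
  data Taste = Strawberry | Mint deriving Eq
  data Person = Person { job :: Job }
  data Toothpaste = Toothpaste { taste :: Taste }

  pairs :: [Person] -> [Toothpaste] -> [(Person, Toothpaste)]
  pairs persons toothpastes =
    concatMap
      (\p -> concatMap
               (\t -> if job p == Salesman && taste t == Strawberry
                        then [(p, t)]
                        else [])
               toothpastes)
      persons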

Map and Filter

Not sure if it helps, but here's an ML version to define the two:

   fun map f nil     = nil
     | map f (x::xs) = (f x)::(map f xs)

   fun filter f nil = nil
     | filter f (x::xs) =
         if (f x)
            then x::(filter f xs)
            else (filter f xs)

Recommended reading

As Chris said, you're looking for fold. See the paper Tutorial on the universality and expressiveness of fold. Note that this paper is also mentioned in the Getting Started thread, which is full of references worth reading if you're designing a language.

The fold version of your example would look something like this in Scheme:

(fold (lambda (result inhabitant) (draw inhabitant result))
      result
      inhabitants)

or in Haskell:

foldl (\result inhabitant -> draw inhabitant result) result inhabitants

In both cases, the first argument to fold is a function which is applied in turn to each element of 'inhabitants'. The second argument is the starting value for 'result'; I'm assuming result is defined in the enclosing scope.

I am not sure why you use

I am not sure why you use a lambda expression here; is it just so that you can swap the arguments of draw? Is there no "foldr" or such that works with the arguments in reverse order?

Anyway, if I do not take into account the arguments order what you wrote is like:

 (fold draw result inhabitants)

Which is close to my notation in dodo:

 fold inhabitants.draw(result)

Thank you for the links to read. I hope to ingest some of this before I have an interpreter out! :-)

Flip

There's a "foldr" in Haskell, but it doesn't do what you're asking. Foldr is a "right-associative fold" - it traverses the list from right to left, building up a value by applying the binary operation between each pair of elements.

Anton's example could've been written (in Haskell) like this:

foldl (flip draw) result inhabitants

Flip is a Haskell built-in that swaps the order of two arguments, eliminating the need for a lambda.
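
Its Prelude definition is just:

  flip :: (a -> b -> c) -> b -> a -> c
  flip f x y = f y x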

Continuations

How are continuations used, and why should I care about them?

Are they different from passing a message containing a callback function to a dynamic agent?
Sorry for asking again questions that were asked before; the answer is not yet clear to me.

Continuation passing style

In fact, I came up with a style for continuation-passing lambda functions that I quite like, and I don't really see the use for standard lambda functions any more. Are there drawbacks to keeping only CPS lambdas in dodo? I would make sure that the continuation invocation is a tail call, so I imagine it can be optimised easily.

If desired, normal named functions are still there to use.
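
To make the idea concrete, a minimal Haskell sketch of CPS (this is not dodo syntax, just an illustration): each function takes an extra continuation argument k and calls it in tail position instead of returning.

  addCPS :: Int -> Int -> (Int -> r) -> r
  addCPS x y k = k (x + y)

  squareCPS :: Int -> (Int -> r) -> r
  squareCPS x k = k (x * x)

  -- (1 + 2)^2, written by chaining continuations:
  example :: (Int -> r) -> r
  example k = addCPS 1 2 (\s -> squareCPS s k)

  main :: IO ()
  main = example print   -- prints 9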

Programming with continuations

According to Anton: "For the most part, first-class continuations should be seen as a tool-building feature, not something you'd regularly use directly in an ordinary program."

Does the same apply to CPS?
Would you recommend continuations to build a control flow such as a parser algorithm or web site navigation (applications that also make use of abstract state and possibly versioning)?

Answered in followup

As James McCartney pointed out in a followup to my comment, the same doesn't necessarily apply to CPS. As my reply to James indicated, I was really thinking more of call/cc when I wrote that.

Whether to recommend using continuations (or CPS) depends on the language, the application, the requirements, and the available alternatives. A number of web frameworks, for example, use call/cc under the hood so that the user's code doesn't have to be in CPS.

call/cc versus CPS

As James McCartney pointed out in a followup to my comment, the same doesn't necessarily apply to CPS. As my reply to James indicated, I was really thinking more of call/cc when I wrote that.

That seems to imply that the use of call/cc is not required to practice Continuation Passing Style. I was wondering about it, and I will need to make sure of it for my documentation work. Back to the textbook it seems...

That seems to imply that the

That seems to imply that the use of call/cc is not required to practice Continuation Passing Style


It isn't. In fact it is often used as a substitute for when you don't have call/cc and first-class continuations.

Algebraic data type

I am wondering if prototyping would give me algebraic data types for free. Consider the dodo code:

struct linkedlist
linkedlist emptylist
def linkedlist new struct:
   DodoValue head
   linkedlist tail emptylist
.
def emptylist new linkedlist:
   DodoValue head { throw UndefinedValue }  #explicit error
.
def foo new linkedlist(head "foo")
def bar new linkedlist(head "bar", tail foo)

Of course I need some type matching to do anything useful with it.
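
For comparison, the corresponding algebraic data type in Haskell would be something like (names are mine):

  data LinkedList a = EmptyList | Cons a (LinkedList a)

  foo, bar :: LinkedList String
  foo = Cons "foo" EmptyList
  bar = Cons "bar" foo

  -- pattern matching provides the "type matching":
  len :: LinkedList a -> Int
  len EmptyList  = 0
  len (Cons _ t) = 1 + len t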

Capabilities vs I/O

Does it make sense to put capabilities and I/O in the same bag? My reasoning is that if a function requires capabilities, then it is likely to perform I/O or to interface with the host system. The converse looks reasonable as well: all I/O requires capabilities.

The compiler can consider any function that requires capabilities as impure and optimise it differently from pure functions. I hope that would not incur any serious loss of performance.
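
A rough Haskell analogue of that split (a sketch, not dodo semantics): functions that perform I/O carry it in their type, and the compiler already treats them differently from pure functions.

  double :: Int -> Int        -- pure: no capabilities needed
  double x = 2 * x

  readAndDouble :: IO Int     -- impure: requires the I/O "capability"
  readAndDouble = do
    line <- getLine
    return (2 * read line)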

Associative functions and variable number of arguments

I would like to use a special signature for associative functions, since the associative property can be leveraged for automatic parallelisation. Associative functions satisfy the equality:

f(x, f(y, z)) = f(f(x, y), z)

If I define the signature as (using haskelly syntax)

assoc: `a `a -> `a

Then I can use the property to allow passing any number of arguments N to an associative function, with N >= 2.
Eg.

max(12, 5, 0, 1001, -3)

The compiler or VM can decide to group the arguments by pairs as it sees fit; the result will be the same since the function is associative.
So far so good. Now what happens if I pass a single argument to my associative function? Logically the compiler would error out, but looking at the associative functions I can think of, it may actually make sense to return the single argument itself. Examples:

(+ x) = x + 0 = x
(* x) = x * 1 = x
max(x) = max(x, -infinity) = x
min(x) = min(x, +infinity) = x

Is there any theorem that associative functions have a neutral element such that f(x, neutral) = x?
I am especially interested in the single-argument question in relation to fold; if I am correct, that means that using fold with an associative function does not require a seed.
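
A small Haskell sketch of the pairwise grouping idea (names are made up): the implementation is free to pick any grouping, and the answer only stays the same because f is associative.

  pairwiseReduce :: (a -> a -> a) -> [a] -> a
  pairwiseReduce _ []  = error "needs at least one argument"
  pairwiseReduce _ [x] = x
  pairwiseReduce f xs  = pairwiseReduce f (pairUp xs)
    where
      pairUp (a:b:rest) = f a b : pairUp rest
      pairUp rest       = rest

  -- pairwiseReduce max [12, 5, 0, 1001, -3]  ==  1001
  -- for associative f this equals foldr1 f, whatever the grouping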

Monoids

Is there any theorem that associative functions have a neutral element such that f(x, neutral) = x?

No. You must assume that. In particular (and as will become relevant), a monoid is a set (type) with a binary operation that satisfies three independent axioms: associativity and left and right unit.

I am especially interested in the single-argument question in relation to fold; if I am correct, that means that using fold with an associative function does not require a seed.

Lists are the free monoid (the binary operation is ++ and the unit is [], in Haskell syntax). foldr is the unique monoid homomorphism from the free monoid (lists) to any other monoid; this is what freeness means. (f : A -> B is a monoid homomorphism (A,+,0) -> (B,×,1) if f(a+b) = f(a)×f(b) and f(0) = 1, where +, ×, 0 and 1 are the respective binary operations and units and carry no arithmetic meaning despite their names; though if f(x) = e^x, that is a monoid homomorphism with the usual interpretations of the operations and constants.)
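
A concrete example of such a homomorphism, in Haskell: length maps the list monoid ((++), []) to the monoid (Int, (+), 0), since length (xs ++ ys) == length xs + length ys and length [] == 0. Written as a fold:

  lengthHom :: [a] -> Int
  lengthHom = foldr (\_ n -> 1 + n) 0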

foldr :: (a -> b -> b) -> b -> [a] -> b
doesn't look like a monoid homomorphism. This is because we use the associativity of (++) to fix the lists into a right associative form, i.e. [1,2,3] = [1]++([2]++[3]) = 1:2:3:[].

fold :: (m -> m -> m) -> m -> (a -> m) -> [a] -> m
fold mappend mempty f = foldr (\x xs -> f x `mappend` xs) mempty

foldr = flip . fold (.) id

-- or using the old instance for functions for Monoid
fold :: Monoid m => (a -> m) -> [a] -> m
fold f = foldr (mappend . f) mempty
foldr = flip . fold -- (edit: not quite so slick)

fold can use whatever parenthesization it feels like, though, assuming mappend is associative and mempty is its unit.

That settles it then. Thank

That settles it then. Thank you for monoids. In my case associative is all I want, so I will introduce a special fold that returns the seed if the list is empty instead.
How would you name such a beast?

Semigroup homomorphism

If you drop the identity element you get what is called a semigroup. Non-empty lists should be the free semigroup and foldr without the nil argument should be their fold.
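
In Haskell that seedless fold over a non-empty list already has a name, foldr1 (with a left-to-right sibling foldl1):

  -- foldr1 f [x]    = x
  -- foldr1 f (x:xs) = f x (foldr1 f xs)
  maxOfAll :: Ord a => [a] -> a
  maxOfAll = foldr1 max   -- errors out on the empty list, by design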

Associativity

Sure. But since I am only looking at the associativity property, and the only notable property of semigroups seems to be associativity, I don't see what I can leverage from knowing that it is a semigroup. Maybe you were trying to tell me something about the free semigroup?

HaskellWiki

Crikes. I just read Fold - HaskellWiki which introduces foldr as

 foldr f z []     = z                  -- if the list is empty, the result is the initial value z
 foldr f z (x:xs) = f x (foldr f z xs) -- if not, apply f to the first element and the result of folding the rest

Am I wrong in suggesting that it should be instead

 foldr f z (x:[])   = f z x            -- if the list has only one element, apply f to the first
                                       -- element and the initial value z
 foldr f z (x:xs) = f x (foldr f z xs) -- if not, apply f to the first element and the result of folding the rest

since the "initial element" introduces a singularity, not being necessarily part of the set of values returned by f? I don't like that foldr can return z even if f has no neutral element.

In short, yes - it loses far

In short, yes - it loses far too many useful properties. Consider

foldr (:) []

as a trivial example that ought to work.

In short, yes - it loses far

In short, yes - it loses far too many useful properties. Consider
foldr (:) []
as a trivial example that ought to work.

Sure that is convenient. But what about that other simple example?

def sign a b = if a < 0 then if b < 0 then +1 else -1
   else if b < 0 then -1 else +1

foldr sign 0 [-3 5 -2] = +1
foldr sign 0 [-10] = -1
foldr sign 0 [] = 0

If the input list is empty, I get a value that is neither +1 nor -1!
Sometimes that may be useful, but it does not look correct as the default.

Uhm..

just do

foldr sign +1 [-3 5 -2] = +1
foldr sign +1 [-10] = -1
foldr sign +1 [] = +1

instead

Use a different initialiser

...which is part of the return set of the function.
Agreed, that is what a good programmer would do. But I can easily see an inexperienced programmer making the mistake and introducing weird bugs.

I guess that's only me. I will probably follow the convention, with a paragraph in the documentation for the caveats.

Just one little thing

Unless your language supports currying (like Haskell and ML), I think it makes more sense to have the function argument positioned last, as it's often a lambda, and considerably larger than the other arguments. Something like:

foldr [] someList (acc x => acc + x)

Currying

someList is often a rather large expression itself. Further, if your language supports higher-order functions then it supports currying. I would have no problem with foldr :: (a->b->b,b) -> [a] -> b in a language that doesn't favor currying. Often function definitions are of the form f = foldr c n, and it would still be nice to compose them.

Its my experience that the

It's my experience that the function argument is often the largest, but that's obviously dependent on programming style. With support for currying I meant partly syntax support, and more importantly whether the fold function is defined as a curried function. foldr :: (a->b->b,b) -> [a] -> b is a curried function, so I think a language that includes that in its prelude is by definition a language that favors currying, at least in some regard.

Syntax concerns

My language does not support currying. The syntax is strictly to my taste, which is a bit backwards at times. For example, fold is a special construct; to use it, just do a normal function call with fold in front of the list and seed in front of the initial element:

fold someList + seed 0        or
int.add(fold someList, seed 0)

Lambda expressions are a pain to write though, due to static typing ala Java.

In that case

In that case wouldn't it be better even to have some more loop-like syntax for folds? Perhaps like:

fold (list 

edit: hm, that doesn't look right

Huh?

Lambda expressions are a pain to write though, due to static typing ala Java.

Haskell is statically typed, and lambda expressions are not a pain to write there. Do you mean lack of type inference so you have to add a type signature, perhaps?
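
For example, with inference the lambda needs no annotations at all; the types are deduced from use:

  incAll :: [Int] -> [Int]
  incAll = map (\x -> x + 1)   -- x is inferred to be Int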

Yes. It is more like Java

Yes. It is more like Java than Haskell.
I do want type inference, but am a bit scared of it.

M-Sets

(Wow, is every reply of mine in this thread going to start with a structure from abstract algebra?)

Well the issue is that every monoid homomorphism is a foldr, but not every foldr is a monoid homomorphism. I goofed slightly earlier in my definition of foldr (I edited it earlier today). The way my foldr definition in terms of fold works is by being a monoid homomorphism to the endofunction monoid; we then just apply 'z' to the composed list of actions. If foldr :: (a -> b -> b) -> b -> [a] -> b then foldr f is actually an action of [a] on b, i.e. b is an [a]-Set. Every monoid induces an action on itself. The upshot is that b does not have to be a monoid, and 'z' is not the identity of anything.
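
In code, that construction looks like this (a sketch): map each element to its action b -> b, compose the actions in the endofunction monoid, then apply the composite to z.

  foldrViaEndo :: (a -> b -> b) -> b -> [a] -> b
  foldrViaEndo f z xs = foldr (.) id (map f xs) z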

Type conversion and variance

There is a feature in dodo which I call a conversion function, denoted ->(SomeType) label in a class definition. That is a function that is called implicitly by a runtime mechanism when the variable is used at a point that requires SomeType.

An application of this is futures (message), in which the conversion function does the work of fetching the value when it is used. Eg

message FutureInt
{
   Cap channel
   ->(int) fetch = service!channel.FetchInt()
}
...
FutureInt x = someFunction(y)
return x + 5  #use as int

In some ways that is similar to subclass polymorphism, which says that I can use int at a point that requires Number (int is a subclass of Number). The difference is that it is destructive, i.e. for all intents and purposes the type of x is int in the expression x + 5. Also, conversion is bidirectional: two types can convert to each other if it takes the programmer's fancy.

The last point is interesting when looking at covariance and contravariance. I was reading "Is a cow an animal?" linked in that other thread, and (after letting the fact that class type parameters are abstract sink in) I realised that using conversion, there is a possible extension to the idea.

Imagine that I have a special type of cow, WetCow, which is identical to a Cow except that it only eats wet grass. As per the document, a cow eats Grass; that is definitive and cannot be changed. Enter conversion. It is easy enough to convert Grass to WetGrass: you just need to sprinkle some water on it. So if I add the conversion function to Grass:

class Grass is PlantFood
{
...
   ->(WetGrass) wet = self.sprinkledWithWater
}

class WetGrass is Grass

Now WetCow can be a subclass of Cow and still type-safe:

class WetCow is Cow<type FoodType = WetGrass>

When a WetCow is fed grass, the conversion function magically springs into action to sprinkle it with water and the wet cow eats it. Do you see drawbacks to this approach? Is it really safe?

Implicit conversions

A system like this could be type-safe, or it could fail to be type-safe. It could also make it very easy to accidentally introduce non-termination. So it's a fine direction, but I think you need to be a little bit careful. If you haven't already, I would take a close look at Scala's implicit conversions, which do something very similar to this.

Dodo code samples

I added some code samples on the home page.
HuffmanTree is converted from Q, warmer is converted from Python and PingPong was written from scratch. I found it quite straightforward to convert from other languages, which probably shows that dodo is already quite versatile in its current form.

I do not have a compiler or interpreter yet; I even did the syntax colouring by hand! So do not put too much hope in using dodo for development.