mechanics of designing and implementing a language

Hi all,
I am trying to develop a language... mainly to experiment with relational models and to learn language/compiler design. Somehow I managed to finish my undergrad in Comp. Sci. (about 4 years ago) without taking a compiler design course. Now I am trying to figure out how best to start developing this language. I have some ideas... basically start with Java and add multimethods, list comprehensions that understand tuples and allow 'tuple joins,' operator overloading, etc., etc.

Anyway, I started out by writing out what my code might look like... but it seems I've been concentrating too much on syntax rather than the semantics. Then I switched to drawing abstract syntax trees, but they only seem to capture specific programs. (I learned some of these PLT-related words in the past few days, so excuse their abuse.)

I'd like to code a small 'core,' then extend the core as I learn new concepts (message-passing concurrency, lambda expressions, etc.). Developing a syntax and messing around with parsers (as I mentioned earlier) seems to be more of a distraction at such an early stage. How should I approach this? I've been reading books and papers and looking at features of existing languages... I'm not sure what's the most efficient way of actually starting to implement all these (often overwhelming) ideas.

I'm not sure if it is relevant but I've been skipping around the following books:
PVR & SH's CTM of CP (somehow I think most of you know what that means)
Grune, Bal, Jacobs & Langendoen's Modern Compiler Design
Sebesta's Concepts of PLs
and obviously Date's Database Systems.

Finally, I'll likely use Java or C# for parsing, dealing with ASTs, code generation, etc. Eventually I'd like to either have this language be interpreted by Parrot or have it compiled... by first translating it to C.

Thanks!
Falcon

Run, Don't Walk...

To Pierce's Types and Programming Languages. I think if you were to work through the text up to and including the "Featherweight Java" case study, if not continue through the remainder of the text, you'd be in a much better position to tackle your project.

For what it's worth, you've already had the key insight, i.e. that mucking about with parsing and ASTs, while necessary at some point, is really a serious distraction from getting anything meaningful done. Having said that, TAPL uses O'Caml as its expository vehicle, and if you decide to continue working with O'Caml, Elkhound is an extremely good parser generator written in C++ that will generate parsers in either C++ or O'Caml.

Syntax Matters

With all due respect, I beg to differ with the notion that "mucking about with parsing and ASTs, while necessary at some point, is really a serious distraction from getting anything meaningful done".

From a human factors perspective, syntax matters and is an underappreciated facet of language design. Just look at how easy it is to mangle a C typedef without triggering compile-time errors.

Given that the language design space has been pretty well explored by the major languages, your best bet at creating something that will be widely adopted is to focus like a laser on End User usability, and syntax has a critical impact at that level.


Peter J. Wasilko, Esq.
Executive Director & Chief Technology Officer
The Institute for End User Computing, Inc.

These comments are not official IEUC positions unless otherwise noted.

Only Two Points...

...on which we might disagree: whether syntax is an underappreciated facet of language design, and the order in which a language's semantics and syntax should be designed.

My assertion, based both on the majority of texts I've read and on over twenty years of experience in the industry, is that the design of a language's syntax is overappreciated relative to the design of its semantics. What I mean by this is that most texts and actual efforts I've observed make the implicit assumption that the language being designed is yet another block-structured, lexically scoped, imperative/object-oriented language with Algol-inspired syntax. About the only question that seems to lack a consensus is whether to be statically typed, with explicit annotations, or dynamically typed. Because those design choices are so much a foregone conclusion, it seems like a reasonable idea to design and implement the language's syntax first. And if your language is going to be essentially another C++/Java/Python/Ruby/Smalltalk, that's probably a fair assumption.

My feeling, though, is that the moment you strive to actually do anything new in the language design space, you are, in fact, better off working from the back-end forwards. To give just the most obvious example, deciding to be statically typed but to use type inference is a decision that has to be made very early on, because it has a dramatic impact on both your back-end and your syntax. To give a slightly more exotic example, deciding not to have a boolean type, but instead to have notions of "success" or "failure" à la Icon, likewise radically affects both your back-end and your syntax.

I don't know that falcon wants to do anything quite so exotic. But I find it interesting that s/he already has the sense that s/he is floundering, just flopping around in the syntax/parsing shallows. I claim that this isn't surprising: when you look at your AST, you need to have a strong sense of what it means, and unless you already have a semantic target in mind, it's too easy to end up flapping around.

Having said all of this, let me emphasize that I wholeheartedly agree that syntax is important, that the design of a "good syntax" is an art form, and that good syntactic design is regrettably rare. I am one of those Lisp programmers who eventually came to accept that the human brain seems hardwired to benefit from more syntactic sugar than Lisp encourages, and have largely settled on the ML family as having the best semantic benefits of Lisp and the best syntactic benefits of Algol and its descendants. So there is a sense in which you're preaching to the choir. But I'm afraid that I still see too many language designers rush off to write a parser before having given any thought to the semantics of their language beyond the (unarticulated!) assumptions I've already enumerated.

Well Put!

Excellent reasoning; it looks like we don't disagree at all on these points!


Peter J. Wasilko, Esq.
Executive Director & Chief Technology Officer
The Institute for End User Computing, Inc.

These comments are not official IEUC positions unless otherwise noted.

TAPL and implementation

TAPL is a very good book for understanding statically typed languages, their various features, and advanced type systems. But although it has some comments about the efficiency of important algorithms (mostly typechecking), it does not cover realistic implementations.

Essentials of Programming Languages is a nice complement to TAPL, as already noted. Another very interesting book, more geared towards implementing Lisp dialects, is "Lisp in Small Pieces" by Christian Queinnec. It develops various interpreters and compilers for Scheme, and many features are characterized by giving a denotational semantics account of them. Even if your objective is not to implement a Lisp dialect, many lessons can be learned from this book.

wow, I ordered that book last week

should get it any day now :)

Almost Forgot...

...EOPL. Also a must-have. Slender but dense. Take advantage of the "EOPL" language in DrScheme and I think you'll find it very pleasant to work through.

It's very good. I am doing a

It's very good. I am doing a types course at uni, and while programming languages have always been something I've been interested in, some of the more theoretical aspects seemed beyond me. :) Pierce's book made it a lot more accessible and seems very thorough (I haven't finished it yet).

Chris

Essentials of Programming Languages

Paul, Andrei,
I just took a look at "Essentials of Programming Languages" and it does indeed look very interesting. I've been meaning to pick up some Lisp anyway, since a lot of smart people speak highly of it. I read somewhere that while Lisp is very good for compiling or interpreting little languages, it doesn't work so well if the syntax of the language isn't similar to Lisp's (say, something like Java or C#). If Lisp is so good for this purpose, then how come it isn't used more often as an 'intermediate' language? In other words, does it make sense for projects such as GCC and Parrot (Perl) to use a Lisp-like language as the interface between their compiler/interpreter and language designers?
It seems I'm building up quite a library of expensive books; I'd better start coding soon :)

Lambda, the Ultimate IR

falcon: I read somewhere that while Lisp is very good for compiling or interpreting little languages, it doesn't work so well if the syntax of the language isn't similar to Lisp's (say, something like Java or C#). If Lisp is so good for this purpose, then how come it isn't used more often as an 'intermediate' language?

From Using and Porting the GNU Compiler Collection (GCC): RTL:

Most of the work of the compiler is done on an intermediate representation called register transfer language. In this language, the instructions to be output are described, pretty much one by one, in an algebraic form that describes what the instruction does. RTL is inspired by Lisp lists. It has both an internal form, made up of structures that point at other structures, and a textual form that is used in the machine description and in printed debugging dumps. The textual form uses nested parentheses to indicate the pointers in the internal form.
Note that GCC 4.0 uses a new IR called Tree SSA, where the acronym "SSA" stands for "Static Single Assignment." The relationship between SSA and functional programming has been discussed on LtU before; scroll down to Anton's "Ob. refs" post for links to the two major relevant papers.

interesting papers

I actually read that post and the papers it mentions a few days ago. They are probably interesting, but they are a little beyond me right now... though I understand more of them now than I did a week or two ago. I also just ordered EOPL. Lisp syntax looks strange to me, but the more I look at it, the better I comprehend it. It'll be disappointing if all this talk of LISP power (specifically for developing compilers) turns out to be dogmatic rants... but I'm looking forward to using EOPL as a 'practical' guide to actually implementing my ideas in code, while other books serve as theoretical preparation.

An interesting irony

That the low-abstraction languages get converted to a format similar to the high-abstraction language's.

The irony is that for years C used to be called "ever so slightly higher than assembler".

The irony is that SSA is more abstract than C

I don't think the conversion to SSA says anything about C's claim to being only slightly higher-level than assembler — that's still true of C. The conversion to SSA is precisely because of that — it's intended to fix some of C's problems, to make it tractable for automated processing by making it more referentially transparent. It's reasonable to say that SSA is more abstract than C. You could even call it higher-level, although it would probably need some syntactic sugar to really qualify as such.

PLAI

I read EoPL recently and was massively underwhelmed. Shriram Krishnamurthi's Programming Languages: Application and Interpretation seems to cover much of the same material with a similar approach, in more depth, and presents it better in my opinion.

As Paul Snively points out, Lisp-looking languages are used as intermediate languages, and for GCC no less. However, being good at compiling and interpreting languages is not at all the same as being a good intermediate language. An intermediate language is the output of (the earlier parts of) a compiler, and ideally should make very few assumptions about the environment. Perhaps you meant implementation language, in which case the reasons, on a case-by-case basis, are usually either 1) the typical reasons Lisp is not used, or 2) a compiler for a language is typically written in the language it's compiling (for general-purpose languages) as a kind of tradition and rite of passage for the compiler and language.

Inspiring Languages?

On a related note, which out-of-the-mainstream languages should a person look at to find semantic/syntactic inspiration? (For the sake of argument, let's say Lisp, ML, etc. are mainstream.) As an example, I recently discovered Joy, and found its concatenative and point-free nature to be refreshingly different (and it almost makes Scheme look crufty and bloated). What other language features are out there that haven't been properly explored?

And as long as we're talking about off-the-beaten-path languages, do the J or K languages bring any new features to the table which should be emulated by others?

full circle

One of the reasons I originally started learning the innards of computer science was because of languages such as APL, K (kdb), etc. At my first job, while very highly paid employees and consultants were spending their time tuning the heck out of Oracle and Sybase, I saw a presentation for vector languages which promised orders-of-magnitude improvements in performance. I had just gone through 4 years of Comp. Sci. school and couldn't believe that billion-dollar companies, with hordes of PhDs, were out-done by such languages. Of course I learned that marketing, too much abstraction and simplification, and even assumptions about users' needs and priorities leave a lot of room for technical advances.

More specifically to the current topic: I've been trying to figure out WHY APL/J/K/Q are so much faster (as is claimed). I know they deal with arrays, but how is their manipulation of arrays so much faster than C or C++? It all ends up as the same machine code. I asked the same question on the APL newsgroup but didn't get the answers I was looking for (http://tinyurl.com/58y2v).
I was going to ask the same question on this forum; I guess I'll do it here :)

vlerq

Given your comments about APL, vectors and the relational model, you might want to take a look at vlerq, which was just cited in another thread.

choosing well

I've gone back and forth over k several times; I flip back and forth between really liking what I see and really not liking it.

Arthur Whitney's focus on character count really does a disservice to the key innovations in the k system. Seeing k made me think of a "multiple representation" programming environment and language, where I could flip k source back and forth between several different forms -- character ultra-compressed "Whitney" notation, expanded function name "McCarthy", and a graphical form. I liked Epigram's notion of sheds, where the dirty, unfinished bits can live.

k's built-in tree structure is interesting too, and plays nice (I think) with many of Epigram's notions.

Pulling back, the real strength of K appears to be the set of verbs (functions) and adverbs it provides. With a couple of characters you can get behavior that takes yards of code in other languages. A second major strength is the top to bottom assumption that _everything_ needs to work with vectors. Writing an operator? Define _exactly_ what it does in dyadic, monadic, atomic, and vector cases, and every combination thereof. This is what k does.

I think Scheme could learn a lot from k. Why doesn't the + function know what to do with properly shaped vectors and lists? (+ (list 1 2 3) 5) should "do the right thing", as should (+ (list 1 2 3)(list 4 5 6)). And the fact that Scheme makes it difficult to write functions that "do the right thing" in these cases should be something the R6 committee considers.
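For what it's worth, here is a rough sketch (my own, with a made-up name add*, and nowhere near k's real rank/shape semantics) of what a list-aware addition could look like in Scheme:

(define (add* x y)                                    ; hypothetical list-aware +
  (cond ((and (list? x) (list? y)) (map add* x y))    ; list + list, elementwise
        ((list? x) (map (lambda (e) (add* e y)) x))   ; list + scalar
        ((list? y) (map (lambda (e) (add* x e)) y))   ; scalar + list
        (else (+ x y))))                              ; plain scalars

; (add* (list 1 2 3) 5)            => (6 7 8)
; (add* (list 1 2 3) (list 4 5 6)) => (5 7 9)

That's only the dyadic case, of course; doing this consistently for every primitive is exactly the discipline k imposes.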

It turns out that in our business, looping over a bunch of items happens a lot. Providing a good, general function composition mechanism is important. k goes a step further and provides very high-speed, terse notation for many of the most important functions.

I'd suggest that anyone looking for a good set of primitives that should be eligible for special treatment could start with those listed in k's documentation.

I think I read somewhere that when implementing K's core, the first thing Whitney does is create macros that correspond to k's verbs and adverbs. He then writes in terms of those.

map function

I thought the 'map' function allowed operations across a whole list... no? Wouldn't that be the same as K's 'right thing?'

k is a branch, Scheme is a root

I think Scheme could learn a lot from k. Why doesn't the + function know what to do with properly shaped vectors and lists? (+ (list 1 2 3) 5) should "do the right thing", as should (+ (list 1 2 3)(list 4 5 6)).

That would be appropriate for one possible level above Scheme's RnRS core. As it is, these examples can be written as:

(map (cut + 5 <>) (list 1 2 3))   ; use srfi-26
(map + (list 1 2 3) (list 4 5 6))

(Assuming I correctly divined "the right thing").
It requires some thought about the shape of the data being operated on. That's not necessarily a bad thing, particularly in the language core where Scheme's philosophy is to provide the flexibility to support all sorts of different idioms. Someone else might come along and claim the "right thing" for (+ (list 1 2 3) 5) is (1 2 3 5).

One area where syntax does matter...

...is in source-level metaprogramming. By this I mean things like macro systems, refactoring browsers, editors, etc.

One of the oft-claimed strengths of Lisp is that its concrete syntax maps directly onto its abstract syntax, making Lisp source much easier for programs to manipulate than, say, C++ (which, from a formal manipulation/analysis point of view, has one of the worst syntaxes around).* On the other hand, many find sexprs hard to read, even with training; for human programmers an alternate syntax may be preferable.
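As a toy illustration of that point (my own example, ignoring the hygiene and scoping issues a real macro system or refactoring tool must handle), a source-to-source transform over Scheme code is just a list walk:

; Rename every occurrence of a symbol in a quoted expression.
(define (rename old new expr)
  (cond ((eq? expr old) new)
        ((pair? expr) (cons (rename old new (car expr))
                            (rename old new (cdr expr))))
        (else expr)))

; (rename 'x 'y '(lambda (x) (+ x 1)))  => (lambda (y) (+ y 1))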

*Regarding C++ (and to a lesser extent, C): here are some of the well-known gripes about the syntax of these languages (I'm ignoring semantic issues here, like no garbage collection).

* Grammar (for C++) contains several reduce-reduce ambiguities.
* Many parts of the grammar are not LALR(1) or otherwise amenable to top-down or bottom-up parsing. While this is true (to some extent) of all languages, in many places in the C/C++ grammar the interpretation of a "symbol" (a variable name) depends on whether or not the name has previously been introduced as the name of a type (via a struct, union, class, enum, or typedef declaration).

These, and other issues, make formal manipulation of C++ source more difficult than other languages. Not impossible, but decidedly non-trivial.

Some random comments on language design

Dear Falcon:
Designing a language is a huge endeavor, but it's worth tackling even if it's just a learning experience. Best of luck to you! Here are a few off-the-cuff suggestions to help you, gathered from my own experience:

  • First of all, what is the vision? Are you trying to solve a certain class of problems (making a DSL), or explore a new area (e.g., transparent distribution), or explore a new computation model (e.g., Prolog and logic programming)? You have to be clear on the vision.
  • A second point is not to try to do everything at once. Start with a small language and see how far you can get with it. Make it bigger only if you have to.
  • A third point is syntax and semantics. You should clearly separate the kernel language from the linguistic abstractions early on (as explained in CTM). You should also have a clear idea of how the kernel language executes; the particular semantic formalism is not important as long as it is simple and complete.

  • A fourth point is the actual implementation. I suggest, for your first implementation, picking an existing language with a good implementation, and either writing your new language as a library in it, or compiling to that language (a tiny sketch of the library approach follows this list). This is by far the easiest way to make a first implementation of the language. For example, the first implementation of Erlang was done in Prolog. Don't worry about speed. Just worry about getting it up and running, so you can get experience writing programs. This will prepare you for the second implementation, which is the real one. Good target languages are the usual bunch of powerful symbolic languages (Haskell, Scheme, Prolog, Oz, etc.).
  • A fifth point is programming. Once you have your implementation, use it intensively to write many different programs. Try to get other people to do so also. The kiss of death of any new language is that it is not usable in practice; writing programs will show you where the rough edges are so you can smooth them out.
  • A final point is regarding typing. I recommend that your language be strongly typed, so that you can reason in it. Whether you want static or dynamic typing is partly a question of personal choice, partly what your language is intended for (see the CTM section explaining this). But I strongly recommend against doing type inferencing for your language, at least until you understand very well how your language realizes your vision. Doing type inferencing will greatly increase the complexity of your work and will box you in, with only limited payoff (I know this is a heresy :-) ).
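To make the library-first suggestion in the fourth point concrete, here is a minimal sketch (my own invention, in Scheme since it is one of the suggested hosts, and assuming SRFI-1's filter) of a relational join written as a plain function, before any syntax exists:

; Relations as lists of association lists (one alist per tuple).
(define employees
  '(((name . "ann") (dept . 1))
    ((name . "bob") (dept . 2))))
(define depts
  '(((dept . 1) (dname . "sales"))
    ((dept . 2) (dname . "ops"))))

; Natural join on one shared attribute; filter is from SRFI-1.
(define (join attr r s)
  (apply append
         (map (lambda (t1)
                (map (lambda (t2) (append t1 t2))
                     (filter (lambda (t2)
                               (equal? (cdr (assq attr t1))
                                       (cdr (assq attr t2))))
                             s)))
              r)))

; (join 'dept employees depts) pairs each employee with its department row.

Once operations like this exist as ordinary functions, syntax (comprehensions, infix joins, whatever) can be layered on later.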

great suggestions

Peter, thanks for the suggestions (and thanks for a great book!).

My original purpose was to have a language which will allow me to experiment with the relational model (build a relational DB with features I think are missing from Oracle, Sybase, etc.). I started looking at parsers/compilers to take care of my query language. Since then I've found an amazing world of PLT, logic, and basic (fundamental) math. I'm surprised by how much research is being done in this field (I didn't even know such a field existed). It is amazing to see what all these slightly-below-the-radar languages can do compared to the so-called 'mainstream' languages.
Ironically, I think O'Caml might let me skip having to write my own language (it is fast, has tuples/records and first-class functions, and is FAST!).

Last night I seriously started looking at Oz. It looks fascinating; however, I have to figure out how much time I should spend working with it. Right now I need to learn Scheme (because smart people think I should), O'Caml (because its speed and expressive power look impressive), and Haskell (because a 'purely' functional language will force me to learn this type of programming). Other than concurrency, are there any reasons I should add Oz to this increasingly congested list? Will it remain an academic language or is it expected to be used in industry (which will have to mean that speed will be improved)? Right now I'm more concerned about how 'easy' it is to work with a language (no needless restrictions), how it implements the latest research and best practices, etc. However, if I spend time developing a prototype of this database, it would be great if I could do it in a language which will EVENTUALLY become as fast and reliable as older and more mature languages (which is why developing my own language was actually the last resort).

BTW, I agree with all the suggestions you enumerated. I was thinking about an inferred type system, but during implementation my brain would have collapsed under the weight of all this information and I would have switched back to simple, explicit, static typing anyway :)

Off the Cuff

Let me just say that while I disagree with Peter re: type inferencing's value in general, I agree to the extent that if all it gets you is let-polymorphism and recursion, it's probably not enough bang for the buck relative to strong dynamic typing. My points of comparison for this are O'Caml on the static side and SBCL on the dynamic side. I'm plenty pleased with the performance of both of these implementations. It's worth noting that SBCL gets there by doing type inference, too; it's just that it doesn't refuse to compile if it fails to find a nice, specific type that it can compile efficiently.

Where static type inference wins is with a much richer type system: that of GHC 6.4 with its GADTs, or languages with dependent types, linear types, etc.

If you're trying to create a database language à la Darwen and Date's Tutorial D, well, you should follow what they write about types in whatever edition of The Third Manifesto you have. I'm waiting for their new 2005 edition to be published and have thought about then doing exactly what you're doing: attempting a rigorous implementation of a "D." I'll probably use MetaKit as the back-end engine unless JC gets to performance-tuning vlerq a lot sooner than I expect him to.

tutorial D

I'm not necessarily looking at their Tutorial D language, but certainly their ideas of an uber-capable information manipulation system. I have already pre-ordered Date's new book "Database in Depth" from O'Reilly (http://www.oreilly.com/catalog/databaseid/) and I plan on ordering TTM as soon as the next version comes out.

Languages below the radar

It is amazing to see what all these slightly-below-the-radar languages can do compared to the so-called 'mainstream' languages.

You have hit the nail on the head. The best languages are the ones that are slightly below the radar. They are popular enough to be well-designed and supported and unpopular enough not to have to pander to the lowest common denominator.

Other than concurrency, are there any reasons I should add Oz to this increasingly congested list?

There are a few reasons. One is that Oz combines the abilities of many of the others in a well-factored way. Oz gives you more choices than the others in what set of concepts you use. Seeing how all these different choices can live together harmoniously is an eye-opening experience. Another reason is constraint programming. And a third reason is distributed computing.

Will it remain an academic language or is it expected to be used in industry?

It is already used in industry. For example, commercial sports tournament schedulers have been written in Oz and sold by FriarTuck. My own research group is funded by the aerospace industry to develop intelligent software written in Oz (sorry, I can't say more than that; the work is confidential).

documentation

On the documentation page (I was looking at the PDF/PS documentation) there seem to be a large number of almost book-length tutorials, references, etc. I think it would be extremely helpful if there were one 'Start Here' type manual which covered not just the basics (which the Oz Tutorial seems to do) but also included chapters on how to make use of all this functionality. Obviously I would love a chapter on compiler construction :) In fact, I think it would be a good idea to have introductory articles, HOWTOs, etc. which introduce the language, the environment, general capabilities as well as specific features... but do so as self-contained 'article-length' write-ups... or more likely I just need to break from googling and reading all this stuff for a few days and get back to it with a fresh mind :)

Other than concurrency, are t

Other than concurrency, are there any reasons I should add Oz to this increasingly congested list?

There are a few reasons. One is that Oz combines the abilities of many of the others in a well-factored way. Oz gives you more choices than the others in what set of concepts you use. Seeing how all these different choices can live together harmoniously is an eye-opening experience.

This is why CTM is great. It ties everything together and provides unity. The way I was taught, everything came in bits and pieces, incremental to a degree but mixed and matched, like mixing polymorphism and inheritance when learning object orientation. Understanding what the concepts actually are and how they interact gives you more choices and flexibility.

Another reason is constraint programming. And a third reason is distributed computing.

Can't say anything about the constraint programming as I haven't done much thus far, but from my experience Oz's distribution model is a lot easier to work with than something like RMI (my final-year project evaluated Oz in a distributed setting).

Declarative by default is a good default for distributed systems - managing state in a distributed system is difficult enough without managing it all over the place. Local call semantics for procedures by default is better than RPC - you use message passing for RPC-like calls. Things like that were a real bonus and made things so much simpler than before.

A Dialyzer-like tool for Oz might help though! :)

Danx

p.s. QTk is also a good reason to check out Oz!

I'd reconsider the parsing language...

Personally I wouldn't use Java or C# to write the parser in. This is definitely a place where a DSL is the way to go, and as such you'd be best off finding a good tool (yacc or bison come to mind) or library (there are a number, and I find I rather like the Parse::RecDescent module for Perl, though recursive descent parsers have some annoyances), and using whatever language works well with the tool.
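Just to show how little code a hand-rolled parser can be (a toy of my own, not Parse::RecDescent), here is a recursive-descent parser for left-associative +/- expressions over a token list, in Scheme:

; expr := term (('+' | '-') term)*     term := number
; Each parse function returns a pair: (ast . remaining-tokens).
(define (parse-expr tokens)
  (let loop ((left (parse-term tokens)))
    (let ((ast (car left)) (rest (cdr left)))
      (if (and (pair? rest) (memq (car rest) '(+ -)))
          (let ((right (parse-term (cdr rest))))
            (loop (cons (list (car rest) ast (car right)) (cdr right))))
          left))))

(define (parse-term tokens)
  (if (number? (car tokens))
      (cons (car tokens) (cdr tokens))
      (error "expected a number" tokens)))

; (parse-expr '(1 + 2 - 3))  =>  AST (- (+ 1 2) 3), with no tokens left.

A real front end also needs a lexer and error recovery, which is where the generator tools start to earn their keep.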

One thing you'll probably find is that the grammar you develop is going to be mostly language-independent (as most of the tools look to use a similar syntax for their grammars) and writing a naive compiler is pretty trivial.

I think you'll find that it's easier to use one of the higher-level languages (I use perl, but ruby and python would both work nicely, and I'd bet that you could do some nice things with either Scheme or Haskell) to get the compiler prototyped and working, and then if you need to you can port it to something a bit more annoying but generally faster such as C.

parsing language

Dan,
I had even forgotten that I mentioned I was going to use Java/C# for parsing. After 15 or 20 messages on this board, I have started looking at OCaml, Oz and Scheme (I also looked at Haskell since Pugs came out... but OCaml seems to do everything Haskell does, and seems faster yet closer to my existing experience with imperative languages).
I thought writing my own compiler might actually work when I saw information about Parrot and how it abstracts away many complexities yet may provide a fast enough and cross-platform enough system to be used in industry... unfortunately, at this time I don't know how Parrot can be used to implement classes, message-passing concurrency (what are continuations?), multimethods, etc. In other words, I think you are a guy who could do a few write-ups for us newcomers ;)

Ocaml & Haskell

Ocaml does not do everything Haskell does. It's missing:
- Lazy evaluation
- Typeclasses
- Monads
- Kinds (higher-order types)

Also, Haskell libraries tend to be rather interesting, in that there's a tendency towards "everything is a combinator" DSLs. The Ocaml libraries I've seen use the language well, but they're still recognizable as libraries, and don't flirt with being languages in their own right like many Haskell libraries do. Haskell follows the Common Lisp tradition in this respect...

I'd recommend learning Ocaml first, if only because I started with Haskell and was very, very lost. Ocaml's more familiar to imperative/OO programmers, so you can learn about functional programming and type inference without completely leaving familiar ground. But I think that Haskell is definitely worth learning if you want to get into language design: it has semantic features worth learning that don't appear in any other language.

Almost Correct...

Jonathan Tang: Ocaml does not do everything Haskell does. It's missing:

  • Lazy evaluation
  • Typeclasses
  • Monads
  • Kinds (higher-order types)

This is almost correct: O'Caml is call-by-value rather than call-by-name or call-by-need, but it has a "Lazy" module and keyword, so you can be lazy on demand. :-)

Similarly, while it doesn't have monads as a language feature, you can certainly implement monads in O'Caml. One useful such monad is the O'Caml NAE Reactive I/O monad. A more comprehensive monad library can be found at http://sange.fi/~atehwa/omlib/. There's also this camlp4 syntax extension for a Haskell-style "do" construct.

Of course...

OCaml can have side effects so it doesn't need monads.

:)

Check out Paul's links :)

Check out Paul's links :)

Monadic programming

Still wrapping my head around monads, but as far as I understand them so far, they're much more than just a way to implement state in a pure language. They seem to be more like aspects for functional programming: they let you weave extra "stuff" into the program without the code needing to know anything other than the fact that it's dealing with a monad. State is one very-specific special case of this, where the extra "stuff" is the explicit state being threaded through the computation. But equating monads and state is like equating the Internet and reading the NYTimes online. It may be a good use, but it's hardly the full potential.
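For a toy example of "extra stuff woven through the code" that isn't state, here's a hand-rolled failure ('maybe'-style) monad in Scheme (my own sketch; it conflates #f with failure just to stay small):

(define (unit x) x)                  ; wrap a plain value
(define (bind m f) (if m (f m) #f))  ; stop the pipeline at the first failure

(define (safe-div x y) (if (zero? y) #f (/ x y)))

; The client code only knows it is dealing with unit/bind:
(define (pipeline x)
  (bind (safe-div 100 x)
        (lambda (a)
          (bind (safe-div a 2)
                (lambda (b) (unit (+ b 1)))))))

; (pipeline 5) => 11     (pipeline 0) => #f

Swap in a different bind and the same client code threads state, logging, backtracking, and so on; that's the aspect-like flavor.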

Paul's links look really interesting; I'll have to play with them when I get a chance.

And I'm wondering whether anyone's thought of doing a pure monadic programming language, where monads are the primary structuring mechanism. Or does that not make any sense? It seems like monads are a full abstraction paradigm, like functions or classes or aspects, and one that's been largely ignored outside of Haskell.

Premonoidal/binoidal programming

I am getting the impression that for concurrent PLs, monads may not be the right choice. I didn't dive deep into this, so I may be extremely wrong.

Anybody working on a premonoidal category PL?

Monads as environments

They seem to be more like aspects for functional programming: they let you weave extra "stuff" into the program without the code needing to know anything other than the fact that it's dealing with a monad.

I recently tried writing a Haskell Su Doku solver and then rewriting it using a state monad, and found that using the monad enabled me to create a sort of stateful execution environment for my Su Doku-solving code.

Monads make it possible to build all sorts of things into an environment, from IO support to continuations, which in turn means that those features can be factored out from the code that runs in that environment. This is a common thing to want to do in any language; monads (plus do notation as sugar) provide an especially elegant way of doing it.

It seems like monads are a full abstraction paradigm, like functions or classes or aspects, and one that's been largely ignored outside of Haskell.

Googling for "monads in scheme" will turn up some wizardly stuff from Oleg, along with LtU discussion of the same. For a laugh, I recently translated some monadic parser combinator code from Hutton and Meijer into Python.

here's a simple question about scheme's extensibility

Ok, first of all, I want to say again that I appreciate all the suggestions here. I've decided that I need to hash out my ideas about the features of my language (type system, concurrency, 'native' data structures, etc., etc.). Does it make sense to use Lisp or Scheme (what's the diff anyway?) to start experimenting with implementing parts of my language (without worrying about the language's syntax)? I hear that Scheme is super extensible... so could I use it to build up a very simple type system... then add arrays, tuples/extensible records, object orientation and eventually syntax... layer by layer?

The syntax of Scheme is pretty easy now that I've actually played around with it. If it can also be used to parse Java-like syntax, then I can slowly build up the language and finally implement the syntax and have it spit out code to GCC, Parrot, C or whatever else to get good speed. Any problems with this idea?

Falcon

Yes

Does it make sense to use lisp or scheme (what's the diff anyway?) to start experimenting with implementing parts of my language?

In general, my answer would be YES, this is a very good idea. Naturally, some features are less suitable for this sort of experimentation than others.

Absolutely...

...you can write a compiler in Scheme, and will, if you work through EOPL, SICP, or as an earlier poster reminded me, Programming Languages: Application and Interpretation, which has the distinct advantage of being freely available online.

By working in Scheme, you can also benefit, if you choose, from using the excellent DrScheme environment. Note that both EOPL and PLAI have language-levels for DrScheme: EOPL's is included with the current distro, while PLAI's is available from the home page linked to above.

Which Reminds Me...

...of Graydon Hoare's excellent One-Day Compilers presentation: how to write a (simple) DSL in O'Caml in one day. YMMV if you aren't as familiar with O'Caml as Graydon is...

Previously on LtU

wow, so many LISPs and SCHEMEs

I never realized there were so many implementations of Scheme! I am currently experimenting with DrScheme. But I noticed that Bigloo can compile to various platforms/interpreters and has (what looks like) nice built-in parser and regex functionality. Another implementation I saw claimed to be the fastest and the most reliable (initially I wasn't going to worry about speed... but if I can get speed, I'll take it). So what is the difference among all of them, and is there one I should concentrate on? ... DrScheme doesn't seem to have a good way of dealing with huge non-ASCII files (which is necessary for a relational database prototype). Anyway, if there are some comparisons on the web, please let me know. Thanks.

SchemeQL

Given your interest in relational databases and your recent focus on Scheme, I suggest you take a close look at SchemeQL, an embedded DSL for querying databases from Scheme.

great pointer!

very interesting paper. Thanks.

most basic ingredients for a language

I don't know if anyone is following this thread any more, but after spending a couple of weeks going over PLT, I'm trying to figure out the following: what are the most basic ingredients for a language?

I would like to build a very small core language, then build it up using itself. From what I understand of lambda calculus, I would need a list with functions such as car/cdr/cons/etc. (although I'll surely use different names). I'll need functions which can be passed around (first-class?). I'll need tuples/records... for some reason. I'll need if/else conditionals, recursion, and basic data types (int/float/char/int[]/float[]/string). An associative array could be built using tuples/lists, which might serve as a symbol table (needed for lots of other things). I'll have to combine tuples and the symbol table to get object orientation.
Anyway, you get the idea. In order to get a list of these basic 'ingredients', do I have to study the various calculi (lambda, pi, relational, Abadi & Cardelli's object calculus)? Most books I have mention features of languages, but not the basic pillars I need to build all of them.
By the way, I would like to 'build' the syntax using the language itself too. I really like the TXL idea (www.txl.ca).

I figure I get two benefits out of this. One is that the small core can be optimized more easily to get good performance when I need it. Secondly, I can build up the functionality of the language when I need a new feature (extending the language like one might design an API for Java).
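For what it's worth, the kind of tiny core described above fits in a screenful of Scheme. Here's a deliberately minimal, untyped sketch (mine, not from any of the books mentioned) of an evaluator for a lambda-plus-if-plus-+ kernel:

; Kernel: numbers, variables, (lambda (x) body), (if c t e), (+ a b), application.
; Environments are association lists of (symbol . value).
(define (interp expr env)
  (cond ((number? expr) expr)
        ((symbol? expr) (cdr (assq expr env)))
        ((eq? (car expr) 'lambda)
         (lambda (arg)                         ; represent closures as host closures
           (interp (caddr expr)
                   (cons (cons (caadr expr) arg) env))))
        ((eq? (car expr) 'if)
         (if (interp (cadr expr) env)
             (interp (caddr expr) env)
             (interp (cadddr expr) env)))
        ((eq? (car expr) '+)
         (+ (interp (cadr expr) env) (interp (caddr expr) env)))
        (else ((interp (car expr) env)         ; application
               (interp (cadr expr) env)))))

; (interp '((lambda (x) (+ x 1)) 41) '())  => 42

Everything else (records, objects, even a type checker) can then be grown as functions over the same s-expression data, which is exactly the layer-by-layer approach being discussed.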

Types

I would warn you that you are about to make a very important decision with far-reaching implications - what kind of data types to support. And I don't mean whether to support int32 or unlimited integers (though that one is important as well). I mean whether your language will have plain simple algebraic data types (a la ML), some kind of dependent types (a la Epigram), coalgebraic or bialgebraic types, etc. I assume you have already decided whether your language will have static or, eh, dynamic types?

I would suggest browsing through Programming languages bookmarks of Mr. (or Dr? I really miss his posts :-( ) Frank Atanassow, especially the "Experimental" group.

From what I understood, you want a minimalistic language, like Scheme or Pico, which probably means no type inference, and probably no static types (and especially no dependent types).

Regarding pillars - I thought the various calculi are supposed to be them? For my job I prefer pi (not the vanilla dialects, though), but YMMV.

PS: if you really are at a loss, I would recommend implementing some well-known, simple but powerful language (I mean Scheme), and then exploring the space of experimental languages to see what you like about them. You might want to implement several of them (no industrial strength required) before really designing your own. But of course, nobody does it this way; the temptation is too strong :-)

Maybe I should ask for a commission... ;-)

what are the most basic ingredients for a language?

I'll make the obligatory recommendation of CTM.

They do a great job of breaking down the kernel functionality of various paradigms into small orthogonal sets of "commands". Even if you don't use the exact same basis for your language, they give you a good example of how to do it.

Unless you have a really good grounding in how to build up from a calculus/algebra, their approach is probably closer to where a programmer starts thinking, and will probably be easier for you to start with.

most basic types

It's useful to approach this via the correspondence between types and logic. In logic you would start with 'and', 'or', and 'implication'. The corresponding PL types are tuples, discriminated union, and function. Basic features required to support these types include element selection, case, lambda expressions, and function application.

Although I'm coming from an orientation toward static types, IMO this would provide a solid foundation in any language. These can be taken as the essentials for building program structure. Integer, string, etc. would be viewed as primitive types.

There's no one way to rank the ingredients in order of how basic they are. If your goal is to bootstrap efficiently into your own language, you can pick any tolerable subset.
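An untyped illustration of those three ingredients, encoded in Scheme (my own encoding; in an ML-style language these would of course be checked statically):

; Product ("and"): a pair, with element selection.
(define p (cons 3 "three"))          ; (car p), (cdr p)

; Discriminated union ("or"): a tagged value, consumed by case analysis.
(define (inl x) (cons 'left x))
(define (inr x) (cons 'right x))
(define (either on-left on-right v)
  (case (car v)
    ((left)  (on-left  (cdr v)))
    ((right) (on-right (cdr v)))))

; Function ("implication"): lambda and application.
(define (describe v)
  (either (lambda (n) (* 2 n))
          (lambda (s) (string-length s))
          v))

; (describe (inl 21))      => 42
; (describe (inr "hello")) => 5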

As long as it's black

IMO this would provide a solid foundation in any language

As long as you are shooting for algebraic types, am I wrong?

primary colors

[tuples, discriminated union, and function] As long as you are shooting for algebraic types, am I wrong?

Of course that just gets you to the stage of algebraic types. The question was, "What's basic?" Starting with those types does not limit what you shoot for.

Supercombinators

For one approach to basic ingredients, and building up using the core itself, check out Supercombinators.

--Shae Erisson - ScannedInAvian.com

Biggest Decision

The biggest decision facing you is whether your language should be statically or latently typed. Once you've made that decision, everything else gets quite a lot easier. If you go for static typing, Types and Programming Languages, as we discussed earlier, is, IMHO, the best launching-off point, bar none. If you go for latent typing, then LISP in Small Pieces is the moral equivalent in that domain. :-)

I have a lot of sympathy for the CTM recommendation—it's head and shoulders the best CS text in the world, IMHO, and definitely follows the pattern you suggest of beginning with a very small kernel language and very judiciously adding features only as they become necessary. The only hesitation I have about it is that it implicitly commits to latent typing from the outset. Yes, there's Alice ML to explore the integration of the major Oz features into a Hindley-Milner type-inferenced regime, but until CTM in Alice ML is done, it might be more difficult than is warranted for your purposes to gain leverage from the observation.

CTM suggestion

Paul, Mark,
I was thinking of this book when I said that many books offer language features...but not how those features are really just built on top of some basic facilities. But I'll take another look...I might be confusing it with something else.

The middle way

I was thinking of this book when I said that many books offer language features...but not how those features are really just built on top of some basic facilities.

I suppose it depends on what you mean by facilities.

If you want something concrete, I would suggest learning an assembly language. Assembly languages are usually fairly small, and in practice all other semantics ultimately gets realized in terms of them. Looking at the assembly generated from, say, some C code can teach you a lot about the "bare metal" supporting PL constructs.

If you want something more abstract, LC is probably your best bet. It has very few primitives and building up more complex stuff from it is the basis for a lot of literature. (Not very practical though)
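As a small taste of that building-up, here are Church-style booleans and pairs written as nothing but lambdas (runnable Scheme, though real LC has no define or quote):

; Booleans as two-argument selectors.
(define true  (lambda (t) (lambda (f) t)))
(define false (lambda (t) (lambda (f) f)))
(define (if* c t e) ((c t) e))   ; note: both branches get evaluated here, unlike real if

; Pairs built from functions alone.
(define (pair a b) (lambda (sel) ((sel a) b)))
(define (fst p) (p true))
(define (snd p) (p false))

; (if* true 'yes 'no)  => yes
; (fst (pair 1 2))     => 1
; (snd (pair 1 2))     => 2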

CTM fits nicely in the middle, which is why I recommended it. It gives you the kernel "elements" of the language that can be used to build up the rest of the language, but it is abstract enough that you don't have to know the gory details of implementing those elements, yet concrete enough that someone who understands programming can see what they do.

I guess, like a lot of things, it all depends on what you want. ;-)

Just the semantics, ma'am

The only hesitation I have about it is that it implicitly commits to latent typing from the outset.

I'm not sure that I agree that this is necessarily a problem. Whether you want a static type system or not, you need to work out your basic semantics, and that will look very similar (maybe identical if you have type erasure) whether you go with static or not.

While I agree that TAPL is a good book, and invaluable if you DO want a static type system, I don't know that it will get you semantically "up and running" as quickly. The two approaches together would probably pay off best in that case.

Besides, everybody here should read both books anyway. ;-)

I'm peddling as fast as I can

but until CTM in Alice ML is done

Not setting any speed records in the conversion, but it's been an enjoyable project thus far (I've always enjoyed language translation). It amazes me how similar the solutions in the two languages turn out to be, though they have very different lineages. Alice requires the types to be laid out explicitly, but once that's done, the HM type inference makes the languages look like siblings.

The biggest difference I've found in the languages thus far is that Oz is more firmly grounded in logic/constraint programming, with full support for unification. Alice has some unification built in, but it is not fully transitive - I've been told this was a conscious decision on their part. So far, this hasn't caused too much grief, as most of the examples are one-way unification or can be dissected into two one-way unifications. Will have to see how it plays out in the rest of the book.

Almost finished with chapter 3; only two more sections, on Pickling and Components. Alice supports these two features, so I just need to learn what's there. Fortunately, I've had help any time I've gotten stuck. The Alice & Oz folks are quite helpful, and could run circles around me if that were their quest, but they have bigger fish to fry. What with kids and working in a startup venture, the translation will take a while (and definitely cuts into my ability to post front-page stories on LtU).

Comprehending Queries

I am trying to develop a language...mainly to experiment with relational models and to learn language/compiler design.

I would recommend reading the Comprehending Queries paper - it discusses the design of a (functional) query language based on monad comprehensions, and might be exactly what you need.

Why compiling?

In my experience, developing a couple of languages and language extensions, languages like Scheme or Smalltalk are ideal for language prototyping. They let you focus on the intended semantics and base language constructs (and their interactions!) first, before you need to worry about syntax (since you do need to worry about that sooner or later). If you start with the syntax, you typically have to change it all the time anyway, which is just uninteresting overhead at a stage where you're not even sure what you want. In Smalltalk, for example (I'll use that as an example since others have already pointed you to Scheme), you just start with some classes and methods that implement your language version 0.1, and off you go. After a while you can add a parser to experiment with syntax. After the language has stabilized you can then decide what to do next: work on a compiler, extend the current version, etc.

Have fun :-)

I agree

That's what I usually recommend. Mostly Scheme, since that's what I am more familiar with. I also suggest building Haskell combinator libraries ("domain-specific embedded languages," as they are sometimes called).

Nine times out of ten, I find out that those asking for advice are really interested in syntax, and haven't given much thought to the semantics, by the way...

In my experience

I started by writing programs on paper that represented what I thought were good ways to express the solution to a problem. At this point they didn't follow any syntax rules, or even consistency. Then, when I had a fair number of samples, I tried to find the common elements that made them attractive to me and formalize them. However, perhaps you should take my advice with a grain of salt; I didn't end up with a popular language (centum).