Ralph Griswold died

Ralph Griswold died two weeks ago. He created several programming languages, most notably Snobol (in the 60s) and Icon (in the 70s) — both outstandingly innovative, integral, and efficacious in their areas. Despite the abundance of scripting and other languages today, Snobol and Icon are still unsurpassed in many respects, both as elegance of design and as practicality.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Snobol

I remember a professor in college saying the best text processing language is not Perl, but Snobol.

Short obit.

Short obit.

Icon was the first language that I just couldn't get

at least at the time. I was strictly a BASIC/Pascal/assembly programmer, and Icon was just so *weird*. Even today it's very different, but when I first tried to learn it...wow.

I thought just the reverse ....

From picking up some Arizona site docs to completely rewriting a troublesome Pascal program (used for data analysis and billing from dialup logs) was 3 days. Maintenance of the system became a breeze as well

I though it was just the bees knees at the time. I still use it for data analysis and data conversions.

SNOBOL4 was used on site to generate DNS (Distributed Network Suoervisor) configuration files for our corporate metwork in the days when X25 (and the OSI 7 layer model) was the big competition to SNA.

SNOBOL

In doing some searches for SNOBOL I discovered this paper which might be of interest to some readers. I was looking for a tutorial or advanced overview of the string processing features of the language (ideally with how they compare to regular expressions and the like). If anyone has some good links or info, I'd appreciate it. (Although I'm sure some more time with Google will lead me somewhere interesting.)

Pretty much all of the

Pretty much all of the information about Snobol available on the Internet can be found following these links (with a great deal of overlapping):
http://www.snobol4.org
http://www.snobol4.com
ftp://ftp.snobol4.org/snobol
There are reference manuals, tutorials, implementations, and historical documents there.

Thanks!


Regarding Snobol, the one

Regarding Snobol, the one thing I realy would like to know about it but havent found anything about, is how much its based on Comit.

Icon References

The University of Arizona maintains a repository of Icon materials, including digital copies of Icon books. For an overview of Icon's string scanning, check out chapter 3 of The Icon Programming Language.

Icon generators

I'm skimming the icon manual.

Does anyone know offhand how

if (0 | 1) = (0 | 1) then *blah*

will be evaluated? Will it pass, trying all combinations, or will it fail, trying (1 = 0) and then (0 = 1)?

Stop as soon as it suceeds

It will succeed immediately with 0 = 0.

To get anything else you would need to formulate it like

if (0 | 1) = (1 | 0) ...

This would then do the following tests

0 = 1 fail
0 = 0 succeed with 0 as the result

Another example would be

if (0 | 10) = ( 1 | 2 | 3 | 4 | 10) then

results would be

0 = 1 fail
0 = 2 fail
0 = 3 fail
0 = 4 fail
0 = 10 fail
10 = 1 fail
10 = 2 fail
10 = 3 fail
10 = 4 fail
10 = 10 succeed with result as 10

Icon allows for the following kinds of tests

if i ‹ j ‹ k ‹ m then ....

This means that i ‹ j and j ‹ k and k ‹ m with the result being the value held in m.

Fixed typo above

d'oh

The (0 | 1) = (1 | 0) case is what I meant, but for some reason failed to type.

But thanks for the explanation; the key bit I was wondering was whether evaluation of generators was in parallel or nested. Your second example illustrates that it's parallel.

It is actually...

It is actually right to left resumption, innermost to outermost with evaluation left to right with right hand side being the success value, if that makes any sense.

So 0 is tested against 1 then 0, which is LR evaluation and RL resumption, and then 1 is tested against 1 and success, no testing against 0

Ralph E. Griswold (1934-2006)

Ralph E. Griswold (1934-2006), Regents Professor Emeritus in the Department of Computer Science, University of Arizona, co-creator of SNOBOL and the principal driving force behind the Icon programming language, passed away October 4th 2006, due to complications from pancreatic cancer.

Griswold's life work was in the area of non-numerical computing.[*] Griswold was the primary designer of series of string-manipulation languages (Icon was preceded by SL5 which was preceded by SNOBOL4). For many years strings were perhaps the most important and the most widely misunderstood data type in programming languages (perhaps now being displaced by XML trees). Griswold, though a university researcher for many years, should be considered firstly as a programming language innovator and inventor, and not an academic researcher in the usual sense of the word. Griswold, like the late Kenneth Iverson, invented and championed programming language constructs, and his contributions were among those that led the way to programming as currently understood. He was truly one of the founding figures of the field of programming language design.

SNOBOL was developed in the Programming Research Studies Department at Bell Telephone Laboratories (BTL) in 1962/3 by David J. Farber, Ivan P. Polonsky and Ralph Griswold.[1] SNOBOL was released on October 1963,almost exactly 43 years ago. In retrospect it is amazing to note that SNOBOL was originally implemented in just three weeks. SNOBOL officially stands for "StriNg Oriented symBOlic Language", but this rather atrocious "pseudo-acronym" was not the first suggested name. Indeed, one early choice for the language name was SEXI ("String EXpression Interpreter") but this name was deemed problematic. String processing in SNOBOL is done by creating declarative patterns and matching strings against them. SNOBOL4 patterns can be quite elaborate, can be combined to create new patterns, and are quite expressive. The SNOBOL4 book (Grisowld, Poage and Polonsky, 1968) contains many eye-opening examples of how pattern matching can be used. SNOBOL relies on pattern matching to such an extent that even a simple operation such as taking a substring must be done via pattern matching. SNOBOL suffers from lack of control structures, and control is handled via a GOTO like mechanism, which, interestingly, allows each program statement to specify two branch locations: one used on success (i.e., the string matches the pattern) and one on failure. While the lack of advanced control structures and abstraction mechanisms were the primary reason Griswold went on to develop new languages, he remarkably suggested that the test-and-goto structure of SNOBOL may be the reason why SNOBOL was successful among novice programmers, and was responsible for its initial acceptance and practical usefulness.

Icon, developed in the late 1970's, while alreday at Arizona, provides more conventional control structures, and allows programmers to easily create new pattern matching operations (via the use of generators and goal directed evaluation), thus blurring the SNOBOL dichotomy between conventional computation and pattern matching.[2] Icon is based on many ideas from SL5, specifically the notion of scanning environments and the "s ? e" scanning construct.[3] SL5 had a interesting generalized procedure mechanism and made use of a unique concept of "argument transmitters" which are procedures (either built-in or programmer defined) that are called when values are bound to arguments. This approach was abandoned, but may still be of some interest.

It is interesting that Icon does not provide a rich set of primitive pattern matching procedures, and does not support declarative pattern matching of the kind found in SNOBOL, or even in regular expression based libraries and languages. This decision seems quite radical, and indeed it may be the reason that languages such as Perl and AWK ( a language inspired heavily by SNOBOL, according to Griswold[4]) were preferred by programmers. This can be explained by appeal to the "worse is better" principle, which often refers to the Bell Labs software building approach. Both SNOBOL and AWK were Bell Labs creations, while Icon was not.

Griswold papers and books are highly readable. Those interested in string scanning should read the discussion in the SNOBOL4 book. The Icon Programming Language book is recommended reading for those interested in the notion of backtracking, with perhaps the best treatment of the subject for practically minded programmers.

On top of all the above, Grisworld's languages were among the first to introduce basic data types implementing associative arrays ("tables" or "hashes"), another element we take for granted these days.

As others have noted, computer science has always been a discipline where the founders were still around. This is changing. Ralph Griswold will be missed by a large community. His family and loved ones may find comfort in the fact he managed to influence his field of study and that his innovations and contributions are still very much with us.

[*] Griswold's grant applications were not for building programming languages, but rather for basic research. This is a lesson that researchers in the field should keep in mind when applying for grant money.

[1] Ralph E. Griswold, A History of the SNOBOL Programming Languages, SIGPLAN Notices 13(8), 1978.

[2] Ralph E. Griswold and Madge T. Griswold, History of the Icon Programming Language, SIGPLAN Notices 28(3), 1993.

[3] Ralph E. Griswold and David R. Hanson, An Overview of SL5, SIGPLAN Notices, April 1977.

[4] Interview with Ralph E. Griswold and Madge T. Griswold

SL5 and monads

First, that's a wonderful obituary.

I hope a technical followup won't be inappropriate:

SL5 had a interesting generalized procedure mechanism and made use of a unique concept of "argument transmitters" which are procedures (either built-in or programmer defined) that are called when values are bound to arguments. This approach was abandoned, but may still be of some interest.

This made me think of monads. Inside a "do" block, the "bind" procedure is called whenever a monadic value is bound to a variable. The ability to abstract and hide the operations that take place during this binding is central to the functionality of monads. So rather than having been abandoned completely, the SL5 approach could perhaps be seen as an early precursor to monads.

nitpick?

The problem I have with that is that 'bind' is entered first (well, in Haskell at least) before bindings(*) are made.

(*) Since 'bind' is also defined for 'a' function value, could we say that it is even 'aware' of the pseudo-bindings established in the desugared lambda exp that it receives as an arg? In the quoted paragraph it appears that the argument transmitters are intimately familiar with the environment.

Argument Transmitters

Actually, I wasn't reminded of monads, but maybe it's due to lack of imagination on my part. Let me give the example given in "An Overview of SL5" for AT, to make things more concrete.

positiveint:=procedure (n)
   if (n:= integer(n)) >0
     then succeed n
     else fail
end

gcd:=procedure (x:positiveint, y:positiveint)

positiveint is the user-defined procedure used as an argument transmitter. When used in the definition of gcd, only arguments that can be converted to integers (via integer(n)) will be passed to gcd. Other values will cause the call to fail.

Interestingly, the language provides three built-in transmitters: val (pass by value), ref (pass by reference) and exp (pass "by expression", similar to Algol's pass by name). Transmitters can be procedures (user defined,or built-in) or environments, which are "source-language data objects" in SL5 (i.e., they are first class).

I remind you of Wirth's opinion about pass by exression. Here's a connection between these two historical discussions. Also note that AT can be used to mimic some primitive form of AOP.

Argument transmitters and monadic 'bind'

Let me give the example given in "An Overview of SL5" for AT, to make things more concrete.

positiveint:=procedure (n)
   if (n:= integer(n)) >0
     then succeed n
     else fail
end

gcd:=procedure (x:positiveint, y:positiveint)

positiveint is the user-defined procedure used as an argument transmitter. When used in the definition of gcd, only arguments that can be converted to integers (via integer(n)) will be passed to gcd. Other values will cause the call to fail.

This illustrates the comparison to monads nicely. The implementation above apparently depends on the success/failure mechanism, so the argument transmitter is acting like the bind procedure of the Maybe monad, which doesn't apply its continuation on failure.

I'll demonstrate with Scheme, since macros will come in handy to counter Koray's nitpick. The success/failure behavior can be implemented starting with a standard Maybe monad, which can be defined minimally in Scheme as:

; minimal maybe monad
(define maybe (lambda (x) x))           ; 'return' for this Maybe monad
(define fail #f)                        ; the equivalent of 'Nothing'
(define (maybe-bind m k) (and m (k m))) ; 'bind' for this Maybe monad

This monad can be transformed to implement the positive integer test, by defining another monadic bind procedure in terms of maybe-bind:

(define (positive-int m succeed)
  (maybe-bind m
    (lambda (n)
      (if (and (integer? n) (> n 0))
          (succeed n)
          fail))))

The variables are named to match the SL5 argument transmitter, but note that this is a standard 'bind' procedure which takes a monad m and a continuation succeed. The similarity to the SL5 version is strong, with the difference being the need to use maybe-bind to access the monadic value, and the inclusion of an explicit continuation argument.

To implement the GCD procedure in a way that exploits the Maybe monad, we can adapt the mplus procedure from Haskell's MonadPlus type class, which has a very simple implementation for our Maybe monad — it's just an or:

(define (maybe-mplus x y) (or x y))   ; mplus for Maybe monad

Now gcd can be implemented as follows:

(define (mygcd x y)
  (positive-int x (lambda (x)
  (positive-int y (lambda (y)
    (maybe-mplus (mygcd (maybe y) (maybe (modulo x y)))
                 (maybe y)))))))

The comparison to zero is hidden inside the positive-int bind operation, and maybe-mplus is used to express the choice between branches, so the behavior of 'bind' is integral to the gcd algorithm, just as it is with the SL5 argument transmitter.

The above code is complete and fully operational, e.g. (mygcd 18 30) produces 6.

To address Koray's nitpick, mygcd can be sweetened with a macro: a lambda/m form which accepts a bind procedure for each of its arguments. I'll also cheat slightly in the interestes of readability, and omit the maybe calls, which aren't needed with our direct implementation of the maybe monad:

(define mygcd
  (lambda/m ((x positive-int) (y positive-int))
    (maybe-mplus (mygcd y (modulo x y)) y)))

This looks quite similar to the (incomplete) definition from SL5:

gcd:=procedure (x:positiveint, y:positiveint)

It would be interesting to see how the bodies compare.

In both cases, a named procedure is called for each argument, and that procedure tests the argument for a positive integer, either invoking a success continuation or abandoning the current continuation with a failure return. In my example, the procedure in question is a typical monadic 'bind' procedure, which corresponds closely to the SL5 argument transmitter in purpose and structure.

For completeness, here are the macro definitions:

(define-syntax monad-let*
  (syntax-rules ()
    ((_ () body ...)
     (begin body ...))
    ((_ ((arg0 bind0) (arg1 bind1) ...) body ...)
     (bind0 arg0
       (lambda (arg0)
         (monad-let* ((arg1 bind1) ...) body ...))))))

(define-syntax lambda/m
  (syntax-rules ()
    ((_ ((arg0 bind0) (arg1 bind1) ...) body ...)
     (lambda (arg0 arg1 ...)
       (monad-let* ((arg0 bind0) (arg1 bind1) ...)
         body ...)))))

Of course, Griswold's focus on nondeterministic evaluation approaches made it inevitable that his work would overlap with monads. It's interesting to see a specific example of that overlap, more than a decade before the initial work on monads.

BTW, an interesting paper in this area is Backtracking, Interleaving, and Terminating Monad Transformers.

Edit: Based on my limited knowledge of Icon, and without having seen the body of the SL5 gcd example when I wrote the above, I jumped to the conclusion that it was a recursive implementation, and that it would naturally exploit its own argument transmitter to achieve termination, so I did the same in my example implementation. Turns out I was being over-imaginative. I found a 1993 issue of The Icon Analyst, an allegedly bimonthly newsletter edited by Ralph and Madge Griswold. It contains an article "Lost Languages - SL5", which gives an imperative, non-recursive GCD example, so the argument transmitter is only called once on entry. This doesn't affect the comparison between argument transmitters and monadic bind, but it does mean that my gcd example above exploits the positive-int monad in a way that wasn't being done in the SL5 gcd example. However, it still seems to me that SL5 might have been capable of such an implementation.

The pros and cons of kitchen-sinking

It is interesting that Icon does not provide a rich set of primitive pattern matching procedures, and does not support declarative pattern matching of the kind found in SNOBOL, or even in regular expression based libraries and languages. This decision seems quite radical, and indeed it may be the reason that languages such as Perl and AWK ( a language inspired heavily by SNOBOL, according to Griswold[4]) were preferred by programmers. This can be explained by appeal to the "worse is better" principle, which often refers to the Bell Labs software building approach. Both SNOBOL and AWK were Bell Labs creations, while Icon was not.

I haven't read the whole of reference [4] yet, but there's a comment about this in the issue of The Icon Analyst that I found:

In retrospect, SL5 has many interesting features. [...] Not surprisingly, on rediscovering some of the features of SL5, we thought "Gee, those would be neat things to have in Icon." But Icon has grown large. If there's anything we think language designers should do is avoid the temptation to "kitchen sink" (to coin an abominable verb from a noun).

In this context, though, I think "worse is better" is a misleading description of the forces at work. It's more like "more is better": kitchen-sinking succeeds because it appeals to a larger market, particularly in languages, where most people don't want to have to learn and use multiple languages. They want the language they use to be as widely-applicable as possible, unless they have specialized requirements.

I think that in this

I think that in this specific context the issue is expressiveness. Being able to express patterns declaratively is important from an expressiveness point of view. While this approach has its difficulties (namely the things that led to the creation of SL5 and later Icon), it has distinct advantages (being declarative and thus easier to read, easy compositionality etc.) that were abandoned for no good reason.

(BTW, as far as I can tell, neither SL5 nor Icon have declarative pattern matching, as found in SNOBOL, so the issue isn't SL5 vs. Icon, but rather SNOBOL4 vs. (SL5 & Icon))

Work on providing SNOBOL4 pattern features to Icon/Unicon

There are libraries in Icon/Unicon that attempt to provide the pattern matching features of SNOBOL4.

The interesting things about Icon/Unicon is that you have the ability to write your own generators. In regards to string matching/processing functions, as long as you follow the string matching protocol, your functions are treated no differently to the system procedures. Hence you can extend the matching procedures any way you want to.

In addition, there is library functions available that do a similar job on lists as for strings. The only difference really is that you can't use ? as the pattern matching operator.

Just as an aside, in the IPL are the applications, SKEEM (a version od Scheme) and IDOL (Icon Object Language - this incorporated directly into Unicon). SKEEM is an interpreter and IDOL is a compiler to Icon. These show how to do fairly significant applications in Icon. The Programming in Unicon book (obtainable from the Unicon website) goes into details of application building in Unicon - for those who are interested.

LtU reference

I posted a link a while back on SNOBOL-Style Patterns for Unicon which goes into more detail on the subject.

Random observations

While writing my obit, several things occured to me. Maybe we can discuss some of them.

1. One amazing things about SNOBOL4 and Icon, as far as I can tell, is the little attention given to internationalization. This is understandable, historically, but for languages used for strings processing it surely lowers their relevance. Without mature unicode support, implementations of SNOBOL4 and Icon will have very little to offer.

2. The history of the implemention of SNOBOL is an interesting story in its own right. While Griswold gives the bare details in "A History of the SNOBOL Programming Languages", it would be nice to find a detailed description and discussion, covering SPITBOL ("SPeedy ImplemenTation of SNOBOL; Dewar 1971), MACRO SPITBOL, the influence of the FORTRAN I/O routines etc. As an amusing side story, check GNAT.Spitbol, a set of Ada packages written many years later by Dewar, derived from SPITBOL.

3. It occured to me that one can make the argument that just as strings were so very important, yet received little attention from mainstream langugae designers, so are XML trees these days. Carrying this argument to its logical end, one might argue that Erik is Griswold's true heir.

internationalization and Icon/Unicon

This is a subject that is discussed on occassion on the unicon-group. It requires changes to the runtime system that requires someone to be involved with.

Currently, some suggestions for analysing unicode involve treating the stream as b-bit values.

Anyone care to be involved in internationalization efforts would best need to contact Clint Jeffery or see his homepage or the Unicon website.

Side effects and string analysis

This quote from Griswold might be worth discussing:

The fact that decomposition of strings through pattern matching operates by the side effects has always been a problem. By its nature, this mechanism is error prone and unstructured. On the other hand, string analysis, by analogy with grammatical characterizations of languages, leads naturally to deriving multiple values from a single analysis. A well-structured, uncomplicated solution to this problem remains elusive.

(A History of the SNOBOL Programming Languages, SIGPLAN Notices 13(8), August 1978. Sec. 3.3.3)

Regular expresions vs. Lisp/Prolog

I think this relates to one of my own issues. People usually think of string processing in terms of regular expressions. If we think in terms of XML we actually have a parenthesized list expression if we make some sort of conversion. Processing a list in Lisp is normally recursive and functional. Prolog offers even better pattern matching if only we start out with properly formatted data (ie predicates), and Prolog certainly doesn't use state. Why don't more programmers choose these alternative?? What am I missing?

Puzzled!

Prolog for parsing

Having used prolog for parsing natural language text (in a very limited domain), I can say that it is both wonderful and terrible. DCGs are far more expressive, easier to use, and much much easier to read then regexps. The implementations that I generated however often required lots of tuning to make them efficient enough to use. From the hand optimizations that I made I came to the conclusion that most of this can be eliminated if we can know that we are dealing with strings (i.e. some notion of typing) some determinacy analysis, tabling (I wasn't using XSB) and some reordering of terms.

Prolog really is not enough out of the box to compete on the scale of simple expressions like the ones produced by regexps, and DCGs as given do not scale to serious problems easily. It would be nice to see these problems solved in the prolog style, since I think it really is infinitely better than using regexps.