REPL-schmepl?

Slide 19 of Andy Wingo’s The User in the Loop:

RDD: REPL-driven development
guile> (import (my-app))

Put the user in the [read-eval-print] loop

REPL provides discoverability and hackability

Compare this to page 1 of I Hate Forth by Jack Ganssle (Jul 2001):

Forth is a very satisfying environment for a programmer. It's totally interactive. The interpreter — which is generally quite small — lives in your target system. It's a bit like working with old-time Basic - change something and immediately test the new code. There's no compile, link or download. Adherents crow about how productive they are working with a tool like this that imposes no delays on their work.

Sure, it's fast. And fun. But let's get real: interactive development has no impact on requirements analysis, specification, software design, documentation, or even test. Fast tools make for fast debugging, nothing else.

How much time do we spend debugging: 80% of the project? I sure hope not. A well-designed, carefully coded system should consume no more than 20% of the schedule in debug. Healthy software organizations need less.

… how much time do we really save? … Minutes [per day], max.

We're not interactive because it's hard, not because it's bad.

It is very hard to design a language with a good story for interactive programming. I think that is why most languages are bad at interactive development. Most of the criticism of this aspect that I read is of the form "my preferred language isn't good at it, so I feel the need to downplay its usefulness". The Forth citation is a perfect example of this.

It is possibly true that interactive programming is not useful for all workflows, or even for the dominant -- is it? -- use of programming languages, medium-to-large-scale project development. It is certainly essential in some uses of programming languages, such as live computer-assisted musical composition or other programming-related artistic practices. I also suspect it is useful in other fields such as scientific experimentation or system administration.

You can say that interactive development is not part of *your* needs regarding programming languages. It would be stupid, however, to claim that it is useless for *everyone*.

Not really

[[ "my preferred language isn't good at it so I feel the need to downplay its usefulness". The Forth citation is a perfect example of this. ]]

No: the author first says that the REPL works fine, but then says that it's not very useful, so this *isn't* an example of that. Your second point is interesting, though.

50 lines per function?

A Forth function that's more than two lines long is in serious need of refactoring. And while it's true that Forth code is typically comment-free, so is Scheme code (something I really hate about the Scheme community).

on the value of planning (and documentation)

Re: comment 68408: “And while it's true that Forth code is typically comment-free…”

To put the quoted comment in the context of Ganssle’s original rant, let me quote one more graf from I Hate Forth:

   “Forth's interactive nature tends to obliterate documentation. Where do those comments go when (if) you enter them? Down to the target, of course, into the limited memory typical of resource-poor embedded systems. That's the nature of interpreters. Worse, even the best people tend to get sucked into a change/test frenzy using any interpreter (Forth, Basic, you name it). Documentation slows us down when we really just want to try something, so is inevitably neglected.”

Near as I can tell, Ganssle is talking about the practice of developing an application directly on the target platform, a capability that, according to Rather and Colburn (The Evolution of Forth, ACM SIGPLAN HOPL II), was unique to Forth:

[Chuck Moore’s] system was also unique for that time in that all software development took place on the minis themselves, using magnetic tape for source.

To place these software capabilities in context, it's important to realize that manufacturer-supplied system software for these early minicomputers was extremely primitive. The main tools were cross assemblers and FORTRAN cross compilers running on mainframes (although the FORTRAN cross compilers were too inefficient to do anything complex, given the tiny memories on the target machines). On-line programming support was limited to assemblers loaded from paper tape, with source maintained on paper tape.

Is it true that, as Ganssle put it, “Forth's interactive nature tends to obliterate documentation”? Or planning in general?

Leo Brodie had this to say on “The Value of Planning” in Thinking Forth:

In the nine phases at the start of this chapter we listed five steps before “implementation.” Yet in Chapter One we saw that an overindulgence in planning is both difficult and pointless.

Clearly you can’t undertake a significant software project — regardless of the language — without some degree of planning. Exactly what degree is appropriate?

More than one Forth programmer has expressed high regard for Dave Johnson’s meticulous approach to planning. Johnson is supervisor at Moore Products Co. in Springhouse, Pennsylvania. The firm specializes in industrial instrumentation and process control applications. Dave has been using Forth since 1978.

He describes his approach:

Compared with many others that use Forth, I suppose we take a more formal approach. I learned this the hard way, though. My lack of discipline in the early years has come back to haunt me. We use two tools to come up with new products: a functional specification and a design specification. Our department of Sales & Applications comes up with the functional specification, through customer contact.

Once we’ve agreed on what we’re going to do, the functional specification is turned over to our department. At that point we work through a design, and come up with the design specification. Up to this point our approach is no different from programming in any language. But with Forth, we go about designing somewhat differently. With Forth you don’t have to work 95% through your design before you can start coding, but rather 60% before you can get into the iterative process.

A typical project would be to add a functional enhancement to one of our products. For example, we have an intelligent terminal with disk drives, and we need certain protocols for communicating with another device. The project to design the protocols, come up with displays, provide the operator interfaces, etc. may take several months. The functional specification takes a month; the design specification takes a month; coding takes three months; integration and testing take another month.

This is the typical cycle. One project took almost two years, but six or seven months is reasonable.

When we started with Forth five years ago, it wasn’t like that. When I received a functional specification, I just started coding. I used a cross between top-down and bottom-up, generally defining a structure, and as I needed it, some of the lower level, and then returning with more structure.

The reason for that approach was the tremendous pressure to show something to management. We wound up never writing down what we were doing. Three years later we would go back and try to modify the code, without any documentation. Forth became a disadvantage because it allowed us to go in too early. It was fun to make the lights flash and disk drives hum. But we didn’t go through the nitty-gritty design work. As I said, our “free spirits” have come back to haunt us.

Now for the new programmers, we have an established requirement: a thorough design spec that defines in detail all the high-level Forth words—the tasks that your project is going to do. No more reading a few pages of the functional specification, answering that, reading a few more, answering that, etc.

No living programmer likes to document. By ensuring the design ahead of time, we’re able to look back several years later and remember what we did.

I should mention that during the design phase there is some amount of coding done to test out certain ideas. But this code may not be part of the finished product. The idea is to map out your design.

OTOH...

... a significant amount of time is spent on maintenance in the long run; a REPL may be useful in that context too.

Less than 20%? Really?

First, Mr. Ganssle sets up a straw man:

Fast tools make for fast debugging, nothing else.

and then makes this outlandish claim:

How much time do we spend debugging: 80% of the project? I sure hope not. A well-designed, carefully coded system should consume no more than 20% of the schedule in debug. Healthy software organizations need less.

What kind of fairy-tale land does this dude live in? Where is this wonderful place where designs can actually be constant, thinking carefully reveals every corner case, and coding carefully means the semantics of a language or library never surprises you?

kinds of programming

I do like the REPL, though I'm not sure if it's for rational reasons. But note, that particular slide was about the REPL in the context of extending an already-existing application: to help the user-programmer take an idea of how a program could be different, and help them realize that idea.

There are other paths, like a good IDE. But a REPL is a cheap and effective way of letting a user learn about the important objects of a program, and to manipulate them in a trial-and-error fashion.

I do like the REPL, though

I do like the REPL, though I'm not sure if it's for rational reasons.

Of course there is a rational reason; you just can't state it in the terms you are used to dealing with.

But a REPL is a cheap and effective way of letting a user learn about the important objects of a program, and to manipulate them in a trial-and-error fashion.

This is the bricolage style of programming that Sherry Turkle identified, well suited to bottom-up programming.

evidence as rational reason

I agree with gasche: a REPL is just hard unless we plan way ahead, or unless a language makes it easy (because it knows how to print almost everything, so display is immediate and clear). I think the biggest value comes from the P for print. Languages with direct print support have a huge advantage in the evidence department.

After the first time I put my code in a REPL over twenty years ago, I spent a while pondering why interactive testing was so effective. I decided I was getting a very high grade of evidence flow, which I used to vet most assumptions I had made about running code. And I was able to get this evidence on demand, almost as soon as I formed a question I wanted to answer. A REPL is quite a boon in bottom-up coding.

At the time I asked questions like, "Is all the state exactly what I expected? With sub-expression values I expected? Even if I try these edge cases? What about the empty set here?" (Today I also ask a lot of questions about latency; printing after the fact doesn't help there, unless latency info was captured somewhere.)

Sometimes I write code to print all my state in C and C++, and it's a pain. It's very expensive, and yet often much cheaper than debugging without it. (No, gdb doesn't show everything I want to see in a manner that is easy to reason about.)

I know many devs who say, "You only need to check the final result to see if everything in the middle was correct." But that's weak. Say your code has to navigate a twisty path from A to B, and that the only easy way to tell whether you reached B is to check whether C is true. So you assume "if C, then B"; but that doesn't follow from what you actually know, which is "if B then C". Maybe D also causes C, and you went nowhere near B. (Or worse yet, maybe you calculated B a thousand times over, and since it was idempotent you don't see the cycles you wasted. And maybe you trashed E, F, and G while you were at it.)

What's my point? A REPL can be a quick cure for confirmation bias.
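
To make that A-to-B worry concrete, here is a toy sketch in Python (every name in it is hypothetical, invented for illustration): the final check on C passes even though the code never reached B, and only a REPL-style look at the intermediate state gives the game away.

def navigate(start):
    # Supposed to take us from A to B; B is the state we actually care about.
    state = {"visited": [start], "flag_C": False}
    state["visited"].append("D")   # bug: a shortcut through D instead of B ...
    state["flag_C"] = True         # ... and D raises flag C, just as B would
    return state

result = navigate("A")

# The "check only the final result" test passes: C is true, so we conclude B.
assert result["flag_C"]

# Poking at intermediate state in a REPL exposes the problem immediately:
#   >>> result["visited"]
#   ['A', 'D']        # we never went anywhere near B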

latency

Today I also ask a lot of questions about latency; printing after the fact doesn't help there, unless latency info was captured somewhere

Well, if your interpreter shows you the start/stop time and the time of each print request, that would give you some latency information.

of execution

Latency to execute, not latency to print. (Latency to execute printing is a special case which now rarely puzzles, since it's either cheap byte-wise writes to a buffer or very expensive i/o calls; it's only interesting if you manage to use an async buffered i/o api with async flush, where flush blocks your writer for async completion.)

It would also be nice if a language had (possibly optional) latency tracking for sync and async calls, independent of profiling. In practice, some latencies must be tracked at runtime in deployed production systems, since they're needed for diagnostics. It's especially important in async code, where patterns can be hard to see.

Generally you want a histogram of actual latencies observed, which implies a lot of space for statistics relative to other call-frame overhead. So you could not afford to have it on by default for all calls globally.
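
For what it's worth, here is a minimal sketch in Python of that kind of per-call latency tracking; it is only an illustration, with hypothetical names, under the assumption that coarse power-of-ten buckets are enough. It is nobody's production diagnostics code.

import time
from collections import Counter
from functools import wraps

LATENCY_HISTOGRAMS = {}        # function name -> Counter mapping bucket -> count

def bucket(seconds):
    # Coarse power-of-ten buckets keep the per-call cost and space tiny.
    if seconds < 1e-3:
        return "<1ms"
    if seconds < 1e-2:
        return "<10ms"
    if seconds < 1e-1:
        return "<100ms"
    return ">=100ms"

def track_latency(fn):
    hist = LATENCY_HISTOGRAMS.setdefault(fn.__name__, Counter())

    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            hist[bucket(time.perf_counter() - start)] += 1
    return wrapper

@track_latency
def slow_step(n):
    time.sleep(0.01 * n)       # stand-in for real work

for i in range(3):
    slow_step(i)

print(LATENCY_HISTOGRAMS["slow_step"])   # e.g. Counter({'<100ms': 2, '<1ms': 1})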

Breaking down Sean's remarks

REPLs are rational!

Interactive programs (REPLs) are "self-adjusting computation" (Acar et al.).

What sort of benefits do we derive?

1. Persistent data structures -- in general, very valuable
2. Ability to ask Provenance questions -- the essence of many debugging questions
3. Ability to ask What If questions -- the essence of many exploratory programming activities

Unfortunately, there are some computational limits on what we can ask :(

The relationship between

The relationship between REPLs and self-adjusting computation (or kinetic, incremental, and other more traditional terms for it) is tenuous, at least for the interesting parts. E.g., Sean's comment. You might want to pick a different citation, or say more, if you were trying to clarify his comment.

If you really meant that optimizing REPL interactions with incremental computations is a big deal, you might be interested in Philip Guo's recent work about incrementalizing (well, memoizing) Python for big machine learning / data processing scripts.
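
To illustrate the idea (a crude sketch only, not the system that work describes, and every name below is made up): cache each expensive step's result on disk, keyed by a hash of its inputs, so that re-running the script skips work that was already done.

import hashlib
import os
import pickle

CACHE_DIR = ".step_cache"      # hypothetical scratch directory

def cached_step(fn):
    os.makedirs(CACHE_DIR, exist_ok=True)
    def wrapper(*args):
        key = hashlib.sha1(pickle.dumps((fn.__name__, args))).hexdigest()
        path = os.path.join(CACHE_DIR, key)
        if os.path.exists(path):
            with open(path, "rb") as f:    # hit: reuse the previous run's result
                return pickle.load(f)
        result = fn(*args)
        with open(path, "wb") as f:        # miss: compute once, store for next run
            pickle.dump(result, f)
        return result
    return wrapper

@cached_step
def expensive_preprocessing(n):
    return sum(i * i for i in range(n))    # stand-in for the slow part of a script

print(expensive_preprocessing(10_000_000)) # slow on the first run, instant after

The hard part, which this sketch ignores entirely, is invalidation: a real system also has to notice when the code or its inputs change.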

Self-adjusting REPL

Reinteract is a Python REPL that re-executes code when you change previous commands (there is a screencast). It doesn't do any memoization though (AFAIK), and sometimes the algorithm for determining whether things are stale doesn't work.
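
The brute-force version of that idea fits in a few lines of Python (a deliberately naive sketch: no staleness analysis and no memoization, just re-run everything whenever an earlier entry is edited):

worksheet = [                  # the "previous commands", editable in place
    "x = 2",
    "y = x * 10",
    "print(y)",
]

def run(worksheet):
    env = {}
    for stmt in worksheet:     # re-executing from the top keeps results consistent
        exec(stmt, env)
    return env

run(worksheet)                 # prints 20
worksheet[0] = "x = 5"         # change a previous command ...
run(worksheet)                 # ... prints 50: later results reflect the edit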

reflection

In any sufficiently reflective system a REPL is just (bells and whistles aside) a trivial utility function.

Also, good debuggers tend either to be approximations of REPLs, or special modes into which you can toss a REPL.

To argue against the value of a REPL, therefore, you must make at least a case against read, eval, print, looping, or debuggers.
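
As a rough illustration of "trivial utility function" (a bare-bones Python sketch, minus the bells and whistles such as multi-line input, completion, and result history):

import traceback

def repl(env=None):
    """A bare-bones read-eval-print loop built on the host's own eval/exec."""
    env = env if env is not None else {}
    while True:                               # the L
        try:
            line = input(">>> ")              # the R: read one line of source
        except EOFError:
            break
        try:
            try:
                value = eval(line, env)       # the E, for expressions ...
                if value is not None:
                    print(repr(value))        # ... and the P
            except SyntaxError:
                exec(line, env)               # statements have no value to print
        except Exception:
            traceback.print_exc()             # report the error and keep looping

if __name__ == "__main__":
    repl()

The hard part, as the rest of this thread argues, is not this loop; it is making eval agree with the rest of the language.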

REPL as an afterthought

Re: comment 68462:

In any sufficiently reflective system a REPL is just (bells and whistles aside) a trivial utility function.

True in theory, manifestly untrue in current practice. To make the quoted claim look more debatable than perhaps it is, let me rephrase it ever so slightly: “In any sufficiently reflective system, a REPL can be tossed in as an afterthought and it will Just Work™.”

Let's review the record:

Ruby

Ruby decides at compile-time whether something is a variable name or a method call. The way it does this is by keeping track of which symbols have already been used as variable names (by appearing on the left-hand side of an assignment, for example).

When a script is run, it is compiled first, then executed. However, when you type code into IRB (Ruby's standard REPL), your code is interpreted line by line. This may lead to situations where a symbol is interpreted as a variable when run in IRB, and as a method when run as a script.

Scheme

Says Marc Feeley: “It is sad that as a community we can't even agree on such a fundamental thing.”

Forth

You can't use IF … ELSE … THEN interactively. It can only be used in a definition. (In Forth lingo, IF has no interpretation semantics.)

Smalltalk

If you define class variables (as opposed to instance variables) and decide they should be initialized eagerly rather than lazily, you have to keep in mind that “class-side initialize methods are executed automatically when code is loaded into memory, they are not executed automatically when they are first typed into the browser and compiled, or when they are edited and re-compiled.” You must remember to type Foo initialize in a workspace and Do it.

Common Lisp

Don't even get me started.

Kernel

Kernel has a nice story for interactive use (fexprs solve the issue of macro redefinition; first-class environments give a nice semantics for the toplevel).

(John Shutt doesn't like the term reflective, though.)

REPL triviality

In all of the languages that you mention a REPL is a pretty trivial piece of code -- it's the evaluator that is the main source of the inconsistencies and inconveniences.

evaluating evaluators

Re: comment 68467:

In all of the [language implementations] that you mention, a REPL is a pretty trivial piece of code — it's the evaluator that is the main source of the inconsistencies and inconveniences.

As has been pointed out before, some of those languages would be well-advised to trade in their evaluators for calculators. But let's not digress. Let us start from where we are. Which, I believe, is right about here:

The revised claim:
In any language runtime that sports an evaluator, a REPL is just a trivial utility function.

Vacuously true.

gasche’s earlier point still stands: “It is very hard to design a language with a good story for interactive programming.” Just as threads cannot be implemented as a library, so too interactive interpretation semantics cannot be bolted on to the compiled-execution semantics. You can kinda sorta make it work but none too satisfactorily.

a bad eval is a defect

Sure: it is hard to make a good eval.

If you want to state my point a little more strongly than I did at first, it would be something like: defects in eval are defects in language design. The rationale is that, on the one hand, a REPL is trivial if you have the four basic elements of the acronym -- and on the other hand, if you lack them, the people who write your debugger will strive to close that gap as best they can. It's similar to a Greenspun-style argument about sufficiently complicated C programs, or what have you…

eval gone bad

Re: comment 68473:

… defects in eval are defects in language design

Just to make sure I understand, would the following comment be considered a good illustration of the above point?

… Forth, unlike Scheme, has not inherited an everything-is-batch-compiled-always fetish and, consequently, has never abandoned the idea of presenting the core “VM” of the operational model in a fully reflective way.

re: eval gone bad

Yes, in the sense that I think Forth is more advanced than (standard) Scheme at "eval", and part of the evidence for that is the "gap" in Scheme's model between how macros work and how macro-less Scheme works.

JIT

interactive interpretation semantics cannot be bolted on to the compiled-execution semantics

It should be easy the other way around: have the JIT-compiled execution semantics follow the interactive interpretation semantics.

In formal terms, you want a

In formal terms, you want a bisimulation between the operational semantics of your source language and that of the compiled language: the forward simulation (source -> target) tells you that whatever your source program can do, the compiled program can also do; the backward simulation tells you that the compiled program won't do anything you can't express in terms of an execution of the source program.
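
Spelled out (a standard formulation, nothing specific to any particular compiler: \to_S and \to_T are the small-step relations of the source and target semantics, and R relates source states to target states):

\[
\begin{array}{ll}
\text{forward:}  & s \mathrel{R} t \;\wedge\; s \to_S s' \;\Longrightarrow\; \exists t'.\; t \to_T^{*} t' \;\wedge\; s' \mathrel{R} t' \\
\text{backward:} & s \mathrel{R} t \;\wedge\; t \to_T t' \;\Longrightarrow\; \exists s'.\; s \to_S^{*} s' \;\wedge\; s' \mathrel{R} t'
\end{array}
\]

R is a bisimulation when both directions hold; compiled and interactive execution are then indistinguishable up to R, which is exactly the guarantee the REPL user wants.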

Aids development at expense of reliability

My main experience with REPL development has been in Prolog, but it is also fun in Haskell, Python or even in C. The only difference is how much leg work you need to do to fake it in a language that doesn't look like it supports it directly. That leg work is normally in the form of manual serialisation of state so that you can squeeze a compile into the cycle.
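
A sketch of that leg work, written in Python here purely for brevity (the commenter's context is Prolog and C, and every name below is hypothetical): persist the interesting state at the end of a run and reload it at the start of the next, so the edit-compile-run cycle at least resumes where the last one left off.

import os
import pickle

STATE_FILE = "session_state.pickle"       # hypothetical scratch file

def load_state():
    # Pick up whatever the previous run left behind, if anything.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    return {"runs": 0, "results": []}

def save_state(state):
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)

state = load_state()
state["runs"] += 1
state["results"].append(state["runs"] ** 2)   # stand-in for the real work
print(state)                                  # inspect, tweak the code, run again
save_state(state)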

In terms of hackability and discoverability, it does rather limit your audience of users to programmers. Even among that audience, the power and flexibility to make drastic changes make it easier to break a system than to unbreak or extend it.

The part that I would disagree with:

Sure, it's fast. And fun. But let's get real: interactive development has no impact on requirements analysis, specification, software design, documentation, or even test. Fast tools make for fast debugging, nothing else.

Working with a REPL is a matter of playing what-if. This is a powerful tool for debugging, but it also has a real impact on specification and design. The difficulty with either of these activities, without access to a semi-functional prototype, is that they must be performed cold. In both cases the planning aspects are handled more productively if those activities can be done interactively. Playing what-if with an interpreter allows that level of interactivity.

Experience suggests that there is really a pair of nested loops that get used in this way. Something like a (Read Evaluate Print Loop) Freeze Archive Document loop. The first part roughly corresponds to playing with / prototyping a system, and the second part involves turning the bag of assumptions that results into something more enduring.

more like, “I hate debuggers”

Come to think of it, Ganssle makes a lot of good arguments, if you read his rant as directed against debuggers rather than Forth:

Debuggers suck. I know that statement is going to churn up a lot of hate mail. Advocates of debuggers are as passionate about their [babies] as Windows-haters are about Microsoft. I've never seen a tool that has such passionate devotees, astonishing when you remember we're only talking about a [development aid].

A debugger is a very satisfying environment for a programmer. It's totally interactive. […] Adherents crow about how productive they are working with a tool like this that imposes no delays on their work.

Sure, it's fast. And fun. But let's get real: interactive development has no impact on requirements analysis, specification, software design, documentation, or even test. Fast tools make for fast debugging, nothing else.

How much time do we spend debugging: 80% of the project? I sure hope not. A well-designed, carefully coded system should consume no more than 20% of the schedule in debug. Healthy software organizations need less.

Had he made this argument, Ganssle would have been in good company. As we all know, real programmers — including some of our own — do not use debuggers. It's true, they just don't. Echoing Ganssle's argument, Niklaus Wirth noted the following in Good Ideas, Through the Looking Glass:

The exercise of conscientious programming proved to have been extremely valuable. Never contain programs so few bugs, as when no debugging tools are available!