Simple Question (I hope...): Forward declarations vs. "letrec" style or ML "and" style constructs

Is there some important semantic distinction I'm missing - if we only consider _function_ and, thus, perhaps algebraic type _constructor_ definitions - or does this just end up being a simple matter of syntactic convenience or inconvenience (e.g., imagine mutually recursive object "methods" facilitated via different class "forward declaration" - just an example for which some typical Scheme "letrec" or "fix" styled syntax might be challenged).

Any and all sagacious wisdom greatly appreciated. Thanks!

- Scott

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Explicit vs implicit recursion

Letrec and other such declarations make explicit which definitions are mutually recursive, and thus are subject to some combined fixed point construction, while with forward declarations that is not apparent, so it's less clear what it actually means semantically. In ML and similar languages, recursive bindings also affect typing.

In Haskell, where recursion is not explicit, the problem is solved by requiring inference of recursion groups prior to type inference. Presumably, that approach would also work with forward declarations. In that sense the difference you ask about just boils down to implicit vs explicit recursion.

Eh?

Letrec in Scheme, and its counterpart LABELS in Common Lisp, don't require mutual recursion, they simply permit it. Is it otherwise in ML or some other language?

Sure they do...

All Andreas means is that if you write something like:

(let ((foo 1)
      (bar 2))
  (letrec ((foo (lambda (x) e1))
           (bar (lambda (y) e2)))
     ...))

then both foo and bar are bound to function definitions inside the body e1 and e2, shadowing the outer bindings to 1 and 2. You don't have to use the inner binding, but you definitely can't use the outer binding anymore.

More importantly

More importantly, semantically, there is some form of fixed point involved with letrec (in Scheme e.g. implemented by tying the knot via mutation). Whether the bindings actually use all recursive references is a separate story. In that sense, "recursion inference" will be more precise (but also more fragile wrt program evolution).

Exactly

I concur. A few more points about design for recursion facilities:

- I believe it's important, for any kind of binder (for which recursivity is meaningful), to have *both* a recursive and non-recursive binding form; the choice of which form is made more convenient to use by "least effort" default is left to language designers, but not having both is a design failure¹.

- Given that recursive declarations have a different (and more complex) semantic status, I believe that forward declaration should not automatically entail implicit recursivity (despite their ability to make mutual recursion transparent). There should be a design space for explicitly non-recursive forward declarations, and this should be available to the user. Again, designers choose which one (recursive or non-recursive forward declarations) is easiest to use in a given language.

¹: a failure that is present in Haskell for terms, and in ML for types; in both cases there is only an (implicitly) recursive binding form.

Not sure I agree

I'm not sure I agree. I find the explicit recursion in ML quite a nuisance in practice. It tends to force a structure on your code that often is not the most practical, by e.g. spatially separating auxiliary definitions from their point of use. Although the semanticist in me appreciates the merits of explicit recursion, Andreas the programmer often wished to have something more Haskellish. There are two problems with that in ML, though: (1) side effects, and (2) modules.

(Btw, a bit of nitpicking: it's not entirely true that in "ML" type declarations are always recursive. While that's the case in OCaml, 'type' declarations in SML are not, only 'datatype' declarations. Whether that distinction is any better is arguable.)

Implicit / Explicit again

I think this is yet another instance of the implicit/explicit balance, where the best choice is a keyboard syntax that leaves recursion implicit, but edit-time feedback that makes the presence of recursion clear.

OK, with that I strongly disagree

Because edit time does not matter. Read time does. Hence, I think that all those trendy ideas for redefining crucial aspects of language design into IDE concerns are completely misguided. I'm a strong proponent of the old-fashioned idea that code should be perfectly readable without fancy tools.

I don't take your point

You start reading code as soon as you've entered it, so read and edit times are essentially the same. So I don't follow your point. Can you elaborate?

You can alternatively consider my point of view in the following way: semantically, recursion is important. When reasoning formally about the correctness of our programs, we don't want to have to reason about what recursion inference should decide our programs mean. It's preferable to simply observe what results were produced by such inference and then reason formally about them.

BTW, I agree with you that code should be readable and editable without special tools in a text only format. My point is that the fact that implicit resolutions can be resolved in the IDE and made explicit should inform the language design -- we shouldn't worry so much about leaving important things implicit if they can be explicit, at read time, in an IDE.

Edit times < read times

Yes, edit time is also read time. But not the other way round. I don't just read code in my editor, but I do so in diffs, in code review tools, in repo browsers, in discussion forums, and even on paper. In none of these circumstance can I rely on IDE support.

Btw, I'm only disagreeing with your general IDE argument. For the case of recursion specifically, I don't think it is all that relevant for readability to make it explicit. As I argued in my reply to Gabriel, it can even be harmful.

I agree with you that code

I agree with you that code should be readable and editable without special tools in a text only format

Why? This is holding back progress because it is incredibly difficult to build nice tooling on top of text. Even something trivial like syntax highlighting, code completion and renaming a variable become hard problems.

If you allow the user to

If you allow the user to type x+2*y with proper order of operations inferred, then you support parsing text. Might as well allow the user to paste in larger blocks of text from the web, etc. The important point for tooling is that tools act on a post-parse AST.

The input sequence to get a

The input sequence to get a particular editor to produce the right AST is not necessarily a human usable serialization format. For example if we serialize rich text documents from a word processor that way, you don't get a usable format. Or consider having a color picker in the IDE, you probably don't want to serialize that as the position of the mouse clicks that resulted in a color being picked. So you definitely need to do extra work to support this.

I suppose there is nothing wrong with a text based serialization format, but there are inevitably trade offs between ease of use for humans and ease of use for computers. You need to be careful not to let the mindset of the text serialization format influence what you think the IDE can do. For example renaming a function foo to bar is a semantically different thing than deleting the function foo and inserting a function bar with respect to version control. In the former case you want to automatically rename calls to foo to calls to bar, and in the latter case you want to raise a merge conflict saying that foo got deleted and you're calling an undefined function. So you somehow need to encode the identity of AST nodes in your plain text representation. There is also the question of which representation is the canonical one. IMO the rule should be: when in doubt, optimize for the best IDE experience, not for the best notepad experience.

From an implementation perspective, ditching plain text is much easier. Just dump the AST with your language's built in serialization method.

Parsing is some extra work,

Parsing is some extra work, but I think it's worth it as long as general tooling doesn't need to mess with it My argument that you're parsing if you support order of operations doesn't mean you're parsing the raw input stream. Presumably backspace isn't part of your grammar. Similarly for mouse clicks. I agree with your middle paragraph.

I'm not sure which example

I'm not sure which example you have in mind of definition-site choice forced by explicit recursion. On this topic, in OCaml I regret the absence of the ISWIM "where" keyword that allows post-use definition that is often convenient, and orthogonal to recursion choices. And I would be fine with a choice where "let" is recursive by default and you have to use "let nonrec" to get the other semantics (that's probably the right choice in Haskell where non-termination effects due to ill-defined recursion is considered acceptably omnipresent thanks to lazyness), the point is that having both semantics available is important.

(Re. nitpicking: I don't think the recursion difference of type/datatype is intentional, rather a side-effect (sic) of datatype's generativity and dangers of equirecursion. SML type definitions are recursive whenever possible.)

On cases where non-recursive let or (data)type is better:

  let nonrec x = normalize x in ...

  open M
  type state = ...
  type nonrec t = From_M of t | Internal of state

Example

The kind of example I have in mind (and stumble over quite frequently in, say, a compiler) is this:

let rec f1 x =
  ...
and f2 y =
.
. (* 578 lines of N mutually recursive function definitions *)
.
and fN z =
  ...

where I then need some smallish auxiliary definition(s) only intended for f12 and f13, and perhaps a few others for f3 and one for f20. They will all be separated 500 lines from their actual use sites.

If those auxiliary definitions are merely lets (and not e.g. types), have a value as their RHS, and don't have to be polymorphic, then you can get away with including them in the let rec. But often enough that is not the case.

I don't think the recursion difference of type/datatype is intentional, rather a side-effect (sic) of datatype's generativity and dangers of equirecursion.

Well, how would that not be intentional? It has everything to do with there being no unique fixed point for recursive type equations. ;)

Can you explain your example? I'm afraid I don't quite get it.

I suspect your example is

I suspect your example is not related to a particular declaration style but universal. My own workaround would be to move `f12`, `f13` and `f20` to be rather close together and define the auxiliary function as part of the mutually defined set of declarations (or before the block if it doesn't need to be recursive). I'm not sure how any style of definitions would solve the problem of "if I'm needed by several large terms, I'm far from at least one of them".

In "not be intentional", I meant that I don't picture writers of the SML definition thinking "oh right, datatypes are recursive by default, if someone wants a non-recursive binding form they'll hopefully be able to use `type`". That was an ill-informed (surely you know better about the SML definition) guess at a subjective and ultimately unimportant *interpretation* of a design choice, so I'll just shut up.

In the examples, using a recursive binding form would fail to convey the desired semantics. In the module example it would have been more convincing to use `include` rather than `open`, because here `type t = From_M of M.t | ...` is an easy workaround to avoid unwanted recursion.

Typically

Typically in those cases the fN correspond to a bunch of mutually recursive datatypes, and for readability I want to keep the ordering of their definitions.

It's certainly true that there are cases where some auxiliary functions are needed less locally. But IME they then also tend to be more general abstractions, and it is fine to define them elsewhere. Sure, there is no clear-cut distinction between these cases, just saying that this has been one of the most frequent annoyances in my programming experience with ML. That said, it's a luxury problems to have, compared to what you have to put up with in mainstream languages. :)

The programmer in me . . .

tends to agree with the semanticist in you. I find that readability is materially increased by having a predictable and uniform order in which definitions build up. It more than compensates for the occasional annoyance of mutual recursion being more difficult than it would be otherwise.

What about member function defs (type class defs, etc.)???

Well, let's recognize that there are definitions similar to function definitions but syntactically grouped differently than arbitrary function definitions. Think of a class declaration (a host of forward definitions often accompanies by object storage size definitions), signatures or interfaces, etc? Or typeclass definitions or definitions in Haskell-like "where" clauses (easier to grok than the first two examples).

Not all mutually recursive "function" or "method" definitions can simply be defined adjacently with some "rec" or "letrec" or "and" syntax marking them as explicitly self-recursive.

In these other cases (like class declarations/interfaces permitting mutual recursion in actual definition bodies), syntactic adjacency is simply NOT an practical option minus syntactic/code organization flights of newfound fancy.

So if a language permits interfaces/signatures/class declaration "forward declarations" permitting mutual recursion among the action "member function" (or whatever) definitions, is it a lost cause to require labels/letrec/fix type syntax and definition adjacency among "mere" function definitions?

At first blush, this seems like a vestigial hold over from simpler (function decl/def) days that no longer are so simple today.

Is there an example of a remotely successful language allowing typical OOP class decls and open recursion that requires some fancy "rec" or "letrec" or "lables" or "fun/and" style fancy adjacency syntax for mutually recursive OOP member functions?

If not and if we are not "retro" language designers, why should we simply not abandon the simplicity of "letrec" adjacency declarations of yore?

- Scott

p.s. Depending upon simplicity of compilation scheme and desire for loop efficiency, I can see utility for purely non-top-level, local declarations/self-recursive definitions which devolve into loop logic. But then again, so what? Sounds like a designer who wants their compiler to be an order of magnitude simpler than an everyday IDE? Just being provocative here. I have a keen appreciation for brutal simplicity as well. Tough calls, IMHO....

In these other cases (like

In these other cases (like class declarations/interfaces permitting mutual recursion in actual definition bodies), syntactic adjacency is simply NOT an practical option minus syntactic/code organization flights of newfound fancy.

I'm not convinced by this. Can you provide an example demonstrating the problem?

In ML, a general solution to "recursion between this and this" is to use recursive modules. They do partially lift the adjacency requirement, but trade it for an explicit "projection out of my recursive self" mark on identifiers that are going through the module knot. More importantly they provide a generic "mutually recursive block of declaration" syntax where "... and ... and ..." syntax are traditionally crafted for a single style of definitions (all types, all values, all classes...). However the semantics of the construct is made no simpler, and the problems with recursive declarations still happen at this level.

Another problem with recursive modules is that they are hard to define modularly (split into subcomponents that can be defined separately then somehow connected together). MixML's mixin modules are an elegant solution to this problem, at the cost of making recursion more pervasive (if you don't take steps to restrict it, you may have recursivity appear whenever you connect/merge two mixins), meaning the metatheory is again quite complex.

My problem with your argumentation is that it gives me the impression that you use a *syntactic* argument (mutually recursive blocks are not convenient to write) to hide a *semantic* problem under the carpet (it's hard to formalize what this highly-recursive spaghetti declaration means).
As long as you do the formal work of coming out with a proper semantics, yes, feel free to let your users write highly-recursive stuff as they see fit. But at some level they will have to understand what this mean as well, and if the answer is too complex (or too fragile) maybe they'll want some mean to declare simple, unproblematic not-too-recursive definitions as well.

(Besides, even for class and interface definitions, there surely are situations where you would want a non-recursive declaration syntax, to be able to shadow a previous name with a definition that uses it. It's not a *common* scenario, but when it does happen you're relieved to have a non-recursive form at hand, because the workaround, if it exists, is ugly.)

All recursive

If anyone wants to see a language with universal recursion, look at Felix. Felix does not allow function prototypes, so forward declarations cannot exist. Lookup is setwise (like labels in C). Order of processing files is irrelevant. Same applies to types.

I find this quite superior to Ocaml's compilation model, which ultimately is pretty disgusting: how can you have a functional programming language that's so dumb it cannot handle arbitrary separation and compilation of functions, except on a small scale?

The Felix compiler is written in Ocaml and the actual set-wise lookup rules live in a huge nest of let-rec functions which cannot be decoupled or split out into separate modules without major fragility involved with passing closures to simulate recursion. Heck, C and C++ have no problem separately compiling function definitions in any order you please.

Let-rec basically sucks. It prevents factorisation, and it improperly forces non-recursive functions into a recursion, thereby failing to be complete or honest: it's an impediment in practice and fails to model the underlying concepts in theory.

Recursive modules do not solve this problem: they make it worse by dragging many unrelated things into the recursive knot.

Fully global recursion, however, has many deficiencies. The most obvious is hijacking: modifying a function depended on by another without realising the impact. This is particularly bad in Felix where functions can be both polymorphic and overloaded.

The bottom line though: in my experience, global recursion is better than micro-recursion (let-rec). Now please show me a way to specify an approximation to the intended control flow graph that is in between these extremes!

[And similarly for types]

Not so

Heck, C and C++ have no problem separately compiling function definitions in any order you please.

That is not true. C++ has severe problems with that as soon as templates are involved, or classes, (e.g. constructors of mutually recursive classes?), or a few other things I cannot remember right now. Other kinds of declarations cannot be recursive at all (typedef, plain values, etc.). Moreover, even in the simple cases that work you need to forward declare everything and there is no type inference to start with (let alone polymorphic one). Templates, in turn, cheat around most of the problems of type checking and compiling mutual recursion by simply doing no type checking or compilation.

Setwise lookup?

I'm not sure what you mean by "setwise lookup", and would be interested in a clarification.