Are first-class environments enough?

Everyone (e.g. in the LtU discussion back from 2010) seems to assume that first-class environments in a Scheme-like lexically scoped language are sufficient to implement all known module systems and then some. Still, it strikes me that implementing renaming of imported bindings (R[67]RS and Chicken (rename), Racket (rename-in)) with just first-class environments is impossible in the presence of mutability, even if foreign code is forbidden from mutating things inside the module. Even (only)-style shadowing is impossible without built-in support specifically for it in addition to plain environment inheritance. Introducing all this into environment semantics makes the whole idea look much less clean than it initially did. So, is this better done with stronger things like first-class locations? Or am I missing something obvious?

There are suggestions that MIT Scheme indeed implements its module system using environments, but I couldn’t really understand how (and whether) it handles this issue. The description in the Kernel language report is the best reference I could get on the topic.

Finally, note that this question is orthogonal to whether mutating exported bindings is a good thing to do from the stylistic point of view. If a language has mutability and a module system, it’d better be consistent in how they interact — a negative example here being the behaviour of bindings created with from module import ... in Python.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Not really

You are assuming that "environment inheritance" means that the connection between environments always takes the form "if I don't know the name, ask my parent". But that need not be so: there can also be connections of the form "if my parent knows this name, I do not know it". That allows you to implement "(only x y z)" by placing such a negatively connected environment between parent and child, at least in the case that all environments but the most recent are locally immutable.

Maybe I don’t quite get it

Honestly, I don’t quite understand your idea of negative connectives. (Say I want environment E to import the bindings x, y and z from environment F; which relations do I need to establish? The “?” in the “E → ? → F” chain needs to accept responsibility for x, y, and z, yet deny it for any other name.)

Still, it seems to me that expanding the set of available parent-child relations has an uncomfortable consequence that you essentially get only the things you have explicitly built into the model: there needs to be one new connection type to implement (only), another one for (rename), yet another for (prefix), etc. This looks too specialized and simply not right. (Am I wrong here?) At the other extreme, there is a possibility of introducing an arbitrary mapping on the way from the child to the parent, but this seems too general to be manageable. What I was initially looking for is a middle ground between these.

Good question

I've also come across this problem and don't have an answer.

One way to work around it would be to make bindings immutable and use ML-like reference cells for mutable values.

Whoops, no (except)

First, restricting mutability to explicit cells/boxes/refs/whatever is, of course, a difficult decision. The author of Disciple has more to say about this issue, but the bottom line is that this makes it difficult to mix code that expects things to be mutable with that that expects them not to be. The problem may be less apparent with latent typing, but it still exists.

Second, it looks like environments, too, [the sets of names rather than just the values they map to] have to be immutable, regardless of whether one uses explicit reference cells or leaves them implicit but accessible when needed (which is what I meant by “first-class locations” in the initial post). Otherwise, (only) and (rename) become expressible, but (except) and (prefix) and all things that potentially affect an infinite set of names are still out of question. In any case, only by enumerating all the bindings (! — say, Kernel deliberately makes this impossble, and there’s a point there) in the affected environment can they be implemented.

Reflection

In reflective Lisps, to the best of my understanding (though I don't work with those much — Kernel aggressively avoids reflection), each environment is internally a procedure that takes a symbol as input and returns a value as output. There's clearly nothing you can't do using that model, though it's not to my taste as I find it overpowered and therefore impossible to stabilize.

This is one (extreme)

This is one (extreme) alternative; and yes, it looks unmanageable to me. On the other side, Kernel’s (provide) understands “providing” in a different manner from most Scheme module systems: it exports values rather than bindings, meaning that any mutations inside are invisible from the outside. (OK, provide is not a module system, but it is still as close as one can get.) Is this a conscious design decision? Environment semantics of Kernel make it impossible to do otherwise: mutable bindings are importable, but only if you ask for the whole environment; you can’t mix and match them.

Access to Kernel environments

Access to Kernel environments is carefully regulated, as a matter of stability. In the classical Scheme approach, if you can see a binding, you can mutate it; but Kernel has fexprs, so even things like $if and $define! are determined by bindings. One wants to be able to prove in static analysis that certain bindings in certain environments are not subject to mutation. My approach is that when mutating the bindings of an environment, the mutation is local, with no effect on the environment's ancestors; and there's no way given an environment to acquire its ancestors as first-class objects, nor given a combiner to acquire its static environment.

One thing that isn't supported by Kernel's environments is making some bindings visible without making others visible. So if one wanted to make certain bindings visible including their later mutations, while keeping other bindings strictly private, one would have to plan out one's environment hierarchy rather carefully.

Mutable, Immutable, and nonlocal bindings -- and sanity.

Yes. In a world where even the basic syntax is first-class functions, you have to be very careful about mutations.

I treat bindings as immutable by default. To get something that can be changed with (set!) you have to declare it with (define-variable) rather than just (define). You can shadow imported definitions with local ones, or with definitions from environments imported later.

Each "Import" form creates a constant in the local top-level environment referring to the environment in which the definitions from that import are made. (set!) for any mutable-but-nonlocal definitions requires the corresponding environment argument.

Within a module, you can (assert #coherence ) which amounts to an assertion that all the bindings imported from are locally visible, ie, have not been shadowed by any user-defined or other nonlocal bindings. The environment within which all the basic constructs (if, lambda, etc) are defined is globally known as #lang, so you can say (assert #coherence #lang) at the top of a module to ensure that you've inherited only the bindings for those elements that are defined in #lang.

IOW, I've gone out of the way to ensure that the code in a given module can rely on definitions it needs, deny other modules the right to modify its bindings, or refuse to compile under conflicting definitions. Also, I've gone out of the way to ensure that nobody can mutate a binding other than the specific binding (in the specific environment) that they intend.

Kernel approaches stability

Kernel approaches stability with first-class environments by means of structural properties of the language model, and at the level of environments rather than bindings. Your approach, if I'm following it, uses explicit rather than structural constraint, at the level of bindings rather than environments. It's not exactly a secret that I prefer structural over explicit constraints, and am quite hawkish in my pursuit of the Principle of Necessity ("Entities should not be multiplied unnecessarily", which William of Ockham used so effectively to cut his opponents' arguments to ribbons that it was later dubbed "Occam's Razor").

Iirc, something similar to your approach, rather than mine, was taken by (it took me a while to remember the name of it) Symmetric Lisp.

Boxes and Kernel

I model environments as immutable maps containing mutable boxes (i.e. Map VarName (MutableBox FirstClassValue)). This makes it straightforward to refer to the same mutable variable under multiple names in multiple environments: Just use the same box. It also makes it easy to construct limited environments which reference only the variables they need.

This approach is probably the same as Manuel J. Simoni's comment about "ML-like reference cells." Is it also what you mean by "first-class locations"? The boxes wouldn't necessarily be exposed to the programmer as first-class values, even if the language implementation uses this technique internally. (However, if this isn't exposed to the programmer, it might not be "enough" for some things.)

The description in the Kernel language report is the best reference I could get on the topic.

I'm not John Shutt, and I've only ever read about Kernel--I haven't used it--so I might make a fool of myself trying to say things about it here. :)

Kernel's environments are limited enough to make mutable exports a challenge, but I think they don't quite put them out of reach.

The way I interpret the R-1RK, a first-class environment is a mutable structure containing--but not fully exposing--a map of local bindings and a parent environment (possibly implemented as MutableBox (Map VarName FirstClassValue, ParentEnv)). The parent environment isn't first-class, so it may contain more data (e.g. a binding for a keyed static variable) or less (e.g. no parent of its own). A programmer can mutate a first-class environment, but only by adding or overwriting local bindings; there's no way to remove local bindings or modify any part of the parent (not even a Scheme-style nonlocal set!).

Since this environment model combines the concerns of name mapping and mutable state, Kernel programmers who want mutable exports may indeed have trouble trying to define rename-in or only from within the features of the R-1RK. Suppose we want ($define! foo ...) to modify some variable from another module. For that to work, our own local environment must be (an improper ancestor of) the same environment people typically use to look up foo, and "foo" must be the same name they look it up by. That pretty much rules out either of these name management features.

Fortunately, if we're willing to use tailor-made variable setters like ($define-in-module-system! foo ...), then I think the R-1RK is accommodating. We can have the module system manage two environments for every module: The module's evaluation environment, and a parent of that environment which can change in response to these setters. During each setter call, the module system can internally translate back and forth between different variable names, factoring in all the rename-in and only directives.

(Incidentally, if we were willing to call tailor-made variable getters as well, we could use our own representation for environments, regardless of language support. I've done something like this in Arc.)

If a language has mutability and a module system, it’d better be consistent in how they interact[...]

Since Kernel intentionally doesn't support nonlocal variable assignments (a notable contrast from Scheme), it would actually be pretty inconsistent to be able to modify module exports, the most nonlocal variables of all. I know even less about Python than I do about Kernel, but it seems to have the same lack of nonlocal assignments, yielding similar consistency.

On a less language-specific note, I think a module system has enough to worry about without the pressure to be consistent with local code. Sometimes local code snippets assume reasonable things that would be less reasonable at the module level: That everything is written by the same person, that one snippet will always be kept up to date with another, etc. It would be very nice to have a smooth transition from one-off hacks to publishable modules, but some inconsistency is justified.

Several points here

First, not being able to mutate things in environments other than the current one would make closures much less fun [1], and Kernel does indeed provide a non-local set! — provided that one captures a reference to the environment object itself first. This makes a lot of sense from a compiler implementer’s point of view: keeping track of escaping environment objects is like what is already done for escaping continuation objects; as an additional consequence, I find Shutt’s $set! much easier to understand than the one from Scheme.

Second, by “first-class” locations I meant exactly that those MutableBoxes are exposed to the programmer; what Manuel Simoni means, I think [feel free to correct me, of course], is to make environments more like Map VarName FirstClassValue, but allow people to put SRFI 111-style boxes there if they really want mutability. And what you describe feels more like rename-out and export lists rather than rename-in and only respectively — which is better than nothing, but still inadequate. The inheritance model makes it impossible to share the mutable boxes/reference cells between environments other than via inheritance, and thus renaming becomes hopeless.

Third, the frustration of my first post comes not from the fact that other people can’t mutate the bindings I export from my module, but from that I myself cannot mutate exported things from inside the module so that others see it — or I can, as long as they import my bindings wholesale (by inheriting their environments from mine) without renamings or restrictions.

[1] E.g. it would make the make-counter example impossible (note the difference between the last lines):

Scheme:

(define (make-counter)
  (let ([x 0])
    (lambda () (set! x (+ x 1)) x)))

Kernel:

($define! make-counter
  ($lambda ()
    ($define! x 0)
    ($define! env (get-current-environment))
    ($lambda () ($set! env x (+ x 1)) x)))

If you want... badly enough...

If you want... badly enough... to export bindings from a module M and then be able to mutate them later from inside the module, you could set up a set of local bindings in an orphan environment E that is named by a binding in the internal environment I of M (also, E could be an ancestor of I if that's convenient for your purpose). Then, export a child of E. Those with access to the export can then see, but not mutate, the bindings of E, while M itself can actually mutate E since it has a binding in I for E as a first-class object.

Clarifications

[...]Kernel does indeed provide a non-local set! — provided that one captures a reference to the environment object itself first.

Ah, for some reason I hadn't thought of that as being significantly different, but there's something to what you're saying. The make-counter example would still be possible without mutating any environments or variables...

($define! make-counter
  ($lambda ()
    ($let ((x (list 0)))
      ($lambda () (set-car! x (+ (car x) 1)) (car x)))))

...but get-current-environment at least lets us avoid repeating (car x), even if we still have to repeat ($set! env x ...).

By the way, you and I have some terminology confusion. Kernel's $set! cannot modify any part of an environment except the local part, so I may or may not call it "a non-local set!" depending on the context of the discussion.

[...]as an additional consequence, I find Shutt’s $set! much easier to understand than the one from Scheme.

I think this is ultimately a matter of opinion, but I'll make some observations just in case.

  • Getting a variable in Scheme: First look up the variable from the environment chain, then unbox.
  • Setting a variable in Scheme: First look up the variable from the environment chain, then modify the box.
  • Getting a variable in Kernel: Look up the variable from the environment chain.
  • Setting a variable in Kernel: Modify (or insert) this variable in the most local environment.

Kernel special-cases the most local environment, and this breaks the continuity of the environment chain. In Scheme, if we use an empty (let () ...) to add no variables, it makes no difference. In Kernel, an empty ($let () ...) still establishes a new location to hold the local definitions. So in some sense, this operation has no identity element in Kernel. It becomes important to have a mental model of not only the variables but also the point of separation between local and nonlocal variables.

Fortunately, Kernel at least has some kind of equivalence between ($let () ($let () ...)) and ($let () ...). The local/nonlocal distinction only goes one level deep.

Second, by “first-class” locations I meant exactly that those MutableBoxes are exposed to the programmer [...] And what you describe feels more like rename-out and export lists [...]

Hmm, I was describing two separate things: First, if I weren't tied to an existing language design and I wanted to build a module system with mutable exports, I would find it straightforward to use maps of boxes. Second, I think Kernel environments are enough for a pretty convenient implementation of this kind of system, as long as we're willing to use a nonstandard framework to do module-level variable assignment.

Third, the frustration of my first post comes not from the fact that other people can’t mutate the bindings I export from my module, but from that I myself cannot mutate exported things from inside the module so that others see it [...]

Ah, if we do one without the other, it's pretty consistent with Kernel's scoping! Just like other nonlocal variables, an imported variable can change in ways the local code observes over time, even though that code doesn't have the power to enact those changes.

And what you describe feels more like rename-out and export lists rather than rename-in and only respectively — which is better than nothing, but still inadequate. The inheritance model makes it impossible to share the mutable boxes/reference cells between environments other than via inheritance, and thus renaming becomes hopeless.

(I'm replying out of order a bit.)

We can share variables by pushing the variable state around. That's what the custom setters would do internally. Here's some pseudocode:

Define `module-metadata`, a keyed static variable.

Define `all-modules`, a global repository of module information.

To define a module using <import-export rules> and <module code>:
  Create a new entry in `all-modules`.
  Create an environment by binding the `module-metadata` key in a
    standard environment. Let the value of the `module-metadata` point
    back to the `all-modules` entry.
  This new environment is the module's import environment. Store it in
    `all-modules`.
  Store the import-export rules in `all-modules`.
  Mutate the import environment so it *appears* to contain the
    variables indicated in the import-export rules. At this point it
    will just contain their current values.
  Create a child of the import environment.
  This new environment is the module's export environment. Store it in
    `all-modules`.
  Evaluate the module code in the export environment. Discard the
    evaluation result.

To execute ($define-in-module-system! <definiend> <expression>):
  Evaluate the expression in the dynamic environment.
  Look up `module-metadata` in the dynamic environment. If it doesn't
    exist, signal an error.
  Get the export environment from the `module-metadata`, and check
    that it's `eq?` to the dynamic environment of this call. If it
    isn't, signal an error. (This prevents nonlocal mutation.)
  For every variable in the definiend...
    Using the import-export rules in `module-metadata`, determine the
      true identity of this variable.
    For every module in `all-modules`...
      Using the import-export rules in that module's own metadata,
        determine whether the module imports this variable.
      If it does:
        Mutate that module's import environment accordingly.

I hope this code block doesn't affect the forum width too much.... Fortunately, this comment is at a shallow nesting depth.

The conceptual difference

The conceptual difference here seems to be that you're thinking of an "environment" as simply a stack of bindings, whereas in Kernel an environment is a set of ("local") bindings together with a list of references to ancestor environments. The ancestors of E are not part of E, any more than the cdr of a non-empty list is part of the first cons-cell of the list. This is indeed a matter of distinguishing between local and non-local bindings. If one views an environment as a stack of bindings, it isn't possible to add a binding to E without ending up with a different environment from E.

I'm honestly unsure whether you're grokking this distinction. On one hand, your statements about technical consequences seem spot-on. On the other hand, various conceptual descriptions seem off-kilter. "The local/nonlocal distinction only goes one level deep" — this seems a little like saying that the car/cdr distinction only only goes one level deep. It's true there'd be equivalence between ($let () ($let () ...)) and ($let () ...), since there's no way to observe the theoretical outer local environment of the nested expression. "So in some sense, this operation has no identity element in Kernel" — that's like saying that the increment operation, 1+, has no identity element. Of course it doesn't; it always adds one, never more and never less. Likewise, $let always creates one local environment, neither more nor less.

The point of all this (pretty much) is in the elegant locality of mutation it affords. For example, all the standard bindings of Kernel are local to the root environment, which is never available as a first-class object to programs. Instead, programs are evaluated in a standard environment, which is, by definition, a child of the root environment. When a program P runs in a standard environment S1, no matter how much P mutates S1, none of those mutations will have any affect on the root environment (though they might shadow some of the bindings of the root environment, making them no longer visible in S1). This means that creating new standard environments S2, S3, etc. is very inexpensive, and the various Sk are unaffected by each other. The traditional Scheme alternative makes it impossible to make a binding visible without making it mutable (unless one were to either make certain bindings immutable by anyone, or introduce some baroque mechanism for giving permissions for mutating particular bindings).

Re: the conceptual difference

My comment below was in reply to that, sorry for misplacing it.

Comparisons

"So in some sense, this operation has no identity element in Kernel" — that's like saying that the increment operation, 1+, has no identity element. Of course it doesn't; it always adds one, never more and never less. Likewise, $let always creates one local environment, neither more nor less.

"Of course" this is true if we're only talking about Kernel, but my point is that Scheme is a different story.

(Well, I'm bending the truth here. I'm really thinking of a variant of Scheme which doesn't have internal define forms!)

I mention this empty let example in order to pin down a specific way in which I find Kernel more unintuitive than Scheme, so that it can perhaps be more useful than an empty emotional judgment. I do appreciate your reasons for this complexity in the broader scheme of things, especially the way it enables certain static guarantees.

Personally, when I think about fexprs, I lean toward a Kernel-like design but without any environment mutability. This restores the simplicity of let, making it possible to represent (or at least conceptualize) environments as maps. It also retains the hygiene and static guarantees. However, $define! stops being an option for establishing top-level definitions, and no alternative has really stood out to me.

Yeah, $define! (or define)

Yeah, $define! (or define) is always going to be awkward one way or another, from some perspective or other. I seem to recall the RnRS too have struggled over the years with how best to handle it.

In the absence of mutation, there's scarcely a difference between the two views of environments. And in the absence of first-class environments, the whole question is moot. [edit: okay, 'moot' is overstating; but not having first-class environments avoids a lot of awkward questions.]

Symmetric Lisp

Does the ALPHA form from symmetric Lisp count as a replacement for (define)? Looks like a separate construct (equivalent to e.g. (letrec (...) (current-environment))) or even a separate definitions language a la ML is needed if one cannot construct environments by mutation.

Defining requirements

I don't think $define! is absolutely necessary for programming in Kernel. The entire program can be a single top-level expression, and it can use fixpoint combinators or set-car! to accomplish recursion. Because of this, there are endless ways to "replace" $define!, depending on what we consider to be missing in this constrained language.

As for me, there are at least two contexts in which I'd prefer the full-powered Kernel language over this subset: One, the subset isn't great at a REPL, since the language standard no longer provides a stateful canvas we can use to build up a program over multiple commands. Two, there's no longer an obvious ritual for sharing code as a library.

Let's say I want Kernel to be a fantastic REPL language. At a REPL, when I write an unqualified name in a command, my intention is to refer to whatever that name might mean once I've finished writing my code. I want to try one possibility now and swap it out for another later on. So I would want Kernel to have stateful global variables, and yet I have no better idea than what it already supports... unless I'm willing to just enable nonlocal variable mutation and sacrifice the static guarantees.

Let's say I want Kernel to be a fantastic language for sharing libraries instead. That's a tall order; no existing language satisfies me for this purpose! Instead I've been pursuing my own module system, which involves dependent typing and an indexed database of installed modules. I think it's slightly inconvenient that Kernel has a local/nonlocal discontinuity, but my system will certainly have slight inconveniences as well, so I can't really recommend it as a replacement.

in very deed

I don't think $define! is absolutely necessary for programming in Kernel.

:-D  $define! belongs to the Environment mutation module, which is optional. The other two optional modules in the core being Pair mutation and Equivalence under mutation.

$define! belongs to the

$define! belongs to the Environment mutation module, which is optional.

Yes, I kept trying to figure out a way to work that in. :)

List of references to ancestor environments

[...] in Kernel an environment is a set of ("local") bindings together with a list of references to ancestor environments.

Oh, make-environment constructs an enivironment using any number of first-class parents. I did overlook that.

It remains that, the scope

It remains that, the scope of set! is different depending on whether you are inside or outside the empty let. I wonder if it would be preferable to make let define a transparent environment, which has no effect on the scope of set!, and reserve the use of solid environments to obvious semantic boundaries such as a function or module.

Kernel's $set! requires that

Kernel's $set! requires that you explicitly specify the environment to be mutated. So no, its scope does not depend on whether you are inside or outside the empty $let, but rather on what first-class environment you explicitly specify. Now, the scope of $define! does depend on whether you're inside or outside the empty $let; I see that as highly desirable, limiting the scope of a potentially dangerous operation. Likewise get-current-environment.