inline vs scatter/gather separate annotation

I want to explore a "what if?" question that involves separating aspects of code that are normally adjacent in programming languages, in the sense of being nearby in syntax and grammar. What if some parts are specified elsewhere? Remote specs might augment or replace local specs in code, when compiled against a particular build target, or when transpiled to a new inline representation where everything is a local spec again.

(What do I mean by inline? Roughly speaking, some contiguous sequence of local tokens in a legal grammar order. In a programming language, keywords are usually inline, modifying the meaning of source code nearby, per the semantics of the grammar used.)

One metaphor that seems applicable is the idea of styles used in document text markup. The same way cascading style sheets separate text styles from the documents that use them, we might separate abstractions of code application and specialization from the source code that uses them. For example, if you wanted to support a meta-object protocol (mop), a spec for what the mop means locally might be defined somewhere else, and applied only at compile or transpile time. All kinds of instrumentation for logging, profiling, debugging, etc., might be put somewhere else, where it interferes less with the basic stuff.

One objective would be to isolate accidental noisy stuff from essential basics, while another would be enabling replacement and update in new versions without updating the old original. When you do this to types of variables, you get something like generics.

You would want different parts of a code base to see different views, applying different styles and annotations to the same original. One part of a code tree might have permission to alter certain data structures, and see data types involved as mutable, while another part of the code tree sees only immutable types that cause compile time errors on any attempt to call API that would modify state.
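To make the two-views idea concrete, here is a minimal sketch in Python (standing in for whatever language this would really target): the same object is handed to different parts of the code base under different interface views, and a static checker such as mypy plays the role of the compiler rejecting mutation. All the names here are invented for illustration.

  from typing import Protocol

  class ReadView(Protocol):
      def balance(self) -> int: ...

  class WriteView(ReadView, Protocol):
      def deposit(self, amount: int) -> None: ...

  class Account:  # the one original data structure
      def __init__(self) -> None:
          self._balance = 0
      def balance(self) -> int:
          return self._balance
      def deposit(self, amount: int) -> None:
          self._balance += amount

  def audit(acct: ReadView) -> int:
      # acct.deposit(100)  # a checker rejects this: ReadView has no deposit
      return acct.balance()

  def teller(acct: WriteView) -> None:  # this part of the tree may mutate
      acct.deposit(100)

The point of the sketch is only that the view, not the object, decides what is legal, so the same original can look immutable from one part of the tree and mutable from another.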

Note when important information becomes separated, this amounts to a form of obfuscation. (Witness C switch statements turned into C++ virtual method calls that distribute each case into whatever file implements each relevant method.) If you use scatter/gather on code annotations, it's easy for a compiler to gather, but hard on a human being. So you would want to support generation of views with everything gathered in more inline form, just for human consumption.

I've been thinking about this idea a lot, but it's seldom a topic of conversation, so that's mainly why I bring the idea up, just to see if folks have interesting comments.

[Edit: I'm looking for comments on splitting or scattering code declarations, rules, and constraints, where gather in this context is code rewriting. While the scatter/gather metaphor comes from doing this to data, I'm talking about doing it to code. Applications to dataflow aren't relevant unless input and output is code. There isn't a good word that means the opposite of inline, unless you want to go with plain English apart.]


Designed for separating

I rather like the terseness of languages that lack static types.

But most code would work with static types; it's only a minority of code where the dynamic nature of the variables is actually used.

I bring this up because places where typing doesn't change the semantics of code are places where it could be separated out without making the semantics dependent on what's missing.

I guess I'm saying that if the language is designed so that the scattered information isn't vital to understanding the program, then nothing is lost by making that separation.

I'm somewhat reminded of

I'm somewhat reminded of recent musings on aspect-orientation.

I did say aspects of code in a similar sense

So far aspect oriented programming (AOP) has felt nebulous to me, without a clear or specific agenda, expressing vague intent instead. (Hang loose, avoid inheritance, mix and match something something.) If you noted a specific way it addresses or contributes to splitting code annotation into different places, so it can be optionally rebound in different contexts, I'd be happy to hear your observation.

aspect orientation

When I saw an HN post on code weaving in Golang, which mentioned aspect oriented programming (aop), I decided to read up on aop again, and found folks usually look to some feature in aop when they want a result like the one I described. (Below I may quote from Wikipedia's Aspect-oriented_programming and from c2's AspectOrientedProgramming.)

So the conversation I wanted to have is indeed so entangled with aspect oriented programming that I would have to draw comparisons, even though I meant something a bit different, which I will refer to as romp, for rule oriented meta programming. (Yes, the verb to romp has funny synonyms when applied to code.) The concept I have in mind for romp is something like injecting macros based on explicit rules. Why use another term? Because aop was patented by Gregor Kiczales in 2002, it's suicidal for any product to define itself in terms of aop: you would practically build infringement into the definition. Fortunately, I have little use for the notions of advice, join point, and point cut defined by aop, partly because they are too specific.

The general idea of meta-programming is to generate code, via code that produces code. Macros are the old-school concept that does this, because expanding a macro generates the actual code to be compiled or interpreted. Fortunately, macros are not patented. The term originates in the phrase macro instruction, meaning a big or high level instruction, so the word macro doesn't mean anything more exotic than big in the sense of expanding into smaller concrete instructions. Basically a macro is source code level abstraction, rather than runtime execution abstraction -- or at least, that is the opposition involved.

However, to get free use of macros everywhere in code, you would ordinarily have to write all the code in terms of macros, adding an obfuscating level of indirection basically everywhere, just so you could later change macro definitions.

The idea I had in mind is pretty simple. You ask: what if all the code had been macros? It's not all macros, but what if it had been? Suppose we make a tool that injects source-level abstraction by inserting macros where they are now absent, each expanding into exactly what is there now, unless you redefine the macros to do something else, usually to inject side effects like logging or statistics. This c2 quote is relevant:

In a sense, AspectOrientedProgramming is the opposite of FunctionalProgramming. The core mindset of FunctionalProgramming is computation without relying on SideEffects; in a sense, the core mindset of aspects is adding SideEffects. Aspects are like the DecoratorPattern applied to functions.

If new behavior injected by macros were unobservable, you might as well not have done it. So if you can see any effect, it amounts to new dataflow as a side effect -- perhaps an evidence trail about what happened in the original code, for example.
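To sketch what such a tool might look like, here is a toy version in Python rather than C, using its ast module: every call site is rewritten as if it had been a macro all along, with an identity expansion by default, so that redefining the macro injects a logging side effect without touching the original source. The names __macro__ and MacroInjector are invented.

  import ast

  class MacroInjector(ast.NodeTransformer):
      # rewrite every call f(x) into __macro__('f', f, x)
      def visit_Call(self, node: ast.Call) -> ast.Call:
          self.generic_visit(node)
          return ast.Call(
              func=ast.Name(id='__macro__', ctx=ast.Load()),
              args=[ast.Constant(ast.unparse(node.func)), node.func, *node.args],
              keywords=node.keywords)

  def __macro__(name, fn, *args, **kwargs):
      print('calling', name, args)      # the injected side effect...
      return fn(*args, **kwargs)        # ...around an identity expansion

  tree = ast.parse("total = sum([1, 2, 3])")
  tree = ast.fix_missing_locations(MacroInjector().visit(tree))
  exec(compile(tree, '<romp>', 'exec'))  # prints: calling sum ([1, 2, 3],)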

Note another use of code rewrite is to narrow types, making an interface appear to be a facade doing less, just to provably constrain part of a code base to less than all the features in the full interface. (Ordinarily, the motivation for information hiding is to enforce locality, by permitting some things only in well-known places, so analysis can be local instead of global.) This sort of interface rewrite supports exploratory verification that some sub-systems do not use features, even though possible, because you can type-check as if absent. The synonym of 'to romp' we might use here is 'to frisk', meaning to pat-down the code to see whether it carries any use of api it should not (or you hope not).
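As a cheap approximation of frisking -- a scan of the parse tree rather than the type-level rewrite just described -- you can walk the code and report any use of methods a subsystem should not touch. The forbidden set and the source below are made up for illustration.

  import ast

  FORBIDDEN = {'append', 'extend', 'insert', 'pop', 'remove', 'clear'}

  def frisk(source):
      # report every call to a method the subsystem should not use
      hits = []
      for node in ast.walk(ast.parse(source)):
          if (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Attribute)
                  and node.func.attr in FORBIDDEN):
              hits.append('line %d: .%s()' % (node.lineno, node.func.attr))
      return hits

  print(frisk("def f(xs):\n    xs.append(1)\n    return len(xs)"))
  # -> ['line 2: .append()']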

Injecting behavior amounts to advice in aop, but with macros it is just expansion, and since you can see the new expanded version, it suffers less from control flow obfuscation. Where you can inject behavior amounts to a join point in aop, but in romp (rule oriented meta programming) everything is a join point if you can write a rule pattern or specific rule that matches, so everything you understand is join-able, and doesn't need a new named entity. (When it is true of everything, it's not a distinguishing characteristic.)

I'm inclined to use Lisp as meta-language when specifying rule declarations, but the target for injecting macros can be any language, including C (which is my focus currently). It isn't necessary to make a meta-language executable in an imperative sense, since you can simply generate it dynamically, and get the same effect as executing. So declarative semantics is sufficient. You could have as many meta-meta languages as you like, if they generate meta-language in time for consumption by the compiler applying rules to inject macros.

Details in my approach to applying this probably aren't interesting, so I'll stop here since the above covers the basic idea, which might be restated as follows. You can define macros somewhere else, as annotations, without lacing them into code you want changed. Injecting macros can be done by rule, either quite general ones (e.g. patterns) or very specific ones (first use of X in function Y). How the result of macro expansion gets seen or used depends on your development environment.
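For concreteness, here is a guess at what such rule declarations might look like if written as plain data rather than the Lisp meta-language suggested above; every path and name here is invented.

  rules = [
      # a quite general pattern: wrap every call to malloc under src/net/
      {'scope': 'src/net/**',
       'match': {'call': 'malloc'},
       'inject': 'trace_alloc'},
      # a very specific rule: only the first use of X in function Y
      {'scope': 'src/io/file.c',
       'match': {'first_use': 'X', 'in_function': 'Y'},
       'inject': 'log_first_use'},
  ]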

HTML as runnable code, CSS as runtime extensions

I've been building up a particular approach I'd like to take to "isolate accidental noisy stuff from essential basics."

I see programming languages being good for something specific, namely for stateful UIs that are deeply nestable and mostly self-contained. By self-contained, I mean that they are only interlinked with the outside world in ways that are simple enough to boil down to a glossary of common notations (making it a language), which rarely or never change out from under you when you're trying to edit them. The parts that interlink with the outside world usually look like imports and embedded literals (e.g. a string or animated GIF in the middle of the code).

The imports don't necessarily have a known structure. As far as the user is concerned, the import can just refer to "that GIF" or whatever. If they ever need to resort to text editing, perhaps "that GIF" temporarily gets a GUID, and as long as they don't muck with the GUID, their text will keep referring to the same GIF.

In this way, programming languages are a lot like markup languages.

I'd say language runtime extensions are the CSS to that HTML.

By a "language runtime extension," I mean a service that acts as though it's a seamless part of the computation hardware the user is running their code on. It represents something the user is allowing the code to ambiently access, such as the very ability to do computation in the first place. While this is an inherently global kind of extension, if we want to install an extension in a more limited scope, we can install it inside a dynamic eval.

So, I'll go over this again from the bottom up: First we have a core language runtime and the hardware to run it, which rarely change. Then we have a set of ongoing services acting as extensions to that runtime. Whenever the user is maintaining any state that is particularly nested and self-contained, they'll probably use an interface that's a lot like a programming language. In particular, they may use something like a programming language to edit the behavior of their runtime extensions.

If we need to write code that is only the "essential basics," we can just write it under the assumption that there's a runtime extension installed that does all the "accidental noisy stuff" for us. Even if we think of it more as a compiler extension, it can still be a runtime extension that we happen to invoke during compilation.

Something that gets tricky is what should happen if the extension we're depending on doesn't exist. If we want extensions to follow the closed world assumption (CWA), then that lets us write extensions that detect their own error conditions and follow backup plans or provide graceful failure messages. If we want extensions to follow the open world assumption (OWA), then the user can trust that if an installed extension is working, it will keep working the same way even after additional extensions are installed. I'd like to support the good parts of both kinds of extension, but I haven't figured out how to reconcile them. I have the vague idea that I'd like to build two extension languages that can eval each other.

re: runtime extensions

Thanks for taking time to post your ideas. (I'll try to reward posters with feedback if no one else does, albeit perhaps a bit slowly.) Is there a summary sentence or paragraph that gives shape to the approach you want to build up?

I see isolation of accidental stuff as partly a naming issue, because when you separate things that refer to a common entity, they need a way to agree on names for it. (Or at least, a compiler must be able to figure out the name in common.) I had in mind tree-shaped scopes, named by paths, which works okay with a virtual file system, if a compiler consumes source code making statements about code inside a vfs. Presumably associating separated things is a problem everyone must solve with annotation, if context is not merely adjacency.
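A toy rendering of that naming scheme, with annotations kept apart from the code but keyed by tree-shaped vfs paths so a compiler can join the two; the paths and annotations are invented.

  annotations = {
      'vfs://src/net/':              ['profile: on'],
      'vfs://src/net/socket.c#send': ['inject: log_entry', 'view: immutable'],
  }

  def annotations_for(path):
      # gather every annotation whose scope is a prefix of this path
      return [note for scope, notes in annotations.items()
                   if path.startswith(scope)
                   for note in notes]

  print(annotations_for('vfs://src/net/socket.c#send'))
  # -> ['profile: on', 'inject: log_entry', 'view: immutable']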

I see programming languages being good for something specific, namely for stateful UIs that are deeply nestable and mostly self-contained.

I rarely work on UI, so seeing PL as having a UI focus is hard for me. But I'm used to a related idea that interfaces are like languages in general -- that the API of a system is a language for programming it. So whatever you are driving, the way it is done has a kind of language in terms of entities, operations, and state descriptions.

The word extension occurs a lot in your discussion. Architectures oriented toward extension seem to get a lot of coupling in runtime details, with many dependencies, instead of (as one might hope) high independence of parts. Perhaps this is because communication is often expressed in highly runtime specific ways (you will be in my language as well as synchronous, etc) instead of something more message oriented.

Calling at the back door

Thanks for taking the time for that reply! I felt myself being more long-winded than usual in that post.

I could try to give a paragraph to describe my approach at the highest level, but I wouldn't know where to stop. I keep shifting my attention to different parts of the problem: Things like syntax; dispute arbitration; knowledge representation; knowledge communication orchestration; programs as bags of constraints that merely suggest causality flows rather than spelling out every step; and recently, memory layout.

Regarding the particular issue of separating accidental details...

An important aspect of my high-level approach is that I think of every program as having a back door. Its interpreter, runtime libraries, hardware, etc. may be compromised, and there's nothing the program can do about it. If the program opens the back door and bravely faces the person in control, then we can simplify I/O and error-handling designs. We no longer need a front door for I/O, and the program can directly ask how to deal with errors.

Where I said "language runtime extension" before, I'm referring to the set of tools the user at the back door has on hand. If a programmer wants to separate incidental details from the main program, they can write a main program that frequently consults the back door, and they can separately offer a supplementary tool. If a user invokes the program directly, they may find themselves overloaded with consultations, but using the tool, they can set up auto-handlers for most of them.
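A rough sketch of the shape I take this to have, with invented names: the main program routes every consultation through a single back-door call, a bare run falls through to prompting the user, and the supplementary tool is just a registered auto-handler.

  handlers = []   # installed by the supplementary tool, if any

  def ask(question, default):
      for handler in handlers:
          answer = handler(question)
          if answer is not None:
              return answer               # the tool auto-handled it
      reply = input('%s [%s]: ' % (question, default))
      return reply or default             # the overloaded, bare-run path

  # the supplementary tool: auto-answer failures, defer everything else
  handlers.append(lambda q: 'retry' if 'failed' in q else None)

  print(ask('write failed; how should I proceed?', 'abort'))   # -> retry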

It's possible the program and its supplementary tool will be rather coupled to each other, but at least the separation you're talking about is achieved. The original developers may engineer upgrades either for the tool or for the main program independently. Likewise, dedicated users may use reverse engineering (and reading documentation, etc.) to engineer their own compatible upgrades.

I was about to say a few more things about naming and coupling, but I'm afraid of overloading this comment with digressions.

replumbing metaphors and door bindings

I like long-winded, in the sense it takes a few words to say anything interesting, so non-trivial worthwhile notions cannot easily be short. (For folks with long attention spans, hundreds of words are a natural length needing no summary. But point prioritization helps convey intent to focus on one thing or another. Central questions tend to be short, for example. So summary is agenda in abstract, and helps separate foreground issues from background subtext -- action from setting.)

Nice vein of organizational ideas you're exploring, about models of infrastructure. We probably need more well-known examples of model variants, so options come to mind more quickly as folks tune design ideas. I often work with people locked in a death grip with just one model, so other options don't register, and sound like speaking-in-tongues when I depart from the expected cliché. Monolithic centralization is often assumed, as if air travel is impossible without getting permission from the flight control tower, so talking about drones will sound like gibberish to someone thinking airport.

The back door metaphor seems useful. (A front/back spatial metaphor is somewhat horizontal, with front closer and back farther away, situating the POV of a speaker at the front where an interface is, with the backend away from the interface.) Sounds similar to a trusted computing base (TCB), insofar as you emphasize relying on something you cannot control, and therefore must accept the risks inherent in dependency. Since someone always has permission to go modify the trusted base, and that someone might be you when debugging for example, it doesn't help to have code on top make too many limiting assumptions.

A base or bed spatial metaphor is somewhat vertical, emphasizing a stack of layers, as in a cake, with trusted parts in a bottom foundation. I like the word bed for TCB because it's just one syllable, hard to confuse with other words, and rarely appears in tech contexts. So it would work okay in coining terminology about layers. However, an up-down orientation is somewhat monolithic, making the bottom feel more sacrosanct than it is, since moving a foundation can wreck a building. When coding you often do it anyway, so the base is not unreachable. Perhaps a fold spatial metaphor works slightly better, as if software is exotic origami with a convoluted arrangement. If the bed is visible in the api where you can insert plugins, you can replace the TCB with dependency injection.

Software is very refoldable, so moving trusted dependencies around is not hard. The bed or back door of a module should be pluggable, and only needs to be folded the right way so access is easier, as long as you are that someone with permission to yank the tablecloth from underneath all the dishes anyway. (Of course, you don't want privilege escalation to be possible.) Folding software fluidly this way is really awkward in some architectures, when the bottom is riveted and cemented in place, so change involves too much demolition.

I've been working on painfully complex software many years now, so it's hard to think of it as one program rather than many cooperating independent entities. A model of lots of interacting parts is more fluid. But there's too much fragility (in what I see now) caused by some parts being frozen (riveted and cemented) as native OS processes, while other parts are frozen as ad hoc callback FSMs composed of webs of messages, timers, closures, queues, and object-based state machines. It would be simpler under a uniform abstract process model, so refolding was afforded by uniformity, consistency, and abstraction.

Patterns

One approach would use the equivalence between gather / scatter operations and map+filter / projection, at least in the context of working on arrays in parallel code, i.e. GPU-style code. The filtering step can be implemented with varying degrees of expressiveness / power. In simple numeric codes it could be an integer expression to compute array indices; in Python list comprehensions, an arbitrary expression is allowed. Where this tangent ties back into the discussion is that picking an expressive form of filter gives the programmer some degree of control over these gather operations on code.
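In plain Python terms (a sketch of the equivalence only, not tied to any GPU API): the filter computes the index set, gather reads through it, and scatter is the projection writing back through it.

  xs = [10, 20, 30, 40, 50]
  idx = [i for i in range(len(xs)) if xs[i] > 15]   # the filter step
  gathered = [xs[i] for i in idx]                   # gather: [20, 30, 40, 50]

  out = [0] * len(xs)
  for i, v in zip(idx, gathered):                   # scatter (projection)
      out[i] = v
  # out == [0, 20, 30, 40, 50]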

Using the language grammar as the basis for a pattern grammar would allow the programmer to capture fragments (inlined strings) of lexemes (language constructs). This would look very roughly (using pseudo-Python syntax) like:

  decls = [ (X,Y)  for decl var X of type Y  in functionK ]

So basically: if you give the programmer a way to iterate over the program parse-tree and a pattern matching syntax for the host language then reflection becomes very expressive. As the patterns become more complex an efficient method of performing the search is needed. This was discussed a little in the Infer topic. Obviously this is only half the picture, and inserting / modifying views of the code is critical. I think that John has the right idea in his comment above; this also reminded me of aspect weaving. Unlike the gather side, expressing projection of changes back on to the code seems difficult. In aspect weaving the process that modifies the code is external, operating on fragments that do not have the power to rewrite the program.

The issue of controlling permission to alter code seems to be key: without any form of permissions it would be a high-level approach to self-modifying code. Any form of permissions would need to take into account the classical difficulties with self-modifying code: race hazards, atomic sets of alterations to prevent illegal intermediate code, etc.

CSS has the useful property that rules do not change the shape of the tree; they only overwrite properties stored within each node. This does suggest some limited scatter operations on sets of nodes (e.g. declarations) within the program that may be similar to what you suggest (again in pseudo code):

  decls = [ decl for decl in someFunc
                 where decl = var X[Y] of Z ]
  for d in decls:  d.access = private
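For what it's worth, both sketches can be approximated in runnable Python against its own parse tree, with annotated assignments standing in for "var X of type Y", and an invented access property overwritten CSS-style without reshaping the tree:

  import ast

  src = 'def functionK():\n    x: int = 1\n    y: str = "s"'
  func = next(n for n in ast.walk(ast.parse(src))
                if isinstance(n, ast.FunctionDef) and n.name == 'functionK')

  # decls = [ (X,Y) for decl var X of type Y in functionK ]
  decls = [(d.target.id, ast.unparse(d.annotation))
           for d in ast.walk(func) if isinstance(d, ast.AnnAssign)]
  print(decls)   # -> [('x', 'int'), ('y', 'str')]

  # for d in decls: d.access = private
  for d in ast.walk(func):
      if isinstance(d, ast.AnnAssign):
          d.access = 'private'   # node keeps its shape; a property is overwritten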

re: patterns

To me a pattern is a rule that applies to more than one place; it need not involve regular expressions, which some people associate with patterns. (I don't think I would ever use regex when injecting macros by rule.) I see you are not referring to regex either. But your patterns still seem a bit text focused.

Using the language grammar as the basis for a pattern grammar would allow the programmer to capture fragments (inlined strings) of lexemes (language constructs).

For injecting macros by pattern rule, I don't think I would use language grammar as part of pattern grammar. I would prefer that patterns in meta-language refer to semantic concepts in a parse tree. I'd also include ideas like "before everything" and "after everything", which don't reference a parse tree node per se.

The more surface-oriented patterns are, the more fragile they ought to be, the same way that a C pre-processor working mainly with source text is awkward, error prone, and (um) intention-erasing.

So basically: if you give the programmer a way to iterate over the program parse-tree and a pattern matching syntax for the host language then reflection becomes very expressive.

Regarding expense, injecting macros need only occur when original source changes, if your intention is to track versions instead of freezing to an earlier specific version. Whatever the cost, it's cheaper than doing it by hand. I wouldn't aim to regenerate everything for every build in any case. [Edit: yes, I read your expressive as expensive, making this and the next paragraph a non sequitur of sorts.]

So far, the idea of what I might write patterns for does not strike me as expensive to execute. It seems less expensive than fully understanding the source, because it involves only recognizing where something happens and/or uses of relative relationships. If you wanted to completely rewrite everything by rules rather than inject macros, that would get more expensive, sure. But that seems more like an interactive developer-driven process anyway, with human UI latencies too.

As the patterns become more complex an efficient method of performing the search is needed.

I agree optimization is usually relevant. :-) I just think of it as interactive and incremental, rather than a roadblock.

This was discussed a little in the Infer topic.

I wasn't that interested in the Facebook project, though I do find their infrastructure issues interesting. (Generally I don't respond to feelers from Facebook recruiters, because I don't use Facebook, and it seems weird to work on anything you don't care about. It's easier to work at places that have no real public application face anyway, so one need not pretend to care about user interfaces.)

I think that John has the right idea in his comment above, this also reminded me of aspect weaving.

Yes, injecting macros by rule amounts to code weaving. But it would only be aspect weaving if you carefully mimicked a specific aspect feature.

Unlike the gather side, expressing projection of changes back on to the code seems difficult.

Write a new copy of the code with macros injected (and optionally expanded).

The issue of controlling permission to alter code seems to be key: without any form of permissions it would be a high-level approach to self-modifying code.

Yes permission matters. I've been treating that as an application detail I expect folks not to find interesting. I would allow meta-language to express constraints, which could be contradictory and result in errors of course. And using a virtual file system, there would also be virtual user level permissions for coarse grained effects.

Implicit configurations

Sounds a bit like some of the things in Oleg Kiselyov's implicit configurations paper. I believe I picked up that reference here on LtU, but I'll recycle it again:

Functional Pearl: Implicit Configurations

Ignore both the noisy "phantom type" mechanic and the brain-exploding reflection stuff, and look at what he does: you can provide typechecked local typeclass instances that propagate along with the unification-style type inference.

That functionality is one of the big things I want for my language project. I'm not sure what the best way to provide it is, but I hope there's some way to shave off all the hairy bits and take home a bald yak...

agree configuration is a use case

Configuration does seem like one of the use-case reasons to version code interfaces and implementations.

Beta language fragments

Beta Language's fragment system seems to do what you want.

Thor called and asked for his hammer back.

Thanks, I read that and also searched, but reward was sparse, especially that fragment spec which reads like a reference, providing more examples than background explanation. It needs more walls of text. :-) I like more vertical space used by natural language prose than quoted code examples.

I was unable to see any relation to my post until searching Google for Beta language metaprogramming. And then I could see it was used as an aspect weaving touchstone by some older books. I appreciate the effort.

Should've provided some context....

I should've provided some context and a better reference.... IIRC you can leave "slots" in parts of your program (basically certain kinds of syntax nodes in an AST). A slot has a name and a syntactic category. You can then construct a fully realized program by plugging in code fragments from other files. The slot name is used to match a fragment with a specific slot and the category is probably used for simple checking.
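If I have that right, a toy version of the mechanism might look like this, with the slot syntax, names, and categories all invented for illustration:

  import re

  program = 'void send() { <<SLOT body : statement>> }'
  fragments = {'body': ('statement', 'log(1); flush();')}   # from another file

  def plug(text):
      def fill(match):
          name, category = match.group(1), match.group(2)
          frag_category, code = fragments[name]       # match fragment by name
          assert frag_category == category, 'wrong syntactic category'
          return code
      return re.sub(r'<<SLOT (\w+) : (\w+)>>', fill, text)

  print(plug(program))   # -> void send() { log(1); flush(); }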