request for namespace binding service terminology

(This is about semantics in a programming language for creation of responders.) I've been thinking about something off and on recently, and I wonder what the name of this idea is within the context of PL (programming language) terminology, if there is a standard name. If there's not a conventional name, I wonder if it resembles anything else. Also, if the idea itself seems incoherent, or inferior to a better way of looking at it, I'd like to hear that too.

When we write code to pragmatically do something, it calls an interface that already exists, that is already established, which is ready to respond to such invocation. At a given moment in time, some number of callable functions exist in a current process, as well as executable programs in an operating system context, and reachable servers over comm systems (of which networking is a subset). Tools at hand willing to respond to new uses thus offer affordances for latent services that can be invoked. System design includes arranging that the affordances an app needs will exist at runtime, and bringing up a complex system, when it starts, typically involves carefully ordered initialization so components are ready before first actual use, or can lazily wait until a service is ready.

The idea I'm thinking of is related to bootstrapping, but that idea often refers to traversing a well-known path known statically ahead of time: getting from start at A to everything ready to go at B, where A and B are pretty much the same most of the time. I'm interested in how one plans a programming language or app framework (etc) so the concept is explicitly addressed by the tools, so there's a way to talk about designing points A and B in time, within a programming language for example. It's related to both compile-time build processes and the runtime data structure creation necessary when a program starts, before some operations can be supported. If servers and other actors are launched and wait to handle messages and connections, it's also about language used to describe the timing, types, and semantics of such behavior.

If you write an app with a virtual file system, a configuration phase that mounts resources where they can be found by clients later is also under the same semantic umbrella. Managing namespaces in general, and actually creating bindings at runtime, are also semantically related. So module systems in a programming language touch on part of the same topic, if the element of time is explicitly handled. Managing the dynamic evolution of binding systems in time is the part I want a sub-lexicon of terminology to address, rather than devs merely assuming an implicit common understanding no one ever talks about. (It's hard to bring language tools to bear on things never encoded.)

I know Erlang has conventions for managing restart of processes that fail, and that sort of thing. Is there a larger practice outside Erlang that gives names to elements of service management? Best would be terminology applicable at all times during program execution, and not just launch, so it's easy to talk about stopping and restarting embedded sub-systems without terminating host operating system processes, with jargon re-usable outside just one framework. Every time you think of another word that gets used to describe part of such processes, I might find it useful, so even short posts mentioning specific words are helpful.

My point in posting here? It seems reasonable for a programming language to support writing specifications for the time-dependent behavior of bindings: when callable entities with dependencies on one another become visible, and when that visibility starts and stops. I suppose I could re-purpose terminology used in a narrow context for use in a more general way, but that easily confuses people.


A hard problem...

"There are only two hard things in Computer Science: cache invalidation and naming things."

But seriously, the way you're thinking about this is very general, which in some ways makes it less likely that you'll find established terminology. I can't think of anything at the language level that seems quite like what you're talking about, but at the systems/architecture level, it's reminiscent of "service discovery" and "service registry." You could try a search for those terms, but I suspect you'll immediately get mired in a bunch of "enterprise"-smelling stuff that doesn't interest you.

Maybe you could get some mileage in PL-land from phrases like "first-class environments," but I really don't think that's actually what you're talking about...

close enough

I appreciate any "that reminds me of" reactions. I was hoping to avoid naming things any more than necessary. I had some systems/architecture exposure to service registries in mid-90's OpenDoc. You're right that enterprise-smelling solutions aren't that appealing, but even then I'm still interested in problem statements establishing precedent in use cases. When you put it like that, I suspect searching for broker might turn up some perfectly awful enterprise solution in ORB (object request broker) prose. Ah, "broker rendezvous" looks like a hit. I bet trawling through B2B stuff would turn up related semantics. Another good word to examine is discovery, which turns up in network architecture where I work. And pub-sub may be a space that adequately outlines problems of the sort I meant.

Actually, that probably gives me a long enough list of words to introduce a topic in relevant async service publish/subscribe problems, which can be addressed with lightweight processes listening for new offerings to appear, to drive new component connections when the option becomes available. Most likely subscribe covers the concept I was interested in handling: notify me when new bindings appear that I might want to leverage. I can probably use a description of the way the desktop file worked in ancient MacOS to explain basic ideas in concrete terms, since it acted as a registry of executables understanding document types.

Architecture with partly passive discovery can be really irritating to debug because failure to connect causes nothing to happen, and then you're stuck debugging an absence. But an instrumented notification system can show whether first steps ever happened.


One last tangential comment... A blog post, What's in a name? from Olin Shivers. I ran across the link on Twitter and it reminded me of this thread...

that's a good and relevant link to Shivers piece

I see how Olin Shivers' post reminds you of this thread. (Quotes below are from What's in a name? you cited.) I would like other folks to read that, too, so these remarks have that objective, by quoting good parts and saying why anyone should care. I see Shivers posted that just after the first run of discussion here: coincidence.

Most of the time, my work involves naming systems and their design, debugging, upgrade, transformation, scaling, refactoring, optimization, and other alterations you might consider at the C language level. To the extent what I do involves math, it's usually about arrows and contexts in which they make sense, including cache invalidation. :-) Names, IDs, refs, pointers, keys, etc are all ways of talking about equivalent things—or parts of them—in different contexts. I distinguish between a lot of subtle differences in states for these things (far more than Eskimos supposedly had for snow), but these states don't have standard names, because most folks don't talk about them. Shades of timing, scope, conflicts, and validity contribute to dimensions. Time is a big part. When does this ID start being meaningful, and when does it stop? How is that affected by ID size and sync/async, local/remote, distributed coining of unique/non-unique names?

I hardly think about it consciously any more, like a fish presumably can't perceive water. I have trouble remembering a time when I didn't have a perspective like one Shivers describes, just like it's hard to remember not being able to program, or read before the age of five. I encourage young engineers to get this viewpoint sooner rather than later, as a prerequisite for hacking complex systems.

Newell and Simon's Turing award cited them for their work on "symbol systems." Newell once told me that this was just names, and then he explained his understanding of names: "They provide distal access." That is, a name is a local piece of data that stands for some other piece of data, which is presumably large and remote.

Folks with an OS perspective think of pointers in virtual memory as symbols too. Similarly, Intel machine code is just an abstract specification of intentions, built on the instruction architecture, that's interpreted by hardware by dynamic conversion into micro-code at execution time. Storage systems are just different kinds of address spaces and naming systems. While symbols might not go all the way down, they go pretty far down.

So, I've now said the same thing a couple of times. Why not say it again? A name is an arrow: a link from arrow tail (reference) to arrow head (binding). (Or as compiler hackers prefer to say, from "use" to "def")

A lot can be said about the nature of source, arrow, and destination: when endpoints exist, the order in which they appear, as well as timing of name visibility, and when/how the arrow can be traversed, as well as morphisms of the arrow in different contexts that still amount to the same arrow.

I'm currently working on a refactoring of IPsec code where IDs that were static need to become dynamic, when referring to things that only start existing after negotiating narrowed traffic selectors. It's pretty hard to change code that assumes a particular life-cycle for IDs, without a written statement of those assumptions and the actual constraints that existed before code was drafted. The analysis of poor fit and what must change is almost entirely in terms of name mechanics.

Can we do a separate topic

Can we do a separate topic for that? I have a lot of problems with this, especially where he starts talking about the lambda calculus.

fork new discussion

I expect no further response to my OP, so following up Shivers' post in a new topic (perhaps so you can pick it apart) seems a good idea. No reason to associate it with this one.


Prolog can solve these kinds of dependency problems. That's good and bad news. The good news is it's an established technology; the bad news is that solving the general case requires a backtracking search. To guarantee termination (where possible) and find the shortest dependency chain would require a breadth-first or iterative-deepening search. The former would be faster, the latter more space efficient.


Maybe I can work up an analogy between Prolog backtracking and async publish/subscribe.

Database modifications

Naively, wouldn't publish be equivalent of adding new rules dynamically to the database (most Prologs support this), and subscribe the equivalent of evaluating a query on the database?


I haven't used Prolog, but when I Google for "Prolog query database" I get so many hits on laymen's explanations, I feel I can hazard a guess here. So far a Prolog database sounds neither particularly async nor dynamic. A query against something missing replies "no" and doesn't block until the answer is yes.

But the answer to your question would be yes, it's equivalent, if the following were possible. Suppose you query first and get "no-such-service" in response; then you might want to issue a — thread or fiber style — blocking query-until-yes that expresses your interest in getting a yes answer later, when available. (Alternatively, you could subscribe for async notification.) Then when you publish new database facts later, causing the answer to become yes, you notify listeners and wake up anyone blocked on such notification. For full credit, you must also be able to cancel queries, so you can give up and get out before reply.
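The blocking query-until-yes idea can be sketched with a small in-memory registry, where publish plays the role of asserting a new fact and a condition variable wakes blocked queriers. All class and method names here are invented for illustration; cancellation is approximated by a timeout rather than a real cancel operation.

```python
import threading

class ServiceRegistry:
    """Hypothetical fact store with blocking queries.

    publish() is like dynamically asserting a Prolog fact;
    query_until_yes() blocks (thread/fiber style) until the fact
    appears, or gives up when the timeout expires."""

    def __init__(self):
        self._facts = set()
        self._cond = threading.Condition()

    def query(self, fact):
        with self._cond:
            return fact in self._facts  # plain query: yes or no, never blocks

    def publish(self, fact):
        with self._cond:
            self._facts.add(fact)
            self._cond.notify_all()  # wake anyone blocked on this answer

    def query_until_yes(self, fact, timeout=None):
        with self._cond:
            # wait_for returns False on timeout, True once the predicate holds
            return self._cond.wait_for(lambda: fact in self._facts,
                                       timeout=timeout)

registry = ServiceRegistry()
assert not registry.query("library_b(int)")       # currently "no-such-service"

t = threading.Timer(0.1, registry.publish, args=["library_b(int)"])
t.start()                                         # fact published later
assert registry.query_until_yes("library_b(int)", timeout=2.0)
t.join()
```

An async subscribe variant would register a callback in `publish`'s critical section instead of blocking a thread, but the wake-on-new-binding shape is the same.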

Small API tweak

I see, I had assumed the client would query the database and get a dependency chain. It would then use system services to check each of the services and see if they are running, starting them if they are not through a system API. This separates the dependency resolution from the task of starting and monitoring system services. You could provide a client side library that wraps it all up. This could be more efficient on a multiprocessor system as more of the work is done by the (multiple) clients instead of the (single) system service.

The Prolog-like language would just describe the dependencies, but has the advantage dependencies can be parameterized. So if you have a database:

1. library_b(int).
2. library_b(float).
3. library_a(X) :- library_b(X).

So two versions of library_b are available, and library_a can use either depending on what you ask for, so a query could be ":- library_a(int)." The result would be "yes" and the proof of this is the dependency chain:

3.  library_a(int) :-
3.1     library_b(int)


I suggest: "Configuration"

Software configuration management, network configuration language, module configuration language (c.f. Scheme 48), ...

I think if you poke around looking for concepts that recur across "configuration" domains you'll find things like interfaces and clients, questions of recursion and dependency, change management, modeling, ...

all the world's a configuration

Configuration seems a good way to frame part of the discussion. A shorter non-Latin synonym might be setup, but I tend to abbreviate it as config anyway. I'm used to configuration being a statement of conditions desired to be true, which require ongoing interpretation to establish, without the config info itself helping accomplish this. I'm slightly more interested in staging that causes a config to apply on demand.

Most devs I know view configuration as static and lacking a time dimension. It can encourage a view of the one-true-way things always are, where nothing is subject to change. This discourages a view of design as partly one of stage setting. Don't have a tool that does xyz? Then add it, and mount it there. Now you do. There's an element of inversion-of-control involved. Before you can knock over dominoes with a single push, they must first be set up just so. The ability to remount different things between runs is a way to get polymorphism out of configuring the environment.
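The remount-between-runs idea can be sketched with a toy mount table, where config-time binding of a path to a provider decides what client code gets at runtime. Everything here (class names, paths, providers) is invented for illustration.

```python
# Toy mount table: configuration binds paths to implementations,
# and clients resolve through the namespace rather than linking
# directly. All names here are made up for the sketch.

class MountTable:
    def __init__(self):
        self._mounts = {}

    def mount(self, path, provider):
        self._mounts[path] = provider  # (re)binding is just config

    def resolve(self, path):
        # longest-prefix match, the way a VFS typically dispatches
        best = max((p for p in self._mounts if path.startswith(p)),
                   key=len, default=None)
        if best is None:
            raise LookupError(f"nothing mounted at {path}")
        return self._mounts[best]

# Same client lookup, different behavior per run, purely via config:
env = MountTable()
env.mount("/tools/xyz", lambda arg: f"real xyz({arg})")
assert env.resolve("/tools/xyz/run")("a") == "real xyz(a)"

env.mount("/tools/xyz", lambda arg: f"stub xyz({arg})")  # remount a stub
assert env.resolve("/tools/xyz/run")("a") == "stub xyz(a)"
```

The inversion-of-control part is that clients never name the implementation, only the mount point, so swapping what lives there is a configuration act rather than a code change.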

Maybe staging is a good metaphor, where this is defined as configuration, to complement the idea of actors, who need a stage for performances.

configuration abstractions and metaphors

I would consider at least casually test-mining some existing "configuration" literature and practical work for suitable abstractions and metaphors, particularly SCM and network configuration management.

Other stuff like this:

can also sometimes yield productive metaphors (or not).

I don't entirely get what you're looking for, but when you talk about instantiating things in certain orders I think you are talking about a very general problem of either expressing non-recursive descriptions of essentially recursive computations, or deriving non-recursive descriptions from recursive expressions. Specific instances of this problem are what is going on with C's need for certain forward declarations (non-recursive expressions) and Scheme's need for a way to compile letrec (deriving a non-recursive description from a recursive expression).

Network and SaaS configuration foo may be sources of concepts for managing configuration changes, both planned and imposed.

hope this isn't too long

Reading the wikipedia Configuration_management page you cited, so far I don't see much relation to my post beyond vague concern for configuration. (Apologies in advance if lists below occupy excess vertical whitespace in pursuit of clarity. It's an attempt to avoid being vague, rather than be didactic or bombastic, which it might resemble instead.)

I can explain what I'm looking for, but maybe I should first note what I don't care about. I don't do hardware, and don't care about hardware beyond how it presents itself as interface that could just as easily be software. (Distinction between hardware and software doesn't matter to me. Ability to connect to interfaces based on symbols is all that matters, so symbols and their dispositions are the aspect of configuration I care about.) I also don't care about describing systems to other folks when I'm not involved. I develop software, and I'm interested in tools I use, or my peers use when interacting with things I develop, or that we develop jointly. The wikipedia page seems to focus on status of a system over its life cycle as a product, especially with regard to whether it satisfies obligations of parties to one another. That might or might not be related to what I need as an individual.

When I connect to a system to run any sort of experiment, I want to:

  • See how it's configured now.
  • Configure it a new way.
  • See how configuration changed. (Is it what I expected?)
  • See what happens when I run an experiment.
  • See evidence of how configuration influenced what happened.

When talking with peers about developing software, and testing what we develop, I want to:

  • Use plain language when talking about making changes.
  • Discuss how those changes will work in terms of symbols, bindings, and configuration.
  • Agree to a scheme to express how config will occur before running experiments.
  • Have means to compare notes on which config was known in use in each experiment.
  • Be able to repeat someone else's config when reproducing an experiment.
  • Be able to diagnose a system under study by looking at its config.
  • See an accurate summary of config in bug reports that dumped config to logs.

I'm one of those rare people who likes unadorned s-expression syntax. I have no problem specifying anything and everything as trees and lists of pairs in syntax indistinguishable from Lisp. I detest XML, however, but not to the extent of wanting to wage holy wars over it. More like, detest in the sense of: wrinkling nose in disgust when a date barfs all over you in a movie theatre — a mixture of wine, pasta, garlic and popcorn. Basically: yuck. But good manners requires you to say, disingenuously, "It's okay, I'm sure it will come out."

In my current work environment, one leg of setup and configuration involves XML schemas, which are then used to auto-generate C++, which runs in one of the processes that sends around messages in a mysterious unspecified format. So it's not hard to see a log message that mentions a symbol, where that symbol appears nowhere in the codebase, so you can't find it by searching, because it only exists transiently during the build process before erasure. All you get is: sorry, there was an error in mysterious auto-generated code you can't find, good luck.

I say that mainly as an example of one way to do things that seems non-ideal, and is not in keeping with an idea of seeing a relationship between plans, what actually gets built, and what happens at runtime, logging notes, then comparing notes with anyone who repeats an experiment. It sucks as scientific technique, because evidence and audit trail is abysmal.

The ways in which this relates to posting in a programming languages forum include:

  • There should be a model people can discuss in normal language about what code will do.
  • It should be possible to express some of that in writing.
  • The written version should be amenable to machine processing.
  • A machine-processed form should support automated config with inspection.
  • A diff of config now and config desired might generate a human readable script to change configs.
  • Config ought to apply to every single crazy software level change needed, not just some.
  • Compiling config specs for execution in different contexts ought to be feasible.
  • Better PL style would mean the model makes sense, syntax is readable, and writing tools is not gratuitously hard.

Note I hope curt and dry expression doesn't sound like ideological aggression. Rather, I expect you would find it funny, being a pretty smart guy. One can hope anyway.


I've never seen these used outside of Tron, but they were used to describe the creation of actors, without any messy details. Tron Wikia -- rez/derez. I've been meaning to use these for something, but I've never really had need to describe anything with that sort of generality. And I'm chicken.

Can probably fit rez and derez verbs in somewhere

Rez and Derez are also names of 80's MacOS resource compiler and decompiler tools going back to (at least) System 6 and System 7, and apparently still used in Carbon for OSX. If those were ever intended to reference Tron, I never heard anyone at Apple say so, but it's possible. Amusingly, Urban Dictionary cites both the Tron and Apple resource compiler meanings next to each other, without assuming any relation.

These days folks usually mean something abstract by resource (space, time, or content available at specific compute nodes), and not the data element sense present in Inside Macintosh style Mac applications, where parts of an executable were stored in resource forks where you could fiddle with them — as part of configuration for internationalization or UI preference design changes. MacOS resources were carefully segregated data fragments you could inspect and/or edit to update a program's runtime behavior.

Some folks also used MacOS resource forks as a poor man's database, despite do-not-do-this warnings from Apple, and it was perfectly awful because of scaling limitations like a small max entity limit. The first btree implementation I wrote in school was actually a replacement for a resource-db prototype in a Mac app. That was indirectly how I ended up writing btrees for OpenDoc storage, because one of those guys (Hi Nick) worked on the OpenDoc team later, so I was pre-vetted for "understands data multiplexing" in oo apps written in C++.

Interestingly, when they cancelled OpenDoc, the NeXT folks explained it was redundant with existing NeXT features which "had services already"; so apparently the only OpenDoc feature considered noticeable was editor binding, as a step in transforming data from one thing to another. Maybe it's another no-accounting-for-taste thing, but they were adamant about the "have services already" position. It seemed like a weird perspective.

Service Life Cycle, Typestate, AmOP, etc.

Keeping track of service states and their dependencies - especially during initialization and finalization - has been described as a life cycle in several documents I've read, albeit those documents are aligned more with enterprise frameworks than with PL.

There is perhaps some relationship to PL features:

  • typestate focuses on how access to object methods can change across invocations; cf. languages Plaid and Rust
  • ambient oriented programming focuses on how (mobile) agents can discover and hook into available resources
  • kell calculus focuses on spatial distribution of resources
  • join calculus focuses on temporal distribution of resources

In my opinion, we should be focusing on resilient designs that are insensitive to issues like startup order and service disruptions. In the context of resilience, the whole idea of modeling service life cycles is quite flawed... i.e. because initialization-time bindings inherently mean we cannot easily rebind without re-initialization. That makes it difficult to switch from primary to fallback services or vice versa, without a big restart button.

So rather than a stateful life cycle, I'd rather have service state be a simple function of the states of other available services... almost as though every service is continuously restarting.

ambient is a cool term

Your focus on resilience is completely agreeable: yes, more of that please. The last time I designed a dedup protocol, I had a goal of replacing an earlier version that was so complex it was like balancing shelled eggs on a plate, where handling numerous error cases was hard. I started by saying failure was expected with this low frequency, and when it happens, we NAK and go on; then all the error paths led to NAK, and it became simple. (Except for initialization, of course, and the part where the cache in another process falls over and can't get up. But that's life.)

I pretty much hate calculi as dry and uninspiring. (Maybe they inspire someone, but it isn't me.) Wikipedia's ambient calculus entry is pretty interesting, though, because it defines a general idea that captures the useful notion of scope with boundaries.

I'm primarily interested in message-oriented programming. I view everything as bit-strings moving around, arranged in high level organization as messages. I can even look at function-calling as message-passing, if you push args on a stack and treat jumping to code as a message meaning "take these args, do it, then reply by returning." Got bit-string transactions? I see messages. (I'm probably biased toward this view by 30 years of backend data-plane focus.)

Some of the calculi cited presume mobile code is a good idea, which I don't, outside installation and configuration. I view data as mobile, while code is under draconian lockdown. Code as data only comes from within the highest trust boundary, never from what a stranger hands you, unless run in a throttled sandbox. Kell calculus is a variant of ambient calculus; ambient refers to an execution context that moves, and kell is a spelling variant of cell, meaning locality according to Stefani, 2003. So those seem to be about moving code, which doesn't apply to any development context I see waxing now.

The join calculus seems to be about structured rendezvous and synchronization barriers, and I don't think of any of that as hard. I just take it for granted suitable synchronization primitives will be spun up at need, and I'll provide several up front. The timing part of async code doesn't bother me. I can probably make basics of reasonable code usage clear to others without resorting to theory-oriented authority.

I really like the typestate link, though. Thanks! It expresses exactly something I was talking to a coworker about, around a year or so ago. Basically, I pointed out that mutable types effectively change sub-types when mutated, if this alters what methods are legal to call in the new state. The discussion came up in the context of our chagrin at debugging code that effectively called methods in an API at random, instead of obeying any of the injunctions of the sort "only call B after calling A". All the rules had been ignored. We realized we should have run a state machine inside, tracking which methods could be called, so we could assert on violations. This would have been dynamic typestate tagging.
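The run-a-state-machine-inside idea can be sketched as a small dynamic typestate tracker. The API, states, and method names below are all invented; the point is only that each call checks a transition table and fails loudly on an out-of-order call.

```python
# Dynamic typestate tagging: each method call is checked against a
# (state, method) -> next_state table, so "only call B after A"
# violations trip immediately instead of corrupting state later.

class ProtocolError(Exception):
    pass

class Typestate:
    def __init__(self, start, transitions):
        self.state = start
        self.transitions = transitions  # {(state, method): next_state}

    def check(self, method):
        key = (self.state, method)
        if key not in self.transitions:
            raise ProtocolError(
                f"illegal call {method}() in state {self.state!r}")
        self.state = self.transitions[key]

class Session:
    """Contract: open() first, then send() any number of times, then close()."""
    def __init__(self):
        self._ts = Typestate("new", {
            ("new", "open"): "open",
            ("open", "send"): "open",
            ("open", "close"): "closed",
        })

    def open(self):  self._ts.check("open")
    def send(self):  self._ts.check("send")
    def close(self): self._ts.check("close")

s = Session()
s.open(); s.send(); s.send(); s.close()   # legal sequence passes
try:
    Session().send()                      # send before open: violation
except ProtocolError:
    pass
```

In a statically typestate-checked language (Plaid-style) these transitions would be part of the type; here they are just asserted at runtime, which is the cheap retrofit version we wished we'd had while debugging.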

I view data as mobile, while

I view data as mobile, while code is under draconian lockdown.

So you never use SQL?

little bobby tables

Can you answer the question for yourself and explain why you ask? Then I'll answer for myself, and say what my company does too, provided you first say whether you ship any product yourself with SQL used, in fair trade.

When I design an interface, I take care to avoid limiting how it's done, so I only require what I need in contract semantics. A pluggable store can use SQL, or not, under the interface visible to a store's client. Swapping things for comparison is a good idea. And sometimes a dumb but very simple version helps diagnose interactions. For example, an in-memory database for small data-set unit tests is easy to step through in gdb.
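That contract-only stance can be sketched as a pluggable store interface with a dumb in-memory version behind it. The interface and class names are invented for illustration; a SQL-backed implementation would satisfy the same contract.

```python
from abc import ABC, abstractmethod

# The client-visible contract says nothing about SQL or storage engines;
# it only requires put/get semantics.

class Store(ABC):
    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

class InMemoryStore(Store):
    """Dumb but very simple: a dict-backed store that's easy to step
    through in a debugger for small data-set unit tests."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

def client_code(store: Store):
    # Clients see only the contract; whether SQL sits underneath
    # is the store's private business.
    store.put("k", 42)
    return store.get("k")

assert client_code(InMemoryStore()) == 42
```

Swapping implementations for comparison then means constructing a different `Store` subclass and passing it in; the client code never changes.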

You can infer correctly from this that SQL falls under a "don't care" operational heading, for me. But I still haven't answered your question yet, if you want to trade.

Sure, I use SQL all the time

Sure, I use SQL all the time (or query DSLs that compile to SQL). My point was only that mobile code is more prevalent than your first post seemed to imply (Javascript in browsers is mobile code too, so you're using it right now without sweating over the security implications).

Its use is only going to grow from here because in terms of efficient use of resources, it simply can't be beat. Limiting its use only to the "highest trust levels" isn't the way forward though. Proper isolation is the way to go, and dmbarbour below cited some good links in that vein.

not at the moment

I have never used SQL, but expect to use it as an option in something, for form's sake. I'm going to do something pretty strange instead, in a throw-away environment using a virtual file system, which will at first persist to tar file format, after honoring path maps to include, exclude, or reroute trees noted in config. At first I thought the tar file idea was more on the stupid side than usual, for a testing stub option; but then I noticed a lot of entertaining possibilities, which I won't enumerate since I don't see a PL angle. But at the moment it's my source and destination target for source code rewriting, for global code analysis and transformation. Being able to unpack it and use normal tools with regular files is a nice fringe benefit. If my employer uses SQL anywhere, I haven't noticed any sign.

(Javascript in browsers is mobile code too, so you're using it right now without sweating over the security implications).

Actually I'm not using Javascript right now, and I do sweat the security implications. I always have Javascript disabled, except for small windows in time when I need to get something done. And then I wipe my environment after turning it off again. What I really want is a button to toggle Javascript on a per window basis.

tar file features

Looks like this topic is done, so here's a couple more bits before closing up shop. In a PL level model, I want to reveal some structure within an everything-is-a-file subsystem. Limitations and quality of service would be exposed through some reflective api, so it could be quite limited, or quite involved depending on the host env.

In a host env subclass persisting to a tar file, append-only semantics can be used for a log-structured file system, making it easy to avoid losing things, with smaller hot-spots in i/o access patterns. In addition to unused parts of tar file format, meta info can go in files persisted alongside others, to track extra attributes and relationships (probably in text format if one plans to unpack into a file system and examine via text editors). I'd include VMS style version numbers, so old versions of files stay around until explicitly abandoned (or collected due to explicit config rules). And checksums add a bit of integrity checking, to help verify relationships are likely still true: file B with checksum and size (bc, bs) is the output of file A with checksum and size (ac, as) run through transform T. Stuff like that, oriented toward app check-pointing, as opposed to hardcore database features. When everything lands as a separate file in an archive, you can browse all inputs and outputs, and support obvious kinds of debugging.

Append-only stores support naive global atomic transactions by marking start and end of all changes grouped inside one transaction. That lets you make consistent sets of file updates in something like a tar file as primitive database. It would be good enough for multi-stage source code compilation that leaves behind all intermediary steps for debugging.
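The begin/end marker idea can be sketched over a real tar file using append mode, where each transaction writes a begin marker member, its files, then an end marker; a reader would treat a trailing group with no end marker as aborted. The marker path convention and version-suffix naming here are invented for the sketch.

```python
import io
import os
import tarfile
import tempfile

def append_txn(path, txn_id, files):
    """Append one naive global transaction to an append-only tar file:
    a begin marker, the member files, then an end marker. The
    .txn/<id>/... marker convention is made up for illustration."""
    with tarfile.open(path, "a") as tar:  # 'a' creates the file if absent
        def add(name, data):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        add(f".txn/{txn_id}/begin", b"")
        for name, data in files.items():
            add(name, data)
        add(f".txn/{txn_id}/end", b"")   # absent end marker => aborted txn

path = os.path.join(tempfile.mkdtemp(), "store.tar")
append_txn(path, 1, {"a.txt": b"hello"})
append_txn(path, 2, {"a.txt;2": b"hello v2"})  # VMS-style version suffix

with tarfile.open(path) as tar:
    names = tar.getnames()
assert ".txn/1/begin" in names and ".txn/2/end" in names
assert "a.txt" in names and "a.txt;2" in names  # both versions retained
```

Since nothing is ever overwritten, every intermediary stage of a multi-step compilation stays browsable with ordinary `tar` tooling, which is the debugging payoff.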

Simple read/write

Simple read/write interfaces, like your tar file, suffice for probably most purposes, but they are not performance-optimal. Solutions utilizing mobile code will always be asymptotically more efficient, if only due to reduced round trips. This still applies no matter how much promise pipelining you do, which was covered in an interesting thread on the E-lang mailing list a few years back.

round trips

... but they are not performance-optimal.

Right, simple things are usually not optimal in performance, especially when you keep scaling. Getting the best result tends to require complex optimization. (And then you make a team bigger and get Melvin Conway's Law effects too, with leaky abstractions at interface boundaries, perhaps clawing back some of the performance.)

The main advantage of something really simple, if you can make its interface match that of something that can be fancy but isn't required to be, is that one person can become thoroughly familiar with the code very quickly, for more direct debugging with fewer edge cases and less obfuscation from optimization. Unit tests based on simple components can have fewer dependencies, where complex things may depend on a lot of libraries. Something like the tar file solution is suited for one-person prototypes, or small teams with undemanding loads. (Answering "what happens when you run out of resources?" effectively requires some clever optimization.) I wouldn't want to support a build system for a lot of devs on a huge code base based on a preliminary simple version.

I'm not at liberty to describe a high performance non-SQL solution where I work, though it would illustrate my original point better than my tar file example, which wasn't really about mobile code. I should get back to your mobile code point.

Solutions utilizing mobile code will always be asymptotically more efficient, if only due to reduced round trips.

In a closed system where you control all code, you can arrange it's already present at every node statically, so it need not be mobile. So you're describing an open system supporting extension by third party devs, who want code running close to data. But when a db is owned and operated on private hardware by those same third party devs, it's mostly a matter of interface. Conway's law gives you division between db and app, but the devs still own both, and mobile code is still all within a highest trust boundary run by those devs, who aren't really operational third parties then.

Are you characterizing a high end enterprise compute server solution that obligates you to run mobile code from third parties in order to have viable performance? That sounds like defining a problem to require mobile code as a good solution. Or do you see that as a natural division of labor that should happen in a lot of the solution space at all scales?

In a closed system where you

In a closed system where you control all code, you can arrange it's already present at every node statically, so it need not be mobile.

Even those scenarios require a degenerate form of code mobility for system upgrade purposes.

Conway's law gives you division between db and app, but the devs still own both, and mobile code is still all within a highest trust boundary run by those devs, who aren't really operational third parties then.

I don't think this was ever true. Firstly, code ought to be more mutually suspicious than it is, but we've never had the ability to express such constraints (capabilities and effect typing now can).

Secondly, end users have always wanted to run queries against data, and your choice has always seemingly been to provide degenerate access interfaces (CRUD or limited range queries) for safety, or to increase the attack surface. This is a false choice.

Or do you see that as a natural division of labor that should happen in a lot of the solution space at all scales?

Mobile code is everywhere. Every server/client communication is the exchange of degenerate programs, e.g. HTTP, SMTP, SCSI, etc. Nearly every such command language could benefit from expanding the DSL's expressiveness.

For instance, user/supervisor transitions in kernels are on the order of thousands of cycles, and it's largely wasted time that could be avoided if the expressiveness of the system call DSL were increased.
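The batching idea behind a more expressive syscall DSL can be sketched abstractly. This toy interpreter (names `run_batch`, `kernel_ops`, and the `$N` result-reference convention are all mine) shows a client submitting a small program of calls that the "kernel" side runs in one boundary crossing, instead of paying the crossing cost per call; Linux's io_uring applies the same idea to I/O syscalls:

```python
def run_batch(kernel_ops, program):
    """Interpret a batch of (opname, args) pairs in one 'crossing'.
    An arg written as '$N' refers to the result of step N, so later
    steps can chain on earlier ones without returning to the client."""
    results = []
    for opname, args in program:
        resolved = [
            results[int(a[1:])]
            if isinstance(a, str) and a.startswith("$") else a
            for a in args
        ]
        results.append(kernel_ops[opname](*resolved))
    return results
```

One submission replaces N round trips across the user/supervisor boundary; the per-crossing overhead is paid once for the whole program.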

Even a DSL within a single program is better than a simple call-return API. Consider arbitrary precision arithmetic, which often has a low-level API that tries to reuse as much storage as possible, vs. the high-level API which allocates profusely but is simple to express. The necessity of choosing which interface to use comes from the fact that we can't reify the computation as data for the arithmetic library to consume. If we could, then the library itself could make near-optimal allocation decisions, AND the interface remains high-level.
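Here is a minimal sketch of "reify the computation as data", with hypothetical names (`Expr`, `Lit`, `evaluate` are not any real bignum library's API): the caller builds a whole expression tree with ordinary operator syntax, and the library receives the tree rather than one opaque call at a time, so it is free to plan buffer reuse across the whole computation:

```python
class Expr:
    """A reified arithmetic computation: a tree, not a result."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Expr("+", self, other)

    def __mul__(self, other):
        return Expr("*", self, other)


class Lit(Expr):
    def __init__(self, value):
        super().__init__("lit")
        self.value = value


def evaluate(e):
    """The library sees the full tree, so a real implementation could
    make near-optimal allocation decisions here; this toy version
    just evaluates recursively."""
    if e.op == "lit":
        return e.value
    a, b = (evaluate(x) for x in e.args)
    return a + b if e.op == "+" else a * b
```

The caller keeps the high-level interface (`(Lit(2) + Lit(3)) * Lit(4)`), while all low-level decisions are deferred to the consumer of the tree.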

Mobile code is the future, so instead of fighting it, we ought to find safe ways to express and evaluate it.

gentlemen of the rails

Mostly I follow up to be polite. Since folks have been talking about mobile code for twenty years, and it's stagnant beyond sending javascript in web pages (in which I also see little value), I don't find it useful to talk about. Mobility often seems like a solution in search of a problem. I don't consider declarative requests that get completely compiled by a receiver to be mobile code; that's more spontaneous local generation of code to handle a message. Blind execution of a binary not understood locally is a 100% match for mobile code, in my book, and distance from that extreme weakens the match along a spectrum.

When mobility seems unrelated to my original post, pursuing angles feels pointless in the context of my problem; in particular, there's no way to judge forward progress in discussion without any goal or focus.

Even those scenarios require a degenerate form of code mobility for system upgrade purposes.

That's why I said "outside installation and configuration" when I first said I don't presume mobile code is a good idea. Conversations are less useful when counterpoints in the middle match original explicit points.

Mobile code is the future, so instead of fighting it, we ought to find safe ways to express and evaluate it.

I generally ignore it, not fight it, but I would fight designs fostering dependence on centralized authority in complex rent-seeking "cloud" solutions. Less sloganeering in the mix would suit me. I'm fine with remote clients saying "here's the sort of thing I want in reply", but not happy about clients saying "do exactly what I say without fully understanding it."

(If I wanted to disparage mobile code, I'd call it hobo code and compare it to depression-era train-hopping by vagrants to emphasize parasitic quality. But there's not much to fight about until it's a problem.)

Since folks have been

Since folks have been talking about mobile code for twenty years, and it's stagnant beyond sending javascript in web pages (in which I also see little value), I don't find it useful to talk about.

Code that crosses an isolation boundary is mobile code, which also happens on the same machine. There is a great deal of mobile code in operating systems. Flash, Javascript, and Java are all in-browser examples. SQL is everywhere. So I disagree that mobile code is "stagnant". It's more prevalent than it's ever been, and not going away anytime soon.

I'm fine with remote clients saying "here's the sort of thing I want in reply", but not happy about clients saying "do exactly what I say without fully understanding it."

The server always knows what a client is requesting. If you expose a simple CRUD API, the client does not gain any authority if you then provide a strongly normalizing "eval" command that can execute a series of those CRUD operations in a single reply-response cycle. It's merely an optimization.
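The no-added-authority argument can be made concrete. In this sketch (the `CrudStore`/`eval_batch` names are illustrative), the "eval" command accepts only a fixed list of the same four CRUD verbs the client could already issue one round trip at a time; since the batch language has no loops, recursion, or new verbs, it trivially terminates and grants nothing new:

```python
class CrudStore:
    def __init__(self):
        self.rows = {}

    def create(self, k, v):
        self.rows[k] = v

    def read(self, k):
        return self.rows.get(k)

    def update(self, k, v):
        self.rows[k] = v

    def delete(self, k):
        self.rows.pop(k, None)

    def eval_batch(self, ops):
        """Run a list of (verb, args) in one reply-response cycle.
        Only the existing CRUD verbs are reachable, so the batch
        cannot do anything a sequence of single calls couldn't."""
        allowed = {"create": self.create, "read": self.read,
                   "update": self.update, "delete": self.delete}
        return [allowed[verb](*args) for verb, args in ops]
```

The batch collapses N network round trips into one; it is purely an optimization over the interface the server already exposed.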

Paranoia over mobile code isn't warranted; paranoia over poorly designed mobile code might be, but not much more than over a poorly designed service. The latter is where any real security vulnerabilities will live. At worst, mobile code simply adds a DoS opportunity if it's not strongly normalizing.

At worst, mobile code simply adds a DoS opportunity if it's not strongly normalizing.

Strong normalization is mostly worthless as an assurance of termination in practice.

Sure, a client can still

Sure, a client can still provide a very large term to DoS, but this is trivial to limit by size quota (all web servers already have upload size limits). The only remaining problems are small terms that don't terminate, hence the strong normalization requirement.

I must be lost

What language are we talking about that doesn't allow small terminating-in-theory-only programs?

Are you sure you meant,

Are you sure you meant, "that doesn't allow small terminating-in-theory-only programs"? Because I said a trivially safe mobile code language is one that allows small, terminating-in-theory-only programs, i.e. that extending a service with such a language does not expand the attack surface, but does provide considerable optimization opportunities.

total functional denial of service

I think you are mainly pointing out that, modulo denial of service risks, mobile code should be a performance-only feature with no impact on security.

But almost as an aside, you wrote: "mobile code simply adds a DoS opportunity if it's not strongly normalizing". Matt notes that the DoS opportunity exists even for a strongly normalizing language. If you can express the Ackermann function, or N-queens, or the sum of integers from zero to a googol... your language is more than expressive enough for denial of service attacks. Assuming, of course, that we naively compute a program to termination.
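The Ackermann example makes the point in a few lines: the function is total (every call provably terminates, so it survives a strong-normalization check), and its source is tiny (so it survives any size quota), yet its cost grows faster than any primitive recursive bound:

```python
import sys

sys.setrecursionlimit(100_000)  # the naive definition recurses deeply


def ack(m, n):
    """Ackermann's function: always terminates, but its runtime
    grows faster than any primitive recursive function."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

# ack(2, 3) returns instantly; ack(4, 2) also terminates "in theory",
# but its value is 2^65536 - 3, a 19,729-digit number -- a few
# characters of source with effectively unbounded cost.
```

So neither termination proofs nor upload limits block this attack; only resource accounting during evaluation does.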

To me, it seems we need to approach denial of service threats and vulnerabilities in a manner orthogonal to the language's expressiveness and termination properties. Strong normalization or termination has its uses, but not for avoiding DoS risks.

I'm sure I'm still lost and

I'm sure I'm still lost, and nothing interesting is going to come from this conversation except possibly my being less lost in the conversation. By terminating-in-theory-only I mean Ackermann or something for which you can prove termination but wouldn't want to wait for it. Permitting such a function could be a DoS attack, even in a language with strong normalization; its source code isn't very long, so a size quota wouldn't seem to block it. That's pretty much my whole point. I'm sure you understand this, though, so again I assume I'm lost. Apologies in advance.

termination for mobile code

I'm sure you understand this, though

I've had this conversation with naasking a few times. He's been very insistent that termination is essential and useful for mobile code. I think we'd need a language with space-time bounds in the type system to really achieve what he wishes... and where the typechecking itself is space-time bounded.

SQL vs alternative key/val stores

Bulk caches need nothing fancier than a key/val store with read, write, and hold operations. So an interface need define little more than the exact nature of keys and values, and the semantics of each operation. Decode only reads. Done efficiently, encode has a better high-level operation that means something like, "See this key/val pair? Can you give me a token holding this until I release it later, in case the receiver naks? Feel free to say no, write a new copy, or use an old one, I don't care. But you better not renege if you say yes and give me a token to redeem later."
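A toy sketch of that read/write/hold interface (class and method names are mine). The contract is exactly the one described: the store may decline a hold, may write a new copy or reuse an old one, but once it issues a token it must keep the pair alive until the token is released:

```python
import itertools


class KVCache:
    """Key/val store with read, write, and hold/release pinning."""

    def __init__(self):
        self.data = {}
        self.pins = {}  # token -> pinned key
        self._tok = itertools.count(1)

    def write(self, k, v):
        self.data[k] = v

    def read(self, k):
        return self.data.get(k)

    def hold(self, k, v):
        """Ask the store to pin (k, v); returns a token, or None if
        it declines. (This toy version never declines.)"""
        if k not in self.data:
            self.data[k] = v       # free to write a new copy...
        token = next(self._tok)    # ...or reuse the existing one
        self.pins[token] = k
        return token               # must not renege once issued

    def release(self, token):
        self.pins.pop(token, None)

    def evict(self, k):
        """Eviction respects pins: held entries must survive."""
        if k in self.pins.values():
            return False
        self.data.pop(k, None)
        return True
```

The interface stays this small no matter how fancy the implementation behind it gets; all the hard engineering lives in the inter-cache protocols, not here.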

In other words, the store interface is incredibly simple. The really complex part is reasoning behind why a receiver will recognize any given key with a given expected probability. It's the relationship of distributed caches to one another that involves hairy engineering, with protocols to maintain it, including resolution of errors, disconnects, component restarts, and resource management flow control. Sometimes the naive answer to a sub-problem has terrible consequences you won't discover without a simulation that runs long enough on a suitable population.

Using SQL for that sort of key/val store might be dumb, as unnecessary overkill, when a higher transaction rate might be achieved with something simpler. But, you know, it would work, as long as all interfaces were non-blocking and async with support for very large numbers of outstanding concurrent transactions. When you divvy up tasks among a team of engineers, it would be hard to stop the storage dev from using SQL if a prototype succeeds. Also, nothing stops them from doing something much more complex either, which sadly seems more the norm.

mobile code

Omnidirectional mobile code - i.e. such that clients can extend and interact with remote services via semi-autonomous agents, and services can casually install aspects of themselves near their clients - is among the most promising areas for advancing the state of the art in systems programming. Even at a shallow look, there are many advantages to pursuing this feature. Abandoning this whole fruitful area out of fear is not a position I consider respectable. And draconian sandboxes will never be able to support the degree of composition needed to fully take advantage of mobile and distributed code.

We can develop languages and idioms that, together, make mobile code safe, secure, easy to reason about, and easy to maintain. Relevant features include the object capability model, algebraic models of effects (or at least monadic effects), value sealing, parametric polymorphism, etc. E language and Microsoft's F* language make interesting studies for secure computing. As might my own Awelon bytecode.

Besides, the modern approach to software installation, configuration, and maintenance is terribly flawed from a security perspective. A typical installer has global access to the filesystem, and applications are granted very coarse-grained authorities. We could do a LOT better with fine-grained distribution of mobile code, i.e. such that services and applications extend and integrate other services and applications in a precise but relatively ad-hoc manner. Ka-Ping Yee's principles for Secure Interaction Design would make applications MORE usable, extensible, mashable, and maintainable, not less so.

I described, not prescribed (didn't mean to FUD)

I have no objection to your selling folks on mobile code, and didn't mean to rock your boat. Generally I don't take political interest in advocating a view I want sold. I only meant to characterize current state of pragmatic style in products I work on now. However, I see a lot of problematic issues in getting where you want to go, though some version seems doable when safety is a primary feature.

I'd run untrusted code if it uses white-listed symbols only, and operates under a resource budget that can be exhausted, killing any fibers going past hard boundaries, with feedback on future scheduling to further throttle or deny persistent violators. I'd hate to be the admin, though, who has to listen to whiny excuses about why a few more cycles should be given to get the next batch of promised results.

resource budget that can be

resource budget that can be exhausted

Including an 'economy' in code is of interest to me for this reason, e.g. bind execution to a virtual wallet. Unfortunately, I haven't found a good way to integrate this concept pervasively at the language level. Linear logic seems related.

pervasive wallet

A virtual wallet sounds good, so insufficient funds can cause negative feedback. Machinery to say "no" seems an important part, and metering resources ought to be manageable, given care to account for each resource that can be converted into another.

In code compiled to run in a green (lightweight) process, I plan to meter each green process, so max cycles and space (etc) have hard limits from spawn time. Untrusted code can only spawn other processes sharing the same budget, so caps cannot be bypassed by spawning entities. In contrast, trusted process spawners can set budgets any way that seems reasonable, including adaptive hard limits for untrusted code.
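A minimal sketch of the shared-budget rule, assuming a cooperative scheduler that charges each step against a budget object (all names here are hypothetical). The key invariant: untrusted spawns hand the child the parent's own budget object, so spawning more processes manufactures no extra cycles:

```python
class Budget:
    def __init__(self, cycles):
        self.cycles = cycles

    def charge(self, n=1):
        if self.cycles < n:
            raise RuntimeError("budget exhausted: kill this fiber")
        self.cycles -= n


class GreenProc:
    def __init__(self, budget, trusted=False):
        self.budget, self.trusted = budget, trusted

    def step(self, cost=1):
        """Each scheduled step is charged; exhaustion kills the fiber."""
        self.budget.charge(cost)

    def spawn(self, cycles=None):
        if not self.trusted:
            # Untrusted children share the parent's cap exactly.
            return GreenProc(self.budget)
        # Trusted spawners may set any budget they consider reasonable.
        return GreenProc(Budget(cycles), trusted=True)
```

A parent and its whole untrusted spawn tree drain one pool, so the hard limit set at spawn time holds for the entire subtree.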

I don't think I'd try to run a distributed economy. I guess distributed nodes could share reputation info. I would not want to show untrusted code anything about a distribution graph, so there would be no distributed model untrusted code could see.

If you enjoyed the ideas of

If you enjoyed the ideas of the ambient calculus, you may be interested in an evolved view of it from one of the creators of RabbitMQ. A non-calculus form is the Racket package called Marketplace.

existence of calculus form undermines value of non-calculus form

Thanks for the reference. If I cared for math, I'd like The Network as a Language Construct (pdf), because it has parts directly relevant to some discussion here. But I have no use for documents primarily about math instead of prose descriptions. I get no insight at all myself, and such a paper can't serve as contributory docs I expect average devs to read as explanation, when modest reading comprehension and limited time are all I want to require. (Yes, I have to enable Javascript briefly to read a pdf.)

Instead you might find the

Instead you might find the docs for Marketplace itself a more useful resource.