Objects as Modules in Newspeak

In Objects as Modules in Newspeak, Gilad Bracha et al. describe a way to avoid the coupling inherent in constructs found in many OO languages, such as a global namespace, "static" stateful variables, globally accessible object constructors, etc.

We describe support for modularity in Newspeak, a new programming language descended from Smalltalk and Self. Like Self, all computation — even an object’s own access to its internal structure — is performed by invoking methods on objects. However, like Smalltalk, Newspeak is class-based. Classes can be nested arbitrarily, as in Beta. Since all names denote method invocations, all classes are virtual; in particular, superclasses are virtual, so all classes act as mixins. Unlike its predecessors, there is no static state in Newspeak, nor is there a global namespace. Top level classes act as module definitions, which are independent, immutable, self-contained parametric namespaces. They can be instantiated into modules which may be stateful and mutually recursive. Naturally, like its predecessors, Newspeak is reflective: a mirror library allows structured access to the program meta-level.

There's a lot in here that should be of interest to LtUers interested in object capability based security.

Time saver

Thank you, now I will not have to reinvent the wheel for my language.
:)

Interesting, but how does it work?

This sounds really interesting, but I can't quite figure out how it maps to languages that I know better (e.g. Java/C++/C#/Python/etc.). I have skimmed this paper and looked at several related posts and articles, but I am still left with some questions.

Basically if a programmer wants to conveniently access application level data (e.g. the command-line arguments, the executable file name, application configuration data, standard input stream, standard output stream, thread pool, memory manager, log file, just to name a few) how do they do this, if a Newspeak module isn't global, static, and stateful? How would this look if we applied the idea to Java?

Basically if a programmer

Basically if a programmer wants to conveniently access application level data (e.g. the command-line arguments, the executable file name, application configuration data, standard input stream, standard output stream, thread pool, memory manager, log file, just to name a few) how do they do this, if a Newspeak module isn't global, static, and stateful?

Everything would be passed as a parameter instead of available ambiently. Consider the "main" entry point of a program as being passed an OS or Shell object which contains all of these properties, and this object or some subset of it is passed to other modules as needed. This allows you to mock or override it. This is the approach required by capability security.
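To make the reply above concrete for a Java-like language, here is a minimal sketch. The names `Shell`, `Greeter` and `FakeShell` are hypothetical and not taken from the paper or from any library; the point is only that the program reaches its arguments and output stream through a parameter, which is what makes mocking possible.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.List;

// Hypothetical interface: everything the program may touch outside itself.
interface Shell {
    List<String> args();
    PrintStream out();
}

// The program receives its environment as a parameter; it never reads
// System.out or any other static state on its own.
final class Greeter {
    private final Shell shell;

    Greeter(Shell shell) {
        this.shell = shell;
    }

    void run() {
        String name = shell.args().isEmpty() ? "world" : shell.args().get(0);
        shell.out().println("Hello, " + name);
    }
}

// A mock Shell: output is captured in memory instead of reaching a terminal.
final class FakeShell implements Shell {
    private final List<String> args;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintStream out = new PrintStream(buffer, true);

    FakeShell(List<String> args) {
        this.args = args;
    }

    @Override public List<String> args() { return args; }
    @Override public PrintStream out() { return out; }

    String captured() { return buffer.toString(); }
}
```

A real entry point would build a `Shell` from `System.out` and the command-line arguments once, then hand it (or a subset of it) to whatever needs it.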

This helps a lot

Okay, so we need to encapsulate the global state in a parameter. How can we conveniently make it visible to the modules which need it, I wonder? I see that in Newspeak they seem to rely on the nested class mechanism. In a new language I would be tempted to do this via some kind of module mechanism, where a module is a new kind of entity that acts like a class but makes its names available to all the classes it contains. I suspect that this would violate capability security, though. Do Ruby modules/mixins make it possible to implement capability secure systems?

Okay, so we need to

Okay, so we need to encapsulate the global state in a parameter. How can we conveniently make it visible to the modules which need it I wonder?

Capability languages usually use something like the factory pattern, where the factory is initialized with the appropriate data by parameter passing.
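One way to read "a factory initialized by parameter passing" in Java terms is sketched below. `LogSink`, `Account` and `AccountFactory` are invented for illustration; the assumption is that the shared data would otherwise have lived in a static field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared resource that would be a static field in ordinary Java.
final class LogSink {
    private final List<String> lines = new ArrayList<>();

    void write(String line) { lines.add(line); }

    List<String> lines() { return new ArrayList<>(lines); }
}

// An object that needs the shared resource gets it through its constructor.
final class Account {
    private final String owner;
    private final LogSink log;
    private long balance;

    Account(String owner, LogSink log) {
        this.owner = owner;
        this.log = log;
    }

    void deposit(long amount) {
        balance += amount;
        log.write(owner + " deposited " + amount);
    }

    long balance() { return balance; }
}

// The factory is created once with the shared resource. Code that only
// holds the factory can open accounts but never touches the sink itself.
final class AccountFactory {
    private final LogSink log;

    AccountFactory(LogSink log) {
        this.log = log;
    }

    Account open(String owner) {
        return new Account(owner, log);
    }
}
```

Whoever constructs the `AccountFactory` decides which sink all accounts share, so a test or a sandbox can supply a different one without touching `Account`.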

Do Ruby modules/mixins make it possible to implement capability secure systems?

Doubtful due to pervasive reflection and mutation.

Still confused.

Capability languages usually use something like the factory pattern, where the factory is initialized with the appropriate data by parameter passing.

So all object construction is done via factories? Does each class then have to be a factory for its own objects? How does the initial data get to each factory?

Based on what you are saying I'm currently imagining that the application has a single application object, which creates a set of factories on start-up.

Sorry to ask so many questions, but I just have a hard time imagining how to apply the ideas to a Java-like language.

So all object construction

So all object construction is done via factories?

All object construction is via the usual constructors; it's just that some of those objects implement the factory pattern. Usually this is needed only if you want to share a set of objects amongst all instances of a class.

Think of it this way: "main" is the constructor for your program. Upon entry, it initializes all objects it needs to run, as usual. Main accepts a coarse-grained "World" object via which it references everything external to the program, i.e. filesystems, network sockets, etc. So File.Open() is not available ambiently as a static method; it's instead an instance method reached through World.FileSystem, i.e. World.FileSystem.RootDirectory.OpenFile(string).
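A rough Java rendering of the World.FileSystem.RootDirectory.OpenFile(string) chain might look like the following. All of these interfaces are hypothetical (this `FileSystem` is not `java.nio.file.FileSystem`), and the directory is an in-memory stand-in so the sketch runs without a real disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical capability types; a file is modelled as its text contents.
interface Directory {
    Optional<String> openFile(String name);
}

interface FileSystem {
    Directory rootDirectory();
}

interface World {
    FileSystem fileSystem();
}

// In-memory stand-in for a directory.
final class InMemoryDirectory implements Directory {
    private final Map<String, String> files = new HashMap<>();

    InMemoryDirectory put(String name, String contents) {
        files.put(name, contents);
        return this;
    }

    @Override
    public Optional<String> openFile(String name) {
        return Optional.ofNullable(files.get(name));
    }
}

// The program's "constructor": the only route to a file is through the
// World it was handed, since there is no static File.open to fall back on.
final class WordCount {
    private final Directory dir;

    WordCount(World world) {
        this.dir = world.fileSystem().rootDirectory();
    }

    int count(String fileName) {
        return dir.openFile(fileName)
                  .map(text -> text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length)
                  .orElse(0);
    }
}
```

Because `World` and `FileSystem` each have a single method, a test can build them from lambdas, for example `FileSystem fs = () -> root; World world = () -> fs;`.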

In reality, most capability systems do not hand an app the World object. Instead they pass in a small set of default capabilities (like an app-specific "home/scratch" directory) and a reference to a trusted "Powerbox". The Powerbox is an object that provides a trusted path between the user and the underlying trusted system. When the app needs a file that is not in its own private scratch directory, it does Powerbox.OpenFile(string userMessage), and that opens a file dialog with the given prompt for the user.
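The Powerbox idea can be sketched in Java as follows. This is an illustration under assumptions, not the API of any real capability system: `Powerbox`, `SimulatedPowerbox` and `Viewer` are made-up names, and the file dialog is replaced by a function that plays the role of the user.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical Powerbox: the app supplies only a prompt; which file gets
// opened is decided on the trusted side, by the user.
interface Powerbox {
    Optional<String> openFile(String userMessage);
}

// Stand-in for the trusted system. The "user" is simulated by a function
// from prompt to chosen file name; a real powerbox would show a dialog.
final class SimulatedPowerbox implements Powerbox {
    private final Map<String, String> allFiles;
    private final Function<String, Optional<String>> userChoice;

    SimulatedPowerbox(Map<String, String> allFiles,
                      Function<String, Optional<String>> userChoice) {
        this.allFiles = new HashMap<>(allFiles);
        this.userChoice = userChoice;
    }

    @Override
    public Optional<String> openFile(String userMessage) {
        // Optional.map yields an empty Optional if the chosen name is unknown.
        return userChoice.apply(userMessage).map(allFiles::get);
    }
}

// The app cannot name a path; it sees only the contents the user picked.
final class Viewer {
    private final Powerbox powerbox;

    Viewer(Powerbox powerbox) {
        this.powerbox = powerbox;
    }

    String show() {
        return powerbox.openFile("Choose a document to view")
                       .orElse("(nothing selected)");
    }
}
```

Note that `Viewer` has no way to ask for a specific file by name: the selection happens entirely inside the powerbox.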

I know you're familiar with C#, so perhaps this set of slides I used a few years ago might be helpful. It discusses the Powerbox pattern and the various restrictions C# would need to become capability-secure. It might be helpful in explaining the reasons behind the capability security restrictions in familiar C# terms and using .NET abstractions for isolation, like AppDomain.

Very interesting, and thanks

Very interesting, and thanks for the slides.

Funneled through the entry point of the program

It is stateful if I understand well. To access that you need the entry point of the program to pass you the module instances it possesses.

If that was applied to Java, the class containing main (entry point module) would be instantiated with a big list of external modules it needs to run (should be configurable). If a class needs to use, say, System.out, then the main class would pass System.out to it when it is instantiated (or when calling its methods).

For dodo I did not go quite that far; I made all top-level classes potential entry points so a module does not need the main class to inform it. In fact the imperative side of dodo is pretty much like Java, and can access external modules as freely as it wants. But the functional side is more like Newspeak and uses capability-based access to system resources.

What I am missing is the means to inherit, instantiate a module -- I planned for different implementations of a module interface and programmatic loading of a module though, so it may be ok. I need to think on how to let a module being loaded know about the current context, so that it can be configured at load time.

If that was applied to Java

If that was applied to Java the class containing main (entry point module) would be instantiated with a big list of external modules it needs to run (should be configurable). If a class needs to use say, System.out then the main class would pass System.out to it when it is instantiated (or when calling its methods).

This is helpful. So the next question is how does each subsystem in my program (and imagine that it is a million lines of code with several dozen sub-systems) share access to the list of modules they need? It just looks like there would be an explosion of parameters all over the place, if every class has to pass modules, to every other class who may potentially need them.

If it may "potentially" need

If it may "potentially" need them then you should not pass it. The whole point of capabilities is to reduce the boundaries of security vulnerabilities. Only give a capability to some other code if you know it is needed.

Services vs. Modules

An object in a capability language is confined except for any input/output services provided to it from the outside. This is different from Java, where any arbitrary new object can get full access to 'the' console, 'the' filesystem, joysticks, webcam, databases, TCP/IP, video output, sensors, actuators, etc. This confinement has many advantages for testing (simulating external services), distribution (working with multiple instances of external services), security (monitoring/controlling/revoking access to an external service), and performance (partial evaluation, GPU and FPGA programming).
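As one illustration of the "controlling/revoking access" point, here is a sketch of a revocable forwarder in Java. `Printer` and `Revocable` are hypothetical names; the sketch is single-threaded and ignores the ways real Java code could bypass it (reflection, statics), which is exactly the ambient authority the comment is contrasting against.

```java
// Hypothetical external service.
interface Printer {
    void print(String text);
}

// The owner keeps the Revocable; clients are handed only the facet.
// After revoke(), every existing facet stops working.
final class Revocable {
    private Printer target;   // null once access has been revoked

    Revocable(Printer target) {
        this.target = target;
    }

    // The object given to less trusted code: it forwards to the real
    // service for as long as access has not been revoked.
    Printer facet() {
        return text -> {
            if (target == null) {
                throw new IllegalStateException("access revoked");
            }
            target.print(text);
        };
    }

    void revoke() {
        target = null;
    }
}
```

The same forwarding shape can log or rate-limit calls instead of cutting them off, which is where the monitoring advantage comes from.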

Because all new objects are confined, sharing source-level modules without introducing a new parameter is never a problem. Modules can introduce new functions, procedures, classes, etc. but will not provide you the runtime data and resources on which to operate them.

In practice, most new objects and services in a capability system are parameterized by just a few external services, which allows them to be precisely parameterized. For example, you can create quite a few interesting and useful programs parameterized only by a 'console' service with two interfaces: 'prompt ' (returning a reply), and 'print ' (no return). Web services similarly work with a limited vocabulary - GET/POST (and more rarely, HEAD/PUT/DELETE). The blackboard metaphor works with the only shared service being a database. Pipes and filters designs get by often with reading from one stream and pushing to another, sometimes plus a service to interact with a file system.
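The console example can be written down directly. In this sketch the interface name `Console` and the exact method signatures are assumptions (it is unrelated to `java.io.Console`); the program's only authority is the console it is given, so a scripted console is enough to simulate it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical console service with the two operations described above.
interface Console {
    String prompt(String question);   // returns the reply
    void print(String text);          // no return value
}

// A small program parameterized only by a console.
final class GuessingGame {
    private final Console console;
    private final int secret;

    GuessingGame(Console console, int secret) {
        this.console = console;
        this.secret = secret;
    }

    // Returns the number of guesses needed.
    int play() {
        int guesses = 0;
        while (true) {
            int guess = Integer.parseInt(console.prompt("Your guess?").trim());
            guesses++;
            if (guess == secret) {
                console.print("Correct");
                return guesses;
            }
            console.print(guess < secret ? "Too low" : "Too high");
        }
    }
}

// Scripted console for simulation: replies come from a fixed list and
// everything printed is recorded.
final class ScriptedConsole implements Console {
    private final Deque<String> replies;
    final List<String> printed = new ArrayList<>();

    ScriptedConsole(List<String> replies) {
        this.replies = new ArrayDeque<>(replies);
    }

    @Override public String prompt(String question) { return replies.removeFirst(); }
    @Override public void print(String text) { printed.add(text); }
}
```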

When creating new services in a capability language, the vocabulary is often somewhat richer to support type-safety, to avoid serialization and parsing overheads... but the above practice still holds true: most new services are instantiated from just a few other services.

In a full capability system, one would not have some fixed-form "main" as an entry point. A new 'application'-service would be given just the services it needs by an IDE that is aware of the language. That is, one would compile the source into a confined abstract procedure then parameterize and execute it to produce and activate the new service.

But, we need to interface with Operating Systems that have been designed around ambient-authority languages. For these cases, the idea is to create a small transition layer, such as 'main', that will quickly wrap and secure external services prior to instantiating and combining new sub-services.

One can do this by parameterizing 'main' with a set of common services. A surprising number of esoteric languages only support simple character IO, so one could certainly do the same. In Java, one might pass a 'System' object into 'main' - except with NO static methods. That would allow 'main' to wrap and control console and command-line services and distribute them. Of course, one would also need to wrap all the other IO services, including access to the GUI shell (i.e. Java Swing) and access to the TCP subsystem that are not provided via 'System' at the moment.
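Since today's Java does not pass a 'System' object into 'main', the transition layer has to be approximated by convention: one class is allowed to name `System.*`, and it immediately hands narrow pieces to everything else. The class names below are invented for this sketch.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Needs only the command-line arguments.
final class ArgCounter {
    private final List<String> args;

    ArgCounter(List<String> args) {
        this.args = args;
    }

    int count() { return args.size(); }
}

// Needs only an output stream.
final class Reporter {
    private final PrintStream out;

    Reporter(PrintStream out) {
        this.out = out;
    }

    void report(int n) {
        out.println("arguments: " + n);
    }
}

// Transition layer: by convention the only class that mentions System.
// It wraps the ambient services and gives each component just its share.
final class TransitionMain {
    public static void main(String[] args) {
        ArgCounter counter = new ArgCounter(Arrays.asList(args));
        Reporter reporter = new Reporter(System.out);
        reporter.report(counter.count());
    }
}
```

Nothing in the language enforces the convention here; any class could still call `System.out` directly, which is the gap the comment above is pointing at.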

Some goals to consider, other than mere modularity and security, are support for automated code distribution, recognition of common vs. local resources (e.g. timer events vs. specific printer), disruption tolerance and persistence, dependency management, system upgrade, extensible runtimes, support for system administration and policy injection.

In pursuit of these goals, I favor a plugin-extensible abstract service factory... i.e. 'main' could ask for "mouse" or "console" and get a couple generic services, or even ask for one-off proposals that may later be executed. Each factory has a few simple security properties to represent classes of ambient authority and control distribution. For example, an object originally situated on a remote host might still have access to the remote printer or be subscribed to the remote keyboard, but switch to using the more local timer-event service. In this design, 'main' acts primarily by firing up a few services from the runtime-provided factory and hooking them together (often doing so based on input or an input-file) and the bulk of the work is provided by plugin-provided factories. The capability-secure language serves as important 'glue' between these external services - supporting distribution, persistence, transform, organization (registry, etc.)
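A very reduced sketch of such a factory in Java follows. It covers only two of the ideas above, registration by plugins and requests by name, plus a simple way to hand out a narrowed factory; the distribution and security properties are not modelled. `ServiceFactory` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical service factory: plugins register providers under a name,
// and 'main' asks for services by name instead of constructing them.
final class ServiceFactory {
    private final Map<String, Supplier<?>> providers = new HashMap<>();

    void register(String name, Supplier<?> provider) {
        providers.put(name, provider);
    }

    // Returns the service if a provider exists and produces the asked type.
    <T> Optional<T> request(String name, Class<T> type) {
        Supplier<?> provider = providers.get(name);
        if (provider == null) {
            return Optional.empty();
        }
        Object service = provider.get();
        if (!type.isInstance(service)) {
            return Optional.empty();
        }
        return Optional.of(type.cast(service));
    }

    // A narrowed factory exposing only the named services, suitable for
    // handing to a sub-service that should not see everything.
    ServiceFactory restrictedTo(Set<String> names) {
        ServiceFactory narrowed = new ServiceFactory();
        for (String name : names) {
            Supplier<?> provider = providers.get(name);
            if (provider != null) {
                narrowed.register(name, provider);
            }
        }
        return narrowed;
    }
}
```

In this shape 'main' would mostly consist of `request` calls followed by wiring the returned services together.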

Re: Services vs. Modules

Agreed on all this, just wanted to comment on the heading of your post. A large enough module is akin to a service and can be treated the same way for capabilities, management and versioning.

Dependency inversion principle

Thanks for taking the time to write such a detailed response. These are some great points, and are really giving me some insight into what a well-designed module system needs to do.

I can't help but wonder if maybe a language can satisfy the core principle, but ease the syntactic weight by passing a "program" object pointer implicitly between modules during initialization.

The program object could then behave as the abstract service factory described, providing each module with the different modules it needs.

Am I also mistaken in understanding what is being described as being related to the dependency inversion principle?

Wish for the new year

I wish people would stop talking about "dependency inversion" as if it was anything interesting. It is just a bizarre term for "abstraction", primarily wrt something module-like.

Seconded.

Seconded.

And abstraction means something here?

Can either you or Andreas provide an unambiguous definition, with references, of what "abstraction" in the context of modules even means?

Naive approximation

I learned that the abstraction of a module is a class, i.e. objects are reinstantiable modules.

Buzzword involution interjection

I thought the correct buzzwords were "dependency injection" and "inversion of control".

It's because I wasn't just

It's because I wasn't just using a buzzword. I was referring to a well defined software-design principle. See the link posted in this comment.

Right, but I didn't reply to

Right, but I didn't reply to you :)

Both forms are used

Both forms are used. It's just:

"dependency inversion" ~ module abstraction
"dependency injection" ~ module instantiation

About relationships between abstractions

Here is the definition of dependency inversion from Wikipedia:

In object-oriented programming, the dependency inversion principle refers to a specific form of decoupling where conventional dependency relationships established from high-level, policy-setting modules to low-level, dependency modules are inverted (e.g. reversed) for the purpose of rendering high-level modules independent of the low-level module implementation details. The principle states:

  1. High-level modules should not depend on low-level modules. Both should depend on abstractions.
  2. Abstractions should not depend upon details. Details should depend upon abstractions.

The term was coined in this paper by Robert C Martin.

This is a bit more specific and useful than "abstraction", which is a severely overloaded and abused term.

Modularity Basics

1. High-level modules should not depend on low-level modules. Both should depend on abstractions.
2. Abstractions should not depend upon details. Details should depend upon abstractions.

That's simply the basics of modular programming: always program against abstract signatures. Or more specifically, (1) parameterize relative to signatures, (2) implement relative to signatures. (1) is abstracting clients from providers, (2) abstracts providers from clients. FWIW, this is what ML has functors and sealing for -- and lambda calculus universal and existential type abstraction, for that matter.
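For readers coming from Java, a rough analogue of the two rules is an interface in the role of the signature. This is only an approximation (a Java interface cannot hide a type the way ML sealing does), and the names `Store`, `MapStore` and `Greetings` are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// The abstract signature that both sides depend on.
interface Store {
    void put(String key, String value);
    String get(String key);   // null when the key is absent
}

// (2) Implement relative to the signature: the provider knows nothing
// about its clients.
final class MapStore implements Store {
    private final Map<String, String> data = new HashMap<>();

    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
}

// (1) Parameterize relative to the signature: the client knows nothing
// about which provider it was given.
final class Greetings {
    private final Store store;

    Greetings(Store store) {
        this.store = store;
    }

    void remember(String name) {
        store.put(name, "Hello, " + name);
    }

    String greet(String name) {
        String greeting = store.get(name);
        return greeting != null ? greeting : "Who?";
    }
}
```

`new Greetings(new MapStore())` is the instantiation step; the constructor parameter plays the part that a functor argument plays in ML.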

Wow.

That makes perfect sense. Thank you for clarifying. I thought you were just being snarky at first, but now I see what you are getting at.

Is it just me, or is there a ridiculously large gap in communication between software engineers and computer language theorists?

Perhaps

Perhaps I was being snarky. I'm sorry for that. Sometimes I can't help getting annoyed by this frequent phenomenon -- esp in the OO world -- of rediscovering the blatantly obvious, giving it a ridiculous name, and then selling it as an important new achievement...

See also this old thread.

Sadly

This reflects my experience as well. Trying to talk about the algebraic properties of software with people who have limited functional programming experience, let alone PLT experience, can be quite difficult. It seems to me that any competent programmer should know what's meant by "these two functions should compose to the identity function," e.g. in reference to a parsing/generating pair of functions. A competent programmer should know what "idempotent" means. A competent programmer should know what a "partial function" is. I do feel fairly strongly that programmers should have had more exposure to functional languages in their programming education (not necessarily computer science education!) than just a survey course that leaves us feeling like it's nice to have some exposure to the programming equivalent of Latin. Not enough programmers seem to understand, deeply, in our bones, that the act of programming is an act of applied mathematical logic, and the more of that we understand, the better off we'll be.

Not enough programmers seem

Not enough programmers seem to understand, deeply, in our bones, that the act of programming is an act of applied mathematical logic, and the more of that we understand, the better off we'll be.

I think there are certain logical brain-types that will most definitely benefit from a strong math and FP education, and they'll go looking for it. There are also creative brain-types that don't care to learn so much, though they should, of course, have some understanding of math. Don't assume one way is better than the other; I have seen the logical types complain that a problem is too hard because its solution doesn't fit their mold of what programming is.

Not everyone thinks the same way. Diversity is good. Learning what you are interested in and passionate about is better than following a preset mold.

Dependency Inversion

Yes, dependency inversion is a big part of programming in a capability language. Since objects are confined unless they carry reference to a shared service, providing dependencies directly is necessary.

As noted above, this is nothing special... it's simply proper abstraction.

Far more remarkable is the strange notion that objects should create their own dependencies. Even the idea that an object should create and encapsulate its very own mutable 'integer' variable - as opposed to a (potentially shared) integer variable being passed to it via constructor - is of questionable virtue. Objects that 'own' variables end up attempting to serve two roles, as both capability and object-graph description, and manage to achieve each of these poorly.

ease the syntactic weight by passing a "program" object pointer implicitly between modules during initialization.

While doing this may ease syntactic burden, it would also violate capability principles. It's essentially the same as having static fields in classes... i.e. one can consider all static fields to be members of an implicit 'program' object.

The ability to implicitly pass parameters about is not bad so long as one can readily control the full set of hidden parameters. I would consider dynamically scoped variables a more promising feature for this purpose. For capability systems, one would need the ability to essentially capture all the dynamic variables and manipulate them as a set: it is important that 'main' or any other module can control, filter, etc. any objects referenced by these implicit parameters just as easily as they can explicit parameters - this is not something that could easily be performed with a 'program' object. For distribution and parallelization, it would be best if dynamic variables cannot be mutated (i.e. the set of variables and values is not mutable, even if the values may reference mutable things).
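The properties asked for here (the hidden parameters can be captured as a set, filtered, and are themselves immutable) can be illustrated in Java, with the caveat that this sketch does not implement dynamic scoping: the environment is still passed explicitly. `Env` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical immutable environment standing in for a set of dynamically
// scoped variables. Every "change" produces a new Env; existing ones are
// never mutated, even though the values may refer to mutable things.
final class Env {
    private final Map<String, Object> bindings;

    private Env(Map<String, Object> bindings) {
        this.bindings = Collections.unmodifiableMap(new HashMap<>(bindings));
    }

    static Env empty() {
        return new Env(new HashMap<>());
    }

    // Returns a new Env with one extra (or replaced) binding.
    Env with(String name, Object value) {
        Map<String, Object> copy = new HashMap<>(bindings);
        copy.put(name, value);
        return new Env(copy);
    }

    // Returns a new Env containing only the named bindings, so a caller
    // can control exactly what a callee is able to see.
    Env restrictedTo(Set<String> names) {
        Map<String, Object> copy = new HashMap<>(bindings);
        copy.keySet().retainAll(names);
        return new Env(copy);
    }

    Optional<Object> lookup(String name) {
        return Optional.ofNullable(bindings.get(name));
    }
}
```

The contrast with a single implicit 'program' object is that any caller can derive a smaller `Env` before passing control on, just as it could choose which explicit arguments to forward.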

There are other ways to reduce syntactic burden, perhaps more effectively than the sort of implicit context described above.

In most OO languages, such as Java, object graphs are constructed procedurally. This is a bad thing, because it forces programmers to explicitly deal with order-of-construction and does not readily support composition or abstraction of object graphs without relying on a framework (i.e. for dependency injection).

I believe syntactic support for declaratively describing and parametrically abstracting large object graphs would greatly ease initialization burdens. Simultaneously, it could also simplify specification of special relationships such as automatic distribution (A nearby B), observer patterns (A observes B), survival dependency and redundancy (A depends on B), and allows for a wider variety of immutable objects (no need for mutation during initialization), which would in turn allow a variety of inlining and graph-reduction optimizations.

Rings a bell...

So the next question is how does each subsystem in my program (and imagine that it is a million lines of code with several dozen sub-systems) share access to the list of modules they need? It just looks like there would be an explosion of parameters all over the place, if every class has to pass modules, to every other class who may potentially need them.

Have you ever seen a big app wired together with Spring? Particularly when people get fond of auto-wiring, it's not unusual to see dozens of classes, each with ten or fifteen parameters, in exactly the pattern you describe.

Not familiar with Spring

I actually hadn't, I don't work with Java. Thank you for the example. So is auto-wiring the way of the future for Java frameworks?

When you say dozens of classes, do you mean that most of the classes look like that, or only a small percentage? I ask because our notion of what constitutes a "big" app may differ.

Auto-wiring, etc.

Personally, I don't think that auto-wiring is "the way of the future." That type of thing has been in common use for a number of years, but while it has a number of advantages, I rather feel that they're outweighed by the disadvantages.

When I say dozens of classes, I mean a small percentage of the classes in the system.