State of objects

Imagine having a purely object-oriented language, meaning that all algorithms (functions/methods/procedures) are methods in objects, and all value storage is also handled by objects.

When a compiler sees code for creating an object, it will implicitly associate a state with it. The state can be any of the following:
1) Allocated but not yet initialized.
2) Initialized, ready for use.
3) Destructed.

It uses these states to determine what syntax is legal, and probably for optimization and the like.

I'm playing with the idea of making these states not only visible to the programmer, but also extensible. If you combine this with the possibility of conditioning methods (members?) on the state, you can actually define an object that e.g. only allows a certain method to be invoked once. This would enable the compiler to enforce the restriction statically whenever possible, and fall back to dynamic enforcement only when a static check is not possible.
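
As a rough sketch of the "invoke once" case (using Rust move semantics as a stand-in for typestate; the names are invented for illustration): a method that consumes its object can only be called once, and the compiler rejects a second call statically, with no dynamic check needed.

    // Hypothetical sketch: `fire` may only be invoked once per Missile.
    // Taking `self` by value "consumes" the object, so a second call is a
    // compile-time error rather than a runtime check.
    struct Missile {
        target: String,
    }

    impl Missile {
        fn new(target: &str) -> Self {
            Missile { target: target.to_string() }
        }

        // Consumes the missile; afterwards the binding has been moved out of
        // and can no longer be used.
        fn fire(self) {
            println!("fired at {}", self.target);
        }
    }

    fn main() {
        let m = Missile::new("moon");
        m.fire();
        // m.fire(); // rejected statically: `m` was moved by the previous call
    }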

Is there anyone out there who has tried this already?

See Plaid by Jonathan

See Plaid by Jonathan Aldrich et al.

States and interfaces

Very interesting reading; it set my head spinning for a while... There is one thing I have a hard time dealing with:

In my opinion, methods are what should concern others (though not necessarily all methods), while members are something that is always private to an object. The object may then choose to define a getter/setter for each of its members, and if the same syntax can be used to call a getter/setter as when accessing a private member, then other objects aren't really affected by the difference. The reason for this way of thinking is that it creates a kind of "default private" for members, where you specifically have to define whether you want others to be able to read, read and update, or only update a member. Changing your mind later is no problem; anyone accessing your object in the "legal" way is not affected.

I also think this makes objects better adapted for remote communication, since methods give you more control over how and when a member is updated.
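
A minimal sketch of the "default private" idea (plain Rust, with invented names, just to illustrate the accessor point): the member stays private, the owner explicitly decides whether others may read, update, or both, and callers go through methods, so changing the representation later doesn't affect them.

    // Sketch only: a member that is private by default, with access granted
    // explicitly through methods chosen by the object's owner.
    pub struct Account {
        balance: i64, // private: other objects cannot touch this directly
    }

    impl Account {
        pub fn new() -> Self {
            Account { balance: 0 }
        }

        // Read access is granted explicitly...
        pub fn balance(&self) -> i64 {
            self.balance
        }

        // ...and updates go through a method, so the object controls how and
        // when the member changes (which also helps the remote case).
        pub fn deposit(&mut self, amount: i64) {
            self.balance += amount;
        }
    }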

Having said all that, I don't like Plaid's way of mixing members and methods into a certain state. I'd like to be able to say "here's an object adhering to this interface", never mind what state it is in or even what object it is. In "normal" object-oriented programming this is usually accomplished using inheritance. With states defining both methods and members, things start to get messed up in this area. In my humble opinion...

Any thoughts on this?

I don't know Plaid very

I don't know Plaid very well, and personally I'm not a big fan of verbosity. But from the OOPSLA 2011 paper, states can inherit from other states using the case-of, and hence would seem to completely subsume the purpose of classes.

And doesn't that make sense? A state indicates that an object is in some state, a class indicates that an object is being something; how are they so different? A duck is in the state of being a duck, it just happens that it will probably always be in that state, while flying is a transient state (we could imagine a magic wand that transmutes a duck into a dog, but that is unusual). I did something similar in SuperGlue and am doing something similar in YinYang, but for different reasons unrelated to correctness.

In other words, once you have dynamic inheritance, static inheritance is simply a special case of that.

State change vs transformation

I wonder whether the magic wand is transforming a duck into a dog or if it changes the state of Animal from duck to dog?

I do agree that specifying different methods and members for different states does make sense. I'm just trying to connect this with role-oriented programming, where an object can assume different roles, which I see as similar but not identical. I can also see a point in describing interfaces (probably just a list of methods) as first-class entities, where you do not really care what object implements them.

But then I realize that it would be quite neat to speak about state transitions when declaring the methods. This would make the compiler's job easier, since more sanity checks could be done at compile time. Compare the requirement in Java to always specify what exceptions a method might throw with having to specify what new states a method might move the object into. Thinking in these terms, an interface might not be able to live standalone.
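
To make the "state transitions in the signature" idea a bit more concrete, here is a rough sketch (Rust, with invented types; not a proposal for actual syntax) where the signature itself says which state the object must be in and which state it ends up in, so the compiler can check transitions much like Java checks declared exceptions:

    use std::marker::PhantomData;

    // States as marker types.
    struct Closed;
    struct Open;

    // A file handle whose type records its current state.
    struct File<State> {
        path: String,
        _state: PhantomData<State>,
    }

    impl File<Closed> {
        fn new(path: &str) -> File<Closed> {
            File { path: path.to_string(), _state: PhantomData }
        }

        // The signature declares the transition: Closed -> Open.
        fn open(self) -> File<Open> {
            File { path: self.path, _state: PhantomData }
        }
    }

    impl File<Open> {
        // Only legal in the Open state; no transition.
        fn read(&self) -> String {
            format!("contents of {}", self.path)
        }

        // Declared transition: Open -> Closed.
        fn close(self) -> File<Closed> {
            File { path: self.path, _state: PhantomData }
        }
    }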

Sorry about all the confusion. As I said, the paper made my head spin; apparently the spinning hasn't ceased yet ;-)

In World of Warcraft, you

In World of Warcraft, you can change things into sheep for a certain amount of time. You are changing the state of the object momentarily, whether you can represent that in the type system or not is another issue.

My point about verbosity: with a powerful static type system, you wind up annotating more since there is a higher bar on correctness. I'm not sure if this is manageable, but looking at the Plaid paper, it doesn't seem so bad; they have thought out the system fairly well.

Plaid Verbosity

I expect Plaid's typestate would prove verbose if you tried to use it in a fully static manner. For example, you would generally need to remove objects from collections before operating on them, sort them into collections based on their final typestate, and use tagged union types to return references that might be in different typestates.

Fortunately, Plaid appears to provide a degree of dynamic analysis (if (file instate OpenFile) ...) which should help mitigate verbosity.
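
For comparison, such a dynamic check is roughly a runtime tag test; a sketch of the same flavour of check in Rust with an enum (my analogy, not Plaid's actual semantics):

    // Rough analogue of a dynamic `instate` test: the state is a runtime tag
    // and we branch on it instead of proving it statically.
    enum FileState {
        Open { descriptor: i32 },
        Closed,
    }

    fn read_if_open(file: &FileState) {
        match file {
            FileState::Open { descriptor } => {
                println!("reading from fd {}", descriptor);
            }
            FileState::Closed => {
                println!("file is closed; skipping");
            }
        }
    }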

This sort of gets down to

This sort of gets down to the biggest problem with dynamic inheritance in a statically type-checked/imperative setting: what happens when the type of an object "changes" while there is an outstanding reference to the object that depends on the old type? In a declarative language (without lasting aliased assignment), you don't have to worry about this, but that is hardly practical. On second reading, it seems that Plaid only relies on dynamic typestate (as of OOPSLA 2011); it sounds like they are evolving their idea away from static analysis?

Plaid Concurrency

It seems Plaid uses a mix of linear typing (full/shared/pure references) and transactions (atomic{} blocks) to protect aliased references and constrain observable changes. I would assume that a shared reference must return to the 'guaranteed' typestate before the end of each transaction, and this is probably where most dynamic checks are required.

But, yes, it does seem that Plaid has moved in the direction of dynamic typing, judging from their 'Gradual Typestate' paper. From said paper, it also seems access to dynamic checks is explicit, via distinct assertd operators and the like. This mix of static-by-default typing, with dynamic checks at statically arranged boundaries, strikes me as more acceptable than the more typical reverse...

Could you clarify your meaning for your 'in a declarative language' sentence? I can think of several declarative languages (e.g. temporal logic) where concurrent manipulations would make it difficult to reason about pre-condition and post-condition state, and where aliasing is not the issue.

Declarative is timeless and

Declarative is timeless and so concurrency is something that must happen under the covers (while preserving declarative reasoning). If the logic includes some sort of imperative operation like "start to be true", then it is no longer declarative even if still a logic.

The question really comes down to how do you know if an object is in a state or not? If the object is "this" and you are currently executing that state, then you know that state is active. If not, you depend on the type of the object's reference, which is only valid in a declarative/timeless context. Imperative assignment messes with this since you can read the object from the reference later in the wrong context. Declarative assignment is clean, since the assignment no longer occurs if the context doesn't match the state requirements.

Declarative is not timeless

Declarative is not 'timeless'. One can model time in a declarative manner. I would grant, however, that time and order cannot be implicit to syntax if the model is to be declarative. I.e. "start to be true" is bad, but "is true at time T+1" is okay because we can still apply declarative reasoning: monotonic, idempotent, commutative, associative.

However, understanding your definition (even if I disagree with it) does help clarify your meaning to me.

I would posit that the real issue is that shared references model shared state, and (by nature) shared state is not locally deterministic. Doesn't matter whether it is declarative or not.

Anyhow, Plaid uses a sort of linear typing. To use a reference is to consume it, though the reference is often returned to you after the operation. This helps mitigate the whole context issue, since we aren't using 'old' references - i.e. we cannot still use the 'full' reference after splitting it into 'shared' references or 'pure' references.

Implicit time doesn't make

Implicit time doesn't make the language non-declarative; only effects that occur implicitly at specific points in time do (implicit-time FRP is still declarative if it doesn't deal with discrete events). Of course, if time is explicit then you can parameterize your effects explicitly and your language remains declarative, but in practice this isn't very workable!

I would also classify code amenable to undecidable theorem proving as still capable of being declarative (as in F*). In fact, whether you can formally reason about the code at all is not relevant to the code being declarative (you can build a Turing machine out of a declarative language!). But declarative is overloaded...it could mean a language where time is explicit or irrelevant, or it could mean a language with simple semantics amenable to analysis, or it could just mean markup :)

You can share state and still be declarative: say you have a mutable set of "Open File," and a declarative "add" operation that places an element in the set as long as the add operation is active. The add operation is predicated on the fact that the element added is an open file. If the precondition fails (the added file is no longer open), then the add operation is no longer active and the file is no longer shared through the set. The point here is that "add" is not an imperative operation but a declarative one: it doesn't just execute discretely; it has a continuous execution bounded by predicates that are part of the type system.

This is how SuperGlue and YinYang should work (though I've never gone so far as to explain my idea for sets until today :) ). It's not always easy: sometimes you need to add something to a set when some predicates are true, but you need it to remain past the point when the predicates are no longer true. Also, when your predicates cease being true, the element is transparently removed from the set, which can be a pain in the neck for debugging (crashing might have been more desirable).

Declarative

To my understanding, 'declarative' is something of a continuum based around the following properties of 'declarations':

  • spatially idempotent: if you've said it once, you've said it a thousand times; meaning of declaration is independent of its multiplicity
  • associative and spatially commutative: meaning of a declaration is independent of its placement, up to parameters.
  • monotonic: declarations add to a system. You must use time or priority to model declarations that override or cancel other declarations.

I do not consider declarative to mean 'continuous', though certain continuity properties are weakly implied by monotonicity. I like continuous semantics for other reasons, such as resilience and simplicity.

I agree we can have shared state and be 'declarative'. The issue with typestate references is that we cannot have shared state and be 'locally deterministic' (i.e. we need a global view to determine the state of a reference).

Regarding your 'add' example: I use a similar concept for reactive sets and collections in RDP. Debugging can be accommodated by keeping a little extra history (a few seconds worth) while in debug mode.

Typing Protocols

In general, constraining syntax based on state can be achieved by use of dependent types, or by use of extensible grammar models (Christiansen grammars, Recursive Adaptive Grammars, etc.).

Such syntax-based constraints don't work so well if you might have concurrent interaction with an object. In the general case, a reference held by Alice might change its 'type' due to an action by Charlie. Subjecting this to useful 'static' type analysis is difficult, unless you supplement the analysis with sub-structural types (linearity, regions, single-use objects).

Similarly, working with collections becomes more specialized - you can't just have a list of files, you need a list of 'open' files... and if you want to close an open file, you first need to remove it from said list.
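
As a small self-contained sketch of the collection point (invented types in Rust, in the same typestate flavour as above): the element type of the collection fixes the state, so a file has to leave the 'open' list before its state can change.

    // Sketch: the collection's element type fixes the typestate.
    struct OpenFile { path: String }
    struct ClosedFile { path: String }

    impl OpenFile {
        fn close(self) -> ClosedFile {
            ClosedFile { path: self.path }
        }
    }

    // To close a file it must first be removed from the open-file list; the
    // result would have to live in a separate Vec<ClosedFile>, if kept at all.
    fn close_last(mut open: Vec<OpenFile>) -> (Vec<OpenFile>, Option<ClosedFile>) {
        let closed = open.pop().map(|f| f.close());
        (open, closed)
    }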

Providing linear types (i.e. references that must be consumed at least once and at most once) is a very effective option for protecting and enforcing protocols and state.

Consumed references

If a reference is always "consumed" once and once only, how does that work in practice? E.g. if you issue several calls on a reference you got, which I suspect is quite common?

The usual way is that the

The usual way is to have the call on the reference return a similar reference: the type would be something like A ⊸ T⊗A, that is, takes (and consumes) A, and returns both T and A.
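
In more mainstream terms, that is roughly a method which takes ownership of the reference and hands it back together with the result; a hedged sketch in Rust (invented names), where repeated calls simply thread the reference through:

    // Sketch of the A ⊸ T ⊗ A pattern: each call consumes the reference and
    // returns it again along with the result.
    struct Channel {
        log: Vec<String>,
    }

    impl Channel {
        // Consumes `self`, returns the produced value together with a new `self`.
        fn send(mut self, msg: &str) -> (usize, Channel) {
            self.log.push(msg.to_string());
            (self.log.len(), self)
        }
    }

    fn main() {
        let ch = Channel { log: Vec::new() };
        let (n1, ch) = ch.send("hello");  // first call consumes ch and returns it
        let (n2, _ch) = ch.send("world"); // the returned reference is used for the next call
        println!("{} {}", n1, n2);
    }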

Consumed references

If the object is remote, then this should translate into a promise for a reference being returned, to ensure that you can continue executing even though you haven't got your new reference yet. Then you continue in the belief that the reference is to an object in a certain state, but when you get it for real, it turns out to be an object in another, incompatible state. Interesting. Then you'd need to provide some way for the application to backtrack what it had done. Ultimately, you should simulate getting an exception about a non-existing method at the point where it happened, undoing anything you did after it.

Or, as an example: How do I solve the following:

1) Call method1 on reference for object1, getting a promise for a reference on object1
2) Call method2 on the reference for object1; since it is a promise, this is allowed even though it wasn't legal, due to a state change occurring in object1 between 1 and 2.
3) Call method3 on reference for object2
4) Now the promise used in 2 is resolved, and we realize that the call made in 2 was illegal. What now?

Partial Failure

If the reference is linear, such that you are 'consuming' it, then the situation you state is not possible: nobody else has a reference to object 1, therefore object1 does not change state between steps 1 and 2. If there is a state error in object 1, you get to blame the illegal move on the remote system for violating the linear type.

Of course, placing blame doesn't solve any actual problems. And, when such state and safety errors occur, they must not compromise security. (It is important that applications fail securely.)

Fortunately, every 'cloud' has a silver lining. It is entirely feasible to model all abstraction failures in a consistent, composable way: disruption. We treat it as though the network connection were suddenly lost. This is sort of a see-no-evil, hear-no-evil, speak-no-evil model for failure handling. In this case, it would mean that:

  • for your call to method2, the callee says: "hey, this is an invalid type for that call. I'll just pretend I did not hear it!" and sends this complaint in place of a proper response.
  • the response/promise from method1 is broken with disruption properties, which must have been part of your remote-object semantics anyway, therefore we can say that method3 already knows how to handle this case.

Note: Expected errors should be part of the domain model, and reported using a tagged union type or other error handling mechanism. Normal errors should not be treated the same as abstraction failures and disruption.

Exactly how we handle disruption is based on the programming model. In an E-like language with promises, we would essentially break the reference. In a language with distributed transactions, we might roll back the transaction.

For RDP, I shut down the behavior and appropriate logical connections, thus propagating disruption 'reactively' in both directions, up to boundaries precisely defined by developers utilizing standard abstract 'proxy' services. (Such proxies allow developers to recognize disruption, to set fallbacks in place for graceful degradation, and to possibly use cached responses during a disruption period.)

Partial failure is, of course, one of The Big problems for distributed systems. (Some even call it the 'defining' problem.) I aim for simplicity, rather than precision, so that developers can reason more easily about failure modes and resilience. Not all programming models are well suited for distributed systems development. Side-effects in RDP were designed with disruption and resilience as critical concerns. But message-passing models make partial-failure difficult to recover from because (a) we can't tell whether a disruption failure is before or after receipt of the message, (b) it is not clear how much history we should keep, (c) redundant observers or services can easily diverge due to subtle differences in message arrival order and disruption patterns.

So you really need to ask a few different questions: How will you model disruption and other runtime failures in general? And do you have the right computation model in the first place?

Partial Failure

I come from OSE, which uses the following error handling philosophy:
Divide the application into protection domains. Within each protection domain, be sure to find errors early, and when one is found, cause a quick restart of that protection domain. Connections between protection domains consist of copy-by-value asynchronous messaging, i.e. quite loosely coupled. No shared memory between them.
I've also done file system design and implementation, and there I've become a bit fond of transactional thinking. I guess I sometimes see things as transactions whether this was the intention or not ;-)

For me, a method call shares a lot of the properties that I would assign to transactions. I like a call to finish successfully, or to have no side effects at all except for reporting the error. This requires some careful design in C to accomplish, but in my experience it results in a more robust design.

If I were to transfer these design ideas into a programming language design, I'd see an object as the smallest possible protection domain. Say that references can become illegal as you described above, and that operating on an illegal reference will render the object itself illegal. This means that unless you check the validity of the reference, an error will propagate.

This means that you define a protection domain by which objects will be paranoid about their references, always checking their validity before doing anything. You could provide some language support for this, e.g. an error method that is invoked when an illegal reference is called, or a try block that is exited into a catch clause when an illegal reference is used, similar to how exceptions work. The object could also have a property telling the compiler to make sure all cases of invalid references are handled, so no such bug will slip through.

The main idea here is that a method will always finish its work, or the object dies. Those are the only two options. If you combine this with typestates, roles, or other ways of describing in interfaces how an object can change behavior (e.g. open/closed files), then regular exceptions shouldn't be needed to handle such cases. This should therefore mean that this simple "do or die" strategy could actually work.
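
A very rough sketch of what I mean (Rust, with invented names; just an illustration of the idea, not a worked-out design):

    // A guard that either yields a valid reference or kills the owning object.
    struct Guarded<T> {
        inner: Option<T>, // None = the reference has become illegal
    }

    impl<T> Guarded<T> {
        fn new(value: T) -> Self {
            Guarded { inner: Some(value) }
        }

        // Paranoid access: explicitly check validity and handle the error case.
        fn try_use<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
            self.inner.as_mut().map(f)
        }

        // Unchecked access: operating on an invalid reference makes the object
        // die (modelled here as a panic taking down the protection domain).
        fn use_or_die<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
            match self.inner.as_mut() {
                Some(v) => f(v),
                None => panic!("operated on an invalid reference: object dies"),
            }
        }
    }

    fn main() {
        let mut g = Guarded::new(42);
        let doubled = g.try_use(|v| *v * 2); // paranoid path: Some(84)
        println!("{:?}", doubled);
    }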

Do you think this kind of error handling strategy would work?

Regarding Transactions

I did develop and prototype the idea of a transactional actors model, but my work on integrating this with sensor-fusion, publish/subscribe patterns, and foreign services led me to the conclusion that transactions are not a very good idea. (They're semi-good. They're quasi-good. They're the margarine of good. They're the Diet Coke of good. Just one calorie, not good enough...)

My philosophy regarding 'restarts' has become this: to the extent it's safe to perform a restart of a subsystem, it was a design error to keep explicit state anyway. If you can restart, there should be nothing (semantically) to reset, though one might regenerate connections and restore caches.

The idea of rendering object references invalid can be a good one, but you must be careful that this cannot be used as a denial-of-service attack. You still need to fail securely.

Destructors in references

I've been playing with an idea that extends this with a faulty reference that has a destructor responsible for undoing what the call to the reference had done. How to express this in the language is what I'm having problems with right now... The problem is that the implementation should probably be part of the object being referenced, since that is where the knowledge of how to undo an operation exists.

But if I could make this work, I could set up dependency trees between promise references. If all those get affirmative answers eventually, all is well. But if such a reference gets an error indication, its destructor will be called, which in turn will cause the destruction of all other references on which it depends, and so on.

Is this an approach that has been tried somewhere?

Explicit Dependencies and Cascading Failure

For my earlier work with the actors model, I had a few annotations that one could declare between actors. One of them was dependency: if actor A depends on actor B, then whenever B is inaccessible (e.g. due to network disruption), A is inaccessible. This is a transitive relationship, so any actor depending on A will also shut down. For actors, inaccessible meant that messages bounce and promises break.

This notion was leveraged with the idea of 'suicide actors', which accept a command to 'die' and then become permanently inaccessible. This would cascade transitively to all actors depending on the suicide actor. When an actor becomes 'permanently inaccessible', we can garbage-collect it. Thus, this served as a secure approach to object destruction and 'delete', an effective basis for job control, and made it easy to support 'watchdog' patterns like those used in Erlang (i.e. destroy a whole subnet and then regenerate it).
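
A toy sketch of the cascading-inaccessibility idea (my own reading, with invented data structures; not the actual implementation):

    use std::collections::{HashMap, HashSet};

    // Actors with explicit dependencies; killing one transitively marks every
    // dependent actor as inaccessible.
    struct Registry {
        deps: HashMap<&'static str, Vec<&'static str>>, // actor -> actors that depend on it
        dead: HashSet<&'static str>,
    }

    impl Registry {
        fn kill(&mut self, actor: &'static str) {
            if !self.dead.insert(actor) {
                return; // already inaccessible
            }
            // Cascade to everyone who declared a dependency on this actor.
            if let Some(dependents) = self.deps.get(actor).cloned() {
                for d in dependents {
                    self.kill(d);
                }
            }
        }

        fn accessible(&self, actor: &str) -> bool {
            !self.dead.contains(actor)
        }
    }

    fn main() {
        let mut reg = Registry {
            deps: HashMap::from([("B", vec!["A"]), ("A", vec!["Watchdog"])]),
            dead: HashSet::new(),
        };
        reg.kill("B"); // B 'dies'; A and Watchdog become inaccessible too
        assert!(!reg.accessible("A"));
        assert!(!reg.accessible("Watchdog"));
    }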

It is easiest to handle failure at a high level, and at well defined boundaries in the application. Limping along means we don't see the problem. These cascading failures provide the boundaries for failure recognition and recovery. I imagine that a type-failure should result in the same sort of cascading failure as disruption - not necessarily permanent, but always obvious.

I plan to adapt this dependency notion for my RDP model, but I'm not far enough along to actually implement it. I believe that I will reverse my earlier default: implicit dependencies, explicit proxy services for when we want to 'observe' disruption and react to it (e.g. by using a cache or fallback).

Anyhow, I don't believe your 'tree of dependencies between promises' idea would work in its current form. The problem is how that would interact with promise pipelining, i.e. where a break might occur anywhere along the 'pipe' but you can't exactly undo the earlier stages. Transactions won't work very well because you'll inevitably want collaboration patterns: e.g. observer-patterns, data-fusion.

I use temporal semantics. They don't allow me to undo the past, but I can at least model a subsystem 'breaking' at a clean, precise logical instant. This makes the failure modes far more consistent and easier to understand.

Cascading failure

I'm not quite convinced my idea will not work. The thing is that running the destructor of a promise reference is not necessarily the same thing as undoing the operation. It is more a way of informing that something went wrong for something we're dependent on. How you react to it is up to the programmer who writes the destructor code.

The default mode isn't to propagate the destruction. If you do not check for a broken promise reference before taking the next action, then the whole application will be aborted. But if you do check, then you can run code to handle this situation.

Since I'm more of a "regular engineer" than a properly educated language designer, I feel I need to try things out to realize what the problem really is...