An Overview of the Singularity Project

Singularity is a research project in Microsoft Research that started with the question: what would a software platform look like if it was designed from scratch with the primary goal of dependability? Singularity is working to answer this question by building on advances in programming languages and tools to develop a new system architecture and operating system (named Singularity), with the aim of producing a more robust and dependable software platform. Singularity demonstrates the practicality of new technologies and architectural decisions, which should lead to the construction of more robust and dependable systems...
Singularity... starts from a premise of language safety and builds a system architecture that supports and enhances the language guarantees.

An interesting overview of what sounds like an intriguing project.

The choice of implementation language is also interesting:

Singularity is written in Sing#, which is an extension to the Spec# language developed in Microsoft Research. Spec# itself is an extension to Microsoft's C# language that provides constructs (pre- and post-conditions and object invariants) for specifying program behavior. Specifications can be statically verified by the Boogie verifier or checked by compiler-inserted run-time tests. Sing# extends this language with support for channels and low-level constructs necessary for system code... Integrating a feature into a language allows more aspects of a program to be verified. Singularity's constructs allow communication to be statically verified.
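
For a rough feel of the contract idea (this is not Spec# syntax, just an analogy in Haskell using run-time assertions; the withdraw example is invented):

import Control.Exception (assert)

-- Invented example: a precondition (non-negative amount, sufficient funds)
-- and a postcondition (balance never goes negative), checked at run time.
-- Spec# lets such conditions be stated declaratively, so Boogie can try to
-- discharge them statically instead of relying on inserted checks.
withdraw :: Int -> Int -> Int
withdraw balance amount =
  assert (amount >= 0 && amount <= balance) $   -- precondition
    let balance' = balance - amount
    in  assert (balance' >= 0) balance'         -- postcondition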

An interesting aspect is the support for meta-programming, which is implemented in an unusual manner:

Compile-time reflection (CTR) is a partial substitute for the CLR’s full reflection capability. CTR is similar to techniques such as macros, binary code rewriting, aspects, meta-programming, and multi-stage languages. The basic idea is that programs may contain place-holder elements (classes, methods, fields, etc.) that are subsequently expanded by a generator.
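
CTR is not Template Haskell, but Template Haskell gives a rough feel for "place-holder elements expanded by a generator at compile time" (the mkConst generator below is an invented example):

{-# LANGUAGE TemplateHaskell #-}
module Gen where

import Language.Haskell.TH

-- A generator: splicing $(mkConst "answer" 42) into another module produces,
-- at compile time, the declarations
--   answer :: Int
--   answer = 42
mkConst :: String -> Integer -> Q [Dec]
mkConst nm val = do
  let name = mkName nm
  sig <- sigD name [t| Int |]
  def <- valD (varP name) (normalB (litE (integerL val))) []
  return [sig, def]

The splice fills in the placeholder during compilation; Sing#'s CTR pursues the same expand-at-compile-time idea, though its mechanics differ.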

Many other interesting design decisions are discussed in the paper (e.g., various DbC facilities), so do check it out.

Ada

Safe programming languages are not recent phenomena. Pascal and Ada are safe, statically verifiable imperative languages. Modula-3, Dylan, and Java are safe object-oriented languages. Safe languages have become more popular with faster processors, more refined type systems, and improved run-time systems. Nevertheless, they are not widely used for system implementation because their time and space overhead is higher than low-level languages such as C or C++...

Ada is, in fact, used for many hard real-time systems, so I would take this comment with a grain of salt. A useful overview of the facilities the language provides for this can be found here (note that this is from 1998; things have evolved since then).

Ada 95 is object-oriented, by the way. Ada 2005 adds support for interfaces and the object.method notation...

Ada also provides rich support for specifying data layout and for many similar "systems programming" needs.

On the other hand...

I question whether Ada can rightly be regarded as "safe" with such functions as UNCHECKED_DEALLOCATION (which, incidentally, is one of the tricks needed to make Ada truly fast).

Freeing memory

The fact that you can deallocate memory doesn't make a language unsafe. Safety is a technical term here, and it refers to the typing discipline.

Incidentally, an Ada implementation can make use of a garbage collector (say, on the CLR or the JVM), and then one doesn't need to manually release memory.

Are you sure?

Doesn't the possibility of dangling references violate safety, since the types of pointed-to objects can no longer be guaranteed? BTW, "permitting GC" (as opposed to "requiring GC") is a red herring, because even C++ is compatible with the Boehm collector, and nobody in their right mind would call that a safe language. Let me illustrate what I believe is a lack of Progress (in C++, though it should be trivial to translate to Ada...it's been nearly a decade since I touched the language):

1) int* x = new int(5);
2) int* y = x;
3) delete x;
4) *y;

Now I claim that the semantics of line 4 are undefined in both C++ and the equivalent Ada program. And if they are undefined, then progress fails at line 3, because there is no proper way to evaluate line 4 (as in, evaluating line 4 does not yield a well-defined value according to the language semantics). If I've misunderstood type safety, I'd be glad to review a good reference (though it's hard to find a good one online).

interesting name for a project

Malicious users would comment on this by saying Microsoft is trying to grow an AI -- the Singularity -- so that it commands the empir, err, company in future years, and that the AI is being modeled after Bill's brain...

.Net is certainly scoring a few points recently for interesting academic projects seeing some mainstream acceptance.

Reminds me of this quote...

"More good code has been written in languages denounced as bad than in languages proclaimed wonderful—-much more."

—Bjarne Stroustrup
(in The Design and Evolution of C++) (1994)

Perfection or perception

Even a perfect system will not work in an imperfect environment.

My first thought....

... was that the project's premise ought to impose a constraint that the project not be affiliated with Microsoft, a company long known for its lackluster approach to dependability.

Of course, such a cheap shot is unfair. MS has lots of talented people working for it, including many academics and professionals who have an in-depth understanding of software reliability, dependability, and security issues.

Unfortunately--in the past--the efforts of those folks have been undermined by many of the business objectives of the company, which range from the bothersome disease of "put in lots of features and don't worry about the bugs!" (a malady which is hardly unique to Microsoft) to the horrible cancer of "integrate the browser with the operating system so we can kill off Netscape" (a technically backward measure taken for anti-competitive reasons).

Perhaps we are seeing a tiger trying to change its stripes--MS is bleeding as more and more people consider and adopt other platforms (mainly Linux). Perhaps Gates and Ballmer and co. will finally "get it". Or perhaps this project is doomed in its infancy--something cool which the labs get to work on (and which gives MS a bit of PR), but which isn't likely to make it into an end product, because it doesn't meet some MS business objective.

And that's the crux of the problem--many of the techniques used for SW reliability--including design transparency, peer review/audit, and modularity--are anathema to the business model of a monopolist.

What about the technical ideas?

Forget about the corporation for a moment. Do the technical ideas make sense?

Certain aspects are interesting and promising...

...even if they aren't terribly novel. (Which isn't a bad thing, necessarily.) Compiler verification of contracts (invariants) is always an interesting problem, and the "Software Isolated Process" (essentially a single-threaded, dynamically compiled VM instance) is a well-known concept--a straightforward extension of what the CLR or JVM currently does.

In further praise of MS, I'm happy to see that shared state concurrency appears to be banished from the model. While Sing# isn't a single-assignment language like Erlang, this is a welcome step away from a programming model which is a notorious source of bugs.

Does it make sense?

That is what I'm wondering, and in case you haven't already noticed, I'm a total n00b. So I can't answer.

What is the price of message based communication between processes?

On page 6, what is meant by "...applications as a security principal..."? And how would that be different from "user permissions"?

Application manifests sound *super cool*. Does this really mean easier installation, better dependency management, and cleaner uninstalls?

How truthful is it that "faulty device drivers cause 85% of diagnosed Windows system crashes"?

SIPs' low overhead and Haskell.

Reading about SIPs, it strikes me that these concepts are very similar to those in Haskell... For example, Haskell threads are very lightweight, they are separate "object" spaces (variable locality comes from scoping, not memory management), and you communicate between them in a type-safe manner (using channels).
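
For instance, something like this (plain GHC Haskell; the squaring worker is just a made-up example) is all it takes to get a lightweight thread plus a typed channel:

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  requests <- newChan :: IO (Chan Int)     -- the channel type fixes what may be sent
  replies  <- newChan :: IO (Chan Int)
  _ <- forkIO $ replicateM_ 10 $ do        -- a lightweight worker thread
         n <- readChan requests
         writeChan replies (n * n)
  forM_ [1 .. 10] (writeChan requests)
  replicateM_ 10 (readChan replies >>= print)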

In fact, if you consider SPJ's software transactional memory, the Singularity stuff seems a bit behind... The idea seems to be to borrow some stuff from functional/Haskell programming and bolt it onto C# (wait, where have I seen this before... Cω (C-omega) and LINQ... which are derived from HaskellDB), throwing in a separate specification language for good measure.
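
For reference, SPJ's STM style looks roughly like this (using GHC's stm library; the account-transfer example is invented):

import Control.Concurrent.STM

-- A composable transaction over shared state, with no explicit locks:
-- 'check' blocks and retries until the precondition holds, and the whole
-- read-modify-write block commits atomically.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  balance <- readTVar from
  check (balance >= amount)
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)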

It raises the question: why not write it in Haskell? (And would it not be similar to HoS/House?)

Objecting to Inheritance or Objectives of Inheritance

If we're going to object to the inheritance of ideas from other languages, then we might as well discount concurrent Haskell as well - it lifts many of its ideas from occam, including the notion of ultra-lightweight processes that can only communicate via type-safe channels.

So why not just implement everything in occam? Well, mostly because Haskell has all sorts of neat features that simply aren't available in occam. More to the point, it's silly to object to the idea of one programming language learning from another one. Our ultimate objective is surely to improve PLs in general. That's inevitably going to involve inheriting ideas from other languages.

Yes! Equal Opportunity Improvement

and Equal Opportunity Admiration, for example Autrijus' Visual Basic Rocketh talk at EuroOSCon.

I will never be a language snob, only a feature snob!

Type safe channels (like Occam)

Type-safe channels (like Occam's) are not the same thing. Haskell being a functional language means the channels are typed and higher-order... You can declare a channel where the types are function types (And that is types, not function pointers which are untyped). Haskell's lightweight processes are not the same thing either; they come for free from the nature of the spineless tagless graph machine that is the core of GHC's implementation, whereas Occam's come from having many transputers running in parallel. Occam, AFAIK, when running on a single CPU has the same overheads associated with threading and processes as any other language.

The lightweight processes in Haskell are like threads that have the separation of processes - in other words, they run in the same address space, but the type system guarantees they will not interfere with each other, and this condition is proved at compile time. Singularity includes the higher-order nature of channels and the type-system-driven lightweight processes. So again the emphasis is on the static provability of properties - something which Haskell does already - and nothing like Occam.

Remember, Haskell was developed by a pooling of functional talent, to put a stop to the over-proliferation of different languages all trying to implement the same typed lazy functional paradigm. So if C# is developing into another typed lazy functional language, they should do the right thing and use Haskell...
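
To make the higher-order point concrete, here is a toy GHC example of a channel whose element type is a function type (the worker just applies each received function to 10):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (replicateM_)

main :: IO ()
main = do
  jobs    <- newChan :: IO (Chan (Int -> Int))   -- whole functions travel over the channel
  results <- newChan :: IO (Chan Int)
  _ <- forkIO $ replicateM_ 3 $ do
         f <- readChan jobs
         writeChan results (f 10)
  mapM_ (writeChan jobs) [(+ 1), (* 2), subtract 3]
  replicateM_ 3 (readChan results >>= print)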

Perhaps I distracted you from my point by focusing on Haskell. What I really meant to say is that all these things come for free (or practically for free) in a functional language like Haskell. In other words, functional languages are the natural domain of expression for these kinds of ideas...

I can see nothing new here - except the application of ideas from functional programming to C#... which looks like prior art in case they try to file for any patents. Perhaps C# will take on so many ideas from functional languages that it will eventually become more functional than imperative...

As for language snob - that was my point... It seems some people are too busy evangelising C# and adding features to see that what they will end up with is already available (i.e., Haskell has a mixed functional/imperative model, using monads to encapsulate the effects of the imperative sections).
Why not just use the other language that provides the features? If you turn Visual Basic into Pascal, would it not have been better just to use Pascal? VB's main competitor was Delphi (which is Pascal), until they poached Delphi's lead developer, who is responsible for some of the new ideas. Also, MS employed former academic Erik Meijer, who brought his work on HaskellDB to C# and VB. So perhaps these changes were not unpredictable. I think what makes me uncomfortable is that MS is paying people to recreate and carry on work started in other languages - but on top of the .Net platform.

You could almost say they are _embracing_ these ideas... LINQ also _extends_ the ideas in HaskellDB by applying them to XML and objects (a fairly obvious extension - but cool too). We all know what follows "embracing and extending..."

I appreciate the point about improving all PLs, so the relevant question I suppose is "what does this add"... and that's where the relevance of my comments should be obvious - we already have most of this. The ideas in Spec# and Sing# may be interesting. However, I don't like the idea of two separate languages - so this gives me a direction to progress this... Is there a single language that could replace both C# and Sing#, a language capable of writing both the specification and the implementation?

The Process of Channelling old ideas to new languages

Haskell being a functional language means the channels are typed and higher-order... You can declare a channel where the types are function types (And that is types, not function pointers which are untyped).

I'm not sure exactly what your point is here. Granted, occam does not have first-class functions. But it most assuredly does have typed channels, and if it did have first-class functions I'm sure they would be passed around just as in Haskell - it seems like a trivial extension to the basic idea of typed channels. It's worth noting that modern occam (as exemplified by the KRoC project) includes the notion of mobile channels and mobile processes, which can be passed around dynamically. But the mobile processes and channels are both statically typed, and the channels used to pass them around are also statically typed. No untyped function pointers allowed.
Haskell's lightweight processes are not the same thing either; they come for free from the nature of the spineless tagless graph machine that is the core of GHC's implementation, whereas Occam's come from having many transputers running in parallel. Occam, AFAIK, when running on a single CPU has the same overheads associated with threading and processes as any other language.
And you would be completely wrong. Occam was designed from the get-go to have ultra-fast context switching between processes, even when running on a single transputer. Of course, INMOS also provided for the possibility of easily scaling up to multiple transputers if necessary, but it is not a requirement. Since the transputer is now defunct, modern occam runs on x86 processors. Just one at a time. I've had systems of up to several million processes running at one time, on one processor, in the same address space. Statically proven not to interfere with each other.
What I really meant to say is that all these things come for free (or practically for free) in a functional language like Haskell. In other words, functional languages are the natural domain of expression for these kinds of ideas...
And low-overhead, secure concurrency comes for free in a language designed from the ground up for it. Like occam. Which makes occam the natural domain of expression for those kind of ideas (especially given occam's close ties to CSP semantics). So why add those kind of features to Haskell (which has a semantics much closer to the λ-calculus than the π-calculus)? By your logic, it should never have happened.

Look, I'm not particularly fond of MS myself. And I do feel like they often suffer from a nasty case of NIH syndrome (even in MS Research). But trying to claim that they shouldn't implement old ideas into new languages, or that "we already have most of this", misses the point completely. All sorts of languages lift ideas from other languages (as the Lisp folks often point out). Haskell and ML learn from each other. And from other languages. Oz incorporates good ideas from all over the place. Everybody takes ideas from Lisp. Sometimes adding an old idea to a new language that has other features not found in the source language lets us see useful synergies between features that we hadn't noticed before. That's the way that individual languages, and language design in general, grow and evolve. If we made it a rule that no one was allowed to implement any idea that had already been implemented in another language, all we'd achieve would be to slow down the pace of improvement in language design by several orders of magnitude.

I agree with most of that...

I agree with most of that... Except it ignores elegance... Sure, you can bolt any feature onto any language (almost... I think if you model language features as monad transformers you find that some features exclude certain other features), but will you not end up with a Frankenstein's monster? Some features have a synergy... and these features seem to me to be much happier in a functional language than in a derivative of C/Java/Pascal. The most elegant ideas derive all the required features from a few simple core ideas.

Of course it is great that Microsoft are doing this, and it's always good to see good ideas being adopted. I don't dispute this - but that's a different question from whether this is the best way to clean-room produce a secure operating system.

Don't forget Hos and House, a Haskell-based OS that boots on PC architecture and has some of the same features for free just because it's written in Haskell (I haven't read enough about Sing# to know if all its properties could be simulated by type-level logic programming with type classes in Haskell). To be fair, it could probably be written in ML or Clean just as well.

Any ideas about combining the specification language and the implementation language?

The need to specify what we want to specify

Any ideas about combining the specification language and the implementation language?

It depends on what you mean by "specification". Type signatures are a specification, just a somewhat limited form. It can probably be argued that declarative programming is a form of specification. So, in that sense, Haskell already combines specification and implementation. However, I'm guessing that's not quite what you mean. For a language closer to what I think you have in mind, you might take a look at HasCASL. It's an extended algebraic specification language (CASL) that embeds Haskell as a sublanguage. I haven't really used it at all, but it sounds promising.
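
A tiny illustration of the "type signatures are a (limited) specification" point -- by parametricity, these signatures already pin down real properties of any total implementation (f and pairUp are just examples):

-- The only total function with this type is the identity function.
f :: a -> a
f x = x

-- Whatever this does, every first component of the output must come from
-- the first list and every second component from the second list.
pairUp :: [a] -> [b] -> [(a, b)]
pairUp = zip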

Practically speaking, I'm skeptical that a single combined spec/implementation language would be all that widely applicable. A lot depends on what kind of properties you intend to specify - the huge proliferation of specification languages is mostly driven by the fact that different specification domains have different requirements. Perhaps the best option is to provide a kind of meta-specification/implementation language, which can be used to develop domain-specific specification/implementation languages. I have no idea what that would look like, although there may well be some clues in HasCASL since CASL itself has been extended to encompass a variety of other formalisms (including LTL, CSP, and state-based approaches).

CASL as meta-specification

Perhaps the best option is to provide a kind of meta-specification/implementation language, which can be used to develop domain-specific specification/implementation languages. I have no idea what that would look like, although there may well be some clues in HasCASL since CASL itself has been extended to encompass a variety of other formalisms (including LTL, CSP, and state-based approaches).

Indeed, that is in principle what CASL is supposed to be: a base specification language upon which you can build various different domain-specific languages. It is, in a sense, designed to address precisely the issue that no one specification language can really suitably address all the different domains without becoming tortuously convoluted and unusable. Instead it aims to provide a common basic structure upon which different formalisms can be elegantly constructed in a consistent way.

There is good theoretical underpinning for this too - there is a lot of work on the theory of institutions, which provides a framework in which different logical systems can be treated in a meta-theoretic way and general results transferred across into new systems. I'd recommend the paper by Andrzej Tarlecki on the subject, which provides a nice introduction to the framework.

And low-overhead, secure concurrency

And low-overhead, secure concurrency comes for free in a language designed from the ground up for it. Like occam. Which makes occam the natural domain of expression for those kind of ideas (especially given occam's close ties to CSP semantics). So why add those kind of features to Haskell (which has a semantics much closer to the λ-calculus than the π-calculus)? By your logic, it should never have happened.

Because in Haskell's case there aren't any features that really want taking out - there's an embedding of the λ-calculus into the π-calculus, and Haskell is already adept at turning itself into λ-calculus-embedded-into-something-else.

To put it another way, Haskell's a natural domain to start building other ideas in because it tends not to get in your way with existing features that interfere. Me, I just wish people'd talk about (denotational) semantics more and category theory (as a Big Thing) less - that's where the magic lies.

(not that I disagree with your post, just explaining some of the original logic)

Just one at a time. I've had

Just one at a time. I've had systems of up to several million processes running at one time, on one processor, in the same address space. Statically proven not to interfere with each other.

How much memory did the millions of processes take up? I recall a post on the Erlang list talking about topping out at 1.8 million processes before running out of addressable memory. At 1 KB per process that's close to 2 GB.

Efficient process representation

We recently wrote an Erlang-like DSL in Erlang. The implementation is based, my friend Tony Rogvall tells me, on Quake: each process is just a type (so you know what code to run) and a fixnum (the continuation / instruction pointer). In this model you should be able to fit a process into one machine word, which perhaps is what Quake does.

Predictably our processes have grown a heap and the heap has been used to implement a stack. So it goes.
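
A guess at what that representation might look like, sketched in Haskell rather than Erlang (ProcKind, Proc and step are invented names, and this ignores the heap/stack growth mentioned above):

data ProcKind = Pinger | Counter          -- which code this process runs
  deriving (Show)

data Proc = Proc !ProcKind !Int           -- the "type" plus the fixnum program counter
  deriving (Show)

-- One scheduler step: dispatch on the kind and advance the counter.
step :: Proc -> Proc
step (Proc Pinger  pc) = Proc Pinger  ((pc + 1) `mod` 4)
step (Proc Counter pc) = Proc Counter (pc + 1)

-- Run the whole "system" of processes round-robin.
stepAll :: [Proc] -> [Proc]
stepAll = map step

main :: IO ()
main = print (stepAll [Proc Pinger 0, Proc Counter 7])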

On a 16Gb Sparc...

A note on the Erlang mailing list about Mnesia and running millions of processes on a 16GB Sparc:

http://forums.trapexit.org:81/phpBB/viewtopic.php?t=5429

Pushing Mnesia

Just must brag that we're surely the record holders for punishment of Mnesia :-). We have pushed it to >60M records using a custom BerkeleyDB backend stored on a cheap 12-disk SCSI array. Mnesia scalability is everything in our telecom system since we store one record per subscriber and lots of operators have tens of millions of subs.

With the disk arrays we do blow out to 8U of rack space for a full cluster but it still looks comical compared with the racks and racks that make up Java/Oracle systems at these sites :-)

try this link instead

The memory of processes

How much memory did the millions of processes take up? I recall a post on the Erlang list talking about topping out at 1.8 million processes before running out of addressable memory. At 1 KB per process that's close to 2 GB.

Occam has a per-process memory overhead of roughly 0.2kB. Beyond that you're dealing with the memory required for the code and data in each process, but that's little different than what you'd face with a large object-based system (at least AFAIK). It's probably worth pointing out here that occam was originally designed for (and used in) the embedded domain, and executed on transputers with limited built-in memory.

Could you clarify?

You said "You can declare a channel where the types are function types (And that is types, not function pointers which are untyped)."

Why do you say that function pointers are untyped?
In C, you cannot assign a function address to a function pointer with a different prototype, so they are "typed" - weakly, of course, as it is C, but typed nonetheless.

Okay... I guess that's my mistake

Okay... I guess that's my mistake - I guess I really meant weakly typed. In an attempt to justify myself, though: the point of typing a channel is that you get a strong guarantee that the function passed will not cause an error. Allowing a function type to be cast removes this guarantee - so I would argue that in this case (as in any case where you are looking for provable correctness) allowing a cast is as bad as no type at all.

This explains some of the weirdness I felt at the thought of extending a language with casts to have design by contract. I guess C# is somewhat better than C by forcing you to declare such code unsafe... but as unsafeness is not protected by the type system, it's not much better than C really. Presumably one of the features of Spec#/Sing# is to restrict the use of unmanaged code. It would be better if C# used monads and could therefore require any function using unmanaged code to be in the Unmanaged monad... thus a function could be restricted from running unmanaged code by the type of the calling function.
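
A minimal sketch of that idea in Haskell (the module and operation names are invented): unsafe operations live in their own monad whose constructor isn't exported, so "uses unmanaged code" becomes visible in the type of every caller:

module Unmanaged (Unmanaged, runUnmanaged, pokeByte) where

import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)

-- The constructor stays private; the only way to build an Unmanaged action
-- is through the exported operations, and the only way to run one is
-- runUnmanaged, so callers that touch unmanaged code say so in their types.
newtype Unmanaged a = Unmanaged { runUnmanaged :: IO a }

instance Functor Unmanaged where
  fmap f (Unmanaged io) = Unmanaged (fmap f io)

instance Applicative Unmanaged where
  pure = Unmanaged . pure
  Unmanaged f <*> Unmanaged x = Unmanaged (f <*> x)

instance Monad Unmanaged where
  Unmanaged io >>= k = Unmanaged (io >>= runUnmanaged . k)

-- An example "unmanaged" operation: a raw memory write.
pokeByte :: Ptr Word8 -> Word8 -> Unmanaged ()
pokeByte p b = Unmanaged (poke p b)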

I really like this work. That

I really like this work. That is all.

Dependability?

From the Singularity report:

"If Singularity’s goal is more dependable systems, why does this report include performance measurements? The answer is simple: these numbers demonstrate that architecture that we proposed not only does not incur a performance penalty, but is often as fast as or faster than more conventional architecture. In other words, it is a practical basis on which to build a system.

On the other hand, this paper does not validate our goal of increased dependence. Measuring that aspect of a system is significantly more challenging than performance. We do not yet have results for Singularity."

How can a project that is supposedly about dependability not give any evaluation of its dependability? (It is also strange that they use the word "dependence" here, when the rest of the report talks about dependability.) I am very skeptical that this work is really about dependability. The real goal seems to be something else.