Designing a language — what theory is a must?

Let's suppose I am designing a simple programming language (imperative, statically typed, C-inspired), trying to integrate into it various ideas I like and hoping to be able to combine them into something nice.

I think I can implement the language (e.g. write the compiler), but I want my language to have at least some theoretical backing — I want to know that the semantics are sound, according to some definition of soundness. I do not want to dig too hard into formal PL theory — I merely want to be able to convince myself that the design is sane.

  • What is the proper way of describing the static/dynamic semantics, to be able to reason about them more or less formally?
  • What kind of «soundness» should I aim for, and what is the minimal sane required level of it?
  • What are the most important properties of the language/type system from a theoretical perspective (in an 80/20 sense — the greedy 80%)?
  • How can I approach formalizing those properties, and what are the typical ways of proving them?

Sorry if these questions are a bit too broad — without dedicated PL design education it is unclear to me where I should start. I would greatly appreciate it if someone could answer these in simple terms, if possible.


Proofs of a type system safety?

You may be interested in the previous thread: Proofs of a type system safety?.

Your language is "sound" when you know which constructs may result in failure (e.g. 'assert', 'raise', out-of-bounds access...) or non-termination (e.g. loops, general fixpoints); this is what is often considered "type safety". If failure turns out to be expressible in a way you didn't plan for in advance, you haven't checked hard enough and may need an ugly hack (think of Java's runtime check on array writes). Non-termination is much harder to catch (e.g. ML higher-order references) and must usually be tolerated.

Bonus points if you can predict what can happen when it does fail: "activate exception handlers then stop with this return code" is good, "undefined behaviour, may launch the missiles" is bad; this is often called "memory safety" (when restricted to memory).
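To make the distinction concrete, here is a tiny OCaml sketch, not tied to any particular language design: an out-of-bounds array read is a planned failure with a specified behaviour, while an escape hatch like Obj.magic is exactly the kind of unplanned hole the type checker is supposed to rule out.

(* Planned failure: out-of-bounds access is specified to raise Invalid_argument,
   so "what happens when it fails" is predictable and can be handled. *)
let safe_get (a : int array) (i : int) : int option =
  try Some a.(i) with Invalid_argument _ -> None

(* Unplanned hole: Obj.magic defeats the type checker; using the result
   is undefined behaviour (it may crash or silently corrupt memory). *)
let broken : string = Obj.magic 42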

Type safety from a practical point of view

Thanks a lot for the answer!

Thinking about it a bit, is it true that, from a practical point of view, type safety means:

  1. During execution, well-typed programs remain well-typed (e.g. ArrayStoreException in Java is needed to maintain this)
  2. The only operations that can be performed on data are those specified by its type
  3. No language construct/feature (except an explicit cast) can result in a variable ending up with the wrong type (like subtyping plus covariant method argument types)?

2) seems like an easy thing, 3) — like you need to thoroughly check everything that can result in a cast.

1), on the other hand, looks hard (it seems it was never formally done for almost any «serious» language?). How can you approximate it without defining a precise operational semantics etc. (not sure if I fully understand the term, but it seems to be hard for a non-toy language)?

Subject reduction in practice

Quite serious programming languages have the subject-reduction property. The standard example is Standard ML, which was completely formalized (in particular, it has a subject-reduction proof) in a machine-checkable format, but other languages have been formalized as well, such as large subsets of OCaml, subsets of Java, Scala, and, more recently, JavaScript (in the untyped setting subject reduction is weaker: you only have a notion of "well-formed programs" that is preserved).

Indeed, it is hard to prove anything without a formal semantics. And it is not only superficial complexity: it is easy to forget something without a formal semantics, so there is value in forcing oneself to produce one. It is actually not terribly complicated, but you need to learn how to do it.

I'm not saying that you *must* do it. But I think that if you are really interested in "theory", you should make the effort to learn it. In the medium to long term it will be a much better investment than trying to work around your lack of formal training.

Pierce's TAPL is a good resource, maybe a bit too complete. If you want lighter resources to get a first taste, I don't know what the classical advice is. Xavier Leroy, one of the OCaml designers, has course slides on (untyped) lambda-calculus operational semantics (part 1) (and implementation and compilation, part 2), which also cover exceptions, mutable state, and continuations (part 3). Having a look at them may give you an idea of what an operational semantics looks like.
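To give a first taste of what this looks like once written down, here is a minimal small-step semantics for a toy arithmetic/boolean language, sketched in OCaml in the style of the early chapters of TAPL (names and representation are just illustrative):

type term =
  | TTrue | TFalse | TIf of term * term * term
  | TZero | TSucc of term | TPred of term

let rec is_numeric = function
  | TZero -> true
  | TSucc t -> is_numeric t
  | _ -> false

exception No_rule   (* no rule applies: the term is a value, or it is stuck *)

(* One small-step reduction; each case corresponds to one inference rule on paper. *)
let rec step = function
  | TIf (TTrue, t2, _) -> t2
  | TIf (TFalse, _, t3) -> t3
  | TIf (t1, t2, t3) -> TIf (step t1, t2, t3)
  | TSucc t -> TSucc (step t)
  | TPred TZero -> TZero
  | TPred (TSucc t) when is_numeric t -> t
  | TPred t -> TPred (step t)
  | _ -> raise No_rule

(* "Type safety" for this toy language: a well-typed term is either a value
   or step applies (progress), and step never changes a term's type
   (preservation, i.e. subject reduction). *)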

That said, I have no idea how big your language is. If you have a lot of different runtime concepts, you would get a big formalization and need some work to define the runtime semantics. If your type system is simple, the subject reduction proof, if it is correct, is likely to be easy but very tedious (you have to consider a lot of simple but different cases). That's what it takes.

Type Theory Tutorial

I wrote a small tutorial on type theory for some students who had little or no background in type theory and programming language foundations. They seemed to like it, so I thought it could help newcomers.

"Casting"

Casting means at least 5 distinct things depending on who says it and in what context. Some are more problematic than others.

1) Conversion. Promoting a 32-bit int to a 64-bit int or a float to a string, or even the other way around. Conversion can be problematic for users depending on how you do it, but it's generally not a problem from a formal standpoint, since it's really no different from having functions that do the conversion.

2) Reinterpreting bits to mean something totally different. In C, cast a Foo* to a void* to a Bar*. Hilarity ensues. I'm not sure it's possible to have that kind of cast be anything but type-unsafe. (In C that kind of cast kills "progress", because any program that does it has undefined meaning.)

3) "Upcasting" sub types to super types. Generally not a problem formally, but it does mean the static type system loses some information. To regain the information you generally need the more dangerous cousin, downcasting. Also requires care in ensuring that the representation of sub types is compatible with that of super types or that there's some other mechanism for not doing the wrong thing at runtime after a cast.

4) "Downcasting" super types to sub types. To keep it safe you need a lot of runtime bookkeeping plus a semantics for failure (e.g. raising an exception). The more sophisticated your type system, the harder it can be to ensure all the necessary bookkeeping tags flow properly for timely runtime checks.

5) Removing or adding effects, e.g. C++ const_cast. This completely takes the teeth out of the effect system unless, like downcasting, there's a way to add a runtime check that enforces the effect constraint and the result of a failed runtime check won't violate the effect constraint supposedly being enforced.
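To make point 4 concrete: in a language with no built-in downcast you can see exactly what the "runtime bookkeeping" is, namely a tag stored with the value plus a checked operation that can fail. A small OCaml sketch (the dyn type and function names are just illustrative):

(* The bookkeeping is an explicit tag; the semantics for failure is an
   option (it could just as well be an exception). *)
type dyn =
  | DInt of int
  | DString of string

let upcast_int (n : int) : dyn = DInt n          (* "upcast": always succeeds *)

let downcast_int (d : dyn) : int option =        (* "downcast": checked at runtime *)
  match d with
  | DInt n -> Some n
  | _ -> None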

Tricky

I agree with Gasche that you really have to get pretty formal if you want to prove anything, but proving things about real-world imperative languages is difficult.

It's common to pick a subset of the language and prove things about the subset, but even that requires being comfortable with type system proofs. Also it's hard to pick the "right" subset of the language; some feature might seem inconsequential at first, but still end up having a big effect on soundness.

Another option is to research things that people have found to be troublesome in the past. This might give you a good "spidey sense" about dangerous features. I was programming in Java for a long time and never ran into the covariant array problem. I only learned about it when I read about it somewhere online. Operator overloading can be tricky when you have first-class functions. "null" is a pretty bad wart, but eliminating it in an imperative language can be more difficult than you think: since you have no default value for uninitialized references, the Java object initialization hole (http://java.dzone.com/articles/java-object-initialization) becomes even more problematic.

There are a few other guidelines that you can use, like Liskov substitutability, making sure syntactic sugar is exactly equivalent to what you claim it is equivalent to, etc.

I guess I agree that formalization is a pain, but the only other option I can think of is just learning A LOT about the history of language design and being diligent about applying the lessons learned.

I guess I agree that

I guess I agree that formalization is a pain, but the only other option I can think of is just learning A LOT about the history of language design and being diligent about applying the lessons learned.

If you want to design a great/successful language, you better know A LOT about your history regardless, since much of what we know about languages isn't included in formal semantics. If you need to do a lot of formal reasoning about your language to figure out if it does what it says it does, then you are probably doing it wrong and your language will be too complex for people to use. If it "looks" simple you might want to do the formal reasoning to back up your assumptions, but your feeling for it being right shouldn't be based on this reasoning (you should be fairly confident beforehand).

Formal reasoning's role in PL is as a tool to analyze the design space, and it is necessary when you want hard assurances; e.g., if you are targeting mission-critical systems or the language is very popular already and it's worth it. Java didn't need a formal semantics during its design; that it was done later by Guy Steele (et al) and minor problems were found was worth it considering that Java became popular. It wasn't worth it when Gosling was designing Java to be the next big thing.

If you need to do a lot of

If you need to do a lot of formal reasoning about your language to figure out if it does what it says it does, then you are probably doing it wrong and your language will be too complex for people to use. If it "looks" simple you might want to do the formal reasoning to back up your assumptions, but your feeling for it being right shouldn't be based on this reasoning (you should be fairly confident beforehand).

I disagree. Combinations of innocuous language features can lead to surprising results, and I haven't found any other way to check for that than formal reasoning. I know of examples of people "feeling" right about their system, and yet important bugs were found when it was used or when a formalization was later attempted. For example:

- C++ templates were not designed, if I understand correctly, to be a Turing-complete language for metaprogramming. The current uses of templates in this way, which are a mixed blessing, were not envisioned by the people designing the templates in the first place. I don't think they had any idea how expressive the construct actually was, and they felt a formalization was unnecessary (as with, e.g., Perl parsing). This is a funny example because it isn't about something bad happening. It's to be expected that powerful tools are used in new, creative ways that their designers didn't plan in advance, but it's debatable whether it was a good thing in this case.

- It came as a surprise to most people that mixing references with type polymorphism leads to unsoundness in the absence of the value restriction (a short illustration follows this list). Those are not features that are too complex to use: ML polymorphism and references are quite simple compared to what we have in today's languages.

- When Martin-Löf designed his dependently typed system to represent constructive mathematics in the seventies, he was confident, I suppose, that it was sound. Yet it was proved not to be, and that (the dangers of impredicativity) had an extremely profound influence on the development of the field. Granted, this example could be considered "too complex to use", but it's about a formal system that could be described in less than one page, using concepts and notations that were well understood at that time.
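For the references-plus-polymorphism example, the problematic program is only a few lines long; this is why ML-family languages impose the value restriction. OCaml rejects the last line below, because r gets a weak (non-generalized) type rather than a fully polymorphic one:

let () =
  let r = ref [] in               (* with the value restriction: '_weak1 list ref, not 'a list ref *)
  r := [ 1 ];                     (* this use pins the type to int list ref ... *)
  print_string (List.hd !r)       (* ... so this is now a type error. Without the
                                     restriction both uses would be accepted, and an
                                     int would be read as a string. *)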

Java didn't need a formal semantics during its design; that it was done later by Guy Steele (et al) and minor problems were found was worth it considering that Java became popular.

The type system was found unsound because a mutable data structure was considered covariant. I wouldn't call it a "minor" problem; the dynamic check that had to be added has, I think, an important impact on performance.

Of course it could also be considered a "good thing" as it motivated continued development of the JIT techniques initially developed for dynamic languages of the Smalltalk family :-)

Of course, it all depends on what you consider a "minor" or "major" problem. If you don't care about type systems as long as your language is successful, then indeed formalizing the type system is "not worth it". The OP seemed to care about what theorists know of how to design languages (which, as you noted, is certainly not the only knowledge about how to design languages), so I suppose he was interested in theoretical guarantees.

The type system was found

The type system was found unsound because a mutable data structure was considered covariant. I wouldn't call it a "minor" problem; the dynamic check that had to be added has, I think, an important impact on performance.

I believe the store hazard for arrays in Java was known well in advance of any of the formalization work; a reference would be nice if this problem was actually discovered after the fact. The killer example you are probably looking for is covariant overriding in Eiffel (which predates Java). Even then, the Eiffel community (or Bertrand?) had mostly shrugged this problem off as not that important.

The OP seemed to care about what theorists know of how to design languages (which, as you noted, is certainly not the only knowledge about how to design languages), so I suppose he was interested in theoretical guarantees.

Theory doesn't inform design very much unless you are designing a complicated language. My feeling is that perhaps you shouldn't be designing a complicated language, because it will have lots of problems even if you get the theory right (if your theory is correct but complicated, think about how that will affect your poor users!). But then there are domains where complicated languages are warranted, so what do I know?

I believe the store hazard

I believe the store hazard for arrays in Java was known well in advance of any of the formalization work; a reference would be nice if this problem was actually discovered after the fact.

I do not mean that those defects were found by formalization, but that formalization would have protected the designers from them. Most design defects are indeed found by experimentation rather than formalization; this was also the case, I believe, for the ML value restriction.

The way I reconstructed history (only from publicly available documents; asking the right people would give much more information), the Java designers were not aware at first of the array assignment problem. The reason I think that is that the initial specification documents (I could only find "The Oak Language Specification", version 0.2, apparently written in 1994) made no mention of this irritating dynamic check, while the first "Java Language Specification", in 1996, clearly insists on the problem.

In the Oak document, subtyping is clearly explained and informally specified, and it is made explicit that downcasts result in a dynamic check. I *suppose* they would have talked about the array-write dynamic check if they had been aware of it. At that time, I suppose, subtyping was already in use all over the place and they probably couldn't (or didn't want to) think about changing array subtyping.

Note that while the discovery of the problem *in the java community* was probably made by practical experimentation, the problem was already well-known and formalized in the academic programming language community. Subtyping was largely explored during the 80's (maybe culminating in Cardelli's Quest, which was published and formalized in 1991, precisely at the time Java design began) and its interaction with mutability was by then well known in the research community.

I had another example in mind that I forgot to add in my last post: cryptographic protocols. For example, the Needham-Schroeder protocol was used for key exchange for over twenty years before being proved unsafe. Yet it is a quite simple protocol that had been studied by security experts. In this case, the attack was found (1995, Gavin Lowe) when trying to formalize and reason about such protocols.

Theory doesn't inform design very much unless you are designing a complicated language.

My point was that even relatively simple languages can have subtle errors, and that only formalization can give you a good guarantee that they don't (in particular, expert confidence has repeatedly been shown wrong). Or are the first Java and Eiffel to be considered "complicated" languages? I'm not sure exactly where "complicated" quantitatively stands in your point of view.

I find it hard to believe,

I find it hard to believe, without explicit evidence, that array co-variance was an accident vs. by design. Really, what choice did they have? Parametric polymorphism wasn't really ready yet, and they needed to solve the array copy problem. In this context, array covariance seems like a reasonable solution with reasonable tradeoffs (dynamic checks to prevent store hazards).

My point was that even relatively simple languages can have subtle errors, and that only formalization can give you a good guarantee that they don't (in particular, expert confidence has repeatedly been shown wrong). Or are the first Java and Eiffel to be considered "complicated" languages? I'm not sure exactly where "complicated" quantitatively stands in your point of view.

We will have to agree to disagree then. I believe that even with the insight of formalization, the same decision would have been made in Java, since there were no good alternatives at the time. For Eiffel, I really have no idea; it could really have been ignorance, given our experience with subtyping pre-1986.

At any rate, worse will often be better: you can formalize your language early, or spend that time doing more design or releasing early. At some point formalization makes sense, but early on? It really depends on your domain.

I find it hard to believe,

I find it hard to believe, without explicit evidence, that array co-variance was an accident vs. by design.

I understand that; my "evidence" is highly indirect. The best way to know would be to ask people involved at the time.

Really, what choice did they have? Parametric polymorphism wasn't really ready yet, and they needed to solve the array copy problem. [..] I believe that even with insight of formalization, the same decision would have been made in Java since there were no good alternatives at the time.

This is not incompatible with my position. Formalization helps you to be sure that you have found all the typing issues, instead of leaving them to discovery by chance. How you then handle the issues is independent, and I agree that "ok, we keep it that way but add a dynamic test here" would be a reasonable choice.

My personal suggestion would be to drop array covariance, but allow arrays to subtype to covariant immutable arrays¹. This is simple, sound, and, I believe, reasonable to work with. That said, I don't have much experience programming with both mutable arrays and subtyping, so I may miss some important desired use cases (I'd be interested if you have some).

¹: and contravariant mutable-but-unreadable arrays
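A sketch of that idea in OCaml terms, with made-up names (the +'a variance annotation is what makes the read-only view covariant; the representation has to be chosen so the annotation is actually valid):

module ROArray : sig
  type +'a t                           (* covariant read-only view *)
  val of_array : 'a array -> 'a t      (* the only way in: freeze a mutable array *)
  val length   : 'a t -> int
  val get      : 'a t -> int -> 'a
end = struct
  (* a length plus a lookup function; unlike 'a array itself,
     this representation really is covariant in 'a *)
  type 'a t = { len : int; lookup : int -> 'a }
  let of_array a =
    let copy = Array.copy a in         (* copy, so later writes to a are not observed *)
    { len = Array.length copy; lookup = (fun i -> copy.(i)) }
  let length t = t.len
  let get t i = t.lookup i
end

In a language with general subtyping, the same interface lets a Student array be passed where a read-only Person array is expected, with no store hazard because there is no set operation.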

Turing Completeness of C++ Templates

If I may correct you, gasche: the Turing completeness of templates was discovered at a C++ committee meeting during a meal break, by one of the participants attempting to compute x! using a specialisation to terminate the induction. This occurred long, long before the '98 Standard was published, at the Munich meeting I think. The amazing thing was that it was actually tried out on a portable machine with an existing implementation, and it worked!

Thank you, that's a very

Thank you, that's a very good story.

another template story

Around 1995 I visited my old manager at Taligent and he showed me the latest C++ STL headers. I had been reading functional programming texts in ML, so I blurted out my impression STL was functional programming. He grinned and added, "And it's Turing complete." (I think that was to goad me, since I had long joked a subset of C++ consisting only of constructors and destructors was Turing complete, given coding styles I disliked but kept seeing.)

By 1996 numerous folks at Apple casually assumed C++ templates were Turing complete in conversation. I had assumed it was known from the start at design time.

But why have uninitialized references?

"null" is a pretty bad wart, but eliminating it in an imperative language can be more difficult than you think: since you have no default value for uninitialized references, the Java object initialization hole (http://java.dzone.com/articles/java-object-initialization) becomes even more problematic.

But why allow uninitialized references? They are an invitation to make a mess of your program anyway. Null is an overgeneralization: classes which really require it, like linked lists, can and should provide type-specific null pointers as a singleton subclass which is a sibling of the link class.

null is incredibly convenient

When I'm working with constructs that don't allow for nulling, it can be incredibly inconvenient, as I don't have my easy "no / doesn't apply / this value isn't being used in this context", etc. Replacing null with a generic None type (as in Scala) then merely replaces our NPEs with invalid cast exceptions (ICEs), so what?

The parent post is referring to Java initialization in an imperative context. You can't create/assign all of your objects at once, so null sits in the field between inits. You could design a language that lexically restricts when fields can be used, to ensure initialization is complete, but that would be a straitjacket that I'd never want to use. Scala's lazily initialized fields are actually a nice alternative when there isn't a dynamic cycle.

We can't get rid of null unless we can really get rid of NPEs, not just turn them into ICEs, and preserve the expressiveness that tolerating null gives us (the ability to create cycles).

Typestate or Null Object

Plaid's typestate would effectively allow capturing the intermediate states when constructing an object.

Associating a null value with each relevant type (a null object pattern) is also feasible. In Haskell you could use a typeclass for this.

I'd like to be rid of exceptions entirely. Exceptions don't scale or compose well. I discuss various alternatives in an older post.

I disagree.

Null is easier to understand than a singleton subclass which is a sibling of the link class. And under the circumstances where null leads to an error (i.e., forgetting to check for it), the construct you intend to replace it with also leads to an error.

It's a net loss for readability and understanding, with no gain in correctness. It's better just to use Null.

And under the circumstances

And under the circumstances where null leads to an error (i.e., forgetting to check for it), the construct you intend to replace it with also leads to an error.

This doesn't seem true in general. You can add to an empty list, you can't add to a null list. You can concatenate an empty string, you can't concatenate with a null string. John is right, the type-specific null value is the way to go, because the type of failure, if any, is specific to the abstraction. This is a huge gain over a somewhat meaningless global semantics for all null values.
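A small OCaml illustration of the difference: an abstraction-specific "empty" value supports the abstraction's own operations, while a global null/None supports nothing except being checked for:

let xs = [] @ [ 1; 2; 3 ]        (* the empty list is an ordinary list: this is [1; 2; 3] *)
let s  = "" ^ "hello"            (* the empty string is an ordinary string: this is "hello" *)

let ys : int list option = None
(* List.length ys does not even typecheck; every single use site has to
   inspect the wrapper first and decide what failure means there. *)
let n = match ys with None -> 0 | Some l -> List.length l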

One method is small-step

One method is small-step reduction semantics; this approach is taken in Felleisen, Findler, & Flatt's "Semantics Engineering with PLT Redex". It's a very dense read and involves lots of formal logic and proofs but it works through defining small-step reduction semantics for an untyped lambda calculus (IIRC) and proves various useful properties about them.
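For a flavour of what that calculus looks like as executable rules, here is a minimal call-by-value small-step function for the untyped lambda calculus, sketched in OCaml with de Bruijn indices (just an illustration of the style, not Redex itself):

type term = Var of int | Lam of term | App of term * term

let rec shift d c = function                     (* shift free variables >= c by d *)
  | Var k -> if k >= c then Var (k + d) else Var k
  | Lam t -> Lam (shift d (c + 1) t)
  | App (t1, t2) -> App (shift d c t1, shift d c t2)

let rec subst j s = function                     (* substitute s for variable j *)
  | Var k -> if k = j then s else Var k
  | Lam t -> Lam (subst (j + 1) (shift 1 0 s) t)
  | App (t1, t2) -> App (subst j s t1, subst j s t2)

let is_value = function Lam _ -> true | _ -> false

exception No_rule                                (* value or stuck term: no step applies *)

let rec step = function
  | App (Lam t, v) when is_value v ->            (* beta reduction *)
      shift (-1) 0 (subst 0 (shift 1 0 v) t)
  | App (v, t2) when is_value v -> App (v, step t2)
  | App (t1, t2) -> App (step t1, t2)
  | _ -> raise No_rule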

Semantics

Each of your questions falls within the same broad area: semantics of languages. Start with this wiki page and look at the list of textbooks at the bottom. Nielson & Nielson is available online and the first couple of chapters are devoted to your questions. Which textbook works best is a matter of personal preference though, so if you have trouble making headway, try one of the others.

Actor Model is popular

The Actor Model is becoming popular as a foundation for programming language design. See the paper "Actor Model of Computation: Scalable Robust Information Systems" published in Inconsistency Robustness 2011.

Translation

Another option to consider is whether you can define a translation from your language (or a core for your language) to an existing language that is known (or strongly believed) to be sound.

Some things seem tricky to translate

I know modules in many languages can hide a data type, e.g. (in Haskell):

module Hidden
  ( fromZipper
  , toZipper
  , next
  , prev
  ) where

-- the LinkedList constructor is not exported, so the representation stays hidden
data LinkedList a = LinkedList (Maybe (LinkedList a)) a (Maybe (LinkedList a))
-- etc ...

How can things like that be represented in another language?
I think sigma types can be used to represent some cases of this. E.g.

module Unit
  ( toUnit
  , fromUnit
  ) where

data Unit a = Unit a | Hidden a

toUnit :: a -> Unit a
toUnit = Unit

-- Hidden is not exported, so fromUnit's missing case is unreachable from outside
fromUnit :: Unit a -> a
fromUnit (Unit x) = x

off the top of my head, can be translated to Coq like so

Inductive Unit' (a : Type) : Type := Unit : a -> Unit' a | Hidden : a -> Unit' a.

(* UnitT rather than Unit, since Unit is already the name of the constructor *)
Definition UnitT (a : Type) : Type := { x : Unit' a | exists y : a, x = Unit a y }.

but I'm not sure how to translate more complicated examples.

Translation may break abstraction

Abstract types can be translated to existential types. For a modern account of this in the context of ML modules, see F-ing modules by Andreas Rossberg, Claudio Russo and Derek Dreyer (2010). The old reference is Mitchell and Plotkin's "Abstract types have existential types" (1988), but it is behind a paywall.

Simple existential types are painful to program with, and it is argued that they are not as modular as ML abstract types. There are several more fine-grained presentations of abstract types, such as Montagu's Open existential types and Dreyer's RTG and LTG calculi.

However, you only need that if you want to guarantee that your translation preserves the abstractions of the source program. This is important in particular if you don't have a type system for the source language: the only way you're going to tell whether or not abstraction was respected is by typing the translation.

If you have a type system for your source language, you don't need to care. Typecheck (so you know that accepted programs do not break abstraction), then translate to any language you want, possibly without any such abstraction mechanism. You don't care, as you already know that abstraction is respected. This is what happens when you compile the original program to, say, assembly, which has no static notion of abstraction.
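To make "abstract types have existential types" concrete: an ML-style signature already packages the two components of an existential, a hidden witness type together with operations over it. A minimal OCaml sketch with made-up names:

(* The signature plays the role of: exists t. { create : int -> t; read : t -> int; incr : t -> t } *)
module Counter : sig
  type t                            (* the representation is the hidden witness *)
  val create : int -> t
  val read   : t -> int
  val incr   : t -> t
end = struct
  type t = int                      (* clients can never learn that t = int *)
  let create n = n
  let read n = n
  let incr n = n + 1
end

Outside the module, (Counter.create 3 : int) is rejected by the type checker; that enforced abstraction is exactly the property a translation has to preserve when the source language has no type system of its own.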

Design doesn't always have to involve verification

One more thing: it might help if we understood the kinds of ideas you are talking about developing. Not every idea really requires formal verification, and I think that the old hands here (of which I am not one) can usually tell whether a design is heading into dangerous or uncharted territory where formal verification is important.

If you are adding some kind of powerful type-level or compile-time computation facilities, an advanced type-inference algorithm, a system for controlling or tracking effects, or new kinds of abstraction facilities (especially higher-order abstractions), then yeah, you should probably be thinking carefully about whether your type system is sound, and whether all your type-checking and -inference algorithms will actually terminate.

If you are mostly just building an "imperative, statically typed, C-inspired" language as you describe, and want to add a few interesting features that mostly leave the type system as-is (the Go language would be an example of this camp), then a soundness proof might not be the first order of business.

Purpose

This is exactly right. There are all kinds of reasons you might want to write a programming language where soundness does not feature highly: for itch scratching or point proving, you might want to realise some intuition you have about how a PL might work; as an exercise, you might want to test that you understand how to build a compiler.

For either of these kinds of motivation, a grasp of formal semantics might be useful, either to make your realisation more convincing to others, or to sharpen your understanding of what you are doing with your implementation, but the value of the project does not depend on soundness. In fact, researching formal semantics might be more rewarding after you have seen the problems with your design or implementation.

The answers in this thread seem to have generally assumed you are writing a PL that is going to be used in at least one big, needed programming project. There, soundness becomes very important, since its absence is rather likely to either waste lots of programmer time or cause costly problems upon deployment.

Here again, you can have soundness without any thought of formal semantics, but given the usefulness of even a basic grasp of what formal semantics is for, an implementor who ignores it entirely is not being serious. Formal semantics reveals unobvious problems, and a high proportion of problems with programming language design and implementation are unobvious.

Rigor should be a primary purpose

Rigor should always be a primary purpose in programming language design. Of course, clarity is also important. And if done right, rigor should not detract from clarity.

The primary focus in PL

The primary focus in PL design is problem solving; if you achieve rigor without making headway on that, it's pretty much wasted effort. It's like building a bridge: knowing the bridge is provably safe is pointless if it doesn't cross the river. Often clarity and rigor only come over time, as we develop a better understanding of the problem space; we built many crappy bridges before we learned how to do it right.

If you can base your PL design on rigorous work and it still addresses your problem space, great! If not, then don't start with rigor unless you really know where to look. It's often useful to design the PL and see where it falls down; then at least you know where to target your efforts. If you take the approach of rigor first, you won't be able to innovate as much in terms of design.

Garbage programming languages

Unfortunately, our landscape is polluted with garbage programming languages. So it's best to aim for a modicum of rigor and clarity.

Ironically enough, more

Ironically enough, more garbage languages are used than nice rigorous ones, and the reason isn't just marketing. Rigor has its own overhead for both the designer and the users, and sometimes (often, actually) you just need to get it done; aka worse is better.

The most widely used language started out as "garbage"

Arguably, JavaScript (the most widely used programming language) started out as "garbage", and only through the most strenuous efforts by *very* capable people at ECMAScript are we just now beginning to see a little bit of recovery. But even ECMA conceded defeat and decreed that JavaScript would forever forbid adaptive concurrency.

That doesn't make sense. The

That doesn't make sense. The committee prohibits observable concurrency, not an optimization for threading.

By "garbage", you mean taking ideas from 2 important languages, both of which had informal foundations? (I am intimately aware of the 'garbage' parts, but the garbage is the sort that plagued even the ML community.)

I am intimately aware of the

I am intimately aware of the 'garbage' parts, but the garbage is the sort that plagued even the ML community.

I don't understand this comment. After I learned JavaScript, I dreaded going back and reading or writing some JS. Justifiably it turns out, because the experience was universally unpleasant and fraught with peril.

By contrast, I never dreaded having to go back and read or write some ML, and didn't have many issues with it. It's not just dynamic languages, because I didn't dread reading/writing my Ruby code over time.

The libraries designed to make JS "sane" and iron out the portability issues alone imply the difficulties people have with JS. Granted, the DOM is the single worst source of headaches, but pure JS itself has some odd idiosyncrasies too, with unpredictable dynamic type tests and iteration being the worst in my experience.

I don't think JS is garbage as a language, but there's definitely something to the statement that it's still unpleasant to write JS in its typical environment.

Even if we have a formal

Even if we have a formal semantics for a kernel (ML), we have to worry about other features (modules, etc., that aren't/weren't in the kernels) and about properties dictated by use. The ugly bits of JS aren't really about the core. E.g., ocap research has cleaned up many security-relevant properties, but what does ocap research have to say about sml/ocaml? Or a language that was entirely specified?

On that note, PLT's latest POPL paper ("Run Your Research") is a fun read. Putting symbols in your paper doesn't make your language safe/correct (and actually seems to be an indicator that it isn't). However, that generally isn't the point of putting in those symbols -- it's to explore, understand, and communicate. JS got it right by copying others who do such things; the grit gets cleaned up in practice.

what does ocap research have

what does ocap research have to say about sml/ocaml?

See Emily. Not exactly how I'd do it, but capabilities-in-OCaml has been explored by some E folks.

Hope you see my point now :)

Hope you see my point now :) this is a natural phenomenon.

Does rigor necessitate formalism?

Would it be fair to say that the people responsible for the evolution of C++ approach their task with due rigor? I think it would, even if formal verification is not of the utmost concern for them. Their main goal is to ensure that the human implementors of their language can agree on its rules, so that the human users of the language can rely on its portability.

Safety/soundness is always a good design goal. What matters most in practice, though, is how a given design process prioritizes those goals.

Unfortunately...

If you really want to implement a C derivative, I think you'll quickly find out that the formal semantics are the least interesting part of the whole exercise.

You test your language...

I don't think you can do what you're asking without getting very deeply into formal language theory.

Your only other option is to test your language by writing lots of test programs, then "executing" them by "dry-running" them on paper or in your imagination.

The test programs should include some "standard" programming problems, e.g. dining philosophers, calculating primes.
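For instance, one such standard test program, written here in OCaml just for concreteness (the point of the exercise is to write the equivalent in the new language and dry-run it against the intended semantics):

(* primes below a bound, a typical "standard problem" test program *)
let primes_below n =
  let is_prime k =
    let rec check d = d * d > k || (k mod d <> 0 && check (d + 1)) in
    k >= 2 && check 2
  in
  List.filter is_prime (List.init n (fun i -> i))

let () =
  List.iter (Printf.printf "%d ") (primes_below 30)    (* prints 2 3 5 7 11 13 17 19 23 29 *)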